NORM
NORM transforms what is matched by the attribute into the entry of one of its ancestors.
Its action is based on the principle that every syncon in the knowledge graph belongs to at least one chain of concepts (it is linked to one or more other syncons). By default (for example, with the ANCESTOR attribute), links are navigated from the most general level to the most specific one. With NORM, it is possible to invert the direction and explore the available links upwards in the hierarchy. In this way, the extracted data can be generalized.
The syntax for extraction rules is:
SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[NORM ID:levels:linkName]
    }
}
The syntax for tagging rules is:
SCOPE scopeOption
{
    TAGGER(tagLevel)
    {
        @tag[attribute]|[NORM ID:levels:linkName]
    }
}
where:
ID refers to the unique identifier assigned to every syncon in the knowledge graph. The NORM option produces results only if the element used in the extraction part of the rule (the attribute) and the syncon ID specified with the NORM option are semantically linked, that is, they belong to the same chain of concepts.

levels refers to the number of levels that must be navigated upwards in the concept hierarchy when searching for the syncon that will be returned in the output. The number of levels ranges from 0 to 99, where 0 is the farthest ancestor in the chain and 99 is the default, representing the concept extracted by the rule itself (the action of the NORM option is nullified).

linkName refers to the link to be navigated when looking for ancestors. Valid links are those available in the knowledge graph, including any custom link added to the knowledge graph for a specific project.
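The meaning of the levels value can be pictured with a small Python model. This is only an illustrative sketch, not how the engine is implemented: the PARENT map is hypothetical toy data, the names chain_to and norm are invented for the illustration, and two behaviours are assumptions (a concept not linked to the given ID is returned unchanged, and a levels value longer than the chain stops at the top).

```python
# Toy child -> parent map for a single link type.
# Hypothetical data for illustration, not the real knowledge graph.
PARENT = {
    "Wellington": "New Zealand",
    "Auckland": "New Zealand",
    "Sydney": "Australia",
    "New Zealand": "Oceania",
    "Australia": "Oceania",
}


def chain_to(concept, top, parent):
    """Return [concept, ..., top] following parent links, or None if top is never reached."""
    path = [concept]
    while path[-1] != top:
        nxt = parent.get(path[-1])
        if nxt is None or nxt in path:  # dead end or cycle: not linked to top
            return None
        path.append(nxt)
    return path


def norm(concept, top, levels, parent=PARENT):
    """Model of NORM top:levels:link on one matched concept."""
    path = chain_to(concept, top, parent)
    if path is None or levels == 99:
        # Assumption: no semantic link, or the default 99, leaves the value unchanged.
        return concept
    if levels == 0:
        return top  # farthest ancestor in the chain
    # Assumption: asking for more levels than exist stops at the top.
    return path[min(levels, len(path) - 1)]
```

Under these assumptions, norm("Wellington", "Oceania", 1) gives "New Zealand", a levels value of 0 gives "Oceania", and 99 gives "Wellington" back unchanged.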
Consider the following example:
SCOPE SENTENCE
{
    IDENTIFY(PLACE)
    {
        @Place[ANCESTOR(16226414:99:syncon/geography) + TYPE(NPR)]|[NORM 16226414:1:syncon/geography]
    }
}
The purpose of this rule is to extract proper nouns (TYPE(NPR)) of places which are found in Oceania (syncon 16226414) and which are linked to each other by the semantic relationship representing geographic inclusion on the world map (syncon/geography). If this condition is met, the NORM option goes up one level in the hierarchy (:1), starting from the concept matched by the ANCESTOR attribute, in order to return a value which is more general than the matched one. In other words, the value returned is the parent of the matched value.
Consider the extraction output if the rule above is run against the following sample text:
An Air New Zealand flight from Wellington to Sydney this morning followed an unusual flight pattern after turning back for a "technical stop".
Air New Zealand flight NZ845 left Wellington Airport bound for Sydney but appeared to turn back off Nelson before circling off the Kapiti Coast and then heading for Auckland.
The Airbus A320 was scheduled to depart Wellington at 6:40am, Wellington time, and arrive in Sydney at 8:20am local time - a three-hour flight.
The text contains several terms matched by the sample rule: Sydney, Auckland and Wellington, each repeated several times. These terms are recognized as places in Oceania, the ancestor concept specified in the extraction rule, and become extraction "candidates". The NORM option, however, normalizes and aggregates the values to be returned: Auckland and Wellington (two values) are transformed into New Zealand (one value) and Sydney is transformed into Australia. In other words, the rule would extract all places in Oceania, but the transformation forces it to return only the names of the countries in which these places are located.
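The aggregation described above can be sketched in Python as follows. This is a toy model under stated assumptions: COUNTRY_OF is a hypothetical one-level lookup standing in for the syncon/geography link, the mentions list is illustrative rather than an exact count of occurrences in the sample text, and normalize_and_aggregate is a name invented for this sketch.

```python
# Hypothetical one-level-up lookup (place -> country); toy data only.
COUNTRY_OF = {
    "Wellington": "New Zealand",
    "Auckland": "New Zealand",
    "Sydney": "Australia",
}


def normalize_and_aggregate(mentions, parent=COUNTRY_OF):
    """Replace each matched place by its parent, then collapse duplicates.

    dict.fromkeys keeps the first occurrence of each value, so the result
    preserves the order in which the normalized values first appear.
    """
    return list(dict.fromkeys(parent[m] for m in mentions))


# Illustrative sequence of matched candidates, with repetitions.
mentions = ["Wellington", "Sydney", "Wellington", "Sydney", "Auckland", "Wellington"]
countries = normalize_and_aggregate(mentions)
```

With this input, three distinct candidates collapse into the two values "New Zealand" and "Australia".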
The NORM option can provide a greater or lesser degree of generalization. Consider the sample rule with a modification to the NORM transformation:
SCOPE SENTENCE
{
    IDENTIFY(PLACE)
    {
        @Place[ANCESTOR(16226414:99:syncon/geography) + TYPE(NPR)]|[NORM 16226414:0:syncon/geography]
    }
}
The purpose of this transformation is to return the most general common ancestor for all elements recognized by the extraction rule. If the rule is applied to the same sample text, it will only extract Oceania.
The examples presented so far use only one type of link for both the extraction rule and the NORM transformation. Different links, however, may also be used. Consider the following rule:
SCOPE SENTENCE
{
    IDENTIFY(PLACE)
    {
        @Place[ANCESTOR(16226414:99:syncon/geography) + TYPE(NPR)]|[NORM 78660:2:supernomen/subnomen]
    }
}
The purpose of this rule is to extract proper nouns (TYPE(NPR)) of places in Oceania (syncon 16226414) that are linked to each other by the semantic relationship representing geographic inclusion on the world map (syncon/geography). If this condition is met, the NORM option navigates the supernomen/subnomen chain (the "type of" relationship) starting from the concept of geographic place (syncon 78660) and goes up two levels in the hierarchy (:2), starting from the concept matched by the ANCESTOR attribute. Unlike the previous examples, it does not return a value which is geographically more general than the matched one, but a word that can be considered a class ("type of" link) to which the extracted entity belongs. In other words, the value returned is the grandparent of the matched value in a concept hierarchy different from the one specified in the extraction sub-condition.
If the rule above is run against the same sample text, Sydney and Wellington will be transformed into urban area. Inside the supernomen/subnomen hierarchy, in fact, Sydney is a capital, which is in turn a type of city, and a city is a type of urban area, so urban area, which is two levels up in the chain starting from Sydney, is the concept returned as the final extraction output.
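The effect of choosing a different link for the transformation can be pictured with one more toy sketch in Python. The GRAPH structure and the parent_of helper are hypothetical, and the entries only restate relationships mentioned in this section. The point is that the same concept has a different parent depending on which link is navigated.

```python
# Hypothetical child -> parent maps, one per link name.
# Toy data restating the relationships described in the text.
GRAPH = {
    "syncon/geography": {
        "Sydney": "Australia",
        "Australia": "Oceania",
    },
    "supernomen/subnomen": {
        "Sydney": "capital",
        "capital": "city",
        "city": "urban area",
    },
}


def parent_of(concept, link_name, graph=GRAPH):
    """Return the direct parent of a concept along the given link, or None."""
    return graph[link_name].get(concept)


geo_parent = parent_of("Sydney", "syncon/geography")
type_parent = parent_of("Sydney", "supernomen/subnomen")
```

Here geo_parent is "Australia" (a containing place), while type_parent is "capital" (a class the entity belongs to), which is why the link named in NORM determines what kind of value is returned.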