ANCESTOR attribute
Overview
The ANCESTOR attribute identifies a chain of concepts by specifying the numeric ID of a syncon and considering it as the starting point of the chain. A token will be recognized in a text if, during the disambiguation process, it is associated with one of the syncons in the selected chain.
The syntax is:
ANCESTOR(ID1[, ID2, ...])
where:
- ANCESTOR is the attribute name and must be written in uppercase.
- ID# refers to the unique ID assigned to every syncon contained in the Knowledge Graph. It is always a whole number made up of one or more digits.
As with the SYNCON attribute, the ANCESTOR attribute allows you to specify the ID of a concept (syncon) contained in the Knowledge Graph. However, where SYNCON treats the specified syncon ID as the only concept to be matched, ANCESTOR treats the ID as the starting point for a chain of concepts. Syncons in the Knowledge Graph are linked to each other through semantic relations called links. When using the ANCESTOR attribute, the selected chain is navigated downwards (never upwards), following the branch structure of the chain from the most general concept (the syncon ID specified in the rule) to the most specific one (the last selected node of the chain).
Since a chain consists of several levels, it is possible to limit the number of levels to be navigated. This is done by adding a colon (:) after the syncon ID, followed by the number of levels. The syntax is:
ANCESTOR(ID1:levelNumber[, ID2:levelNumber, ...])
where levelNumber ranges from 0 to 99: 0 considers only the starting point of the chain, and 99 is the default value, which considers all levels. If no value is specified, the whole chain is considered (same as declaring level 99).
It is also possible to specify the link to be navigated when looking for descendants. This can be done by adding another colon (:) after the level number followed by the name of the link. The syntax is:
ANCESTOR(ID1:levelNumber:linkName[, ID2:levelNumber:linkName, ...])
Valid links are those available in the Knowledge Graph, including any custom link added for a specific project. If no link name is specified and the given ancestor is a noun, the supernomen/subnomen ("type of" relation) link is navigated by default. If no link name is specified and the given ancestor is a verb, the superverbum/subverbum ("way of" relation) link is navigated by default. Any other link must be specified explicitly in order to be considered in the rule.
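The defaults described above can be illustrated with a small Python sketch that splits a single-link argument of the form ID[:levelNumber[:linkName]] and fills in the missing parts. This is only an illustration, not the engine's parser: the names AncestorArg and parse_ancestor_arg are hypothetical, and the part of speech is passed in by hand, whereas the real engine would presumably look it up in the Knowledge Graph. The double-link form is not handled here.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LEVEL = 99  # default depth stated in the text: navigate all levels

# Default link per part of speech, as stated in the text.
DEFAULT_LINKS = {
    "noun": "supernomen/subnomen",
    "verb": "superverbum/subverbum",
}


@dataclass
class AncestorArg:  # hypothetical container, for illustration only
    syncon_id: int
    level: int
    link: Optional[str]


def parse_ancestor_arg(arg: str, part_of_speech: str) -> AncestorArg:
    """Split 'ID[:level[:link]]' and apply the documented defaults."""
    parts = arg.strip().split(":", 2)
    syncon_id = int(parts[0])
    level = int(parts[1]) if len(parts) > 1 else MAX_LEVEL
    if not 0 <= level <= MAX_LEVEL:
        raise ValueError(f"levelNumber must be between 0 and {MAX_LEVEL}")
    # Assumption: no default link exists for other parts of speech.
    link = parts[2] if len(parts) > 2 else DEFAULT_LINKS.get(part_of_speech)
    return AncestorArg(syncon_id, level, link)
```

Under these assumptions, parse_ancestor_arg("78449:2", "noun") yields level 2 and the supernomen/subnomen link, while parse_ancestor_arg("17200", "noun") yields level 99 and the same default link.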
When the ANCESTOR attribute is used in a rule, a token is identified in a document only if it is disambiguated as an instance of one of the syncons that are part of the chain. A syncon can be made up of one or more lemmas representing the same concept (synonyms); therefore, all synonyms, variants, abbreviations etc. that are part of the same syncon will be matched if found in a document. Synonyms and variants are recognized both in their base form and in their inflected forms.
Please be aware that a token in the document matching one of the lemmas of one of the syncons in the chain is not a sufficient condition for the token to match the ANCESTOR rule. Many lemmas contained in the Knowledge Graph are polysemic, which means that they have several meanings or represent different concepts. For an ANCESTOR rule to be verified, the disambiguator must associate the token in the text with one of the meanings represented by the syncon chain used in the rule. This means that the ANCESTOR attribute considers not only the form of a word but also its contextual meaning.
Consider the following examples:
ANCESTOR(17200)
The above rule starts at the concept of house meaning a building for human habitation and includes all concepts below it. Since neither a level number nor a link name was specified, this rule considers syncon 17200 and all of its descendants in the supernomen/subnomen chain. Therefore, it will recognize not only the concept of house, but also the concepts for different types of houses. For example, the first level of the supernomen/subnomen chain starting from syncon 17200 contains concepts for apartment building, detached house, duplex house, townhouse etc. Further down the chain, apartment building contains second level concepts such as apartment block and cooperative.
Instead, in this example:
ANCESTOR(78449:2)
syncon 78449 refers to plant with the meaning of any living organism of vegetable origin. The above rule considers only the supernomen/subnomen link, down to the second level of descendants. In other words, the concept of plant is recognized in a document along with its direct children (flower, tree, bush, etc.), which are found in the first level of the hierarchy. The types of flowers and trees are also recognized (tulip, orchid, shade tree, high-trunk tree, etc.) because they are found in the second level of the hierarchy. The types of roses, however, are not considered, since they are in the third level of the hierarchy descending from plant.
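The effect of the level number can be illustrated with a depth-limited traversal over a toy hierarchy. The Python sketch below is not the engine's implementation: the CHILDREN table is hypothetical data using labels instead of syncon IDs ("tea rose" is an invented third-level entry), and it assumes that level 0 selects only the starting concept.

```python
from collections import deque

# Hypothetical toy hierarchy along a single link: parent -> direct children.
# Labels stand in for syncon IDs; this is not real Knowledge Graph data.
CHILDREN = {
    "plant": ["flower", "tree"],
    "flower": ["tulip", "rose"],
    "tree": ["shade tree"],
    "rose": ["tea rose"],
}


def chain(start, max_level=99):
    """Return the start concept plus its descendants down to max_level levels."""
    selected = {start}
    queue = deque([(start, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth == max_level:
            continue  # do not descend below the requested level
        for child in CHILDREN.get(concept, []):
            if child not in selected:
                selected.add(child)
                queue.append((child, depth + 1))
    return selected
```

With this toy data, chain("plant", 2) selects plant, its direct children and their children, but leaves out "tea rose", which sits on the third level. Calling chain("plant") with the default level selects the whole hierarchy.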
Going further into syntax complexity, consider the following statement:
ANCESTOR(12622858:99:syncon/geography)
Syncon 12622858 refers to United Kingdom and this statement considers all levels down the chain following the syncon/geography link (99 is the default level number that indicates maximum depth in a concept chain). In other words, the chain linking United Kingdom along with all the administrative places associated with this country (constituent nations like Wales, counties like Derbyshire, cities like Manchester etc.) will be matched if found in a document.
Matching a virtual supernomen
As a standard, the ANCESTOR attribute will recognize in a text one or several words contained in the Knowledge Graph when the syncon ID of one of their ancestors (father, grandfather, great-grandfather etc.) is specified. However, certain unknown elements can also be matched by rules using the ANCESTOR attribute. In fact, the disambiguator is able to apply a heuristic approach when faced with unknown elements, and guess from the context if an unknown entity can be virtually linked to a concept in the Knowledge Graph. In other words, when the disambiguator comes across an unknown element in a text, it uses the known words surrounding the unknown element to assign it to a virtual supernomen.
Typical examples of unknown elements that are elevated to the rank of "entities" are units of measurement, such as meters. The Knowledge Graph cannot contain every possible value expressing a length; yet it is possible to specify the syncon ID for the concept of "meter":
ANCESTOR(100011573)
to match in a text any value relating to meters, whether it is contained in the Knowledge Graph or not. This is possible because the disambiguator is able to connect an unknown entity to a known concept thus creating a virtual kinship. In the same way, in the following sentence:
[..] It is a type of pattern seen in the tiled Islamic mosaics at the Alhambra Palace in Spain and the Darb-i Imam shrine in Iran, but which had never been thought could exist in nature.
the disambiguator recognizes that Darb-i Imam is the proper name of a shrine and therefore connects it to syncon ID 20444, which refers to the concept shrine and becomes its virtual supernomen.
Another example of an unknown entity being linked to a virtual supernomen is in the following sentence:
Shechtman was born in Tel Aviv in 1941 and received his PhD from Technion, the Israel Institute of Technology in Haifa, in 1972.
Here, Israel Institute of Technology is recognized as a virtual son of the concept of Institute, intended as an educational institution (syncon ID 148106). This process can potentially be applied to any unknown element in the text; however, the correctness of the output always depends on the quality and quantity of the contextual information available to the disambiguator.
With UNKNOWN
It is also possible to use the ANCESTOR attribute with the UNKNOWN value in the place of a syncon ID, as shown below.
ANCESTOR(UNKNOWN)
In this case, only the elements with no virtual supernomens will be matched in a text. This means that only elements that are not contained in the Knowledge Graph, and to which the disambiguator was not able to assign a supernomen, are taken into consideration. The disambiguator searches for virtual fathers by navigating the supernomen/subnomen link; therefore, when the UNKNOWN value is used, no other links or levels can be specified.
Warning
If used by itself, the UNKNOWN value can be extremely powerful and hyper-generative. It is designed to be used in combination with other attributes.
Double link ancestor
Another feature of the ANCESTOR attribute is the possibility to specify a second link to be navigated. This is done by adding another colon (:) after the first link name, followed by the name of the second link. The syntax is:
ANCESTOR(ID:levelNumber:linkName1:linkName2)
This syntax performs a complex task of concept identification. The process consists of the following steps:
- Navigate the first type of chain starting from the specified ancestor ID.
- In the first chain, identify all syncons that are also part of the second type of chain.
- Navigate the second chain starting from each of the identified syncons.
- In the text, identify the syncons that are found in any of the chains belonging to the second type of link and starting from any of the syncons found in the first chain.
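The four steps above can be sketched in Python on a toy graph. This is a simplified illustration, not the engine's algorithm: GRAPH holds hypothetical data (which natural element hangs from which place is invented), the level number is assumed to apply only to the first link, and the function returns only the concepts reached through the second link, leaving open whether the engine also matches concepts of the first chain.

```python
# Hypothetical toy graph: link name -> {parent: [children]}.
GRAPH = {
    "syncon/geography": {
        "United Kingdom": ["England"],
        "England": ["Derbyshire"],
    },
    "omninomen/parsnomen": {
        "United Kingdom": ["Irish Sea"],
        "England": ["Isle of Wight"],
    },
}


def descendants(start, link, max_level=99):
    """Start node plus everything below it along one link, down to max_level."""
    edges = GRAPH.get(link, {})
    found, frontier = {start}, [start]
    for _ in range(max_level):
        frontier = [c for p in frontier for c in edges.get(p, []) if c not in found]
        if not frontier:
            break
        found.update(frontier)
    return found


def double_link(start, max_level, link1, link2):
    """Concepts reached via link2 from any node of the link1 chain."""
    first_chain = descendants(start, link1, max_level)        # step 1
    second_edges = GRAPH.get(link2, {})
    matches = set()
    for node in first_chain:
        if node in second_edges:                              # step 2
            matches |= descendants(node, link2) - {node}      # steps 3 and 4
    return matches
```

On this toy data, double_link("United Kingdom", 99, "syncon/geography", "omninomen/parsnomen") collects both the element attached to the starting concept and the one attached to its child in the first chain.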
Consider the following sample text, which contains information about natural geographic features related to the United Kingdom:
England is a country that is part of the United Kingdom. It shares land borders with Scotland to the north and Wales to the west; the Irish Sea is to the north west, the Celtic Sea to the south west, while the North Sea to the east and the English Channel to the south separate it from continental Europe. Most of England comprises the central and southern part of the island of Great Britain in the North Atlantic. The country also includes over 100 smaller islands such as the Isle of Sheppey and the Isle of Wight.
For demonstration purposes, the goal is to categorize any document against a taxonomy of Countries by identifying any geographic concept mentioned in the text. The "double link" ancestor syntax presented above makes this possible with a single rule.
This categorization rule:
SCOPE SENTENCE
{
    //United Kingdom: natural elements
    DOMAIN(dom1:NORMAL)
    {
        ANCESTOR(12622858:99:syncon/geography:omninomen/parsnomen) //United Kingdom
    }
}
is triggered by the sample text because four concepts related to the Country are found in the text. They are natural elements that are part of the United Kingdom: Irish Sea, English Channel, Isle of Sheppey and Isle of Wight.
Starting from the ancestor syncon ID, which corresponds to the concept of United Kingdom (12622858; the level number 99 is the default and indicates maximum depth into the concept chain), the syncon/geography link navigates down the chain of concepts and links all the administrative places into which the Country is organized (constituent nations, counties, cities...). The second link, omninomen/parsnomen ("part of" type of relation), identifies in the text all the natural elements linked to every administrative place that belongs to the first chain. In the sample text above, some of the natural elements are linked to the UK itself (the starting point of the chain) while others are linked to England (child of UK in the administrative chain). In other words, this syntax identifies all the natural elements associated with the territory of the Country using just a single rule.
This is just one of the possible applications of this syntax. Another example is to use the supernomen/subnomen link in combination with the omninomen/parsnomen link to identify the components of different types of motor vehicles. Let's consider the rule below:
SCOPE SENTENCE
{
    // Motor vehicle part
    DOMAIN(dom1:NORMAL)
    {
        ANCESTOR(78327:99:supernomen/subnomen:omninomen/parsnomen) //motor vehicle
    }
}
and a new sample text:
Almost all trucks share a common construction: they are made of a chassis, a cab, an area for placing cargo or equipment, axles, suspension and roadwheels, an engine and a drivetrain. Pneumatic, hydraulic, water, and electrical systems may also be identified. Many also tow one or more trailers or semi-trailers.
Given this text, our rule would recognize not only the concept of chassis, which is a component of any motor vehicle (therefore chassis has a direct omninomen/parsnomen link with motor vehicle), but also the concept of trailer, which is a component of a trailer truck, which in turn is a type of motor vehicle (trailer has a direct omninomen/parsnomen link with trailer truck, which has a supernomen/subnomen link with truck and, higher in the chain, with motor vehicle).
Using the - filter
When navigating two links of an ANCESTOR chain, it is very important to have a clear idea of how concepts are represented inside the Knowledge Graph. For example, syncon/geography is a relation between proper nouns expressing geographical inclusion: the most inclusive concept sits at the top of the chain, while all other concepts are listed hierarchically downwards. The same happens for adjective/geography, where the starting point of the chain is a geographical adjective and its child is the country to which the adjective is related. Using the double-link syntax, one might try to match geographical adjectives just by specifying the geographic place and the two links to be navigated, as in the rule below:
SCOPE SENTENCE
{
    //adjective/geography
    DOMAIN(ITALY)
    {
        ANCESTOR(100000046:1:syncon/geography:adjective/geography) //@SYN: #100000046# [Europe]
    }
}
Unfortunately, since navigation is downward only, this rule produces matches for the first link (geographical places) but not for the adjectives, because downward navigation does not suit the way in which geographical places and adjectives are arranged within the Knowledge Graph. To obtain a match, the navigation direction for the adjective/geography link must be changed from downward to upward.
This is possible by adding - before the link for which we want to change the navigation direction.
Consider the rule below:
SCOPE SENTENCE
{
    //adjective/geography
    DOMAIN(ITALY)
    {
        ANCESTOR(100000046:1:syncon/geography:-adjective/geography) //@SYN: #100000046# [Europe]
    }
}
The starting point of the chain in the rule is the concept Europe. The chain is navigated downward through the syncon/geography link, allowing matches on all countries in Europe. With - added before the second link, adjective/geography, that link is navigated upward, so the rule can match the adjectives linked to each country.
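The change of direction can be illustrated with a one-level navigation step over toy edges. The Python sketch below is only an illustration under stated assumptions: EDGES is hypothetical data with labels instead of syncon IDs, the adjective is stored as the parent of its country (as described above), and the step function is an invented helper, not part of the rules language.

```python
# Hypothetical toy edges, stored as (parent, child) pairs per link.
EDGES = {
    "syncon/geography": [("Europe", "Italy"), ("Europe", "France")],
    # As described in the text, the adjective is the parent and the country its child.
    "adjective/geography": [("Italian", "Italy"), ("French", "France")],
}


def step(nodes, link):
    """Follow one level of a link; a leading '-' reverses the direction."""
    upward = link.startswith("-")
    name = link[1:] if upward else link
    result = set()
    for parent, child in EDGES.get(name, []):
        if upward and child in nodes:
            result.add(parent)   # climb from child to parent
        elif not upward and parent in nodes:
            result.add(child)    # descend from parent to child
    return result


countries = step({"Europe"}, "syncon/geography")
```

With this data, step(countries, "adjective/geography") finds nothing, because no adjective sits below a country, whereas step(countries, "-adjective/geography") reaches the adjectives by climbing the same link.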
Note
If the second link contains white spaces or dashes, write it within quotation marks (").