SMARTENTRY
SMARTENTRY is similar to ENTRY. In fact, SMARTENTRY can be described as an evolution of the ENTRY option.
The syntax for extraction rules is:
SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[SMARTENTRY]
    }
}
The syntax for tagging rules is:
SCOPE scopeOption
{
    TAGGER(tagLevel)
    {
        @tag[attribute]|[SMARTENTRY]
    }
}
The purpose of SMARTENTRY is to transform what is matched by the attribute into its most significant base form. But whereas ENTRY is useful only for attributes capable of recognizing concepts found in the knowledge graph, SMARTENTRY can be used with any attribute, with no need to check whether a concept is found in the knowledge graph. The SMARTENTRY option automatically recognizes whether a concept is contained in the knowledge graph: it returns the syncon's "main lemma" if the concept is known, or a guessed base form if the concept is unknown.
SMARTENTRY is useful in many cases, especially for processing named entities. To understand the behavior of this option, consider the following use case:
Named entities are classified by the semantic disambiguator as proper nouns. The knowledge graph contains a number of proper noun syncons, but not every proper noun that exists in the language is part of the knowledge graph. For example, the knowledge graph may contain a syncon for the concept of Lehman Brothers, but not one for Transocean Ltd. The disambiguator, however, is able to recognize that Transocean Ltd. is the proper name of a company even though this concept is not present in the knowledge graph. Transocean Ltd., in fact, is recognized as a virtual child of syncon 37887 (limited liability company).
Before the introduction of the SMARTENTRY option, extracting proper names of companies from texts, and normalizing the output to a constant form whenever the company name is contained in the knowledge graph, required different rules with different transformation options, depending on whether the company was "known" or "unknown" to the knowledge graph.
The transformation option used to extract known syncons is ENTRY, which finds a concept in a text, identifies it in all the forms and variations contained in the knowledge graph, and returns a constant form corresponding to the "main lemma" (previously set in the knowledge graph).
Consider the following example:
SCOPE SENTENCE
{
    IDENTIFY(COMPANY)
    {
        @COMPANY_NAME[ANCESTOR(37475) + TYPE(NPR) - SYNCON(UNKNOWN)]|[ENTRY]
    }
}
The purpose of this rule is to extract proper noun concepts (+ TYPE(NPR)) that descend from syncon 37475 (company), only if the identified concepts are not "unknown" to the knowledge graph (- SYNCON(UNKNOWN)); in other words, to extract proper nouns of companies if they are contained in the knowledge graph. If this condition is verified, the ENTRY transformation option ensures that every form a company name can take is transformed into the syncon's main lemma. This gives a concept one consistent extraction value even when it appears in several different forms in a text.
Consider, for example, the extraction output if the sample rule is run against the following text:
The equities index is 20 percent above its level on Sept. 15, 2008, the first trading day after Lehman Brothers Holdings Inc. filed the world's biggest bankruptcy and prompted a 46 percent drop through March 9, 2009.
Lehman Brothers is having a great year. The bank, which almost destroyed the global economy four years ago this week, recently emerged from bankruptcy, resolved a third of its debts and executed the largest U.S. real estate deal of the year.
The text contains two values matching the sample rule: Lehman Brothers Holdings Inc. and Lehman Brothers, which are both analyzed as companies. The disambiguator also recognizes these two names as the same company, associated with syncon 317862; in the knowledge graph, this syncon contains five different forms referring to the same concept. The extraction panel shows that the extracted value is the main lemma, Lehman Brothers, while the text record shows the two instances found in the text: Lehman Brothers Holdings Inc. and Lehman Brothers. This means that the extracted values have been transformed and normalized into the main lemma thanks to the ENTRY option.
The transformation option used to extract unknown syncons is TEXT, which extracts the exact value identified in the text. Consider the following example:
SCOPE SENTENCE
{
    IDENTIFY(COMPANY)
    {
        @COMPANY_NAME[ANCESTOR(37475) + TYPE(NPR) + SYNCON(UNKNOWN)]|[TEXT]
    }
}
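The purpose of this rule is to extract proper nouns of companies only if they are "unknown" to the knowledge graph (+ SYNCON(UNKNOWN)). If this condition is verified, the TEXT transformation option ensures that every extracted value is kept in its original form, as it appears in the text.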
Consider the extraction output if the sample rule is run against the following sentence:
Transocean Ltd. announces Definitive Agreements to Sell 38 Shallow Water drilling Rigs to Shelf Drilling.
The text contains the value Transocean Ltd., which is not present in the knowledge graph as a syncon. It is nevertheless recognized as a company, with the virtual supernomen limited liability company. Thanks to the TEXT transformation, the value is extracted as it appears in the text, so the output is Transocean Ltd.
If the ENTRY option were used for both known and unknown concepts, unknown company names would be transformed into the main lemma associated with the token's virtual parent. In the example, Transocean Ltd. would be transformed into limited liability company, the main lemma of its virtual parent.
If instead only one rule with the TEXT option were used, the ability to normalize concepts found in the knowledge graph would be lost.
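The two single-rule alternatives can be sketched by dropping the SYNCON(UNKNOWN) condition from the sample rules. These variants are illustrations derived from the examples in this section, not recommended rules; the expected values in the comments restate the behavior described in the text, and the // comment syntax is assumed to be available in the rules language:

```
// Variant A (hypothetical): ENTRY for known and unknown companies alike.
// Lehman Brothers Holdings Inc. -> Lehman Brothers
// Transocean Ltd.               -> limited liability company (virtual parent's main lemma)
SCOPE SENTENCE
{
    IDENTIFY(COMPANY)
    {
        @COMPANY_NAME[ANCESTOR(37475) + TYPE(NPR)]|[ENTRY]
    }
}

// Variant B (hypothetical): TEXT for known and unknown companies alike.
// Lehman Brothers Holdings Inc. -> Lehman Brothers Holdings Inc. (not normalized)
// Transocean Ltd.               -> Transocean Ltd.
SCOPE SENTENCE
{
    IDENTIFY(COMPANY)
    {
        @COMPANY_NAME[ANCESTOR(37475) + TYPE(NPR)]|[TEXT]
    }
}
```

Neither variant gives the desired result for both kinds of company name, which is why two separate rules were needed.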
The SMARTENTRY option makes it possible to write just one rule covering concepts both known and unknown to the knowledge graph, and to obtain the clean and correct results that previously required two separate rules. SMARTENTRY recognizes whether an entity is contained in the knowledge graph and normalizes the extracted values accordingly.
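As a minimal sketch, the two company rules shown earlier could be merged into a single rule by removing the SYNCON(UNKNOWN) condition and applying SMARTENTRY. The rule below is assembled from the syntax and sample rules in this section; the values in the comments are what the described behavior suggests, not verified output, and the // comment syntax is assumed to be available in the rules language:

```
// One rule for companies both known and unknown to the knowledge graph.
// Expected, per the behavior described above:
// Lehman Brothers Holdings Inc. -> Lehman Brothers (main lemma of the known syncon)
// Transocean Ltd.               -> Transocean Ltd. (guessed base form of the unknown concept)
SCOPE SENTENCE
{
    IDENTIFY(COMPANY)
    {
        @COMPANY_NAME[ANCESTOR(37475) + TYPE(NPR)]|[SMARTENTRY]
    }
}
```

Because the attribute no longer tests SYNCON(UNKNOWN), the choice between normalizing to the main lemma and keeping a base form is left entirely to the transformation option.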