TEXT
TEXT transformation fills extraction fields and/or tags with the original text matched by the rule operands. This is the default transformation for the KEYWORD and the PATTERN attributes, while the default for all the other attributes is BASE.
The syntax for extraction rules is:
SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[TEXT]
    }
}
The syntax for tagging rules is:
SCOPE scopeOption
{
    TAGGER(tagLevel)
    {
        @tag[attribute]|[TEXT]
    }
}
This option is useful in cases where strict correspondence to the text is necessary.
For example, consider a project with a customized Knowledge Graph containing the concepts of many companies linked to the parent syncon with ID 37475 (company). For some companies, several variants of their names have been defined.
To extract the exact citation of a company from a text, we can use this rule:
SCOPE SENTENCE
{
    IDENTIFY(COMPANY)
    {
        @COMPANY_NAME[ANCESTOR(37475) + TYPE(NPR) - SYNCON(UNKNOWN)]|[TEXT]
    }
}
The operand associated with the @COMPANY_NAME field matches concepts that derive from syncon 37475, with two exclusions. The + TYPE(NPR) condition keeps only proper nouns, which excludes the common names of types of companies. The - SYNCON(UNKNOWN) condition excludes the companies heuristically recognized as such by the disambiguator, which therefore have syncon 37475 as their virtual supernomen.
Now consider the following sample text:
The equities index is 20 percent above its level on Sept. 15, 2008, the first trading day after Lehman Brothers Holdings Inc. filed the world's biggest bankruptcy and prompted a 46 percent drop through March 9, 2009.
Lehman Brothers is having a great year. The bank, which almost destroyed the global economy four years ago this week, recently emerged from bankruptcy, resolved a third of its debts and executed the largest U.S. real estate deal of the year.
The operand above matches Lehman Brothers Holdings Inc. and Lehman Brothers as they are lemmas of a syncon that has Lehman Brothers as its base form.
If no transformation or the BASE transformation were specified, the base form would be extracted. With the TEXT transformation, the literal value matched by the operand is extracted instead.
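The difference between the two transformations for the sample text can be illustrated with a small Python sketch. This is only a toy model of the behavior described above, not the engine's implementation: the Match class and the apply_transformation function are hypothetical names introduced for illustration, and the two matches are entered by hand from the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical record of one operand match."""
    text: str  # literal span found in the input text
    base: str  # base form of the matched syncon


def apply_transformation(match: Match, transformation: str = "BASE") -> str:
    """Return the value that would fill the field for the given transformation.

    BASE is used as the default here because the example operand is built on
    ANCESTOR, which the section says defaults to BASE.
    """
    if transformation == "TEXT":
        return match.text
    if transformation == "BASE":
        return match.base
    raise ValueError(f"unsupported transformation: {transformation}")


# The two matches described for the sample text, entered by hand.
matches = [
    Match(text="Lehman Brothers Holdings Inc.", base="Lehman Brothers"),
    Match(text="Lehman Brothers", base="Lehman Brothers"),
]

base_values = [apply_transformation(m, "BASE") for m in matches]
text_values = [apply_transformation(m, "TEXT") for m in matches]

print(base_values)
print(text_values)
```

In this model, BASE collapses both citations to the same base form, while TEXT preserves each citation exactly as it appears in the text, which is the strict correspondence the option is meant to provide.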