Extraction rules syntax
Overview
The syntax of an extraction rule is:
IDENTIFY[[ruleLabel]](templateName)
{
condition
}
where:
IDENTIFY
is a language keyword and must be written in uppercase.
ruleLabel
is a label that helps identify the rule.
templateName
is the name of the template for which extraction records are to be generated.
condition
is the rule's condition.
Note
The parts between brackets ([...]) are optional.
The rule must be contained in a scope specifier:
SCOPE scopeOption
{
IDENTIFY[[ruleLabel]](templateName)
{
condition
}
}
For example, given the following template:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Telephone,
@Address
}
this extraction rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
}
}
is activated by each person name found within a sentence scope and, for every activation, generates an extraction record based on the PERSONAL_DATA template, with the @Name field set to the person name.
Multiple rules can be placed inside the same scope specifier:
SCOPE scopeOption
{
//Rule #1
IDENTIFY[[ruleLabel]](templateName)
{
condition
}
//Rule #2
IDENTIFY[[ruleLabel]](templateName)
{
condition
}
...
}
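As an illustrative sketch, the scope below groups two rules that both target the PERSONAL_DATA template defined earlier. The first rule reuses the @Name[TYPE(NPH)] operand from the previous example. The TYPE(GEO) operand in the second rule is an assumption made for illustration (a type code taken to match place names) and is not defined in this section.

```
SCOPE SENTENCE
{
    //Rule #1: activated by person names, fills @Name
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
    //Rule #2: activated by place names (assumed type code GEO), fills @Address
    IDENTIFY(PERSONAL_DATA)
    {
        @Address[TYPE(GEO)]
    }
}
```

Each rule is evaluated on its own within the shared sentence scope, so each can be activated independently of the other.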
Condition peculiarities
The structure of the condition is the same for categorization rules and extraction rules. The condition of an extraction rule, however, has one fundamental peculiarity: it also specifies which of the template's fields are to be filled, and how. Each field name must be followed by an operand enclosed in square brackets, like so:
@field[operand]
Note
In this case the brackets are mandatory: they do not indicate an optional part of the syntax.
The meaning of this syntax is: the text of the token matched by the operand will be used to fill (set the value of) the field @field.
Field-prefixed operands can be combined with other simple or field-prefixed operands by means of Boolean or positional sequence operators to create complex conditions:
operand
operator
@field1[operand]
operator
@field2[operand]
operator
operand
There must be at least one field-prefixed operand in the condition, meaning that every rule must extract at least one field.
The overall condition is evaluated to determine if the rule must be activated, but fields "receive" their value only from the single operands they are associated with.
A field-prefixed operand cannot contain operators, so it is not possible to define expressions or use sub-rules inside it.
Conditions featuring more than one field produce the so-called "by-rule aggregation", that is, the generation of records with multiple fields.
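The points above can be sketched with a rule that combines two field-prefixed operands and one simple operand. This is a hedged illustration, not a reference rule: the `>>` sequence operator, the KEYWORD attribute and the TYPE(GEO) type code are assumed to behave as described elsewhere in the language reference, and only @Name[TYPE(NPH)] comes from this section.

```
SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        //field-prefixed operand: the matched person name fills @Name
        @Name[TYPE(NPH)]
        >>
        //simple operand: must match for the rule to activate, fills no field
        KEYWORD("from")
        >>
        //field-prefixed operand: the matched place name fills @Address
        @Address[TYPE(GEO)]
    }
}
```

Under these assumptions, the whole sequence must match for the rule to be activated, but only the tokens matched by the two bracketed operands end up in the record. Because the condition features two fields, a single activation would generate one record carrying both @Name and @Address, which is the by-rule aggregation described above.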
Note
When writing an extraction rule, it is possible to use the same field more than once in the condition.