Extraction rules syntax

Overview

The syntax of an extraction rule is:

IDENTIFY[[ruleLabel]](templateName)
{
    condition
}

where:

  • IDENTIFY is a language keyword and must be written in uppercase.
  • ruleLabel is a label that helps identify the rule.
  • templateName is the name of the template for which extraction records are to be generated.
  • condition is the rule's condition.

Note

The parts between brackets ([...]) are optional.

The rule must be contained in a scope specifier:

SCOPE scopeOption
{
    IDENTIFY[[ruleLabel]](templateName)
    {
        condition
    }
}

For example, given the following template:

TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Telephone,
    @Address
}

this extraction rule:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
}

is activated by every person name found within a sentence scope. For each activation, it generates an extraction record based on the PERSONAL_DATA template, filling the @Name field with the person name.
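
As an illustration of the optional label, the same rule might be written as in the sketch below. It assumes that the outer pair of brackets in the syntax definition only marks the part as optional and that the label itself is written inside a single pair of square brackets. The label person_names is hypothetical.

SCOPE SENTENCE
{
    // person_names is a hypothetical label, used only to show where the optional part goes
    IDENTIFY[person_names](PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
}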

Multiple rules can be placed inside the same scope specifier:

SCOPE scopeOption
{
    //Rule #1
    IDENTIFY[[ruleLabel]](templateName)
    {
        condition
    }

    //Rule #2
    IDENTIFY[[ruleLabel]](templateName)
    {
        condition
    }

    ...
}

Condition peculiarities

The structure of the condition is the same for categorization rules and extraction rules. However, the condition of an extraction rule has one fundamental peculiarity: it also specifies which fields of the template are to be filled and how. Field names must be followed by an operand enclosed in square brackets, like so:

@field[operand]

Note

In this case the brackets are mandatory: they do not indicate an optional part of the syntax.

The meaning of this syntax is: the text of the token matched by the operand will be used to fill (set the value of) the field @field.

Field-prefixed operands can be combined with other simple or field-prefixed operands by means of Boolean or positional sequence operators to create complex conditions:

operand
operator
@field1[operand]
operator
@field2[operand]
operator
operand

There must be at least one field-prefixed operand in the condition, meaning that every rule must extract at least one field.

The overall condition is evaluated to determine if the rule must be activated, but fields "receive" their value only from the single operands they are associated with.
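
For instance, a condition could combine a simple operand with a field-prefixed one, as in the sketch below. It assumes the Boolean operator AND and the KEYWORD attribute, neither of which is described on this page, and the keyword "contact" is purely illustrative.

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        // simple operand (assumed attribute): takes part in the condition but fills no field
        KEYWORD("contact")
        AND
        // field-prefixed operand: the text it matches fills @Name
        @Name[TYPE(NPH)]
    }
}

Under these assumptions, both operands must match in the same sentence for the rule to be activated, but only the person name ends up in the record.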

Operators cannot be used within a field-prefixed operand, so it cannot contain expressions or sub-rules.

Conditions featuring more than one field give rise to so-called "by-rule aggregation", that is, the generation of records with multiple fields.
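
A minimal sketch of by-rule aggregation, reusing the PERSONAL_DATA template from the overview, could look like this. It assumes the Boolean operator AND and the entity type PHO for telephone numbers, which are not covered on this page.

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        // each field takes its value only from its own operand
        @Name[TYPE(NPH)]
        AND
        // PHO is assumed here to be the entity type for telephone numbers
        @Telephone[TYPE(PHO)]
    }
}

If both operands match in the same sentence, a single record would be expected to carry both the @Name and the @Telephone fields.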

Note

When writing an extraction rule, it is possible to use the same field more than once.