SEQUENCE

SEQUENCE is an extraction transformation option that completes the matched value rather than normalizing it: it adds the elements surrounding the original matched data to the final extracted value.

Its action is based on the concept of "sequence", one of the classes of operators available in the Rules language. Used in extraction rules, sequences define conditions that impose stricter constraints than basic extraction rules. The SEQUENCE option returns all the elements included in a rule's sequence along with the value matched by the attribute enclosed in the extraction syntax.

The syntax of the SEQUENCE option is the following:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        attribute1
        sequenceOperator
        @field[attribute2]|[SEQUENCE]
    }
}

sequenceOperator refers to one of the positional or logical sequence operators available. Operators and attributes other than the one enclosed in the extraction syntax can be placed before or after the extraction syntax, and as many operators and attributes can be used as needed. By definition, the SEQUENCE option must be used in a rule containing at least one sequence.

Basic extraction rules are commonly made up of an attribute (or combination of attributes).

In some cases, a condition implies that other elements must be taken into account in addition to the attribute specified within the extraction syntax. A rule may in fact contain one or more constraining attributes, as well as the attribute enclosed in the extraction syntax, all of which are combined using one of the available positional or logical operators, for example:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        KEYWORD("flu", "flulike", "flu-like")
        >>
        @field_1[LEMMA("epidemic")]
    }
}

This rule extracts only the lemma epidemic, in singular or plural form, and only when it is strictly preceded (double greater-than sign, >>) by one of the keywords flu, flulike or flu-like.

Use the SEQUENCE option when a sequence rule must extract not only the attribute enclosed in the extraction syntax but also every other element included in the sequence.

Consider the rule above with the SEQUENCE transformation added:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        KEYWORD("flu", "flulike", "flu-like")
        >>
        @field_1[LEMMA("epidemic")]|[SEQUENCE]
    }
}

Now, if the condition is met, the SEQUENCE transformation option expands the extracted value to include all the elements of the sequence specified in the rule.

Consider the extraction output if the rule above is run against the following sample text:

Flu Widespread, Leading a Range of Winter's Ills
By DONALD G. McNEIL Jr. and KATHARINE Q. SEELYE
Published: January 9, 2013
It is not your imagination - more people you know are sick this winter, even people who have had flu shots.
The country is in the grip of three emerging flu or flulike epidemics: an early start to the annual flu season with an unusually aggressive virus, a surge in a new type of norovirus, and the worst whooping cough outbreak in 60 years. And these are all developing amid the normal winter highs for the many viruses that cause symptoms on the "colds and flu" spectrum.
Influenza is widespread, and causing local crises. On Wednesday, Boston's mayor declared a public health emergency as cases flooded hospital emergency rooms.

The text contains only one combination of values matching the sample rule, flulike epidemics, and the SEQUENCE option causes the entire sequence to be extracted rather than just the token (lemma epidemic) matched by the attribute enclosed in the extraction syntax.
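
The difference between the two versions of the rule can be illustrated with a small Python sketch. This is not how the rules engine is implemented: the function names, the hard-coded word forms standing in for LEMMA("epidemic"), and the reading of >> as "immediately preceded by" are all assumptions made for illustration. The sketch returns surface text only and does not model any normalization the engine may apply. The sentence is a shortened version of the one in the sample text.

```python
import re

# Hypothetical stand-ins for the rule's two operands (not the engine's API).
KEYWORDS = {"flu", "flulike", "flu-like"}
EPIDEMIC_FORMS = {"epidemic", "epidemics"}  # crude substitute for LEMMA("epidemic")


def tokenize(sentence):
    """Split into word tokens, keeping hyphenated words such as 'flu-like' whole."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)


def extract(sentence, sequence=False):
    """Return one value per match of: KEYWORD >> @field_1[LEMMA("epidemic")].

    With sequence=False only the token inside the extraction syntax is returned;
    with sequence=True the value is expanded to the whole matched sequence.
    """
    tokens = tokenize(sentence)
    values = []
    for prev, cur in zip(tokens, tokens[1:]):
        # '>>' is modelled here as "immediately preceded by" within one sentence.
        if prev.lower() in KEYWORDS and cur.lower() in EPIDEMIC_FORMS:
            values.append(f"{prev} {cur}" if sequence else cur)
    return values


sentence = (
    "The country is in the grip of three emerging flu or flulike epidemics: "
    "an early start to the annual flu season with an unusually aggressive virus."
)
without_option = extract(sentence)               # rule without |[SEQUENCE]
with_option = extract(sentence, sequence=True)   # rule with |[SEQUENCE]
print(without_option)
print(with_option)
```

In this sketch "flu or" and "flu season" do not match because the keyword is not directly followed by a form of epidemic, so only "flulike epidemics" satisfies the condition, in line with the single match described above.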