SEQUENCE
SEQUENCE can be described as a completion feature of the matched value rather than a normalization. It adds elements surrounding the original matched data to the final output value.
Its action is based on the concept of "sequence", one of the classes of operators available in the Rules language. Sequences define conditions that imply stricter constraints than basic rules. The SEQUENCE option returns all the elements included in a rule's sequence along with the value matched by the attribute enclosed in the rule syntax.
The syntax for extraction rules is:
SCOPE scopeOption
{
IDENTIFY(templateName)
{
attribute1
sequenceOperator
@field[attribute2]|[SEQUENCE]
}
}
The syntax for tagging rules is:
SCOPE scopeOption
{
TAGGER(tagLevel)
{
attribute1
sequenceOperator
@tag[attribute2]|[SEQUENCE]
}
}
sequenceOperator refers to one of the available positional or logical sequence operators. The operators, and the attributes other than the one enclosed in the field or tag syntax, can be positioned before or after the field- or tag-prefixed operand, and as many operators and attributes as needed may be used. The SEQUENCE option must be used in a rule containing at least one sequence.
Basic rules are commonly made up of an attribute (or combination of attributes).
In some cases, a condition implies that other elements must be taken into account in addition to the attributes specified within the rule. A rule may in fact contain one or more constraining attributes, as well as the attribute enclosed in the syntax, all of which are combined using one of the available positional or logical operators, for example:
SCOPE SENTENCE
{
IDENTIFY(TEST)
{
KEYWORD("flu", "flulike", "flu-like")
>>
@field_1[LEMMA("epidemic")]
}
}
The purpose of this rule is to extract only the lemma epidemic, in singular or plural form, if it appears in a text strictly preceded (double greater-than sign, >>) by one of the keywords flu, flulike or flu-like.
The SEQUENCE option must be used when a sequence rule is required to extract not only the attribute enclosed in the extraction syntax but also every other element included in the sequence.
Consider the same rule above with the addition of the SEQUENCE transformation:
SCOPE SENTENCE
{
IDENTIFY(TEST)
{
KEYWORD("flu", "flulike", "flu-like")
>>
@field_1[LEMMA("epidemic")]|[SEQUENCE]
}
}
Now, if the condition is verified, the SEQUENCE transformation option ensures that the extracted value is expanded to include all elements pertaining to the sequence specified in the rule.
Consider the extraction output if the rule above is run against the following sample text:
Flu Widespread, Leading a Range of Winter's Ills
By DONALD G. McNEIL Jr. and KATHARINE Q. SEELYE
Published: January 9, 2013
It is not your imagination - more people you know are sick this winter, even people who have had flu shots.
The country is in the grip of three emerging flu or flulike epidemics: an early start to the annual flu season with an unusually aggressive virus, a surge in a new type of norovirus, and the worst whooping cough outbreak in 60 years. And these are all developing amid the normal winter highs for the many viruses that cause symptoms on the "colds and flu" spectrum.
Influenza is widespread, and causing local crises. On Wednesday, Boston's mayor declared a public health emergency as cases flooded hospital emergency rooms.
The text contains only one combination of values matching the sample rule, flulike epidemics, and the SEQUENCE option causes the extraction of the entire sequence, not just the token (lemma epidemic) matched by the field-prefixed operand.
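The difference between the two versions of the rule can be illustrated with a small Python simulation. This is only a sketch of the behaviour described above, not how the Rules engine works: the tokenizer, the LEMMAS table and the extract function are hypothetical stand-ins for the real linguistic analysis, keywords are compared exactly as written (an assumption), and the output is assumed to be the surface text of the matched tokens.

```python
import re

# Hypothetical stand-in for lemmatization: surface form -> lemma.
LEMMAS = {"epidemic": "epidemic", "epidemics": "epidemic"}

# The keywords listed in the sample rule, compared as written (assumption).
KEYWORDS = {"flu", "flulike", "flu-like"}


def tokenize(sentence):
    """Split a sentence into word tokens, keeping hyphenated words whole."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", sentence)


def extract(sentence, sequence=False):
    """Simulate KEYWORD(...) >> @field_1[LEMMA("epidemic")] on one sentence.

    With sequence=False only the token matched by the field operand is
    returned; with sequence=True the whole matched sequence is returned.
    """
    tokens = tokenize(sentence)
    values = []
    # ">>" is modelled as "the keyword token immediately precedes the lemma".
    for previous, current in zip(tokens, tokens[1:]):
        if previous in KEYWORDS and LEMMAS.get(current.lower()) == "epidemic":
            values.append(f"{previous} {current}" if sequence else current)
    return values


sample = (
    "The country is in the grip of three emerging flu or flulike epidemics: "
    "an early start to the annual flu season with an unusually aggressive virus."
)

print(extract(sample))                 # ['epidemics']
print(extract(sample, sequence=True))  # ['flulike epidemics']
```

In this sketch the pair "flu or" does not match because "or" sits between the keyword and the lemma, which mirrors the strict precedence required by >>.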