SECTION

SECTION is better described as a completion of the matched value than as a normalization: it adds the text surrounding the original match to the final output value.

Its action is based on the concept of a section, a custom subdivision of the text that can optionally be defined for a project.

The SECTION option returns the whole section containing the value matched by an attribute. Use this option only if one or more sections have already been defined in the project. In addition, at least one section must be used in the rule scope.

The syntax for extraction rules is:

SCOPE scopeOption IN SECTION(sectionName)
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[SECTION]
    }
}

The syntax for tagging rules is:

SCOPE scopeOption
{
    TAGGER(tagLevel)
    {
        @tag[attribute]|[SECTION]
    }
}

This option is useful when the output must be expanded to include the text surrounding a matched element.

Consider the following example:

SCOPE SECTION(PUBLICATIONDATE)
{
    IDENTIFY(ARTICLE)
    {
        @Date[TYPE(DAT)]|[SECTION]
    }
}

The purpose of this rule is to extract dates (TYPE(DAT)) found within a previously defined section called PUBLICATIONDATE (SCOPE SECTION(PUBLICATIONDATE)). When this condition is met, the SECTION transformation option expands every extracted value to the whole section in which the date is found.

Consider the extraction output if the rule above is run against the following sectioned text:

TITLE section:

    Flu Widespread, Leading a Range of Winter's Ills


AUTHOR section:

    By DONALD G. McNEIL Jr. and KATHARINE Q. SEELYE


PUBLICATIONDATE section:

    Published: January 9, 2013


BODY section:

    It is not your imagination - more people you know are sick this winter, even people who have had flu shots.
    The country is in the grip of three emerging flu or flulike epidemics: an early start to the annual flu season with an unusually aggressive virus, a surge in a new type of norovirus, and the worst whooping cough outbreak in 60 years. And these are all developing amid the normal winter highs for the many viruses that cause symptoms on the "colds and flu" spectrum.
    Influenza is widespread, and causing local crises. On Wednesday, Boston's mayor declared a public health emergency as cases flooded hospital emergency rooms.

Here the section PUBLICATIONDATE plays two roles:

  • As the scope of the rule, it restricts the extraction of dates to the portion of text it delimits, in this case the section called PUBLICATIONDATE.
  • Through the SECTION transformation option, the section itself becomes the final output of the extraction process.

The rule condition is triggered by the date January 9, 2013 found within the section PUBLICATIONDATE, but the SECTION transformation puts the whole section text in the @Date field: Published: January 9, 2013.
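
The two roles described above can be imitated outside the rules engine. The following Python sketch is only an illustration of the behavior, not the way the engine works internally: the `sections` dictionary, the `DATE_PATTERN` regular expression (a crude stand-in for TYPE(DAT)), and the `extract_dates` function are all hypothetical names introduced for this example.

```python
import re

# Hypothetical stand-in for a sectioned document: section name -> section text.
sections = {
    "TITLE": "Flu Widespread, Leading a Range of Winter's Ills",
    "AUTHOR": "By DONALD G. McNEIL Jr. and KATHARINE Q. SEELYE",
    "PUBLICATIONDATE": "Published: January 9, 2013",
    "BODY": "On Wednesday, Boston's mayor declared a public health emergency.",
}

# Crude stand-in for TYPE(DAT): matches dates such as "January 9, 2013".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def extract_dates(sections, scope_section, expand_to_section=False):
    """Find dates inside one section only (the scope role).

    With expand_to_section=True, each match is replaced by the full text
    of the section it was found in (the SECTION transformation role).
    """
    text = sections.get(scope_section, "")
    results = []
    for match in DATE_PATTERN.finditer(text):
        results.append(text if expand_to_section else match.group(0))
    return results


plain = extract_dates(sections, "PUBLICATIONDATE")
expanded = extract_dates(sections, "PUBLICATIONDATE", expand_to_section=True)
print(plain)     # ['January 9, 2013']
print(expanded)  # ['Published: January 9, 2013']
```

Without the expansion, the field would hold only the matched date. With it, the field holds the text of the entire PUBLICATIONDATE section, which mirrors the output of the rule above.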