
Combinations of standard and custom scope options

Overview

Standard and custom options can be combined when setting a rule scope. In particular, a standard option can be nested within a custom scope option. With this combination, the rule acts only upon a paragraph, sentence, clause or phrase that is contained in a given section or segment.

The syntax for combining standard and custom scope options is the following:

SCOPE standardOption(optionType) IN customOption(name)
{
    rule(s)
}

where:

  • standardOption is one of the standard scope options (for example, SENTENCE, CLAUSE or PHRASE).
  • customOption is one of the custom scope options, SECTION or SEGMENT.
  • optionType is one of the types available for the chosen standard option, if it has any.
  • name is the name of one of the sections or segments defined for a specific project.

For example, in a newspaper article where the TITLE, LEAD and BODY of the text are annotated as sections, the following lines:
SCOPE CLAUSE(INDEPENDENT) IN SECTION(LEAD)
{
    //rule(s)//
}

make the rule act upon text blocks recognized as independent clauses within the section containing the lead paragraph.
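Conceptually, the IN operator filters the text blocks produced by the standard option, keeping only those that lie within a block of the custom option. The following Python sketch models this under a simplifying assumption: every annotated block (clause, section, and so on) is a hypothetical `Span` of character offsets. The names `Span`, `contains` and `scope_in`, and the offsets used, are illustrative only and do not come from the expert.ai engine.

```python
from dataclasses import dataclass


# Hypothetical model of an annotated text block as a half-open character range.
@dataclass(frozen=True)
class Span:
    label: str  # e.g. "CLAUSE(INDEPENDENT)" or "SECTION(LEAD)"
    start: int  # inclusive offset
    end: int    # exclusive offset


def contains(outer: Span, inner: Span) -> bool:
    """Return True if `inner` lies entirely within `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


def scope_in(candidates: list[Span], containers: list[Span]) -> list[Span]:
    """Keep the candidate blocks that lie inside at least one container block,
    mirroring SCOPE standardOption(type) IN customOption(name)."""
    return [c for c in candidates if any(contains(o, c) for o in containers)]


# Assumed layout: the LEAD section covers offsets 20..80.
lead = [Span("SECTION(LEAD)", 20, 80)]
clauses = [
    Span("CLAUSE(INDEPENDENT)", 0, 15),   # in the title, outside the lead
    Span("CLAUSE(INDEPENDENT)", 25, 60),  # inside the lead
]
in_scope = scope_in(clauses, lead)
print([(c.start, c.end) for c in in_scope])  # [(25, 60)]
```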

It is also possible to use a combination of PHRASE and CLAUSE scope options together with one of the custom scope options. The syntax for defining such a scope is the following:

SCOPE PHRASE IN CLAUSE(clauseType) IN customOption(name)
{
    rule(s)
}

For example, the following definition:

SCOPE PHRASE IN CLAUSE(INDEPENDENT) IN SECTION(LEAD)
{
    //rule(s)//
}

makes the rule act upon text blocks recognized as noun phrases contained in an independent clause, which is in turn included in the section containing the lead paragraph.

The purpose of such complex combinations is to identify a very precise and limited area for the rule to act upon; as a result, rules with this kind of scope tend to produce hits with high precision rather than high recall.
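The trade-off follows from the fact that each additional IN level can only discard candidate blocks, never add new ones. Using the same hypothetical span model as a simplification (not the engine's actual data structures), a chain such as PHRASE IN CLAUSE(INDEPENDENT) IN SECTION(LEAD) can be sketched as repeated filtering from the outermost level inward. The helper `scope_chain` and all offsets are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int  # inclusive offset
    end: int    # exclusive offset


def contains(outer: Span, inner: Span) -> bool:
    return outer.start <= inner.start and inner.end <= outer.end


def scope_chain(levels: list[list[Span]]) -> list[Span]:
    """levels[0] holds the innermost candidates (e.g. phrases) and levels[-1]
    the outermost containers (e.g. sections). Starting from the outermost level,
    keep each inner block only if it lies inside a block that survived so far."""
    survivors = levels[-1]
    for inner in reversed(levels[:-1]):
        survivors = [c for c in inner if any(contains(o, c) for o in survivors)]
    return survivors


lead = [Span("SECTION(LEAD)", 20, 80)]
clauses = [Span("CLAUSE(INDEPENDENT)", 0, 15), Span("CLAUSE(INDEPENDENT)", 25, 60)]
phrases = [Span("PHRASE", 2, 8), Span("PHRASE", 30, 40), Span("PHRASE", 62, 70)]

in_lead = scope_chain([phrases, lead])             # PHRASE IN SECTION(LEAD)
in_clause = scope_chain([phrases, clauses, lead])  # PHRASE IN CLAUSE(INDEPENDENT) IN SECTION(LEAD)
print(len(phrases), len(in_lead), len(in_clause))  # 3 2 1
```

Fewer, more tightly located candidates generally mean fewer spurious hits, at the cost of missing valid matches that fall outside the region.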

IF NOT IN SEGMENT

IF NOT IN SEGMENT is a constraint that prevents a rule from triggering when the text it acts upon is inside the specified segment.

Consider the following segment made of these lemmas:

SEGMENT(EVENT)
{
    LEMMA("storm", "snowfall", "tornado", "hurricane")
}

and this categorization rule:

SCOPE SENTENCE IF NOT IN SEGMENT(EVENT)
{
    DOMAIN(dom1)
    {
        LEMMA("destroy")
    }
}

If this rule is applied to the following text:

The storm and the hurricane destroyed everything around the area.

you get no categorization output: the sentence contains the lemmas storm and hurricane, so it falls within the EVENT segment, and the SCOPE option lets the rule trigger only outside that segment.
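The exclusion can be modeled in the same hypothetical span terms: a sentence is evaluated only if it is not contained in any EVENT segment block. The sketch assumes, as a simplification, that the EVENT segment rule marks every sentence containing one of its lemmas, and it uses plain substring tests in place of real lemmatization. `Span`, `scope_if_not_in` and `categorize` are illustrative names only.

```python
from dataclasses import dataclass

EVENT_LEMMAS = ("storm", "snowfall", "tornado", "hurricane")


@dataclass(frozen=True)
class Span:
    start: int  # inclusive offset
    end: int    # exclusive offset


def contains(outer: Span, inner: Span) -> bool:
    return outer.start <= inner.start and inner.end <= outer.end


def scope_if_not_in(candidates: list[Span], segments: list[Span]) -> list[Span]:
    """Keep the candidates that are NOT inside any segment block."""
    return [c for c in candidates if not any(contains(s, c) for s in segments)]


def categorize(sentences: list[str]) -> set[str]:
    # Assign each sentence a span, assuming a single space between sentences.
    by_span, offset = {}, 0
    for s in sentences:
        by_span[Span(offset, offset + len(s))] = s
        offset += len(s) + 1
    # Simplification: substring matching stands in for LEMMA(...) matching.
    event_segments = [sp for sp, s in by_span.items()
                      if any(lemma in s.lower() for lemma in EVENT_LEMMAS)]
    eligible = scope_if_not_in(list(by_span), event_segments)
    # DOMAIN(dom1) fires if an eligible sentence contains a form of "destroy".
    return {"dom1"} if any("destroy" in by_span[sp].lower() for sp in eligible) else set()


print(categorize(["The storm and the hurricane destroyed everything around the area."]))  # set()
print(categorize(["A fire destroyed the old barn."]))  # {'dom1'}
```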

IN SECTION(sectionName:segmentName)

IN SECTION(sectionName:segmentName) is another way to combine standard and custom scope options in a rule.

For example, the following definitions of segments, sections and a template, together with a segment rule and an extraction rule:

SEGMENTS
{
    @MY_SEGMENT
}

SECTIONS
{
    @TITLE,
    @BODY
}

TEMPLATE(PERSONAL_DATA)
{
    @FULL_NAME
}

SCOPE SENTENCE
{
    SEGMENT(MY_SEGMENT)
    {
        KEYWORD("life and death")
        >>
        TYPE(PRE)
        >>
        TYPE(NPH)
    }
}

SCOPE SENTENCE IN SECTION(TITLE:MY_SEGMENT)
{
    IDENTIFY(PERSONAL_DATA)
    {
        @FULL_NAME[TYPE(NPH)]
    }
}

applied to this text:

Life and death of Julius Caesar
Gaius Julius Caesar (12 July 100 BC - 15 March 44 BC) was a Roman general and statesman. A member of the First Triumvirate, Caesar led the Roman armies in the Gallic Wars before defeating his political rival Pompey in a civil war, and subsequently became dictator from 49 BC until his assassination in 44 BC. He played a critical role in the events that led to the demise of the Roman Republic and the rise of the Roman Empire.

will extract the proper name Julius Caesar only because it is in the title of the article (Life and death of Julius Caesar). The rule scope can be summarized as follows: extract proper names only if they are in a sentence contained in the region of text given by the intersection of the section called TITLE and the segment called MY_SEGMENT.
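The intersection semantics can be sketched with the same hypothetical span model: the allowed region is the overlap between the TITLE section and the MY_SEGMENT segment, and only sentences inside that region, and names inside those sentences, survive. The sketch works on a shortened copy of the sample text and makes several simplifying assumptions, each marked in a comment: one sentence per section, a substring test in place of the segment rule, and proper names located by string search rather than by TYPE(NPH).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    start: int  # inclusive offset
    end: int    # exclusive offset


def contains(outer: Span, inner: Span) -> bool:
    return outer.start <= inner.start and inner.end <= outer.end


def intersect(a: Span, b: Span) -> Optional[Span]:
    """Overlap of two spans, or None if they do not overlap."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return Span(start, end) if start < end else None


title = "Life and death of Julius Caesar"
body = "Gaius Julius Caesar (12 July 100 BC - 15 March 44 BC) was a Roman general and statesman."
doc = title + "\n" + body

# Sections, assuming each one holds exactly one sentence.
TITLE = Span(0, len(title))
BODY = Span(len(title) + 1, len(doc))
sentences = [TITLE, BODY]

# Simplification: a substring test stands in for the MY_SEGMENT rule
# KEYWORD("life and death") >> TYPE(PRE) >> TYPE(NPH).
my_segment = [s for s in sentences if "life and death of" in doc[s.start:s.end].lower()]

# Hypothetical proper-name (NPH) candidates, located by string search.
names = []
for name in ("Julius Caesar", "Gaius Julius Caesar"):
    i = doc.find(name)
    names.append(Span(i, i + len(name)))

# SCOPE SENTENCE IN SECTION(TITLE:MY_SEGMENT)
regions = [r for g in my_segment if (r := intersect(TITLE, g)) is not None]
eligible = [s for s in sentences if any(contains(r, s) for r in regions)]
extracted = [doc[n.start:n.end] for n in names if any(contains(s, n) for s in eligible)]
print(extracted)  # ['Julius Caesar']
```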