SCOPE custom options

Introduction

Custom scope options refer to textual subdivisions that the user can optionally define for a specific project and/or text type. They are used to delimit the area of action of a rule or a group of rules.

The available custom scope options are:

SECTION
SEGMENT
TABLE

These options can be used alone or combined, either with each other or with the standard scope options.
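
As a minimal sketch of such a combination, the following scope restricts rules to single sentences that fall inside the TITLE section. It assumes that the IN keyword is what joins a standard scope option (here SENTENCE) to a custom one, and that TITLE has been declared as a section name; check the scope options combination part of this manual for the exact syntax:

SCOPE SENTENCE IN SECTION(TITLE)
{
    //rule(s)//
}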

Warning

The use of these options combined with boolean operators can slow down the engine.

SECTION

SECTION is one of the custom textual subdivisions that can be defined. The SECTION scope option can be used only if the input texts have section annotations. The names of the SECTION tags can be used to define the scope of both categorization and extraction rules, provided that these names have been declared beforehand. When this scope option is selected, a rule applies only to a text block previously tagged as a section.

The syntax for the scope option SECTION is the following:

SCOPE SECTION(sectionName) [ON ATOM]
{
    rule(s)
}

Note

Parts between square brackets ([]) are optional.

ON ATOM is optional and makes your rules trigger based on an atom-based count of the textual elements of the sentence. You can find a practical example in the positional sequences section of this manual.

Note

Every input document can contain only one instance of each SECTION tag.

Every project has a default standard section named BODY. All documents without section annotations are automatically treated as if they were entirely contained in the standard section BODY. Rules can therefore act upon this section with no need for input text pre-processing or section name declaration. For example:

SCOPE SECTION(BODY)
{
    //rule(s)//
}

would act upon the text block previously defined as the section named BODY.

When using the SECTION scope option, it is possible to select a single section name or a list of them. For example, if we consider a newspaper article, where the TITLE, LEAD and BODY parts of the text are annotated as sections, the following rule:

SCOPE SECTION(TITLE, LEAD)
{
    //rule(s)//
}

will act upon any text block contained in the TITLE and LEAD sections, thus not considering the text contained in the BODY section.
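
To make the newspaper example more concrete, the sketch below places a categorization rule inside this scope. The domain name sport and the lemma are hypothetical placeholders, and the DOMAIN and LEMMA syntax is assumed to follow the categorization rules part of this manual:

SCOPE SECTION(TITLE, LEAD)
{
    // "sport" is a placeholder domain name used only for illustration
    DOMAIN(sport)
    {
        LEMMA("football")
    }
}

Under these assumptions, a mention of football that occurs only in the BODY section would not trigger the rule.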

Note

Such a SCOPE definition would, for example, give priority to concepts mentioned in the title (presumably the main topic of the article) and ignore any correlated but secondary topics mentioned in the body of the article.

SEGMENT

SEGMENT is one of the custom textual subdivisions that can be defined by means of specific semantic rules designed for segment creation. The names of segments can be used to define the scope of both categorization and extraction rules, provided that these names have been declared beforehand in the configuration file. When this scope option is selected, a rule acts only upon a text block previously recognized as a segment.

The syntax for the scope option SEGMENT is the following:

SCOPE SEGMENT(segmentName) [ON ATOM]
{
    rule(s)
}

Unlike sections, for which only a single instance per document is allowed, segments can be instantiated several times in a single input document. Every project has two predefined segments named SEGMENT1 and SEGMENT2. Rules can be written for these segments with no need to declare them in the Section/Segment definition panel. For example:

SCOPE SEGMENT(SEGMENT1)
{
    //rule(s)//
}

would act upon the text block previously defined as the segment named SEGMENT1.

When using the SEGMENT scope option, it is possible to select a single segment name or a list of them. For example, consider a letter or an e-mail in which the parts of the text regarding the sender and the receiver are each recognized as a separate segment. The following lines:

SCOPE SEGMENT(SENDER, RECEIVER)
{
    //rule(s)//
}

will act only upon specific text blocks identified as the SENDER and RECEIVER segments, thus ignoring the information contained in the body of the letter.
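
As an illustration, the sketch below places an extraction rule inside this scope. The template name PERSONAL_DATA and the field name NAME are hypothetical, and the IDENTIFY syntax and the TYPE(NPH) attribute (proper nouns of people) are assumed to follow the extraction rules part of this manual:

SCOPE SEGMENT(SENDER, RECEIVER)
{
    // PERSONAL_DATA and NAME are placeholder template and field names
    IDENTIFY(PERSONAL_DATA)
    {
        @NAME[TYPE(NPH)]
    }
}

Under these assumptions, names of people are extracted only when they occur in a SENDER or RECEIVER segment.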

Note

Such a SCOPE definition could allow a user to find just the personal information of the sender and/or the receiver of the correspondence and exclude information regarding any third party mentioned in the body of the message.

TABLE

TABLE is the custom textual subdivision that is available when the target documents are the result of document understanding processing with Extract.

This option allows you to work directly on table content.

The syntax is the following:

SCOPE TABLE [ON ATOM]
{
    rule(s)
}

Rules in this scope are generally combined with the CELL attribute.