SCOPE custom options
Introduction
Custom scope options refer to portions of a text that correspond to textual subdivisions the user can optionally define for a specific project and/or text type. They are used to delimit the area of action of a rule or a group of rules.
The custom scope options are:
SECTION
SEGMENT
TABLE
These can be used alone or they can be combined, either with each other, or with the standard scope options.
Warning
The use of these options combined with boolean operators can slow down the engine.
SECTION
SECTION is one of the custom textual subdivisions that can be defined. The SECTION scope option can be used only if the input texts have section annotations. The names of such SECTION tags can be used to define the scope of both categorization and extraction rules, provided that these names have been declared beforehand. When this scope option is selected, a rule applies only to a text block previously tagged as a section.
The syntax for the scope option SECTION is the following:
SCOPE SECTION(sectionName) [ON ATOM]
{
rule(s)
}
Note
Parts between square brackets ([]) are optional.
ON ATOM is optional and makes your rules trigger according to an atom-based count of the textual elements of the sentence. You can find a practical example in the positional sequences section of this manual.
Note
Every input document can contain only a single instance of each SECTION tag.
Every project has a default standard section named BODY. All documents without section annotations are automatically treated as if they were entirely contained in the standard section BODY. As a result, it is possible to write rules that act upon this section with no need for input text pre-processing or section name declaration. For example:
SCOPE SECTION(BODY)
{
//rule(s)//
}
would act upon the text block previously defined as the section named BODY.
When using the SECTION scope option, it is possible to select a single section name or a list of them. For example, if we consider a newspaper article, where the TITLE, LEAD and BODY parts of the text are annotated as sections, the following rule:
SCOPE SECTION(TITLE, LEAD)
{
//rule(s)//
}
will act upon any text block contained in the TITLE and LEAD sections, thus not considering the text contained in the BODY section.
Note
Such a SCOPE definition would, for example, give priority to concepts mentioned in the title (presumably the main topic of the article) and ignore any correlated but secondary topics mentioned in the body of the article.
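As an illustration, the rule placeholder could be filled with a categorization rule. The following is only a sketch: it assumes that the TITLE and LEAD sections have been declared, and that the project taxonomy contains a domain named sport, which is a hypothetical name chosen for this example.

```
SCOPE SECTION(TITLE, LEAD)
{
    // "sport" is a hypothetical domain name: replace it with a domain of your taxonomy
    DOMAIN(sport)
    {
        LEMMA("football")
    }
}
```

With a rule like this, an occurrence of the lemma in the TITLE or LEAD section would contribute to the category, while the same lemma appearing only in the BODY section would not.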
SEGMENT
SEGMENT is one of the custom textual subdivisions that can be defined by means of specific semantic rules designed for segment creation. The names of segments can be used to define the scope of both categorization and extraction rules, provided that these names have been declared beforehand in the configuration file. When this scope option is selected, a rule acts only upon a text block previously recognized as a segment.
The syntax for the scope option SEGMENT is the following:
SCOPE SEGMENT(segmentName) [ON ATOM]
{
rule(s)
}
In comparison to sections, for which only a single instance per document is allowed, segments can be instantiated several times in a single input document. Every project has two predefined segments named SEGMENT1 and SEGMENT2. It is possible to write rules to be instantiated on these segments with no need to declare the segment in the Section/Segment definition panel. For example:
SCOPE SEGMENT(SEGMENT1)
{
//rule(s)//
}
would act upon the text block previously defined as the segment named SEGMENT1.
When using the SEGMENT scope option, it is possible to select a single segment name or a list of them. For example, if we consider a letter or an e-mail, where it is possible to recognize text subdivisions regarding the sender and receiver, and these are contained in as many segments, the following lines:
SCOPE SEGMENT(SENDER, RECEIVER)
{
//rule(s)//
}
will act only upon specific text blocks identified as the SENDER and RECEIVER segments, thus ignoring the information contained in the body of the letter.
Note
Such a SCOPE definition could allow a user to find just the personal information of the sender and/or the receiver of the correspondence and exclude information regarding any third party mentioned in the body of the message.
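As an illustration, the rule placeholder could be filled with an extraction rule. The following is only a sketch: it assumes that the SENDER and RECEIVER segments are created by segment rules defined elsewhere in the project, and that a template named PERSON with a field named NAME has been declared. Both names are hypothetical and chosen for this example.

```
SCOPE SEGMENT(SENDER, RECEIVER)
{
    // PERSON and @NAME are hypothetical template and field names
    IDENTIFY(PERSON)
    {
        @NAME[TYPE(NPH)]
    }
}
```

With a rule like this, names of people found inside a SENDER or RECEIVER segment would be extracted, while names mentioned only in the body of the message would be left out.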
TABLE
TABLE is the custom textual subdivision that can be defined when the target documents are the result of document understanding processing with Extract.
This option allows you to work directly on table content.
The syntax is the following:
SCOPE TABLE [ON ATOM]
{
rule(s)
}
The rules are generally combined with the CELL attribute.