The POSITION attribute identifies a token by its position in a text: a token is recognized only if it is found in the specified position.
The syntax is:
POSITION(position1[, position2, ...])
POSITION is the attribute name and must be written in uppercase.
position# refers to a list of predefined values identifying key positions for textual elements inside the document itself. These textual elements include any sequence of alphabetical characters, numbers and punctuation marks.
A rule using the POSITION attribute is valid only if the position is specified in a predefined format. All positions, with a brief description of each, are listed in the table below.
|Position|Description|
|BEGIN SENTENCE|First token in a sentence|
|END SENTENCE|Last token in a sentence|
|BEGIN PARAGRAPH|First token in a paragraph|
|END PARAGRAPH|Last token in a paragraph|
|BEGIN SECTION|First token in a document|
|END SECTION|Last token in a document|
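The validity rule described above can be illustrated with a small Python sketch. It is not part of the rules language or its tooling: the `parse_position` helper and its error messages are hypothetical, and the only facts taken from this page are the uppercase attribute name, the comma-separated argument list and the six position values in the table.

```python
import re

# The six predefined positions listed in the table above.
VALID_POSITIONS = {
    "BEGIN SENTENCE", "END SENTENCE",
    "BEGIN PARAGRAPH", "END PARAGRAPH",
    "BEGIN SECTION", "END SECTION",
}

# The attribute name must be uppercase, so the pattern is case-sensitive.
STATEMENT_RE = re.compile(r"^POSITION\((.*)\)$")


def parse_position(statement):
    """Hypothetical checker: return the listed positions or raise ValueError."""
    match = STATEMENT_RE.match(statement.strip())
    if match is None:
        raise ValueError("expected a statement of the form POSITION(...)")
    positions = [part.strip() for part in match.group(1).split(",")]
    unknown = [p for p in positions if p not in VALID_POSITIONS]
    if unknown:
        raise ValueError(f"unknown position(s): {unknown}")
    return positions


parsed = parse_position("POSITION(BEGIN SENTENCE, END PARAGRAPH)")
print(parsed)  # ['BEGIN SENTENCE', 'END PARAGRAPH']
```

A lowercase attribute name or a value outside the table, such as `POSITION(MIDDLE SENTENCE)`, makes this sketch raise `ValueError`.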
Please note: the POSITION attribute, if used alone, tends to overgenerate, because it matches every token found in the given position regardless of what the token is. It is highly recommended to use the POSITION attribute in conjunction with other attributes.
The POSITION attribute allows the use of one or more positions in a given statement. A token is identified in a text if it is found in one of the specified positions. For example:
POSITION(BEGIN SENTENCE)
This statement would identify any element found at the beginning of a sentence.
For demonstration purposes, let's imagine the statement above is used alone in a rule. In a text such as:
Investigators said it could take months to create a full account of the events preceding and during the killing rampage. The State Police officially confirmed the identity of the killer.
The elements that are recognized as the beginning of the sentence would be Investigators and The.
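To make this behaviour concrete, the following Python sketch mimics what a begin-of-sentence match selects in the text above. It is not the rules engine: the tokenizer and the `split_sentences` and `begin_sentence_tokens` helpers are hypothetical, and the assumption that a sentence closes at `.`, `!` or `?` is a deliberate simplification.

```python
import re

# Hypothetical toy tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}  # simplifying assumption


def split_sentences(text):
    """Group tokens into sentences; a sentence closes at '.', '!' or '?'."""
    sentences, current = [], []
    for token in TOKEN_RE.findall(text):
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(current)
    return sentences


def begin_sentence_tokens(text):
    """Return the first token of every sentence."""
    return [sentence[0] for sentence in split_sentences(text)]


text = (
    "Investigators said it could take months to create a full account of "
    "the events preceding and during the killing rampage. The State Police "
    "officially confirmed the identity of the killer."
)
first_tokens = begin_sentence_tokens(text)
print(first_tokens)  # ['Investigators', 'The']
```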
This other example:
POSITION(END SENTENCE)
would identify any element found at the end of a sentence. If this rule is applied to the same text above, the elements recognized as the end of the sentence would be:
- The full stop after rampage
- The full stop after killer
Although the punctuation mark is considered the element at the end of each sentence, the token immediately before it is also triggered by the rule. This affects the scores of your rules, because more elements are triggered than you might expect.
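The effect described above can be imitated in a few lines of Python. This is only a sketch of the described behaviour, not of how the engine works internally: the tokenizer and the `end_sentence_tokens` helper are hypothetical, and sentences are assumed to close at `.`, `!` or `?`.

```python
import re

# Hypothetical toy tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}  # simplifying assumption


def end_sentence_tokens(text):
    """Return, per sentence, the closing mark plus the token right before it."""
    matched, current = [], []
    for token in TOKEN_RE.findall(text):
        current.append(token)
        if token in SENTENCE_END:
            # Mimic the described behaviour: the token before the closing
            # punctuation mark is triggered together with the mark itself.
            matched.extend(current[-2:])
            current = []
    if current:  # trailing text without closing punctuation
        matched.append(current[-1])
    return matched


text = (
    "Investigators said it could take months to create a full account of "
    "the events preceding and during the killing rampage. The State Police "
    "officially confirmed the identity of the killer."
)
last_tokens = end_sentence_tokens(text)
print(last_tokens)  # ['rampage', '.', 'killer', '.']
```

Two sentences yield four triggered elements here, which is why a rule built on the end-of-sentence position alone can score higher than intended.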
In another example, if this rule:
POSITION(BEGIN SENTENCE)
is applied to a Spanish text whose sentence opens with an inverted question mark, the elements recognized by the rule as the beginning of the sentence are:
- The question mark ¿
In these specific cases, if you don't need punctuation marks but only meaningful tokens, modify the rule like this:
POSITION(END SENTENCE) - TYPE(PNT)
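Conceptually, the subtraction removes from the matched elements every token whose type is punctuation. The Python sketch below illustrates that idea on the end-of-sentence matches from the English example. The `subtract_type` helper is hypothetical, and the `OTHER` label is only a placeholder for any non-punctuation type; `PNT` is the only type code taken from this page.

```python
# Elements matched at the end of the two example sentences, as (text, type) pairs.
# "OTHER" is a placeholder label, not a real type code.
matched = [
    ("rampage", "OTHER"),
    (".", "PNT"),
    ("killer", "OTHER"),
    (".", "PNT"),
]


def subtract_type(tokens, excluded_type):
    """Keep only the tokens whose type differs from excluded_type."""
    return [text for text, token_type in tokens if token_type != excluded_type]


meaningful = subtract_type(matched, "PNT")
print(meaningful)  # ['rampage', 'killer']
```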