POSITION attribute

The POSITION attribute identifies a token by its position in a text: the token is recognized only if it is found in the specified position.

The syntax is:

POSITION(position1[, position2, ...])

where:

  • POSITION is the attribute name and must be written in uppercase.
  • position# is one of a set of predefined values that identify key positions of textual elements within the document. A textual element is any sequence of alphabetic characters, numbers or punctuation marks.

A rule that uses the POSITION attribute is valid only if each position is written as one of the predefined values. The table below lists all positions with a brief description of each.

Position           Description
BEGIN SENTENCE     First token in a sentence
END SENTENCE       Last token in a sentence
BEGIN PARAGRAPH    First token in a paragraph
END PARAGRAPH      Last token in a paragraph
BEGIN SECTION      First token in a document
END SECTION        Last token in a document

Warning

Used alone, the POSITION attribute is overly generative: it matches any element found in the specified position, whatever that element is. It is highly recommended to use the POSITION attribute in conjunction with other attributes.

One or more positions can be specified in a single statement. A token is identified in a text if it is found in the specified position.

For example:

POSITION(BEGIN SENTENCE)

This statement would identify any element found at the beginning of a sentence.

For demonstration purposes, suppose the statement above is used alone in a rule. Consider this text:

Investigators said it could take months to create a full account of the events preceding and during the killing rampage. The State Police officially confirmed the identity of the killer.

The elements recognized as the beginning of a sentence would be Investigators and The.

This other example:

POSITION(END SENTENCE)

would identify any element found at the end of a sentence. If this rule is applied to the same text, the elements recognized as the end of a sentence would be:

  • rampage
  • The full stop after rampage
  • killer
  • The full stop after killer

Although the punctuation marks are the elements at the very end of the sentences, the tokens immediately before them are also triggered by the rule. Because more elements are triggered, the scores of your rules are affected.
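
Since a statement accepts more than one position, the two statements above could be combined into one. The following is a minimal sketch built only from the syntax shown earlier. It assumes that the listed positions are treated as alternatives, so that a token matches if it is found in any one of them:

```
POSITION(BEGIN SENTENCE, END SENTENCE)
```

Under that assumption, applied to the sample text above, this statement would be expected to identify Investigators, rampage, the full stop after rampage, The, killer and the full stop after killer.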

In another example, if this rule:

POSITION(BEGIN SENTENCE)

is applied to this Spanish text:

¿Cómo estás?

the elements recognized by the rule as the beginning of the sentence are:

  • The question mark ¿
  • Cómo

In cases like these, if you need only meaningful tokens and not punctuation marks, exclude punctuation from the rule. For example:

POSITION(END SENTENCE) - TYPE(PNT)
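
The same exclusion can be sketched for the Spanish example. This assumes that the inverted question mark is classified as punctuation, that is, matched by TYPE(PNT), in the same way as the full stops in the earlier example:

```
POSITION(BEGIN SENTENCE) - TYPE(PNT)
```

Under that assumption, only Cómo would be recognized in ¿Cómo estás?. Likewise, POSITION(END SENTENCE) - TYPE(PNT) applied to the English text would be expected to leave only rampage and killer.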