Attributes overview

Attributes are the building blocks of categorization and extraction rules.

They are operands that match tokens in the disambiguation output based on the attributes those tokens possess.

The syntax of a generic attribute is:

attributeName(value1[, value2, ...])

where:

  • attributeName can be one of the possible attributes listed below.
  • value# refers to a value taken by the attribute. Each attribute accepts one or more values.
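
As a brief illustration of the generic syntax, the lines below sketch an attribute with a single value and attributes with several values. The values are hypothetical, and both the double quotes around string and lemma values and the `//` comment style are assumptions made here; the individual attribute sections give the exact form each attribute accepts.

```
// Hypothetical values, shown only to illustrate attributeName(value1[, value2, ...])
KEYWORD("invoice")
KEYWORD("invoice", "receipt")
LEMMA("pay", "purchase")
```

In each line the attribute name comes first, followed by a parenthesized, comma-separated list of values.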

This table lists all possible attributes along with their values and a short description. For further details, please see the individual attribute sections.

| Attribute | Values type | Description |
|---|---|---|
| KEYWORD | String | Matches any token that is exactly equal to one of the given strings |
| LEMMA | Lemma | Matches any token that is a possible inflection of a given lemma contained in the knowledge graph |
| BLEMMA | Lemma | Similar to LEMMA, but the match is performed at the sub-token or "atom" level of the text |
| ULEMMA | Lemma | Matches any token that is a possible inflection of a given lemma not contained in the knowledge graph |
| SYNCON | Syncon | Matches any token that corresponds to a given concept (syncon) contained in the knowledge graph |
| ANCESTOR | Syncon | Matches every token that corresponds to a "descendant concept" of a given concept contained in the knowledge graph |
| LIST | Syncon | Matches every token corresponding to any lemma of the given knowledge graph concepts |
| BLIST | Syncon | Similar to LIST, but matches are performed at the "atom" level of the text analysis output |
| TYPE | Type | Matches any token of the given types |
| PATTERN | Regular expression(s) | Matches any token matching the given regular expressions |
| ROLE | Role | Matches any token having one of the given roles in the sentence analysis of the text |
| POSITION | Position | Matches any token occupying one of the given positions |
| RELEVANT | List | Matches any token that is in one of the given lists of relevant text elements |
| TAG | Tag | Matches any token corresponding to one of the given tags |
| BTAG | Tag | Similar to TAG, but the match is performed at the sub-token or "atom" level of the text |
| CELL | Integer | Queries and extracts the content of table cells at the given coordinates, defined by row and column |
| TITLELEVEL | Integer | Extracts content related to a given heading level |
| STEM | String | Matches any token sharing the root of the given strings |

As an option, values can be loaded from an external list.