TOKEN

TOKEN can be described as the word-level equivalent of TEXT. The TEXT option acts on the atom level and keeps what the attribute matched in its original form. The TOKEN option does the same on the word level: it returns the original text of the whole word-level token that contains the match.

The syntax for extraction rules is:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[TOKEN]
    }
}

The syntax for tagging rules is:

SCOPE scopeOption
{
    TAGGER(taggerLevel)
    {
        @tagName[attribute]|[TOKEN]
    }
}

To fully understand the TOKEN option, it is useful to compare it to the TEXT option. Consider the following example:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[KEYWORD("emergency")]|[TEXT]
    }
}

This rule aims to extract the keyword emergency and keep its value in its original form. Consider the extraction output when the above rule is run against the following sample sentence:

Emergency teams battled more than 130 fires across New South Wales.

The text contains only one value that matches the sample rule; the base form on the atom level for this value is emergency. Due to the TEXT transformation, the field is set to the actual instance found in the text at the atom level, which is the keyword Emergency.

Compare this result with the TOKEN transformation run against the same sample sentence:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[KEYWORD("emergency")]|[TOKEN]
    }
}

Here, the keyword emergency is part of the noun lemma emergency teams at the word level, which is composed of two atoms: emergency and team. The base form at the word level for this value is emergency team. The TOKEN transformation extracts the text form at the word level, so the field is set to Emergency teams.
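The difference between the two options can be illustrated with a small Python sketch. This is only a toy model of the behavior described above, not the actual engine: the Atom and Word classes, the extract function, and the case-insensitive comparison are all assumptions made for illustration, and the sample sentence is split into atoms by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Atom:
    text: str  # form as it appears in the sentence
    base: str  # base form of the atom (not returned by TEXT or TOKEN)


@dataclass
class Word:
    atoms: List[Atom]  # a word-level token is made of one or more atoms

    @property
    def text(self) -> str:
        # Text form of the whole word-level token.
        return " ".join(atom.text for atom in self.atoms)


def extract(words: List[Word], keyword: str, option: str) -> Optional[str]:
    """Hypothetical helper: value of a field for a keyword match under TEXT or TOKEN."""
    for word in words:
        for atom in word.atoms:
            # Assumption: the match ignores case, as in the example above.
            if atom.text.lower() == keyword.lower():
                if option == "TEXT":
                    return atom.text  # original form at the atom level
                if option == "TOKEN":
                    return word.text  # original form at the word level
                raise ValueError(f"unsupported option: {option}")
    return None  # no match in the sentence


# Hand-built fragment of the sample sentence.
sentence = [
    Word([Atom("Emergency", "emergency"), Atom("teams", "team")]),
    Word([Atom("battled", "battle")]),
]

print(extract(sentence, "emergency", "TEXT"))   # Emergency
print(extract(sentence, "emergency", "TOKEN"))  # Emergency teams
```

In this model the match is always found on an atom; the option only decides whether the returned text is that of the atom itself or of the word-level token that contains it.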