
TOKEN

TOKEN is an extraction transformation option that can be described as the equivalent of the TEXT option at the word level of text analysis. The TEXT option, however, acts on the atom level and keeps what is matched by the attribute in its original form.

The syntax of the TOKEN option is the following:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[TOKEN]
    }
}

To fully understand the TOKEN option, it is useful to compare it to the TEXT option. Consider the following example:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[KEYWORD("emergency")]|[TEXT]
    }
}

This rule aims to extract the keyword emergency and keep its value in its original form. Consider the extraction output when the rule above is run against the following sample sentence:

Emergency teams battled more than 130 fires across New South Wales.

The text contains only one value that matches the sample rule, and its base form at the atom level is emergency. Because of the TEXT transformation, the field is set to the actual instance found in the text at the atom level: the keyword Emergency, capitalized as it appears in the sentence.
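The atom-level behavior of TEXT can be illustrated with a minimal Python sketch. This is not code from the product: the Atom class, the keyword_matches and text_transform functions, and the hand-built analysis of the first three words of the sample sentence are all hypothetical. It also assumes, for illustration only, that the keyword is compared with the atom's text while ignoring case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    text: str  # form as it appears in the sentence
    base: str  # base form at the atom level


# Hypothetical, hand-built atom analysis of "Emergency teams battled ..."
ATOMS = [
    Atom("Emergency", "emergency"),
    Atom("teams", "team"),
    Atom("battled", "battle"),
]


def keyword_matches(atom: Atom, keyword: str) -> bool:
    # Assumption: case-insensitive comparison with the atom's text.
    return atom.text.lower() == keyword.lower()


def text_transform(atoms: list, keyword: str) -> list:
    """TEXT-like behavior: return the original text of each matching atom."""
    return [a.text for a in atoms if keyword_matches(a, keyword)]


print(ATOMS[0].base)                        # emergency
print(text_transform(ATOMS, "emergency"))   # ['Emergency']
```

The base form of the match is emergency, but the value that reaches the field is the atom's text, Emergency.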

Compare this result with the output of the TOKEN transformation run against the same sample sentence:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[KEYWORD("emergency")]|[TOKEN]
    }
}

Here, the keyword emergency is part of the word Emergency teams, a noun lemma composed of two atoms: emergency and team. The base form of this value at the word level is emergency team. The TOKEN transformation extracts the text form at the word level, so the field is set to Emergency teams.
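The word-level behavior of TOKEN can be sketched in Python under the same kind of simplifications. Again, this is only an illustration: the Atom and Word classes, the keyword_matches and token_transform functions, and the hand-built grouping of atoms into words are hypothetical, and the sketch assumes that a word's text is its atoms' texts joined by single spaces and that keyword matching ignores case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    text: str  # form as it appears in the sentence
    base: str  # base form at the atom level


@dataclass(frozen=True)
class Word:
    atoms: tuple  # a word groups one or more atoms
    base: str     # base form at the word level (lemma)

    @property
    def text(self) -> str:
        # Assumption: atoms inside a word are separated by single spaces.
        return " ".join(a.text for a in self.atoms)


# Hypothetical analysis: "Emergency teams" is one noun made of two atoms.
WORDS = [
    Word((Atom("Emergency", "emergency"), Atom("teams", "team")), "emergency team"),
    Word((Atom("battled", "battle"),), "battle"),
]


def keyword_matches(atom: Atom, keyword: str) -> bool:
    # Assumption: case-insensitive comparison with the atom's text.
    return atom.text.lower() == keyword.lower()


def token_transform(words: list, keyword: str) -> list:
    """TOKEN-like behavior: original text of each word containing a matching atom."""
    return [w.text for w in words if any(keyword_matches(a, keyword) for a in w.atoms)]


print(WORDS[0].base)                         # emergency team
print(token_transform(WORDS, "emergency"))   # ['Emergency teams']
```

The match still happens on a single atom, but the value returned is the text form of the whole word that contains it.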