BASE

Overview

BASE transforms what is matched by the field- or tag-prefixed operand into its base form.

The base form transformation is the default behavior for every attribute except KEYWORD, whose default is the TEXT transformation, and PATTERN, which always returns the text matched by the regular expression.

The syntax for extraction rules is:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[BASE]
    }
}

The syntax for tagging rules is:

SCOPE scopeOption
{
    TAGGER(tagLevel)
    {
        @tag[attribute]|[BASE]
    }
}

What counts as the base form depends on whether the value matched by the sub-condition is contained in the Knowledge Graph.
If it is, the base form is the singular form for nouns (for example, children is transformed into child), the bare infinitive for verbs (for example, went, goes and going are transformed into go), the positive form for adverbs and adjectives (for example, easier is transformed into easy) and the most significant form for proper nouns.

Consider the following example:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[TYPE(NOU, VER)]|[BASE]
    }
}

The purpose of this rule is to extract nouns and verbs (TYPE(NOU, VER)) and to transform the extracted values into their base form.

If the rule above is run against the following sentence:

Emergency teams battled more than 130 fires across New South Wales.

the following transformations will take place:

| Token text | Type | Final field value |
|---|---|---|
| emergency teams | Noun | emergency team |
| battled | Verb | battle |
| fires | Noun | fire |

Transformation of unknown tokens

If the matched token is not contained in the Knowledge Graph, a further distinction is made based on whether or not a virtual supernomen has been assigned to the token.

When a virtual supernomen has been assigned, the base form returned depends on the type of entity and the text content.
Consider this rule:

SCOPE SENTENCE
{
    IDENTIFY(PERSON)
    {
        @Person[TYPE(NPH)]|[BASE]
    }
}

It is meant to extract people's names and return their base form.
Now consider this text:

To stand your ground in the face of relentless criticism from a double Nobel prize-winning scientist takes a lot of guts. For engineer and materials scientist Dan Shechtman, however, years of self-belief in the face of the eminent Linus Pauling's criticisms led him to the ultimate accolade: his own Nobel prize.
Shechtman was the sole winner of the Nobel prize for chemistry in 2011, for his discovery of seemingly impossible crystal structures in metal alloys.

Two elements, Dan Shechtman and Linus Pauling, are not contained in the Knowledge Graph, so they can be considered "unknown".
They are both disambiguated as people's names, so the corresponding output tokens have their meaning set to syncon ID 78452 (person). This is matched by the operand TYPE(NPH), so the condition is met, the rule is triggered and the field @Person is extracted.
The first entity appears in two different forms, Dan Shechtman and Shechtman. Given the context in which they appear, the disambiguator deduces that they refer to the same entity and chooses Dan Shechtman as the most significant form for both tokens. This representation is used as the base form and returned as the outcome of the transformation.

If the same rule is applied to this text:

Shechtman was born in Tel Aviv in 1941 and received his PhD from Technion, the Israel Institute of Technology in Haifa, in 1972.

Shechtman is again recognized as a person's name, but unlike in the first sample text, only one form appears in the text, so no "more significant" form exists. The base form is then Shechtman, which is identical to the matched text.

For the same reason, when the extracted value is not contained in the Knowledge Graph and is not assigned a virtual supernomen, the base form is the matched text, which is the same value the TEXT transformation would return.
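
The behavior for unknown tokens can be summarized with a rough Python sketch. It is a toy model under stated assumptions, not the disambiguator's algorithm: `Mention`, its `entity` key and `base_forms` are hypothetical names, the grouping of mentions into entities is taken as given, and "most significant form" is approximated here by picking the longest form of each entity.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Mention:
    text: str
    # Hypothetical key shared by mentions the disambiguator considers the
    # same entity; None models a token with no virtual supernomen.
    entity: Optional[str]


def base_forms(mentions: List[Mention]) -> List[str]:
    """Return one base form per mention, in the order given."""
    best: Dict[str, str] = {}
    for m in mentions:
        if m.entity is None:
            continue
        current = best.get(m.entity)
        # Assumption: the longest form stands in for "most significant".
        if current is None or len(m.text) > len(current):
            best[m.entity] = m.text
    # Without an entity key, fall back to the matched text itself.
    return [m.text if m.entity is None else best[m.entity] for m in mentions]


first_text = [
    Mention("Dan Shechtman", "e1"),
    Mention("Linus Pauling", "e2"),
    Mention("Shechtman", "e1"),
]
second_text = [Mention("Shechtman", "e1")]

print(base_forms(first_text))   # ['Dan Shechtman', 'Linus Pauling', 'Dan Shechtman']
print(base_forms(second_text))  # ['Shechtman']
```

In the second call only one form of the name is available, so the sketch returns the matched text unchanged, which mirrors the second sample text above.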