ATOM
ATOM transforms what is matched by the operand into the base form of the matched atoms. It is recommended to use this transformation only with the KEYWORD attribute.
The syntax for extraction rules is:
SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[ATOM]
    }
}
The syntax for tagging rules is:
SCOPE scopeOption
{
    TAGGER(tagLevel)
    {
        @tag[attribute]|[ATOM]
    }
}
This transformation is based on the concept of the atom, one of the subdivisions of the input text resulting from the disambiguation process. An atom is an indivisible particle; in expert.ai terminology it is the smallest linguistic unit that can convey meaning.
Generally, rules act on the word level of disambiguation, which includes single or composite terms resulting from the semantic analysis of texts.
At the word level, it's possible to have tokens corresponding to lemmas contained in the Knowledge Graph (such as emergency team), sequences of words recognized as entities (such as Sept. 15, 2008, recognized as a date), and multi-word proper nouns like Lehman Brothers Holdings Inc.
The atom level, on the other hand, contains only single word tokens; it can be considered as a primitive disambiguation output that precedes word aggregation.
The KEYWORD attribute matches tokens at the atom level.
The ATOM transformation sets the field or the tag to the base form of the atom matched by the operand. To fully understand it, it's useful to compare it with the BASE transformation.
Consider this sample rule:
SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[KEYWORD("emergency")]|[BASE]
    }
}
The purpose of this rule is to extract the base form of anything matched by the keyword emergency.
In the following sentence:
Emergency teams battled more than 130 fires across New South Wales.
Emergency teams is recognized as an inflection of the lemma emergency team. It is composed of two atoms:
- Emergency
- teams
The rule above is triggered by the atom Emergency, which is matched by KEYWORD("emergency").
While the KEYWORD attribute matches tokens at the atom level, the BASE transformation operates at the word level, as if the operand had matched the upper-level token. @FIELD1 is therefore set to the base form of the word-level token Emergency teams, that is, emergency team.
If the rule condition is changed as follows:
@FIELD1[KEYWORD("emergency")]|[ATOM]
the rule will be triggered for the same reason, but the transformation takes place at the atom level, so the base form of the atom matched by the operand (emergency) will be used to set the field.
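The difference between the two transformations can be sketched with a small toy model in Python. This is only an illustration of the word level and atom level described above, not expert.ai code: the Atom and Word classes, the match_keyword and transform functions, the case-insensitive comparison, and the base form assigned to the atom teams are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Atom:
    """Hypothetical atom-level token: a single word with its own base form."""
    text: str
    base_form: str


@dataclass
class Word:
    """Hypothetical word-level token: one or more atoms aggregated together."""
    text: str
    base_form: str
    atoms: List[Atom]


def match_keyword(words: List[Word], keyword: str) -> Optional[Tuple[Word, Atom]]:
    """Toy stand-in for KEYWORD: find the first atom whose text equals the keyword.

    The comparison ignores case (an assumption, chosen so that "emergency"
    matches "Emergency" as in the example above).
    """
    for word in words:
        for atom in word.atoms:
            if atom.text.lower() == keyword.lower():
                return word, atom
    return None


def transform(words: List[Word], keyword: str, transformation: str) -> Optional[str]:
    """Return the value a field would receive under BASE or ATOM in this toy model."""
    hit = match_keyword(words, keyword)
    if hit is None:
        return None
    word, atom = hit
    if transformation == "BASE":
        # BASE looks at the word-level token that contains the matched atom.
        return word.base_form
    if transformation == "ATOM":
        # ATOM stays at the atom level.
        return atom.base_form
    raise ValueError(f"unsupported transformation: {transformation}")


# Only the first word-level token of the sample sentence is modelled here.
# The base form "team" for the atom "teams" is an assumption.
sentence = [
    Word(
        text="Emergency teams",
        base_form="emergency team",
        atoms=[Atom("Emergency", "emergency"), Atom("teams", "team")],
    ),
]

base_value = transform(sentence, "emergency", "BASE")
atom_value = transform(sentence, "emergency", "ATOM")
print(base_value)
print(atom_value)
```

In this model both calls are triggered by the same atom, but BASE returns the base form of the enclosing word-level token while ATOM returns the base form of the atom itself, mirroring the two versions of the sample rule.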