Transformation overview
By default, the value of an extracted field or tag instance is either the exact portion of text matched by the field-prefixed or tag-prefixed operand of the rule condition, or a normalized version of that text.
With an optional transformation, fields and tag instances are instead set to an attribute of the matched token, to one of the text divisions the token belongs to, or to an otherwise transformed version of the text.
The name of the optional transformation must be written between square brackets, prefixed by a pipe (|), at the end of the field-prefixed or tag-prefixed operand of the rule's condition.
For example, for extraction rules:
@fieldName[operand]|[transformation]
and for tagging rules:
@tagName[operand]|[transformation]
Possible transformations are:
BASE
ATOM
TEXT
TOKEN
ENTRY
SMARTENTRY
SYNCON
PHRASE
CLAUSE
SENTENCE
PARAGRAPH
SEGMENT
SECTION
EXTENSION
SEQUENCE
SECTOR
NORM
TAG
TAGENTRY
SCRIPT
TITLETEXT
Transformation names are language keywords and must be written in uppercase.
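The syntax described above can be checked mechanically. The following Python sketch is only an illustration and is not part of the rules language or of any official tool: the helper name `parse_operand` and the regular expression are assumptions, and the pattern deliberately handles only operands that contain no square brackets.

```python
import re

# Transformation names as listed in this article.
TRANSFORMATIONS = {
    "BASE", "ATOM", "TEXT", "TOKEN", "ENTRY", "SMARTENTRY", "SYNCON",
    "PHRASE", "CLAUSE", "SENTENCE", "PARAGRAPH", "SEGMENT", "SECTION",
    "EXTENSION", "SEQUENCE", "SECTOR", "NORM", "TAG", "TAGENTRY",
    "SCRIPT", "TITLETEXT",
}

# Illustrative pattern (an assumption, not the official grammar):
# @name[operand], optionally followed by |[TRANSFORMATION].
# It only handles operands that contain no square brackets.
OPERAND_RE = re.compile(
    r"^@(?P<name>\w+)"
    r"\[(?P<operand>[^\[\]]*)\]"
    r"(?:\|\[(?P<transformation>[^\[\]]*)\])?$"
)


def parse_operand(text):
    """Split a field- or tag-prefixed operand into its parts.

    Returns (name, operand, transformation); transformation is None
    when the optional |[...] suffix is absent.
    """
    match = OPERAND_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a prefixed operand: {text!r}")
    transformation = match.group("transformation")
    # Names are keywords and must be uppercase, so compare case-sensitively.
    if transformation is not None and transformation not in TRANSFORMATIONS:
        raise ValueError(f"unknown transformation: {transformation!r}")
    return match.group("name"), match.group("operand"), transformation


print(parse_operand('@BREED[LEMMA("rottweiler")]|[SYNCON]'))
```

Because the comparison is case-sensitive, a lowercase name such as `syncon` is rejected by this sketch, mirroring the uppercase requirement stated above.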
For example, this extraction rule:
SCOPE SENTENCE
{
    IDENTIFY(DOGS)
    {
        @BREED[LEMMA("rottweiler")]|[SYNCON]
    }
}
applied to this text:
Rottweilers are beautiful.
will extract:
| Template | Field | Value |
|---|---|---|
| DOGS | BREED | 100039093 |
The field value is therefore the syncon ID attribute of the token matched by the LEMMA("rottweiler") operand.
Likewise, if the same transformation is used in this tagging rule:
SCOPE SENTENCE
{
    TAGGER()
    {
        @DOG_BREED[LEMMA("rottweiler")]|[SYNCON]
    }
}
when the rule is applied to the same text as above, the instance of tag DOG_BREED will be set to 100039093.
Transformations are described in the following articles. The examples are based on extraction rules, but transformations work the same way in tagging rules.
Info
Transformations affect the values, not the positions, of extracted fields and tag instances: the positions remain those of the text matched by the operand.
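The distinction between value and position can be pictured with a small Python sketch. It is a conceptual model only: the `Extraction` class, the `apply_transformation` helper and the character-offset convention (start inclusive, end exclusive) are hypothetical and do not describe the actual output format of the engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extraction:
    """Hypothetical record for an extracted field or tag instance."""
    value: str  # what the field or tag instance is set to
    start: int  # offset of the first matched character (assumed convention)
    end: int    # offset just past the last matched character (assumed)


def apply_transformation(matched, new_value):
    """Return a copy with a new value and the original position."""
    return Extraction(value=new_value, start=matched.start, end=matched.end)


text = "Rottweilers are beautiful."
start = text.index("Rottweilers")
matched = Extraction("Rottweilers", start, start + len("Rottweilers"))

# 100039093 is the syncon ID from the example in this article.
transformed = apply_transformation(matched, "100039093")

print(transformed.value)                         # the transformed value
print(text[transformed.start:transformed.end])   # still the matched text
```

In this model the transformation replaces only the `value`, while `start` and `end` keep pointing at the text matched by the operand.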