Field attributes
Overview
Within the template definition, field names can optionally be followed by an attribute—not to be confused with the attributes of the text tokens used in categorization, extraction and tagging rules.
The attribute characterizes fields with special features.
The syntax is:
@fieldName(attribute)
Possible field attributes are:
Attribute | Meaning |
---|---|
C | Cardinal |
S | Solitary |
V | Validating |
Attributes are language keywords and must be typed in uppercase.
Cardinal field
The C attribute marks the cardinal (i.e., fundamental) field in a template.
It affects the MERGE option: two or more simple records are merged if they contain the same value for the cardinal field.
The attribute must therefore be used in conjunction with the MERGE option, and by-rule aggregation has to be used to link the cardinal field to non-cardinal fields.
For example, consider the following template and extraction rules:
TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Telephone,
    @Address
}
SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
        AND
        @Telephone[ANCESTOR(29700)] // 29700: phone number
    }
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
        AND
        @Address[TYPE(ADR)]
    }
}
Both rules exhibit so-called "by-rule aggregation", i.e. each rule assigns a value to more than one field to create multi-field records. In this case, the records are incomplete with respect to the template, because the template has three fields and each rule fills only two of them.
If the rules are run against this text:
Doug Smith lives at 1540 Chicago Avenue, Baltimore and his number is 555-234-567.
the first rule will be triggered by Doug Smith and 555-234-567 and the second by Doug Smith and 1540 Chicago Avenue, Baltimore, so the first rule will generate this record:
Template: PERSONAL_DATA
@Name | @Telephone |
---|---|
Doug Smith | 555-234-567 |
and the second will generate this one:
Template: PERSONAL_DATA
@Name | @Address |
---|---|
Doug Smith | 1540, Chicago Avenue - Baltimore [1] |
However, if the template is changed in this way:
TEMPLATE(PERSONAL_DATA)
{
    @Name(C),
    @Telephone,
    @Address
    MERGE WHEN DOCUMENT
}
the @Name field is marked as cardinal and the MERGE option is added.
The effect of this change can be seen in the table below: the two simple records are merged into one compound record, because they both contain the cardinal field with the same value. After the merge, the contributing simple records are discarded.
Template: PERSONAL_DATA
@Name | @Telephone | @Address |
---|---|---|
Doug Smith | 555-234-567 | 1540, Chicago Avenue - Baltimore |
Non-cardinal fields must be combined with the cardinal field in the extraction rules to make their relationship explicit. Typically, rules are written to extract pairs of fields such as:
Cardinal + Non-Cardinal #1, Cardinal + Non-Cardinal #2.
To summarize, in order to merge several records which have the value of one field in common:
- That field must be declared as cardinal.
- The MERGE option must be activated.
- Extraction rules must extract the cardinal field and one or more non-cardinal fields.
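The merge behavior described in this section can be modeled outside the engine with a few lines of Python. This is only a sketch of the idea, not how the engine is implemented: the function name merge_by_cardinal, the use of dictionaries as records, the pass-through of records that lack the cardinal field, and the overwrite-on-conflict behavior are all assumptions made for illustration.

```python
def merge_by_cardinal(records, cardinal):
    """Merge simple records that share the same value for the cardinal field.

    Illustrative model only: `records` is a list of dicts mapping field
    names to values; this is not the engine's actual data structure.
    """
    merged = {}      # cardinal value -> compound record (insertion-ordered)
    untouched = []   # assumption: records lacking the cardinal field pass through
    for record in records:
        if cardinal not in record:
            untouched.append(dict(record))
            continue
        compound = merged.setdefault(record[cardinal], {})
        # Simplification: a later value for the same field overwrites an earlier one.
        compound.update(record)
    return list(merged.values()) + untouched


# The two simple records produced by the two extraction rules.
simple_records = [
    {"Name": "Doug Smith", "Telephone": "555-234-567"},
    {"Name": "Doug Smith", "Address": "1540, Chicago Avenue - Baltimore"},
]
compound_records = merge_by_cardinal(simple_records, cardinal="Name")
print(compound_records)
```

With the two simple records from the example, the sketch returns one compound record holding the name, the telephone number and the address, mirroring the merged table above.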
Solitary field
If the attribute S is appended to one or more fields in a template, it inhibits the normal "bundling" mechanism that acts on extraction records. Bundling is responsible for returning a single extraction value when the same token is identified multiple times, in different positions, inside a single document. In other words, tokens identified as the same entity or concept are usually grouped together, and a single value is returned to represent all instances.
For example, consider the same text used in the cardinal field example, together with the following template and extraction rule:
TEMPLATE(PERSONAL_DATA)
{
    @Name
}
SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
}
It will generate records containing only human proper names (TYPE(NPH)). With the default bundling, all mentions of the same person are returned as a single value.
Instead, if the field @Name is marked as solitary (S):
TEMPLATE(PERSONAL_DATA)
{
    @Name(S)
}
the engine will generate two separate records, one for each instance of the name.
@Name |
---|
Doug Smith |
Doug Smith |
Note that the second instance matches "his" in the text but returns Doug Smith, thanks to Studio's anaphora recognition capabilities.
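The difference between bundled and solitary output can be pictured with a small Python model. It is a hedged sketch rather than the engine's logic: the function collect_values, the (surface text, resolved entity) pairs and the solitary flag are hypothetical names introduced here, and the anaphora resolution of "his" is simply given as input.

```python
def collect_values(mentions, solitary=False):
    """Return field values from (surface_text, resolved_entity) pairs.

    Illustrative model only; the pair format is an assumption.
    """
    if solitary:
        # Solitary: one value per mention, duplicates preserved.
        return [entity for _surface, entity in mentions]
    # Default bundling: mentions of the same entity collapse into one value.
    values = []
    for _surface, entity in mentions:
        if entity not in values:
            values.append(entity)
    return values


# Two mentions in the sample sentence, both resolved to the same person.
mentions = [("Doug Smith", "Doug Smith"), ("his", "Doug Smith")]
bundled = collect_values(mentions)                    # default behavior
unbundled = collect_values(mentions, solitary=True)   # field marked with S
print(bundled, unbundled)
```

In this model the default call yields a single Doug Smith value, while the solitary call yields two, one per mention.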
Validating field
If the attribute V is appended to one or more fields in a template, a record is generated only if the extraction output contains values for all the fields marked as validating. For every processed document, each V field must extract at least one value for the whole record to be validated. If this condition is not met, the whole record is discarded. For example, consider this template:
TEMPLATE (PERSONAL_DATA)
{
    @Name,
    @Telephone,
    @Address
    MERGE WHEN SENTENCE
}
If these extraction rules:
SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
    IDENTIFY(PERSONAL_DATA)
    {
        @Telephone[ANCESTOR(29700)] // 29700: phone number
    }
    IDENTIFY(PERSONAL_DATA)
    {
        @Address[TYPE(ADR)]
    }
}
are applied to this text:
My friend lives at 1540 Chicago Avenue, Baltimore and his number is 555-234-567.
you will get this record:
Template: PERSONAL_DATA
Field | Value |
---|---|
Telephone | 555234567 |
Address | 1540, Chicago Avenue |
If the @Name field is marked as validating, like this:
TEMPLATE (PERSONAL_DATA)
{
    @Name(V),
    @Telephone,
    @Address
    MERGE WHEN SENTENCE
}
with the same rules applied to the same text, no record is generated: the document contains no human proper names, so the @Name field, which is marked as validating, has no value and the whole record is discarded.
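The validation check can be approximated in Python as a simple filter. This is a sketch under stated assumptions, not the engine's implementation: the function is_valid, the dictionary record and the validating_fields list are hypothetical names used only to illustrate the rule that every validating field needs at least one value.

```python
def is_valid(record, validating_fields):
    """Return True if every validating field has a non-empty value.

    Illustrative model only: `record` maps field names to extracted values.
    """
    return all(record.get(field) for field in validating_fields)


# Record extracted from the sample text, which contains no person name.
record = {"Telephone": "555234567", "Address": "1540, Chicago Avenue"}

kept_without_v = is_valid(record, validating_fields=[])          # no V fields
kept_with_name_v = is_valid(record, validating_fields=["Name"])  # @Name(V)
print(kept_without_v, kept_with_name_v)
```

With no validating fields the record is kept; once Name is treated as validating, the same record fails the check and would be discarded.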
[1] Note that the field value is slightly different from the text. This is because the values of typed entities go through a default normalization process.