Field attributes
Overview
Within the template definition, field names can optionally be followed by one or more attributes (not to be confused with the attributes used in categorization and extraction rules).
These attributes are used to characterize fields that have special features.
The syntax is:
@fieldName(attribute1, ...)
Possible field attributes are:
Attribute | Meaning |
---|---|
C | Cardinal |
S | Solitary |
Attributes are language keywords and must be typed in uppercase. If several attributes are specified, separate them with commas.
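The syntax rules above can be illustrated with a small Python sketch. It is not part of Studio: `parse_field` and `KNOWN_ATTRIBUTES` are hypothetical names, and the pattern assumes the parenthesized form that appears in the template examples on this page (for instance `@Name(C)`).

```python
import re

# Hypothetical helper, not a Studio API. The two attributes come from the table above.
KNOWN_ATTRIBUTES = {"C": "Cardinal", "S": "Solitary"}

# Assumption: attributes are written in parentheses, as in @Name(C).
FIELD_PATTERN = re.compile(r"^@(\w+)(?:\(([^()]*)\))?$")


def parse_field(declaration):
    """Return (field_name, [attributes]) or raise ValueError."""
    match = FIELD_PATTERN.match(declaration.strip())
    if match is None:
        raise ValueError(f"not a field declaration: {declaration!r}")
    name, raw = match.group(1), match.group(2)
    if raw is None:
        return name, []  # attributes are optional
    attributes = [part.strip() for part in raw.split(",")]
    for attribute in attributes:
        # Case-sensitive lookup: lowercase "c" or "s" is rejected.
        if attribute not in KNOWN_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {attribute!r}")
    return name, attributes
```

Under these assumptions, `parse_field("@Name(C)")` returns the field name with a one-element attribute list, while `parse_field("@Name(c)")` raises `ValueError` because attributes must be uppercase.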
Cardinal field
The C attribute is used to mark the cardinal (i.e., fundamental) field in a template.
It affects the MERGE option: two or more simple records are merged if they contain the same value for the cardinal field.
The attribute must therefore be used in conjunction with the MERGE option, and by-rule aggregation must be used to link the cardinal field to non-cardinal fields.
For example, consider the following template and extraction rules:
TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Telephone,
    @Address
}
SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
        AND
        @Telephone[ANCESTOR(29700)] // 29700: phone number
    }
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
        AND
        @Address[TYPE(ADR)]
    }
}
Both rules exhibit the so-called "by-rule aggregation", i.e., they give value to more than one field to create multi-field records. In this case, the records are incomplete with respect to the template, because the template has three fields and each rule fills only two of them.
If the rules are run against this text:
Doug Smith lives at 1540 Chicago Avenue, Baltimore and his number is 555-234-567.
the first rule is triggered by Doug Smith and 555-234-567, and the second by Doug Smith and 1540 Chicago Avenue, Baltimore. The first rule generates this record:
Template: PERSONAL_DATA
@Name | @Telephone |
---|---|
Doug Smith | 555-234-567 |
and the second will generate this one:
Template: PERSONAL_DATA
@Name | @Address |
---|---|
Doug Smith | 1540, Chicago Avenue - Baltimore[^1] |
However, if the template is changed in this way:
TEMPLATE(PERSONAL_DATA)
{
    @Name(C),
    @Telephone,
    @Address
    MERGE WHEN DOCUMENT
}
the @Name field is marked as cardinal and the MERGE option is added.
The effect of this change can be seen in the table below: the two simple records are merged into one compound record, because they both contain the cardinal field and the value of the cardinal field is the same. After the merge, the simple contributing records are discarded.
Template: PERSONAL_DATA
@Name | @Telephone | @Address |
---|---|---|
Doug Smith | 555-234-567 | 1540, Chicago Avenue - Baltimore |
Non-cardinal fields must be combined with the cardinal field in the extraction rules to make their relationship explicit. Typically, rules are written to extract pairs of fields such as:
Cardinal + Non-Cardinal #1, Cardinal + Non-Cardinal #2.
To summarize, in order to merge several records which have the value of one field in common:
- That field must be declared as cardinal.
- The MERGE option must be activated.
- Extraction rules must extract the cardinal field and one or more non-cardinal fields.
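The merge behavior summarized above can be mimicked outside Studio with a short Python sketch. This is only an illustration under stated assumptions, not the engine's implementation: `merge_by_cardinal` is a hypothetical name, records are modeled as plain dictionaries, the first value seen for a field wins when two records disagree, and records that lack the cardinal field are passed through unchanged. The merge scope (for example `MERGE WHEN DOCUMENT`) is not modeled.

```python
def merge_by_cardinal(records, cardinal):
    """Merge simple records that share the same value for the cardinal field."""
    merged = {}     # cardinal value -> compound record
    untouched = []  # assumption: records without the cardinal field are kept as-is
    for record in records:
        if cardinal not in record:
            untouched.append(dict(record))
            continue
        compound = merged.setdefault(record[cardinal], {})
        for field, value in record.items():
            # Simplification: keep the first value seen for each field.
            compound.setdefault(field, value)
    # The contributing simple records are not returned, only the compound ones.
    return list(merged.values()) + untouched


# The two simple records from the example above.
records = [
    {"Name": "Doug Smith", "Telephone": "555-234-567"},
    {"Name": "Doug Smith", "Address": "1540, Chicago Avenue - Baltimore"},
]
compound_records = merge_by_cardinal(records, "Name")
```

With the two example records, `compound_records` holds a single dictionary carrying the name, the telephone number and the address, which mirrors the compound record in the table above.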
Solitary field
If the attribute S is appended to one or more fields in a template, it inhibits the normal "bundling" mechanism acting on the extraction records. This mechanism is responsible for returning a single extraction value when the same token is identified multiple times, in different positions, inside a single document. In other words, tokens identified as being the same entity or concept are usually grouped together, and a single value is returned representing all instances.
For example, consider the same text with the following template and extraction rule:
TEMPLATE(PERSONAL_DATA)
{
    @Name
}
SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
}
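It will generate records containing only human proper names (TYPE(NPH)). With normal bundling, all mentions of the same person in the document are represented by a single value.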
Instead, if the field @Name is marked as solitary (S):
TEMPLATE(PERSONAL_DATA)
{
    @Name(S)
}
the engine will generate a separate record for each instance of the name, two in this case:
@Name |
---|
Doug Smith |
Doug Smith |
Note that the second instance matches "his" in the text but returns Doug Smith, thanks to Studio's anaphora recognition capabilities.
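The difference between bundled and solitary output can be sketched in Python as follows. This is a rough model rather than Studio's actual algorithm: `build_records` is a hypothetical name, and each hit is assumed to arrive as a pair of the text matched in the document and the entity it was resolved to, with anaphora resolution already done upstream.

```python
def build_records(hits, solitary=False):
    """Build @Name records from (matched_text, resolved_entity) pairs."""
    if solitary:
        # Solitary field: one record per instance, no grouping.
        return [{"Name": entity, "instances": [text]} for text, entity in hits]
    # Default bundling: instances of the same entity share one record.
    bundled = {}
    for text, entity in hits:
        bundled.setdefault(entity, []).append(text)
    return [{"Name": entity, "instances": texts} for entity, texts in bundled.items()]


# Hits for the sample sentence: the name itself and the pronoun resolved to it.
hits = [("Doug Smith", "Doug Smith"), ("his", "Doug Smith")]
bundled_records = build_records(hits)
solitary_records = build_records(hits, solitary=True)
```

In this model, `bundled_records` contains one record covering both instances, while `solitary_records` contains two records that both carry the value Doug Smith, as in the table above.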
[^1]: Note that the field value is slightly different from the text. This is because the values of typed entities go through a default normalization process.