Field attributes

Overview

Within the template definition, a field name can optionally be followed by an attribute, which marks the field as having a special behavior.
Field attributes are not to be confused with the attributes of the text tokens used in categorization, extraction and tagging rules.

The syntax is:

@fieldName(attribute)

Possible field attributes are:

Attribute Meaning
C Cardinal
S Solitary
V Validating

Attributes are language keywords and must be typed in uppercase.

Cardinal field

The C attribute is used to mark the cardinal (i.e., fundamental) field in a template.
It affects the MERGE option: two or more simple records are merged if they contain the same value for the cardinal field.
The attribute must therefore be used together with the MERGE option, and by-rule aggregation must be used to link the cardinal field to the non-cardinal fields.

For example, consider the following template and extraction rules:

TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Telephone,
    @Address
}

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
        AND
        @Telephone[ANCESTOR(29700)] // 29700: phone number
    }

    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
        AND
        @Address[TYPE(ADR)]
    }
}

Both rules exhibit the so-called "by-rule aggregation", i.e. each rule assigns values to more than one field to create a multi-field record. In this case, the records are incomplete with respect to the template, because the template has three fields and each rule fills only two of them.
If the rules are run against this text:

Doug Smith lives at 1540 Chicago Avenue, Baltimore and his number is 555-234-567.

the first rule will be triggered by Doug Smith and 555-234-567 and the second by Doug Smith and 1540 Chicago Avenue, Baltimore, so the first rule will generate this record:

Template: PERSONAL_DATA

@Name @Telephone
Doug Smith 555-234-567

and the second will generate this one:

Template: PERSONAL_DATA

@Name @Address
Doug Smith 1540, Chicago Avenue - Baltimore[^1]

However, if the template is changed in this way:

TEMPLATE(PERSONAL_DATA)
{
    @Name(C),
    @Telephone,
    @Address

    MERGE WHEN DOCUMENT
}

the Name field has been marked as cardinal and the MERGE option has been added.
The effect of this change can be seen in the table below: the two simple records are merged into one compound record, because they both contain the cardinal field and the value of the cardinal field is the same. After the merge, the simple contributing records are discarded.

Template: PERSONAL_DATA

@Name @Telephone @Address
Doug Smith 555-234-567 1540, Chicago Avenue - Baltimore

Non-cardinal fields must be combined with the cardinal field in the extraction rules to make their relationship explicit. Typically, rules are written to extract pairs of fields such as:

Cardinal + Non-Cardinal #1, Cardinal + Non-Cardinal #2.

To summarize, in order to merge several records which have the value of one field in common:

  • That field must be declared as cardinal.
  • The MERGE option must be activated.
  • Extraction rules must extract the cardinal field and one or more non-cardinal fields.
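
The merging behavior summarized above can be modeled with a short Python sketch. It is only an illustration of the described outcome, not the engine's implementation: the function `merge_by_cardinal` and the dictionary representation of records are hypothetical, and passing records without the cardinal field through unchanged is an assumption made for this sketch.

```python
# Illustrative model only: names and data structures are hypothetical,
# not part of the rules language or of the engine.

def merge_by_cardinal(records, cardinal):
    """Merge simple records that share the same value for the cardinal field."""
    merged = {}       # cardinal value -> compound record
    untouched = []    # assumption: records lacking the cardinal field are left as-is
    for record in records:
        if cardinal not in record:
            untouched.append(record)
            continue
        # Fold this simple record into the compound record for its cardinal value.
        merged.setdefault(record[cardinal], {}).update(record)
    return list(merged.values()) + untouched


# The two simple records produced by the two rules in the example above.
simple_records = [
    {"Name": "Doug Smith", "Telephone": "555-234-567"},
    {"Name": "Doug Smith", "Address": "1540, Chicago Avenue - Baltimore"},
]

compound = merge_by_cardinal(simple_records, "Name")
print(compound)
```

With the two simple records from the example, the sketch returns a single compound record holding the name, the telephone number and the address.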

Solitary field

Attribute S (solitary) inhibits the default bundling of records. By default, bundling creates a single record even if a field is extracted multiple times, and the occurrences of the field are stored inside the record as field instances or "hits". With the solitary attribute, the extractions do not coalesce and a separate record is generated for each of them.

For example, consider this text:

Doug Smith lives in Baltimore, he has a barber shop there.

the following template definition:

TEMPLATE(PERSONAL_DATA)
{
    @Name
}

and this extraction rule:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }
}

The rule will generate one PERSONAL_DATA record with field Name set to Doug Smith and two instances, one due to Doug Smith and the other due to the anaphoric pronoun he.

Instead, if field Name has the S attribute:

TEMPLATE(PERSONAL_DATA)
{
    @Name(S)
}

the rule will generate two separate PERSONAL_DATA records, one for each hit (Doug Smith and the anaphoric pronoun he), both with field Name set to Doug Smith.
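
The difference between default bundling and the solitary behavior can be modeled with the following Python sketch. It is an illustration under stated assumptions, not the engine's logic: `build_records`, the `(value, matched_text)` pairs and the `instances` key are hypothetical names, and grouping hits by field value is an assumption about how bundling decides which hits belong together.

```python
# Illustrative model only: names and data structures are hypothetical.

def build_records(hits, solitary=False):
    """Build records for one field from (field_value, matched_text) pairs."""
    if solitary:
        # Solitary field: every hit produces its own record.
        return [{"Name": value, "instances": [text]} for value, text in hits]
    # Default bundling (assumed here to group by field value):
    # one record per value, with the hits stored as instances.
    bundled = {}
    for value, text in hits:
        bundled.setdefault(value, []).append(text)
    return [{"Name": value, "instances": texts} for value, texts in bundled.items()]


# Two hits for the same person: the name itself and the pronoun "he".
hits = [("Doug Smith", "Doug Smith"), ("Doug Smith", "he")]

bundled_records = build_records(hits)
solitary_records = build_records(hits, solitary=True)
print(len(bundled_records), len(solitary_records))
```

In this model the default call yields one record with two instances, while the solitary call yields two records that both carry the value Doug Smith.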

Validating field

The V attribute marks a field as validating. Records are generated only if all the validating fields are extracted.
For example, consider this template:

TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Telephone,
    @Address

    MERGE WHEN SENTENCE
}

If these extraction rules:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]
    }

    IDENTIFY(PERSONAL_DATA)
    {
        @Telephone[ANCESTOR(29700)] // 29700: phone number
    }

    IDENTIFY(PERSONAL_DATA)
    {
        @Address[TYPE(ADR)]
    }
}

are applied to this text:

My friend lives at 1540 Chicago Avenue, Baltimore and his number is 555-234-567.

you will get this record:

Template: PERSONAL_DATA

Field Value
Telephone 555234567
Address 1540, Chicago Avenue

If the Name field is marked as validating, like this:

TEMPLATE(PERSONAL_DATA)
{
    @Name(V),
    @Telephone,
    @Address

    MERGE WHEN SENTENCE
}

with the same rules applied to the same text, no record is generated, because field Name is never extracted.
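
The validating check can be modeled as a simple filter, sketched below in Python. The function `emit_record` and the dictionary representation of the record are hypothetical and serve only to illustrate the rule stated above: a record is produced only if every validating field has been extracted.

```python
# Illustrative model only: names and data structures are hypothetical.

def emit_record(extracted, validating):
    """Return the record only if every validating field was extracted."""
    if all(field in extracted for field in validating):
        return extracted
    return None  # a validating field is missing: no record is generated


# Fields extracted from the example text, where no name is found.
extracted = {"Telephone": "555234567", "Address": "1540, Chicago Avenue"}

without_validation = emit_record(extracted, validating=set())
with_validation = emit_record(extracted, validating={"Name"})
print(without_validation, with_validation)
```

With no validating fields the record is returned as it is; when Name is declared validating, the same extraction yields no record.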