extraction property

result.match_info.rules.extraction is an array containing the results of extraction.
Each array item represents an extraction record and has the following properties:

Property    Description
template    Extraction's template
fields      Extraction's fields

fields is an array. Each item represents a template's field and has the following properties:

Property      Description
field         Field name
value         Field value
instance      Field instances
confidence    Field confidence score

instance is an array. Each item represents an instance of the field and has the following properties:

Property        Description
group_by        Grouping key: instances of different fields of the same record that share the same value for this property must be treated as a single aggregate
text            Field instance text
pos             Zero-based position of the field instance text
len             Length of the field instance text
snt             Sentence number
snt_begin       Sentence initial position in the text
snt_end         Sentence final position in the text
syncon          Syncon ID
ancestor        Ancestor ID
rule_details    Rule details
confidence      Instance confidence score
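
The role of group_by can be illustrated with a short sketch. The helper below is hypothetical (aggregateInstances is not part of the product API) and assumes only that an extraction record has the shape described in this section. It collects the instances of all the fields of one record under their group_by value, so that instances sharing a value end up in the same aggregate.

```javascript
// Hypothetical helper, not a product API: groups the field instances of one
// extraction record by their group_by value.
function aggregateInstances(extraction) {
    var groups = {};
    var j, k, field, inst, key;

    for (j = 0; j < extraction.fields.length; j++) {
        field = extraction.fields[j];

        for (k = 0; k < field.instance.length; k++) {
            inst = field.instance[k];
            // Object keys are strings, so normalize the numeric group_by.
            key = String(inst.group_by);

            if (!Object.prototype.hasOwnProperty.call(groups, key)) {
                groups[key] = [];
            }
            // Keep the field name next to the text so the aggregate can be
            // read as a set of (field, text) pairs.
            groups[key].push({ field: field.field, text: inst.text });
        }
    }

    return groups;
}
```

The returned object has one key per distinct group_by value. Each key maps to the list of field/text pairs that belong together.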

rule_details is an array. Its items have the following properties:

Property    Description
id          Rule ID: an identification number assigned when the project is built. It is the index of the compiled rule in the array where the rules are stored, and it changes after every build.
label       Rule label, if any
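
The nesting described above (extraction records, then fields, then instances, then rule details) can be walked with plain loops. The following is a minimal sketch under that assumption only: flattenExtractions is a hypothetical helper name, not a product function, and it produces one flat row per field instance.

```javascript
// Hypothetical helper, not a product API: flattens the nested extraction
// structure into one row per field instance.
function flattenExtractions(result) {
    var rows = [];
    var extractions = result.match_info.rules.extraction;
    var i, j, k, r, fields, instances, details, ruleIds;

    for (i = 0; i < extractions.length; i++) {
        fields = extractions[i].fields;

        for (j = 0; j < fields.length; j++) {
            instances = fields[j].instance;

            for (k = 0; k < instances.length; k++) {
                // Tolerate a missing rule_details array.
                details = instances[k].rule_details || [];
                ruleIds = [];
                for (r = 0; r < details.length; r++) {
                    ruleIds.push(details[r].id);
                }

                rows.push({
                    template: extractions[i].template,
                    field: fields[j].field,
                    value: fields[j].value,
                    text: instances[k].text,
                    pos: instances[k].pos,
                    len: instances[k].len,
                    ruleIds: ruleIds
                });
            }
        }
    }

    return rows;
}
```

Because rule IDs change after every build, the ruleIds values are useful for inspecting a single run, not as stable identifiers across builds.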

For example, consider the following text:

BMW released Tuesday the details of an electric concept car, with production of the vehicle expected to start in 2021.
In an interview with CNBC Tuesday, CEO Oliver Zipse described the BMW Concept i4 vehicle as bringing "electromobility to the heart of the BMW brand".
The firm is one of several major manufacturers developing an electric vehicle offering to challenge electric car makers like Tesla.

and the rule:

SCOPE SENTENCE
{
    IDENTIFY(BRANDS)
    {
        @BRAND[ANCESTOR(376882)] //@SYN: #376882# [tag_all_brands]
    }
}

the extraction property has the following JSON serialization:

      "extraction": [
        {
          "template": "BRANDS",
          "fields": [
            {
              "field": "BRAND",
              "value": "BMW",
              "instance": [
                {
                  "group_by": 0,
                  "text": "BMW",
                  "rule_details": [
                    {
                      "id": 1,
                      "label": ""
                    }
                  ],
                  "pos": 0,
                  "len": 3,
                  "snt": 1,
                  "snt_begin": 0,
                  "snt_end": 117,
                  "syncon": 1039566,
                  "ancestor": -1
                }
              ]
            }
          ]
        },
        {
          "template": "BRANDS",
          "fields": [
            {
              "field": "BRAND",
              "value": "Tesla (Veicoli)",
              "instance": [
                {
                  "group_by": 1000000,
                  "text": "Tesla",
                  "rule_details": [
                    {
                      "id": 1,
                      "label": ""
                    }
                  ],
                  "pos": 394,
                  "len": 5,
                  "snt": 3,
                  "snt_begin": 269,
                  "snt_end": 399,
                  "syncon": 1001728,
                  "ancestor": -1
                }
              ]
            }
          ]
        }
      ]

In that context, the following code:

function onFinalize(result) {
    var extractionsCount = result.match_info.rules.extraction.length;
    var extraction;
    var fieldsCount;
    var i;
    var j;

    // Visit every extraction record
    for (i = 0; i < extractionsCount; i++)
    {
        extraction = result.match_info.rules.extraction[i];

        fieldsCount = extraction.fields.length;

        // Visit every field of the record and rewrite the matching value
        for (j = 0; j < fieldsCount; j++)
        {
            if (extraction.fields[j].field === "BRAND" && extraction.fields[j].value === "Tesla (Veicoli)")
            {
                extraction.fields[j].value = "Tesla (Vehicles)";
            }
        }
    }

    return result;
}

changes the value of the BRAND field from Tesla (Veicoli) to Tesla (Vehicles). The figures below show extraction results as they appear in Studio without and with the manipulation.
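
When more than one value has to be rewritten, the same manipulation can be driven by a lookup table instead of a chain of conditions. This is a sketch of that variation, not a documented pattern: BRAND_LABELS is a hypothetical name and its single entry is just the sample value from this example.

```javascript
// Illustrative mapping from original field values to replacement values.
// The entry is the sample data of this section, not a product default.
var BRAND_LABELS = {
    "Tesla (Veicoli)": "Tesla (Vehicles)"
};

function onFinalize(result) {
    var extractions = result.match_info.rules.extraction;
    var i, j, field;

    for (i = 0; i < extractions.length; i++) {
        for (j = 0; j < extractions[i].fields.length; j++) {
            field = extractions[i].fields[j];

            // Rewrite only BRAND fields whose value has an entry in the table.
            if (field.field === "BRAND" &&
                Object.prototype.hasOwnProperty.call(BRAND_LABELS, field.value)) {
                field.value = BRAND_LABELS[field.value];
            }
        }
    }

    return result;
}
```

Values without an entry in the table are left untouched, so adding a new replacement only requires a new line in BRAND_LABELS.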