Manipulating extraction

Overview

You can manipulate the extraction output with some built-in functions.

You can invoke these functions directly in the onFinalize function, or use them in your own functions and then invoke those from onFinalize.

They are:

addRecord
addField
removeRecord
addFieldToRecord
updateField
updateFieldByIndex
removeField
addInstance
updateInstance
updateInstanceByIndex
removeInstance
validate_template_name
validate_field_name

Example

If a field value contains comma-separated strings (for example: cabbage, onion, soy sauce) and you want to replace the record that contains it with one single-field record per string, you can use the following code, in which addRecord is used in combination with addField and removeRecord.

function split_records(result, target_template, target_field_name) {

    var records_to_remove = [];

    // Cycle through extraction output to find records with the target template.

    for (var i = 0; i < result.match_info.rules.extraction.length; i++) {
        var template_name = result.match_info.rules.extraction[i].template;

        if (template_name == target_template) {
            // Record found, now look for target fields.

            for (var j = 0; j < result.match_info.rules.extraction[i].fields.length; j++) {
                var field_name = result.match_info.rules.extraction[i].fields[j].field;
                var field_value = result.match_info.rules.extraction[i].fields[j].value;
                var field_instances = result.match_info.rules.extraction[i].fields[j].instance;

                if (field_name == target_field_name && field_value.includes(",")) {
                    // Target field found, split the value, for example "cabbage,onion,soy sauce" -> ["cabbage", "onion", "soy sauce"].

                    var field_split_values = field_value.split(/,/g);

                    // Cycle the strings and create the new records.

                    for (var k = 0; k < field_split_values.length; k++) {
                        // Create an empty array of fields.

                        var new_fields = [];

                        // Add the field with original field's instances.

                        addField(new_fields, field_name, field_split_values[k].trim(), field_instances);

                        // Add the new record.

                        addRecord(result.match_info.rules.extraction, template_name, new_fields);
                    }

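                    // Remember the index of the original record to remove it later.
                    // Skip duplicates in case several fields of the same record match.

                    if (records_to_remove.indexOf(i) == -1) {
                        records_to_remove.push(i);
                    }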
                }
            }
        }
    }

    // Remove original records, from the last to the first, so that the remaining indexes stay valid.

    if (records_to_remove.length > 0) {
        for (var i = records_to_remove.length - 1; i >= 0; i--) {
            removeRecord(result.match_info.rules.extraction, records_to_remove[i]);
        }
    }
}

function onFinalize(result) {
    split_records(result, "MAIN_COURSE", "INGREDIENT");
    return result;
}

addRecord

addRecord adds a new record to the extraction output. See the example above.

The syntax is:

addRecord(extraction, templateName, fields)

where:

extraction is the array containing extraction results.
templateName is the name of the record template.
fields is an array of field objects.

The function returns true if the record is added, false if:

extraction is not an array.
fields is not an array or it is empty.
The template name is not valid.
One of the field names is not valid.

addField

addField adds a new field to an array of field objects. See the example above.

The syntax is:

addField(fields, fieldName, fieldValue, instances[, confidence])

where:

fields is an array of extraction fields.
fieldName is the field name.
fieldValue is the field value.
instances is an array containing field instances. It can be empty.
confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.

The function returns true if the field is added, false if:

fields is not an array.
instances is not an array.
The field name is not valid.
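
As a minimal sketch of how the return values of addField and addRecord might be used together, the helper below builds a record from a plain object and stops as soon as a call reports a failure. The helper name add_record_from_object and the DISH field are hypothetical, chosen only for illustration. The sketch is meant to run inside the post-processing script, where addField and addRecord are available.

```javascript
// Hypothetical helper: builds one record from a { fieldName: fieldValue } object.
// Returns true only if every field and the record itself were added.
function add_record_from_object(extraction, template_name, values, confidence) {
    var fields = [];
    var names = Object.keys(values);

    for (var i = 0; i < names.length; i++) {
        // No instances are available for these values, so pass an empty array.
        if (!addField(fields, names[i], values[names[i]], [], confidence)) {
            // addField returns false, for example, when the field name is not valid.
            return false;
        }
    }

    // addRecord returns false if fields is empty or the template name is not valid.
    return addRecord(extraction, template_name, fields);
}

function onFinalize(result) {
    // "DISH" is an assumed field name, used only for illustration.
    add_record_from_object(
        result.match_info.rules.extraction,
        "MAIN_COURSE",
        { INGREDIENT: "onion", DISH: "soup" },
        0.8
    );
    return result;
}
```

Checking the return value of each call makes it easier to notice a misspelled template or field name, since the functions report the problem by returning false.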

removeRecord

removeRecord removes a record from the extraction output. See the example above.

The syntax is:

removeRecord(extraction, index)

where:

extraction is the array containing extraction results.
index is the index of the record to remove.

The function returns true if the record is removed, false if:

extraction is not an array.
The index is out of range.

addFieldToRecord

addFieldToRecord adds a new field to a record.

The syntax is:

addFieldToRecord(record, fieldName, fieldValue, instances[, confidence])

where:

record is a record object.
fieldName is the field name.
fieldValue is the field value.
instances is an array containing field instances. It can be empty.
confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.

The function returns true if the field is added, false if:

record is not an object.
instances is not an array.
The field name is not valid.
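
A possible use of addFieldToRecord, sketched under a few assumptions: the helper below adds the same field to every record of a given template and counts the successful additions. The helper name add_field_to_matching_records and the COURSE_TYPE field are hypothetical. The record properties template and fields are the ones used in the examples on this page.

```javascript
// Hypothetical helper: adds the same field to every record of a given template.
// Returns the number of records for which addFieldToRecord reported success.
function add_field_to_matching_records(result, target_template, field_name, field_value) {
    var extraction = result.match_info.rules.extraction;
    var added = 0;

    for (var i = 0; i < extraction.length; i++) {
        if (extraction[i].template == target_template) {
            // The value is computed, not found in the text, so the instances array is empty.
            if (addFieldToRecord(extraction[i], field_name, field_value, [])) {
                added++;
            }
        }
    }

    return added;
}

function onFinalize(result) {
    // "COURSE_TYPE" is an assumed field name, used only for illustration.
    add_field_to_matching_records(result, "MAIN_COURSE", "COURSE_TYPE", "main");
    return result;
}
```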

updateField

updateField updates a field name or value.

For example, if new_values is a two-dimensional array containing old and new values for a given field, you can use updateField as shown in the code below to replace the value of extracted fields.

function replaceFieldValues(result, target_template, target_field_name, new_values) {
    // Cycle through extraction output to find records with the target template.

    for (var k = 0; k < result.match_info.rules.extraction.length; k++) {
        var template_name = result.match_info.rules.extraction[k].template;
        if (template_name == target_template) {
            // Record found, now find and update fields.

            for (var y = 0; y < new_values.length; y++) {
                var old_field_value = new_values[y][0];
                var new_field_value = new_values[y][1];

                for (var x = 0; x < result.match_info.rules.extraction[k].fields.length; x++) {
                    var field_name = result.match_info.rules.extraction[k].fields[x].field;
                    var field_value = result.match_info.rules.extraction[k].fields[x].value;

                    if (field_name == target_field_name && field_value == old_field_value) {

                        // Field found, replace its value.

                        updateField(result.match_info.rules.extraction[k].fields[x], "value", new_field_value);
                    }
                }
            }
        }
    }
}

function onFinalize(result) {
    var new_values = [["poulet", "chicken"],["bœuf", "beef"],["côte d’agneau", "lamb chop"]];

    replaceFieldValues(result, "MAIN_COURSE", "INGREDIENT", new_values);

    return result;
}

The syntax is:

updateField(object, property, newValue)

where:

object is the field object.
property is the name of the property to update. Use:
- field to update field name.
- value to update field value.
- confidence to set the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
newValue is the new value of the property.

The function returns true if the field is updated, false if:

object is an invalid object.
property is invalid.
The new value is invalid.

updateFieldByIndex

updateFieldByIndex updates a field name or value using the index of the field in the record's list of fields.

It is similar to updateField. To see how it works, replace the updateField call in the example for updateField with:

updateFieldByIndex(result.match_info.rules.extraction[k], x, "value", new_field_value);

The syntax is:

updateFieldByIndex(record, index, property, newValue)

where:

record is the record object.
index is the index inside the record of the field to update.
property is the name of the property to update. Use:
- field to update field name.
- value to update field value.
- confidence to set the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
newValue is the new value of the property.

The function returns true if the field is updated, false if:

record is an invalid object.
property is invalid.
The new value is invalid.

removeField

removeField removes a field from a record. If the record only has one field, use removeRecord.

The syntax is:

removeField(record, index)

where:

record is the record object.
index is the index inside the record of the field to remove.

The function returns true if the field is removed, false if:

record is an invalid object.
The index is out of range.
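
The sketch below shows one way removeField and removeRecord might be combined: it deletes fields whose value is an empty string and removes the whole record when that field is the only one left. The helper name remove_empty_fields is hypothetical, and the code assumes that both built-in functions modify the arrays in place, so it walks the indexes backward.

```javascript
// Hypothetical helper: removes fields whose value is an empty string.
// Iterates backward so that a removal does not shift the indexes still to visit.
function remove_empty_fields(result) {
    var extraction = result.match_info.rules.extraction;

    for (var i = extraction.length - 1; i >= 0; i--) {
        var record = extraction[i];

        for (var j = record.fields.length - 1; j >= 0; j--) {
            if (record.fields[j].value !== "") {
                continue;
            }

            if (record.fields.length == 1) {
                // Only one field left: remove the whole record instead.
                removeRecord(extraction, i);
                break;
            }

            removeField(record, j);
        }
    }
}

function onFinalize(result) {
    remove_empty_fields(result);
    return result;
}
```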

addInstance

addInstance adds a new instance to a field.

The syntax is:

addInstance(field, text, groupBy, position, length, sentence, sentenceBeginning, sentenceEnd, syncon, ancestor[, confidence])

where:

field is a field object.
text is the instance text.
groupBy is the number of the group the instance belongs to.
position is the zero-based index of the first character of the instance text.
length is the instance length.
sentence is the zero-based index of the sentence containing the instance.
sentenceBeginning is the zero-based index of the first character of the sentence.
sentenceEnd is the zero-based index of the first character after the sentence.
syncon is the numeric ID of the concept expressed by the instance text. Set it to -1 in case of no related concept.
ancestor is the numeric ID of the ancestor of the concept expressed by the instance text. Set it to -1 in case of no related concept or no ancestor.
confidence is the optional confidence score for the instance. It is a decimal number between 0 and 1. The default value is 1.

The function returns true if the instance is added, false if field is an invalid object.
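
Because addInstance takes several positional arguments, a small wrapper can help keep them consistent. The sketch below is one possible approach: it derives position, length and sentence boundaries from the sentence text and its starting offset. The helper name add_instance_from_sentence and its parameters are hypothetical, and the code assumes that positions and lengths are counted in characters of the document text.

```javascript
// Hypothetical helper: adds an instance for the first occurrence of `text`
// inside `sentence_text`. `sentence_index` is the zero-based index of the sentence
// and `sentence_begin` is the zero-based position of its first character in the document.
function add_instance_from_sentence(field, text, group, sentence_text, sentence_index, sentence_begin) {
    var offset = sentence_text.indexOf(text);

    if (offset == -1) {
        // The text does not occur in the sentence: nothing to add.
        return false;
    }

    return addInstance(
        field,
        text,
        group,
        sentence_begin + offset,                // position
        text.length,                            // length
        sentence_index,                         // sentence
        sentence_begin,                         // sentenceBeginning
        sentence_begin + sentence_text.length,  // sentenceEnd: first character after the sentence
        -1,                                     // syncon: no related concept
        -1                                      // ancestor: no related concept
    );
}
```

The optional confidence argument is omitted here, so the default value applies.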

updateInstance

updateInstance updates a field instance.

The syntax is:

updateInstance(instance, property, newValue)

where:

instance is the instance object.
property is the name of the instance property to update. It can be:
- text: the instance text.
- group by: the instance group number.
- pos: the instance starting position.
- len: the length of the instance.
- snt: the zero-based index of the sentence containing the instance.
- snt_begin: the zero-based index of the first character of the sentence.
- snt_end: the zero-based index of the first character after the sentence.
- confidence: the optional confidence score for the instance. It is a decimal number between 0 and 1. The default value is 1.
- syncon: the numeric ID of the concept expressed by the instance text.
- ancestor: the numeric ID of the ancestor of the concept expressed by the instance text.
newValue is the new value of the property.

The function returns true if the instance is updated, false if:

instance is an invalid object.
The property name is invalid.

updateInstanceByIndex

updateInstanceByIndex updates an instance object using its index inside the list of the field instances.

The syntax is:

updateInstanceByIndex(field, index, property, newValue)

where:

field is the field object.
index is the index of the instance to update.
property is the name of the instance property to update. It can be:
- text: the instance text.
- group by: the instance group number.
- pos: the instance starting position.
- len: the length of the instance.
- snt: the zero-based index of the sentence containing the instance.
- snt_begin: the zero-based index of the first character of the sentence.
- snt_end: the zero-based index of the first character after the sentence.
- confidence: the optional confidence score for the instance. It is a decimal number between 0 and 1. The default value is 1.
- syncon: the numeric ID of the concept expressed by the instance text.
- ancestor: the numeric ID of the ancestor of the concept expressed by the instance text.
newValue is the new value of the property.

The function returns true if the instance is updated, false if:

field is an invalid object.
The property name is invalid.
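
As an illustration of updateInstanceByIndex, the sketch below sets the same confidence score on every instance of a field and counts the successful updates. The helper name set_instances_confidence is hypothetical. The instance property of the field object is the one used in the first example on this page.

```javascript
// Hypothetical helper: sets the confidence of every instance of a field.
// Returns the number of instances for which the update reported success.
function set_instances_confidence(field, confidence) {
    var updated = 0;

    for (var i = 0; i < field.instance.length; i++) {
        if (updateInstanceByIndex(field, i, "confidence", confidence)) {
            updated++;
        }
    }

    return updated;
}
```

Inside the loop, the same update could also be written with updateInstance by passing the instance object itself, that is updateInstance(field.instance[i], "confidence", confidence).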

removeInstance

removeInstance removes an instance from a field.

The syntax is:

removeInstance(field, index)

where:

field is the field object.
index is the index of the instance to remove.

The function returns true if the instance is removed, false if:

field is an invalid object.
The index is out of range.

validate_template_name

validate_template_name checks if a template is defined in the project.

The syntax is:

validate_template_name(templateName)

where templateName is the template name to validate.

The function returns true if the template name is defined in the project, false otherwise.

validate_field_name

validate_field_name checks if a field is defined in a project.

The syntax is:

validate_field_name(fieldName, templateName)

where:

fieldName is the string containing the field name to validate.
templateName is the string containing the template name. It can be empty.

The function returns true if:

The field is defined in the templateName template.
templateName is an empty string and the field is defined in at least one template.

Otherwise, it returns false.
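
The two validation functions can be combined to check names before a script relies on them. The sketch below is one possible diagnostic helper: it returns the field names that are not defined in a template. The helper name find_undefined_fields and the PRICE field are hypothetical.

```javascript
// Hypothetical helper: returns the names in `field_names` that are not defined
// in the `template_name` template, or null if the template itself is not defined.
function find_undefined_fields(template_name, field_names) {
    if (!validate_template_name(template_name)) {
        return null;
    }

    var missing = [];

    for (var i = 0; i < field_names.length; i++) {
        if (!validate_field_name(field_names[i], template_name)) {
            missing.push(field_names[i]);
        }
    }

    return missing;
}

function onFinalize(result) {
    // "PRICE" is an assumed field name, used only for illustration.
    var missing = find_undefined_fields("MAIN_COURSE", ["INGREDIENT", "PRICE"]);

    if (missing !== null && missing.length == 0) {
        // All names are defined: it is safe to go on with other manipulations here.
    }

    return result;
}
```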