Manipulating extraction
Overview
You can manipulate the extraction output with some built-in functions.
You can invoke these functions directly in the onFinalize function, or call them from your own functions that you then invoke in onFinalize.
They are:
- addRecord
- addField
- removeRecord
- addFieldToRecord
- updateField
- updateFieldByIndex
- removeField
- addInstance
- updateInstance
- updateInstanceByIndex
- removeInstance
- validate_template_name
- validate_field_name
Example
If a field value contains comma-separated strings (for example: cabbage, onion, soy sauce) and you want to replace it with one field per string, you can use the following code, in which addRecord is used in combination with addField and removeRecord.
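A minimal sketch of this approach, under stated assumptions, might look like the following. It assumes that each record exposes template and fields properties and that each field exposes field, value and instances. The helper splitCommaField, the RECIPE template and the ingredient field are hypothetical. The three functions at the top are simplified stand-ins that only imitate the documented return values so that the sketch runs outside Studio; in Studio the built-ins are already available and the stand-ins should be left out.

```javascript
// Simplified stand-ins for the Studio built-ins (omit them in Studio).
// Assumption: name validation is reduced to "non-empty string" here.
function addField(fields, fieldName, fieldValue, instances, confidence = 1) {
  if (!Array.isArray(fields) || !Array.isArray(instances)) return false;
  if (typeof fieldName !== "string" || fieldName === "") return false;
  fields.push({ field: fieldName, value: fieldValue, instances, confidence });
  return true;
}
function addRecord(extraction, templateName, fields) {
  if (!Array.isArray(extraction)) return false;
  if (!Array.isArray(fields) || fields.length === 0) return false;
  if (typeof templateName !== "string" || templateName === "") return false;
  extraction.push({ template: templateName, fields });
  return true;
}
function removeRecord(extraction, index) {
  if (!Array.isArray(extraction)) return false;
  if (index < 0 || index >= extraction.length) return false;
  extraction.splice(index, 1);
  return true;
}

// Hypothetical helper: replaces every record whose field `fieldName` holds a
// comma-separated list with a new record containing one field per item.
function splitCommaField(extraction, fieldName) {
  // Walk backwards: new records are appended at the end, so removing
  // record k never shifts the records that are still to be visited.
  for (let k = extraction.length - 1; k >= 0; k--) {
    const record = extraction[k];
    const source = record.fields.find((f) => f.field === fieldName);
    if (!source || typeof source.value !== "string") continue;
    if (!source.value.includes(",")) continue;

    const newFields = [];
    // Copy the other fields of the record unchanged.
    for (const f of record.fields) {
      if (f !== source) addField(newFields, f.field, f.value, f.instances, f.confidence);
    }
    // Add one field per comma-separated item, trimming surrounding spaces.
    for (const item of source.value.split(",")) {
      const value = item.trim();
      if (value !== "") addField(newFields, fieldName, value, []);
    }
    // Drop the old record only if the replacement was accepted.
    if (addRecord(extraction, record.template, newFields)) {
      removeRecord(extraction, k);
    }
  }
  return extraction;
}

const extraction = [
  {
    template: "RECIPE",
    fields: [{ field: "ingredient", value: "cabbage, onion, soy sauce", instances: [] }],
  },
];
splitCommaField(extraction, "ingredient");
console.log(extraction[0].fields.map((f) => f.value));
```

The new record is added before the old one is removed, so a rejected addRecord call leaves the original extraction untouched.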
addRecord
addRecord adds a new record to the extraction output. See the example above.
The syntax is:
addRecord(extraction, templateName, fields)
where:
- extraction is the array containing extraction results.
- templateName is the name of the record template.
- fields is an array of field objects.
The function returns true if the record is added, false if:
- extraction is not an array.
- fields is not an array or it is empty.
- The template name is not valid.
- One of the field names is not valid.
addField
addField adds a new field to an array of field objects. See the example above.
The syntax is:
addField(fields, fieldName, fieldValue, instances[, confidence])
where:
- fields is an array of extraction fields.
- fieldName is the field name.
- fieldValue is the field value.
- instances is an array containing field instances. It can be empty.
- confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
The function returns true if the field is added, false if:
- fields is not an array.
- instances is not an array.
- The field name is not valid.
removeRecord
removeRecord removes a record from the extraction output. See the example above.
The syntax is:
removeRecord(extraction, index)
where:
- extraction is the array containing extraction results.
- index is the index of the record to remove.
The function returns true if the record is removed, false if:
- extraction is not an array.
- The index is out of range.
addFieldToRecord
addFieldToRecord adds a new field to a record.
The syntax is:
addFieldToRecord(record, fieldName, fieldValue, instances[, confidence])
where:
- record is a record object.
- fieldName is the field name.
- fieldValue is the field value.
- instances is an array containing field instances. It can be empty.
- confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
The function returns true if the field is added, false if:
- record is not an object.
- instances is not an array.
- The field name is not valid.
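As an illustration, the sketch below adds a field to an existing record and shows a rejected call. It assumes that a record stores its fields in a fields array; the RECIPE template and the ingredient field are hypothetical. The addFieldToRecord defined here is a simplified stand-in that imitates the documented return values so the sketch runs outside Studio, where the built-in should be used instead.

```javascript
// Simplified stand-in for the Studio built-in (omit it in Studio).
// Assumption: name validation is reduced to "non-empty string" here.
function addFieldToRecord(record, fieldName, fieldValue, instances, confidence = 1) {
  if (record === null || typeof record !== "object") return false;
  if (!Array.isArray(record.fields) || !Array.isArray(instances)) return false;
  if (typeof fieldName !== "string" || fieldName === "") return false;
  record.fields.push({ field: fieldName, value: fieldValue, instances, confidence });
  return true;
}

const record = {
  template: "RECIPE",
  fields: [{ field: "ingredient", value: "cabbage", instances: [] }],
};

// Add a field with no instances and an explicit confidence score.
const added = addFieldToRecord(record, "ingredient", "onion", [], 0.8);

// A non-array instances argument is rejected and the record is left unchanged.
const rejected = addFieldToRecord(record, "ingredient", "soy sauce", null);

console.log(added, rejected, record.fields.length); // true false 2
```

Checking the return value is worthwhile because a rejected call fails silently: the record simply keeps its previous fields.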
updateField
updateField updates a field name or value.
For example, if new_values is a two-dimensional array containing old and new values for a given field, you can use updateField as shown in the code below to replace the value of extracted fields.
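A minimal sketch of such a replacement, under stated assumptions, could look like this. It assumes that each record keeps its fields in a fields array and that each row of new_values is a pair of old value and new value. The helper replaceValues, the TRIP template and the city field are hypothetical. The updateField defined here is a simplified stand-in (its idea of an invalid new value is an assumption) so the sketch runs outside Studio; in Studio, use the built-in.

```javascript
// Simplified stand-in for the Studio built-in (omit it in Studio).
// Assumption: a valid name or value is a non-empty string and a valid
// confidence is a number between 0 and 1.
function updateField(object, property, newValue) {
  if (object === null || typeof object !== "object") return false;
  if (!["field", "value", "confidence"].includes(property)) return false;
  if (property === "confidence") {
    if (typeof newValue !== "number" || newValue < 0 || newValue > 1) return false;
  } else if (typeof newValue !== "string" || newValue === "") {
    return false;
  }
  object[property] = newValue;
  return true;
}

// Each row is [old value, new value].
const new_values = [
  ["NYC", "New York City"],
  ["LA", "Los Angeles"],
];

// Hypothetical helper: replaces the value of every field called fieldName
// whose current value appears as an old value in new_values.
function replaceValues(extraction, fieldName) {
  let updated = 0;
  for (let k = 0; k < extraction.length; k++) {
    const fields = extraction[k].fields;
    for (let x = 0; x < fields.length; x++) {
      if (fields[x].field !== fieldName) continue;
      const row = new_values.find(([oldValue]) => oldValue === fields[x].value);
      if (row && updateField(fields[x], "value", row[1])) updated++;
    }
  }
  return updated;
}

const extraction = [
  {
    template: "TRIP",
    fields: [
      { field: "city", value: "NYC", instances: [] },
      { field: "city", value: "Boston", instances: [] },
    ],
  },
  { template: "TRIP", fields: [{ field: "city", value: "LA", instances: [] }] },
];

console.log(replaceValues(extraction, "city")); // 2
```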
The syntax is:
updateField(object, property, newValue)
where:
- object is the field object.
- property is the name of the property to update. Use:
  - field to update the field name.
  - value to update the field value.
  - confidence to set the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
- newValue is the new value of the property named by property.
The function returns true if the field is updated, false if:
- object is an invalid object.
- property is invalid.
- The new value is invalid.
updateFieldByIndex
updateFieldByIndex updates a field name or value using the index of the field inside the list of record fields.
It is similar to updateField. To see how it works, in the example for updateField replace the updateField call (line 21) with:
updateFieldByIndex(result.match_info.rules.extraction[k], x, "value", new_field_value);
The syntax is:
updateFieldByIndex(record, index, property, newValue)
where:
- record is the record object.
- index is the index inside the record of the field to update.
- property is the name of the property to update. Use:
  - field to update the field name.
  - value to update the field value.
  - confidence to set the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
- newValue is the new value of the property named by property.
The function returns true if the field is updated, false if:
- record is an invalid object.
- property is invalid.
- The new value is invalid.
removeField
removeField removes a field from a record. If the record only has one field, use removeRecord.
The syntax is:
removeField(record, index)
where:
- record is the record object.
- index is the index inside the record of the field to remove.
The function returns true if the field is removed, false if:
- record is an invalid object.
- The index is out of range.
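The advice about records with a single field can be sketched as a small wrapper that chooses between removeField and removeRecord. The wrapper removeFieldOrRecord and the sample data are hypothetical, and the sketch assumes that a record stores its fields in a fields array. The two functions at the top are simplified stand-ins that imitate the documented return values so the sketch runs outside Studio; leave them out in Studio.

```javascript
// Simplified stand-ins for the Studio built-ins (omit them in Studio).
function removeField(record, index) {
  if (record === null || typeof record !== "object") return false;
  if (!Array.isArray(record.fields)) return false;
  if (!Number.isInteger(index) || index < 0 || index >= record.fields.length) return false;
  record.fields.splice(index, 1);
  return true;
}
function removeRecord(extraction, index) {
  if (!Array.isArray(extraction)) return false;
  if (!Number.isInteger(index) || index < 0 || index >= extraction.length) return false;
  extraction.splice(index, 1);
  return true;
}

// Hypothetical wrapper: removes field x of record k, and removes the whole
// record instead when that field is the only one it contains.
function removeFieldOrRecord(extraction, k, x) {
  const record = extraction[k];
  if (!record || !Array.isArray(record.fields)) return false;
  if (record.fields.length === 1 && x === 0) {
    return removeRecord(extraction, k);
  }
  return removeField(record, x);
}

const extraction = [
  {
    template: "RECIPE",
    fields: [
      { field: "ingredient", value: "cabbage", instances: [] },
      { field: "ingredient", value: "onion", instances: [] },
    ],
  },
  { template: "AUTHOR", fields: [{ field: "name", value: "Kim", instances: [] }] },
];

const r1 = removeFieldOrRecord(extraction, 0, 1); // removes the "onion" field
const r2 = removeFieldOrRecord(extraction, 1, 0); // removes the whole AUTHOR record
const r3 = removeFieldOrRecord(extraction, 0, 7); // index out of range

console.log(r1, r2, r3, extraction.length); // true true false 1
```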
addInstance
addInstance adds a new instance to a field.
The syntax is:
addInstance(field, text, group by, position, length, sentence, sentenceBeginning, sentenceEnd, syncon, ancestor[, confidence])
where:
- field is a field object.
- text is the instance text.
- group by is the number of the group the instance belongs to.
- position is the zero-based index of the first character of the instance text.
- length is the instance length.
- sentence is the zero-based index of the sentence containing the instance.
- sentenceBeginning is the zero-based index of the first character of the sentence.
- sentenceEnd is the zero-based index of the first character after the sentence.
- syncon is the numeric ID of the concept expressed by the instance text. Set it to -1 in case of no related concept.
- ancestor is the numeric ID of the ancestor of the concept expressed by the instance text. Set it to -1 in case of no related concept or no ancestor.
- confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
The function returns true if the instance is added, false if field is an invalid object.
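To make the positional arguments concrete, the sketch below derives them from a short sample text and passes them to addInstance. The sample text, the group number 1 and the stored property names (taken from the property list documented for updateInstance) are assumptions, as is the idea that a field keeps its instances in an instances array. The addInstance defined here is a simplified stand-in so the sketch runs outside Studio; in Studio, use the built-in.

```javascript
// Simplified stand-in for the Studio built-in (omit it in Studio).
// Assumption: instances are stored with the property names documented
// for updateInstance.
function addInstance(field, text, group, position, length, sentence,
                     sentenceBeginning, sentenceEnd, syncon, ancestor, confidence = 1) {
  if (field === null || typeof field !== "object") return false;
  if (!Array.isArray(field.instances)) return false;
  field.instances.push({
    text,
    "group by": group,
    pos: position,
    len: length,
    snt: sentence,
    snt_begin: sentenceBeginning,
    snt_end: sentenceEnd,
    syncon,
    ancestor,
    confidence,
  });
  return true;
}

const documentText = "Chop the cabbage. Add soy sauce.";
const field = { field: "ingredient", value: "soy sauce", instances: [] };

// The instance text lies in the second sentence (zero-based index 1).
const text = "soy sauce";
const sentence = 1;
const sentenceBeginning = documentText.indexOf("Add");
// Index of the first character after the sentence: here, the end of the text.
const sentenceEnd = documentText.length;
// Search from the sentence start so an earlier occurrence is not picked up.
const position = documentText.indexOf(text, sentenceBeginning);

const added = addInstance(
  field, text,
  1,                 // hypothetical group number
  position, text.length,
  sentence, sentenceBeginning, sentenceEnd,
  -1, -1             // no related concept and no ancestor
);

console.log(added, field.instances[0]);
```

Deriving position and length from the same string that is passed as text keeps the three arguments consistent with each other.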
updateInstance
updateInstance updates a field instance.
The syntax is:
updateInstance(instance, property, newValue)
where:
- instance is the instance object.
- property is the name of the instance property to update. It can be:
  - text: the instance text.
  - group by: the instance group number.
  - pos: the instance starting position.
  - len: the length of the instance.
  - snt: the zero-based index of the sentence containing the instance.
  - snt_begin: the zero-based index of the first character of the sentence.
  - snt_end: the zero-based index of the first character after the sentence.
  - confidence: the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
  - syncon: the numeric ID of the concept expressed by the instance text.
  - ancestor: the numeric ID of the ancestor of the concept expressed by the instance text.
- newValue is the new value for the property named by property.
The function returns true if the instance is updated, false if:
- instance is an invalid object.
- The property name is invalid.
updateInstanceByIndex
updateInstanceByIndex updates an instance object using its index inside the list of the field instances.
The syntax is:
updateInstanceByIndex(field, index, property, newValue)
where:
- field is the field object.
- index is the index of the instance to update.
- property is the name of the instance property to update. It can be:
  - text: the instance text.
  - group by: the instance group number.
  - pos: the instance starting position.
  - len: the length of the instance.
  - snt: the zero-based index of the sentence containing the instance.
  - snt_begin: the zero-based index of the first character of the sentence.
  - snt_end: the zero-based index of the first character after the sentence.
  - confidence: the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
  - syncon: the numeric ID of the concept expressed by the instance text.
  - ancestor: the numeric ID of the ancestor of the concept expressed by the instance text.
- newValue is the new value for the property named by property.
The function returns true if the instance is updated, false if:
- field is an invalid object.
- The property name is invalid.
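As a usage illustration, the sketch below trims a trailing period from an instance and keeps text and len in step by updating both. The sample field is hypothetical and the sketch assumes that a field keeps its instances in an instances array. The two functions at the top are simplified stand-ins built from the property list above so the sketch runs outside Studio; leave them out in Studio.

```javascript
// Property names documented for updateInstance and updateInstanceByIndex.
const INSTANCE_PROPERTIES = [
  "text", "group by", "pos", "len", "snt",
  "snt_begin", "snt_end", "confidence", "syncon", "ancestor",
];

// Simplified stand-ins for the Studio built-ins (omit them in Studio).
function updateInstance(instance, property, newValue) {
  if (instance === null || typeof instance !== "object") return false;
  if (!INSTANCE_PROPERTIES.includes(property)) return false;
  instance[property] = newValue;
  return true;
}
function updateInstanceByIndex(field, index, property, newValue) {
  if (field === null || typeof field !== "object") return false;
  if (!Array.isArray(field.instances)) return false;
  // An out-of-range index yields undefined, which updateInstance rejects.
  return updateInstance(field.instances[index], property, newValue);
}

const field = {
  field: "ingredient",
  value: "soy sauce",
  instances: [{ text: "soy sauce.", pos: 22, len: 10, snt: 1, snt_begin: 18, snt_end: 32 }],
};

// Drop the trailing period and update the length to match the new text.
const trimmed = "soy sauce";
const ok =
  updateInstanceByIndex(field, 0, "text", trimmed) &&
  updateInstanceByIndex(field, 0, "len", trimmed.length);

// "length" is not one of the documented property names, so this is rejected.
const bad = updateInstance(field.instances[0], "length", 9);

console.log(ok, bad, field.instances[0].text, field.instances[0].len);
// true false soy sauce 9
```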
removeInstance
removeInstance removes an instance from a field.
The syntax is:
removeInstance(field, index)
where:
- field is the field object.
- index is the index of the instance to remove.
The function returns true if the instance is removed, false if:
- field is an invalid object.
- The index is out of range.
validate_template_name
validate_template_name checks if a template is defined in the project.
The syntax is:
validate_template_name(templateName)
where templateName is the template name to validate.
The function returns true if the template name is defined in the project, false otherwise.
validate_field_name
validate_field_name checks if a field is defined in a project.
The syntax is:
validate_field_name(fieldName, templateName)
where:
- fieldName is the string containing the field name to validate.
- templateName is the string containing the template name. It can be empty.
The function returns true:
- If the field is defined in the templateName template.
- If templateName is an empty string and the field is defined in any template (that is, there is an occurrence in a template).
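One possible use of the two validation functions is to filter candidate fields before building a record, as in the sketch below. The projectTemplates object is a hypothetical stand-in for the templates defined in a project, and both functions are simplified imitations of the documented behavior so that the sketch runs outside Studio; in Studio the built-ins check the real project definition.

```javascript
// Hypothetical project definition: template name -> field names.
const projectTemplates = {
  RECIPE: ["ingredient", "quantity"],
  AUTHOR: ["name"],
};

// Simplified stand-ins for the Studio built-ins (omit them in Studio).
function validate_template_name(templateName) {
  return Object.prototype.hasOwnProperty.call(projectTemplates, templateName);
}
function validate_field_name(fieldName, templateName) {
  if (templateName === "") {
    // Empty template name: the field may be defined in any template.
    return Object.values(projectTemplates).some((names) => names.includes(fieldName));
  }
  return (
    validate_template_name(templateName) &&
    projectTemplates[templateName].includes(fieldName)
  );
}

// Candidate [field name, value] pairs, some of which do not belong to RECIPE.
const candidates = [
  ["ingredient", "cabbage"],
  ["colour", "green"],
  ["name", "Kim"],
];

// Keep only the pairs whose field name is defined in the RECIPE template.
const accepted = candidates.filter(([fieldName]) =>
  validate_field_name(fieldName, "RECIPE")
);

console.log(accepted);                           // [ [ 'ingredient', 'cabbage' ] ]
console.log(validate_field_name("name", ""));    // true
console.log(validate_template_name("MENU"));     // false
```

Validating names up front avoids relying on a false return value from addRecord or addField to discover that a name is not defined in the project.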