Manipulating extraction
Overview
You can manipulate the extraction output with a set of built-in functions.
You can call these functions directly in the onFinalize function, or use them in your own functions that you then invoke from onFinalize.
They are:
addRecord
addField
removeRecord
addFieldToRecord
updateField
updateFieldByIndex
removeField
addInstance
updateInstance
updateInstanceByIndex
removeInstance
validate_template_name
validate_field_name
Example
If a field value contains comma-separated strings (for example: cabbage, onion, soy sauce) and you want to replace it with one field per string, you can use the following code, in which addRecord is used in combination with addField and removeRecord.
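The approach described above can be sketched roughly as follows. This is an illustration under assumptions, not the reference listing: it assumes each record is an object with template and fields properties and each field has field, value and instances properties, and the RECIPE template and ingredient field are hypothetical names. The three stand-in functions only mimic the documented return values so that the snippet runs outside the platform; leave them out inside onFinalize, where the built-ins are available.

```javascript
// Stand-ins so the sketch runs outside the platform. They mimic only the
// documented return values and skip template/field name validation.
function addField(fields, fieldName, fieldValue, instances, confidence = 1) {
  if (!Array.isArray(fields) || !Array.isArray(instances)) return false;
  fields.push({ field: fieldName, value: fieldValue, instances, confidence });
  return true;
}
function addRecord(extraction, templateName, fields) {
  if (!Array.isArray(extraction) || !Array.isArray(fields) || fields.length === 0) return false;
  extraction.push({ template: templateName, fields });
  return true;
}
function removeRecord(extraction, index) {
  if (!Array.isArray(extraction) || index < 0 || index >= extraction.length) return false;
  extraction.splice(index, 1);
  return true;
}

// Replace each record containing a comma-separated `fieldName` value with a
// new record that has one field per string.
function splitCommaSeparated(extraction, fieldName) {
  // Walk backwards: addRecord appends at the end and removeRecord(k) only
  // shifts records after k, so the indexes still to visit stay valid.
  for (let k = extraction.length - 1; k >= 0; k--) {
    const record = extraction[k];
    const newFields = [];
    let wasSplit = false;
    for (const f of record.fields) {
      const parts = f.field === fieldName
        ? f.value.split(",").map((s) => s.trim()).filter(Boolean)
        : [f.value];
      if (parts.length > 1) wasSplit = true;
      // The original instances are reused as-is for every new field.
      for (const part of parts) addField(newFields, f.field, part, f.instances);
    }
    if (wasSplit && addRecord(extraction, record.template, newFields)) {
      removeRecord(extraction, k);
    }
  }
  return extraction;
}

const extraction = [
  { template: "RECIPE", fields: [{ field: "ingredient", value: "cabbage, onion, soy sauce", instances: [] }] },
];
splitCommaSeparated(extraction, "ingredient");
console.log(extraction[0].fields.map((f) => f.value));
```

Inside onFinalize the array to pass would presumably be result.match_info.rules.extraction, the path used in the updateFieldByIndex example on this page. The old record is removed only after addRecord reports success, so a rejected template or field name leaves the original output untouched.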
addRecord
addRecord adds a new record to the extraction output. See the example above.
The syntax is:
addRecord(extraction, templateName, fields)
where:
- extraction is the array containing extraction results.
- templateName is the name of the record template.
- fields is an array of field objects.
The function returns true if the record is added, false if:
- extraction is not an array.
- fields is not an array or it is empty.
- The template name is not valid.
- One of the field names is not valid.
addField
addField adds a new field to an array of field objects. See the example above.
The syntax is:
addField(fields, fieldName, fieldValue, instances[, confidence])
where:
- fields is an array of extraction fields.
- fieldName is the field name.
- fieldValue is the field value.
- instances is an array containing field instances. It can be empty.
- confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
The function returns true if the field is added, false if:
- fields is not an array.
- instances is not an array.
- The field name is not valid.
removeRecord
removeRecord removes a record from the extraction output. See the example above.
The syntax is:
removeRecord(extraction, index)
where:
- extraction is the array containing extraction results.
- index is the index of the record to remove.
The function returns true if the record is removed, false if:
- extraction is not an array.
- The index is out of range.
addFieldToRecord
addFieldToRecord adds a new field to a record.
The syntax is:
addFieldToRecord(record, fieldName, fieldValue, instances[, confidence])
where:
- record is a record object.
- fieldName is the field name.
- fieldValue is the field value.
- instances is an array containing field instances. It can be empty.
- confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
The function returns true if the field is added, false if:
- record is not an object.
- instances is not an array.
- The field name is not valid.
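As a minimal sketch of how addFieldToRecord might be used, the snippet below adds a derived field to an existing record and checks the return value. The record shape (template and fields properties), the PERSON template and the first_name, last_name and full_name fields are all hypothetical, and the stand-in function only imitates the documented return values so that the code runs outside the platform.

```javascript
// Stand-in for the built-in; mimics the documented return values only and
// skips field name validation.
function addFieldToRecord(record, fieldName, fieldValue, instances, confidence = 1) {
  if (record === null || typeof record !== "object" || !Array.isArray(instances)) return false;
  if (!Array.isArray(record.fields)) record.fields = [];
  record.fields.push({ field: fieldName, value: fieldValue, instances, confidence });
  return true;
}

// Add a derived full_name field when both name parts were extracted.
function addFullName(record) {
  const valueOf = (name) => (record.fields.find((f) => f.field === name) || {}).value;
  const first = valueOf("first_name");
  const last = valueOf("last_name");
  if (!first || !last) return false;
  // No instances in the text back the derived value, so pass an empty array
  // and a confidence below the default of 1.
  return addFieldToRecord(record, "full_name", `${first} ${last}`, [], 0.8);
}

const record = {
  template: "PERSON",
  fields: [
    { field: "first_name", value: "Ada", instances: [] },
    { field: "last_name", value: "Lovelace", instances: [] },
  ],
};
console.log(addFullName(record), record.fields[2].value);
```

Unlike addField, which works on a loose array of fields, this function targets a record that is already part of the output, so no addRecord call is needed afterwards.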
updateField
updateField updates the name, value or confidence score of a field.
For example, if new_values is a two-dimensional array containing old and new values for a given field, you can use updateField as shown in the code below to replace the value of extracted fields.
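The replacement described above could look something like the sketch below. It is an assumption-laden illustration rather than the reference listing: the record shape (template and fields properties), the TRIP template, the city field and the sample values are hypothetical, and the stand-in updateField only imitates the documented return values so that the snippet runs outside the platform.

```javascript
// Stand-in for the built-in; it accepts the three documented property names
// and rejects missing values. The real validity checks may be stricter.
function updateField(object, property, newValue) {
  if (object === null || typeof object !== "object") return false;
  if (!["field", "value", "confidence"].includes(property)) return false;
  if (newValue === undefined || newValue === null) return false;
  object[property] = newValue;
  return true;
}

// Each row is [oldValue, newValue] for the field being normalized.
const new_values = [
  ["NYC", "New York City"],
  ["LA", "Los Angeles"],
];

// Replace the value of every `fieldName` field that matches an old value.
// Returns the number of fields actually updated.
function replaceValues(extraction, fieldName) {
  let updated = 0;
  for (let k = 0; k < extraction.length; k++) {
    const fields = extraction[k].fields;
    for (let x = 0; x < fields.length; x++) {
      if (fields[x].field !== fieldName) continue;
      const row = new_values.find(([oldValue]) => oldValue === fields[x].value);
      if (!row) continue;
      const new_field_value = row[1];
      if (updateField(fields[x], "value", new_field_value)) updated++;
    }
  }
  return updated;
}

const extraction = [
  {
    template: "TRIP",
    fields: [
      { field: "city", value: "NYC", instances: [] },
      { field: "city", value: "Rome", instances: [] },
    ],
  },
];
const updatedCount = replaceValues(extraction, "city");
console.log(updatedCount, extraction[0].fields.map((f) => f.value));
```

Counting successful updates instead of ignoring the return value makes it easy to spot rows of new_values that never matched or values the function rejected.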
The syntax is:
updateField(object, property, newValue)
where:
- object is the field object.
- property is the name of the property to update. Use:
  - field to update the field name.
  - value to update the field value.
  - confidence to set the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
- newValue is the new value of the property.
The function returns true if the field is updated, false if:
- object is an invalid object.
- property is invalid.
- The new value is invalid.
updateFieldByIndex
updateFieldByIndex updates a field name or value using the index of the field inside the list of record fields.
It is similar to updateField. In the example for updateField, replace line 21 with:
updateFieldByIndex(result.match_info.rules.extraction[k], x, "value", new_field_value);
to see how it works.
The syntax is:
updateFieldByIndex(record, index, property, newValue)
where:
- record is the record object.
- index is the index inside the record of the field to update.
- property is the name of the property to update. Use:
  - field to update the field name.
  - value to update the field value.
  - confidence to set the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
- newValue is the new value of the property.
The function returns true if the field is updated, false if:
- record is an invalid object.
- property is invalid.
- The new value is invalid.
removeField
removeField removes a field from a record. If the record only has one field, use removeRecord.
The syntax is:
removeField(record, index)
where:
- record is the record object.
- index is the index inside the record of the field to remove.
The function returns true if the field is removed, false if:
- record is an invalid object.
- The index is out of range.
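The advice about single-field records can be sketched as follows: remove fields with an empty value, and fall back to removeRecord when the field to drop is the only one left. The record shape (template and fields properties) and the sample names are assumptions, and both stand-in functions only imitate the documented return values so that the snippet runs outside the platform.

```javascript
// Stand-ins mimicking the documented return values only.
function removeField(record, index) {
  if (record === null || typeof record !== "object" || !Array.isArray(record.fields)) return false;
  if (!Number.isInteger(index) || index < 0 || index >= record.fields.length) return false;
  record.fields.splice(index, 1);
  return true;
}
function removeRecord(extraction, index) {
  if (!Array.isArray(extraction) || index < 0 || index >= extraction.length) return false;
  extraction.splice(index, 1);
  return true;
}

// Drop fields whose value is empty or whitespace; drop the whole record
// when the field to remove is its last one.
function dropEmptyFields(extraction) {
  // Both loops run backwards so removals do not shift unvisited indexes.
  for (let k = extraction.length - 1; k >= 0; k--) {
    const record = extraction[k];
    for (let x = record.fields.length - 1; x >= 0; x--) {
      if (record.fields[x].value.trim() !== "") continue;
      if (record.fields.length === 1) {
        removeRecord(extraction, k);
        break; // the record is gone, nothing left to scan
      }
      removeField(record, x);
    }
  }
  return extraction;
}

const extraction = [
  { template: "A", fields: [{ field: "f1", value: "kept", instances: [] }, { field: "f2", value: "  ", instances: [] }] },
  { template: "B", fields: [{ field: "f1", value: "", instances: [] }] },
];
dropEmptyFields(extraction);
console.log(extraction.length, extraction[0].fields.length);
```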
addInstance
addInstance adds a new instance to a field.
The syntax is:
addInstance(field, text, group by, position, length, sentence, sentenceBeginning, sentenceEnd, syncon, ancestor[, confidence])
where:
- field is a field object.
- text is the instance text.
- group by is the number of the group the instance belongs to.
- position is the zero-based index of the first character of the instance text.
- length is the instance length.
- sentence is the zero-based index of the sentence containing the instance.
- sentenceBeginning is the zero-based index of the first character of the sentence.
- sentenceEnd is the zero-based index of the first character after the sentence.
- syncon is the numeric ID of the concept expressed by the instance text. Set it to -1 in case of no related concept.
- ancestor is the numeric ID of the ancestor of the concept expressed by the instance text. Set it to -1 in case of no related concept or no ancestor.
- confidence is the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
The function returns true if the instance is added, false if field is an invalid object.
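Because addInstance takes several zero-based offsets, a small helper that derives them from the text can reduce mistakes. The sketch below is only an illustration: the sentences array of begin/end offsets is a hypothetical input, the group number 0 is arbitrary, and the stand-in addInstance stores values under the property names listed for updateInstance (the group_by key in particular is an assumption).

```javascript
// Stand-in for the built-in; returns false only for an invalid field,
// as documented. The stored key names are assumptions.
function addInstance(field, text, groupBy, position, length, sentence,
                     sentenceBeginning, sentenceEnd, syncon, ancestor, confidence = 1) {
  if (field === null || typeof field !== "object") return false;
  if (!Array.isArray(field.instances)) field.instances = [];
  field.instances.push({
    text, group_by: groupBy, pos: position, len: length, snt: sentence,
    snt_begin: sentenceBeginning, snt_end: sentenceEnd, syncon, ancestor, confidence,
  });
  return true;
}

// Locate `needle` in the document text and add it as an instance.
// `sentences` holds { begin, end } pairs, where `end` is the index of the
// first character after the sentence.
function addInstanceFromText(field, documentText, sentences, needle) {
  const position = documentText.indexOf(needle);
  if (position < 0) return false;
  const sentence = sentences.findIndex((s) => position >= s.begin && position < s.end);
  if (sentence < 0) return false;
  const { begin, end } = sentences[sentence];
  // -1 for syncon and ancestor: no related concept.
  return addInstance(field, needle, 0, position, needle.length, sentence, begin, end, -1, -1);
}

const documentText = "Buy cabbage. Add soy sauce.";
const sentences = [{ begin: 0, end: 12 }, { begin: 13, end: 27 }];
const field = { field: "ingredient", value: "soy sauce", instances: [] };
const added = addInstanceFromText(field, documentText, sentences, "soy sauce");
console.log(added, field.instances[0]);
```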
updateInstance
updateInstance updates a field instance.
The syntax is:
updateInstance(instance, property, newValue)
where:
- instance is the instance object.
- property is the name of the instance property to update. It can be:
  - text: the instance text.
  - group by: the instance group number.
  - pos: the instance starting position.
  - len: the length of the instance.
  - snt: the zero-based index of the sentence containing the instance.
  - snt_begin: the zero-based index of the first character of the sentence.
  - snt_end: the zero-based index of the first character after the sentence.
  - confidence: the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
  - syncon: the numeric ID of the concept expressed by the instance text.
  - ancestor: the numeric ID of the ancestor of the concept expressed by the instance text.
- newValue is the new value for the property.
The function returns true if the instance is updated, false if:
- instance is an invalid object.
- The property name is invalid.
updateInstanceByIndex
updateInstanceByIndex updates an instance object using its index inside the list of the field instances.
The syntax is:
updateInstanceByIndex(field, index, property, newValue)
where:
- field is the field object.
- index is the index of the instance to update.
- property is the name of the instance property to update. It can be:
  - text: the instance text.
  - group by: the instance group number.
  - pos: the instance starting position.
  - len: the length of the instance.
  - snt: the zero-based index of the sentence containing the instance.
  - snt_begin: the zero-based index of the first character of the sentence.
  - snt_end: the zero-based index of the first character after the sentence.
  - confidence: the optional confidence score for the field. It is a decimal number between 0 and 1. The default value is 1.
  - syncon: the numeric ID of the concept expressed by the instance text.
  - ancestor: the numeric ID of the ancestor of the concept expressed by the instance text.
- newValue is the new value for the property.
The function returns true if the instance is updated, false if:
- field is an invalid object.
- The property name is invalid.
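As an illustration of updating several related properties of one instance, the sketch below trims surrounding whitespace from instance texts and keeps pos and len consistent with the new text. The field shape is an assumption, and the stand-in updateInstanceByIndex imitates only the documented return values; its rejection of an out-of-range index is a guess, since that case is not documented here.

```javascript
// Property names as listed in the documentation above.
const INSTANCE_PROPERTIES = ["text", "group by", "pos", "len", "snt",
  "snt_begin", "snt_end", "confidence", "syncon", "ancestor"];

// Stand-in for the built-in. Rejecting a bad index is an assumption.
function updateInstanceByIndex(field, index, property, newValue) {
  if (field === null || typeof field !== "object" || !Array.isArray(field.instances)) return false;
  if (!INSTANCE_PROPERTIES.includes(property)) return false;
  if (!Number.isInteger(index) || index < 0 || index >= field.instances.length) return false;
  field.instances[index][property] = newValue;
  return true;
}

// Trim whitespace around each instance text and shift pos/len to match.
// Returns true only if every update succeeded.
function trimInstances(field) {
  let allOk = true;
  field.instances.forEach((inst, i) => {
    const trimmed = inst.text.trim();
    if (trimmed === inst.text) return; // nothing to change
    // Number of leading whitespace characters that were removed.
    const lead = inst.text.length - inst.text.trimStart().length;
    const ok =
      updateInstanceByIndex(field, i, "pos", inst.pos + lead) &&
      updateInstanceByIndex(field, i, "len", trimmed.length) &&
      updateInstanceByIndex(field, i, "text", trimmed);
    allOk = allOk && ok;
  });
  return allOk;
}

const field = {
  field: "ingredient",
  value: "onion",
  instances: [{ text: " onion ", pos: 10, len: 7 }],
};
const trimmedOk = trimInstances(field);
console.log(trimmedOk, field.instances[0]);
```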
removeInstance
removeInstance removes an instance from a field.
The syntax is:
removeInstance(field, index)
where:
- field is the field object.
- index is the index of the instance to remove.
The function returns true if the instance is removed, false if:
- field is an invalid object.
- The index is out of range.
validate_template_name
validate_template_name checks if a template is defined in the project.
The syntax is:
validate_template_name(templateName)
where templateName is the template name to validate.
The function returns true if the template name is defined in the project, false otherwise.
validate_field_name
validate_field_name checks if a field is defined in the project.
The syntax is:
validate_field_name(fieldName, templateName)
where:
- fieldName is the string containing the field name to validate.
- templateName is the string containing the template name. It can be empty.
The function returns true:
- If the field is defined in the templateName template.
- If templateName is an empty string and the field is defined in any template (that is, there is an occurrence in a template).
It returns false otherwise.
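One possible use of the two validation functions is to check names before building a record, so that a failure can be reported precisely instead of only seeing addRecord return false. The sketch below is hypothetical throughout: PROJECT_TEMPLATES stands in for the templates defined in a project, and the two stand-in functions only imitate the documented behavior so that the snippet runs outside the platform.

```javascript
// Hypothetical project definition: template name -> field names.
const PROJECT_TEMPLATES = {
  RECIPE: ["ingredient", "dish"],
  PERSON: ["name"],
};

// Stand-ins imitating the documented behavior of the built-ins.
function validate_template_name(templateName) {
  return Object.prototype.hasOwnProperty.call(PROJECT_TEMPLATES, templateName);
}
function validate_field_name(fieldName, templateName) {
  if (templateName === "") {
    // Empty template name: the field may belong to any template.
    return Object.values(PROJECT_TEMPLATES).some((fields) => fields.includes(fieldName));
  }
  return validate_template_name(templateName) &&
    PROJECT_TEMPLATES[templateName].includes(fieldName);
}

// Return a list of problems; an empty list means all names are valid.
function checkNames(templateName, fieldNames) {
  if (!validate_template_name(templateName)) {
    return [`unknown template: ${templateName}`];
  }
  return fieldNames
    .filter((name) => !validate_field_name(name, templateName))
    .map((name) => `unknown field: ${name}`);
}

console.log(checkNames("RECIPE", ["ingredient", "name"]));
console.log(checkNames("MENU", ["dish"]));
console.log(validate_field_name("name", ""));
```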