jsonPlug
Overview
jsonPlug is a post-processor module that uses the JSONPath query language to make simple changes to the output object.
With this module it is possible to delete, add and modify categories and extracted records.
The module depends on the jsonpath module, which must therefore be installed too.
When you install the jsonPlug module in your project, Studio modifies the main.jr file to insert this statement at the beginning of the file:
var jsonPlug = require('modules/jsonPlug');
The statement above sets a variable with an instance of the module so that you can use it in all event handling functions.
This module must be invoked in the onFinalize function, where the results object is available. It has only one method, called jsonPlug, whose syntax is:
moduleVariable.jsonPlug(result, action, jsPathConditionFlag, jsPath, jsPathAction, recursive, values, skipNameValidation)
where:
- moduleVariable is the variable corresponding to the module, set with require().
- result is the object containing the analysis results.
- action is the action to perform on the results object. The available values of this parameter are:
  - delete
  - delete template
  - delete record
  - add field
  - add template
  - add record
  - clone
  - clone new
  - clone value
  - clone instances
  - add category
  - modify
  - modify regex
  - apply math
  Actions are described below.
- jsPathConditionFlag is either a boolean or a string.
  As a boolean, it can be:
  - true: the action is applied only if the jsPath expression selects one or more nodes.
  - false: the action is applied only if the jsPath expression doesn't select any nodes.
  As a string, it can have the following format:
  count operator integer
  where:
  - operator can be:
    - >
    - <
    - <=
    - >=
    - =
    - ==
  - integer is a non-negative integer number.
  In this case, jsPathConditionFlag is a condition on the number of nodes (the count) selected by jsPath, and the action is applied if the condition is verified. For example, if jsPathConditionFlag is:
  count > 3
  it means that jsPath must select more than three nodes for the action to be applied.
  Alternatively, it can be the string:
  multiple
  which is used to process multiple JSONPaths, each one paired with a true/false boolean, in order to validate the condition to perform an action (see the example in modify).
- jsPath is the JSONPath expression that, in combination with jsPathAction (unless jsPathAction is itself a full JSONPath expression), determines the nodes to which the action must be applied.
  Info
  Several useful resources are available on the Web to get more information about JSONPath syntax and to test JSONPath expressions.
- jsPathAction is the JSONPath expression that, alone or in combination with jsPath, determines where the action must be applied. Its value can be:
  - #this# or .: the action is applied to the nodes selected by jsPath.
  - ^: the action is applied to the parent nodes of the nodes selected by jsPath.
  - ^ relativepath: the action is applied to the sibling nodes of the nodes selected by jsPath that match relativepath.
  - a full JSONPath: the action is applied to the nodes selected by jsPathAction if jsPathConditionFlag is satisfied.
- recursive is a boolean indicating whether the action has to be applied to the first or to all of the nodes selected by jsPath and/or jsPathAction.
- values is an array containing the specification of the action. The meaning of the array items and their order vary according to the action.
- skipNameValidation, if set to true, skips the template and field validation. This optional flag is used for modify and modify regex.
Note
For modify, values can also be a string.
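The jsPathConditionFlag semantics described above can be illustrated with a minimal sketch. The helpers evaluateCountCondition and evaluateConditionFlag are hypothetical names introduced here for illustration, not part of jsonPlug, and the sketch assumes the number of selected nodes is already known. The multiple string is not covered.

```javascript
// Hypothetical helper, not part of jsonPlug: evaluates a
// "count operator integer" string against a number of selected nodes.
function evaluateCountCondition(flag, nodeCount) {
    var match = /^\s*count\s*(>=|<=|==|=|>|<)\s*(\d+)\s*$/.exec(flag);
    if (!match) {
        throw new Error("Unsupported condition: " + flag);
    }
    var operator = match[1];
    var expected = parseInt(match[2], 10);
    switch (operator) {
        case ">":  return nodeCount > expected;
        case "<":  return nodeCount < expected;
        case ">=": return nodeCount >= expected;
        case "<=": return nodeCount <= expected;
        default:   return nodeCount === expected; // "=" and "=="
    }
}

// Hypothetical helper: a boolean flag requires at least one node (true)
// or no nodes at all (false); a string flag is a count condition.
function evaluateConditionFlag(flag, nodeCount) {
    if (typeof flag === "boolean") {
        return flag ? nodeCount > 0 : nodeCount === 0;
    }
    return evaluateCountCondition(flag, nodeCount);
}
```

Under these assumptions, evaluateConditionFlag("count > 3", 3) is false and evaluateConditionFlag("count > 3", 4) is true.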
Actions
The following paragraphs describe the possible values of the action parameter. The value of the values parameter is interpreted based on the value of action.
delete
Use delete to delete extraction fields, a record or a category.
For example, consider this template:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Date_of_birth,
@Phone_number,
@Address,
@Job,
@Type_of_job
}
If this rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
<>
@Phone_number[TYPE(PHO)]
}
}
is applied to the following input text:
Stephen King's number is 0000000000.
you will get this output:
With this code:
function onFinalize(result) {
    // Delete the Name field of the PERSONAL_DATA records.
    jsonPlug.jsonPlug(result, "delete", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Name')]", "#this#", true, "");
    return result;
}
you will get:
As you can see, the Name field belonging to the PERSONAL_DATA record was deleted.
The values parameter is ignored by this action and can be left empty.
delete template and delete record
Use delete template or delete record to delete whole records.
For example, consider this template:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Date_of_birth,
@Phone_number,
@Address,
@Job,
@Type_of_job
}
If the rule used for delete is applied to the same input text as above, with this code:
function onFinalize(result) {
    // Delete the PERSONAL_DATA records that contain a Name field.
    jsonPlug.jsonPlug(result, "delete template", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Name')]", "#this#", true, "");
    return result;
}
or with this code:
function onFinalize(result) {
    // Same effect, using the delete record action.
    jsonPlug.jsonPlug(result, "delete record", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Name')]", "#this#", true, "");
    return result;
}
all the PERSONAL_DATA records containing a Name field are deleted.
The values parameter is ignored by this action and can be left empty.
add field
Use add field to add a field to a record.
For example, consider this template:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Age,
@Date_of_birth,
@Phone_number,
@Address,
@Job,
@Type_of_job
}
If this rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
<>
@Job[LEMMA("writer")]
<1:3>
LEMMA("comic")
}
}
is applied to the following input text:
Alan Moore is considered as the best writer of comics ever.
you will get:
With this code:
function onFinalize(result) {
    // Add an Age field with value 68 to the PERSONAL_DATA record.
    jsonPlug.jsonPlug(result, "add field", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Name')]", "#this#", true, ["Age", "68"]);
    return result;
}
you will get:
As you can see, the Age field with a value of 68 was added to the PERSONAL_DATA record.
The contents of the values array must be:
fieldName, fieldValue
where:
- fieldName is the field name.
- fieldValue is the field value.
add template and add record
Use add template or add record to create a new record of an existing template.
For example, consider these templates:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Date_of_birth,
@Phone_number,
@Address,
@Age,
@Job
}
TEMPLATE(COMPANY)
{
@Location,
@Name
}
If the following rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
<>
@Job[LEMMA("knowledge engineer")]
}
}
is applied to this input text:
Jonathan works as a knowledge engineer.
you will get:
With this code:
function onFinalize(result) {
    // Add a COMPANY record with the Location and Name fields.
    jsonPlug.jsonPlug(result, "add template", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Name')]", "#this#", true, ["COMPANY", "Location", "Rovereto", "Name", "Expert.ai"]);
    return result;
}
or with this one:
function onFinalize(result) {
    // Same effect, using the add record action.
    jsonPlug.jsonPlug(result, "add record", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Name')]", "#this#", true, ["COMPANY", "Location", "Rovereto", "Name", "Expert.ai"]);
    return result;
}
you will get:
As you can see, a new COMPANY record was added, having the Location field set to Rovereto and the Name field set to Expert.ai.
The contents of the values array must be:
templateName, field1Name, field1Value [, field2Name, field2Value [, ... fieldnName, fieldnValue]]
where:
- templateName is the template name.
- field#Name is the name of a field.
- field#Value is the value of that field.
The first three items are mandatory in order to define at least one field. The other items, when present, have to be added in pairs to define additional fields and their values.
Note
If jsPathConditionFlag is set to false, jsPathAction must be $..extraction.
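Because the values array for add template and add record is a flat sequence of name/value pairs, it can be convenient to build it from a more readable structure. The following is a minimal sketch; buildRecordValues is a hypothetical helper introduced here for illustration and is not part of jsonPlug.

```javascript
// Hypothetical helper, not part of jsonPlug: flattens a template name and a
// list of {name, value} fields into the array layout described above.
function buildRecordValues(templateName, fields) {
    var values = [templateName];
    for (var i = 0; i < fields.length; i++) {
        values.push(fields[i].name, fields[i].value);
    }
    // The template name plus at least one name/value pair are mandatory.
    if (values.length < 3) {
        throw new Error("At least one field is required");
    }
    return values;
}

var companyValues = buildRecordValues("COMPANY", [
    { name: "Location", value: "Rovereto" },
    { name: "Name", value: "Expert.ai" }
]);
// companyValues is ["COMPANY", "Location", "Rovereto", "Name", "Expert.ai"]
```

The resulting array could then be passed as the values argument of the jsonPlug call shown in the example above.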
clone
Use clone to clone extraction fields and optionally modify the cloned values.
For example, consider this template:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Date_of_birth,
@Phone_number,
@Address,
@Age,
@Job,
@Type_of_job
}
If the following rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
<>
@Job[LEMMA("software engineer")]
}
}
is applied to this input text:
Jane works as a software engineer.
you will get:
With this code:
function onFinalize(result) {
    // Clone the Job field into Type_of_job, without rule and text references.
    jsonPlug.jsonPlug(result, "clone", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Job')]", "#this#", true, ["Type_of_job", "no tokens"]);
    return result;
}
you will get:
As you can see, the Job field was cloned into the Type_of_job field.
In a common use case, the new field is then processed to change its value. You can see an example of this in the description of the modify action.
The contents of the values array must be:
fieldName, cloneOption[, regularExpression, replacementString]
Note
The parts in square brackets are optional.
where:
- fieldName is either the new field name or an empty string, meaning that the cloned field will have the name of the source field.
- cloneOption can be:
  - no tokens: the new field will have no references to the rule(s) that determined the extraction and to the text that triggered the rule(s).
  - clone from source or an empty string: the new field has the same information as the source field in terms of triggered rules and triggering text.
  - clone from sibling: for cases in which jsPathAction is used to select sibling nodes, the new field has the same information, in terms of triggered rules and triggering text, as the field selected by jsPathAction.
- regularExpression is the regular expression that determines the parts of the node value to change.
- replacementString is the replacement string, where placeholders like $1, $2, etc. can be used to refer to the capturing groups of the regular expression.
clone new
Use clone new to create a record of a predefined template containing a clone of an existing field.
For example, consider these templates:
TEMPLATE(ATHLETES)
{
@Name,
@Sport_discipline
}
TEMPLATE(OLYMPIC_CHAMPIONS)
{
@Proper_name
}
If this rule:
SCOPE SENTENCE
{
IDENTIFY(ATHLETES)
{
@Name[TYPE(NPH)]
<>
@Sport_discipline[LEMMA("swimmer")]
}
}
is applied to the following input text:
Federica Pellegrini is one of the best swimmers of all time.
you will get:
With this code:
function onFinalize(result) {
    // Create an OLYMPIC_CHAMPIONS record whose Proper_name field clones the Name field.
    jsonPlug.jsonPlug(result, "clone new", true, "$..extraction[?(@.template == 'ATHLETES')]..fields[?(@.field == 'Name')]", "#this#", true, ["OLYMPIC_CHAMPIONS", "Proper_name", "no tokens"]);
    return result;
}
you will get:
As you can see, a new OLYMPIC_CHAMPIONS record was created with the Proper_name field having the value of the field identified by the jsPath expression.
The contents of the values array must be:
templateName, fieldName, cloneOption
where:
- templateName is the new record template name.
- fieldName is either the new field name or an empty string, meaning that the cloned field will have the name of the source field.
- cloneOption can be:
  - no tokens: the new field will have no references to the rule(s) that determined the extraction and to the text that triggered the rule(s).
  - clone from source or an empty string: the new field has the same information as the source field in terms of triggered rules and triggering text.
  - clone from sibling: for cases in which jsPathAction is used to select sibling nodes, the new field has the same information, in terms of triggered rules and triggering text, as the field selected by jsPathAction.
clone value
Use clone value to clone and/or modify an extracted value into another predefined field.
For example, consider this template:
TEMPLATE(PERSONAL_DATA)
{
@NAME,
@AGE,
@ADDRESS,
@NICKNAME
}
If these rules:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@NAME[TYPE(NPH)]
}
IDENTIFY(PERSONAL_DATA)
{
@NICKNAME[TYPE(NPH)]
}
}
are applied to this input text:
Hello Alan.
you will get:
With this code:
function onFinalize(result) {
    // Clone the NAME value into NICKNAME, replacing it through the regular expression.
    jsonPlug.jsonPlug(result, "clone value", true,
        "$..extraction[?(@.template == 'PERSONAL_DATA')].fields[?(@.field == 'NAME')]",
        "$..extraction[?(@.template == 'PERSONAL_DATA')].fields[?(@.field == 'NICKNAME')]",
        true, [true, /^(.+)$/, "The bard of Northampton"]);
    return result;
}
you will get:
As you can see, the value of the field NAME has been turned into The bard of Northampton and moved to the field NICKNAME.
To modify the cloned value, the contents of the values array must be:
replaceFlag, regularExpression, replacementString
where:
- replaceFlag is a boolean that, when set to true, allows you to apply a regular expression to modify the extracted value.
- regularExpression is the regular expression that determines the parts of the value to change.
- replacementString is the replacement string, where placeholders like $1, $2, etc. can be used to refer to the capturing groups of the regular expression.
To clone the value without modifying it, the values array must be left empty.
clone instances
Use clone instances to replace the normalized field values with the extracted textual values.
This method can be very useful when used in combination with tagging and/or transformation.
For example, consider this template and tag:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Age,
@Address,
@Job_type
}
TAGS
{
@TAG1
}
If these rules:
SCOPE SENTENCE
{
TAGGER()
{
@TAG1[LEMMA("developer", "software developer")]
}
IDENTIFY(PERSONAL_DATA)
{
@Job_type[TAG(TAG1)]|[TAG]
}
}
are applied to this input text:
Marco is a developer and Jonathan and Mary are also software developers.
you will get:
With this code:
function onFinalize(result) {
    // Replace the normalized Job_type value with the longest text instance, rewritten by the regular expression.
    jsonPlug.jsonPlug(result, "clone instances", true, "$..extraction[?(@.template == 'PERSONAL_DATA')]..fields[?(@.field == 'Job_type')].value", "#this#", true, ["longest instance", true, /^((software) developer(s)?)$/gi, "$2 dev$3."]);
    return result;
}
you will get:
As you can see, the longest textual value software developers was extracted and turned into software devs.
The contents of the values array must be:
instanceType, replaceFlag, regularExpression, replacementString
where:
- instanceType is a flag that establishes which text value will be copied. It can be:
  - all instances: clone all text instances separated by a pipe character (|).
  - longest instance: clone the first longest instance of the text values.
  - first instance: clone the first instance of the text values.
- replaceFlag is a boolean. It can be:
  - false: clone the text values as they are.
  - true: apply a regular expression and the replacement string.
- regularExpression is the regular expression that determines the parts of the value to change.
- replacementString is the replacement string, where placeholders like $1, $2, etc. can be used to refer to the capturing groups of the regular expression.
Note
The last two parameters must be inserted if replaceFlag is set to true.
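The interplay between instanceType and the optional replacement can be sketched in plain JavaScript. This is only an illustration of the behavior described above, not the module's implementation: pickInstance and cloneInstancesSketch are hypothetical names, the list of text instances is assumed to be already available as an array, and the replacement is assumed to follow the semantics of String.prototype.replace.

```javascript
// Hypothetical helper: chooses the text to copy according to instanceType.
function pickInstance(instances, instanceType) {
    if (instanceType === "all instances") {
        return instances.join("|");
    }
    if (instanceType === "longest instance") {
        var longest = instances[0];
        for (var i = 1; i < instances.length; i++) {
            // Strict comparison keeps the first instance among equally long ones.
            if (instances[i].length > longest.length) {
                longest = instances[i];
            }
        }
        return longest;
    }
    return instances[0]; // "first instance"
}

// Hypothetical helper: values follows the layout
// instanceType, replaceFlag, regularExpression, replacementString.
function cloneInstancesSketch(instances, values) {
    var text = pickInstance(instances, values[0]);
    return values[1] ? text.replace(values[2], values[3]) : text;
}

// Assumed text instances for the example input above.
var jobInstances = ["developer", "software developers"];
var rewritten = cloneInstancesSketch(jobInstances,
    ["longest instance", true, /^((software) developer(s)?)$/gi, "$2 dev$3."]);
// rewritten is "software devs."
```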
add category
Use add category to add a new category.
For example, consider this taxonomy:
1 Animals
1.1 Cats
1.2 Dogs
and this template:
TEMPLATE(DOGS_BREED)
{
@Name
}
If this rule:
SCOPE SENTENCE
{
IDENTIFY(DOGS_BREED)
{
@Name[ANCESTOR(100000144)] //@SYN: #100000144# [dog]
}
}
is applied to this input text:
Rex is a beautiful 10 year old German Shepherd, and his help was extremely important for the Police.
you will get:
With this code:
function onFinalize(result) {
    // Add category 1.2 (Dogs) when a DOGS_BREED record exists.
    jsonPlug.jsonPlug(result, "add category", true, "$..extraction[?(@.template == 'DOGS_BREED')]", "#this#", true, ["1.2", "Dogs", 10, 100.0]);
    return result;
}
you will get:
As you can see, the 1.2 category with the Dogs label was added with a category score and compound score equal to 10 and a frequency equal to 100.0%.
The jsPathAction and recursive parameters are ignored.
The contents of the values array must be:
categoryName, categoryLabel, scoreAndCompound, categoryFrequency
where:
- categoryName is the category name.
- categoryLabel is the category label.
- scoreAndCompound is a non-negative integer number used for both the category score and the compound score.
- categoryFrequency is a non-negative decimal number used for the category frequency.
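Since the last two items of the array have numeric constraints, a small check before calling the module can catch mistakes early. The following sketch uses a hypothetical helper, buildCategoryValues, which is not part of jsonPlug and only mirrors the constraints listed above.

```javascript
// Hypothetical helper, not part of jsonPlug: validates the numeric items and
// returns the values array in the order described above.
function buildCategoryValues(categoryName, categoryLabel, scoreAndCompound, categoryFrequency) {
    var isNonNegativeInteger = typeof scoreAndCompound === "number" &&
        scoreAndCompound >= 0 &&
        Math.floor(scoreAndCompound) === scoreAndCompound;
    if (!isNonNegativeInteger) {
        throw new Error("scoreAndCompound must be a non-negative integer");
    }
    var isNonNegativeNumber = typeof categoryFrequency === "number" &&
        !isNaN(categoryFrequency) &&
        categoryFrequency >= 0;
    if (!isNonNegativeNumber) {
        throw new Error("categoryFrequency must be a non-negative number");
    }
    return [categoryName, categoryLabel, scoreAndCompound, categoryFrequency];
}

var dogsCategory = buildCategoryValues("1.2", "Dogs", 10, 100.0);
// dogsCategory is ["1.2", "Dogs", 10, 100]
```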
modify
Use modify to change:
- The category name
- The category label
- The record template name
- The field name
- The field value
For example, consider these templates differing only in the name:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Date_of_birth,
@Geographical_location,
@Main_works
}
TEMPLATE(COMIC_WRITERS)
{
@Name,
@Date_of_birth,
@Geographical_location,
@Main_works
}
If the following rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
<>
@Date_of_birth[TYPE(DAT)]
<>
@Geographical_location[SYNCON(100192240)] //@SYN: #100192240# [Northampton]
}
}
is applied to this input text:
Mr. Alan Moore was born on the 18th of November 1953 in Northampton.
you will get:
With this code:
function onFinalize(result) {
    // Rename the record template, passing the new name as an array.
    jsonPlug.jsonPlug(result, "modify", true, "$..extraction[?(@.template == 'PERSONAL_DATA')].template", "#this#", true, ["COMIC_WRITERS"]);
    return result;
}
or with this one:
function onFinalize(result) {
    // Rename the record template, passing the new name as a string.
    jsonPlug.jsonPlug(result, "modify", true, "$..extraction[?(@.template == 'PERSONAL_DATA')].template", "#this#", true, "COMIC_WRITERS");
    return result;
}
you will get:
As you can see, the record template name has changed from PERSONAL_DATA to COMIC_WRITERS.
values must contain one item that is the new value for the selected nodes. values can be an array or a string.
The same output can also be obtained with the following code, which uses the multiple string as jsPathConditionFlag:
function onFinalize(result) {
    // Each boolean is paired with the JSONPath that follows it.
    jsonPlug.jsonPlug(result, "modify", "multiple",
        [
            false, "$..extraction[?(@.template == 'COMIC_WRITERS')]",
            true, "$..extraction[?(@.template == 'PERSONAL_DATA')]",
            true, ".fields[?(@.field == 'Date_of_birth' && @.value == 'Nov-18-1953')]",
            true, "^.template"
        ], "#this#", true, ["COMIC_WRITERS"], true);
    return result;
}
With this example code, the record template name is modified only if:
- There is no initial record named COMIC_WRITERS.
- There is an initial record named PERSONAL_DATA.
- There is a Date_of_birth field whose value, in its base form, is Nov-18-1953.
Note
- If these conditions are satisfied, ^.template goes back to the template level and modifies its name.
- The final true determines the final match of the whole array.
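The pairing of booleans and JSONPaths used with multiple can be pictured with the following sketch. It is an assumption-laden illustration, not the module's logic: toConditionPairs and allConditionsHold are hypothetical names, and the selects callback stands in for whatever resolves a path (including relative ones such as ^.template) and reports whether it selects at least one node.

```javascript
// Hypothetical helper: splits the flat [flag, path, flag, path, ...] array
// into {expected, path} pairs.
function toConditionPairs(conditions) {
    if (conditions.length % 2 !== 0) {
        throw new Error("Each JSONPath needs its own boolean");
    }
    var pairs = [];
    for (var i = 0; i < conditions.length; i += 2) {
        pairs.push({ expected: conditions[i], path: conditions[i + 1] });
    }
    return pairs;
}

// Hypothetical helper: every path must select nodes (true) or select
// nothing (false), as requested by its paired boolean.
function allConditionsHold(conditions, selects) {
    var pairs = toConditionPairs(conditions);
    for (var i = 0; i < pairs.length; i++) {
        if (selects(pairs[i].path) !== pairs[i].expected) {
            return false;
        }
    }
    return true;
}

var renameConditions = [
    false, "$..extraction[?(@.template == 'COMIC_WRITERS')]",
    true, "$..extraction[?(@.template == 'PERSONAL_DATA')]",
    true, ".fields[?(@.field == 'Date_of_birth' && @.value == 'Nov-18-1953')]",
    true, "^.template"
];
```

In this sketch the check passes only when the first path selects nothing and the other three select at least one node each.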
modify regex
modify regex works like modify, with the only difference that a regular expression is used to determine which parts of the value have to be replaced.
For example, consider the following templates:
TEMPLATE(PERSONAL_DATA)
{
@Name,
@Date_of_birth,
@Geographical_location,
@Main_works
}
TEMPLATE(PERSONAL_DATA_TEST)
{
@Name,
@Date_of_birth,
@Geographical_location,
@Main_works
}
If this rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA)
{
@Name[TYPE(NPH)]
<>
@Date_of_birth[TYPE(DAT)]
<>
@Geographical_location[SYNCON(100192240)] //@SYN: #100192240# [Northampton]
}
}
is applied to this input text:
Mr. Alan Moore was born on the 18th of November 1953 in Northampton.
this code:
function onFinalize(result) {
    // Append _TEST to the template name of the PERSONAL_DATA records.
    jsonPlug.jsonPlug(result, "modify regex", true, "$..extraction[?(@.template == 'PERSONAL_DATA')].template", "#this#", true, [/^(.+)$/, "$1_TEST"]);
    return result;
}
will produce the following change:
As you can see, the record name has changed. The new one is a concatenation of the first regular expression capturing group plus _TEST.
The contents of the values array must be:
regularExpression, replacementString
where:
- regularExpression is the regular expression that determines the parts of the node value to change.
- replacementString is the replacement string, where placeholders like $1, $2, etc. can be used to refer to the capturing groups of the regular expression.
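To see what a regularExpression and replacementString pair does to a value, the pair can be tried in plain JavaScript. The sketch below assumes the module applies the pair with the same semantics as String.prototype.replace, which the $1 placeholder syntax suggests but the text above does not state explicitly. The reorderedDate pattern is an additional illustrative pair, not taken from the module's documentation.

```javascript
// The pair used in the modify regex example above.
var regexValues = [/^(.+)$/, "$1_TEST"];

// $1 refers to the first capturing group, here the whole template name.
var renamedTemplate = "PERSONAL_DATA".replace(regexValues[0], regexValues[1]);
// renamedTemplate is "PERSONAL_DATA_TEST"

// Several groups can be reordered in the replacement string.
var reorderedDate = "Nov-18-1953".replace(/^(\w+)-(\d+)-(\d+)$/, "$2 $1 $3");
// reorderedDate is "18 Nov 1953"
```

Trying a pair this way before passing it in the values array makes it easier to spot a pattern that does not match the expected value.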
apply math
Use apply math to apply mathematical operations to the extracted fields.
For example, consider this template:
TEMPLATE(TEST)
{
@TOTAL_MONEY,
@VALUE_TO_SUBTRACT
}
If these rules:
SCOPE SENTENCE
{
IDENTIFY(TEST)
{
!LEMMA("tax")
<1:2>
@TOTAL_MONEY[TYPE(MON)]
}
IDENTIFY(TEST)
{
LEMMA("tax")
<1:2>
@VALUE_TO_SUBTRACT[TYPE(MON)]
}
}
are applied to this input text:
On a total amount of 40000€, your taxes are 5000€
you will get:
With this code:
function onFinalize(result) {
    // Subtract the VALUE_TO_SUBTRACT value from the TOTAL_MONEY value.
    jsonPlug.jsonPlug(result, "apply math", true,
        "$..extraction[?(@.template == 'TEST')].fields[?(@.field == 'VALUE_TO_SUBTRACT')].value",
        "$..extraction[?(@.template == 'TEST')].fields[?(@.field == 'TOTAL_MONEY')].value", true, ["subtract", "jspath", true, "."]);
    return result;
}
you will get:
As you can see, the value of the field VALUE_TO_SUBTRACT was subtracted from the value of the field TOTAL_MONEY.
The contents of the values array must be:
mathematicalOperation, dynamicValue, removalFlag, separator, rounder
where:
- mathematicalOperation is the mathematical operation to apply (case insensitive). It can be:
  - Add: adds the value to the matched jsPathAction.
  - Multiply: multiplies the value by the matched jsPathAction.
  - Subtract: subtracts the value from the matched jsPathAction.
  - Divide: divides the matched jsPathAction by the value.
  - Swap divide: divides the value by the matched jsPathAction.
  - Swap subtract: subtracts the matched jsPathAction from the value.
  - Calculate % from digit: calculates the percentage of the static/JSONPath value compared to the jsPathAction value.
  - Calculate digit from %: calculates to which percentage the jsPathAction value corresponds compared to the static/JSONPath value.
- dynamicValue is either a static numerical value or a special JSONPath keyword (jspath in the example above), used when the value is variable and must be taken from the matched JSONPath.
- removalFlag is a boolean value, mandatory if the values contain non-numerical characters like currency symbols, otherwise optional. If set to true, all non-numerical characters are removed when parsing the values.
- separator is the optional thousands separator. It can be left empty (no separator added) or can be:
  - .
  - ,
  - none (no separator added)
- rounder is the optional number of decimals after which rounding is applied. It can be left empty. If 0, the number is rounded to its closest integer.
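The arithmetic behind the example above can be approximated in plain JavaScript. This is a rough sketch under stated assumptions, not the module's implementation: parseAmount, applyOperation, addThousandSeparator and applyMathSketch are hypothetical names, digits, dots and minus signs are assumed to be the characters kept when removalFlag is true, the two percentage operations and the rounder are left out, and the separator is applied to the integer part only.

```javascript
// Hypothetical helper: strips non-numerical characters when removalFlag is true.
function parseAmount(text, removalFlag) {
    var cleaned = removalFlag ? String(text).replace(/[^0-9.\-]/g, "") : String(text);
    return parseFloat(cleaned);
}

// Hypothetical helper: actionValue is the value matched by jsPathAction,
// value is the static or jsPath value.
function applyOperation(operation, actionValue, value) {
    switch (operation.toLowerCase()) {
        case "add":           return actionValue + value;
        case "multiply":      return actionValue * value;
        case "subtract":      return actionValue - value;
        case "divide":        return actionValue / value;
        case "swap subtract": return value - actionValue;
        case "swap divide":   return value / actionValue;
        default:
            throw new Error("Operation not covered by this sketch: " + operation);
    }
}

// Hypothetical helper: inserts the separator every three digits of the integer part.
function addThousandSeparator(number, separator) {
    var text = String(number);
    if (!separator || separator === "none") {
        return text;
    }
    var parts = text.split(".");
    parts[0] = parts[0].replace(/\B(?=(\d{3})+(?!\d))/g, separator);
    return parts.join(".");
}

function applyMathSketch(operation, actionText, valueText, removalFlag, separator) {
    var actionValue = parseAmount(actionText, removalFlag);
    var value = parseAmount(valueText, removalFlag);
    return addThousandSeparator(applyOperation(operation, actionValue, value), separator);
}

var remaining = applyMathSketch("subtract", "40000€", "5000€", true, ".");
// Under the assumptions above, remaining is "35.000"
```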