queryJsonPath
Standard output types
The queryJsonPath method extracts the values matched by a JSONPath (or by a series of absolute/relative JSONPaths) and returns them in various formats.
It complements the other jsonPlug actions: it supports more thorough validation and identifies complex items that need advanced querying before further scripting. It relies on the custom relative-node parsing that the script adds to the JSONPath language.
For example, consider this template:
TEMPLATE(CHARACTERS)
{
    @CHARACTER_NAME,
    @CHARACTER_NICKNAME,
    @CHARACTER_DATE_OF_BIRTH
}
If these rules:
SCOPE SENTENCE
{
    IDENTIFY(CHARACTERS)
    {
        @CHARACTER_NAME[TYPE(NPH)]
        <>
        @CHARACTER_NICKNAME[KEYWORD("spider-man")]
        <>
        @CHARACTER_DATE_OF_BIRTH[TYPE(DAT)]
    }
    IDENTIFY(CHARACTERS)
    {
        @CHARACTER_NAME[TYPE(NPH)]
        <>
        @CHARACTER_NICKNAME[KEYWORD("peter parker")]
        <>
        @CHARACTER_DATE_OF_BIRTH[TYPE(DAT)]
    }
}
are applied to this input text:
Peter Parker, known as Spider-Man, was created by Stan Lee and Steve Ditko in 1962.
Miles Morales, known as the modern Peter Parker, was created by Brian Michael Bendis and Sara Pichelli in 2011.
you will get these records:
Template: CHARACTERS
| Field | Value |
|---|---|
| @CHARACTER_NICKNAME | Spider-Man |
| @CHARACTER_NAME | Peter Parker |
| @CHARACTER_DATE_OF_BIRTH | 1962 |
Template: CHARACTERS
| Field | Value |
|---|---|
| @CHARACTER_NICKNAME | Peter Parker |
| @CHARACTER_NAME | Miles Morales |
| @CHARACTER_DATE_OF_BIRTH | 2011 |
With this code:
function onFinalize(result) {
    // Build a case-insensitive regex from every CHARACTER_NAME value in CHARACTERS records.
    var nickname_values = jsonPlug.queryJsonPath(result, {
        jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NAME')].value",
        outputType: "regex",
        modifierFlag: true
    });
    // Delete each CHARACTER_NICKNAME field whose value matches that regex.
    jsonPlug.jsonPlug(result, {
        action: "delete",
        jsPathConditionFlag: true,
        jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NICKNAME' && " + nickname_values + ".test(@.value) )]",
        jsPathAction: "#this#",
        recursive: true
    });
    return result;
}
you will get these records:
Template: CHARACTERS
| Field | Value |
|---|---|
| @CHARACTER_NICKNAME | Spider-Man |
| @CHARACTER_NAME | Peter Parker |
| @CHARACTER_DATE_OF_BIRTH | 1962 |
Template: CHARACTERS
| Field | Value |
|---|---|
| @CHARACTER_NAME | Miles Morales |
| @CHARACTER_DATE_OF_BIRTH | 2011 |
As you can see, the nickname_values variable holds a regular expression built from all the values of the CHARACTER_NAME field in the CHARACTERS records matched by the JSONPath. Because modifierFlag is set to true, the expression is case insensitive.
The jsonPlug call that follows the variable definition uses the delete action to remove the CHARACTER_NICKNAME field when its value matches that regular expression.
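To make the mechanics concrete, the following standalone sketch shows how a list of values can become a case-insensitive regular expression and how that expression ends up inside the filter text through plain string concatenation. The valuesToRegex helper is hypothetical and only approximates what the regex output type might produce; the exact pattern generated by queryJsonPath may differ.

```javascript
// Hypothetical helper: approximates the "regex" output type on a plain array.
// The real pattern built by queryJsonPath may differ (anchoring, escaping, ordering).
function valuesToRegex(values, caseInsensitive) {
  // Escape regex metacharacters so each value is matched literally.
  var escaped = values.map(function (v) {
    return String(v).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  });
  return new RegExp("^(?:" + escaped.join("|") + ")$", caseInsensitive ? "i" : "");
}

var nameRegex = valuesToRegex(["Peter Parker", "Miles Morales"], true);

// Concatenating a RegExp with a string inserts its literal form (/pattern/flags),
// which is why the regex can be embedded directly in the JSONPath filter text.
var filter = "[?(@.field == 'CHARACTER_NICKNAME' && " + nameRegex + ".test(@.value) )]";
```

In this sketch, nameRegex.test("peter parker") is true, so a nickname with that value would satisfy the filter, while "Spider-Man" would not.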
With this other example:
function onFinalize(result) {
    // Build a case-insensitive regex from every CHARACTER_NAME value in CHARACTERS records.
    var nickname_values = jsonPlug.queryJsonPath(result, {
        jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NAME')].value",
        outputType: "regex",
        modifierFlag: true
    });
    // true if at least one CHARACTER_NICKNAME value matches a character name, false otherwise.
    var character_has_conflicting_nickname = jsonPlug.queryJsonPath(result, {
        jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NICKNAME' && " + nickname_values + ".test(@.value) )]",
        outputType: "boolean"
    });
    CONSOLE.log(character_has_conflicting_nickname);
    return result;
}
you will get the boolean value true in the Output tab of the Console tool window, because the value of the CHARACTER_NICKNAME field in the second record is equal to the CHARACTER_NAME value in the first record. With no match, the output would be false.
The syntax of the queryJsonPath method is:
moduleVariable.queryJsonPath(result, {
    jsPath: parameterValue,
    outputType: parameterValue,
    modifierFlag: parameterValue
})
where parameterValue is one of the possible values of the corresponding parameters described below:
- moduleVariable is the variable corresponding to the module and set with require().
- result is the object containing the analysis results.
- jsPath is the JSONPath expression that determines the nodes to which the action must be applied. It can be:
    - A standard JSONPath.
    - An array of multiple JSONPaths (see modify).
- outputType is the output type. It can be:
    - array: an array containing all the value(s) matched by the JSONPath query.
    - string: the value(s) matched by the JSONPath query are concatenated into a single string (elements are separated by a whitespace).
    - regex: the value(s) matched by the JSONPath query are turned into a regular expression.
    - object: the value(s) matched by the JSONPath query are turned into an object (each property will have a value of true).
    - boolean: boolean value of true in case of a match, false otherwise.
    - count: returns an integer representing the number of matched items.
    - paths: contains advanced query information for further script manipulations, in a custom format optimized for Studio's output model (see below).
    - standard paths: same output as the standard paths method of the official JSONPath library.
    - nodes: same output as the standard nodes method of the official JSONPath library.
    - regex object: (to be used with extractions only) an object containing several regular expressions used for sophisticated JSONPath validations involving aggregated data (see below).
- modifierFlag is a flag whose effect depends on outputType:
    - If you select array, string or count as outputType and the flag is set to false, duplicates will not be removed from the matched items (which is the default behavior).
    - If you select regex as outputType and the flag is set to true, it will make the regular expression case insensitive.
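As a rough illustration of how the simpler output types relate to one another, the sketch below shapes an array of already matched values in plain JavaScript. The shapeOutput function and its removeDuplicates argument are hypothetical and are not the library's code; duplicate handling is passed explicitly so that the sketch makes no claim about the method's default.

```javascript
// Hypothetical emulation of a few output types on an array of matched values.
// It only mirrors the descriptions in the list above; it is not the library's code.
function shapeOutput(values, outputType, removeDuplicates) {
  var items = removeDuplicates
    ? values.filter(function (v, i) { return values.indexOf(v) === i; })
    : values.slice();
  switch (outputType) {
    case "array":
      return items;
    case "string":
      return items.join(" "); // elements separated by a whitespace
    case "count":
      return items.length;
    case "boolean":
      return values.length > 0; // true in case of a match
    case "object":
      var obj = {};
      values.forEach(function (v) { obj[v] = true; }); // each property set to true
      return obj;
    default:
      throw new Error("Output type not covered by this sketch: " + outputType);
  }
}

var matched = ["Peter Parker", "Miles Morales", "Peter Parker"];
var asString = shapeOutput(matched, "string", true); // "Peter Parker Miles Morales"
var asCount = shapeOutput(matched, "count", false);  // 3
```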
Alternatively, you can use this syntax:
moduleVariable.queryJsonPath(result, jsPath, outputType, modifierFlag)
Note
- The parameters in the second syntax must be declared in this exact order.
- Both syntaxes can be used interchangeably.
'paths' output type
If outputType is set to paths, the output array contains one object for each item matched by the JSONPath(s) in the result object.
The format of these objects depends on the matched items:
- For a record or a field in the extraction property, the following properties are created:
    - path_type: a string with the value extraction in this case.
    - record_id: the position (index) of the matched record in the extraction array.
    - template_name: the record template name.
    - field_id: the position (index) of the matched field inside its parent record.
    - field_name: the name of the matched field.
    - field_value: the value of the matched field.
    - field_instance: the instance array of the matched field.
    - field_instance_offset: an array of objects containing all the positional information of the matched extractions, structured as:
        - begin: the starting position of that textual instance.
        - end: the ending position of that textual instance.
        - length: the length of the textual match.
    - field_instance_text: an array containing all the non-normalized pieces of text matched by this extraction.
    - field_confidence: the confidence score of the matched field.
    - field_siblings: an array of objects containing information about the siblings of the matched field, useful to check aggregated data:
        - field_name: the name of the sibling field.
        - field_value: the value of the sibling field.
        - field_id: the position (index) of the sibling field inside its parent record.
- For a category in the categorization property, the following properties are created:
    - path_type: a string with the value categorization in this case.
    - id: the position (index) of the matched category in the categorization array.
    - name: the name of the matched category.
    - label: the label of the matched category.
    - score: the score of the matched category.
    - compound: the compound score of the matched category.
    - frequency: the frequency of the matched category.
    - winner: the winner status of the matched category.
    - rules: the rules array of the matched category.
- For a segment in the segment property, the following properties are created:
    - path_type: a string with the value segment in this case.
    - index: the index of the matched segment within the segment array.
    - name: the name of the matched segment.
    - positions: an array of objects containing all the positional information of the matched segment, structured as:
        - begin: the starting position of that segment instance.
        - end: the ending position of that segment instance.
        - score: the score of that segment instance.
        - rules: the rules information of that segment instance.
- For a section in the sections property, the following properties are created:
    - path_type: a string with the value sections in this case.
    - index: the index of the matched section within the sections array.
    - name: the name of the matched section.
    - positions: an array of objects containing all the positional information of the matched section, structured as:
        - begin: the starting position of that section instance.
        - end: the ending position of that section instance.
With this output, you can use the JSONPath query language to select and loop over only certain items, for more advanced use cases that cannot be solved with the standard jsonPlug method actions.
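As an illustration of such a loop, the sketch below filters a hand-written array shaped like the extraction entries described above. The data is made up, several documented properties (such as field_instance and field_confidence) are omitted for brevity, and in a real script the array would come from queryJsonPath with outputType set to paths.

```javascript
// Hand-written sample shaped like the documented "paths" entries for extraction
// matches. Values are illustrative, not real queryJsonPath output.
var paths = [
  {
    path_type: "extraction",
    record_id: 0,
    template_name: "CHARACTERS",
    field_id: 1,
    field_name: "CHARACTER_NICKNAME",
    field_value: "Spider-Man",
    field_siblings: [
      { field_name: "CHARACTER_NAME", field_value: "Peter Parker", field_id: 0 },
      { field_name: "CHARACTER_DATE_OF_BIRTH", field_value: "1962", field_id: 2 }
    ]
  },
  {
    path_type: "extraction",
    record_id: 1,
    template_name: "CHARACTERS",
    field_id: 1,
    field_name: "CHARACTER_NICKNAME",
    field_value: "Peter Parker",
    field_siblings: [
      { field_name: "CHARACTER_NAME", field_value: "Miles Morales", field_id: 0 },
      { field_name: "CHARACTER_DATE_OF_BIRTH", field_value: "2011", field_id: 2 }
    ]
  }
];

// Keep only the nicknames whose record also holds a date later than 2000,
// using the sibling information to check the aggregated data.
var recent = paths.filter(function (p) {
  return p.path_type === "extraction" && p.field_siblings.some(function (s) {
    return s.field_name === "CHARACTER_DATE_OF_BIRTH" && Number(s.field_value) > 2000;
  });
});

// Record indexes that a later step could target.
var recentRecordIds = recent.map(function (p) { return p.record_id; });
```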
Warning
- This output shows a snapshot of the result object at the time the queryJsonPath method is invoked and will not reflect any changes made after the output is generated.
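The practical consequence can be sketched in plain JavaScript. Here a deep copy stands in for the query output; whether the method copies data in this exact way is an assumption, and the point is only that a query should be run again after the result object has been modified.

```javascript
// Stand-in for the analysis result; only the part needed for the illustration.
var result = { match_info: { rules: { extraction: [
  { template: "CHARACTERS", fields: [{ field: "CHARACTER_NAME", value: "Peter Parker" }] }
] } } };

// Assumption: a deep copy plays the role of the query output taken at call time.
var snapshot = JSON.parse(JSON.stringify(result.match_info.rules.extraction));

// A later change to result (here, removing the record) is not visible in the copy.
result.match_info.rules.extraction.pop();

var staleCount = snapshot.length;                          // still counts the removed record
var liveCount = result.match_info.rules.extraction.length; // reflects the removal
```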
'regex object' output type
If outputType is set to regex object, the method will return a single object that includes seven properties:
- path_type: always has the value extraction, since this output currently supports only extractions.
- record_ids: a regex that matches all the indexes of the matched records.
- templates: a regex that matches all the template values matched by the JSONPath query.
- field_names: a regex that matches all the field names matched by the JSONPath query.
- field_values: a regex that matches all the extracted values matched by the JSONPath query.
- field_instance_offsets: a regex that matches all the extraction offsets matched by the JSONPath query.
- field_instance_text: a regex that matches all the instance texts matched by the JSONPath query.
These properties provide advanced regex capabilities that can be used to quickly validate complex aggregated data.
Warning
- This output is based on a snapshot of the result object at the time the queryJsonPath method is invoked and will not reflect any changes made after the output is generated.
- To use the record_ids regex, you must set the addIndexToRecords option to true.
Here's an example:
function onFinalize(result) {
    // Collect regexes describing every Covers field whose extracted value is 'false'.
    var false_covers = jsonPlug.queryJsonPath(result, {
        jsPath: "$.match_info.rules.extraction[?(@.template == 'Covers')].fields[?(@.value == 'false')]",
        outputType: "regex object"
    });
    var false_covers_ids = false_covers.record_ids;
    var false_covers_names = false_covers.field_names;
    // In the affected records only, delete the fields with the same names and a value of 'true'.
    jsonPlug.jsonPlug(result, {
        action: "delete",
        jsPathConditionFlag: true,
        jsPath: "$.match_info.rules.extraction[?(@.template == 'Covers' && " + false_covers_ids +
            ".test(@.index) )].fields[?(" + false_covers_names + ".test(@.field) && @.value == 'true' )]",
        jsPathAction: "#this#",
        recursive: true,
        addIndexToRecords: true
    });
    return result;
}
This code aims to identify records where the same cover has both a value of false and true. To avoid hard-coding all the possible cover names (which can be hundreds), the code performs the following steps:
- Matches all the records named Covers having a child field with an extracted value of false, and extracts the record ids to know which records present this issue.
- Calls the main jsonPlug method once the querying phase is over, using the addIndexToRecords option, which adds an index property to each record.
- Uses the advanced info provided by the regex object to identify which records suffer from the issue and enters only those.
- Restricts the check to the field names presenting the duplication issue and matches those with an extracted value of true.
- Uses the delete action to remove those fields.
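The same logic can be followed step by step in plain JavaScript, without jsonPlug. The sketch below works on a made-up extraction array whose template, fields, field and value names mirror the ones used in the JSONPaths above; the cover names are hypothetical, and plain lookup objects take the place of the record_ids and field_names regular expressions.

```javascript
// Made-up sample records; only the property names come from the JSONPaths above.
var extraction = [
  { template: "Covers", fields: [
    { field: "fire", value: "false" },
    { field: "fire", value: "true" },
    { field: "theft", value: "true" }
  ] },
  { template: "Covers", fields: [
    { field: "theft", value: "true" }
  ] }
];

// Query phase: remember which records and which field names carry a 'false' value.
var flaggedRecords = {};
var flaggedNames = {};
extraction.forEach(function (record, index) {
  if (record.template !== "Covers") return;
  record.fields.forEach(function (f) {
    if (f.value === "false") {
      flaggedRecords[index] = true;
      flaggedNames[f.field] = true;
    }
  });
});

// Delete phase: enter only the flagged records and drop the fields that share
// a flagged name but carry a value of 'true'.
extraction.forEach(function (record, index) {
  if (record.template !== "Covers" || !flaggedRecords[index]) return;
  record.fields = record.fields.filter(function (f) {
    return !(flaggedNames[f.field] && f.value === "true");
  });
});
```

After this runs, the first record keeps its fire field with the value false and its theft field, while the second record is left untouched because it contains no false value.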