queryJsonPath

Standard output types

The queryJsonPath method extracts the values matched by a JSONPath (or by a series of absolute/relative JSONPaths) and returns them in one of several formats.

This method complements the other jsonPlug actions: it supports advanced validation and lets you identify complex items that require advanced querying before further scripting. It also supports the custom relative-node parsing that jsonPlug adds to the JSONPath language.

For example, consider this template:

TEMPLATE(CHARACTERS)
{
    @CHARACTER_NAME,
    @CHARACTER_NICKNAME,
    @CHARACTER_DATE_OF_BIRTH
}

If these rules:

SCOPE SENTENCE
{
    IDENTIFY(CHARACTERS) 
    {
        @CHARACTER_NAME[TYPE(NPH)]
        <>
        @CHARACTER_NICKNAME[KEYWORD("spider-man")]
        <>
        @CHARACTER_DATE_OF_BIRTH[TYPE(DAT)]
    }

    IDENTIFY(CHARACTERS)
    {
        @CHARACTER_NAME[TYPE(NPH)]
        <>
        @CHARACTER_NICKNAME[KEYWORD("peter parker")]
        <>
        @CHARACTER_DATE_OF_BIRTH[TYPE(DAT)]
    }
}

are applied to this input text:

Peter Parker, known as Spider-Man, was created by Stan Lee and Steve Ditko in 1962.

Miles Morales, known as the modern Peter Parker, was created by Brian Michael Bendis and Sara Pichelli in 2011.

you will get these records:

Template: CHARACTERS

Field	Value
@CHARACTER_NICKNAME	Spider-Man
@CHARACTER_NAME	Peter Parker
@CHARACTER_DATE_OF_BIRTH	1962

Template: CHARACTERS

Field	Value
@CHARACTER_NICKNAME	Peter Parker
@CHARACTER_NAME	Miles Morales
@CHARACTER_DATE_OF_BIRTH	2011

With this code:

function onFinalize(result) {
    // Build a case-insensitive regular expression from all the CHARACTER_NAME values of CHARACTERS records
    var nickname_values = jsonPlug.queryJsonPath(result, {
            jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NAME')].value",
            outputType: "regex",
            modifierFlag: true
        });
    // Delete every CHARACTER_NICKNAME field whose value is matched by that regular expression
    jsonPlug.jsonPlug(result, {
            action: "delete",
            jsPathConditionFlag: true,
            jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NICKNAME' && " + nickname_values + ".test(@.value) )]",
            jsPathAction: "#this#",
            recursive: true
        });
    return result;
}

you will get these records:

Template: CHARACTERS

Field	Value
@CHARACTER_NICKNAME	Spider-Man
@CHARACTER_NAME	Peter Parker
@CHARACTER_DATE_OF_BIRTH	1962

Template: CHARACTERS

Field	Value
@CHARACTER_NAME	Miles Morales
@CHARACTER_DATE_OF_BIRTH	2011

The nickname_values variable contains a case-insensitive regular expression built from all the values of the CHARACTER_NAME field in CHARACTERS records, as selected by the JSONPath.

The delete action of the jsonPlug method that follows then removes every CHARACTER_NICKNAME field whose value is matched by that regular expression.
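The regex output and the delete step can be approximated in plain JavaScript to show what happens. This is a sketch, not the jsonPlug implementation: valuesToRegex is a hypothetical helper, and the escaping and ^...$ anchoring are assumptions about what the generated regular expression looks like. It also shows why the original code can concatenate nickname_values into the JSONPath string: converting a RegExp to a string yields its literal form.

```javascript
// Hypothetical helper (not part of jsonPlug): turns a list of matched
// values into one regular expression. Escaping and ^...$ anchoring are
// assumptions; the pattern queryJsonPath really generates may differ.
function valuesToRegex(values, caseInsensitive) {
  // Escape regex metacharacters so each value is matched literally.
  const escaped = values.map((v) => String(v).replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  // One alternation of all values; the "i" flag mirrors modifierFlag: true.
  return new RegExp("^(?:" + escaped.join("|") + ")$", caseInsensitive ? "i" : "");
}

// CHARACTER_NAME values of the two example records.
const nameRegex = valuesToRegex(["Peter Parker", "Miles Morales"], true);

// Concatenating a RegExp to a string inserts its literal form,
// which is how the regex ends up inside the JSONPath filter.
const filter = "@.field == 'CHARACTER_NICKNAME' && " + nameRegex + ".test(@.value)";
console.log(filter);

// Simulate the delete step: keep only nicknames the regex does not match.
const nicknames = ["Spider-Man", "Peter Parker"];
const kept = nicknames.filter((nick) => !nameRegex.test(nick));
console.log(kept); // [ 'Spider-Man' ]
```

With whole-value anchoring, a nickname that merely contains a name (for example "the modern Peter Parker") would not be deleted; whether the real output behaves the same way is not stated here.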

With this other example:

function onFinalize(result) {
    // Case-insensitive regular expression of all the CHARACTER_NAME values
    var nickname_values = jsonPlug.queryJsonPath(result, {
            jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NAME')].value",
            outputType: "regex",
            modifierFlag: true
        });
    // true if at least one CHARACTER_NICKNAME value is matched by that regular expression
    var character_has_conflicting_nickname = jsonPlug.queryJsonPath(result, {
            jsPath: "$.match_info.rules.extraction[?(@.template == 'CHARACTERS')].fields[?(@.field == 'CHARACTER_NICKNAME' && " + nickname_values + ".test(@.value) )]",
            outputType: "boolean"
        });
    CONSOLE.log(character_has_conflicting_nickname);
    return result;
}

you will get the boolean value true in the Output tab of the Console tool window, because the CHARACTER_NICKNAME value extracted from the second sentence (Peter Parker) is equal to the CHARACTER_NAME value extracted from the first sentence. If there were no match, false would be reported.
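To see why the boolean output is true, the query can be replayed on a hand-built object. The sketch assumes a simplified result object that contains only the properties the example JSONPaths refer to (match_info.rules.extraction, template, fields, field, value); the real object is much richer. fieldsOf is a hypothetical stand-in for the JSONPath filters, and a case-insensitive equality test stands in for the regular expression built with modifierFlag: true.

```javascript
// Simplified mock of the result object: only the properties referenced by
// the example JSONPaths. The real result object contains much more.
const result = {
  match_info: {
    rules: {
      extraction: [
        { template: "CHARACTERS", fields: [
          { field: "CHARACTER_NICKNAME", value: "Spider-Man" },
          { field: "CHARACTER_NAME", value: "Peter Parker" },
          { field: "CHARACTER_DATE_OF_BIRTH", value: "1962" },
        ] },
        { template: "CHARACTERS", fields: [
          { field: "CHARACTER_NICKNAME", value: "Peter Parker" },
          { field: "CHARACTER_NAME", value: "Miles Morales" },
          { field: "CHARACTER_DATE_OF_BIRTH", value: "2011" },
        ] },
      ],
    },
  },
};

// Hypothetical stand-in for
// $.match_info.rules.extraction[?(@.template == T)].fields[?(@.field == F)]
function fieldsOf(res, template, fieldName) {
  return res.match_info.rules.extraction
    .filter((rec) => rec.template === template)
    .flatMap((rec) => rec.fields.filter((f) => f.field === fieldName));
}

// Lower-cased names mimic the case-insensitive regex.
const names = new Set(
  fieldsOf(result, "CHARACTERS", "CHARACTER_NAME").map((f) => f.value.toLowerCase())
);
const conflicting = fieldsOf(result, "CHARACTERS", "CHARACTER_NICKNAME")
  .filter((f) => names.has(f.value.toLowerCase()));

// The boolean output reports whether at least one item matched.
const hasConflict = conflicting.length > 0;
console.log(hasConflict); // true
```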

The syntax of the queryJsonPath method is:

moduleVariable.queryJsonPath(result, {
    jsPath: parameterValue,
    outputType: parameterValue,
    modifierFlag: parameterValue
})

where parameterValue is one of the possible values of the corresponding parameters described below:

moduleVariable is the variable to which the module is assigned with require().
result is the object containing the analysis results.
jsPath is the JSONPath expression that selects the nodes to query. It can be:
- A standard JSONPath.
Or:
- An array of multiple JSONPaths (see modify).
outputType is the output type, it can be:
- array: an array containing all the values matched by the JSONPath query.
- string: the values matched by the JSONPath query are concatenated into a single string, separated by a whitespace.
- regex: the values matched by the JSONPath query are turned into a regular expression.
- object: the values matched by the JSONPath query are turned into an object in which each property has a value of true.
- boolean: true if the JSONPath query matches at least one item, false otherwise.
- count: an integer representing the number of matched items.
- paths: an array of objects containing advanced query information for further script manipulation, in a custom format optimized for Studio's output model (see below).
- standard paths: the same output as the standard paths method of the official JSONPath library.
- nodes: the same output as the standard nodes method of the official JSONPath library.
- regex object: (extractions only) an object containing several regular expressions for sophisticated JSONPath validations involving aggregated data (see below).
modifierFlag is a flag whose effect depends on outputType:
- If outputType is array, string or count and the flag is set to false, duplicates are not removed from the matched items (this is the default behavior).
- If outputType is regex and the flag is set to true, the regular expression is case insensitive.
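Several of the simpler output types can be seen as different views of the same list of matched values. The sketch below is a hypothetical illustration, not the module's code: it assumes the matched values are already available as an array, models duplicate removal as an explicit removeDuplicates argument instead of guessing how modifierFlag maps to it, and assumes that for object each matched value becomes a property name.

```javascript
// Hypothetical sketch: shapes one list of matched values into some of the
// documented output types. Not the jsonPlug implementation.
function shapeOutput(values, outputType, removeDuplicates) {
  // A Set keeps the first occurrence of each value, in order.
  const items = removeDuplicates ? [...new Set(values)] : values.slice();
  switch (outputType) {
    case "array":
      return items;
    case "string":
      return items.join(" "); // values separated by a whitespace
    case "object": {
      const obj = {};
      for (const v of items) obj[v] = true; // assumed: each value becomes a property set to true
      return obj;
    }
    case "boolean":
      return items.length > 0; // true if anything matched
    case "count":
      return items.length;
    default:
      throw new Error("Output type not covered by this sketch: " + outputType);
  }
}

const matched = ["Peter Parker", "Miles Morales", "Peter Parker"];
console.log(shapeOutput(matched, "count", false)); // 3
console.log(shapeOutput(matched, "count", true));  // 2
console.log(shapeOutput(matched, "string", true)); // Peter Parker Miles Morales
```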

Alternatively, you can use this syntax:

moduleVariable.queryJsonPath(result, jsPath, outputType, modifierFlag)

Note

The parameters in the second syntax must be declared in this exact order.
Both syntaxes can be used interchangeably.
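One way to picture how both syntaxes can coexist is a small normalization step that turns either form into the same options object. This is a hypothetical sketch, not the module's code; normalizeQueryArgs is an invented name. The one subtlety it highlights is that jsPath may itself be an array of JSONPaths, so an array must not be mistaken for an options object.

```javascript
// Hypothetical sketch, not the module's code: normalizes the object syntax
// and the positional syntax to the same options object.
function normalizeQueryArgs(result, jsPathOrOptions, outputType, modifierFlag) {
  const isOptionsObject =
    jsPathOrOptions !== null &&
    typeof jsPathOrOptions === "object" &&
    !Array.isArray(jsPathOrOptions); // an array is a list of JSONPaths, not options
  if (isOptionsObject) {
    return { result, ...jsPathOrOptions };
  }
  // Positional syntax: the order is fixed (jsPath, outputType, modifierFlag).
  return { result, jsPath: jsPathOrOptions, outputType, modifierFlag };
}

const result = { match_info: {} }; // placeholder result object
const path = "$.match_info.rules.extraction[*].fields[*].value";
const a = normalizeQueryArgs(result, { jsPath: path, outputType: "count", modifierFlag: false });
const b = normalizeQueryArgs(result, path, "count", false);
console.log(a.jsPath === b.jsPath && a.outputType === b.outputType); // true
```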

'paths' output type

If outputType is set to paths, the output is an array containing one object for each item matched by the JSONPath(s) in the result object.

The properties of each object depend on the type of the matched item:

for a record or a field in the extraction property, the following properties will be created:
- path_type: will contain a string with the value of extraction in this case.
- record_id: the position (index) of the matched record in the extraction array.
- template_name: the record template name.
- field_id: the position (index) of the matched field inside its parent record.
- field_name: the name of the matched field.
- field_value: the value of the matched field.
- field_instance: the instance array of the matched field.
- field_instance_offset: an array of objects containing all the positional information of the matched extractions, structured as:
  - begin: the starting position of that textual instance.
  - end: the ending position of that textual instance.
  - length: the length of the textual match.
- field_instance_text: an array containing all the non-normalized pieces of text matched by this extraction.
- field_confidence: the confidence score of the matched field.
- field_siblings: an array of objects containing information about the siblings of the matched field, useful to check aggregated data:
  - field_name: the name of the sibling field.
  - field_value: the value of the sibling field.
  - field_id: the position (index) of the sibling field inside its parent record.
for a category in the categorization property, the following properties will be created:
- path_type: will contain a string with the value of categorization in this case.
- id: the position (index) of the matched category in the categorization array.
- name: the name of the matched category.
- label: the label of the matched category.
- score: the score of the matched category.
- compound: the compound score of the matched category.
- frequency: the frequency of the matched category.
- winner: the winner status of the matched category.
- rules: the rules array of the matched category.
for a segment in the segment property, the following properties will be created:
- path_type: will contain a string with the value of segment in this case.
- index: the index of the matched segment within the segment array.
- name: the name of the matched segment.
- positions: an array of objects containing all the positional information of the matched segment, structured as:
  - begin: the starting position of that segment instance.
  - end: the ending position of that segment instance.
  - score: the score of that segment instance.
  - rules: the rules information of that segment instance.
for a section in the sections property, the following properties will be created:
- path_type: will contain a string with the value of sections in this case.
- index: the index of the matched section within the sections array.
- name: the name of the matched section.
- positions: an array of objects containing all the positional information of the matched section, structured as:
  - begin: the starting position of that section instance.
  - end: the ending position of that section instance.

This output lets you use the JSONPath query language to select only certain items and loop over them, for advanced use cases that the standard jsonPlug actions cannot handle.
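As an illustration of such a loop, the sketch below works on a hand-written array shaped like a paths output. Property names follow the list above; the values mirror the CHARACTERS example, while the field order, the string type of the dates and the categorization entry (name, label, score) are invented for illustration. In a real script the array would come from queryJsonPath with outputType set to paths.

```javascript
// Hypothetical "paths" output: property names follow the documented list,
// values mirror the CHARACTERS example. The category entry is invented
// only to show filtering by path_type.
const paths = [
  {
    path_type: "extraction", record_id: 0, template_name: "CHARACTERS",
    field_id: 1, field_name: "CHARACTER_NAME", field_value: "Peter Parker",
    field_siblings: [
      { field_name: "CHARACTER_NICKNAME", field_value: "Spider-Man", field_id: 0 },
      { field_name: "CHARACTER_DATE_OF_BIRTH", field_value: "1962", field_id: 2 },
    ],
  },
  {
    path_type: "extraction", record_id: 1, template_name: "CHARACTERS",
    field_id: 1, field_name: "CHARACTER_NAME", field_value: "Miles Morales",
    field_siblings: [
      { field_name: "CHARACTER_NICKNAME", field_value: "Peter Parker", field_id: 0 },
      { field_name: "CHARACTER_DATE_OF_BIRTH", field_value: "2011", field_id: 2 },
    ],
  },
  { path_type: "categorization", id: 0, name: "hypothetical_category", label: "Hypothetical", score: 10 },
];

// Loop only over extracted CHARACTER_NAME fields and read a sibling field
// of the same record, without a second query.
const dobByName = {};
for (const p of paths) {
  if (p.path_type !== "extraction" || p.field_name !== "CHARACTER_NAME") continue;
  const dob = p.field_siblings.find((s) => s.field_name === "CHARACTER_DATE_OF_BIRTH");
  if (dob) dobByName[p.field_value] = dob.field_value;
}
console.log(dobByName); // { 'Peter Parker': '1962', 'Miles Morales': '2011' }
```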

Warning

This output is a snapshot of the result object at the time the queryJsonPath method is invoked; it does not reflect any changes made to the result object afterwards.

'regex object' output type

If outputType is set to regex object, the method returns a single object with seven properties:

path_type: always extraction, since this output currently supports only extractions.
record_ids: a regex that matches all the indexes of the matched records.
templates: a regex that matches all the template names matched by the JSONPath query.
field_names: a regex that matches all the field names matched by the JSONPath query.
field_values: a regex that matches all the extracted values matched by the JSONPath query.
field_instance_offsets: a regex that matches all the extraction offsets matched by the JSONPath query.
field_instance_text: a regex that matches all the instance texts matched by the JSONPath query.

These properties provide advanced regex capabilities that can be used to quickly validate complex aggregated data.
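Each regex-valued property can be pictured as one pattern that matches the distinct values collected from all the matched fields. The sketch below assumes that model: buildRegexObject, anyOf and the recordIndex/template/field/value input shape are hypothetical, the whole-value anchoring is an assumption, and field_instance_offsets and field_instance_text are left out because their format is not described here. Because RegExp.prototype.test converts its argument to a string, a pattern built from record indexes can be tested directly against a numeric index such as @.index.

```javascript
// Hypothetical sketch, not the module's code: builds some of the
// regex-based properties of a "regex object" from matched extraction fields.
function escapeRegex(s) {
  return String(s).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Whole-value alternation of the distinct values (numbers become strings).
function anyOf(values) {
  const unique = [...new Set(values.map(String))].map(escapeRegex);
  return new RegExp("^(?:" + unique.join("|") + ")$");
}

function buildRegexObject(matches) {
  return {
    path_type: "extraction", // only extractions are supported
    record_ids: anyOf(matches.map((m) => m.recordIndex)),
    templates: anyOf(matches.map((m) => m.template)),
    field_names: anyOf(matches.map((m) => m.field)),
    field_values: anyOf(matches.map((m) => m.value)),
  };
}

// Matched fields, e.g. from [?(@.value == 'false')] on hypothetical Covers records.
const ro = buildRegexObject([
  { recordIndex: 0, template: "Covers", field: "FIRE", value: "false" },
  { recordIndex: 3, template: "Covers", field: "THEFT", value: "false" },
]);
console.log(ro.record_ids.test(3));        // true  (3 is converted to "3")
console.log(ro.record_ids.test(13));       // false (whole values only)
console.log(ro.field_names.test("FLOOD")); // false
```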

Warning

This output is based on a snapshot of the result object at the time the queryJsonPath method is invoked; it does not reflect any changes made to the result object afterwards.
To use the record_ids regex in a jsonPlug call, set the addIndexToRecords option of that call to true, so that each record gets the index property the regex is tested against.

Here's an example:

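function onFinalize(result) {
    // Query all the Covers fields whose extracted value is 'false'
    var false_covers = jsonPlug.queryJsonPath(result, {
        jsPath: "$.match_info.rules.extraction[?(@.template == 'Covers')].fields[?(@.value == 'false')]",
        outputType: "regex object"
    });
    // Regex of the indexes of the affected records
    var false_covers_ids = false_covers.record_ids;
    // Regex of the names of the fields extracted with a value of 'false'
    var false_covers_names = false_covers.field_names;

    // In the affected records only, delete the fields with the same names and a value of 'true'
    jsonPlug.jsonPlug(result, {
        action: "delete",
        jsPathConditionFlag: true,
        jsPath: "$.match_info.rules.extraction[?(@.template == 'Covers' && " + false_covers_ids +
        ".test(@.index) )].fields[?(" + false_covers_names + ".test(@.field) && @.value == 'true' )]",
        jsPathAction: "#this#",
        recursive: true,
        addIndexToRecords: true // adds the index property tested by false_covers_ids
    });

    return result;
}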

This code identifies Covers records where the same cover has both a false and a true value, and deletes the true ones. To avoid hard-coding all the possible cover names (there can be hundreds), the code performs the following steps:

Matches all the Covers records having a child field with an extracted value of false, and collects their record ids (record_ids) and the names of those fields (field_names).
Once the querying phase is over, calls the main jsonPlug method with the addIndexToRecords option, which adds an index property to each record.
Uses the record_ids regex from the regex object to enter only the records affected by the issue.
Within those records, uses the field_names regex to select only the fields with a duplication issue, and matches those with an extracted value of true.
Uses the delete action to remove those fields.
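The same steps can be replayed on a hand-built extraction array to make the record and field filtering concrete. This is a plain JavaScript simulation under stated assumptions, not jsonPlug itself: the record shape, the index property (standing in for what addIndexToRecords adds) and the cover names FIRE and THEFT are hypothetical, and loops replace the JSONPath filters. The second record, which has no false field, keeps its true FIRE field because its index is not matched by the record-id regex.

```javascript
// Hypothetical mock of result.match_info.rules.extraction after
// addIndexToRecords: true (each record has an index). Cover names are invented.
const extraction = [
  { template: "Covers", index: 0, fields: [
    { field: "FIRE", value: "false" },
    { field: "FIRE", value: "true" },
    { field: "THEFT", value: "true" },
  ] },
  { template: "Covers", index: 1, fields: [
    { field: "FIRE", value: "true" },
  ] },
];

function escapeRegex(s) {
  return String(s).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}
// Whole-value alternation; with no values it only matches "", so nothing is deleted.
function anyOf(values) {
  const unique = [...new Set(values.map(String))].map(escapeRegex);
  return new RegExp("^(?:" + unique.join("|") + ")$");
}

// Step 1: collect record indexes and names of the fields whose value is 'false'.
const falseMatches = [];
for (const rec of extraction) {
  if (rec.template !== "Covers") continue;
  for (const f of rec.fields) {
    if (f.value === "false") falseMatches.push({ index: rec.index, field: f.field });
  }
}
const falseCoversIds = anyOf(falseMatches.map((m) => m.index));
const falseCoversNames = anyOf(falseMatches.map((m) => m.field));

// Remaining steps: enter only the affected records and delete the 'true'
// fields whose names also appear among the 'false' fields.
for (const rec of extraction) {
  if (rec.template !== "Covers" || !falseCoversIds.test(rec.index)) continue;
  rec.fields = rec.fields.filter((f) => !(falseCoversNames.test(f.field) && f.value === "true"));
}
console.log(JSON.stringify(extraction.map((r) => r.fields)));
```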