Skip to content

mergepost

Overview

The mergepost module allows you to manipulate the extraction output.

It has the following methods:

  • FIELD_CLONE
  • MERGE_BY_VALUE
  • MERGE_BY_INSTANCE
  • MERGE_BY_INSTANCE_BEGIN_ONLY
  • REPLACE_FIELD_VALUE
  • RECORD_CLONE
  • load
  • apply
  • getLastError

When in Studio you install the mergepost module in your project, Studio modifies the main.jr file to insert this statement at the beginning of the file:

var mergepost = require("modules/mergepost");

The statement above sets a variable with an instance of the module so that you can use it inside event handling functions.

All methods except for load and getLastError must be used in the onFinalize function, because they act on the analysis results available when this function is run.

The load method must be used in the initialize function, because it is the right place for the initialization of objects needed in other event handling functions.

The getLastError method must be used in the initialize function, because it retrieves the message corresponding to the last error that occurred when the load method fails.

FIELD_CLONE

The FIELD_CLONE method creates a new field whose value is a copy of the value of an existing field.
It must be used in the onFinalize function when extractions results are available.

Consider the following template:

TEMPLATE(PERSONAL_DATA)
{
    @NAME,
    @NAME_INITIALS,
    @ADDRESS,
    @PHONE
}

When the following rule:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @NAME[TYPE(NPH)]
    }
}

is applied to this text:

Sherlock Holmes is a private investigator.

you get:

With this code:

function onFinalize(result) {    
    mergepost.FIELD_CLONE(result, "PERSONAL_DATA", "NAME", "NAME_INITIALS");
    return result
}

the output becomes:

In a common use case, the value of the new field is then processed. You can see an example of this in the description of the REPLACE_FIELD_VALUE below.

The syntax of the method is:

moduleVariable.FIELD_CLONE(result, templateName, fieldName, cloneFieldName)

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.
  • templateName is the template name of the records in which to create the new field. It can be an asterisk (*), which means "any template".
  • fieldName is the name of the field to clone.
  • cloneFieldName is the name of the clone field.

MERGE_BY_VALUE

The MERGE_BY_VALUE method merges all the records of a template when they have the same value for one or more given fields.
It must be used in the onFinalize function when extraction results are available.

For example, consider the following template:

TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Job,
    @Product,
    @Company,
    @Role
}

If these rules:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)] 
        <1:5>
        @Role[LEMMA("purchaser")]
        <1:4>
        @Product[KEYWORD("iPhone")]
    }

    IDENTIFY(PERSONAL_DATA)
    {

        @Job[LEMMA("software engineer")]
        <1:5>
        @Company[TYPE(COM)]
        <1:4>
        @Name[TYPE(NPH)]
    }
}

are applied to this input text:

John Markovitch is the first purchaser of the new iPhone. The best software engineer for Samsung is John Markovitch.

you will get:

As you can see, there are two records about the same person (John) with different information.
If the rules above are applied to the same input text and you have this code:

function onFinalize(result) {
    mergepost.MERGE_BY_VALUE(result, "PERSONAL_DATA", ["Name"])
    return result;
}

you will get the following output:

The MERGE_BY_VALUE method gathered all the information around the "attractor field" value, in this case John for field Name, creating a unique record.

Note

When merging instances with different confidence scores, the field confidence score is calculated with the same formula used for the confidence of instances. The same happens with MERGE_BY_INSTANCE and MERGE_BY_INSTANCE_BEGIN_ONLY.

The syntax of the MERGE_BY_VALUE method is:

moduleVariable.MERGE_BY_VALUE(result, templateName, aggregatorFields[, inhibitorFields])

or:

moduleVariable.MERGE_BY_VALUE(result, templateName, aggregatorFields, inhibitorFields[, caseFlag])

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.
  • templateName is the template name of the records to merge.
  • aggregatorFields is an array containing the field names around which the information is gathered. In case of a single field, you can also use a string instead of an array.
  • inhibitorFields is an optional array containing the names of fields that, if contained in a record, prevent it to be merged. In case of a single field, you can also use a string instead of an array.
  • caseFlag is an optional boolean that applies the merge considering the case of the first field value around which the information is gathered (see the example below).

For example, given the following template:

TEMPLATE(PERSONAL_DATA)
{
    @Name,
    @Job,
    @Product,
    @Company,
    @Role
}

and the following extraction rules with the TEXT transformer applied to the Name field:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH)]|[TEXT] 
        <1:5>
        @Role[LEMMA("purchaser")]
        <1:4>
        @Product[KEYWORD("iPhone")]
    }

    IDENTIFY(PERSONAL_DATA)
    {

        @Job[LEMMA("software engineer")]
        <1:5>
        @Company[TYPE(COM)]
        <1:4>
        @Name[TYPE(NPH)]|[TEXT]
    }
}

applied to this text:

JOHN MARKOVITCH is the first purchaser of the new iPhone. The best software engineer for Samsung is John Markovitch.

with this code:

function onFinalize(result) {
    mergepost.MERGE_BY_VALUE(result, "PERSONAL_DATA", ["Name"], [], true)
    return result;
}

you will get:

Thanks to the true boolean, the merge is based on the first occurrence of the Name field value JOHN MARKOWITCH written in uppercase.

Note

Even though you don't need the inhibitorFields parameter, you must declare it empty if you need to use the caseFlag parameter.

MERGE_BY_INSTANCE

The MERGE_BY_INSTANCE method merges all the records of a template according to the instance of a value for one or more given fields. Unlike MERGE_BY_VALUE, this allows you to distinguish between identical textual values that do not correspond to the same entity.

The method must be used in the onFinalize function when extraction results are available.

Note

The (optional) use of the SOLITARY attribute is recommended when using this method.

For example, consider this template:

TEMPLATE(PERSONAL_DATA)
{
    @Name (S),
    @Address,
    @Phone
}

With these rules:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH) - TYPE(PRO)]
        <-9:9>
        @Address[TYPE(ADR)]
    }

    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH) - TYPE(PRO)]
        <:9>
        @Phone[TYPE(PHO)]
    }
}

applied to this text:

Steven lives in Baltimora Street and his phone number is 3333333333. His friend Mary bought a house in Tropicana Street where she lives with her husband Steven and his phone number is 3333333331.

you will get:

With this code:

function onFinalize(result) {
    mergepost.MERGE_BY_INSTANCE(result, "PERSONAL_DATA", ["Name"])
    return result;
} 

and with this other one:

function onFinalize(result) {
    mergepost.MERGE_BY_VALUE(result, "PERSONAL_DATA", ["Name"])
    return result;
} 

you will get:

MERGE_BY_INSTANCE MERGE_BY_VALUE

As you can see, MERGE_BY_VALUE is based on identical textual values but it does not recognize different entities, while MERGE_BY_INSTANCE allows you to disambiguate between different entities having a common textual value.

The syntax is:

moduleVariable.MERGE_BY_INSTANCE(result, template, aggregatorFields[, inhibitorField])

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.
  • templateName is the template name of the records to merge.
  • aggregatorFields is an array containing the field names around which the information is gathered. In case of a single field, you can also use a string instead of an array.
  • inhibitorFields is an optional array containing the names of fields that, if contained in a record, prevent it to be merged. In case of a single field, you can also use a string instead of an array.

MERGE_INSTANCE_BEGIN_ONLY

Like MERGE_BY_INSTANCE, with the difference that the MERGE_INSTANCE_BEGIN_ONLY method only considers the beginning of the value instance.

Note

The (optional) use of the SOLITARY attribute is recommended when using this method.

For example, consider this template:

TEMPLATE(PERSONAL_DATA)
{
    @Name (S),
    @Address,
    @Phone
}

With these rules:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH) - TYPE(PRO)]
        <-9:9>
        @Address[TYPE(ADR)]
    }

    IDENTIFY(PERSONAL_DATA)
    {
        @Name[TYPE(NPH) - TYPE(PRO)]
        <:9>
        @Phone[TYPE(PHO)]
    }
}

applied to this text:

Steven lives in Baltimora Street and his phone number is 3333333333. His friend Mary bought a house in Tropicana Street where she lives with her husband Steven and his phone number is 3333333331.

you will get:

With this code:

function onFinalize(result) {
    mergepost.MERGE_INSTANCE_BEGIN_ONLY(result, "PERSONAL_DATA", ["Name"])
    return result;
} 

you will get:

The syntax is:

moduleVariable.MERGE_INSTANCE_BEGIN_ONLY_ONLY(result, template, aggregatorFields[, inhibitorField])

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.
  • templateName is the template name of the records to merge.
  • aggregatorFields is an array containing the field names around which the information is gathered. In case of a single field, you can also use a string instead of an array.
  • inhibitorFields is an optional array containing the names of fields that, if contained in a record, prevent it to be merged. In case of a single field, you can also use a string instead of an array.

REPLACE_FIELD_VALUE

The REPLACE_FIELD_VALUE changes fields values.
It must be used in the onFinalize function when extractions results are available.

Consider the same example used to describe the FIELD_CLONE method, which create a "clone" on an existing field. Then this code:

function onFinalize(result) {
    mergepost.FIELD_CLONE(result, "PERSONAL_DATA", "NAME", "NAME_INITIALS");
    mergepost.REPLACE_FIELD_VALUE(result, "PERSONAL_DATA", false, "NAME_INITIALS", {"Sherlock Holmes": "S.H.", "John Watson": "J.W."});
    return result;
}

creates the clone field and changes its value. The output becomes:

The syntax is:

moduleVariable.REPLACE_FIELD_VALUE(result, template, eraseNotFound, field, replacementRules)

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.
  • template is the template name of the records to act upon.
  • eraseNotFound is a boolean. It can be:
    • true: all the fields not matching the replacement rules (see below) are deleted from their respective records. If, as a consequence of fields removal, the record becomes empty, also the record is deleted.
    • false: all the fields not matching the replacement rules are left untouched.
  • field is the field name.
  • replacementRules is a JSON object representing replacement rules. Each property name is interpreted as the value to replace, while the property value is the replacement value.

RECORD_CLONE

The RECORD_CLONE method allows you to clone a record with its fields.

For example, consider these two templates:

TEMPLATE(ATHLETES)
{
    @FULL_NAME,
    @AGE,
    @DATE_OF_BIRTH,
    @PLACE_OF_BIRTH
}

TEMPLATE(OLYMPIC_CHAMPIONS)
{
    @FULL_NAME,
    @AGE,
    @DATE_OF_BIRTH,
    @PLACE_OF_BIRTH
}

If this rule

SCOPE SENTENCE 
{
    IDENTIFY(ATHLETES)
    {
        @FULL_NAME[TYPE(NPH)]
        <>
        @AGE[PATTERN("[1-9][0-9]")]
        <>
        @DATE_OF_BIRTH[TYPE(DAT)]
        <>
        @PLACE_OF_BIRTH[SYNCON(100005092)]//@SYN: #100005092# [Baltimore]
    }
}

is applied to this text:

Michael Phelps (37) was born on the 30th of June 1985 in Baltimore.

you will get:

With this code:

function onFinalize(result) {
    mergepost.RECORD_CLONE(result, "ATHLETES", "OLYMPIC_CHAMPIONS");
    return result
}

you will get:

The syntax for RECORD_CLONE is:

moduleVariable.RECORD_CLONE(result, template, newTemplate)

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.
  • template is the name of the template on which the clonation is based.
  • newTemplate is the cloned template.

load

The load method prepares one or more of the operations that can be attained with the methods above, but using as its source a configuration file generated when importing a project created with a legacy edition of Studio. Prepared operations are then applied using the apply method.

Warning

The use of the load method is not required in cases other than that indicated above and the import procedure already generates the appropriate statements inside the main.jr file, so there are basically no cases in which you have to write code that uses this method.

For example, when importing an old project, Studio may generate this code:

var mergepost = require("modules/mergepost");

function initialize(cmdline) {
    if (!mergepost.load('Config.xml')) {
        CONSOLE.error(mergepost.getLastError());
        return false;
    }
    return true;
}

function onFinalize(result) {
    result = mergepost.apply(result);
    return result;
}

The syntax is:

moduleVariable.load(configPath)

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • configPath is the path of the configuration file generated by the import procedure.

The method returns true in case of success, false otherwise. In case of failure it sets an error message you can retrieve with the getLastError method.

apply

The apply method performs all the operations prepared with the invocation of the load method.
It must be used in the onFinalize function when extractions results are available.

For example:

function onFinalize(result) {
    result = mergepost.apply(result);
    return result;
}

The syntax is:

moduleVariable.apply(result)

where:

  • moduleVariable is the variable corresponding to the module and set with require().
  • result is the object containing the analysis results.

getLastError

The getLastError method retrieves the message corresponding to the last error that occurred when theload method fails. Use it to display the error message.

For example:

function initialize(cmdline) {
    if (!mergepost.load('Config.xml')) {
        CONSOLE.error(mergepost.getLastError());
        return false;
    }
}

The syntax is:

moduleVariable.getLastError()

where moduleVariable is the variable corresponding to the module and set with require().