linkPost
Overview
The linkPost module allows you to modify extraction results by:
- Copying or moving a field from one record to another.
- Removing records based on the presence/absence of fields.
The module has these methods:
LINK_FIELD
VALIDATE_FIELD
REMOVE_WEAK_FIELD
load
apply
getLastError
close
When you install the linkPost module in your Studio project, Studio modifies the main.jr file by inserting this statement at the beginning of the file:
var linkPost = require("modules/linkPost");
The statement above sets the linkPost variable to an instance of the module so that you can use its methods inside event handling functions.
LINK_FIELD, VALIDATE_FIELD and apply must be used in the onFinalize function, because they act on the analysis results available when this function is run.
The load method must be used in the initialize function, because it is the right place to initialize objects needed in other event handling functions.
The getLastError method must be used together with the load method.
The close method, when used, must be invoked in the shutdown function, because it's the right place to free up the resources allocated by the module.
LINK_FIELD
The LINK_FIELD method allows you to copy a field from one record to another record that has a specific "attractor" field, optionally deleting the field from the source record.
The template of the destination record must have a field with the same name as the source field.
For example, if you have these templates:
TEMPLATE(SUPERHEROES)
{
@HUMAN_FULL_NAME,
@JOB,
@SUPERHERO_NAME,
@SUPER_POWER,
@PUBLISHER
}
TEMPLATE(COMIC_PUBLISHER_AND_WRITERS)
{
@PUBLISHER,
@WRITER
}
and these rules:
SCOPE PARAGRAPH
{
IDENTIFY(SUPERHEROES)
{
@HUMAN_FULL_NAME[TYPE(NPH)]
<>
@JOB[LEMMA("lawyer")]
<>
@SUPERHERO_NAME[KEYWORD("daredevil")]
<>
@SUPER_POWER[KEYWORD("enhanced sense", "enhanced senses")]
}
IDENTIFY(COMIC_PUBLISHER_AND_WRITERS)
{
@PUBLISHER[SYNCON(100173496)]//@SYN: #100173496# [Marvel Comics Group]
<>
@WRITER[KEYWORD("brian michael bendis")]
}
}
applied to this text:
Matt Murdock is a lawyer by day and Daredevil by night, a blind superhero but with other extremely enhanced senses. Daredevil is published by Marvel Comics and his best writer is Brian Michael Bendis.
you will get these records:
Template: COMIC_PUBLISHER_AND_WRITERS
Field | Value |
---|---|
@PUBLISHER | Marvel Comics |
@WRITER | Brian Michael Bendis |
Template: SUPERHEROES
Field | Value |
---|---|
@HUMAN_FULL_NAME | Matt Murdock |
@JOB | lawyer |
@SUPERHERO_NAME | Daredevil |
@SUPER_POWER | enhanced senses |
With this code:
function onFinalize(result) {
    linkPost.LINK_FIELD(result, {
        sourceFieldName: "PUBLISHER",
        sourceRecordTemplate: "COMIC_PUBLISHER_AND_WRITERS",
        destinationRecordTemplate: "SUPERHEROES",
        attractorName: "SUPERHERO_NAME",
        sourceFieldValue: "*",
        attractorValue: "*",
        scope: "paragraph",
        deleteFlag: true
    });
    return result;
}
or with this other one:
function onFinalize(result) {
    linkPost.LINK_FIELD(result, "COMIC_PUBLISHER_AND_WRITERS", "PUBLISHER", "*", "SUPERHEROES", "SUPERHERO_NAME", "*", "PARAGRAPH", true);
    return result;
}
you will get these records:
Template: COMIC_PUBLISHER_AND_WRITERS
Field | Value |
---|---|
@WRITER | Brian Michael Bendis |
Template: SUPERHEROES
Field | Value |
---|---|
@HUMAN_FULL_NAME | Matt Murdock |
@JOB | lawyer |
@SUPERHERO_NAME | Daredevil |
@SUPER_POWER | enhanced senses |
@PUBLISHER | Marvel Comics |
As you can see, the PUBLISHER field has been deleted from the COMIC_PUBLISHER_AND_WRITERS record and added to the SUPERHEROES record.
The LINK_FIELD method requires:
- moduleVariable is the variable corresponding to the module and set with require().
- result is the object containing the analysis results.
- arguments is an object containing the parameters to be used. Such parameters are:
  - sourceFieldName is the name of the source field.
  - sourceRecordTemplate is the template name of the source record.
  - destinationRecordTemplate is the template name of the destination record.
  - attractorName: the attractor is a field that must exist in the destination record in order to "attract" the source field. This parameter is the name of the attractor field.
  - sourceFieldValue (optional) is the required value of the source field: the field is used only if it matches this value. It can be an asterisk (*), meaning any value, which is the default if this parameter is omitted.
  - attractorValue (optional) is the required value of the attractor field: the source field is copied/moved only if the attractor field has this value. It can be an asterisk (*), meaning any value, which is the default if this parameter is omitted.
  - scope (case insensitive) is the scope from which the source and the attractor fields must have been extracted. It can be:
    - document
    - section
    - paragraph
    - sentence
    - clause
    - phrase
    - token
    - segment
    - segment interval
    Use segment if you require the fields to be part of the same segment, no matter which portion of the segment. Use segment interval if you want the fields to also come from the same portion of a segment, for example the same sentence.
  - deleteFlag is a boolean (false by default). Set it to true to remove the field from its source record, thus moving the field instead of copying it.
  - segmentName (optional) is the segment name(s) to specify if scope is segment or segment interval. It can be either a string or an array of strings, if multiple segments are to be checked.
  - sectionName (optional) is the section name(s) to specify if scope is section. It can be either a string or an array of strings, if multiple sections are to be checked.
Note
Both segmentName and sectionName also support the overlap syntax, which follows the same format as the rule scope options. For example, declaring SEGMENT1:SEGMENT2 links the target fields only if their positions fall within the intersection of both segments.
This syntax can also be used inside an array of segments/sections, for example ["SEGMENT1:SEGMENT2", "SEGMENT3"].
Warning
If an intersection is used within the segmentName parameter, the only supported scope is segment interval. Declaring a different scope will trigger an exception.
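The constraint in the warning above can be checked before invoking LINK_FIELD. The following is only a sketch: checkSegmentScope is a hypothetical helper, not a method of the linkPost module, and it assumes that an overlap is recognizable by the colon in a segment name.

```javascript
// Hypothetical pre-check, NOT part of the linkPost module.
// Returns an error message, or null if the combination looks valid.
function checkSegmentScope(args) {
    // segmentName may be a string or an array of strings; normalize to an array.
    var names = args.segmentName === undefined ? [] : [].concat(args.segmentName);
    // Assumption: the overlap syntax is identified by a colon (SEGMENT1:SEGMENT2).
    var usesOverlap = names.some(function (name) {
        return name.indexOf(":") !== -1;
    });
    // scope is documented as case insensitive, so normalize before comparing.
    var scope = String(args.scope).toLowerCase();
    if (usesOverlap && scope !== "segment interval") {
        return "overlap syntax in segmentName requires scope 'segment interval'";
    }
    return null;
}
```

A non-null return value signals a combination that, according to the warning, would make the module raise an exception.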
In the second example code, the syntax is:
moduleVariable.LINK_FIELD(result, sourceRecordTemplate, sourceFieldName, sourceFieldValue, destinationRecordTemplate, attractorName, attractorValue, scope, deleteFlag[, segmentName])
Note
When using the second syntax, all parameters must be specified in the provided order.
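Because the positional syntax depends on parameter order, it may help to see how the named parameters map onto it. The sketch below uses a hypothetical helper, toPositionalArgs, which is not part of the linkPost module; it only reorders values according to the syntax shown above and applies the documented defaults (an asterisk for the two values, false for deleteFlag).

```javascript
// Hypothetical helper, NOT part of the linkPost module: converts the
// named-parameter object into the argument order of the positional syntax
// (everything that follows "result").
function toPositionalArgs(opts) {
    var args = [
        opts.sourceRecordTemplate,
        opts.sourceFieldName,
        opts.sourceFieldValue === undefined ? "*" : opts.sourceFieldValue,
        opts.destinationRecordTemplate,
        opts.attractorName,
        opts.attractorValue === undefined ? "*" : opts.attractorValue,
        opts.scope,
        opts.deleteFlag === true
    ];
    // segmentName is the optional trailing parameter.
    if (opts.segmentName !== undefined) {
        args.push(opts.segmentName);
    }
    return args;
}

var positional = toPositionalArgs({
    sourceFieldName: "PUBLISHER",
    sourceRecordTemplate: "COMIC_PUBLISHER_AND_WRITERS",
    destinationRecordTemplate: "SUPERHEROES",
    attractorName: "SUPERHERO_NAME",
    scope: "paragraph",
    deleteFlag: true
});
// Inside onFinalize, the array could then be passed along as:
// linkPost.LINK_FIELD.apply(linkPost, [result].concat(positional));
```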
In the example above, the field was moved from one record to another because:
- The template of the destination record has a field with the same name as the source field.
- The destination record contains the attractor field SUPERHERO_NAME, which was extracted from the same scope (PARAGRAPH) as the source field.
- The last parameter of the method invocation was set to true to delete the field from the source record after the copy.
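The copy/move behavior described above can be illustrated with a simplified model. This is a sketch under loud assumptions, not the module's implementation: records are modeled as plain objects ({ template, fields }), which is not the structure of the real result object; the scope check is omitted entirely; values are compared by strict equality; and the source field is deleted only when at least one destination record was found. The name linkFieldSketch is hypothetical.

```javascript
// Simplified model of LINK_FIELD semantics. NOT the real implementation:
// the record structure is invented for illustration and scope is ignored.
function valueMatches(actual, expected) {
    // "*" (or an omitted value) means any value.
    return expected === undefined || expected === "*" || actual === expected;
}

function hasField(record, name) {
    return Object.prototype.hasOwnProperty.call(record.fields, name);
}

function linkFieldSketch(records, opts) {
    // Destination records must contain a matching attractor field.
    var destinations = records.filter(function (r) {
        return r.template === opts.destinationRecordTemplate &&
            hasField(r, opts.attractorName) &&
            valueMatches(r.fields[opts.attractorName], opts.attractorValue);
    });
    records.forEach(function (src) {
        if (src.template !== opts.sourceRecordTemplate ||
                !hasField(src, opts.sourceFieldName) ||
                !valueMatches(src.fields[opts.sourceFieldName], opts.sourceFieldValue)) {
            return;
        }
        // Copy the source field into every destination record.
        destinations.forEach(function (dst) {
            dst.fields[opts.sourceFieldName] = src.fields[opts.sourceFieldName];
        });
        // Assumption: delete from the source only if the field was actually linked.
        if (destinations.length > 0 && opts.deleteFlag === true) {
            delete src.fields[opts.sourceFieldName];
        }
    });
    return records;
}

var records = [
    { template: "COMIC_PUBLISHER_AND_WRITERS",
      fields: { PUBLISHER: "Marvel Comics", WRITER: "Brian Michael Bendis" } },
    { template: "SUPERHEROES",
      fields: { HUMAN_FULL_NAME: "Matt Murdock", JOB: "lawyer",
                SUPERHERO_NAME: "Daredevil", SUPER_POWER: "enhanced senses" } }
];

linkFieldSketch(records, {
    sourceFieldName: "PUBLISHER",
    sourceRecordTemplate: "COMIC_PUBLISHER_AND_WRITERS",
    destinationRecordTemplate: "SUPERHEROES",
    attractorName: "SUPERHERO_NAME",
    deleteFlag: true
});
```

After the call, the modeled SUPERHEROES record carries the PUBLISHER value and the modeled COMIC_PUBLISHER_AND_WRITERS record keeps only WRITER, mirroring the tables in the example.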
VALIDATE_FIELD
The VALIDATE_FIELD method is used to delete records from the extraction results when they don't have "validation" fields. Removal can be inhibited by specifying supplemental fields whose presence counterbalances the absence of the validation fields.
For example, with this template:
TEMPLATE(RUBIK_S_CUBE)
{
@CUBE,
@INVENTOR,
@INVENTION_YEAR
}
and this rule:
SCOPE SENTENCE
{
IDENTIFY(RUBIK_S_CUBE)
{
@CUBE[LEMMA("Rubik's cube")]
<>
@INVENTION_YEAR[TYPE(DAT)]
}
}
applied to this input text:
The Rubik's cube was invented in 1975.
you will get this record:
Template: RUBIK_S_CUBE
Field | Value |
---|---|
@CUBE | Rubik's cube |
@INVENTION_YEAR | 1975 |
With this code:
function onFinalize(result) {
    linkPost.VALIDATE_FIELD(result, {
        templateName: "RUBIK_S_CUBE",
        validatorFields: "INVENTOR"
    });
    return result;
}
or with this other one:
function onFinalize(result) {
    linkPost.VALIDATE_FIELD(result, "RUBIK_S_CUBE", "INVENTOR");
    return result;
}
you will get no output: the record based on the RUBIK_S_CUBE template has been removed from the extraction output because it doesn't contain the INVENTOR field, acting as a validator.
The VALIDATE_FIELD method requires:
- moduleVariable is the variable corresponding to the module and set with require().
- result is the object containing the analysis results.
- arguments is an object containing the parameters to be used. Such parameters are:
  - templateName is the template name of the record to filter.
  - validatorFields is a name or an array of names of validating fields. If one or more of the fields has not been extracted, the entire record is removed.
  - inhibitorFields (optional) is a name or an array of names of inhibiting fields. If one or more of the fields has been extracted, the record is not removed, even if it doesn't contain validating fields.
The syntax of VALIDATE_FIELD with the second example code is:
moduleVariable.VALIDATE_FIELD(result, templateName, validatorFields[, inhibitorFields])
Note
- The parameter in square brackets is optional and refers to inhibitorFields.
- When using the second syntax, all parameters must be specified in the provided order.
For example, with the same template, rule and text used above, this code:
function onFinalize(result) {
    linkPost.VALIDATE_FIELD(result, {
        templateName: "RUBIK_S_CUBE",
        validatorFields: "INVENTOR",
        inhibitorFields: "CUBE"
    });
    return result;
}
or this other one:
function onFinalize(result) {
    linkPost.VALIDATE_FIELD(result, "RUBIK_S_CUBE", "INVENTOR", "CUBE");
    return result;
}
produces this extraction:
Template: RUBIK_S_CUBE
Field | Value |
---|---|
@CUBE | Rubik's cube |
@INVENTION_YEAR | 1975 |
because even though the validator field INVENTOR is missing, the inhibitor field CUBE is present.
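The interplay between validator and inhibitor fields can be summarized as a small decision function. This is only a sketch of the rule as described in this section, not the module's code: keepsRecord is a hypothetical name, and a record is reduced to the list of field names extracted for it.

```javascript
// Hypothetical model of the VALIDATE_FIELD decision, NOT the real implementation.
// A name or an array of names is accepted, as for the documented parameters.
function toArray(value) {
    return value === undefined ? [] : [].concat(value);
}

function keepsRecord(extractedNames, validatorFields, inhibitorFields) {
    var isExtracted = function (name) {
        return extractedNames.indexOf(name) !== -1;
    };
    // The record is removed if one or more validator fields are missing...
    var allValidatorsPresent = toArray(validatorFields).every(isExtracted);
    // ...unless one or more inhibitor fields have been extracted.
    var anyInhibitorPresent = toArray(inhibitorFields).some(isExtracted);
    return allValidatorsPresent || anyInhibitorPresent;
}

// Field names extracted for the RUBIK_S_CUBE record in the example above.
var extracted = ["CUBE", "INVENTION_YEAR"];
var keptWithoutInhibitor = keepsRecord(extracted, "INVENTOR");       // false: record removed
var keptWithInhibitor = keepsRecord(extracted, "INVENTOR", "CUBE");  // true: record kept
```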
REMOVE_WEAK_FIELD
The REMOVE_WEAK_FIELD method is used when two or more fields share the same value: the field defined as the strong field is kept, while the field(s) defined as weak are deleted.
For example, with this template:
TEMPLATE(GRAMMAR_CLASSES)
{
@ADJECTIVE,
@NOUN
}
and this extraction rule:
SCOPE SENTENCE
{
IDENTIFY(GRAMMAR_CLASSES)
{
@ADJECTIVE[LEMMA("blue")]
OR
@NOUN[LEMMA("blue")]
}
}
applied to this input text:
The sky is blue.
you will get this output:
Template: GRAMMAR_CLASSES
Field | Value |
---|---|
@NOUN | blue |
@ADJECTIVE | blue |
With this code:
function onFinalize(result) {
    linkPost.REMOVE_WEAK_FIELD(result, {
        templateName: "GRAMMAR_CLASSES",
        strongField: "ADJECTIVE",
        weakField: "NOUN"
    });
    return result;
}
or with this other one:
function onFinalize(result) {
    linkPost.REMOVE_WEAK_FIELD(result, "GRAMMAR_CLASSES", "ADJECTIVE", "NOUN");
    return result;
}
you will get this output:
Template: GRAMMAR_CLASSES
Field | Value |
---|---|
@ADJECTIVE | blue |
As you can see, the NOUN field has been removed because it is considered as a weak field in comparison with the ADJECTIVE field.
The REMOVE_WEAK_FIELD method requires:
- moduleVariable is the variable corresponding to the module and set with require().
- result is the object containing the analysis results.
- arguments is an object containing the parameters to be used. Such parameters are:
  - templateName is the template name of the record.
  - strongField is the name of the strong field.
  - weakField is the name of the weak field(s) to remove. It can be:
    - A string in case of a single field.
    - An array of strings in case of multiple fields.
    - null: all fields with a name different from strongField, but with the same value, will be removed.
  - caseInsensitive is an optional boolean, false by default. If set to true, the weak field deletion is case insensitive.
The syntax of REMOVE_WEAK_FIELD with the second example code is:
moduleVariable.REMOVE_WEAK_FIELD(result, templateName, strongField, weakField[, caseInsensitive])
Note
- The parameter in square brackets is optional.
- When using the second syntax, all parameters must be specified in the provided order.
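The three accepted forms of weakField and the effect of caseInsensitive can be illustrated with a simplified model. This is a sketch, not the module's implementation: removeWeakFieldsSketch is a hypothetical name, a record is reduced to an array of { name, value } objects (not the real result structure), and case-insensitive comparison is assumed to mean comparing lowercased values.

```javascript
// Simplified model of REMOVE_WEAK_FIELD, NOT the real implementation.
function removeWeakFieldsSketch(fields, strongField, weakField, caseInsensitive) {
    // Assumption: case-insensitive matching compares lowercased values.
    var normalize = function (value) {
        return caseInsensitive === true ? String(value).toLowerCase() : String(value);
    };
    var strongValues = fields
        .filter(function (f) { return f.name === strongField; })
        .map(function (f) { return normalize(f.value); });
    // null means "every field other than the strong one"; otherwise a name or a list of names.
    var weakNames = weakField === null ? null : [].concat(weakField);
    return fields.filter(function (f) {
        if (f.name === strongField) {
            return true;
        }
        var isWeak = weakNames === null || weakNames.indexOf(f.name) !== -1;
        var sharesValue = strongValues.indexOf(normalize(f.value)) !== -1;
        // A weak field is dropped only when it shares its value with the strong field.
        return !(isWeak && sharesValue);
    });
}

var grammarFields = [
    { name: "NOUN", value: "blue" },
    { name: "ADJECTIVE", value: "blue" }
];
var keptFields = removeWeakFieldsSketch(grammarFields, "ADJECTIVE", "NOUN");
```

With the GRAMMAR_CLASSES example, only the ADJECTIVE field remains in keptFields, as in the output table above.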
load
The load method prepares one or more of the operations that can be performed with the methods above, but takes them from a configuration file generated when importing a project created with a legacy edition of Studio. Prepared operations are then applied using the apply method.
Warning
The load method is only needed in the case described here, and the import procedure already generates the appropriate statements inside the main.jr file, so there are basically no cases in which you have to write code that uses this method.
For example, when importing an old project, Studio may generate this code:
var linkPost = require("modules/linkPost");
function initialize(cmdline) {
    if (!linkPost.load('Config.xml')) {
        CONSOLE.error(linkPost.getLastError());
        return false;
    }
    return true;
}
function onFinalize(result) {
    result = linkPost.apply(result);
    return result;
}
The syntax is:
moduleVariable.load(configPath)
where:
- moduleVariable is the variable corresponding to the module and set with require().
- configPath is the path of the configuration file generated by the import procedure.
The method returns true in case of success, false otherwise. In case of failure, it sets an error message that you can retrieve with the getLastError method.
apply
The apply method performs all the operations prepared by the invocation of the load method.
It must be used in the onFinalize function, when extraction results are available.
For example:
function onFinalize(result) {
    result = linkPost.apply(result);
    return result;
}
The syntax is:
moduleVariable.apply(result)
where:
- moduleVariable is the variable corresponding to the module and set with require().
- result is the object containing the analysis results.
getLastError
The getLastError method retrieves the message describing the error that occurred when the last invocation of the load method failed. Use it to display the error message.
For example:
function initialize(cmdline) {
    if (!linkPost.load('Config.xml')) {
        CONSOLE.error(linkPost.getLastError());
        return false;
    }
    return true;
}
The syntax is:
moduleVariable.getLastError()
where moduleVariable is the variable corresponding to the module and set with require().
close
The close method is used to free up the resources allocated by the linkPost module object.
It's not mandatory to invoke this method, but if you decide to do it, invoke it inside the shutdown function.
For example:
function shutdown() {
    linkPost.close();
}
The syntax is:
moduleVariable.close()
where moduleVariable is the variable corresponding to the module and set with require().