SCRIPT attribute

Overview

The SCRIPT attribute allows the user to integrate scripting functions into categorization and extraction rules.

A function is invoked in the attribute, and the value of the attribute (true or false) is the function's return value.

By combining SCRIPT with the other standard attributes, rules can apply custom, project-specific logic that the standard attributes alone cannot express.

The syntax for the SCRIPT attribute is:

SCRIPT("functionName")

In order to be referenced, a scripting function must be defined inside the main.jr file:

function functionName(token_index, param) {
    // place code here
    return true;
}

where:

  • token_index is the token ordinal number in the disambiguation output.
  • param is an optional parameter that can be passed by the attribute. It is always a string.

Info

It is not possible to pass additional parameters directly to functions called within the SCRIPT attribute. To address this limitation, consider including extra information within the param string. Afterwards, you can split the string and manipulate its components according to your specific needs.
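
The packing approach described above can be sketched as follows. This is only an illustration under stated assumptions: the `key=value;key=value` format and the names parseParam, lengthInRange and textLengthInRange are hypothetical, and it is assumed that the characters `=` and `;` are accepted inside the attribute string. Only DIS.getTokenText comes from this page. A rule would reference the function as SCRIPT("textLengthInRange:min=3;max=10").

```javascript
// Hypothetical helper: turns "min=3;max=10" into { min: "3", max: "10" }.
function parseParam(param) {
    var result = {};
    if (!param) {
        return result; // param is optional, so it may be missing
    }
    var pairs = param.split(";");
    for (var i = 0; i < pairs.length; i++) {
        var separator = pairs[i].indexOf("=");
        if (separator > 0) {
            var key = pairs[i].substring(0, separator).trim();
            result[key] = pairs[i].substring(separator + 1).trim();
        }
    }
    return result;
}

// Hypothetical check, kept free of engine calls so it can run anywhere.
function lengthInRange(text, param) {
    var options = parseParam(param);
    // Values arrive as strings, so convert them before comparing numbers.
    var min = options.min === undefined ? 0 : parseInt(options.min, 10);
    var max = options.max === undefined ? Infinity : parseInt(options.max, 10);
    return text.length >= min && text.length <= max;
}

// Inside main.jr, the SCRIPT function would only fetch the text and delegate.
function textLengthInRange(token_index, param) {
    return lengthInRange(DIS.getTokenText(token_index), param);
}
```

Keeping the parsing and the comparison in plain functions means they can be checked outside the engine, while the function referenced by SCRIPT stays a thin wrapper.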

The SCRIPT attribute expects the function to return either true or false.

Wrapped calls to separate JR modules are also supported, for example:

function functionName(token_index, param) {
    return myScript.functionName(token_index, param);
}

This makes the functionName function exported by the myScript module available within rules.

An alternative syntax for the SCRIPT attribute is:

SCRIPT("functionName[:parameter]", "function2Name[:parameter]", ...)

where parameter is the value that the function receives as its param argument.

When several functions are listed, the comma acts as an OR operator: at least one of the functions must return true for the SCRIPT attribute to be satisfied.

Info

If an exception occurs during the execution of the specified functions, the engine does not stop, but the SCRIPT attribute is evaluated as false.
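
The two behaviors described above (OR between listed functions, and an exception turning the attribute false) can be modeled with a small sketch. This is not the engine's code: evaluateScript and the functions registry are hypothetical names, and left-to-right evaluation, stopping at the first true result, and splitting at the first colon are assumptions made for illustration.

```javascript
// Illustrative model only: this is NOT the engine's implementation.
// specs: strings such as "functionName" or "functionName:parameter".
// functions: hypothetical registry mapping names to user-defined functions.
function evaluateScript(specs, functions, token_index) {
    try {
        for (var i = 0; i < specs.length; i++) {
            // Split "name:parameter" at the first colon; the parameter is optional.
            var colon = specs[i].indexOf(":");
            var name = colon === -1 ? specs[i] : specs[i].substring(0, colon);
            var param = colon === -1 ? undefined : specs[i].substring(colon + 1);
            if (functions[name](token_index, param) === true) {
                return true; // OR semantics: one true result is enough
            }
        }
    } catch (error) {
        // An exception does not stop processing; the attribute is simply false.
        return false;
    }
    return false;
}

// Hypothetical functions used to exercise the model.
var functions = {
    alwaysFalse: function () { return false; },
    equalsParam: function (token_index, param) { return String(token_index) === param; },
    broken: function () { throw new Error("boom"); }
};

var orResult = evaluateScript(["alwaysFalse", "equalsParam:2"], functions, 2);
var noneResult = evaluateScript(["alwaysFalse", "equalsParam:5"], functions, 2);
var errorResult = evaluateScript(["broken", "equalsParam:2"], functions, 2);
```

In this model orResult is true because the second function succeeds, noneResult is false because no function returns true, and errorResult is false because the first function throws.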

Because it can navigate the whole disambiguation output, the function can add very specific constraints to the rule element on which the attribute is set.

Consider the following extraction rule:

SCOPE SENTENCE
{
    IDENTIFY(PEOPLE)
    {
        @NAME[TYPE(NPH) + SCRIPT("excludingLemma:John Smith")]
    }
}

This rule aims to extract people's names, except John Smith.
The following function prevents the extraction of any token whose lemma matches the value passed as the parameter.

function excludingLemma(token_index, lemma) {
    var token = DIS.getToken(token_index);
    if (token.lemma==lemma)
        return false;
    return true;
}

Because the check is performed on the lemma (base form) provided by the disambiguation output rather than on the surface text, the function prevents the extraction of John Smith even when an abbreviated form of the name occurs in the text.
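
To see the lemma check in isolation, the function can be exercised outside the engine with a stand-in for the DIS object. The stub below is hypothetical and exists only for this illustration (it should not be copied into main.jr, where the engine provides DIS); the sample tokens and the surface forms in the comments are assumptions, and only the lemma property is taken from the example above.

```javascript
// Hypothetical stand-in for the engine-provided DIS object, for illustration only.
// Each entry mimics a token; only the lemma property is used by the function.
var sampleTokens = [
    { lemma: "John Smith" },  // e.g. the surface text could be "J. Smith"
    { lemma: "Mary Jones" }   // e.g. the surface text could be "Mary Jones"
];
var DIS = {
    getToken: function (token_index) {
        return sampleTokens[token_index];
    }
};

// Same function as in the example above.
function excludingLemma(token_index, lemma) {
    var token = DIS.getToken(token_index);
    if (token.lemma == lemma)
        return false;
    return true;
}

var results = [
    excludingLemma(0, "John Smith"),
    excludingLemma(1, "John Smith")
];
console.log(results); // prints [ false, true ]
```

The first token is rejected because its lemma is John Smith, whatever its surface form, while the second token passes the check.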

Using SCRIPT alone and in combination

You can use the SCRIPT attribute either alone or in combination with other attributes. Using it in combination is recommended. For example, consider this user-defined function:

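function textIsTitleCase(index) {
    // One uppercase letter (including accented Latin-1 capitals and Ÿ),
    // followed by lowercase letters up to a word boundary
    var title_case = /^([A-ZÀ-ÖØ-ÞŸ][a-zà-öø-ÿ]*)\b/;
    // Get text from token
    var text = DIS.getTokenText(index);
    return title_case.test(text);
}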

This function returns true only for tokens whose text is in title case, so a rule that uses it extracts only those tokens.

For example, with this template:

TEMPLATE(WORD_CASE)
{
    @TITLE_CASE
}

this extraction rule:

SCOPE SENTENCE
{
    IDENTIFY(WORD_CASE)
    {
        @TITLE_CASE[SCRIPT("textIsTitleCase")]
    }
}

applied to this text:

John lives in London. His brother lives in Manchester.

you will get these records:

Template: WORD_CASE

Field Value
@TITLE_CASE John

Template: WORD_CASE

Field Value
@TITLE_CASE London

Template: WORD_CASE

Field Value
@TITLE_CASE His

Template: WORD_CASE

Field Value
@TITLE_CASE Manchester

With this rule:

SCOPE SENTENCE
{
    IDENTIFY(WORD_CASE)
    {
        @TITLE_CASE[TYPE(ADJ:p) + SCRIPT("textIsTitleCase")]
    }
}

you will get this record:

Template: WORD_CASE

Field Value
@TITLE_CASE His

In the first rule the script analyzes every token, whereas in the second rule it acts only on the tokens selected by the TYPE attribute.