SCRIPT attribute

Overview

The SCRIPT attribute allows the user to integrate scripting functions into categorization and extraction rules.

The attribute invokes a function, and the attribute's value (true or false) is the function's return value.

Combining SCRIPT with the other standard attributes makes it possible to add custom, project-specific conditions to a rule.

The syntax for the SCRIPT attribute is:

SCRIPT("functionName")

To be referenced, the script function must be defined in a script file, typically main.jr:

function functionName(token_index, param) {
}

where:

  • token_index is the token ordinal number in the disambiguation output.
  • param is an optional parameter that can be passed by the attribute.

An alternative syntax for the SCRIPT attribute is:

SCRIPT("functionName[:parameter]"[, "functionName[:parameter]", ...])

where parameter is the value passed to the function as its param argument.

When several functions are listed, the comma that separates them acts as an OR operator: the SCRIPT attribute is satisfied if at least one of the functions returns true.

Info

If an exception occurs during the execution of the specified functions, the engine does not stop, but the SCRIPT attribute is evaluated as FALSE.
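
The evaluation rules above can be modeled in a few lines of plain JavaScript. This is only an illustration of the described behavior, not the engine's implementation: evaluateScriptAttribute, the functions table and the three sample functions are hypothetical names. It also assumes that an exception in any listed function makes the whole attribute false and that any truthy return value counts as true.

```javascript
// Illustration only: models the OR and exception rules described in the text.
// All names below are hypothetical; the real evaluation happens inside the engine.
var functions = {
    isEven: function (tokenIndex) { return tokenIndex % 2 === 0; },
    indexBelow: function (tokenIndex, param) { return tokenIndex < Number(param); },
    alwaysThrows: function () { throw new Error("boom"); }
};

function evaluateScriptAttribute(functions, tokenIndex, specs) {
    // specs holds the strings written inside SCRIPT(...), e.g. "name:param"
    for (var i = 0; i < specs.length; i++) {
        var sep = specs[i].indexOf(":");
        var name = sep === -1 ? specs[i] : specs[i].slice(0, sep);
        var param = sep === -1 ? undefined : specs[i].slice(sep + 1);
        try {
            if (functions[name](tokenIndex, param)) {
                return true; // OR semantics: one true result is enough
            }
        } catch (e) {
            return false; // assumption: an exception makes the attribute FALSE
        }
    }
    return false; // no listed function returned true
}

console.log(evaluateScriptAttribute(functions, 3, ["isEven", "indexBelow:5"])); // true
console.log(evaluateScriptAttribute(functions, 4, ["alwaysThrows", "isEven"])); // false
```

In the first call isEven(3) is false but indexBelow(3, "5") is true, so the attribute holds. In the second call the exception is caught and the attribute evaluates to false without stopping the program.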

Because the function can navigate the whole disambiguation output, it can add very specific constraints to the attribute to which it relates.

Consider the following extraction rule:

SCOPE SENTENCE
{
    IDENTIFY(PEOPLE)
    {
        @NAME[TYPE(NPH) + SCRIPT("ExcludingLemma:John Smith")]
    }
}

The rule extracts people's names, except John Smith.
The following function returns false, which blocks the extraction, when the token's lemma equals the value passed as parameter.

function ExcludingLemma(token_index, lemma) {
    var token = DIS.getToken(token_index);
    if (token.lemma==lemma)
        return false;
    return true;
}

Because the check is made on the base form (lemma) found in the disambiguation output, the function prevents the extraction of John Smith even when an abbreviated form of the name occurs in the text.
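
To try the function outside the engine, the DIS object can be replaced with a small stand-in. The sketch below is an assumption-laden test harness: the DIS object here is a mock, and the three tokens with their lemma values are invented sample data, chosen so that an abbreviated form shares the lemma of the full name.

```javascript
// Mock of the engine-provided DIS object, for running outside the engine.
// The token data is invented for illustration.
var DIS = {
    tokens: [
        { text: "John Smith", lemma: "John Smith" },
        { text: "J. Smith", lemma: "John Smith" }, // abbreviated form, same lemma (assumed)
        { text: "Mary Jones", lemma: "Mary Jones" }
    ],
    getToken: function (index) { return this.tokens[index]; }
};

// Same logic as the function in the text, written as a single comparison.
function ExcludingLemma(token_index, lemma) {
    var token = DIS.getToken(token_index);
    return token.lemma !== lemma;
}

for (var i = 0; i < DIS.tokens.length; i++) {
    console.log(DIS.tokens[i].text + " -> " + ExcludingLemma(i, "John Smith"));
}
// John Smith -> false
// J. Smith -> false
// Mary Jones -> true
```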

Using SCRIPT alone and in combination

You can use the SCRIPT attribute either alone or in combination with other attributes. Using it in combination is recommended. For example, consider this user-defined function:

function textIsTitleCase(index) {
    // constant regex
    var title_case = /^([A-ZÀ-Ÿ][a-zà-ÿ]*)\b/;
    // Get text from token
    var text = DIS.getTokenText(index);
    return title_case.test(text);
}

This function returns true only for tokens whose text is in title case, so a rule that uses it extracts only those tokens.
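
The behavior of the regular expression can be checked outside the engine with a stand-in for DIS. In this sketch the DIS object is a mock and the token texts are sample data chosen for illustration. Only ASCII samples are used, so accented text is not exercised here.

```javascript
// Mock of the engine-provided DIS object; the token texts are sample data.
var DIS = {
    texts: ["John", "lives", "in", "London", "USA", "McDonald"],
    getTokenText: function (index) { return this.texts[index]; }
};

// Same regular expression as the function in the text.
function textIsTitleCase(index) {
    var title_case = /^([A-ZÀ-Ÿ][a-zà-ÿ]*)\b/;
    return title_case.test(DIS.getTokenText(index));
}

for (var i = 0; i < DIS.texts.length; i++) {
    console.log(DIS.texts[i] + " -> " + textIsTitleCase(i));
}
// John -> true
// lives -> false
// in -> false
// London -> true
// USA -> false
// McDonald -> false
```

Words with an uppercase letter after the first character, such as "USA" and "McDonald", are rejected because the pattern requires a word boundary right after the run of lowercase letters.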

For example, with this template:

TEMPLATE(WORD_CASE)
{
    @TITLE_CASE
}

this extraction rule:

SCOPE SENTENCE
{
    IDENTIFY(WORD_CASE)
    {
        @TITLE_CASE[SCRIPT("textIsTitleCase")]
    }
}

applied to this text:

John lives in London. His brother lives in Manchester.

you will get an extraction for every token whose text is in title case.

With this rule:

SCOPE SENTENCE
{
    IDENTIFY(WORD_CASE)
    {
        @TITLE_CASE[TYPE(ADJ:p) + SCRIPT("textIsTitleCase")]
    }
}

you will get extractions only for tokens that match TYPE(ADJ:p) and whose text is in title case.

In the first case the script analyzes every token. In the second case, thanks to the TYPE attribute, the script acts only on a filtered list of tokens.