SCRIPT

Overview

The SCRIPT transformation option changes extracted values by means of scripting functions. Both built-in and user-defined functions can be used.

Consider this example using the toUpper built-in function:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[LEMMA("twenty")]|[SCRIPT("toUpper")]
    }
}

If the rule is applied to this input text:

Jane and Markus married twenty years ago

the rule extracts TWENTY, in uppercase, even though the matched text is lowercase.

SCRIPT can be combined with the other transformation options using the plus (+) sign. In such a combination, the transformers are applied in sequence (see the example for the winnerTag built-in function below).

The syntax of the SCRIPT transformation option in an extraction rule is:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[SCRIPT("function1 name[:parameter]" [, "function2 name[:parameter]" ...])]
    }
}

where parameter is an optional parameter of the function.

Apart from the brackets around attribute and around SCRIPT(...), all other square brackets indicate optional parts.

When multiple functions are specified, each function acts on the output of the previous one, and the value actually extracted is the output of the last function.
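The chaining behavior can be pictured as a left fold over the list of functions. The following is a minimal JavaScript sketch, not the engine's implementation: it assumes every function follows the (tokenID, extraction, parameter) signature described for user-defined functions, that a missing parameter is passed as an empty string, and that the spec is split at the first colon. The helper names `parseSpec` and `applyScripts` and the sample functions `upper` and `suffix` are hypothetical.

```javascript
// Hypothetical model of SCRIPT("f1[:p1]", "f2[:p2]", ...): NOT the engine's code.

// Split a "name[:parameter]" spec at the first colon.
// Assumption: a missing parameter is represented as an empty string.
function parseSpec(spec) {
  const i = spec.indexOf(":");
  return i < 0
    ? { name: spec, parameter: "" }
    : { name: spec.slice(0, i), parameter: spec.slice(i + 1) };
}

// Feed each function the output of the previous one; the last output wins.
function applyScripts(functions, specs, tokenID, extraction) {
  return specs.reduce((value, spec) => {
    const { name, parameter } = parseSpec(spec);
    const fn = functions[name];
    if (typeof fn !== "function") {
      throw new Error("Unknown function: " + name);
    }
    return fn(tokenID, value, parameter);
  }, extraction);
}

// Two hypothetical functions used only to show that order matters.
const functions = {
  upper: (tokenID, extraction, parameter) => extraction.toUpperCase(),
  suffix: (tokenID, extraction, parameter) => extraction + parameter,
};

console.log(applyScripts(functions, ["upper", "suffix:s"], 0, "twenty")); // prints "TWENTYs"
console.log(applyScripts(functions, ["suffix:s", "upper"], 0, "twenty")); // prints "TWENTYS"
```

Swapping the two specs changes the result, because the second function always receives what the first one returned.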

User-defined functions

User-defined functions must be defined in the main.jr file.
Their definition must have this syntax:

function name(tokenID, extraction, parameter)

The names of the parameters do not matter, but their position determines their role, and all three must be declared, even if some are not used in the body of the function.

In the first parameter, the text intelligence engine passes the ID of the text token it is examining when it invokes the function during the extraction of the field. Combined with the methods of the DIS predefined object, this ID allows sophisticated transformations based on the properties of the token.

In the second parameter, the engine passes, as a string, the value extracted up to that point.

In the third parameter, the engine passes the value of the parameter, if any, specified in the rule after the function name and the colon. This allows parametric transformations: the same function can be called with different parameters depending on the condition or rule.

The function must return a string, and the engine uses that return value as the new value of the current extraction.
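As an illustration, here is a user-defined function written with the required signature. The name `appendUnit` and its behavior are invented for this example: it ignores `tokenID` (which would normally be used together with the DIS object) and appends the rule's parameter to the extracted value. Following the documented syntax, a rule would presumably reference it as `SCRIPT("appendUnit:percent")`.

```javascript
// Hypothetical user-defined function for main.jr, following the documented
// signature (tokenID, extraction, parameter). All three parameters are
// declared even though tokenID is not used here.
function appendUnit(tokenID, extraction, parameter) {
  // Without a parameter, leave the extracted value unchanged.
  if (!parameter) {
    return extraction;
  }
  // Return a string: the engine uses it as the new extracted value.
  return String(extraction) + " " + parameter;
}
```

With `SCRIPT("appendUnit:percent")`, an extracted value of 12 would become "12 percent".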

Built-in functions

These are the built-in functions that can be used with the SCRIPT transformation option:

  • toUpper
  • toLower
  • replaceString
  • winnerTag

toUpper

toUpper turns the extracted value into uppercase. Consider the example above to see how it works.

This function does not have a parameter.

toLower

toLower turns the extracted value into lowercase. Consider this example:

SCOPE SENTENCE
{
    IDENTIFY(TEST)
    {
        @FIELD1[KEYWORD("ROSE")]|[SCRIPT("toLower")]
    }
}

If this rule is applied to this text:

He bought me a ROSE.

the rule extracts rose, in lowercase.

This function does not have a parameter.
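For comparison, user-defined functions that behave like toUpper and toLower could be written as follows. This is a sketch with hypothetical names (`myToUpper`, `myToLower`), not the code of the built-in functions; in a rule they would be invoked as `SCRIPT("myToUpper")` and `SCRIPT("myToLower")`.

```javascript
// Hypothetical user-defined equivalents of the toUpper and toLower built-ins.
// Neither uses tokenID or parameter, but both must still be declared.
function myToUpper(tokenID, extraction, parameter) {
  return String(extraction).toUpperCase();
}

function myToLower(tokenID, extraction, parameter) {
  return String(extraction).toLowerCase();
}
```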

replaceString

replaceString replaces all the occurrences of a string with another in the extracted value.

For example, if this rule:

SCOPE SENTENCE
{
    IDENTIFY(BUSINESS_STATS)
    {
        @LENGTH_OF_TIME[KEYWORD("qt")]|[SCRIPT("replaceString:qt|qtr")]
    }
}

is applied to this input text:

Profit is up 12% in 3rd qt.

the extracted value is qtr instead of qt.

The replaceString function requires a parameter with this syntax:

replaceString:stringToReplace|replacementString
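The parameter format can be illustrated with a user-defined function that mimics replaceString. This is a sketch under assumptions: it splits the parameter at the first `|`, treats the search text literally (no regular expressions), is case-sensitive, and leaves the value unchanged when the parameter is malformed; the built-in function may behave differently on these points. The name `myReplaceString` is hypothetical.

```javascript
// Hypothetical imitation of replaceString:stringToReplace|replacementString.
function myReplaceString(tokenID, extraction, parameter) {
  var sep = String(parameter).indexOf("|");
  // Assumption: without a "|" separator, leave the value unchanged.
  if (sep < 0) {
    return extraction;
  }
  var search = parameter.slice(0, sep);
  var replacement = parameter.slice(sep + 1);
  // An empty search string would match between every character; skip it.
  if (search === "") {
    return extraction;
  }
  // split/join replaces every literal occurrence without regex escaping.
  return String(extraction).split(search).join(replacement);
}
```

With the parameter `qt|qtr`, the value qt becomes qtr, as in the rule above.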

winnerTag

The winnerTag function returns the names of the token's tags that match its parameter.
It is used in combination with the TAG transformation option when the extraction corresponds to a token with multiple tags.

Consider this example, in which this situation occurs:

TEMPLATE(INJURY)
{
    @TYPE_OF_INJURY,
    @LOCATION_OF_INJURY
}

TAGS
{
    @INJURY_TYPE,
    @INJURY_LOCATION
}

...

SCOPE SENTENCE
{
    TAGGER()
    {
        @INJURY_TYPE[SYNCON(105781808)] // backache
    }

    TAGGER()
    {
        @INJURY_LOCATION[SYNCON(105781808)] // backache
    }

    IDENTIFY(INJURY)
    {
        @TYPE_OF_INJURY[TAG(INJURY_TYPE)]|[TAG + SCRIPT("winnerTag:INJURY_TYPE")]
    }
}

If the extraction rule is applied to this text:

The patient suffers from backache.

the backache token is tagged twice, since it indicates both the type of injury and its location. If the extraction rule were:

IDENTIFY(INJURY)
{
    @TYPE_OF_INJURY[TAG(INJURY_TYPE)]|[TAG]
}

the extracted value would be the concatenation of both tag names.
The winnerTag function selects only the tag specified as its parameter, so the extracted value becomes the name of that tag, i.e. INJURY_TYPE.

The syntax of winnerTag is:

winnerTag:tagName1[|tagName2 ... ]

or:

winnerTag:tagListPath[|tagListPath2 ... ]

where tagListPath refers to the path of the list file containing tag names.
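The selection performed by winnerTag can be pictured as filtering the token's tags against the names listed in the parameter. The sketch below is a hypothetical model, not the built-in function: it handles only the tag-name form of the parameter (not list files), represents the token's tags as an array, and returns the matches as an array because how the engine formats multiple matches is not specified here. The helper name `selectWinnerTags` is invented for this example.

```javascript
// Hypothetical model of winnerTag's selection step: keep only the token's
// tags whose names appear in the "|"-separated parameter.
// Tag-list file paths are not handled in this sketch.
function selectWinnerTags(tokenTags, parameter) {
  var wanted = String(parameter).split("|");
  return tokenTags.filter(function (tag) {
    return wanted.indexOf(tag) >= 0;
  });
}

// The backache token carries both tags; only INJURY_TYPE is requested.
console.log(selectWinnerTags(["INJURY_TYPE", "INJURY_LOCATION"], "INJURY_TYPE"));
```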