SCRIPT
Overview
The SCRIPT transformation option allows changing output values with scripting functions. Both built-in and user-defined functions can be used.
Consider this example using the toUpper built-in function:
SCOPE SENTENCE
{
IDENTIFY(TEST)
{
@FIELD1[LEMMA("twenty")]|[SCRIPT("toUpper")]
}
}
If the rule is applied to this input text:
Jane and Markus married twenty years ago
you will get this record:
Template: TEST
Field | Value |
---|---|
@FIELD1 | TWENTY |
The rule extracts TWENTY in uppercase, even though the matched text was originally lowercase.
SCRIPT can be combined with the other transformation options using the plus (+) sign. Such a combination applies the transformers in sequence.
The syntax in an extraction rule is:
SCOPE scopeOption
{
IDENTIFY(templateName)
{
@field[attribute]|[SCRIPT("function1 name[:parameter]" [, "function2 name[:parameter]" ...])]
}
}
The syntax in a tagging rule is:
SCOPE scopeOption
{
TAGGER(tagLevel)
{
@tag[attribute]|[SCRIPT("function1 name[:parameter]" [, "function2 name[:parameter]" ...])]
}
}
where parameter is an optional parameter of the function.
Apart from those around attribute and SCRIPT, all the other square brackets indicate optional parts.
When more than one function is specified, each function acts on the output of the previous one, and the final value is the output of the last function.
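To illustrate the order of evaluation, here is a small JavaScript model of how a list of function specifications could be applied in sequence. It is only a sketch of the behavior described above, not the engine's implementation: the applyScriptChain helper, the registry object and the append function are hypothetical, and splitting each specification at the first colon is an assumption about how the parameter is separated from the function name.

```javascript
// Illustrative model of SCRIPT chaining (not the engine's code): each
// function receives the output of the previous one, and the output of
// the last function is the final value.
function applyScriptChain(tokenID, extraction, specs, registry) {
  var value = extraction;
  for (var i = 0; i < specs.length; i++) {
    // Assumption: the function name ends at the first colon, if any,
    // and whatever follows it is the parameter.
    var colon = specs[i].indexOf(":");
    var name = colon === -1 ? specs[i] : specs[i].substring(0, colon);
    var parameter = colon === -1 ? undefined : specs[i].substring(colon + 1);
    var fn = Object.prototype.hasOwnProperty.call(registry, name) ? registry[name] : undefined;
    if (typeof fn !== "function") {
      throw new Error("Unknown function: " + name);
    }
    value = fn(tokenID, value, parameter);
  }
  return value;
}

// Stand-ins using the three-argument signature of user-defined functions.
var registry = {
  toUpper: function (tokenID, extraction, parameter) {
    return extraction.toUpperCase();
  },
  // Hypothetical parametric function: appends its parameter to the value.
  append: function (tokenID, extraction, parameter) {
    return extraction + (parameter === undefined ? "" : parameter);
  }
};

// SCRIPT("toUpper", "append:s"): "twenty" -> "TWENTY" -> "TWENTYs"
var upperThenAppend = applyScriptChain(0, "twenty", ["toUpper", "append:s"], registry);
// SCRIPT("append:s", "toUpper"): "twenty" -> "twentys" -> "TWENTYS"
var appendThenUpper = applyScriptChain(0, "twenty", ["append:s", "toUpper"], registry);
```

The two calls differ only in the order of the functions, which is enough to change the final value.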
User-defined functions
User-defined functions must be defined in the main.jr file.
Their definition must have this syntax:
function name(tokenID, extraction, parameter)
The names of the parameters do not matter, but their positions determine their roles, and all three must be declared even if they are not used in the body of the function.
In the first parameter, the text intelligence engine, when it invokes the function during the extraction of the field, passes the ID of the text token it is examining. Combined with the methods of the DIS pre-defined object, this allows for sophisticated transformations based on the properties of the token.
In the second parameter the engine passes, as a string, the value extracted up to that moment.
In the third parameter, the engine passes the value of any parameter specified in the rule after the function name and the colon. This allows for parametric transformations: the same function can be called with different parameters depending on the condition or rule.
The function must return a string, and the engine uses that return value as the new value of the current extraction.
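As an illustration, here is a sketch of a parametric user-defined function, assuming main.jr holds JavaScript. The truncate function is hypothetical, and the way the parameter arrives (as a string, or missing when the rule gives none) is an assumption; the code therefore falls back to the unchanged value when no usable parameter is received.

```javascript
// Hypothetical user-defined function for main.jr (name and behavior are
// illustrative). All three parameters are declared, even though tokenID
// is not used in the body.
function truncate(tokenID, extraction, parameter) {
  // Assumption: the rule parameter arrives as a string, e.g. "3" from
  // SCRIPT("truncate:3"), and may be missing when no parameter is given.
  var maxLength = parseInt(parameter, 10);
  if (isNaN(maxLength) || maxLength < 0) {
    return extraction; // no usable parameter: leave the value unchanged
  }
  // The returned string becomes the new value of the extraction.
  return extraction.substring(0, maxLength);
}
```

Under these assumptions, a rule such as @FIELD1[LEMMA("twenty")]|[SCRIPT("truncate:3")] would turn twenty into twe, while SCRIPT("truncate:10") would leave it unchanged.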
Built-in functions
These are the built-in functions that can be used with the SCRIPT transformation option:
toUpper
toLower
replaceString
toUpper
toUpper turns the extracted value into uppercase. Consider the example above to see how it works.
This function does not have a parameter.
toLower
toLower turns the extracted value into lowercase. Consider this example:
SCOPE SENTENCE
{
IDENTIFY(TEST)
{
@FIELD1[KEYWORD("ROSE")]|[SCRIPT("toLower")]
}
}
If this rule is applied to this text:
He bought me a ROSE.
you get this record:
Template: TEST
Field | Value |
---|---|
@FIELD1 | rose |
This function does not have a parameter.
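The two built-ins cover full uppercase and full lowercase. If a different casing is needed, such as capitalizing only the first letter, a user-defined function can provide it. Below is a sketch, assuming main.jr holds JavaScript; the capitalize function is hypothetical and not one of the built-ins.

```javascript
// Hypothetical user-defined function: first character uppercase, the rest
// lowercase. It is not a built-in; it must be defined in main.jr to be used.
function capitalize(tokenID, extraction, parameter) {
  if (extraction.length === 0) {
    return extraction; // nothing to change
  }
  return extraction.charAt(0).toUpperCase() + extraction.substring(1).toLowerCase();
}
```

Used as SCRIPT("capitalize") in place of SCRIPT("toLower") in the rule above, it would be expected to produce Rose for the same input text.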
replaceString
replaceString replaces all the occurrences of a string with another in the extracted value.
For example, if this rule:
SCOPE SENTENCE
{
IDENTIFY(BUSINESS_STATS)
{
@LENGHT_OF_TIME[KEYWORD("qt")]|[SCRIPT("replaceString:qt|qtr")]
}
}
is applied to this input text:
Profit is up 12% in 3rd qt.
the extracted value is qtr instead of qt.
The replaceString function has a parameter with this syntax:
replaceString:stringToReplace|replacementString
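To make the "all the occurrences" behavior concrete, here is a JavaScript sketch that mimics replaceString using the user-defined function signature. It is a model, not the built-in's implementation: replaceStringSketch is a hypothetical name, and splitting the parameter at the first | is an assumption.

```javascript
// Sketch of replaceString's documented behavior, written with the
// user-defined signature. Assumption: the parameter is split at the first
// "|" into the string to replace and its replacement.
function replaceStringSketch(tokenID, extraction, parameter) {
  var sep = parameter.indexOf("|");
  if (sep <= 0) {
    return extraction; // no separator or empty search string: no change
  }
  var stringToReplace = parameter.substring(0, sep);
  var replacementString = parameter.substring(sep + 1);
  // split/join replaces every occurrence and treats the search string
  // literally (no regular-expression special characters).
  return extraction.split(stringToReplace).join(replacementString);
}
```

With the parameter qt|qtr, the value qt becomes qtr, as in the example above, and a value containing qt twice would have both occurrences replaced.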