The scripting language
As mentioned in the introduction to Studio languages, scripting is a completely optional but very powerful way to customize the document analysis pipeline beyond the possibilities offered by the categorization and extraction rule languages.
The project script, split into event handling functions, is executed during the various phases of document analysis.
The main.jr file
Text intelligence engines created with Studio execute the script defined in the main.jr file.
By default, when you create a project with Studio, the main.jr file only defines the initialize and shutdown functions and contains the commented-out prototypes of other functions.
In this state, the script doesn't affect the engine's results, which are thus solely determined by rules, but if you uncomment one or more functions and put specific code inside them, you can control and extend the document analysis pipeline.
All of the predefined or commented-out functions in main.jr are event handlers, namely portions of code automatically executed before or after a specific processing event.
The initialize function is executed when the engine starts, while the shutdown function is executed immediately before the engine is stopped. The other functions are called at specific moments of the document analysis pipeline.
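The event handler lifecycle described above can be illustrated with a minimal sketch. It is written in plain JavaScript on the assumption that the scripting language is JavaScript-based, and it is not the actual content of main.jr. Only the names initialize and shutdown come from the text; their signatures and return values, the onDocumentAnalyzed handler and the runEngine stand-in for the engine are all hypothetical, introduced only to show the calling order.

```javascript
// Sketch of a main.jr-style script plus a stand-in for the engine.
// Only the names initialize and shutdown come from the documentation;
// their signatures, onDocumentAnalyzed and runEngine are hypothetical.

var log = [];

// Executed once, when the engine starts.
function initialize() {
  log.push("initialize");
  return true; // assumed convention: signal a successful start-up
}

// Hypothetical handler standing in for any function that ships commented out.
function onDocumentAnalyzed(result) {
  log.push("onDocumentAnalyzed");
  result.tags = result.tags.concat("reviewed-by-script");
  return result;
}

// Executed once, immediately before the engine is stopped.
function shutdown() {
  log.push("shutdown");
}

// Stand-in engine loop: a handler is called only if the script defines it.
function runEngine(handlers, documents) {
  if (typeof handlers.initialize === "function") handlers.initialize();
  var results = documents.map(function (text) {
    var result = { text: text, tags: [] };
    if (typeof handlers.onDocumentAnalyzed === "function") {
      result = handlers.onDocumentAnalyzed(result);
    }
    return result;
  });
  if (typeof handlers.shutdown === "function") handlers.shutdown();
  return results;
}

// Default state: only initialize and shutdown are defined, results are untouched.
var defaultResults = runEngine(
  { initialize: initialize, shutdown: shutdown },
  ["first document"]
);

// Customized state: the extra handler is "uncommented" and alters the results.
log = [];
var customResults = runEngine(
  { initialize: initialize, onDocumentAnalyzed: onDocumentAnalyzed, shutdown: shutdown },
  ["first document", "second document"]
);
```

In this sketch, leaving the extra handler undefined mirrors the default project state, in which the script does not alter results. Defining it mirrors uncommenting a prototype and adding code to it.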
The phases of the pipeline and the events that are fired after those phases are shown in the following figure. Events are listed inside the dashed area.
The handling functions corresponding to events are listed in the following table.
|Event handling function
The following articles in this section describe what you can do within each of these event handling functions, while specific articles are devoted to:
- Predefined functions
- Predefined objects:
  - The CTX object returns information about the categorization and extraction processes.
  - The DIS object gives access to the results of the disambiguation phase.
  - The LAY object gives access to the layout of PDF documents.
  - The REX object allows for regular expression-based find & replace operations.
  - The UTL object provides helper utilities.
  - The XML object is used to navigate a Studio project taxonomy file.
- Predefined and custom modules.
The script can be debugged with the Studio built-in debugger.