Prepare documents
It is sometimes necessary to pre-process input documents before analyzing them.
For example, if the input document is the result of OCR, a find-and-replace operation can fix misreadings such as the lowercase letter "l" being recognized as the digit "1". Similarly, when dealing with social media messages, it may be useful to replace abbreviations and acronyms with full words, which facilitates linguistic analysis.
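The two clean-up operations mentioned above can be sketched as plain string replacements. This is a minimal illustration, assuming JavaScript as the language; the function names and the abbreviation table are hypothetical and not part of any product API.

```javascript
// Hypothetical helper: turn a digit "1" that sits between two letters
// back into a lowercase "l" (a typical OCR misreading).
function fixOcrDigitOne(text) {
    return text.replace(/([A-Za-z])1(?=[A-Za-z])/g, "$1l");
}

// Hypothetical abbreviation table; extend it to match your own corpus.
var ABBREVIATIONS = { "thx": "thanks", "gr8": "great", "btw": "by the way" };

// Replace whole-word abbreviations, ignoring case, with their expansions.
function expandAbbreviations(text) {
    var pattern = new RegExp("\\b(" + Object.keys(ABBREVIATIONS).join("|") + ")\\b", "gi");
    return text.replace(pattern, function (match) {
        return ABBREVIATIONS[match.toLowerCase()];
    });
}
```

The OCR rule is deliberately narrow: it only touches a "1" surrounded by letters, so ordinary numbers such as years are left alone, and a "1" at the start or end of a word is not corrected.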
Document pre-processing can be performed either by an external process that runs before the text intelligence engine, or by the text intelligence engine itself using the onPrepare scripting function.
Every text intelligence engine you produce and deploy with Studio invokes the onPrepare function each time a document is submitted to it, before text analysis. That function is therefore the right place for script code that manipulates the text to improve the subsequent analysis.
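As a rough sketch of what such script code might look like, the example below applies a small table of replacement rules to the incoming text. It assumes that the scripting language is JavaScript and that onPrepare receives the document text as a string and returns the text to be analyzed; check the Studio scripting reference for the actual signature. The rule table and its contents are hypothetical.

```javascript
// Hypothetical rule table: each rule is a pattern and its replacement.
var PREPARE_RULES = [
    // OCR fix: a digit "1" between two letters becomes a lowercase "l".
    { pattern: /([A-Za-z])1(?=[A-Za-z])/g, replacement: "$1l" },
    // Social media abbreviation expanded to a full word.
    { pattern: /\bthx\b/gi, replacement: "thanks" }
];

// Assumed signature: takes the submitted text, returns the prepared text.
function onPrepare(text) {
    var prepared = text;
    for (var i = 0; i < PREPARE_RULES.length; i++) {
        prepared = prepared.replace(PREPARE_RULES[i].pattern, PREPARE_RULES[i].replacement);
    }
    return prepared;
}
```

Keeping the rules in a table makes it easy to add or remove a replacement without touching the function body.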
Interactive analysis commands like Analyze, however, do not trigger the onPrepare function: they act on the files inside the test folder and treat them as already prepared. To simulate pre-processing, use the document preparation procedure described below.
- Put or create original documents in the documents folder.
  Tip: You can organize the files in sub-folders.
- In the Project window, select the files and/or folders you want to pre-process. If you select the documents folder, all its contents will be pre-processed.
- Right-click any of the selected items and select Prepare Selection. If your selection includes sub-folders of the documents folder, they are re-created in the test folder.
  Warning: If the test folder already contains files and/or folders with the same name and location as the items you have prepared, the items in the test folder are overwritten. If you want to keep them, make a backup copy first.
The outcome of the operation will be displayed in the Event Log tool window and in the Output panel of the Console tool window.
If two or more documents were prepared, a report is also produced and can be accessed through the Report tool window.