A project can consist of one or more modules, and each module corresponds to a separate text intelligence engine.
We suggest creating projects that contain a single module, so that each project corresponds to one engine. For this reason, the terms module and project are often used interchangeably in this manual.
Each module of the project corresponds to a file system folder with sub-folders. The tree structure can be browsed and managed in the Project tool window.
The main sub-folders are:
This folder is reserved for IntelliJ IDEA configuration files and you don't need to access it.
ann folder contains annotation files.
Annotation files have the .ann extension and a specific icon, and each of them is associated with a test document.
Annotation files are automatically created and updated while annotating target categories, target extractions and sections for test documents.
Annotation files are text files in brat standoff format, so you can open them in the editor, but it is not recommended to edit them by hand.
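Because annotation files use the brat standoff format, external tools can also read them. The following minimal Python sketch is not part of Studio: it reads only text-bound annotations (lines whose ID starts with T, followed by a tab, the label and character offsets, another tab, and the covered text) and skips every other brat annotation kind. Which labels and annotation kinds Studio actually writes is an assumption here, and the function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TextBoundAnnotation:
    # Hypothetical container, not a Studio type.
    ann_id: str
    label: str
    start: int
    end: int
    text: str


def parse_text_bound(lines):
    """Return the text-bound annotations (IDs starting with 'T') found in brat standoff lines."""
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("T"):
            continue  # skip relations, events, notes and blank lines
        ann_id, type_and_span, text = line.split("\t", 2)
        parts = type_and_span.split(" ")
        label = parts[0]
        # Discontinuous spans separate fragments with ';' ("10 13;20 25"):
        # keep the first start offset and the last end offset.
        fragments = " ".join(parts[1:]).split(";")
        start = int(fragments[0].split()[0])
        end = int(fragments[-1].split()[1])
        result.append(TextBoundAnnotation(ann_id, label, start, end, text))
    return result
```

For example, the line `T1\tPERSON 0 5\tAlice` yields one annotation labeled PERSON (a made-up label) covering characters 0 to 5. Reading the file this way is harmless; as noted above, editing it by hand is not recommended.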
documents folder is where you put the original documents that need to be prepared to become test documents.
The folder is initially empty and it is up to you to put files in it if necessary. You can organize the files in sub-folders if you want.
gen folder is where Studio puts analysis output files in JSON format for debugging purposes.
After the analysis, you will find three files for each test document that's been analyzed:
- XXXX.txt.dis.json.gz, containing the deep linguistic analysis output.
- XXXX.txt.ctx.json.gz, containing the categorization and extraction output.
- XXXX.txt.api.json.gz, containing the simulation of the final overall output.
XXXX is the name of the test document.
To open one of these files in the editor, right-click it and select Unzip and open in Editor.
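To inspect these output files outside Studio, a minimal Python sketch like the following could build the three paths from the naming pattern above and decompress one of them. It assumes the files are gzip-compressed UTF-8 JSON, as the .json.gz extension suggests, and that `doc_name` is the XXXX part of the pattern; the function names are hypothetical, and since the JSON structure is not described here, the sketch only loads it.

```python
import gzip
import json
from pathlib import Path

# Suffixes from the list above; the mapping keys are illustrative names.
SUFFIXES = {
    "dis": ".txt.dis.json.gz",  # deep linguistic analysis output
    "ctx": ".txt.ctx.json.gz",  # categorization and extraction output
    "api": ".txt.api.json.gz",  # simulation of the final overall output
}


def output_paths(gen_dir, doc_name):
    """Build the three expected output paths for a test document named doc_name (XXXX)."""
    gen = Path(gen_dir)
    return {kind: gen / f"{doc_name}{suffix}" for kind, suffix in SUFFIXES.items()}


def load_output(path):
    """Decompress and parse one gzip-compressed JSON output file (UTF-8 assumed)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

For instance, `load_output(output_paths("gen", "mydoc")["ctx"])` would load the categorization and extraction output of a hypothetical test document named mydoc.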
package folder contains the files generated by the deploy action.
analysis folder contains the reports generated during an all-documents analysis.
compare folder contains the reports generated during the comparison of two analysis reports.
prepare folder contains the reports of the documents preparation process.
Reports are XML files, but you can also find .csv files, which are produced when you export a report in CSV format.
rules folder contains the source code of the project module. Keep your rules, lists and script files here.
dic sub-folder contains the Knowledge Graph: do not touch this folder.
Rules files have the
A config.cr file is available in new projects.
It is a good place to put declarations and options; however, you are free to put them in other rules files if you prefer.
List files have the
Script files have the
modules sub-folder contains the script files of ready-to-use modules.
Rules and script files are plain-text files that you manage with the Studio editor, but you can also edit them outside Studio.
This is the folder in which to put the documents you use to develop and test the project.
Analysis commands act on the files in this folder as follows:
- Analyze All Documents analyzes all the files in this folder
- Analyze Selection analyzes selected files
- Analyze Document analyzes the file that is open and in focus inside the editor
Text files that are outside this folder are not taken into account by the analysis operation.
Test documents must be UTF-8 encoded plain-text files with
You can organize files into sub-folders as you like.
If you are unsure about file encoding, you can use the Ensure Charset UTF-8 command.
This folder also contains the files that are the result of document preparation.
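To check the UTF-8 requirement for test documents from outside Studio, a minimal Python sketch can list the files whose bytes are not valid UTF-8, including files in sub-folders. Unlike the Ensure Charset UTF-8 command, it only detects problems and converts nothing; the folder path and the `*.txt` pattern are assumptions, and the function name is hypothetical.

```python
from pathlib import Path


def non_utf8_files(folder, pattern="*.txt"):
    """Return files under folder (searched recursively) whose content is not valid UTF-8."""
    bad = []
    for path in sorted(Path(folder).rglob(pattern)):
        if not path.is_file():
            continue  # skip directories whose name happens to match the pattern
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad
```

Plain ASCII files pass the check, because ASCII is a subset of UTF-8. Any file the function reports can then be fixed with Ensure Charset UTF-8.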
Text files are represented by the following icons, according to their status:
- Not annotated and not validated
You can ignore any other folders and files in the project structure. However, you may find it useful to know that:
taxonomy.xml file contains the category tree for categorization projects. We suggest managing it through the Taxonomy tool window; however, you are free to edit the file by hand.
sensigrafo.xml file contains Knowledge Graph information.
.platform file contains information about the Platform integration (Enterprise Edition only).