Project structure
A project can consist of one or more modules and each module corresponds to a separate text intelligence engine.
We recommend that each project contain a single module, so that each project corresponds to one engine. For this reason, the terms module and project are often used interchangeably in this manual.
Each module of the project corresponds to a file system folder with sub-folders. The tree structure can be browsed and managed in the Project tool window.
The main sub-folders are:
.idea
ann
documents
gen
package
reports
rules
test
.idea
This folder is reserved for IntelliJ IDEA configuration files and you don't need to access it.
ann
The ann folder contains annotation files.
Annotation files have the .ann extension and a specific icon, and each of them is associated with a test document.
Annotation files are automatically created and updated while annotating target categories, target extractions and sections for test documents.
Annotation files are text files in brat standoff format, so you can open them in the editor, but it is not recommended to edit them by hand.
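If you need to inspect annotation files outside Studio without editing them, a read-only parser is enough. The following is a minimal sketch that assumes only the general brat standoff convention for text-bound annotations: an ID starting with T, a tab, the type followed by character offsets, a tab, and the covered text. The names TextBound and parse_text_bound and the sample labels are hypothetical, and which annotation types Studio actually writes is not covered here.

```python
from typing import NamedTuple


class TextBound(NamedTuple):
    """One text-bound annotation (a "T" line) of a brat standoff file."""
    ann_id: str
    label: str
    spans: list[tuple[int, int]]  # one (start, end) pair per fragment
    text: str


def parse_text_bound(ann_content: str) -> list[TextBound]:
    """Collect text-bound annotations; every other line type is skipped."""
    result = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # relations, attributes, notes, etc. are ignored
        parts = line.split("\t")
        if len(parts) < 2:
            continue  # not a well-formed annotation line
        label, _, offsets = parts[1].partition(" ")
        text = parts[2] if len(parts) > 2 else ""
        spans = []
        # Discontinuous annotations separate their fragments with ";".
        for chunk in offsets.split(";"):
            start, end = chunk.split()
            spans.append((int(start), int(end)))
        result.append(TextBound(parts[0], label, spans, text))
    return result


# Hypothetical sample content, not taken from a real Studio project.
sample = (
    "T1\tOrganization 0 4\tSony\n"
    "T2\tLocation 20 25;30 35\tTokyo Japan\n"
    "#1\tAnnotatorNotes T1\tsample note\n"
)

for ann in parse_text_bound(sample):
    print(ann.ann_id, ann.label, ann.spans)
```

The sketch only reads the content, in line with the recommendation not to edit annotation files by hand.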
documents
The documents folder is where you put the original documents that need to be prepared to become test documents.
The folder is initially empty and it is up to you to put files in it if necessary. You can organize the files in sub-folders if you want.
gen
The gen folder is where Studio puts analysis output files in JSON format for debugging purposes.
After the analysis, you will find three files for each test document that has been analyzed:
- XXXX.txt.dis.json.gz, containing the deep linguistic analysis output.
- XXXX.txt.ctx.json.gz, containing the categorization and extraction output.
- XXXX.txt.api.json.gz, containing the simulation of the final overall output.
where XXXX is the name of the test document.
To open these files in the editor, right-click and select Unzip and open in Editor.
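The same files can also be read outside Studio, because they are gzip-compressed JSON. The sketch below builds the three file names from the naming pattern described above and loads one of them. The helper names output_paths and load_output are hypothetical, the files are assumed to sit directly in the gen folder and to be UTF-8 encoded, and nothing is assumed about the structure of the JSON itself.

```python
import gzip
import json
from pathlib import Path

# The three output kinds named in the file-name pattern.
OUTPUT_KINDS = ("dis", "ctx", "api")


def output_paths(gen_dir: Path, document_name: str) -> dict[str, Path]:
    """Map each output kind to its expected file for one test document.

    document_name is the XXXX part of the pattern, without the .txt extension.
    """
    return {
        kind: gen_dir / f"{document_name}.txt.{kind}.json.gz"
        for kind in OUTPUT_KINDS
    }


def load_output(path: Path):
    """Decompress a .json.gz file and return the parsed JSON value."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, load_output(output_paths(Path("gen"), "XXXX")["api"]) would return the parsed content of the overall output file for the test document XXXX, provided that file exists.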
package
The package folder contains the files generated by the deploy action.
reports
This folder is where Studio stores all analysis, comparison and preparation reports. This folder contains the following sub-folders:
- The analysis folder contains the reports generated during an all-documents analysis.
- The compare folder contains the reports generated during the comparison of two analysis reports.
- The prepare folder contains the reports of the document preparation process.
Reports are XML files, but you can also find .csv files, which are produced when you export a report in CSV format.
rules
The rules folder contains the source code of the project module. Keep your rules, lists and script files here.
The dic sub-folder contains the knowledge graph: do not touch this folder.
Note
It is possible to replace or extend and patch the project's knowledge graph.
Rules files have the .cr extension.
An empty config.cr file is available in new projects.
Tip
The config.cr file is a good place to put declarations and options; however, you are free to put them in other rules files if you prefer.
List files have the .cl extension.
Script files have the .jr extension.
The modules sub-folder contains the script files of ready-to-use modules.
Note
Rules and script files are plain text files. You normally manage their contents in the Studio editor, but you can always edit them outside Studio.
test
This is the folder in which to put the documents you use to develop and test the project.
Analysis commands act on the files in this folder as follows:
- Analyze All Documents analyzes all the files in this folder
- Analyze Selection analyzes selected files
- Analyze Document analyzes the file that is open and in focus inside the editor
Info
Text files that are outside this folder are not taken into account by the analysis operation.
Test documents must be UTF-8 encoded plain-text files with the .txt extension.
You can organize files into sub-folders as you like.
Tip
If you are unsure about file encoding, you can use the Ensure Charset UTF-8 command.
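If you prefer to check the encoding outside Studio before copying files into the test folder, a few lines of script can flag the files that would not qualify. This is a minimal sketch, not a replacement for the Ensure Charset UTF-8 command: it only reports .txt files whose bytes do not decode as UTF-8, including files in sub-folders, and it converts nothing. The function name find_non_utf8 is hypothetical.

```python
from pathlib import Path


def find_non_utf8(test_dir: Path) -> list[Path]:
    """Return the .txt files under test_dir that are not valid UTF-8."""
    invalid = []
    # rglob also visits sub-folders, since test documents can be nested.
    for path in sorted(test_dir.rglob("*.txt")):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            invalid.append(path)
    return invalid
```

Calling find_non_utf8(Path("test")) from the module folder returns an empty list when every .txt file decodes cleanly. Note that a file can decode as UTF-8 and still have been saved in another encoding if it contains only ASCII characters, which is harmless here because ASCII text is valid UTF-8.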
This folder also contains the files that are the result of document preparation.
Text files are represented by the following icons, according to their status:
Icon | Status |
---|---|
 | Not annotated and not validated |
 | Annotated |
 | Validated |
Other items
You can ignore any other folders and files in the project structure. However, you may be interested to know that:
- The taxonomy.xml file contains the category tree for categorization projects. We suggest that you manage it through the Taxonomy tool window; however, you are free to edit the file by hand.
- The sensigrafo.xml file contains knowledge graph information.
- The .platform file contains information about the Platform integration.