
Project structure

A project can consist of one or more modules, and each module corresponds to a separate text intelligence engine.

We suggest creating projects that contain a single module, so that each project corresponds to one engine. For this reason, the terms module and project are often used interchangeably in this manual.

Each module of the project corresponds to a file system folder with sub-folders. The tree structure can be browsed and managed in the Project tool window.

The main sub-folders are:

  • .idea
  • ann
  • documents
  • gen
  • package
  • reports
  • rules
  • test

.idea

This folder is reserved for IntelliJ IDEA configuration files; you don't need to access it.

ann

The ann folder contains annotation files.
Annotation files have the .ann extension and a specific icon, and each of them is associated with a test document.
Annotation files are automatically created and updated while annotating target categories, target extractions and sections for test documents.

Annotation files are text files in brat standoff format, so you can open them in the editor, but it is not recommended to edit them by hand.
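If you want to inspect annotations from a script without editing them, the brat standoff format is simple to read. The following is a minimal Python sketch that parses only text-bound annotations (lines whose ID starts with T, such as T1, TAB, TYPE 0 4, TAB, covered text). The class and function names are hypothetical, and it is an assumption that the annotations you care about are stored as text-bound lines: this page does not describe which brat annotation kinds Studio writes for categories, extractions and sections.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    """A text-bound annotation: a brat standoff line whose ID starts with "T"."""
    ann_id: str
    type: str
    spans: list[tuple[int, int]]  # (start, end) character offsets, end exclusive
    text: str


def parse_text_bound(ann_content: str) -> list[TextBound]:
    """Return the text-bound annotations found in the content of a .ann file.

    Other brat annotation kinds (relations "R", events "E", attributes "A",
    notes "#") are ignored by this sketch.
    """
    annotations = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue
        # Tab-separated fields: ID, "TYPE START END", covered text
        ann_id, type_and_offsets, text = line.split("\t", 2)
        ann_type, offsets = type_and_offsets.split(" ", 1)
        # Discontinuous annotations list several "START END" fragments separated by ";"
        spans = []
        for fragment in offsets.split(";"):
            start, end = fragment.split()
            spans.append((int(start), int(end)))
        annotations.append(TextBound(ann_id, ann_type, spans, text))
    return annotations


if __name__ == "__main__":
    # Hypothetical sample content; the PERSON type is only an illustration
    sample = "T1\tPERSON 0 4\tJohn\n#1\tAnnotatorNotes T1\ta note\n"
    for ann in parse_text_bound(sample):
        print(ann.ann_id, ann.type, ann.spans, ann.text)
```

To read a real file, pass the function its text, for example parse_text_bound(Path("ann/doc1.ann").read_text(encoding="utf-8")), where the path is a placeholder. The sketch only reads annotation files, in line with the advice not to edit them by hand.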

documents

The documents folder is where you put the original documents that need to be prepared to become test documents.
The folder is initially empty and it is up to you to put files in it if necessary. You can organize the files in sub-folders if you want.

gen

The gen folder is where Studio puts analysis output files in JSON format for debugging purposes.
After the analysis, you will find three files for each test document that's been analyzed:

  • XXXX.txt.dis.json.gz, containing the deep linguistic analysis output.
  • XXXX.txt.ctx.json.gz, containing the categorization and extraction output.
  • XXXX.txt.api.json.gz, containing the simulation of the final overall output.

where XXXX is the name of the test document.

To open one of these files in the editor, right-click it and select Unzip and open in Editor.
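Since the output files are gzip-compressed JSON, you can also read them outside Studio. The Python sketch below builds the three expected file names for a test document and loads one of them. The helper names are hypothetical; the sketch assumes the files sit directly in the gen folder under the test document's file name (for documents in test sub-folders the layout may differ), and it does not rely on any particular JSON structure, which this page does not document.

```python
import gzip
import json
from pathlib import Path

# File-name suffixes of the three outputs written to gen for each analyzed document
OUTPUT_KINDS = ("dis", "ctx", "api")


def gen_output_paths(gen_dir: Path, test_doc_name: str) -> dict[str, Path]:
    """Map each output kind to its expected file, e.g. "doc1.txt" -> "doc1.txt.dis.json.gz"."""
    return {kind: gen_dir / f"{test_doc_name}.{kind}.json.gz" for kind in OUTPUT_KINDS}


def decode_gen_bytes(data: bytes):
    """Decompress gzip bytes and parse the JSON document they contain."""
    return json.loads(gzip.decompress(data))


def load_gen_output(path: Path):
    """Read one .json.gz output file from disk and return the parsed JSON."""
    return decode_gen_bytes(path.read_bytes())


if __name__ == "__main__":
    # "my_module/gen" and "doc1.txt" are placeholders for your module folder and document
    for kind, path in gen_output_paths(Path("my_module/gen"), "doc1.txt").items():
        if path.exists():
            print(kind, path.name, type(load_gen_output(path)).__name__)
        else:
            print(kind, path.name, "not found (has the document been analyzed?)")
```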

package

The package folder contains the files generated by the deploy action.

reports

This folder is where Studio stores all analysis, comparison and preparation reports. It contains the following sub-folders:

  • The analysis folder contains the reports generated during an all-documents analysis.
  • The compare folder contains the reports generated during the comparison of two analysis reports.
  • The prepare folder contains the reports of the document preparation process.

Reports are XML files, but you can also find .csv files here: they are produced when you export a report in CSV format.

rules

The rules folder contains the source code of the project module. Keep your rules, lists and script files here.

The dic sub-folder contains the knowledge graph: do not touch this folder.

Note

It is possible to replace the project's knowledge graph, or to extend and patch it.

Rules files have the .cr extension.
An empty config.cr file is available in new projects.

Tip

The config.cr file is a good place to put declarations and options, but you are free to put them in other rules files if you prefer.

List files have the .cl extension.

Script files have the .jr extension.

The modules sub-folder contains the script files of ready-to-use modules.

Note

Rules and script files are plain text files. You normally manage their contents with the Studio editor, but you can always edit them from outside Studio.
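The file-type conventions of the rules folder can be captured in a short inventory script. This hypothetical Python sketch groups the files under rules by extension (.cr rules, .cl lists, .jr scripts) and ignores everything inside the dic sub-folder. It only reads directory listings and never modifies any file; the function name and the "other" group are illustrative assumptions.

```python
from collections import defaultdict
from pathlib import Path

# Extensions used for rules, list and script files
KIND_BY_EXTENSION = {".cr": "rules", ".cl": "list", ".jr": "script"}


def inventory_rules_folder(rules_dir: Path) -> dict[str, list[str]]:
    """Group the files under rules by kind, using paths relative to rules_dir."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(rules_dir.rglob("*")):
        rel = path.relative_to(rules_dir)
        # Skip the knowledge graph (dic) and any directory entries
        if rel.parts[0] == "dic" or not path.is_file():
            continue
        kind = KIND_BY_EXTENSION.get(path.suffix.lower(), "other")
        groups[kind].append(rel.as_posix())
    return dict(groups)
```

For example, inventory_rules_folder(Path("my_module/rules")) in a new project would list config.cr under "rules"; my_module is a placeholder for your module folder.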

test

This is the folder in which to put the documents you use to develop and test the project.
Analysis commands act on the files in this folder as follows:

  • Analyze All Documents analyzes all the files in this folder
  • Analyze Selection analyzes selected files
  • Analyze Document analyzes the file that is open and in focus inside the editor

Info

Text files that are outside this folder are not taken into account by the analysis operation.

Test documents must be UTF-8 encoded plain-text files with .txt extension.
You can organize files into sub-folders as you like.

Tip

If you are unsure about file encoding, you can use the Ensure Charset UTF-8 command.
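As an additional check from outside Studio, the Python sketch below lists the .txt files in the test folder and its sub-folders and reports those whose bytes are not valid UTF-8. It only detects problems and converts nothing, so it is not a replacement for the Ensure Charset UTF-8 command; the function names and the my_module folder are hypothetical.

```python
from pathlib import Path


def find_test_documents(test_dir: Path) -> list[Path]:
    """Collect the .txt files in the test folder and all of its sub-folders."""
    return sorted(p for p in test_dir.rglob("*.txt") if p.is_file())


def is_utf8_bytes(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def non_utf8_documents(test_dir: Path) -> list[Path]:
    """List the test documents whose content is not valid UTF-8."""
    return [p for p in find_test_documents(test_dir) if not is_utf8_bytes(p.read_bytes())]


if __name__ == "__main__":
    # "my_module/test" is a placeholder for your module's test folder
    test_dir = Path("my_module/test")
    if test_dir.is_dir():
        for path in non_utf8_documents(test_dir):
            print("Not UTF-8:", path)
```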

This folder also contains the files that are the result of document preparation.

Text files are represented by different icons according to their status:

  • Not annotated and not validated
  • Annotated
  • Validated

Other items

You can ignore any other folders and files in the project structure. However, you may be interested to know that:

  • The taxonomy.xml file contains the category tree for categorization projects. It is suggested that you manage it through the Taxonomy tool window, but you are free to edit the file by hand.
  • The sensigrafo.xml file contains knowledge graph information.
  • The .platform file contains information about the Platform integration.