Manage libraries

To change the current project library:

  1. Select Select project library.

  2. Select the library you want to work on.

The libraries dashboard

Use the libraries dashboard to manage the libraries of the extraction project.

To access the dashboard:

  1. Open the Select project library drop-down menu on the main toolbar.

  2. Select Manage libraries.

The dashboard shows:

  • The project libraries list in the Libraries panel.
  • Details about the library in focus in the Edit Library panel.

Change the library in focus

To change the library in focus, select the library in the Libraries panel list.

Add a new library to the project

To add a new library:

  1. Select Add library in the Libraries panel.
  2. Enter the library name in New library and select:

    • Generic library, if you want to create a generic library.
  3. Select Create.

  4. In Corpora and folders, select the source for the library. You can select an existing corpus or upload documents from the file system.

    If you choose an existing corpus:

    • Select the corpus.

      To find a corpus, you can use these tools:

      • Use the search bar to look for a corpus. Your search must contain at least three characters.
      • Select Show table view to view your corpora in a table format.
      • Select Show card view to view your corpora in a card format.
      • When in card view, you can sort items by selecting one of the options from the drop-down menu.
      • When in table view, you can sort items by selecting the desired column header.

      The information displayed for existing corpora is the same as the information displayed in the Corpus info sub-panel of the main dashboard.

      The corpora displayed are those related to the Tech version previously selected in the New categorization project dialog.

    If you choose to upload documents:

    1. Select Upload > Add files to add the files you need. Multiple selection is allowed.
    2. In the Settings tab:
      1. Switch on Pdf document view to process PDF files with the Extract technology and view them in their original format. If this option is off, all documents are converted with the Apache Tika toolkit and used as plain text files.
        1. Select or deselect Enable or disable table and title detection to turn table and title detection on or off.
        2. Select or deselect Enable or disable OCR extraction to turn OCR extraction on or off.
      2. Switch off Autodetect language to disable automatic language detection. Select the preferred language manually from the Select language drop-down list.
      3. Switch off Autodetect encoding to disable automatic character encoding detection. Manually select the preferred encoding from the Select encoding drop-down list.
      4. Select Save as corpus to save your library as a corpus and type a name for it.
    3. In the Documents tab:
      1. Review the list of documents and folders to upload. To remove an item, select the X button to the right of its file name.
    4. Select Upload.

    When the upload is complete, a temporary uploaded corpus is created and made available in the window.

    Supported formats and limits

    Supported document formats are those managed by the Apache Tika toolkit. Documents are automatically converted to plain text files during upload.

    Documents are ignored if:

    • They are empty.
    • They mainly consist of nonsense words.
    • Their language is unrecognized or not supported (when automatic language detection is on).
    • They exceed the following size limits:

      • 200 MB for .zip files.
      • 1 MB for .txt files.
      • 100 MB for other file types.
  5. Select Next.
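Before uploading a large batch, you may want to flag files that would be skipped because of the size limits listed above. The following is a minimal local pre-check sketch, not a Platform feature. It assumes that 1 MB means 1024 × 1024 bytes (the unit is not defined above) and treats only zero-byte files as empty; nonsense content and unsupported languages can only be detected by Platform itself. The `precheck` function and `LIMITS` table are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Size limits listed above. Assumption: 1 MB = 1024 * 1024 bytes;
# the document does not state which definition of MB Platform uses.
MB = 1024 * 1024
LIMITS = {".zip": 200 * MB, ".txt": 1 * MB}
DEFAULT_LIMIT = 100 * MB  # all other file types


def precheck(path: Path) -> Optional[str]:
    """Return why `path` would likely be ignored on upload, or None if it passes.

    Only file size is checked: a zero-byte file is reported as empty, and
    larger files are compared against the per-extension limit.
    """
    size = path.stat().st_size
    if size == 0:
        return "empty"
    suffix = path.suffix.lower()
    limit = LIMITS.get(suffix, DEFAULT_LIMIT)
    if size > limit:
        target = suffix or "files without an extension"
        return f"exceeds the {limit // MB} MB limit for {target}"
    return None
```

For example, iterating over a local folder with `Path(...).iterdir()` and printing the non-None results lists the files to fix or split before selecting Upload.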


If you import into a new project a library that was previously exported from another project, the library is available in the new project only as a folder.

Import an annotated library

You can upload annotated libraries to Platform. For the upload procedure, see the section on adding a new library above.

Before upload, an annotated library on your computer contains two sub-folders, each with a specific function:

  • test
  • ann

The test sub-folder contains all the text files of the annotated library, while the ann sub-folder contains one file for each of those text files, with the same name but with the .ann extension.

Each file in the ann sub-folder lists all the classes that have been annotated in the corresponding text file of the test sub-folder. After the upload, you will find these annotations when viewing the documents in either the list view or the detail view.
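To catch layout mistakes before uploading, you could check that every text file in test has a matching .ann file in ann, and vice versa. This is a minimal sketch, assuming files are paired by name without extension and that hidden files (names starting with ".") should be ignored; `check_annotated_library` is a hypothetical helper, not part of Platform.

```python
from pathlib import Path
from typing import List


def check_annotated_library(root: Path) -> List[str]:
    """Return a list of layout problems; an empty list means the pairing looks consistent."""
    test_dir, ann_dir = root / "test", root / "ann"
    problems = [f"missing sub-folder: {d.name}" for d in (test_dir, ann_dir) if not d.is_dir()]
    if problems:
        return problems

    # Pair files by name without extension, ignoring hidden files.
    text_names = {p.stem for p in test_dir.iterdir()
                  if p.is_file() and not p.name.startswith(".")}
    ann_names = {p.stem for p in ann_dir.glob("*.ann")
                 if not p.name.startswith(".")}

    for name in sorted(text_names - ann_names):
        problems.append(f"no .ann file for test/{name}")
    for name in sorted(ann_names - text_names):
        problems.append(f"no text file for ann/{name}.ann")
    return problems
```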


This is a typical example of what you can find in an .ann file:

T1 Ingredients.Legumes 116 122 lentils
  • T1 identifies the annotation: the letter T followed by a sequence number. T1, for example, is the first extraction that you will see in the text in the Documents tab.
  • Ingredients is the name of the group the class belongs to.
  • Legumes is the name of the class within the group.
  • 116 122 is the position of the value in the text, given as start and end positions.
  • lentils is the annotated value.
  • For ungrouped classes, the group name is the same as the class name.
  • If a class name or a group name consists only of numbers, it is preceded by X_.
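If you want to inspect .ann files locally, the line structure described above can be parsed as in the following sketch. It assumes that fields are separated by whitespace (spaces or tabs), that the annotated value is the rest of the line and may contain spaces, that the group and class are separated by the first "." in the label, and that a numeric-only name is stored as X_ followed by the digits. The parser does not interpret the two position numbers, because their exact convention (for example, whether the end position is inclusive) is not stated here. All names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    ann_id: str   # e.g. "T1"
    group: str    # e.g. "Ingredients"
    cls: str      # e.g. "Legumes"
    start: int    # first position number, e.g. 116
    end: int      # second position number, e.g. 122
    value: str    # annotated value, e.g. "lentils"


def decode_name(name: str) -> str:
    # Assumption: a numeric-only group or class name is stored as "X_" + digits.
    if name.startswith("X_") and name[2:].isdigit():
        return name[2:]
    return name


def parse_ann_line(line: str) -> Annotation:
    # Assumption: whitespace-separated fields; the value is the remainder of the line.
    ann_id, label, start, end, value = line.strip().split(maxsplit=4)
    # Assumption: group and class are separated by the first "." in the label.
    group, _, cls = label.partition(".")
    return Annotation(ann_id, decode_name(group), decode_name(cls), int(start), int(end), value)


def parse_ann_text(text: str) -> List[Annotation]:
    # Blank lines are skipped; every other line is treated as one annotation.
    return [parse_ann_line(line) for line in text.splitlines() if line.strip()]
```

For the example above, `parse_ann_line("T1 Ingredients.Legumes 116 122 lentils")` yields group `Ingredients`, class `Legumes`, positions 116 and 122, and value `lentils`.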

Check the library source

See the Document sources area.

Mark a library as favorite

To mark a library as a favorite, select Mark as favorite library.

Add documents

To add documents to the library:

  1. Select the library in Libraries panel.
  2. In the Edit Library panel, select Upload documents.

    • Optionally, select Show advanced settings:

      • If you want to disable automatic language detection: turn off Autodetect language and choose the language from the Select language drop-down list.
      • If you want to disable automatic character encoding detection: turn off Autodetect encoding and choose the encoding from the Select encoding drop-down list.
      • When done, select Hide advanced settings.
  3. Select Add files.
  4. Select the files to upload.
    The selected files are displayed in a list. To remove a file, select the X button to the right of its name.
  5. Select Upload.

When the upload is complete, a temporary uploaded corpus is created and immediately merged with the current library. After a short time, the library is refreshed automatically and the new documents become available.

Export a library

To export a library:

  1. In the Edit Library panel, select Export library.
  2. In the Export tab of the Export documents window, enter a file name or confirm the suggested one, then select the format in Extension.
  3. Optionally, check Include original text spaced documents to export documents with their original spacing. Documents are pre-processed when uploaded, and this affects spacing: for example, a sequence of empty lines is compressed into one.
  4. In Export filter, choose which documents to export:

    • Keep All documents (the default) to download the whole corpus.
    • Select Filtered documents to download only the documents that match the selected filter:

      • Documents with annotations.
      • Documents with extractions.
  5. Select Export.
  6. In the Download tab or in the notification in the lower right corner, select Download.
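The blank-line compression mentioned in the export steps can be illustrated with a short sketch. Platform's actual pre-processing is not documented here; this function only reproduces the one example given (a run of empty lines becomes a single empty line) and additionally treats whitespace-only lines as empty, which is an assumption. It can help you predict how an exported document's layout may differ when Include original text spaced documents is not checked. `compress_blank_lines` is a hypothetical name.

```python
import re


def compress_blank_lines(text: str) -> str:
    # A run of two or more consecutive empty (or whitespace-only) lines
    # becomes a single empty line; a single empty line is left unchanged.
    return re.sub(r"\n(?:[ \t]*\n){2,}", "\n\n", text)
```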

Edit library name

To edit the library name, select Edit library name in the Edit Library panel.

Delete a library

To delete a library, select Delete library in the Edit Library panel.

Change library type

To change the library type, in the Edit Library panel, select the icon at the left of the library name, then choose one of the following types:

Information about document count, languages, and coverage

See the right side of the Edit Library panel.

Library usage


To check which experiments use the library, see the Experiments strip.

Each experiment card shows:

  • Experiment name.
  • Performance date and time.
  • Author of the experiment in the upper right corner.
  • Library name and type.
  • Model type (if any).
  • Engine type.
  • Match strategy.
  • Preferred Metrics policy.
  • Precision and Recall percentages.
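The Precision and Recall percentages follow the standard definitions: precision is the share of extracted values that are correct, and recall is the share of expected values that were extracted. The sketch below computes both from match counts. How Platform counts a match depends on the selected match strategy, which is not described here, so the counts are assumed inputs; returning 0.0 when a denominator is zero is also a convention chosen for this sketch.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) as percentages.

    tp: expected values that were extracted (true positives)
    fp: extracted values that were not expected (false positives)
    fn: expected values that were missed (false negatives)
    """
    # Convention assumed for this sketch: report 0.0 when a denominator is zero.
    precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
    recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, with 8 correct extractions, 2 spurious ones, and 4 missed ones, `precision_recall(8, 2, 4)` gives a precision of 80% and a recall of about 66.7%.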

Double-click the experiment card to view it in detail.

Generated models

To check the models generated with the selected library, see the Generated models strip.

For each model, you can see:

  • Model name.
  • Engine type.
  • Performance date and time.
  • A dot in the upper right corner that is green if the model is published.

Export a model

To export a model:

  1. Select Export model.
  2. In the dialog, type a name and select Export model.
  3. Select Download either in the dialog or in the notification in the lower right corner.

Start an experiment

To start an experiment based on a model:

  1. Select Quick start an experiment.
  2. In the dialog, optionally type a name, select a test library, then select Next.
  3. Check the summary and select the match strategy from the Metrics Matching Strategy drop-down menu, then select Start.

Publish and unpublish a model

To publish a model:

  1. Select Publish model.
  2. In the dialog, if you are publishing the model for the first time, optionally type a name. Then select Publish Model.

    The dot in the upper right corner turns green.

To unpublish a model, select Unpublish model.