
Make experiments in extraction projects

Overview

Once annotations have been created for all defined information classes, in both a training library and a test library, you can run experiments, which consist of generating the extraction model and applying it to the test library.

The training library must meet the following requirements:

  • At least ten annotated documents.
  • At least one class with ten annotations.
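
The two requirements above can be expressed as a simple check. The following is an illustrative sketch only: the document structure (`annotations` lists with a `class` key) is an assumption for the example, not part of the Platform API.

```python
# Hypothetical sketch: checking whether a training library meets the
# minimum requirements stated above.
from collections import Counter

def meets_requirements(docs):
    """docs: list of dicts like {"annotations": [{"class": "date"}, ...]}."""
    annotated_docs = [d for d in docs if d["annotations"]]
    if len(annotated_docs) < 10:          # at least ten annotated documents
        return False
    class_counts = Counter(
        a["class"] for d in annotated_docs for a in d["annotations"]
    )
    # at least one class with ten annotations
    return any(n >= 10 for n in class_counts.values())

# Ten documents, each with one "date" annotation: requirements met
docs = [{"annotations": [{"class": "date"}]} for _ in range(10)]
print(meets_requirements(docs))  # True
```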

Platform provides the following model types for extraction projects:

  • Auto-ML Extraction.
  • Explainable Extraction.
  • Studio, that is, an imported CPK.

The Auto-ML Extraction model type creates a Machine Learning model.

The Explainable Extraction model type creates a symbolic model.

The Studio model type creates a model from an imported expert.ai Studio CPK.

To start an experiment:

  1. Select Start an experiment on the toolbar of the project dashboard.
    The experiment wizard starts.
  2. In the Start an experiment dialog:

    2.1. Enter the experiment name in Name or leave it empty for automatic assignment.

    2.2. Select the test library in the Test library drop-down menu.

    2.3. Select one of the available model types (Auto-ML Extraction, Explainable Extraction, or Studio), then follow the wizard. Select Back to return to the previous stage or Cancel to quit.

  3. In Summary, select or confirm the matching strategy in the Metrics Matching Strategy drop-down menu (its default can be set in the project settings), then select Start to start the experiment.

    Note

    The matching strategy affects the values of the result metrics.
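
To see why the strategy matters, consider two generic ways of matching a predicted annotation span against a gold one. This is an illustrative sketch, not the Platform's implementation; spans are assumed to be (start, end) character offsets.

```python
# Two example matching strategies over (start, end) annotation spans.

def exact_match(pred, gold):
    # The spans must coincide exactly.
    return pred == gold

def overlap_match(pred, gold):
    # Any overlap between the spans counts as a hit.
    return pred[0] < gold[1] and gold[0] < pred[1]

def precision(preds, golds, match):
    hits = sum(any(match(p, g) for g in golds) for p in preds)
    return hits / len(preds) if preds else 0.0

golds = [(0, 10), (20, 30)]
preds = [(0, 10), (21, 29)]   # second prediction only partially overlaps

print(precision(preds, golds, exact_match))    # 0.5
print(precision(preds, golds, overlap_match))  # 1.0
```

The same predictions score differently under the two strategies, which is exactly the effect the note describes.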

The experiment progress window is displayed during the engine process.

To stop the process before it finishes, select Delete experiment.

The process consists of six sequential stages:

  1. Initialization
  2. Model generation preparation
  3. Model generation
  4. Document analysis preparation
  5. Document analysis
  6. Experiment wrap-up

Note

If the experiment fails, the Info tab appears, displaying information about the errors that occurred. You can also check the Activity log tab for further details.

Once the process is completed, the analytics are displayed in the Statistics sub-tab of the Experiments tab, where you can analyze and interpret the results.

Note

Experiment results are associated with the test library you chose in the experiment wizard, so the Experiments tab may be disabled for other libraries.

Auto-ML Extraction engine procedure

  1. Select Hide advanced parameters, at the bottom of the dialog, if you don't want to display the advanced parameters and set them manually. In the wizard they are marked with a blue caption in italics.
  2. Select the training library in the Training library drop-down menu.
  3. Select the Training documents selection policy among:

    • Only validated annotated documents (strict)
    • Only validated or annotated documents (strict) (Selected by default)
    • Prefer validated documents
    • Prefer annotated documents
    • Random selection
  4. Select Ignore non-annotated areas if you want to consider for training only the sentences around annotated text in non-validated documents.

  5. If Ignore non-annotated areas is switched on and Hide advanced parameters is switched off, enter a value in Annotated area window size, that is the number of sentences around the annotated text to be included in the training set (a number between 0 and 10).
  6. Select Next to go on.
  7. Select the Machine Learning model type (you can select up to three models to run multiple experiments at once).

    Warning

    • If you select the SVM sliding window model, the following configuration windows:

      • Hyperparameters
      • F-Beta

    will not be available in the experiment.

    Warning

    • If you select more than one model type, the following configuration windows:

      • Feature space
      • Hyperparameters
      • F-Beta
      • Auto ML parameters

    will not be available in the experiment.

  8. Select Next to go on.

  9. In Feature space (advanced): which data elements to use in feature vector creation, switch Automatic features selection on or off to enable or disable the automatic selection of the best parameter combination.
  10. Select Next to go on.
  11. Select Activate Auto-ML on every parameter in Model-specific hyperparameters to enable automatic hyperparameter configuration. It is deselected by default.
  12. Select Next to go on.
  13. Select the Precision and recall balance parameters, then select Next to go on.
  14. Select the Machine Learning automatic self-tuning process parameters, then Next to go on.
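
The "Precision and recall balance" step in the procedure above plausibly corresponds to a weighting like the standard F-beta score (an assumption; the wizard's exact parameters are not documented here). In the standard formula, beta below 1 favors precision and beta above 1 favors recall:

```python
# Standard F-beta score: a weighted harmonic mean of precision and recall.

def f_beta(precision, recall, beta):
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.9, 0.6
print(round(f_beta(p, r, 0.5), 3))  # 0.818 (precision-weighted)
print(round(f_beta(p, r, 1.0), 3))  # 0.72  (plain harmonic mean, F1)
print(round(f_beta(p, r, 2.0), 3))  # 0.643 (recall-weighted)
```

With fixed precision and recall, changing beta shifts the score toward whichever measure the weighting favors.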

Explainable Extraction engine procedure

  1. Select the training library in the Training library drop-down menu.
  2. Select the Training documents selection policy among:

    • Only validated annotated documents (strict)
    • Only validated or annotated documents (strict) (Selected by default)
    • Prefer validated documents
    • Prefer annotated documents
    • Random selection
  3. Select Hide advanced parameters if you don't want to display the advanced parameters and set them manually. In the wizard they are marked with a blue caption in italics.

  4. Select Next to go on.
  5. Select the Support, confidence and tolerance parameters, then Next to go on.
  6. Select the Active features options, then Next to go on.
  7. Select the Options for selecting best rules, then Next to go on.
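
The "Support, confidence and tolerance" step above uses terms with standard meanings in rule induction. The sketch below illustrates the generic definitions of support and confidence for a symbolic rule; the Explainable Extraction engine's exact formulas are not documented here, and the sentence structure is assumed for the example.

```python
# Generic support/confidence for a rule "pattern -> class" over N sentences:
#   support    = sentences matching the pattern / N
#   confidence = matching sentences with the correct class / matching sentences

def support_confidence(sentences, pattern, target):
    matches = [s for s in sentences if pattern in s["text"]]
    if not sentences or not matches:
        return 0.0, 0.0
    correct = [s for s in matches if s["label"] == target]
    return len(matches) / len(sentences), len(correct) / len(matches)

sentences = [
    {"text": "invoice no. 42", "label": "invoice_number"},
    {"text": "invoice dated May", "label": "date"},
    {"text": "total amount due", "label": "amount"},
    {"text": "invoice no. 7", "label": "invoice_number"},
]
sup, conf = support_confidence(sentences, "invoice no.", "invoice_number")
print(sup, conf)  # 0.5 1.0
```

A higher confidence threshold keeps only rules that rarely misfire; a higher support threshold keeps only rules backed by enough training evidence.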

Studio engine procedure

  1. Select the model in Model selection, then Next to go on.
  2. Check the remap in Remapper, then Next to go on.
  3. In Summary, select the match strategy in the Metrics matching strategy drop-down menu and, from the PDF document view drop-down menu, select:
    • Never to analyze documents in plain text format.
    • Mixed to analyze documents in Extract format when available, plain text otherwise.
    • Strict to analyze only documents in Extract format.
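
The three PDF document view policies can be summarized as a small selection rule. This is an illustrative sketch only; the `has_extract` attribute is an assumption standing in for "the Extract format is available for this document".

```python
# Minimal sketch of the Never / Mixed / Strict policies described above.

def pick_format(doc, policy):
    """doc: {"has_extract": bool}; policy: 'Never' | 'Mixed' | 'Strict'."""
    if policy == "Never":
        return "plain text"
    if policy == "Mixed":
        return "Extract" if doc["has_extract"] else "plain text"
    if policy == "Strict":
        return "Extract" if doc["has_extract"] else None  # document skipped
    raise ValueError(policy)

print(pick_format({"has_extract": False}, "Mixed"))   # plain text
print(pick_format({"has_extract": True}, "Strict"))   # Extract
```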