Report

Overview

The Report tool window lists the reports produced by multi-document preparation and analysis operations and lets you manage them.
It also lets you compare the reports of two analyses to see whether any improvements or regressions have occurred.

The window contains a table with these columns:

Name Description
Type Report type (A = analysis, P = preparation, C = comparison)
ID Report ID
Description Report name
Date Operation time
Duration Operation duration
Files Document count
Success Success rate expressed as a percentage
Categorization/Precision Categorization Precision expressed as a percentage
Categorization/Recall Categorization Recall expressed as a percentage
Categorization/F-Measure Categorization F-Measure expressed as a percentage
Extraction/Precision Extraction Precision expressed as a percentage
Extraction/Recall Extraction Recall expressed as a percentage
Extraction/F-Measure Extraction F-Measure expressed as a percentage

For comparison reports, an icon to the left of the description indicates the qualitative trend, i.e. the difference in quality between the two analyses compared:

Icon Description
Progress: the second analysis yielded better results, there was an improvement
Stability: overall quality was the same for both analyses, tie
Regression: the second analysis produced worse results

Percentage values are displayed in a different color when they reach a target value. Target values and highlight colors can be set in Studio Settings > Project > Quality > Target reached color.

The context menu contains:

Value Description
Edit Description Change the report description
Repeat Analysis Repeat the analysis with the same files as the previous report

Available mouse commands are:

Command Description
Click a column header Change sort order
Double-click a row Show the Analysis Details window for A type reports, an XML file for P type reports and the Analysis Comparison window for C type reports.

The toolbar contains:

Icon Name Description
Module list Module list
Filter list Report type filter
Categorization Quality Data Display or hide categorization data
Extraction Quality Data Display or hide extraction data
Compare Reports Compare two analysis reports, creating a comparison report
View Reports Display the report details in the Analysis Details window
Delete Reports Delete the selected report
Refresh Refresh report list

It is also possible to delete a report with the Del key.

The info bar shows the reports count.

Analysis Details

The Analysis Details window shows the details of a type A report.

The window contains the panels described below and a common command for all panels:

Icon Name Description
Position Switch Minimize the window. Select it again to re-open the window

Documents

This panel shows file-by-file data for the selected report. Each row represents a document.

The columns are the following:

Name Description
Validation Status Whether the file is validated, not validated or not found
File File name
Size File size in bytes
Duration Analysis duration
Success Analysis outcome
Error Error
Categories Number of winner categories
Extractions Number of extractions
Categorization TP Categorization true positives, i.e. number of target categories matched
Categorization FP Categorization false positives, i.e. number of unexpected results
Categorization FN Categorization false negatives, i.e. number of target categories not matched
Categorization Precision Categorization Precision expressed as a percentage
Categorization Recall Categorization Recall expressed as a percentage
Categorization F-Measure Categorization F-Measure expressed as a percentage
Extraction TP Extraction true positives, i.e. number of target extractions matched
Extraction FP Extraction false positives, i.e. number of unexpected results
Extraction FN Extraction false negatives, i.e. number of target extractions not matched
Extraction Precision Extraction Precision expressed as a percentage
Extraction Recall Extraction Recall expressed as a percentage
Extraction F-Measure Extraction F-Measure expressed as a percentage
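
The Precision, Recall and F-Measure columns follow from the TP, FP and FN counts. This page does not state the exact formulas Studio applies, so the sketch below assumes the standard definitions, with F-Measure taken as the balanced F1 score (harmonic mean of precision and recall). The function names are illustrative and not part of the product.

```python
def safe_ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def quality_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall and F-measure (assumed to be F1) as percentages."""
    precision = safe_ratio(tp, tp + fp)   # share of returned results that were expected
    recall = safe_ratio(tp, tp + fn)      # share of expected results that were returned
    f_measure = safe_ratio(2 * precision * recall, precision + recall)
    return {
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "f_measure": round(100 * f_measure, 2),
    }


# Example: 8 matches, 2 unexpected results, 4 missed targets.
print(quality_metrics(tp=8, fp=2, fn=4))
```

Returning 0.0 when a denominator is zero is a choice made for this sketch; the tool may display such cases differently.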

The toolbar contains:

Icon Name Description
Filter by result Analysis outcome filter (FAILURE or not)
Filter by file name Filter report by file name
Categorization Quality Data Display or hide categorization data
Extraction Quality Data Display or hide extraction data
Error Column Display or hide the Error column
Export CSV Export the data of all files in comma-separated values (CSV) format

Available mouse commands are:

Command Description
Click a column header Change sort order
Double-click a row Display the file in the editing area

The info bar shows the files count.
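
An exported CSV file can be processed outside Studio, for example to total the per-document counts. The sketch below is only an illustration: the header names (`File`, `Categorization TP` and so on) are assumed to mirror the column names listed above and may differ in a real export, and the sample data is invented.

```python
import csv
import io

# Hypothetical sample; the header names of a real export may differ.
SAMPLE_CSV = """File,Categorization TP,Categorization FP,Categorization FN
doc1.txt,2,0,1
doc2.txt,1,1,0
"""


def total_counts(csv_text: str, prefix: str = "Categorization") -> dict:
    """Sum the TP, FP and FN columns carrying the given prefix over all rows."""
    totals = {"TP": 0, "FP": 0, "FN": 0}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for key in totals:
            totals[key] += int(row[f"{prefix} {key}"])
    return totals


print(total_counts(SAMPLE_CSV))
```

To read a file on disk instead of the inline sample, pass the result of `open(path, newline="")` to `csv.DictReader`.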

Taxonomy

This panel shows the results of the categorization against the project taxonomy.

It contains two areas. The upper area shows taxonomy information and contains a table with a row for each domain.
Child nodes are separated from parent nodes by a slash, for example Economy/Political_Economy.

The table has these columns:

Name Description
Path Domain name
Label Domain label
Annotations Number of annotations
TP True positives, i.e. number of times the category was returned as a result and matched an annotation (matches)
FP False positives, i.e. number of times the category was returned as a result, but was not annotated as a categorization target (unexpected results)
FN False negatives, i.e. number of documents for which the category was annotated as a categorization target, but didn't come out as a result (missed matches)
Precision Categorization Precision expressed as a percentage
Recall Categorization Recall expressed as a percentage
F-Measure Categorization F-Measure expressed as a percentage

The only toolbar command is Export CSV, which lets you export the taxonomy with annotations, results and metrics in CSV format.

The lower area shows data for all analyzed documents related to the category selected in the upper area.
It contains a table with these columns:

Name Description
Validated Validated document
File Document file name
Annotations 1 if the selected category was annotated as a target categorization result for the document, 0 otherwise
Results 1 if the selected category was returned as a categorization result for the document, 0 otherwise
TP True positive: 1 if the selected category was annotated as a target categorization result for the document and was also returned as a categorization result for the document (match), 0 otherwise
FP False positive: 1 if the category was returned as a categorization result for the document, but was not annotated as a target categorization result for the document (unexpected result), 0 otherwise
FN False negative: 1 if the selected category was annotated as a target categorization result for the document, but was not returned as a categorization result for the document (missed match), 0 otherwise
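
The 0/1 values in this table can be derived from two sets per document: the categories annotated as targets and the categories returned by the analysis. The sketch below restates the column definitions in code; the function name and the use of Python sets are assumptions made for illustration.

```python
def category_flags(category: str, annotated: set, returned: set) -> dict:
    """Return the 0/1 row values for one category in one document."""
    is_annotated = category in annotated
    is_returned = category in returned
    return {
        "Annotations": int(is_annotated),
        "Results": int(is_returned),
        "TP": int(is_annotated and is_returned),      # match
        "FP": int(is_returned and not is_annotated),  # unexpected result
        "FN": int(is_annotated and not is_returned),  # missed match
    }


# Document annotated with Economy, but categorized as Economy and Sport.
print(category_flags("Sport", annotated={"Economy"}, returned={"Economy", "Sport"}))
```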

It is possible to filter the node-related documents with the Filter toolbar:

Icon Name Description
TP Select only hits that are true positives
FP Select only hits that are false positives
FN Select only hits that are false negatives
AN Select only hits from annotated documents
Reset filters Show the complete file list without filters
Create Subset of Files Copy listed files to a new folder under the test directory

The info bar shows the files count.

Templates

This panel shows the extraction results against the defined templates.

It contains two areas. The upper area shows templates information in a table.
The template name is grayed out, while the fields are separated from the template name by a slash (see picture above). The table has these columns:

Name Description
Template/Field Template or field name
Attributes Field attributes
Annotations Number of annotations (for fields only)
Results Total number of annotations (for template names)
TP True positives, i.e. number of times actual extractions matched annotations (matches)
FP False positives, i.e. number of times actual extractions didn't match any annotation (unexpected results)
FN False negatives, i.e. number of annotations that were not matched by actual extractions (missed matches)
Precision Extraction Precision expressed as a percentage
Recall Extraction Recall expressed as a percentage
F-Measure Extraction F-Measure expressed as a percentage

The only toolbar command is Export CSV, which lets you export the template list with annotations, results and metrics in CSV format.

The lower area shows data for documents with annotations or actual extractions related to the template or field selected in the upper area.
It contains a table with these columns:

Name Description
Validated Validated documents
File Document file name
Annotations Number of annotations
Results Number of actual extractions
TP True positives: number of actual extractions that matched annotations (matches)
FP False positives: number of actual extractions that didn't match any annotations (unexpected results)
FN False negatives: number of annotations that were not matched by actual extractions (missed matches)
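
Unlike categorization, extraction values are counts, because a document can contain several annotations and extractions for the same field. This page does not define what makes an extraction match an annotation, so the sketch below assumes, purely for illustration, that a match is an identical (field, value) pair and counts matches as a multiset intersection.

```python
from collections import Counter


def extraction_counts(annotations: list, results: list) -> dict:
    """Count matches between annotated and extracted (field, value) pairs.

    Assumption: two items match when the pairs are identical.
    """
    expected = Counter(annotations)
    actual = Counter(results)
    matched = expected & actual  # multiset intersection
    return {
        "Annotations": sum(expected.values()),
        "Results": sum(actual.values()),
        "TP": sum(matched.values()),
        "FP": sum((actual - expected).values()),   # extractions with no annotation
        "FN": sum((expected - actual).values()),   # annotations with no extraction
    }


annotated = [("NAME", "Alice"), ("CITY", "Rome")]
extracted = [("NAME", "Alice"), ("CITY", "Milan"), ("DATE", "2020")]
print(extraction_counts(annotated, extracted))
```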

It is possible to filter the field-related documents with the Filter toolbar:

Icon Name Description
TP Select only hits that are true positives
FP Select only hits that are false positives
FN Select only hits that are false negatives
AN Select only hits from annotated documents
Reset filters Show the complete file list without filters
Create Subset of Files Copy listed files to a new folder under the test directory

The info bar shows the files count.

Properties

This panel shows information about the report, grouped as follows:

  • Module: details of the project module
  • Report: information on the selected report
  • Build: information about the software version and the build operation
  • Rules: number of rules per type
  • Files: number of files per type
  • Statistics: statistical information on the analysis
  • Timings: break-down of the times required for the various phases of the analysis
  • Options: document analysis options

Profiling

The Profiling tab gives a statistical profile of the report by listing the slowest attributes, that is, those with the greatest impact on the analysis time.

Note

To view results in this table, set Enable Analysis Debug Info to true in the General group of the Studio Settings.

This tab has two panels: the left one lists the slowest attributes and the right one is a rule preview panel.

The panel on the left has the following columns:

Name Description
Rule attribute Attribute of the rule
Source file Rule source file
Begin Rule beginning line number
End Rule ending line number
Count Number of rule hits
Elapsed Time Extra elapsed time
Frequency Number of rule hits increasing the report time

The right panel shows a preview of the rule selected in the left panel.

  • Double-click one of the attributes on the left panel to jump to the source file.
  • To sort the attributes by a column, select its header.

Note

You can also find the ten slowest attributes in the .ctx file of the gen folder introduced by "attr_stats".

This tab has a single command:

Icon Name Description
Export CSV Export the report profile in a .csv format.

Analysis Comparison

The Analysis Comparison window shows the details of a type C report, i.e. the comparison of two analysis reports.

There are two tabs:

  • All Documents, showing the metrics for all the documents of the reports.
  • Common Documents, showing the metrics for the documents that the two reports have in common.

This is the information shown in both tabs:

Name Description
Module Project module name
Trend Quality trend considering the changes from the first to the second report
All Documents/Common Documents Number of documents of the reports, separated by an arrow/Number of common documents between the reports
Analysis Date ID and time of the two analysis reports
Properties Report properties comparison
Extraction Extraction performance metrics
Categorization Categorization performance metrics

The Details buttons open windows that show a side-by-side comparison of report data. These windows are described below.

Properties

The Properties window shows a side-by-side comparison of the properties of the two reports.

The information for each report is the same as in the Properties panel of the Analysis Details window.

Extraction results

The Extraction results window shows a detailed comparison of extraction results.

It contains two areas. The upper area shows templates information in a table.
The table is initially collapsed and can be expanded row by row with the expand and collapse commands on the left side of the row or with the toolbar commands. First-level rows correspond to templates, second-level rows correspond to template fields.
The table has these columns:

Name Description
Name Template or field name
Attributes Field attribute
Annotations Number of annotations
Results Total number of extractions
TP True positives counters
FP False positives counters
FN False negatives counters
Precision Precision data
Recall Recall data
F-Measure F-Measure data

By default, columns TP, FP, FN, Precision, Recall and F-Measure display only the difference or delta (Δ) between the metrics of the two reports. The delta symbol is colored to indicate quality trend:

  • Green: progress
  • Black: stability
  • Red: regression
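
The color coding can be read as a function of the delta and of the direction in which a metric improves. The sketch below assumes that an increase is progress for TP, Precision, Recall and F-Measure, and that a decrease is progress for FP and FN. This page does not spell that rule out, so treat it as an assumption. The function name is illustrative.

```python
def delta_color(old: float, new: float, higher_is_better: bool = True) -> tuple:
    """Return (delta, color) for a metric compared across two reports."""
    delta = new - old
    if delta == 0:
        return delta, "black"  # stability
    improved = (delta > 0) == higher_is_better
    return delta, "green" if improved else "red"  # progress or regression


print(delta_color(70.0, 75.0))                      # precision went up
print(delta_color(3, 5, higher_is_better=False))    # false positives went up
```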

The headers of these columns act as toggle switches to display or hide the values in addition to the difference.

The info bar shows first-level nodes count.

Toolbar commands are:

Icon Name Description
Expand All Expand all the tree nodes
Collapse All Collapse all the tree nodes
Toggle Attribute Visibility Display or hide the Attributes column
Export table to CSV Export templates, fields and their quality results in CSV format

The lower area shows data for documents with annotations or actual extractions related to the template or field selected in the upper area.
It contains a table with these columns:

Name Description
Validated Validated document
File Document file name
Annotations Number of annotations
Results Number of actual extractions
TP True positives: number of actual extractions that matched annotations (matches)
FP False positives: number of actual extractions that didn't match any annotations (unexpected results)
FN False negatives: number of annotations that were not matched by actual extractions (missed matches)

Numbers between brackets refer to the older report; the other numbers are from the newer report.

The info bar shows the files count.

The toolbar contains these controls:

Icon Name Description
Docs In Filter the list to show only documents that have actual extractions in the newer report and did not have any extraction in the older report
Docs Out Filter the list to show only documents that don't have actual extractions in the newer report, but had extractions in the older report
Docs Won Filter the document list to show only documents that have won true positives
Docs Lost Filter the document list to show only documents that have lost true positives
Docs Changed Filter the document list to show only documents that have changed between the reports
Reset filters Remove the filters and display the complete list
Export selection to CSV Export a filtered document list in CSV format according to the field and/or one of the other filters in this table
Export unfiltered results to CSV Export the full document list in CSV format
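
The document filters can be thought of as predicates over the per-document values of the older and newer report. The sketch below is one possible reading: "won" and "lost" are taken to mean a higher or lower TP count in the newer report, and "changed" to mean any difference in the compared values. These interpretations, the `DocCounts` type and the function name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocCounts:
    """Per-document values from one report (hypothetical structure)."""
    results: int  # number of actual results
    tp: int       # number of true positives


def doc_filters(old: DocCounts, new: DocCounts) -> set:
    """Return the names of the filters that would keep this document."""
    labels = set()
    if new.results > 0 and old.results == 0:
        labels.add("Docs In")
    if new.results == 0 and old.results > 0:
        labels.add("Docs Out")
    if new.tp > old.tp:
        labels.add("Docs Won")
    if new.tp < old.tp:
        labels.add("Docs Lost")
    if new != old:
        labels.add("Docs Changed")
    return labels


# A document with no results in the older report and two in the newer one.
print(sorted(doc_filters(DocCounts(results=0, tp=0), DocCounts(results=2, tp=1))))
```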

Categorization results

The Categorization results window shows a detailed comparison of categorization results.

It contains two areas. The upper area shows taxonomy information in a table.
The table is initially collapsed and can be expanded row by row with the expand and collapse commands on the left side of the row or with the toolbar commands. The table has these columns:

Name Description
Name Domain name
Label Domain label
Annotations Number of documents in which the domain was annotated as a target categorization result
TP True positives counters
FP False positives counters
FN False negatives counters
Precision Precision data
Recall Recall data
F-Measure F-Measure data

By default, columns TP, FP, FN, Precision, Recall and F-Measure display only the difference or delta (Δ) between the metrics of the two reports.
The headers of these columns act as toggle switches to display or hide the values in addition to the difference.

The info bar shows first-level nodes count.

Toolbar commands are:

Icon Name Description
Expand All Expand all the tree nodes
Collapse All Collapse all the tree nodes
Toggle Attribute Visibility Display or hide the Attributes column
Export table to CSV Export nodes and their quality results in CSV format

The lower area shows data for documents with annotations or actual categorization results relative to the category selected in the upper area.
It contains a table with these columns:

Name Description
Validated Validated document
File Document file name
Annotations 1 if the selected category was annotated as a target categorization result for the document, 0 otherwise
Results 1 if the selected category was returned as a categorization result for the document, 0 otherwise
TP True positive: 1 if the selected category was annotated as a target categorization result for the document and was also returned as a categorization result for the document (match), 0 otherwise
FP False positive: 1 if the category was returned as a categorization result for the document, but was not annotated as a target categorization result for the document (unexpected result), 0 otherwise
FN False negative: 1 if the selected category was annotated as a target categorization result for the document, but was not returned as a categorization result for the document (missed match), 0 otherwise

Numbers between brackets refer to the older report; the other numbers are from the newer report.

The info bar shows the files count.

The toolbar contains these controls:

Icon Name Description
Docs In Filter the list to show only documents that have actual categorization results in the newer report and did not have any categorization result in the older report
Docs Out Filter the list to show only documents that don't have actual categorization results in the newer report, but had categorization results in the older report
Docs Won Filter the document list to show only documents that have won true positives
Docs Lost Filter the document list to show only documents that have lost true positives
Docs Changed Filter the document list to show only documents that have changed between the reports
Reset filters Remove the filters and display the complete list
Export selection to CSV Export a filtered document list in CSV format according to the field and/or one of the other filters in this table
Export unfiltered results to CSV Export the full document list in CSV format