Report
Overview
The Report tool window displays the reports produced by multi-document preparation and analysis operations and lets you manage them.
It also lets you compare the reports of two analyses to see whether any improvements or regressions have occurred.
The window contains a table with these columns:
Name | Description |
---|---|
Type | Report type (A = analysis, P = preparation, C = comparison) |
ID | Report ID |
Description | Report name |
Date | Operation time |
Duration | Operation duration |
Files | Document count |
Success | Success rate expressed as a percentage |
Categorization/Precision | Categorization Precision expressed as a percentage |
Categorization/Recall | Categorization Recall expressed as a percentage |
Categorization/F-Measure | Categorization F-Measure expressed as a percentage |
Extraction/Precision | Extraction Precision expressed as a percentage |
Extraction/Recall | Extraction Recall expressed as a percentage |
Extraction/F-Measure | Extraction F-Measure expressed as a percentage |
For comparison reports, an icon to the left of the description indicates the qualitative trend, that is, the difference in quality between the two analyses being compared:
Trend | Description |
---|---|
Progress | The second analysis yielded better results (improvement) |
Stability | Overall quality was the same for both analyses (tie) |
Regression | The second analysis produced worse results |
Percentage values can be displayed in a different color when they reach a target value. Target values and highlight colors can be set in Studio Settings > Project > Quality > Target reached color.
The context menu contains:
Value | Description |
---|---|
Edit Description | Change the report description |
Repeat Analysis | Repeat the analysis using the same files as the selected report |
Available mouse commands are:
Command | Description |
---|---|
Click a column header | Change sort order |
Double-click a row | Open the Analysis Details window for type A reports, an XML file for type P reports and the Analysis Comparison window for type C reports |
The toolbar contains:
Name | Description |
---|---|
Module list | Module list |
Filter list | Report type filter |
Categorization Quality Data | Display or hide categorization data |
Extraction Quality Data | Display or hide extraction data |
Compare Reports | Compare two analysis reports, creating a comparison report |
View Reports | Display the report details in the Analysis Details window |
Delete Reports | Delete the selected report |
Refresh | Refresh the report list |
You can also delete a report by pressing the Del key.
The info bar shows the report count.
Analysis Details
The Analysis Details window shows the details of a type A report.
The window contains the panels described below and a command common to all panels:
Name | Description |
---|---|
Position Switch | Minimize the window. Select it again to reopen the window |
Documents
This panel shows file-by-file data for the selected report: each row represents a document.
The columns are the following:
Name | Description |
---|---|
Validation Status | Whether the file is validated, not validated or not found |
File | File name |
Size | File size in bytes |
Duration | Analysis duration |
Success | Analysis outcome |
Error | Error message, if any |
Categories | Number of winner categories |
Extractions | Number of extractions |
Categorization TP | Categorization true positives, i.e. number of target categories matched |
Categorization FP | Categorization false positives, i.e. number of unexpected results |
Categorization FN | Categorization false negatives, i.e. number of target categories not matched |
Categorization Precision | Categorization Precision expressed as a percentage |
Categorization Recall | Categorization Recall expressed as a percentage |
Categorization F-Measure | Categorization F-Measure expressed as a percentage |
Extraction TP | Extraction true positives, i.e. number of target extractions matched |
Extraction FP | Extraction false positives, i.e. number of unexpected results |
Extraction FN | Extraction false negatives, i.e. number of target extractions not matched |
Extraction Precision | Extraction Precision expressed as a percentage |
Extraction Recall | Extraction Recall expressed as a percentage |
Extraction F-Measure | Extraction F-Measure expressed as a percentage |
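The Precision, Recall and F-Measure columns are derived from the TP, FP and FN counters. This section does not spell out the formulas, so the following is a minimal sketch assuming the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)) and a balanced F-Measure (F1, the harmonic mean of precision and recall). The function name `precision_recall_f` is hypothetical, and returning 0 when a denominator is zero is an assumption; Studio may display those cases differently.

```python
from typing import Tuple


def precision_recall_f(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return precision, recall and F-measure as percentages.

    Assumes the standard definitions and a balanced F-measure (F1).
    A metric whose denominator is zero is reported as 0.0 (assumption).
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return 100 * precision, 100 * recall, 100 * f_measure


# Example: 8 matches, 2 unexpected results, 4 missed matches
p, r, f = precision_recall_f(tp=8, fp=2, fn=4)
print(f"P={p:.1f}% R={r:.1f}% F={f:.1f}%")  # P=80.0% R=66.7% F=72.7%
```

Note that F1 can also be written as 2·TP / (2·TP + FP + FN), which gives the same 72.7% for this example.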
The toolbar contains:
Name | Description |
---|---|
Filter by result | Analysis outcome filter (FAILURE or not) |
Filter by file name | Filter the report by file name |
Categorization Quality Data | Display or hide categorization data |
Extraction Quality Data | Display or hide extraction data |
Error Column | Display or hide the Error column |
Export CSV | Export the data of all files in comma-separated values (CSV) format |
Available mouse commands are:
Command | Description |
---|---|
Click a column header | Change sort order |
Double-click a row | Display the file in the editing area |
The info bar shows the file count.
Taxonomy
This panel shows the results of the categorization against the project taxonomy.
It contains two areas. The upper area shows taxonomy information and contains a table with a row for each domain.
Child nodes are separated from their parent nodes by a slash (see the picture above with Economy/Political_Economy).
The table has these columns:
Name | Description |
---|---|
Path | Domain name |
Label | Domain label |
Annotations | Number of annotations |
TP | True positives, i.e. number of times the category was returned as a result and matched an annotation (matches) |
FP | False positives, i.e. number of times the category was returned as a result, but was not annotated as a categorization target (unexpected results) |
FN | False negatives, i.e. number of documents for which the category was annotated as a categorization target, but was not returned as a result (missed matches) |
Precision | Categorization Precision expressed as a percentage |
Recall | Categorization Recall expressed as a percentage |
F-Measure | Categorization F-Measure expressed as a percentage |
The only toolbar command is Export CSV, which exports the taxonomy with annotations, results and metrics in .csv format.
The lower area shows data for all analyzed documents related to the category selected in the upper area.
It contains a table with these columns:
Name | Description |
---|---|
Validated | Validated document |
File | Document file name |
Annotations | 1 if the selected category was annotated as a target categorization result for the document, 0 otherwise |
Results | 1 if the selected category was returned as a categorization result for the document, 0 otherwise |
TP | True positive: 1 if the selected category was annotated as a target categorization result for the document and was also returned as a categorization result for the document (match), 0 otherwise |
FP | False positive: 1 if the category was returned as a categorization result for the document, but was not annotated as a target categorization result for the document (unexpected result), 0 otherwise |
FN | False negative: 1 if the selected category was annotated as a target categorization result for the document, but was not returned as a categorization result for the document (missed match), 0 otherwise |
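The per-document values in the lower area follow directly from the Annotations and Results flags. The sketch below, a hypothetical model rather than a Studio API, derives TP, FP and FN for each document and sums them, assuming that the category counters in the upper area are the sums of the per-document values shown here. `DocRow` and `category_totals` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DocRow:
    """Hypothetical model of one lower-area row for the selected category."""
    file: str
    annotations: int  # 1 if annotated as a target for this document, else 0
    results: int      # 1 if returned as a categorization result, else 0

    @property
    def tp(self) -> int:  # annotated and returned (match)
        return int(self.annotations == 1 and self.results == 1)

    @property
    def fp(self) -> int:  # returned but not annotated (unexpected result)
        return int(self.annotations == 0 and self.results == 1)

    @property
    def fn(self) -> int:  # annotated but not returned (missed match)
        return int(self.annotations == 1 and self.results == 0)


def category_totals(rows: List[DocRow]) -> Dict[str, int]:
    """Sum the per-document flags into category-level counters."""
    return {
        "Annotations": sum(r.annotations for r in rows),
        "TP": sum(r.tp for r in rows),
        "FP": sum(r.fp for r in rows),
        "FN": sum(r.fn for r in rows),
    }


rows = [
    DocRow("a.txt", annotations=1, results=1),
    DocRow("b.txt", annotations=0, results=1),
    DocRow("c.txt", annotations=1, results=0),
]
print(category_totals(rows))  # {'Annotations': 2, 'TP': 1, 'FP': 1, 'FN': 1}
```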
You can filter the documents related to the selected node with the Filter toolbar:
Name | Description |
---|---|
TP | Select only hits that are true positives |
FP | Select only hits that are false positives |
FN | Select only hits that are false negatives |
AN | Select only hits from annotated documents |
Reset filters | Show the complete file list without filters |
Create Subset of Files | Copy the listed files to a new folder under the test directory |
The info bar shows the file count.
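The Filter toolbar commands can be thought of as simple predicates over the rows of the lower area. A minimal sketch, assuming each row is a dictionary keyed by the column names above; the `filter_rows` function is hypothetical and only mirrors the behavior described in the table:

```python
from typing import List, Optional


def filter_rows(rows: List[dict], kind: Optional[str] = None) -> List[dict]:
    """Mimic the Filter toolbar.

    kind is "TP", "FP", "FN", "AN" or None (None = Reset filters).
    """
    if kind is None:  # Reset filters: complete list
        return list(rows)
    if kind == "AN":  # only annotated documents
        return [r for r in rows if r["Annotations"] > 0]
    if kind in ("TP", "FP", "FN"):  # only rows with a nonzero counter
        return [r for r in rows if r[kind] > 0]
    raise ValueError(f"unknown filter: {kind}")


rows = [
    {"File": "a.txt", "Annotations": 1, "TP": 1, "FP": 0, "FN": 0},
    {"File": "b.txt", "Annotations": 0, "TP": 0, "FP": 1, "FN": 0},
    {"File": "c.txt", "Annotations": 1, "TP": 0, "FP": 0, "FN": 1},
]
print([r["File"] for r in filter_rows(rows, "AN")])  # ['a.txt', 'c.txt']
```

Because the predicates test for a nonzero value, the same sketch also fits the Templates panel, where the counters are counts rather than 0/1 flags.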
Templates
This panel shows the extraction results against the defined templates.
It contains two areas. The upper area shows templates information in a table.
Template names are grayed out, and fields are separated from their template name by a slash (see picture above).
The table has these columns:
Name | Description |
---|---|
Template/Field | Template or field name |
Attributes | Field attributes |
Annotations | Number of annotations (for fields only) |
Results | Total number of actual extractions (for template names) |
TP | True positives, i.e. number of times actual extractions matched annotations (matches) |
FP | False positives, i.e. number of times actual extractions didn't match any annotation (unexpected results) |
FN | False negatives, i.e. number of annotations that were not matched by actual extractions (missed matches) |
Precision | Extraction Precision expressed as a percentage |
Recall | Extraction Recall expressed as a percentage |
F-Measure | Extraction F-Measure expressed as a percentage |
The only toolbar command is Export CSV, which exports the template list with annotations, results and metrics in .csv format.
The lower area shows data for documents with annotations or actual extractions related to the template or field selected in the upper area.
It contains a table with these columns:
Name | Description |
---|---|
Validated | Validated document |
File | Document file name |
Annotations | Number of annotations |
Results | Number of actual extractions |
TP | True positives: number of actual extractions that matched annotations (matches) |
FP | False positives: number of actual extractions that didn't match any annotations (unexpected results) |
FN | False negatives: number of annotations that were not matched by actual extractions (missed matches) |
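In the Templates panel a document can have several annotations and extractions for the same field, so the counters are numbers rather than 0/1 flags. The sketch below counts them for one document under a simplifying assumption: an extraction matches an annotation only when both field and value are identical, and each annotation can be matched at most once. Studio's actual matching criteria are not described here and may differ; `extraction_counts` and the `PERSON/name` field are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (template/field, value)


def extraction_counts(annotations: List[Pair], extractions: List[Pair]) -> Dict[str, int]:
    """Count TP/FP/FN for one document using exact, one-to-one matching."""
    expected = Counter(annotations)
    actual = Counter(extractions)
    tp = sum((expected & actual).values())  # multiset intersection = matches
    fp = sum(actual.values()) - tp          # extractions left unmatched
    fn = sum(expected.values()) - tp        # annotations left unmatched
    return {
        "Annotations": sum(expected.values()),
        "Results": sum(actual.values()),
        "TP": tp,
        "FP": fp,
        "FN": fn,
    }


ann = [("PERSON/name", "Ada Lovelace"), ("PERSON/name", "Charles Babbage")]
ext = [("PERSON/name", "Ada Lovelace"), ("PERSON/name", "London")]
print(extraction_counts(ann, ext))
# {'Annotations': 2, 'Results': 2, 'TP': 1, 'FP': 1, 'FN': 1}
```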
You can filter the documents related to the selected field with the Filter toolbar:
Name | Description |
---|---|
TP | Select only hits that are true positives |
FP | Select only hits that are false positives |
FN | Select only hits that are false negatives |
AN | Select only hits from annotated documents |
Reset filters | Show the complete file list without filters |
Create Subset of Files | Copy the listed files to a new folder under the test directory |
The info bar shows the file count.
Properties
This panel shows detailed information about the report, grouped as follows:
- Module: details of the project module
- Report: information on the selected report
- Build: information about the software version and the build operation
- Rules: number of rules per type
- Files: number of files per type
- Statistics: statistical information on the analysis
- Timings: break-down of the times required for the various phases of the analysis
- Options: document analysis options
Profiling
The Profiling tab shows a statistical profile of the report in terms of the slowest attributes, that is, the rule attributes with the greatest impact on analysis time.
Note
To see results in this tab, set Enable Analysis Debug Info to true in the General group of the Studio Settings.
This tab has two panels: the left panel lists the slowest attributes and the right panel shows a rule preview.
The panel on the left has the following columns:
Name | Description |
---|---|
Rule attribute | Attribute of the rule |
Source file | Rule source file |
Begin | Rule beginning line number |
End | Rule ending line number |
Count | Number of rule hits |
Elapsed Time | Extra elapsed time |
Frequency | Number of rule hits increasing the report time |
The right panel shows a preview of the rule selected in the left panel.
- Double-click an attribute in the left panel to jump to its source file.
- To sort the attributes by a column, click the column header.
Note
You can also find the ten slowest attributes in the .ctx file of the gen folder, in the section introduced by "attr_stats".
This tab has a single command:
Name | Description |
---|---|
Export CSV | Export the report profile in .csv format |
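Once exported, the profile can be post-processed outside Studio, for example to list the attributes with the highest elapsed time. This sketch assumes the exported file has a header row using the column names of the Profiling table and a numeric Elapsed Time column; both are assumptions, as are the sample values, which are made up for illustration. `slowest_attributes` is a hypothetical helper built on Python's standard `csv` module.

```python
import csv
import io
from typing import List, Tuple


def slowest_attributes(csv_text: str, top: int = 10) -> List[Tuple[str, float]]:
    """Return the `top` rule attributes with the largest Elapsed Time.

    Assumes a header row with the Profiling column names and a
    numeric "Elapsed Time" column (both unverified assumptions).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = sorted(reader, key=lambda r: float(r["Elapsed Time"]), reverse=True)
    return [(r["Rule attribute"], float(r["Elapsed Time"])) for r in rows[:top]]


# Made-up sample content, for illustration only
sample = (
    "Rule attribute,Source file,Begin,End,Count,Elapsed Time,Frequency\n"
    "attr_a,rules_a,10,14,120,0.85,30\n"
    "attr_b,rules_a,20,22,40,0.10,5\n"
    "attr_c,rules_b,3,8,75,0.42,12\n"
)
print(slowest_attributes(sample, top=2))  # [('attr_a', 0.85), ('attr_c', 0.42)]
```

To process a real export, read the file with `open(path, newline="", encoding="utf-8")` inside a `with` block and pass its contents to the function; adjust the column names if the actual header differs.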
Analysis Comparison
The Analysis Comparison window shows the details of a type C report, i.e. the comparison of two analysis reports.
There are two tabs:
- All Documents, showing the metrics computed on all the documents of each report.
- Common Documents, showing the metrics computed only on the documents common to both reports.
This is the information shown in both tabs:
Name | Description |
---|---|
Module | Project module name |
Trend | Quality trend considering the changes from the first to the second report |
All Documents/Common Documents | All Documents tab: number of documents in each report, separated by an arrow. Common Documents tab: number of documents common to both reports |
Analysis Date | ID and time of the two analysis reports |
Properties | Report properties comparison |
Extraction | Extraction performance metrics |
Categorization | Categorization performance metrics |
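The difference between the two tabs comes down to which documents are considered. A minimal sketch, assuming documents are identified by file name; the function `compare_document_sets` is hypothetical, and the "->" string only approximates the arrow Studio displays.

```python
from typing import Dict, Iterable


def compare_document_sets(old_files: Iterable[str], new_files: Iterable[str]) -> Dict[str, object]:
    """Summarize document counts for the two tabs (assumes file-name identity)."""
    old, new = set(old_files), set(new_files)
    common = old & new  # documents analyzed in both reports
    return {
        "All Documents": f"{len(old)} -> {len(new)}",
        "Common Documents": len(common),
        "common_files": sorted(common),
    }


result = compare_document_sets(["a.txt", "b.txt", "c.txt"],
                               ["b.txt", "c.txt", "d.txt", "e.txt"])
print(result["All Documents"], result["Common Documents"])  # 3 -> 4 2
```

Under this reading, the metrics of the Common Documents tab would be computed only over `common_files`, so documents added to or removed from the test set cannot skew the comparison.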
The Details buttons open windows that show a side-by-side comparison of report data. These windows are described below.
Properties
The Properties window shows a side-by-side comparison of the properties of the two reports.
The information for each report is the same as in the Properties panel of the Analysis Details window.
The Filter properties with identical values toggle shows or hides the properties that have the same value in both reports.
Extraction results
The Extraction results window shows a detailed comparison of extraction results.
It contains two areas. The upper area shows templates information in a table.
The table is initially collapsed and can be expanded row by row with the expand and collapse commands on the left side of each row, or with the toolbar commands. First-level rows correspond to templates, second-level rows to the template's fields.
The table has these columns:
Name | Description |
---|---|
Name | Template or field name |
Attributes | Field attributes |
Annotations | Number of annotations |
Results | Total number of extractions |
TP | True positive counters |
FP | False positive counters |
FN | False negative counters |
Precision | Precision data |
Recall | Recall data |
F-Measure | F-Measure data |
By default, columns TP, FP, FN, Precision, Recall and F-Measure display only the difference (delta, Δ) between the metrics of the two reports. The delta symbol is colored to indicate the quality trend:
- Green: progress
- Black: stability
- Red: regression
The headers of these columns act as toggle switches to display or hide the values in addition to the difference.
The info bar shows the count of first-level nodes.
Toolbar commands are:
Name | Description |
---|---|
Expand All | Expand all the tree nodes |
Collapse All | Collapse all the tree nodes |
Toggle Attribute Visibility | Display or hide the Attributes column |
Export table to CSV | Export templates, fields and their quality results in CSV format |
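The delta coloring described above can be expressed as a small rule. For TP, Precision, Recall and F-Measure a higher value is better, while for FP and FN a lower value is better. The sketch below encodes that reading; the section does not state how Studio treats FP and FN deltas, so the `HIGHER_IS_BETTER` mapping and the `delta_trend` function are assumptions.

```python
from typing import Tuple

# Assumption: direction of improvement for each delta column
HIGHER_IS_BETTER = {
    "TP": True, "FP": False, "FN": False,
    "Precision": True, "Recall": True, "F-Measure": True,
}


def delta_trend(metric: str, old: float, new: float) -> Tuple[float, str]:
    """Return the delta (new - old) and the trend it would be colored with."""
    delta = new - old
    if delta == 0:
        return delta, "stability"  # black
    improved = (delta > 0) == HIGHER_IS_BETTER[metric]
    return delta, "progress" if improved else "regression"  # green / red


print(delta_trend("FN", 6, 4))            # (-2, 'progress')
print(delta_trend("Recall", 70.0, 65.0))  # (-5.0, 'regression')
```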
The lower area shows data for documents with annotations or actual extractions related to the template or field selected in the upper area.
It contains a table with these columns:
Name | Description |
---|---|
Validated | Validated document |
File | Document file name |
Annotations | Number of annotations |
Results | Number of actual extractions |
TP | True positives: number of actual extractions that matched annotations (matches) |
FP | False positives: number of actual extractions that didn't match any annotations (unexpected results) |
FN | False negatives: number of annotations that were not matched by actual extractions (missed matches) |
Numbers in brackets refer to the older report; the other numbers refer to the newer report.
The info bar shows the file count.
The toolbar contains these controls:
Name | Description |
---|---|
Docs In | Filter the list to show only documents that have actual extractions in the newer report but had no extractions in the older report |
Docs Out | Filter the list to show only documents that have no actual extractions in the newer report but had extractions in the older report |
Docs Won | Filter the document list to show only documents that gained true positives |
Docs Lost | Filter the document list to show only documents that lost true positives |
Docs Changed | Filter the document list to show only documents that changed between the reports |
Reset filters | Remove the filters and display the complete list |
Export selection to CSV | Export the filtered document list in CSV format, according to the selected field and/or one of the other filters in this table |
Export unfiltered results to CSV | Export the full document list in CSV format |
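The document filters compare each row's older (bracketed) and newer values. The following sketch models a row as two dictionaries of counters and implements each filter as a predicate. `ComparedRow`, the filter functions and `apply_filter` are hypothetical, and interpreting "won" and "lost" as an increase or decrease in the TP count is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ComparedRow:
    """Hypothetical model of one lower-area row with values from both reports."""
    file: str
    old: Dict[str, int]  # values shown in brackets (older report)
    new: Dict[str, int]  # values shown outside brackets (newer report)


def docs_in(r: ComparedRow) -> bool:
    # Results in the newer report, none in the older one
    return r.new["Results"] > 0 and r.old["Results"] == 0


def docs_out(r: ComparedRow) -> bool:
    # No results in the newer report, some in the older one
    return r.new["Results"] == 0 and r.old["Results"] > 0


def docs_won(r: ComparedRow) -> bool:
    # Assumption: "won" means the TP count increased
    return r.new["TP"] > r.old["TP"]


def docs_lost(r: ComparedRow) -> bool:
    # Assumption: "lost" means the TP count decreased
    return r.new["TP"] < r.old["TP"]


def docs_changed(r: ComparedRow) -> bool:
    return r.new != r.old


FILTERS: Dict[str, Callable[[ComparedRow], bool]] = {
    "Docs In": docs_in,
    "Docs Out": docs_out,
    "Docs Won": docs_won,
    "Docs Lost": docs_lost,
    "Docs Changed": docs_changed,
}


def apply_filter(rows: List[ComparedRow], name: str) -> List[str]:
    """Return the file names kept by the named filter."""
    return [r.file for r in rows if FILTERS[name](r)]


rows = [
    ComparedRow("a.txt", old={"Results": 0, "TP": 0}, new={"Results": 2, "TP": 1}),
    ComparedRow("b.txt", old={"Results": 3, "TP": 2}, new={"Results": 0, "TP": 0}),
    ComparedRow("c.txt", old={"Results": 1, "TP": 1}, new={"Results": 1, "TP": 1}),
]
print(apply_filter(rows, "Docs Changed"))  # ['a.txt', 'b.txt']
```

The same predicates apply to the Categorization results window, where Results and TP are 0/1 flags instead of counts.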
Categorization results
The Categorization results window shows a detailed comparison of categorization results.
It contains two areas. The upper area shows taxonomy information in a table.
The table is initially collapsed and can be expanded row by row with the expand and collapse commands on the left side of the row or with the toolbar commands.
The table has these columns:
Name | Description |
---|---|
Name | Domain name |
Label | Domain label |
Annotations | Number of documents in which the domain was annotated as a target categorization result |
TP | True positive counters |
FP | False positive counters |
FN | False negative counters |
Precision | Precision data |
Recall | Recall data |
F-Measure | F-Measure data |
By default, columns TP, FP, FN, Precision, Recall and F-Measure display only the difference or delta (Δ) between the metrics of the two reports.
The headers of these columns act as toggle switches to display or hide the values in addition to the difference.
The info bar shows the count of first-level nodes.
Toolbar commands are:
Name | Description |
---|---|
Expand All | Expand all the tree nodes |
Collapse All | Collapse all the tree nodes |
Toggle Attribute Visibility | Display or hide the Attributes column |
Export table to CSV | Export nodes and their quality results in CSV format |
The lower area shows data for documents with annotations or actual categorization results relative to the category selected in the upper area.
It contains a table with these columns:
Name | Description |
---|---|
Validated | Validated document |
File | Document file name |
Annotations | 1 if the selected category was annotated as a target categorization result for the document, 0 otherwise |
Results | 1 if the selected category was returned as a categorization result for the document, 0 otherwise |
TP | True positive: 1 if the selected category was annotated as a target categorization result for the document and was also returned as a categorization result for the document (match), 0 otherwise |
FP | False positive: 1 if the category was returned as a categorization result for the document, but was not annotated as a target categorization result for the document (unexpected result), 0 otherwise |
FN | False negative: 1 if the selected category was annotated as a target categorization result for the document, but was not returned as a categorization result for the document (missed match), 0 otherwise |
Numbers in brackets refer to the older report; the other numbers refer to the newer report.
The info bar shows the file count.
The toolbar contains these controls:
Name | Description |
---|---|
Docs In | Filter the list to show only documents that have actual categorization results in the newer report but had no categorization results in the older report |
Docs Out | Filter the list to show only documents that have no actual categorization results in the newer report but had categorization results in the older report |
Docs Won | Filter the document list to show only documents that gained true positives |
Docs Lost | Filter the document list to show only documents that lost true positives |
Docs Changed | Filter the document list to show only documents that changed between the reports |
Reset filters | Remove the filters and display the complete list |
Export selection to CSV | Export the filtered document list in CSV format, according to the selected category and/or one of the other filters in this table |
Export unfiltered results to CSV | Export the full document list in CSV format |