The Documents tab
This article describes the features of the Documents tab of the project dashboard that are specific to thesaurus projects.
The characteristics of the tab that are common to all project types are described in a dedicated article.
Sort the document list
In addition to the sorting options common to other project types, the list view of the Documents tab offers the Extractions sorting option when in the context of an experiment. It sorts the list of documents by the number of concept extractions.
Detail view's central panel
In the default visualization of the detail view, the third bar from the top in the central panel contains buttons displaying:
- First group:
  - Extractions: the overall number of mentions of concepts that were extracted in the current experiment.
  - Annotations: the number of concepts' annotations.
- Second group:
  - Rules: the number of mentions of concepts that were extracted by rules.
- Third group:
  - True positive: the number of true positives.
  - False positive: the number of false positives.
  - False negative: the number of false negatives.
The buttons also act as toggle switches: to toggle the highlight of all the extractions, annotations, true positives, etc., select the corresponding button.
You can select one or more buttons in each group, but groups are mutually exclusive: if you select a button of one group, any buttons selected in the other groups are deselected and their effect cancelled.
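The selection behavior described above can be sketched as a small state function. This is only an illustrative model, not the product's implementation: the `GROUPS` mapping and the `toggle` function are hypothetical names, and the button labels are taken from the list above.

```python
# Hypothetical model of the highlight buttons: label -> group number.
GROUPS = {
    "Extractions": 1,
    "Annotations": 1,
    "Rules": 2,
    "True positive": 3,
    "False positive": 3,
    "False negative": 3,
}


def toggle(selected: frozenset, button: str) -> frozenset:
    """Return the set of active buttons after selecting `button`."""
    if button in selected:
        # Selecting an active button switches its highlight off.
        return frozenset(selected - {button})
    group = GROUPS[button]
    # Buttons of other groups are deselected; same-group buttons stay active.
    kept = {b for b in selected if GROUPS[b] == group}
    return frozenset(kept | {button})


state = frozenset()
state = toggle(state, "Extractions")
state = toggle(state, "Annotations")    # both first-group buttons are active
state = toggle(state, "True positive")  # switching group clears the first group
```

After the last call, only True positive remains active, which mirrors the mutual exclusion between groups.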
The Thesaurus tab
In list view
In list view, the Thesaurus tab is in the left panel.
Under Extractions you'll find the concepts that were detected during the selected experiment, if any, or when the documents were analyzed during the creation process, after the definition of the initial taxonomy.
Under Annotations are listed the concepts that were annotated as expected results.
The numbers beside the Extractions and Annotations headings are, respectively, the number of distinct concepts that have been extracted and the number of distinct concepts that have been annotated in the current, possibly filtered, set of documents.
The number beside each concept under Extractions is the number of documents in the current list from which that concept was extracted. Similarly, the number beside each concept under Annotations is the number of documents in which that concept was annotated.
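To make the difference between the two kinds of counters concrete, here is a minimal sketch under stated assumptions: the document names, the concepts and the `concept_document_counts` helper are all hypothetical, and the data structure is not the product's own.

```python
from collections import Counter

# Hypothetical data: document name -> set of concepts extracted from it.
extractions = {
    "doc1.txt": {"customs", "tariff"},
    "doc2.txt": {"customs"},
    "doc3.txt": set(),
}


def concept_document_counts(per_document):
    """Count, for each concept, the documents in which it occurs."""
    counts = Counter()
    for concepts in per_document.values():
        # Each concept is counted once per document, however many mentions it has.
        counts.update(concepts)
    return counts


counts = concept_document_counts(extractions)
heading_count = len(counts)  # number beside the heading: distinct concepts
```

In this example `heading_count` is 2, while the per-concept counters are 2 for `customs` and 1 for `tariff`.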
- To filter the lists, type a value or the initial part of it in the Filter list box and press Enter. The match is case sensitive. Select the X icon inside the box to cancel the filter.
- To change the sort order, select the desired option from the dropdown menu at the top right of the list.
- Double-click an item to insert it in the search bar as criteria for document search.
If a list appears truncated, select Open beside its name to give the list maximum space. Select Close to revert to previous visualization.
- To switch to the Resources tab of the project dashboard and show the detail of a concept, hover over the concept and select Show in resources.
- To show more information about a concept, like concept relations, labels and hierarchy, hover over it and select Show information. If the concept has a relation, select Show resources to view it in the Resources tab.
- To switch to the context view, hover over the concept and select Context view.
In detail view
The Thesaurus tab on the right of the detail view shows the concepts for which there are annotations and extractions.
In case of a new project with no annotations and extractions, no details are available.
- To filter the lists, type a value or the initial part of it in the Filter list box and press Enter. The match is case sensitive. Select the X icon inside the box to cancel the filter.
- To change the sort order, select the desired option from the dropdown menu at the top right of the list. You can sort in ascending or descending order by:
  - Frequency: the total of true positives, false positives and false negatives of a concept.
  - Name: sort the concept name alphabetically.
  - Score: concept extraction score beside its name. The score is visible after an experiment.

  Note
  You can also apply the sort in the project settings, Documents tab.
- Double-click an item to insert it in the search bar as criteria for document search.
- To switch to the Resources tab of the project dashboard and show the detail of a concept, hover over the concept and select Show in resources.
- To show more information about a concept, like concept relations, labels and hierarchy, hover over it and select Show information. If the concept has a relation, select Show resources to view it in the Resources tab.
You can use this tab as a tool to annotate. More details in the dedicated page.
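As an illustration of the Frequency sort key mentioned in this section (the total of true positives, false positives and false negatives of a concept), the following sketch sorts a few made-up concepts. The `ConceptStats` class, its field names and the sample numbers are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class ConceptStats:
    """Hypothetical per-concept counters; not the product's data model."""
    name: str
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def frequency(self) -> int:
        # Frequency as defined in the text: TP + FP + FN.
        return self.true_positives + self.false_positives + self.false_negatives


def sort_by_frequency(concepts, descending=True):
    """Return the concepts ordered by frequency."""
    return sorted(concepts, key=lambda c: c.frequency, reverse=descending)


stats = [
    ConceptStats("tariff", 1, 0, 1),   # frequency 2
    ConceptStats("customs", 3, 1, 2),  # frequency 6
    ConceptStats("excise", 0, 2, 1),   # frequency 3
]
ordered = [c.name for c in sort_by_frequency(stats)]
```

With these sample values, the descending order is `customs`, `excise`, `tariff`.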
Debug extractions
Toggle debug
If you want explanations about the extractions of the current experiment, you can have them in the detail view.
Select Toggle debug extractions on the toolbar of the first bar from the top in the central panel.
The right panel, including annotation controls, is replaced by the Debug panel and the collapsed INSPECTOR panel.
Note
Select the expand and collapse icons to expand or collapse the INSPECTOR panel.
To go back to the initial view, select Toggle debug extractions again.
The lists
In the Debug panel you can find information under these lists:
- Thesaurus: all the taxonomy concepts whose mentions were found in the document's text and extracted.
  The number next to the list heading is the number of distinct concepts that were extracted.
  For each concept in the list, the rightmost number is the number of extractions, while the one closest to the concept name is the confidence score.
- Kill lists: the kill lists that were triggered by the text, whether or not they cancelled any extraction.
  The number next to the list heading is the number of kill lists that were triggered.
  For each kill list, the number to its right is the number of hits, that is the number of times the kill list was triggered.
- Other matches: parent concepts whose child concepts have currently been extracted in the document.
- Rules: the rules that were triggered by the text, effectively producing extractions.
  The number next to the list heading is the number of rules that were triggered.
  For each rule, the rightmost number is the number of hits, while the one closest to the rule name is the confidence score.
- Rule concept entities: the rule concept entities that were mentioned in the text, whether or not they are part (directly or via rule concepts) of any rule. All the rule concept entities found in the text are listed, independently of their contribution to actual extractions.
  The number next to the list heading is the number of distinct entities mentioned in the text.
  For each entity in the list, the number to its right is the number of hits, that is the number of mentions.
- Rule concepts: the rule concepts that were mentioned in the text via their rule concept entities, whether or not they are part of any rule. All the rule concepts found in the text are listed, independently of their contribution to actual extractions.
  The number next to the list heading is the number of rule concepts that were triggered.
  For each rule concept in the list, the number to its right is the number of hits, that is the number of times it was triggered.
Each list can be expanded and collapsed by selecting the arrow icon to the left of its name.
Filter the lists
To filter the items in the lists, type something in the search box above the lists and press Enter. Lists are filtered to show only items with matching names. The match is case insensitive.
When the filter is active, the counters next to the list headings show the number of items selected by the filter versus, in brackets, the total number of items.
To cancel the filter select the X icon in the right part of the search box.
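The filter and its counters can be pictured with a short sketch. It assumes that a name matches when it contains the typed text (the section only says "matching names") and that the counter is rendered as `shown (total)`. Both assumptions, as well as the `filter_items` and `heading_counter` names, are hypothetical.

```python
def filter_items(names, query):
    """Case-insensitive filter; substring matching is an assumption."""
    needle = query.casefold()
    return [name for name in names if needle in name.casefold()]


def heading_counter(shown, total):
    """Counter text with the total in brackets; the exact rendering is assumed."""
    return f"{shown} ({total})"


rules = ["Customs (trade)", "Tariff", "customs union"]
visible = filter_items(rules, "CUSTOMS")
label = heading_counter(len(visible), len(rules))
```

Here the query `CUSTOMS` keeps two of the three items regardless of letter case, so the counter reads `2 (3)`.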
Highlight items
The items selected in the lists are highlighted in the text. Multiple selection with the Ctrl key is possible.
Each list corresponds to a different ink or background color:
- Thesaurus concepts and other matches: blue ink for mentions, light blue background for the extraction scope
- Kill lists: red ink
- Rules: orange ink for the operands, light blue background for the scope of the rule
- Rule concept entities: violet ink
- Rule concepts: green ink
The total number of selected items in both tabs is displayed in the upper right corner of the Debug panel.
To de-select all the items, removing the corresponding highlights, select the X icon inside the counter.
To get contextual information about a highlighted item, select the highlight in the text and:
- Select Show info to have, in a pop-up, information relating to the thesaurus, if any.
- Select Show extraction debug info to get, in a pop-up, details about all the kill lists, rule concept entities, rule concepts and rules corresponding to the highlighted text.
Inspect items
The INSPECTOR panel, on the right edge of the page, shows contextual information about the items selected in the Debug panel, possibly with navigation shortcuts.
To toggle the INSPECTOR panel select the expand and collapse icons at the top of the panel.
If you select more items in the Debug panel, only information for the last is shown in the INSPECTOR panel.
When a rule is selected, inside the INSPECTOR panel the operands that triggered the rules are highlighted. Select a highlighted operand to see its details and select Back to restore the previous view.
If a concept of a selected rule defined in the Advanced extraction tab has the label INCLUDE SUBENTITIES below it and the concept is reported in the Other matches list of the Debug panel, some of its child concepts have currently been extracted in the document text. For example, in the picture above, the customs (trade) concept belongs to the rule 2CP 3020. This concept is reported in the Other matches list of the Debug panel, but what has been extracted in the document is not the concept itself, but its child concepts. This is indicated by the letter S in square brackets before the concept name in the Other matches list.
Show rule info
In detail view, to have information about mentions of a concept that were found and extracted by rules:
- From the Thesaurus panel on the right, select a concept. The mentions of the concept will be highlighted in the document text.
- Select a mention in the text: in the contextual menu, Show extraction debug info is available if the mention was extracted by a rule.
  Select it: a pop-up will appear showing information about the rule.
Change document language
If you chose a wrong language during the upload or you chose the language auto-detection option, but you don't agree with the choice made by the system, you can change the language of a document.
This can be done in the list view.
To change the language of a document:
- Choose the document from the list and then select Change language on the toolbar above the list.
Or:
- Select Change language, corresponding to the chip with the language code, in the document strip.
In both cases you can choose from all the supported languages of the tech version underlying the project.
To change the language of multiple documents, select them with Ctrl+Click, then select Change language on the toolbar above the list.
Info
The language change operation requires the repetition of the linguistic-semantic analysis for affected documents and the reorganization of the document indexes, therefore it is carried out in the background and can take a considerable amount of time.
During the operation, affected documents are grayed out.
At the end of the operation, a notification appears inviting you to reload the corpus to see the effects of the change.
Check or change matching strategy
In the detail view, when in the context of an experiment, you can check the matching strategy and change it using the dropdown list in the lower toolbar.
Changing the strategy only affects the visualization of the results: it shows what the true positives, false positives and false negatives would have been if the experiment had been carried out with a different strategy. The metrics of the experiment are unaffected.
Sections
In the detail view and its variants you can toggle the display of sections. To do this, use the Toggle sections toggle switch on the toolbar next to the document name. The button is enabled only if sections have been defined.
In the absence of explicit annotations, all text belongs to the standard section.
In the dedicated article you will find information on how to annotate sections.
Generate IDF values
While in list or context view for an experiment, it is possible to generate—or display previously generated—IDF values for the extracted concepts. Values can then be used by the Documents, thesaurus and matches based scoring algorithm of another experiment.
To generate IDF values:
- Select Generate IDF values on the toolbar above the document list. The Inverse document frequency dialog appears.
- Select Generate. After a while, another dialog with the same name, showing the results, appears.
The Concepts tab has these columns:
| Name | Description |
| --- | --- |
| Concept label | List of all the extracted concepts |
| IDF | IDF value for each concept |

Select the column headers to sort the list by concept label or IDF value. A search bar is available to look for specific concepts.
The JSON tab contains the JSON that can be used by the scoring algorithm.
- Select the expanding and the collapsing arrows to expand and collapse the JSON.
- Select Copy JSON to clipboard to copy the JSON to the clipboard.
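For readers unfamiliar with inverse document frequency, the sketch below computes it with the textbook definition, the logarithm of the number of documents divided by the number of documents containing the concept, and prints the result as JSON. The exact formula and JSON layout used by the product are not stated in this article, so the formula, the `idf_values` function, the sample documents and the output shape are all assumptions.

```python
import json
import math


def idf_values(per_document):
    """Textbook IDF: log(N / document frequency). The product's formula may differ."""
    n_docs = len(per_document)
    doc_freq = {}
    for concepts in per_document.values():
        # set() ensures a concept counts once per document.
        for concept in set(concepts):
            doc_freq[concept] = doc_freq.get(concept, 0) + 1
    return {concept: math.log(n_docs / df) for concept, df in doc_freq.items()}


# Hypothetical corpus: document name -> concepts extracted from it.
docs = {
    "doc1": ["customs", "tariff"],
    "doc2": ["customs"],
    "doc3": ["customs", "excise"],
    "doc4": ["customs"],
}
values = idf_values(docs)
# Hypothetical JSON shape: a flat object mapping concept label to IDF value.
print(json.dumps(values, indent=2, sort_keys=True))
```

A concept found in every document, such as `customs` here, gets an IDF of zero, while rarer concepts get higher values. That is why IDF is useful to a scoring algorithm that should give more weight to distinctive concepts.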
Note that:
- Once generated, IDF values are stored and can be retrieved during the experiment wizard, so you don't need to copy and paste them.
- The results of an experiment are immutable, so once you generate IDF values for it and the values are stored, you cannot generate them again, nor do you need to: the operation is "one shot". Generated values are read only and can be displayed again.
If the IDF values have already been generated, you can display them by selecting Generate IDF values on the toolbar above the document list. The Inverse document frequency dialog with the results appears, as described in the second step of the procedure above.
In this case, if a newer experiment exists, the Inverse document frequency dialog will also show:
- The New experiment available area with the information about the latest experiment.
- The Regenerate button that allows you to generate IDF values for the concepts extracted during the latest experiment.
The resulting IDF values are then displayed as described above.
It is also possible to generate IDF values for previous experiments, if not done already:
- Go to the experiment dashboard and select an experiment from the Experiments panel on the left.
- Select Browse documents to see document statistics for the experiment.
- Repeat the procedure explained above.