Interpret disambiguation information
Introduction
In addition to the information shown in the Semantic Analysis tool window, deep linguistic analysis (also referred to as disambiguation) produces other information that is shown in the Disambiguation Info tool window.
Browse standard domains
Disambiguator's categorization
Deep linguistic analysis, performed by the disambiguator, produces the information upon which categorization and extraction rules can be built. The disambiguator also carries out its own categorization of the input text: in the document analysis pipeline, this happens before rules are evaluated and is independent of categorization rules.
In Studio, the results of this categorization are shown in the Standard Domains panel of the Disambiguation Info tool window.
While categorization rules "produce" domains from the project taxonomy, the disambiguator's categorization returns encyclopedic knowledge domains defined in the Knowledge Graph. You can find the list of these domains in the Studio languages reference.
Example
Considering the text:
BMW released Tuesday the details of an electric concept car, with production of the vehicle expected to start in 2021.
In an interview with CNBC Tuesday, CEO Oliver Zipse described the BMW Concept i4 vehicle as bringing "electromobility to the heart of the BMW brand".
"It's fast, it has an acceleration of less than four seconds from zero to 100 kilometers an hour, it has a range of 600 kilometers (about 373 miles)", Zipse, who was speaking to CNBC's Annette Weisbach, said. The price is yet to be announced.
He went on to note that the company sold more than 140,000 electrified vehicles last year, stating he was "quite positive about our profitability in the future, even with electromobility".
the disambiguator returns domains such as commerce, motor vehicle and economics.
The semantic analysis phase of deep linguistic analysis determines the syncons, i.e. the meanings, of nouns, verbs, adjectives and adverbs.
Inside the Knowledge Graph, one or more encyclopedic domains can be associated with each syncon, with a specific weight.
Semantic analysis then returns all the domains associated with the identified syncons, with a score that depends on the number of syncons associated with each domain and on the weights of the syncon-domain associations.
Info
The domains returned by the disambiguator are called standard because they come from the Knowledge Graph and are the same for all languages. By contrast, those "produced" by categorization rules can be considered custom, as they are "made to measure" and differ from project to project.
When writing rules, you can make use of this categorization to define scope constraints.
Main domains, score and relevance
The score of a domain, shown in the Score column, is a function of:
- The number of syncons associated with that domain.
- The weight of the syncon-domain association.
- The score of each associated syncon, which is directly proportional to the frequency of the syncon in the analyzed text.
The domains with the highest scores and more than a minimum number of associated syncons are considered the main domains, while the others are considered less important, even if they have a high score.
The main domains are highlighted in yellow.
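The exact scoring formula is not documented here, so the following is only a minimal sketch of how the three factors listed above could combine and how main domains could then be selected. The weighted sum, the syncon IDs, the weights and the `min_syncons` and `top_n` thresholds are all hypothetical assumptions, not the disambiguator's actual implementation.

```python
from collections import defaultdict

def score_domains(syncon_freq, associations):
    """Return {domain: (score, number_of_associated_syncons)}.

    syncon_freq:  {syncon_id: occurrences of the syncon in the text}
    associations: {syncon_id: {domain: weight}} (hypothetical Knowledge Graph data)
    """
    scores = defaultdict(float)
    counts = defaultdict(int)
    for syncon, freq in syncon_freq.items():
        for domain, weight in associations.get(syncon, {}).items():
            # Assumption: a syncon contributes frequency * weight to the domain score.
            scores[domain] += freq * weight
            counts[domain] += 1
    return {d: (scores[d], counts[d]) for d in scores}

def main_domains(results, min_syncons, top_n):
    """Keep domains with more than min_syncons syncons, then take the top_n by score."""
    eligible = [(d, s) for d, (s, n) in results.items() if n > min_syncons]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [d for d, _ in eligible[:top_n]]

# Hypothetical syncon IDs, frequencies and weights, for illustration only.
syncon_freq = {101: 3, 102: 1, 103: 2}
associations = {
    101: {"motor vehicle": 1.0, "commerce": 0.5},
    102: {"commerce": 1.0},
    103: {"motor vehicle": 0.5, "economics": 2.0},
}
results = score_domains(syncon_freq, associations)
print(main_domains(results, min_syncons=1, top_n=2))
```

In this made-up data, economics has a high score but only one associated syncon, so it is not selected as a main domain, which mirrors the behavior described above.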
The relevance of a domain, shown in the Relevance column, is a function of the number of syncons associated with the domain.
To toggle the display of the relatively irrelevant domains, select Toggle Irrelevant Domains on the toolbar.
Use standard domains as scope constraint
As mentioned above, the results of the disambiguator's categorization can be used in categorization and extraction rules. In particular, you can write rules that are triggered only if the text is about a certain topic, i.e. if the disambiguator returns a given standard domain. This condition is called a constraint and is specified after the rule's scope.
You can write the rule by hand and specify any domain you want, or use the results of an analysis to create a generic rule template that has one of the output domains as its constraint.
To create this kind of template and insert it in the file that's currently in focus inside the editor, right-click a domain, select Create a rule on domain "XXXX" and insert into current rule file and select the rule's scope from the sub-menu.
To create the rule template and copy it to the clipboard, right-click a domain, select Create a rule on domain "XXXX" and copy to clipboard and select the rule's scope from the sub-menu.
For more information about defining rules' scope constraints, read the Studio languages reference.
Other commands
To copy all domain names to the clipboard, right-click any domain and select Copy all Domains.
To copy the names of the relevant domains to the clipboard, right-click any domain and select Copy Relevant Domains.
To copy a domain name to the clipboard, right-click the domain and select Copy Domain "XXXX", where XXXX is the domain name.
Browse relevant information
The deep linguistic analysis performed by the disambiguator determines many features of the text, for example the sentences it is composed of, the base forms (lemmas) of the terms, the meanings of the terms rendered as Knowledge Graph syncons, compound terms recognized heuristically, and so on. The categorization described above is added to these.
Of all these output data, some are marked as relevant based on frequency, relationship with the main standard domains (in turn determined as described above), position and other properties.
Relevant features of the text are shown in the Relevant Information panel, grouped by type.
Type-level commands
To highlight the occurrences of all the items of a given type—except for domains—select the type name.
To copy the data of all the items to the clipboard as a list of tab-separated values, right-click any type and select Copy All Relevants Informations.
To copy the data of all the items of a given type to the clipboard as a list of tab-separated values, right-click the type and select Copy All Relevants Informations of Type "XXXX", where XXXX is the information type.
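Once the tab-separated data is on the clipboard, it can be pasted into a file or processed by a script. The following is a minimal sketch of parsing such text with Python's standard `csv` module. The column layout of the copied data is not specified here, so the two-column sample is a hypothetical placeholder.

```python
import csv
import io

def parse_tsv(clipboard_text):
    """Split tab-separated text into a list of rows, skipping empty lines."""
    reader = csv.reader(
        io.StringIO(clipboard_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # treat quote characters in values as plain text
    )
    return [row for row in reader if row]

# Hypothetical sample: the real columns depend on what Studio copies.
sample = "BMW\t4\nOliver Zipse\t2\n"
for row in parse_tsv(sample):
    print(row)
```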
Single item commands
To highlight the occurrences of an item—except in the case of a domain—select the item.
If an item corresponds to a Knowledge Graph syncon, the syncon ID is displayed in the item's row. In this case, to show the syncon in the Knowledge Graph tool window:
- Double-click the item.
Or:
- Right-click the item and select View Syncon in Knowledge Graph.
To copy the syncon ID to the clipboard, right-click the item and select Copy Syncon ID.
To open the Knowledge Graph tool window and perform a search using the item's value as search criteria, right-click the item and select Find lemma in Knowledge Graph.
To copy the base form of an item to the clipboard, right-click the item and select Copy Base Form.
Browse named entities
Named entities recognized during analysis are shown in the Entities tab, grouped by type.
The Count column shows the number of occurrences, including anaphoras.
To highlight the occurrences of all the entities of the same type, select the type.
To highlight all the occurrences of an entity, select the entity.
If an entity corresponds to a Knowledge Graph syncon, the syncon ID is displayed in the item's row.
To show the syncon in the Knowledge Graph tool window, double-click the entity.
To copy summary information for all the entities to the clipboard as a list of tab-separated values, right-click an entity type and select Copy All Entities.
To copy summary information for all the entities of a given type to the clipboard as a list of tab-separated values, right-click an entity type and select Copy All Entities of Type "XXXX", where XXXX is the entity type.
Additional info
To display additional info about entities, set the Enable Advanced Disambiguation Info configuration property to true.
If additional info corresponds to a Knowledge Graph syncon, the syncon ID is displayed in the info's row.
To show the syncon in the Knowledge Graph tool window:
- Double-click the info.
Or:
- Right-click the info and select View Syncon in Knowledge Graph.
To copy the syncon ID to the clipboard, right-click the info and select Copy Syncon ID.
To open the Knowledge Graph tool window and perform a search using the info's lemma, right-click the info and select Find lemma in Knowledge Graph.
To copy the base form of an info to the clipboard, right-click the info and select Copy Base Form.
Browse SAO groups
Semantic analysis detects full and partial subject-action-object (SAO) groups. They are displayed in the Actions (SAO) tab of the Disambiguation Info tool window if the Enable Advanced Disambiguation Info configuration property is set to true.
Group commands
To highlight the group in the text, select the group row. The sentence in which the group was found is highlighted too.
To open the Knowledge Graph tool window and perform a search using the action's lemma, right-click the group and select Find lemma in Knowledge Graph.
To copy the SAO group to the clipboard as a pipe-separated list, right-click the group and select Copy SAO.
To copy the action's base form to the clipboard, right-click the group and select Copy Base Form.
To copy the action's syncon ID to the clipboard, right-click the group and select Copy Syncon ID.
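A pipe-separated list copied with Copy SAO can be split back into its components in a script. The sketch below assumes the fields appear in subject, action, object order and that a missing component of a partial group is left empty. Both are assumptions about the copied format, and the sample strings are made up.

```python
from typing import NamedTuple, Optional

class SAO(NamedTuple):
    subject: Optional[str]
    action: Optional[str]
    obj: Optional[str]

def parse_sao(line):
    """Split a pipe-separated SAO string; empty or missing fields become None."""
    parts = [part.strip() or None for part in line.split("|")]
    parts += [None] * (3 - len(parts))  # pad partial groups to three fields
    return SAO(*parts[:3])

# Hypothetical clipboard contents.
print(parse_sao("company|sell|vehicles"))
print(parse_sao("BMW|release|"))
```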
Details commands
To toggle the display of group components and other complements, select the arrow to the left of the group.
To highlight an item and its sentence, just select the item.
If an item corresponds to a Knowledge Graph syncon, the syncon ID is displayed.
To show the syncon in the Knowledge Graph tool window:
- Double-click the item.
Or:
- Right-click the item and select View Syncon in Knowledge Graph.
To copy the syncon ID to the clipboard, right-click the item and select Copy Syncon ID.
To open the Knowledge Graph tool window and perform a search using the item's lemma, right-click the item and select Find lemma in Knowledge Graph.
To copy the item's base form to the clipboard, right-click the item and select Copy Base Form.