Contexts for document analysis
The API document analysis resources operate within a context.
The context determines the type of Knowledge Graph to use.
To date, the API has only one context (named standard) using universal, all-purpose Knowledge Graphs. Other contexts may be added in the future, equipped with domain-specific Knowledge Graphs.
A context can have several Knowledge Graphs of the same type, one for each language it supports.
This is the overview of the capabilities and languages available for the standard context:
|Deep linguistic analysis||✔||✔||✔||✔||✔|
|Named entity recognition||✔||✔||✔||✔||✔|
See the page about API endpoints in the reference section to learn how to specify the context and the language in resource URLs.
On the same page you'll find the description of the API resources that return information about the available contexts and the languages they support.
The Knowledge Graph
The expert.ai Knowledge Graph is a concept-based representation of universal or domain-specific knowledge for a given language.
Each entry in the Knowledge Graph corresponds to a concept.
There are entries for common nouns, proper nouns, verbs, adjectives and adverbs.
Other parts of speech like punctuation, conjunctions, articles, prepositions and pronouns are not modeled in the Knowledge Graph because the text analysis software has its own ability to recognize these parts.
Ideally, the Knowledge Graph should have an entry for each of the concepts that can be expressed in the given language. In practice, this is feasible only for relatively small or well-established knowledge domains. However, the coverage of universal Knowledge Graphs is vast and expert.ai constantly keeps them updated.
Each entry contains all the information on the concept expression and on the concept itself, such as:
- The terms that can be used to express the concept, for example hand, pass, pass on, hand off, turn over, reach.
- The corresponding part of speech, for example (to) climb → verb, and other grammatical information on the terms.
- The topics to which the concept corresponds, for example soprano → opera, singing.
- References to external knowledge bases such as Wikidata, DBpedia, GeoNames, etc.
- Extended properties, for example the coordinates of places.
Modeling the concepts that can be expressed in a language is not sufficient, on its own, to enable the text analysis software to interpret ambiguous terms.
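The information listed above can be pictured as a small record. The following is a minimal sketch, not the actual expert.ai data model: the class name `Entry`, its field names and the helper `entries_for_term` are hypothetical and chosen only for illustration. The parts of speech assigned to the sample entries are also assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical, simplified model of one Knowledge Graph entry (one concept)."""

    terms: list[str]            # words or phrases that can express the concept
    part_of_speech: str         # e.g. "verb", "noun"
    topics: list[str] = field(default_factory=list)
    external_refs: dict[str, str] = field(default_factory=dict)  # e.g. knowledge base name -> identifier
    properties: dict[str, object] = field(default_factory=dict)  # e.g. coordinates of a place


# Sample entries built from the examples in the text.
hand_over = Entry(
    terms=["hand", "pass", "pass on", "hand off", "turn over", "reach"],
    part_of_speech="verb",  # assumption: the source does not state it for this entry
)
soprano = Entry(
    terms=["soprano"],
    part_of_speech="noun",  # assumption
    topics=["opera", "singing"],
)


def entries_for_term(term: str, entries: list[Entry]) -> list[Entry]:
    """Return every entry that can be expressed with the given term."""
    return [entry for entry in entries if term in entry.terms]


candidates = entries_for_term("pass on", [hand_over, soprano])
```

A lookup by term can return several entries, which is exactly the situation that makes disambiguation necessary.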
For example, consider that in the universal Knowledge Graph for the English language there are more than 20 entries for the verb (to) put.
Each entry has statistical information indicating the frequency with which the concept is used in a reference corpus compared to other concepts that can be expressed with the same word. This is useful information, but still insufficient. Using statistics alone can lead to incorrect interpretations and to a textual analysis of low quality and even lower usefulness.
What really improves the results are the relationships between concepts, hence the term Knowledge Graph. A single entry is linked to one or more other entries and, as such, relationships can be numerous.
For example, a concept can be connected to other concepts in the hierarchical relationship "IS-A". So:
sodium IS A alkaline metal IS A metal IS A element
Or it can be a "part-whole" relationship:
wheel IS A PART OF car
clutch IS A PART OF car
dashboard IS A PART OF car
The relationships are designed to be navigated in both directions, so from the concept of car it is possible to discover the parts that make it up (wheels, clutch, dashboard, etc.). In the same way, for the "IS-A" relationship, from alkaline metal it is possible to discover which elements are "types of" the parent concept (sodium, cesium, lithium, etc.).
Relationships can be one-to-many. This is obvious when a "part-whole" relationship is read from the whole to its parts, or when an "IS A" relationship is read from the more generic concept to the more specific ones. It is less obvious in the opposite direction, yet it can happen, for example:
cat IS A feline
cat IS A pet
So, even if the relationship is hierarchical, a concept can have multiple "parents".
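The two-way navigation and the multiple parents described above can be sketched with a pair of indexes, one per direction. This is only an illustration under stated assumptions: the class `ConceptGraph` and its methods are hypothetical and do not reflect how the expert.ai Knowledge Graph is stored or queried.

```python
from collections import defaultdict


class ConceptGraph:
    """Hypothetical store of typed relationships, navigable in both directions."""

    def __init__(self):
        # relation name -> concept -> set of linked concepts
        self._forward = defaultdict(lambda: defaultdict(set))
        self._backward = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Record 'source RELATION target' and index it in both directions."""
        self._forward[relation][source].add(target)
        self._backward[relation][target].add(source)

    def targets(self, source, relation):
        """Follow the relation forward, e.g. from a concept to its parents."""
        return set(self._forward[relation][source])

    def sources(self, target, relation):
        """Follow the relation backward, e.g. from a whole to its parts."""
        return set(self._backward[relation][target])


graph = ConceptGraph()
is_a_pairs = [
    ("sodium", "alkaline metal"), ("cesium", "alkaline metal"),
    ("lithium", "alkaline metal"), ("alkaline metal", "metal"),
    ("metal", "element"), ("cat", "feline"), ("cat", "pet"),
]
for child, parent in is_a_pairs:
    graph.add(child, "IS A", parent)
for part in ["wheel", "clutch", "dashboard"]:
    graph.add(part, "IS A PART OF", "car")

car_parts = graph.sources("car", "IS A PART OF")         # from the whole to its parts
alkaline_metals = graph.sources("alkaline metal", "IS A")  # from the parent to its children
cat_parents = graph.targets("cat", "IS A")                # a concept with two parents
```

Keeping a backward index alongside the forward one is a simple way to make each relationship cheap to read in either direction.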
The relationships between Knowledge Graph entries are fundamental for disambiguation.
Suppose the text contains a form of the verb (to) put. The standard English Knowledge Graph contains more than 20 different concepts that can be expressed with (to) put: which is the right one?
Relationships can help. For each of the over 20 concepts, the text analysis software can explore its relationships to find out if the concept is linked to the other concepts expressed in the same text. The concept with the most links to other concepts is a good candidate for the "right concept".
The disambiguation of one word helps to disambiguate the others, but the text analysis software is always free to "go back" and correct its previous disambiguation choices as it proceeds with the analysis of the other words of the text, with a chain effect on other disambiguations.
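The link-counting idea can be sketched as a scoring function. This is a deliberately naive illustration, not the algorithm used by the expert.ai software: the function `pick_candidate`, the concept identifiers such as `put#place` and the links between them are all invented for the example.

```python
def pick_candidate(candidates, context_concepts, links):
    """Return the candidate concept linked to the most concepts found in the text.

    candidates: concepts that the ambiguous word can express
    context_concepts: set of concepts already recognized in the same text
    links: dict mapping each concept to the set of concepts it is linked to
    Ties are resolved in favor of the candidate listed first.
    """
    def score(candidate):
        return len(links.get(candidate, set()) & context_concepts)

    return max(candidates, key=score)


# Invented data: three of the many concepts that (to) put can express.
links = {
    "put#place": {"shelf", "table", "object"},
    "put#express": {"question", "words", "statement"},
    "put#invest": {"money", "stock"},
}
candidates = ["put#express", "put#place", "put#invest"]

# Concepts found elsewhere in a sentence such as "She put the book on the shelf".
context_concepts = {"book", "shelf", "table"}

best = pick_candidate(candidates, context_concepts, links)
```

Here `put#place` shares two links with the context and the other candidates share none, so it is selected. A real implementation would also need to weigh relationship types and frequency statistics.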
The name used to designate a Knowledge Graph entry in the API resources' output is syncon.