Models output keys
Introduction
The previous page describes the overall structure of the output JSON object of any model block in a workflow.
This page describes all the keys that can be present in the output. The keys actually returned depend on how the functional parameters of the block are set.
categories
The categories
array, a property of the document
object, is present in the output of models performing document classification.
It is the list of categories predicted by the model.
In case of an ML model, the properties of the array items are:
- id: category ID inside the taxonomy.
- score: prediction score.
- winner: a boolean flag set to true if the category was considered particularly important.
For example:
{
"id": "20000851",
"score": 4005.0,
"winner": true
}
In case of symbolic models or the symbolic step of an ML or Knowledge Model, the array items are like this:
{
"frequency": 70.62,
"hierarchy": [
"Sport",
"Competition discipline",
"Basketball"
],
"id": "20000851",
"label": "Basketball",
"namespace": "iptc_en_1.0",
"positions": [
{
"end": 14,
"start": 0
},
{
"end": 53,
"start": 35
},
{
"end": 139,
"start": 136
}
],
"score": 4005.0,
"winner": true
}
where:
- namespace is the name of the software module carrying out the categorization.
- id, label and hierarchy identify the category in the taxonomy.
- score is the cumulative score that was attributed to the category.
- frequency is the percentage ratio of the category score to the sum of all categories' scores.
- winner is a boolean flag set to true if the category was considered particularly important.
- positions is an array containing the positions of the text blocks that "explain" the category.
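As a rough illustration of how these properties can be consumed, the Python sketch below picks the winner categories and recomputes the frequency values from the scores. The function names and the sample data (including the second category) are hypothetical; only the property names come from the description above.

```python
def winner_labels(categories):
    """Return the label (or the ID when no label is present) of each winner category."""
    return [c.get("label", c["id"]) for c in categories if c.get("winner")]


def frequencies(categories):
    """Recompute each category score as a percentage of the sum of all scores."""
    total = sum(c["score"] for c in categories)
    if total == 0:
        return {c["id"]: 0.0 for c in categories}
    return {c["id"]: round(100 * c["score"] / total, 2) for c in categories}


# Hypothetical sample shaped like the symbolic output shown above.
categories = [
    {"id": "20000851", "label": "Basketball", "score": 3000.0, "winner": True},
    {"id": "99999999", "label": "Made-up category", "score": 1000.0, "winner": False},
]

print(winner_labels(categories))
print(frequencies(categories))
```

Items produced by an ML model have no label property, which is why the sketch falls back to the category ID.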
content
The content
key is a property of the document
object and it's the text that has been analyzed.
document
The document
key is an object that contains the results of a document analysis. It is common to all model blocks.
{
"document": {
analysis results
}
}
entities
The entities
array is a property of the document
object.
It is the result of the named entity recognition activity performed by the symbolic engine.
Each item in the array represents a named entity like this:
{
"lemma": "National Basketball Association",
"positions": [
{
"end": 139,
"start": 136
}
],
"syncon": 206693,
"type": "ORG"
}
where:
- The syncon and the lemma properties are respectively the outcome of the semantic analysis and lemmatization:
  - syncon is the ID of the Knowledge Graph entry corresponding to the entity. The value -1 means the entity was heuristically recognized since there's no Knowledge Graph entry for it.
  - lemma is the lemma (or base form) of the entity name.
- positions is an array containing the positions of the entity occurrences in the text.
- type is the entity type abbreviation.
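The following Python sketch shows one possible way to group entities by type and recover the text of each occurrence from the positions array. The function name, the in_knowledge_graph flag and the sample document are assumptions made for illustration.

```python
from collections import defaultdict


def entities_by_type(document):
    """Group entities by type, attaching the surface text of each occurrence."""
    content = document["content"]
    grouped = defaultdict(list)
    for entity in document.get("entities", []):
        mentions = [content[p["start"]:p["end"]] for p in entity["positions"]]
        grouped[entity["type"]].append(
            {
                "lemma": entity["lemma"],
                "mentions": mentions,
                # syncon -1 means the entity was heuristically recognized.
                "in_knowledge_graph": entity["syncon"] != -1,
            }
        )
    return dict(grouped)


# Hypothetical document: only the keys used by the sketch are included.
document = {
    "content": "The NBA season starts soon.",
    "entities": [
        {
            "lemma": "National Basketball Association",
            "positions": [{"start": 4, "end": 7}],
            "syncon": 206693,
            "type": "ORG",
        }
    ],
}

result = entities_by_type(document)
print(result)
```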
extractions
The extractions
array is a property of the document
object and is present in the output of models performing information extraction and thesaurus models.
In case of information extraction, array items are like this:
{
"fields": [
{
"name": "ingredients",
"positions": [
{
"end": 243,
"start": 229
}
],
"value": "dark chocolate"
}
],
"template": "ingredients"
}
where:
- template is the group name.
- fields is an array of class extractions.

Each item of the fields array represents the extraction of a class, where:
- name is the class name.
- value is the class value.
- positions is an array containing the positions of the text blocks that were extracted.
In case of thesaurus models, array items are like this:
{
"fields": [
{
"name": "concept",
"positions": [
{
"end": 19,
"start": 15
}
],
"value": "sofa"
}
],
"namespace": "122-1642779466",
"template": "thesaurus"
}
where:
- namespace is the name of the software module carrying out the analysis.
- template is set to the constant value of thesaurus.
- fields is an array of concept occurrences.

Each item of the fields array represents an occurrence of a concept in the text, where:
- name is set to the constant value of concept.
- value is the text of the concept occurrence.
- positions is an array containing the positions of the concept occurrences in the text.
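Since both kinds of extraction share the same template/fields layout, they can be flattened with the same code. The Python sketch below is a minimal example under that assumption; the function name and the row layout are hypothetical.

```python
def flatten_extractions(extractions):
    """Turn the nested extractions array into one flat row per field."""
    rows = []
    for extraction in extractions:
        for field in extraction.get("fields", []):
            rows.append(
                {
                    "template": extraction["template"],
                    # namespace is only shown for thesaurus models in the samples above.
                    "namespace": extraction.get("namespace"),
                    "name": field["name"],
                    "value": field["value"],
                    "positions": [(p["start"], p["end"]) for p in field.get("positions", [])],
                }
            )
    return rows


# The two sample items shown above, one per kind of model.
extractions = [
    {
        "fields": [
            {"name": "ingredients", "positions": [{"end": 243, "start": 229}], "value": "dark chocolate"}
        ],
        "template": "ingredients",
    },
    {
        "fields": [
            {"name": "concept", "positions": [{"end": 19, "start": 15}], "value": "sofa"}
        ],
        "namespace": "122-1642779466",
        "template": "thesaurus",
    },
]

rows = flatten_extractions(extractions)
for row in rows:
    print(row)
```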
extraData
The extraData object is a property of the document object.
In case of a thesaurus model, it has this structure:
"extraData": {
"thesaurusData": {}
}
If normalizeToConceptId
is inserted and set to true
in the API request to the workflow, then thesaurusData
contains detailed information on the extracted concepts, otherwise it's empty.
The option also affects extractions: the value of extracted fields becomes a pointer to a property of the extraData
object, for example:
No option or option set to false:
"extractions": [
{
"fields": [
{
"name": "concept",
"value": "planet"
...
}
...
],
...
},
...
],
"extraData": {
"thesaurusData": {}
}
Option set to true:
"extractions": [
{
"fields": [
{
"name": "concept",
"value": "12345678"
...
}
...
],
...
},
...
],
"extraData": {
"thesaurusData": {
"12345678": thesaurus and project data about concept "planet",
...
}
}
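A consumer that has to work with both settings of the option could resolve field values as in the Python sketch below. The function name is hypothetical, and the content of the thesaurusData entry is a placeholder, because its actual structure is not described here.

```python
def resolve_concept(field, extra_data):
    """Return (value, details) for a thesaurus field.

    details is the thesaurusData entry the value points to, or None when
    thesaurusData is empty (option missing or set to false).
    """
    thesaurus_data = extra_data.get("thesaurusData", {})
    value = field["value"]
    return value, thesaurus_data.get(value)


# Option set to true: the value is a key of thesaurusData.
# The entry content below is a placeholder, not the real structure.
with_option = {"thesaurusData": {"12345678": {"placeholder": "data about concept planet"}}}
print(resolve_concept({"name": "concept", "value": "12345678"}, with_option))

# No option or option set to false: the value is the concept text.
without_option = {"thesaurusData": {}}
print(resolve_concept({"name": "concept", "value": "planet"}, without_option))
```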
In case of other models, the value of extraData
varies on a case-by-case basis: typically the key contains data only if the model has been produced or modified with Studio, because Studio allows producing this "extra" output via scripting.
knowledge
The knowledge array contains Knowledge Graph data about syncons.
The items of these arrays:
- tokens
- mainSyncons
- entities
- relations
- items (in the sentiment object)

may have a syncon property. In that case, there's a corresponding entry in the knowledge array.
The link between those items and the corresponding items in the knowledge array is the value of the syncon property both have in common.
For example, if this is an item of the tokens
array:
{
"atoms": [
{
"end": 45,
"lemma": "basketball",
"start": 35,
"type": "NOU"
},
{
"end": 53,
"lemma": "player",
"start": 46,
"type": "NOU"
}
],
"dependency": {
"head": 2,
"id": 6,
"label": "nmod"
},
"end": 53,
"lemma": "basketball player",
"morphology": "Number=Plur",
"paragraph": 0,
"phrase": 2,
"pos": "NOUN",
"sentence": 0,
"start": 35,
"syncon": 41583,
"type": "NOU"
}
the corresponding entry in the knowledge
array could be:
{
"label": "person.athlete.basketball_player",
"properties": [
{
"type": "WikiDataId",
"value": "Q3665646"
}
],
"syncon": 41583
}
The knowledge
array is a reference table: more than one item in the tokens
, relations
and sentiment
arrays can have the same syncon ID, but there's always one entry in the knowledge
array for a given syncon (it's a many-to-one relationship).
For example, if a text contains several occurrences of basketball player, each occurrence corresponds to a separate item in the tokens
array, but all tokens "point" to the same entry in the knowledge
array.
Items with the syncon property set to -1 have no corresponding entry in the knowledge array. Those concepts were heuristically recognized and are not present in the Knowledge Graph, so there is no previous "knowledge" about them.
Each entry in the array is like this:
{
"label": "person",
"properties": [
{
"type": "WikiDataId",
"value": "Q215627"
}
],
"syncon": 73282
}
where:
- The label property is a textual rendering of the general conceptual category for the syncon in the Knowledge Graph.
- The properties array contains the outcome of knowledge linking. Each item has two properties:
  - type specifies the knowledge base.
  - value is the property value.
- syncon specifies the internal syncon ID managed in the Knowledge Graph.
Possible knowledge bases and interpretations of the value
property follow.
type | value |
---|---|
Coordinate | Latitude and longitude |
WikiDataId | Wikidata item ID |
DBpediaId | URL of the DBpedia content |
GeoNamesId | ID of the record in the GeoNames database |
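Because the knowledge array works as a reference table keyed by syncon, it is convenient to index it once and look items up by ID. The Python sketch below illustrates the idea with the sample entry shown earlier; the function names are hypothetical.

```python
def build_knowledge_index(knowledge):
    """Index the knowledge array by syncon ID (one entry per syncon)."""
    return {entry["syncon"]: entry for entry in knowledge}


def knowledge_for(item, index):
    """Return the knowledge entry for a token, entity, etc., or None.

    Items with syncon -1 (or without a syncon property) have no entry.
    """
    syncon = item.get("syncon", -1)
    if syncon == -1:
        return None
    return index.get(syncon)


def property_values(entry, kb_type):
    """Return the values of the properties coming from a given knowledge base."""
    return [p["value"] for p in entry.get("properties", []) if p["type"] == kb_type]


knowledge = [
    {
        "label": "person.athlete.basketball_player",
        "properties": [{"type": "WikiDataId", "value": "Q3665646"}],
        "syncon": 41583,
    }
]
index = build_knowledge_index(knowledge)

entry = knowledge_for({"lemma": "basketball player", "syncon": 41583}, index)
print(entry["label"], property_values(entry, "WikiDataId"))
print(knowledge_for({"lemma": "John", "syncon": -1}, index))
```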
language
The language
key, a property of the document
object, is present in the output of symbolic models, symbolic steps of ML models and knowledge models.
The key value is the ISO 639-1 code of the document language.
mainLemmas
The mainLemmas
array is a property of the document
object.
It contains the text main lemmas.
Each array item is an object that represents a lemma like this:
{
"positions": [
{
"start": 1152,
"end": 1162
},
{
"start": 1163,
"end": 1167
},
{
"start": 1239,
"end": 1249
},
{
"start": 1335,
"end": 1345
},
{
"start": 1394,
"end": 1404
}
],
"score": 6.5,
"value": "locomotive"
}
where:
- value is the lemma.
- score is the measure of the lemma importance.
- positions is an array containing the positions of the lemma occurrences in the text.
mainPhrases
The mainPhrases
array is a property of the document
object.
It contains the text main phrases.
Each array item is an object that represents a phrase like this:
{
"positions": [
{
"start": 883,
"end": 903
}
],
"score": 8,
"value": "four-cylinder engine"
}
where:
- value is the phrase.
- score is the measure of the phrase importance.
- positions is an array containing the positions of the phrase occurrences in the text.
mainSentences
The mainSentences
array is a property of the document
object.
It contains the text main sentences.
Each array item is an object that represents a sentence like this:
{
"end": 936,
"score": 13.3,
"start": 740,
"value": "The machine is held until ready to start by a sort of trap to be sprung when all is ready; then with a tremendous flapping and snapping of the four-cylinder engine, the huge machine springs aloft."
}
where:
- value is the sentence.
- score is the measure of the sentence importance.
- start is the position of the first character of the sentence.
- end is the position of the first character after the sentence.
mainSyncons
The mainSyncons
array is a property of the document
object.
It contains information about the main Knowledge Graph concepts expressed in the text.
Each array item is an object that represents a Knowledge Graph concept like this:
{
"lemma": "experiment",
"positions": [
{
"end": 224,
"start": 213
},
{
"end": 2830,
"start": 2820
}
],
"score": 5.8,
"syncon": 2496
}
where:
- The syncon and the lemma properties are respectively the outcome of the semantic analysis and the lemmatization.
  - syncon is the ID of the Knowledge Graph entry expressed in the text.
  - lemma is the lemma (or base form) of the concept expression (for example: scarf is the lemma for scarves).
- score is the measure of the concept importance in the text.
- positions is an array containing the positions of the concept occurrences in the text.
paragraphs
The paragraphs
array is a property of the document
object.
It contains information about the text paragraphs.
Each array item is an object that represents a paragraph like this:
{
"end": 176,
"sentences": [
0,
1
],
"start": 0
}
where:
- start is the position of the first character of the paragraph.
- end is the position of the first character after the paragraph.
- The sentences array contains the zero-based indexes of the constituent sentences, whose information is found in the sentences array.
phrases
The phrases
array is a property of the document
object.
It contains information about the text phrases.
Each array item is an object that represents a phrase like this:
{
"end": 65,
"start": 54,
"tokens": [
7,
8,
9
],
"type": "PP"
}
where:
- type is the phrase type. Possible phrase types are:

Code | Description |
---|---|
AP | Adjective Phrase |
CP | Conjunction Phrase |
CR | Blank lines |
DP | Adverb Phrase |
NA | Not Applicable (usually indicates punctuation) |
NP | Noun Phrase |
PN | Nominal Predicate |
PP | Preposition Phrase |
RP | Relative Phrase |
VP | Verb Phrase |

- start is the position of the first character of the phrase.
- end is the position of the first character after the phrase.
- The tokens array contains the zero-based indexes of the constituent tokens, whose information is found in the tokens array.
relations
Introduction
Each item of the relations
array represents a verb plus the text elements that are in a semantic relation with it. These elements may specify arguments, adjuncts or subordinate clauses.
For example, given this input text:
John sent a letter to Mary.
the relations
array can contain an item like this:
{
"verb": {
"text": "sent",
"lemma": "send",
"syncon": 68296,
"phrase": 1,
"type": "",
"relevance": 15
},
"related": [
{
"relation": "sbj_who",
"text": "John",
"lemma": "John",
"syncon": -1,
"type": "NPH",
"phrase": 0,
"relevance": 15
},
{
"relation": "obj_what",
"text": "a letter",
"lemma": "letter",
"syncon": 29131,
"type": "wrk",
"phrase": 2,
"relevance": 10
},
{
"relation": "to_who",
"text": "to Mary",
"lemma": "Mary",
"syncon": -1,
"type": "NPH",
"phrase": 3,
"relevance": 10
}
]
}
Common properties
The verb
object and the items of the related
array share some properties.
text
is the portion of text corresponding to the element.
phrase
is the index of the phrase containing the element. The value must be interpreted as a pointer to an item of the phrases
array, where the positions of the first and the last character of the phrase can be found. This information can be used for text highlighting.
From the phrase, it is possible to go back to the sentence it belongs to (using the sentences array) and from the sentence to the paragraph (using the paragraphs array) or, going in the opposite direction, to find the tokens contained in the phrase (using the tokens array).
Subordinate clauses—related items having the relation
property set to sub
—do not have a one-to-one correspondence with a phrase. In that case, phrase
has the conventional value -1.
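The navigation from a phrase index to its sentence and paragraph can be sketched in Python as follows. This is only an illustration: the function name is hypothetical and the sample document is a hand-made reduction of what the analysis of John sent a letter to Mary. might return.

```python
def locate_phrase(document, phrase_index):
    """Resolve a phrase index to its text range, sentence index and paragraph index.

    Returns None for the conventional value -1 (no one-to-one phrase).
    """
    if phrase_index == -1:
        return None
    phrase = document["phrases"][phrase_index]
    # Each sentence lists the indexes of its phrases,
    # each paragraph lists the indexes of its sentences.
    sentence = next(
        (i for i, s in enumerate(document["sentences"]) if phrase_index in s["phrases"]),
        None,
    )
    paragraph = next(
        (i for i, p in enumerate(document["paragraphs"]) if sentence in p["sentences"]),
        None,
    )
    return {
        "text": document["content"][phrase["start"]:phrase["end"]],
        "start": phrase["start"],
        "end": phrase["end"],
        "sentence": sentence,
        "paragraph": paragraph,
    }


# Hand-made sample: positions were computed manually for this sentence.
document = {
    "content": "John sent a letter to Mary.",
    "paragraphs": [{"start": 0, "end": 27, "sentences": [0]}],
    "sentences": [{"start": 0, "end": 27, "phrases": [0, 1, 2, 3, 4]}],
    "phrases": [
        {"start": 0, "end": 4},
        {"start": 5, "end": 9},
        {"start": 10, "end": 18},
        {"start": 19, "end": 26},
        {"start": 26, "end": 27},
    ],
}

located = locate_phrase(document, 2)
print(located)
```

The start and end values returned this way are what a text highlighter would need.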
The syncon
and lemma
properties are respectively the outcome of the semantic analysis and the lemmatization. Value -1 for syncon
means the concept doesn't have a correspondent in the expert.ai Knowledge Graph. This can happen with:
1. Entities having a proper noun that are heuristically recognized (for example John Smith).
2. Parts-of-speech that are not mapped in the Knowledge Graph like pronouns (for example them).
3. Subordinate clauses like quotes (for example John said: "I will do it!").
In cases 1 and 2, lemma
is an empty string.
relevance
is an indicator of the importance of the element in the text. Its value ranges from 1 to 15. When the element importance cannot be determined, relevance
has the conventional value -1.
verb
The verb
object is always present and it represents the verb.
type
is the verb type. When set, it can be one of the following:
Verb type | Description |
---|---|
CPL | To be used as a connection, as in John is a smart guy |
MOV | Verb of movement like to go |
SAY | Verb of communication like to say |
related
The items of the related
array represent text elements related to the verb.
relation
is the type of relation and can be one of the following:
Possible values of relation |
---|
sbj_who |
sbj_what |
obj_who |
obj_what |
is_who |
is_what |
to_who |
to_what |
using_what |
preposition* + _what |
preposition* + _who |
sub ** |
when |
where |
to_where |
from_where |
in_where |
which_way |
how |
of_age |
limited_to |
* Prepositions are expressed in the language of the text intelligence engine. For example, a possible value in case of German could be auf_what. Multi-word names of prepositional expressions like according to, in front of, etc., are written in compact form without spaces between words (accordingto, infrontof).
** The sub
relation type is used for subordinate clauses.
type
identifies the kind of element. Possible values can be uppercase or lowercase. Uppercase corresponds to named entities, lowercase to generic entities.
Relations can be recursive: a related item can be related to another item and so on. In this case, an item of the related
array can contain a related
array.
For example, given this input text:
Mireille placed the plant pot on the landing at the top of the stairs.
relations can be like this:
"relations": [
{
"related": [
{
"lemma": "Mireille",
"phrase": 0,
"relation": "sbj_who",
"relevance": 14,
"syncon": -1,
"text": "Mireille",
"type": "NPH"
},
{
"lemma": "pot",
"phrase": 2,
"relation": "obj_what",
"relevance": 15,
"syncon": 18506,
"text": "the plant pot",
"type": "prd"
},
{
"lemma": "landing",
"phrase": 3,
"relation": "on_what",
"relevance": 5,
"syncon": 16859,
"text": "on the landing",
"type": "bld"
},
{
"lemma": "top",
"phrase": 4,
"related": [
{
"lemma": "stairs",
"phrase": 5,
"relation": "of_what",
"relevance": 1,
"syncon": 20016,
"text": "of the stairs",
"type": "bld"
}
],
"relation": "at_what",
"relevance": -1,
"syncon": 37732,
"text": "at the top",
"type": ""
}
],
"verb": {
"lemma": "place",
"phrase": 1,
"relevance": 15,
"syncon": 68498,
"text": "placed",
"type": ""
}
}
]
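Nested related arrays lend themselves to a recursive walk. The Python sketch below flattens the example above into (depth, relation, text) tuples; the function name is hypothetical and the sample keeps only the properties the sketch needs.

```python
def flatten_related(related, depth=0):
    """Yield (depth, relation, text) for every related item, depth-first."""
    for item in related:
        yield depth, item["relation"], item["text"]
        # A related item may carry its own related array.
        yield from flatten_related(item.get("related", []), depth + 1)


# Reduced version of the relation shown above.
relation = {
    "verb": {"lemma": "place", "text": "placed"},
    "related": [
        {"relation": "sbj_who", "text": "Mireille"},
        {"relation": "obj_what", "text": "the plant pot"},
        {"relation": "on_what", "text": "on the landing"},
        {
            "relation": "at_what",
            "text": "at the top",
            "related": [{"relation": "of_what", "text": "of the stairs"}],
        },
    ],
}

flat = list(flatten_related(relation["related"]))
for depth, rel, text in flat:
    print("  " * depth + f"{rel}: {text}")
```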
sections
The sections
array contains the data of the text sections specified in the request, with possibly modified positions due to differences between input text and analyzed text.
Each item in the array has this format:
{
"namespace": (string) namespace,
"name": (string) section name,
"positions": [
range(s)
]
}
where:
- namespace is the name of the software module carrying out document classification inside the text intelligence engine.
- name is the name of the section.
- The positions array indicates the range (or ranges) of characters that make up the section. Each item of the array is an object with this format:

{
"start": (integer) zero-based position of the first character in the section,
"end": (integer) zero-based position of the first character after the section
}
For example:
"sections": [
{
"namespace": "iptc_en_1.0",
"name": "TITLE",
"positions": [
{
"start": 0,
"end": 4
}
]
},
{
"namespace": "iptc_en_1.0",
"name": "BODY",
"positions": [
{
"start": 6,
"end": 10
}
]
}
]
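To recover the text of a section, its ranges can be applied to the analyzed text. The Python sketch below assumes the analyzed text is available as a string (for example from the content key); the function name and the sample text are hypothetical, chosen to match the positions in the example above.

```python
def section_text(content, section, separator=" "):
    """Join the text of all the ranges that make up a section."""
    return separator.join(content[p["start"]:p["end"]] for p in section["positions"])


# Hypothetical analyzed text matching the ranges 0-4 and 6-10.
content = "News\n\nBody"
sections = [
    {"namespace": "iptc_en_1.0", "name": "TITLE", "positions": [{"start": 0, "end": 4}]},
    {"namespace": "iptc_en_1.0", "name": "BODY", "positions": [{"start": 6, "end": 10}]},
]

texts = {s["name"]: section_text(content, s) for s in sections}
print(texts)
```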
segments
The segments
array is a property of the document
object.
It contains information about the segments defined in the imported CPKs that are generated with expert.ai Studio.
It has a structure like this:
"segments": [
{
"name": "SEGMENT1",
"namespace": "segments",
"positions": [
{
"end": 137,
"start": 0
},
{
"end": 477,
"start": 250
}
]
},
{
"name": "SEGMENT2",
"namespace": "segments",
"positions": [
{
"end": 137,
"start": 0
},
{
"end": 577,
"start": 479
}
]
}
]
sentences
The sentences
array is a property of the document
object.
It contains information about the text sentences.
Each array item is an object that represents a sentence and has a structure like this:
{
"end": 66,
"phrases": [
0,
1,
2,
3,
4,
5
],
"start": 0
}
where:
- start is the position of the first character of the sentence.
- end is the position of the first character after the sentence.
- The phrases array contains the zero-based indexes of the constituent phrases, whose information is found in the phrases array.
sentiment
The sentiment
object contains three scores indicating the tone of the whole text:
- positivity: the amount of positivity.
- negativity: the amount of negativity.
- overall: the overall sentiment score, which is a combination of the scores above.
All sentiment scores are expressed in a range from -100 (extremely negative) to 100 (extremely positive).
The sentiment
object contains an items
array whose elements, in turn, can contain nested items
arrays. These items represent the clusters of text elements that give a positive or negative contribution to the sentiment.
For example, given this input text:
The road was bad.
items clusters can be like this:
"items": [
{
"lemma": "road",
"sentiment": -7,
"syncon": 19001,
"items": [
{
"lemma": "bad",
"sentiment": -7,
"syncon": 81195
}
]
}
]
sentiment
is the sentiment score of the cluster or leaf-item. The sentiment score of a cluster is a function of the child items' scores and the possible modifiers, which are not returned as separate items, but are nevertheless taken into account.
Take, for example, a slight change introduced in the sample text:
The road was really bad.
the really modifier makes the score worse:
"items": [
{
"lemma": "road",
"sentiment": -8.8,
"syncon": 19001,
"items": [
{
"lemma": "bad",
"sentiment": -8.8,
"syncon": 81195
}
]
}
]
On the other hand, a not before bad can invert the sentiment polarity from negative to positive. The sentiment value can be zero.
The syncon
and lemma
properties are respectively the outcome of the semantic analysis and the lemmatization.
An item having nested items can be an "unnamed cluster": in that case, the lemma
property is an empty string.
If the intrinsic item polarity—positive or negative—is opposite to that of the paragraph it belongs to, this marker:
[*]
is added as a suffix to the lemma.
For example, given this input text:
The road was not bad.
The lemma bad is marked with the "opposite polarity" sign because it is negated by not:
"items": [
{
"items": [
{
"lemma": "bad[*]",
"sentiment": 7,
"syncon": 87597
}
],
"lemma": "road",
"sentiment": 7,
"syncon": 19001
}
]
Another possibility occurs when a lemma "attracts" other words in the same phrase. For example, given the input text:
Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.
a value of lemma
could be:
stand-out;skill
In this case the merged terms are separated by a semicolon (;).
Value -1 for syncon
means the concept doesn't have a correspondent in the expert.ai Knowledge Graph.
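The conventions described in this section (nested items, the [*] suffix, semicolon-separated merged terms) can be handled with a small recursive walk, sketched below in Python. The function names are hypothetical, and the sketch assumes the marker appears at most once, at the very end of the lemma.

```python
OPPOSITE_MARKER = "[*]"


def parse_lemma(lemma):
    """Split a sentiment lemma into its terms and detect the opposite-polarity marker."""
    opposite = lemma.endswith(OPPOSITE_MARKER)
    if opposite:
        lemma = lemma[: -len(OPPOSITE_MARKER)]
    # Merged terms are separated by semicolons; unnamed clusters have an empty lemma.
    terms = [term for term in lemma.split(";") if term]
    return terms, opposite


def walk_items(items, depth=0):
    """Yield (depth, terms, opposite, sentiment) for every item, depth-first."""
    for item in items:
        terms, opposite = parse_lemma(item.get("lemma", ""))
        yield depth, terms, opposite, item["sentiment"]
        yield from walk_items(item.get("items", []), depth + 1)


# Items returned for "The road was not bad." in the example above.
items = [
    {
        "items": [{"lemma": "bad[*]", "sentiment": 7, "syncon": 87597}],
        "lemma": "road",
        "sentiment": 7,
        "syncon": 19001,
    }
]

walked = list(walk_items(items))
print(walked)
print(parse_lemma("stand-out;skill"))
```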
tokens
The tokens
array is a property of the document
object.
It contains information about the tokens in which the text was divided during the analysis.
A token is either a single word, a collocation or punctuation.
Each array item is an object that represents a token like this:
{
"atoms": [
{
"end": 24,
"lemma": "credit",
"start": 18,
"type": "NOU"
},
{
"end": 29,
"lemma": "card",
"start": 25,
"type": "NOU"
}
],
"dependency": {
"head": 2,
"id": 4,
"label": "obj"
},
"end": 29,
"lemma": "credit card",
"morphology": "Number=Sing",
"paragraph": 0,
"phrase": 2,
"pos": "NOUN",
"sentence": 0,
"start": 18,
"syncon": 54956,
"type": "NOU"
}
where:
- The syncon property is the outcome of the semantic analysis process. Its value is the ID of the corresponding entry in the Knowledge Graph or -1 if there's no corresponding entry.
- type is the type label.
- lemma is the result of the lemmatization. It is the lemma (or base form) of the token text, for example: scarf is the lemma for scarves and be is the lemma for was.
- pos is the result of part-of-speech tagging, the process that marks up each token with the corresponding Universal POS tag.
- dependency is the result of syntactic analysis, the parsing process that detects the universal dependency relation between each token and the sentence root token or another token. The process assigns a dependency relation label to each token.

For example, for this sentence:

The company has developed an entirely new category of products.

syntactic analysis determines the head token index and the dependency label as follows:
Token index | Token text | Head token index | Universal dependency label |
---|---|---|---|
0 | The | 1 | det |
1 | company | 3 | nsubj |
2 | has | 3 | aux |
3 | developed | 3 | root |
4 | an | 7 | det |
5 | entirely | 7 | advmod |
6 | new | 7 | amod |
7 | category | 3 | obj |
8 | of | 9 | case |
9 | products | 7 | nmod |
10 | . | 3 | punct |
Dependencies can be represented in various ways, such as a tree or arrow arcs.

Inside dependency:
- id represents the index of the token in the text.
- label specifies the dependency relation with another token according to the Universal Dependencies conventions.
- head identifies the token that receives the relation. Its value corresponds to the value of the id property of another token, the only exception being the root token (the one with the label property set to root) for which head and id have the same value.
- morphology is the result of morphological analysis, the process that determines lexical and grammatical features of each token in addition to the part-of-speech. The result of the analysis is a list of Universal features.

For example, the morphological analysis of the first token of this sentence:

I saw a dandelion on my lawn.

gives:

Case=Nom|Number=Sing|Person=1|PronType=Prs

which is a list of feature-value pairs corresponding to:

Pair | Feature label | Feature description | Value label | Value description |
---|---|---|---|---|
Case=Nom | Case | Case | Nom | Nominative |
Number=Sing | Number | Number | Sing | Singular |
Person=1 | Person | Person | 1 | First |
PronType=Prs | PronType | Pronoun type | Prs | Personal |

- start is the position of the first character of the token.
- end is the position of the first character after the token.
- phrase is the phrase containing the token; it's the zero-based index of the phrase in the phrases array.
- sentence is the sentence containing the token; it's the zero-based index of the sentence in the sentences array.
- paragraph is the paragraph containing the token; it's the zero-based index of the paragraph in the paragraphs array.
- In case of collocations (for example: credit card), the token object can contain the atoms array. This array contains an item for every word of the collocation and has these properties:
  - type is the type label for the word.
  - lemma is the lemma of the word.
  - start is the position of the first character of the word.
  - end is the position of the first character after the word.
If the semantic analysis recognizes a token as a named entity—for example: a person's name—without a corresponding entry in the Knowledge Graph, syncon
is set to -1 and the token object has an additional vsyn
(virtual syncon) property like this:
{
"syncon": -1,
"vsyn": {
"id": -436106,
"parent": 73303
},
"start": 0,
"end": 19,
"type": "NPR.NPH",
"lemma": "Mauricio Pochettino",
...
vsyn is an object with these properties:
- id is a negative number assigned to all tokens considered as occurrences of the same entity. It is not the ID of a Knowledge Graph entry.
- parent is the ID of the Knowledge Graph entry which, conceptually, is the parent of the concept expressed by the token. For example, if the token has been recognized as a person's name, parent is the ID of the concept person.
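Two of the token properties described in this section, morphology and dependency, are easier to use once converted into native structures. The Python sketch below parses the feature-value pairs into a dictionary and builds a head-to-dependents map, finding the root as the token whose head equals its own id. The function names are hypothetical, and the sample tokens are reduced to the dependency object, using the head indexes from the table above.

```python
def parse_morphology(morphology):
    """Turn 'Case=Nom|Number=Sing' into {'Case': 'Nom', 'Number': 'Sing'}."""
    if not morphology:
        return {}
    return dict(pair.split("=", 1) for pair in morphology.split("|"))


def dependency_children(tokens):
    """Return (root id, {head id: [dependent ids]}) for a list of tokens."""
    children = {}
    root = None
    for token in tokens:
        dependency = token["dependency"]
        if dependency["head"] == dependency["id"]:
            # Only the root token has head equal to its own id.
            root = dependency["id"]
        else:
            children.setdefault(dependency["head"], []).append(dependency["id"])
    return root, children


# Head indexes for "The company has developed an entirely new category of products."
heads = [1, 3, 3, 3, 7, 7, 7, 3, 9, 7, 3]
tokens = [{"dependency": {"id": i, "head": h}} for i, h in enumerate(heads)]

root, children = dependency_children(tokens)
print(root, children)
print(parse_morphology("Case=Nom|Number=Sing|Person=1|PronType=Prs"))
```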
topics
The topics
array is a property of the document
object.
It lists the Knowledge Graph topics the text is about.
Each array item is an object that represents a Knowledge Graph topic like this:
{
"id": 223,
"label": "mechanics",
"score": 3.5,
"winner": true
}
where:
- id is the topic ID.
- label is the topic name.
- score is the measure of the text topic importance.
- winner is a boolean value set to true if the topic is considered particularly important.
version
The version
key is a property of the document
object.
The key value is the software module version that performed the analysis.
Type labels
The labels below are used for the type
property of tokens and tokens' atoms.
Code | Description |
---|---|
ADJ | Adjective |
ADV | Adverb |
ART | Article |
AUX | Auxiliary verb |
CON | Conjunction |
NOU | Noun |
NOU.ADR | Street address |
NOU.DAT | Date |
NOU.HOU | Hour |
NOU.MAI | Email address |
NOU.MEA | Measure |
NOU.MON | Money |
NOU.PCT | Percentage |
NOU.PHO | Phone number |
NOU.WEB | Web address |
NPR | Proper noun |
NPR.ANM | Proper noun of an animal |
NPR.BLD | Proper noun of a building |
NPR.COM | Proper noun of a business/company |
NPR.DEV | Proper noun of a device |
NPR.DOC | Proper noun of a document |
NPR.EVN | Proper noun of an event |
NPR.FDD | Proper noun of a food/beverage |
NPR.GEA | Proper noun of a physical geographic feature |
NPR.GEO | Proper noun of an administrative geographic area |
NPR.GEX | Proper noun of an extra-terrestrial or imaginary place |
NPR.LEN | Proper noun of a legal/fiscal entity |
NPR.MMD | Proper noun of a mass media |
NPR.NPH | Proper noun of a human being |
NPR.ORG | Proper noun of an organization/society/institution |
NPR.PPH | Proper noun of a physical phenomenon |
NPR.PRD | Proper noun of a product |
NPR.VCL | Proper noun of a vehicle |
NPR.WRK | Proper noun of a work of human intelligence |
PNT | Punctuation mark |
PRE | Preposition |
PRO | Pronoun |
PRT | Particle |
VER | Verb |
Unlike Universal POS tags, used for the pos property of tokens, type labels combine part-of-speech information with entity type information and also apply to atoms.
Positions
The output of symbolic models and symbolic steps of ML models contains the position of text blocks (for example paragraphs, sentences, phrases, parts of text that "explain" predicted categories or extractions, named entities, text tokens, words, lemmas).
All these positions refer to the value of the content property of the document object.
The starting position is returned in the start
property and the ending position in the end
property.
The value of the start
property is the zero-based index of the first block character.
For example, if the text is:
Michael Jordan was one of the best basketball players of all time.
the start position of the phrase of all time is 54:
Michael Jordan was one of the best basketball players of all time.
                                                      ↑
01234567890123456789012345678901234567890123456789012345678901234567890
0         1         2         3         4         5         6         7
The value of the end
position is the zero-based index of the first character after the text block.
In the example above, the end position of the phrase is 65:
Michael Jordan was one of the best basketball players of all time.
                                                                 ↑
01234567890123456789012345678901234567890123456789012345678901234567890
0         1         2         3         4         5         6         7
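Since start is inclusive and end is exclusive, a position maps directly onto a half-open slice. The Python sketch below checks this against the example sentence; the function name is hypothetical, and the sketch assumes that positions count characters the same way Python string indexes do.

```python
def text_at(content, position):
    """Return the text block identified by a {start, end} position object."""
    # start is inclusive, end is the index of the first character after the block.
    return content[position["start"]:position["end"]]


content = "Michael Jordan was one of the best basketball players of all time."

print(text_at(content, {"start": 54, "end": 65}))
print(text_at(content, {"start": 0, "end": 14}))
```

The length of a block is simply end minus start, which is 11 for the phrase of all time.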