Text subdivision objects
Overview
The DIS object provides methods that return objects corresponding to the text subdivisions that disambiguation identified.
The properties of these objects are the attributes that disambiguation detected.
The methods are listed below.
Subdivision type | Method |
---|---|
Section | getSection |
Paragraph | getParagraph |
Sentence | getSentence |
Clause | getClause |
Phrase | getPhrase |
Token | getToken |
For example, this statement:
sentence = DIS.getSentence(3);
sets the sentence variable to an object representing the fourth sentence of the document's text, because indexes are zero-based.
Syntax
All methods have a similar syntax:
DIS.method(index)
where:
method is the name of the method used to get the text subdivision.
index is the zero-based integer index of a specific text subdivision in the sequence of all the subdivisions of the same type that disambiguation identified.
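As a minimal sketch of this syntax, the snippet below calls a method by name with a zero-based index. The mockDIS object is a hypothetical stand-in with made-up values so that the code runs outside the scripting environment, and getSubdivision is an illustrative helper, not part of the API.

```javascript
// Hypothetical stand-in for the DIS object, so the sketch runs on its own.
var mockDIS = {
  getSentence: function (index) {
    var sentences = [
      { position: 0, length: 12 },
      { position: 13, length: 21 }
    ];
    return sentences[index];
  }
};

// Illustrative helper (not part of the API): calls dis.method(index).
function getSubdivision(dis, method, index) {
  return dis[method](index);
}

// Index 1 is the second sentence because indexes are zero-based.
var second = getSubdivision(mockDIS, "getSentence", 1);
```

In a real script the first argument would presumably be the DIS object itself.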
Sections
The objects returned by the getSection method have these properties:
Property | Description |
---|---|
name | Section name |
position | Position of the first character of the section in the text |
length | Section length |
sentenceBegin | Index of the first sentence of the section |
sentenceEnd | Index of the last sentence of the section |
phraseBegin | Index of the first phrase of the section |
phraseEnd | Index of the last phrase of the section |
tokenBegin | Index of the first token of the section |
tokenEnd | Index of the last token of the section |
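The sentenceBegin and sentenceEnd properties can be used to walk the sentences of a section. The sketch below assumes that sentenceEnd is inclusive, as "index of the last sentence" suggests. The mockDIS object, its values, and the sentencesOfSection helper are hypothetical and only serve to make the example runnable.

```javascript
// Hypothetical stand-in for the DIS object (illustration only, made-up values).
var mockDIS = {
  sections: [
    { name: "BODY", position: 0, length: 40, sentenceBegin: 0, sentenceEnd: 1 }
  ],
  sentences: [
    { position: 0, length: 18 },
    { position: 19, length: 21 }
  ],
  getSection: function (i) { return this.sections[i]; },
  getSentence: function (i) { return this.sentences[i]; }
};

// Returns the sentence objects of a section, treating sentenceEnd as inclusive.
function sentencesOfSection(dis, sectionIndex) {
  var section = dis.getSection(sectionIndex);
  var result = [];
  for (var i = section.sentenceBegin; i <= section.sentenceEnd; i++) {
    result.push(dis.getSentence(i));
  }
  return result;
}

var sentences = sentencesOfSection(mockDIS, 0);
```

The same loop shape applies to phraseBegin/phraseEnd and tokenBegin/tokenEnd.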
Paragraphs
The objects returned by the getParagraph method have these properties:
Property | Description |
---|---|
position | Position of the first character of the paragraph in the text |
length | Paragraph length |
sentenceBegin | Index of the first sentence of the paragraph |
sentenceEnd | Index of the last sentence of the paragraph |
phraseBegin | Index of the first phrase of the paragraph |
phraseEnd | Index of the last phrase of the paragraph |
tokenBegin | Index of the first token of the paragraph |
tokenEnd | Index of the last token of the paragraph |
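Because tokenBegin and tokenEnd are both indexes, the number of tokens in a paragraph can be derived from them. The sketch below assumes that the range is inclusive and contiguous. The paragraph object is a hand-written example with made-up values, not real output, and tokenCount is a hypothetical helper.

```javascript
// Counts the tokens of a subdivision, assuming an inclusive, contiguous range.
function tokenCount(subdivision) {
  return subdivision.tokenEnd - subdivision.tokenBegin + 1;
}

// Hand-written object shaped like a getParagraph result (made-up values).
var paragraph = {
  position: 0,
  length: 57,
  sentenceBegin: 0,
  sentenceEnd: 1,
  phraseBegin: 0,
  phraseEnd: 5,
  tokenBegin: 0,
  tokenEnd: 11
};

var count = tokenCount(paragraph);
```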
Sentences
The objects returned by the getSentence method have these properties:
Property | Description |
---|---|
position | Position of the first character of the sentence in the text |
length | Sentence length |
phraseBegin | Index of the first phrase of the sentence |
phraseEnd | Index of the last phrase of the sentence |
tokenBegin | Index of the first token of the sentence |
tokenEnd | Index of the last token of the sentence |
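The position and length properties can be combined to recover the text of a sentence. The sketch below assumes that position is a zero-based character offset and that length is measured in characters. Neither is stated above, so verify both before relying on this. The text variable, the sentence object, and the subdivisionText helper are hypothetical.

```javascript
// Extracts the text covered by a subdivision.
// Assumes position is a zero-based character offset and length is in characters.
function subdivisionText(text, subdivision) {
  return text.substring(
    subdivision.position,
    subdivision.position + subdivision.length
  );
}

// Made-up document text and a hand-written sentence object for illustration.
var text = "Hello world. Second sentence here.";
var sentence = { position: 13, length: 21 };

var sentenceText = subdivisionText(text, sentence);
```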
Clauses
The objects returned by the getClause method have these properties:
Property | Description |
---|---|
position | Position of the first character of the clause in the text |
length | Clause length |
index | Clause ID |
tokenBegin | Index of the first token of the clause |
tokenEnd | Index of the last token of the clause |
phraseType | Phrase type |
clauseType | Clause type if any, empty otherwise |
Phrases
The objects returned by the getPhrase method have these properties:
Property | Description |
---|---|
position | Position of the first character of the phrase in the text |
length | Phrase length |
tokenBegin | Index of the first token of the phrase |
tokenEnd | Index of the last token of the phrase |
phraseType | Phrase type |
clauseType | Clause type if any, empty otherwise |
clauseId | Clause ID if any, -1 otherwise |
mainToken | Index of the main token of the phrase if any, -1 otherwise |
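Since mainToken is -1 when a phrase has no main token, code that follows it to the token object should check for that value first. The sketch below shows one way to do so. The mockDIS object, its values, and the mainTokenLemma helper are hypothetical.

```javascript
// Hypothetical stand-in for the DIS object (illustration only, made-up values).
var mockDIS = {
  phrases: [
    { tokenBegin: 0, tokenEnd: 1, clauseId: -1, mainToken: 1 },
    { tokenBegin: 2, tokenEnd: 2, clauseId: -1, mainToken: -1 }
  ],
  tokens: [{ lemma: "the" }, { lemma: "dog" }, { lemma: "." }],
  getPhrase: function (i) { return this.phrases[i]; },
  getToken: function (i) { return this.tokens[i]; }
};

// Returns the lemma of the phrase's main token, or null when there is none.
function mainTokenLemma(dis, phraseIndex) {
  var phrase = dis.getPhrase(phraseIndex);
  if (phrase.mainToken === -1) {
    return null; // the phrase has no main token
  }
  return dis.getToken(phrase.mainToken).lemma;
}

var withMain = mainTokenLemma(mockDIS, 0);
var withoutMain = mainTokenLemma(mockDIS, 1);
```

The same guard applies to clauseId, which is also -1 when the phrase does not belong to a clause.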
Tokens
The objects returned by the getToken method have these properties:
Property | Description |
---|---|
position | Position of the first character of the token in the text |
length | Token length |
index | Index of the token |
grammarType | Word class |
typeClass | For proper nouns, the entity type |
lemma | Token's lemma |
lemmaId | ID of the token's lemma inside the reference Knowledge Graph |
synId | Legacy ID of the token's syncon inside the reference Knowledge Graph (do not use) |
externalSynIds | Array of the concept IDs expressed by the token. The first element of the array is the main ID, the others are secondary ones. |
dadId | Reserved for future use |
phrase | Index of the phrase containing the token |
sentence | Index of the sentence containing the token |
paragraph | Index of the paragraph containing the token |
section | Index of the section containing the token |
titleLevelId | Index of the title containing the token |
cellId | Index of the cell containing the token |
blockId | Index of the block containing the token |
isToken | true if the object corresponds to a token subdivision, false if it corresponds to an atom (sub-token) |
isAtom | true if the object corresponds to an atom of a token, false if it corresponds to a token |
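The phrase, sentence, paragraph and section properties hold indexes, so they can be passed back to the corresponding methods to navigate from a token to the subdivisions that contain it. The sketch below does this for the containing sentence. The mockDIS object, its values, and the sentenceOfToken helper are hypothetical.

```javascript
// Hypothetical stand-in for the DIS object (illustration only, made-up values).
var mockDIS = {
  tokens: [
    { index: 0, lemma: "dog", sentence: 0, isToken: true, isAtom: false },
    { index: 1, lemma: "bark", sentence: 0, isToken: true, isAtom: false },
    { index: 2, lemma: "cat", sentence: 1, isToken: true, isAtom: false }
  ],
  sentences: [
    { position: 0, length: 10, tokenBegin: 0, tokenEnd: 1 },
    { position: 11, length: 9, tokenBegin: 2, tokenEnd: 2 }
  ],
  getToken: function (i) { return this.tokens[i]; },
  getSentence: function (i) { return this.sentences[i]; }
};

// Returns the sentence object that contains the token with the given index.
function sentenceOfToken(dis, tokenIndex) {
  var token = dis.getToken(tokenIndex);
  return dis.getSentence(token.sentence);
}

var containing = sentenceOfToken(mockDIS, 2);
```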