Text subdivision objects

Overview

The DIS object provides methods that return objects corresponding to the text subdivisions that disambiguation identified.
The properties of these objects are the attributes that disambiguation detected.
There is one method for each subdivision type, as listed below.

Subdivision type Method
Section getSection
Paragraph getParagraph
Sentence getSentence
Clause getClause
Phrase getPhrase
Token getToken

For example, this statement:

sentence = DIS.getSentence(3);

sets the sentence variable to an object representing the fourth sentence of the document's text, because indexes are zero-based.

Syntax

All methods have a similar syntax:

DIS.method(index)

where:

  • method is the name of one of the methods that return text subdivisions, for example getSentence.
  • index is the zero-based integer index of the requested subdivision within the sequence of all the subdivisions of the same type that disambiguation identified.
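
As a minimal sketch of this syntax, the following JavaScript reads two properties of a sentence object. The DIS object defined here is a hypothetical stand-in backed by hand-written data (mockSentences), present only so that the snippet runs on its own; it is not the real implementation, and in an actual script the DIS object would already be available.

```javascript
// Hand-written sample data (assumption); real values come from disambiguation.
const mockSentences = [
  { position: 0, length: 12, phraseBegin: 0, phraseEnd: 1, tokenBegin: 0, tokenEnd: 2 },
  { position: 13, length: 9, phraseBegin: 2, phraseEnd: 3, tokenBegin: 3, tokenEnd: 5 },
];

// Hypothetical stand-in for the DIS object, not the real implementation.
const DIS = {
  getSentence(index) {
    return mockSentences[index];
  },
};

// The index is zero-based, so 1 selects the second sentence.
const secondSentence = DIS.getSentence(1);
console.log(secondSentence.position, secondSentence.length);
```

The same call shape applies to the other methods; only the method name and the properties of the returned object change.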

Sections

The objects returned by the getSection method have these properties:

Property Description
name Section name
position Position of the first character of the section in the text
length Section length
sentenceBegin Index of the first sentence of the section
sentenceEnd Index of the last sentence of the section
phraseBegin Index of the first phrase of the section
phraseEnd Index of the last phrase of the section
tokenBegin Index of the first token of the section
tokenEnd Index of the last token of the section
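
The Begin and End properties can be combined with the other methods to walk the contents of a section. The sketch below collects the sentence objects of one section. It assumes that sentenceEnd is inclusive, as "index of the last sentence" suggests. The DIS object, the sample data and the helper name sentencesOfSection are all hypothetical and exist only to make the snippet self-contained.

```javascript
// Hand-written sample data (assumption); section names are made up.
const mockSections = [
  { name: "TITLE", position: 0, length: 12, sentenceBegin: 0, sentenceEnd: 0 },
  { name: "BODY", position: 13, length: 30, sentenceBegin: 1, sentenceEnd: 2 },
];
const mockSentences = [
  { position: 0, length: 12 },
  { position: 13, length: 9 },
  { position: 23, length: 20 },
];

// Hypothetical stand-in for the DIS object, not the real implementation.
const DIS = {
  getSection: (index) => mockSections[index],
  getSentence: (index) => mockSentences[index],
};

// Collect every sentence object of a section, treating sentenceEnd as inclusive.
function sentencesOfSection(sectionIndex) {
  const section = DIS.getSection(sectionIndex);
  const sentences = [];
  for (let i = section.sentenceBegin; i <= section.sentenceEnd; i++) {
    sentences.push(DIS.getSentence(i));
  }
  return sentences;
}

const bodySentences = sentencesOfSection(1);
console.log(bodySentences.length);
```

The same loop works with phraseBegin/phraseEnd and tokenBegin/tokenEnd, and with paragraph objects, which expose the same index ranges.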

Paragraphs

The objects returned by the getParagraph method have these properties:

Property Description
position Position of the first character of the paragraph in the text
length Paragraph length
sentenceBegin Index of the first sentence of the paragraph
sentenceEnd Index of the last sentence of the paragraph
phraseBegin Index of the first phrase of the paragraph
phraseEnd Index of the last phrase of the paragraph
tokenBegin Index of the first token of the paragraph
tokenEnd Index of the last token of the paragraph

Sentences

The objects returned by the getSentence method have these properties:

Property Description
position Position of the first character of the sentence in the text
length Sentence length
phraseBegin Index of the first phrase of the sentence
phraseEnd Index of the last phrase of the sentence
tokenBegin Index of the first token of the sentence
tokenEnd Index of the last token of the sentence
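
If the document text is available as a string, position and length can be used to recover the text of a subdivision. The sketch below does this for a sentence. It assumes that position is a zero-based character offset and that length is counted in the same units as a JavaScript string; neither detail is stated on this page, so verify both before relying on them. The text variable, the sample data, the DIS stand-in and the helper name textOf are hypothetical.

```javascript
// Hypothetical document text and hand-written sentence data (assumptions).
const text = "Hello world. It rains.";
const mockSentences = [
  { position: 0, length: 12 },
  { position: 13, length: 9 },
];

// Hypothetical stand-in for the DIS object, not the real implementation.
const DIS = {
  getSentence: (index) => mockSentences[index],
};

// Return the slice of the text covered by any object with position and length.
function textOf(subdivision) {
  return text.slice(subdivision.position, subdivision.position + subdivision.length);
}

const firstSentenceText = textOf(DIS.getSentence(0));
const secondSentenceText = textOf(DIS.getSentence(1));
console.log(firstSentenceText);
console.log(secondSentenceText);
```

Because every subdivision object has position and length, the same helper can be applied to sections, paragraphs, clauses, phrases and tokens under the same assumptions.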

Clauses

The objects returned by the getClause method have these properties:

Property Description
position Position of the first character of the clause in the text
length Clause length
index Clause ID
tokenBegin Index of the first token of the clause
tokenEnd Index of the last token of the clause
phraseType Phrase type
clauseType Clause type if any, empty otherwise

Phrases

The objects returned by the getPhrase method have these properties:

Property Description
position Position of the first character of the phrase in the text
length Phrase length
tokenBegin Index of the first token of the phrase
tokenEnd Index of the last token of the phrase
phraseType Phrase type
clauseType Clause type if any, empty otherwise
clauseId Clause ID if any, -1 otherwise
mainToken Index of the main token of the phrase if any, -1 otherwise
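
Since mainToken and clauseId use -1 to mean "none", code that follows them should check for that value first. The sketch below returns the lemma of a phrase's main token, or null when the phrase has no main token. The DIS stand-in, the sample data (including the phraseType values) and the helper name mainLemmaOfPhrase are hypothetical.

```javascript
// Hand-written sample data (assumption); phraseType values are made up.
const mockPhrases = [
  { tokenBegin: 0, tokenEnd: 1, phraseType: "NP", clauseType: "", clauseId: -1, mainToken: 1 },
  { tokenBegin: 2, tokenEnd: 2, phraseType: "PN", clauseType: "", clauseId: -1, mainToken: -1 },
];
const mockTokens = [
  { index: 0, lemma: "the" },
  { index: 1, lemma: "dog" },
  { index: 2, lemma: "." },
];

// Hypothetical stand-in for the DIS object, not the real implementation.
const DIS = {
  getPhrase: (index) => mockPhrases[index],
  getToken: (index) => mockTokens[index],
};

// Return the lemma of the phrase's main token, or null if there is none.
function mainLemmaOfPhrase(phraseIndex) {
  const phrase = DIS.getPhrase(phraseIndex);
  if (phrase.mainToken === -1) {
    return null; // -1 means the phrase has no main token
  }
  return DIS.getToken(phrase.mainToken).lemma;
}

console.log(mainLemmaOfPhrase(0));
console.log(mainLemmaOfPhrase(1));
```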

Tokens

The objects returned by the getToken method have these properties:

Property Description
position Position of the first character of the token in the text
length Token length
index Index of the token
grammarType Word class
typeClass In case of proper nouns, the entity type
lemma Token's lemma
lemmaId ID of the token's lemma inside the reference Knowledge Graph
synId Legacy ID of the token's syncon inside the reference Knowledge Graph (don't use).
externalSynIds Array of the concept IDs expressed by the token. The first element of the array is the main ID, the others are secondary ones.
dadId Reserved for future use
phrase Index of the phrase containing the token
sentence Index of the sentence containing the token
paragraph Index of the paragraph containing the token
section Index of the section containing the token
titleLevelId Index of the title containing the token
cellId Index of the cell containing the token
blockId Index of the block containing the token
isToken true if the object corresponds to a token subdivision, false if it corresponds to an atom (sub-token)
isAtom true if the object corresponds to an atom of a token, false if it corresponds to a token
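
As an illustration of the token properties, the sketch below separates the main concept ID from the secondary ones in externalSynIds and uses isAtom to tell tokens from atoms. The objects are hand-written samples with made-up ID values, and the helper names are hypothetical. Whether externalSynIds can be empty or missing is not stated on this page, so the helpers guard against both cases as a precaution.

```javascript
// Hand-written sample objects (assumption); ID values are made up.
const sampleToken = {
  index: 4, lemma: "bank", externalSynIds: [101, 202, 303], isToken: true, isAtom: false,
};
const sampleAtom = {
  index: 7, lemma: "new", externalSynIds: [], isToken: false, isAtom: true,
};

// The first element of externalSynIds is the main ID; null if there is none.
function mainConceptId(tokenObject) {
  const ids = tokenObject.externalSynIds;
  return Array.isArray(ids) && ids.length > 0 ? ids[0] : null;
}

// All elements after the first are secondary IDs.
function secondaryConceptIds(tokenObject) {
  const ids = tokenObject.externalSynIds;
  return Array.isArray(ids) ? ids.slice(1) : [];
}

// Describe whether the object is a whole token or an atom (sub-token).
function kindOf(tokenObject) {
  return tokenObject.isAtom ? "atom" : "token";
}

console.log(mainConceptId(sampleToken), secondaryConceptIds(sampleToken), kindOf(sampleToken));
console.log(mainConceptId(sampleAtom), secondaryConceptIds(sampleAtom), kindOf(sampleAtom));
```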