Introduction to the DIS object
The DIS object and its methods
The predefined DIS object gives access to the disambiguation results and also allows you to tag or untag text subdivisions.
The DIS object can be used in the SCRIPT attribute and also in the following functions, which are executed after disambiguation:
The functionality of the DIS object is exposed through its methods, which can be grouped into these categories:
- Count text subdivisions
- Get the text of the whole document or that of text subdivisions
- Get objects corresponding to text subdivisions to explore their properties
- Get the index of the text subdivision of a given kind that contains the character at a given position in the document text
- Tag and untag text subdivisions
- (Reserved for future use) Access the results of document understanding
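
The position-based lookup in the list above can be pictured with a small standalone model. The sketch below is plain Python, not the DIS API: the sample text, the regular expression used to cut it into subdivisions, and the function name `subdivision_at` are all assumptions made for illustration. It only shows the general idea of mapping a character position to the index of the subdivision that contains it.

```python
import bisect
import re

# Hypothetical sample text and subdivisions: runs of word characters
# or single punctuation marks stand in for real disambiguation output.
text = "Michael Jordan was one of the best."
spans = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]
starts = [start for start, _ in spans]


def subdivision_at(position):
    """Return the index of the span containing `position`, or -1 if none does."""
    # Find the last span that starts at or before the position.
    i = bisect.bisect_right(starts, position) - 1
    if i >= 0 and position < spans[i][1]:
        return i
    return -1  # position falls on whitespace or outside the text


print(subdivision_at(10))  # character inside "Jordan"
print(subdivision_at(7))   # the space between "Michael" and "Jordan"
```

A binary search over the start offsets keeps each lookup logarithmic in the number of subdivisions. Returning -1 for positions that belong to no subdivision is a choice made for this sketch only.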
Text subdivisions: tokens and atoms
With the exception of the methods that deal with document understanding, all the methods of the DIS object rely on the text subdivisions, of varying granularity, that disambiguation creates.
At the token level, subdivisions also include sub-tokens called atoms. Disambiguation lists atoms immediately after the token they are part of.
For example, given this input text:
Michael Jordan was one of the best basketball players of all time.
disambiguation identifies these 15 units as either tokens or atoms:
Index | Text | Sub-token (atom)? |
---|---|---|
0 | Michael Jordan | No |
1 | Michael | Yes |
2 | Jordan | Yes |
3 | was | No |
4 | one | No |
5 | of | No |
6 | the | No |
7 | best | No |
8 | basketball players | No |
9 | basketball | Yes |
10 | players | Yes |
11 | of | No |
12 | all | No |
13 | time | No |
14 | . | No |
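
The ordering in the table, where each token is followed immediately by its atoms, can be reproduced with a small standalone model. This is a sketch in plain Python, not the DIS API: the `Token` class and the `flatten` function are hypothetical names introduced here, and the token and atom boundaries are simply copied from the table above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Token:
    text: str
    atoms: List[str] = field(default_factory=list)  # empty when the token has no atoms


def flatten(tokens):
    """Return (index, text, is_atom) rows, listing atoms right after their token."""
    rows = []
    for token in tokens:
        rows.append((len(rows), token.text, False))
        for atom in token.atoms:
            rows.append((len(rows), atom, True))
    return rows


# Token and atom boundaries taken from the table, not computed by this code.
tokens = [
    Token("Michael Jordan", ["Michael", "Jordan"]),
    Token("was"), Token("one"), Token("of"), Token("the"), Token("best"),
    Token("basketball players", ["basketball", "players"]),
    Token("of"), Token("all"), Token("time"), Token("."),
]

rows = flatten(tokens)
for index, text, is_atom in rows:
    print(index, "|", text, "|", "Yes" if is_atom else "No")
```

Because tokens and atoms share a single index sequence in this model, the 11 tokens and 4 atoms yield the 15 units shown in the table.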