Introduction to the DIS object
The DIS object and its methods
The predefined DIS object gives access to the disambiguation results and also allows you to tag or untag text subdivisions.
The DIS object can be used in the SCRIPT attribute and also in the following functions, which are executed after disambiguation:
The functionalities of the DIS object are exposed through its methods, which can be grouped into these categories:
- Count text subdivisions
- Get the text of the whole document or that of text subdivisions
- Get objects corresponding to text subdivisions to explore their properties
- Get the index of the text subdivision of a given kind that contains the character at a given position in the document text
- Tag and untag text subdivisions
- (Reserved for future use) Access the results of document understanding
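The position-to-index category in the list above can be pictured with a small stand-alone sketch. It does not call the DIS object. The helper names `word_spans` and `index_at` are hypothetical, and whitespace-delimited chunks stand in for real text subdivisions, which disambiguation computes differently.

```python
import re
from bisect import bisect_right

def word_spans(text):
    """Return (start, end) offsets of whitespace-delimited chunks; end is exclusive."""
    return [m.span() for m in re.finditer(r"\S+", text)]

def index_at(spans, position):
    """Return the index of the span containing position, or -1 if none does."""
    starts = [start for start, _ in spans]
    # Last span whose start is <= position.
    i = bisect_right(starts, position) - 1
    if i >= 0 and position < spans[i][1]:
        return i
    return -1

text = "Michael Jordan was one of the best"
spans = word_spans(text)

print(index_at(spans, 0))   # 0: position 0 falls inside "Michael"
print(index_at(spans, 9))   # 1: position 9 falls inside "Jordan"
print(index_at(spans, 7))   # -1: position 7 is the space between the two
```

The lookup uses a binary search over span start offsets, so it stays fast for long documents. Returning -1 for positions that fall between subdivisions is a choice made for this sketch, not documented DIS behavior.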
Text subdivisions: tokens and atoms
Except for the methods related to document understanding, all methods of the DIS object rely on the text subdivisions, at different levels of granularity, that disambiguation produces.
At the token level, subdivisions also include sub-tokens called atoms. Disambiguation lists atoms immediately after the token they are part of.
For example, given this input text:
Michael Jordan was one of the best basketball players of all time.
disambiguation identifies these 15 units as either tokens or atoms:
| Index | Text | Sub-token (atom)? |
|---|---|---|
| 0 | Michael Jordan | No |
| 1 | Michael | Yes |
| 2 | Jordan | Yes |
| 3 | was | No |
| 4 | one | No |
| 5 | of | No |
| 6 | the | No |
| 7 | best | No |
| 8 | basketball players | No |
| 9 | basketball | Yes |
| 10 | players | Yes |
| 11 | of | No |
| 12 | all | No |
| 13 | time | No |
| 14 | . | No |
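To make the ordering rule concrete, the table can be modeled as a flat list in which each atom follows the token it belongs to. The sketch below is illustrative only: the `Unit` class and the `group_atoms` helper are hypothetical names, not part of the DIS object, and the rows are copied by hand from the table rather than obtained from disambiguation.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    index: int
    text: str
    is_atom: bool

# Rows copied from the table above: (text, is it an atom?).
ROWS = [
    ("Michael Jordan", False), ("Michael", True), ("Jordan", True),
    ("was", False), ("one", False), ("of", False), ("the", False),
    ("best", False), ("basketball players", False),
    ("basketball", True), ("players", True),
    ("of", False), ("all", False), ("time", False), (".", False),
]
units = [Unit(i, text, is_atom) for i, (text, is_atom) in enumerate(ROWS)]

def group_atoms(units):
    """Map each token's index to the texts of the atoms listed right after it."""
    groups = {}
    current = None
    for unit in units:
        if unit.is_atom:
            if current is None:
                raise ValueError("an atom cannot precede its token")
            groups[current].append(unit.text)
        else:
            current = unit.index
            groups[current] = []
    return groups

groups = group_atoms(units)
print(len(units))    # 15 units in total
print(len(groups))   # 11 tokens, so 4 of the units are atoms
print(groups[0])     # ['Michael', 'Jordan']
print(groups[8])     # ['basketball', 'players']
```

Because atoms share the same index sequence as tokens, the index of a token depends on how many atoms precede it: the second "of" has index 11 here, not 7.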