
Manage tagging

Introduction

The pre-defined DIS object provides the following methods to manage tagging:

  • To add tag instances:

    • tagToken
    • tagPhrase
    • tagSentence
    • tagRange
    • tagTokenWithValue
    • tagPhraseWithValue
    • tagSentenceWithValue
    • tagRangeWithValue
  • To untag:

    • untagToken
    • untagPhrase
    • untagSentence
  • To rename tag instances:

    • renameTag
  • To get the values of all the tag instances of a token:

    • getTokenTagEntry

Methods that add tag instances

All the methods that add tag instances have camel-case names starting with tag.
The second part of the name (Token, Phrase, etc.) corresponds to the kind of text subdivision covered by the tag instance. All the tokens in the subdivision are covered by one instance of the tag.
The methods whose names end with WithValue also replace the default value of the tag instance, which is the concatenation of the texts of the tokens, with an arbitrary value.
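To make the value rule concrete, here is a small JavaScript sketch that models a tag instance as a plain object. It is a toy model, not the DIS API: the object shape, the function name makeTagInstance and the choice to join token texts with single spaces are all assumptions made for illustration.

```javascript
// Toy model only: NOT the DIS object or its internal data structures.
// A tag instance covers tokens[first..last] (zero-based, both ends included).
function makeTagInstance(tokens, first, last, tagName, value) {
  // Default value: the texts of the covered tokens, concatenated.
  // Joining with single spaces is a simplifying assumption.
  const defaultValue = tokens.slice(first, last + 1).join(" ");
  return {
    tag: tagName,
    first: first,
    last: last,
    // A ...WithValue method supplies a value that overrides the default.
    value: value === undefined ? defaultValue : value
  };
}

const tokens = ["Mark", "is", "a", "good", "boy", "."];
const plain = makeTagInstance(tokens, 3, 4, "GOOD_BOY");
const overridden = makeTagInstance(tokens, 3, 4, "GOOD_BOY", "nice kid");
console.log(plain.value);      // good boy
console.log(overridden.value); // nice kid
```

Both instances cover the same two tokens; only the explicit value distinguishes the second one.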

tagToken(), tagPhrase() and tagSentence() have this syntax:

tagTextSubdivision(subdivisionIndex, tagName)

where TextSubdivision stands for Token, Phrase or Sentence. Each of them adds an instance of tag tagName that spans all the tokens of the text subdivision with index subdivisionIndex (indexes are zero-based).
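A subdivision index selects a whole subdivision, so the covered tokens follow from where that subdivision starts and ends. The sketch below models a text as an array of sentences, each an array of token strings, and computes the absolute token range that a tagSentence(sentenceIndex, tagName) call would cover. The data layout and the function sentenceTokenRange are hypothetical; zero-based indexes are assumed, consistent with the examples on this page.

```javascript
// Toy model only: the text as sentences of tokens, not a DIS data structure.
// Returns the absolute, zero-based, inclusive token range of one sentence.
function sentenceTokenRange(sentences, sentenceIndex) {
  let first = 0;
  // Skip the tokens of every sentence that precedes the requested one.
  for (let i = 0; i < sentenceIndex; i++) {
    first += sentences[i].length;
  }
  return { first: first, last: first + sentences[sentenceIndex].length - 1 };
}

const sentences = [
  ["Mark", "is", "a", "good", "boy", "."],
  ["He", "solved", "the", "cube", "."]
];
// Sentence 1 is the second sentence: absolute tokens 6 to 10.
console.log(sentenceTokenRange(sentences, 1)); // { first: 6, last: 10 }
```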

tagRange() has this syntax:

tagRange(firstToken, lastToken, tagName)

and adds an instance of tag tagName that spans all the tokens with an index between firstToken and lastToken.
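The sketch below lists the token indexes a tagRange(firstToken, lastToken, tagName) call would cover, assuming that both bounds are inclusive (the description above only says "between"). coveredIndexes is a hypothetical helper, not part of DIS.

```javascript
// Hypothetical helper, not part of DIS. It ASSUMES both firstToken and
// lastToken are included in the range covered by tagRange().
function coveredIndexes(firstToken, lastToken) {
  const indexes = [];
  for (let i = firstToken; i <= lastToken; i++) {
    indexes.push(i);
  }
  return indexes;
}

const tokens = ["Mark", "is", "a", "good", "boy", "."];
// Under that assumption, tagRange(2, 4, "MY_RANGE") would cover "a good boy".
const covered = coveredIndexes(2, 4).map(i => tokens[i]);
console.log(covered.join(" ")); // a good boy
```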

The methods whose names end with WithValue take an additional argument, value, a string representing the value of the tag instance. For example, this code:

DIS.tagSentenceWithValue(1, "CUBE_INVENTOR", "Erno Rubik");

superimposes an instance of tag CUBE_INVENTOR, with value Erno Rubik, on all the tokens of the second sentence of the text (sentence indexes are zero-based).

The level of the tag instance depends on the function in which the method is used: inside onTagger the level is 999999999, while inside onTaggerLevel it is the level specified as the argument of the function.

Untagging methods

Untagging methods make tag instances "negative". Negative tag instances are replaced by "untags", which are displayed in analysis results.
When a tag instance is untagged, the tokens it covered lose the instance itself, its value and any alternative syncon ID that came from the instance, so the original syncon ID of the tokens is restored.

All the untagging methods have camel-case names starting with untag.
The second part of the name (Token, Phrase, etc.) corresponds to the text subdivision that gets untagged.
The methods have this syntax:

untagTextSubdivision(subdivisionIndex, tagName)

where TextSubdivision stands for Token, Phrase or Sentence. Each of them finds the instance of tag tagName that spans all the tokens of the text subdivision with index subdivisionIndex and makes it negative.
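The following sketch models what untagging does to a single token, as described above: the tag instance becomes a negative "untag" entry, its value is dropped, and the token gets back its original syncon ID. The token object layout, its field names and the function toyUntag are hypothetical, since DIS does not expose this structure, and the syncon IDs are placeholders, not real ones.

```javascript
// Toy model only: field names and layout are assumptions, not DIS internals.
function toyUntag(token, tagName) {
  const instance = token.tags.find(t => t.tag === tagName && !t.negative);
  if (!instance) {
    return token; // no positive instance of that tag: nothing to untag
  }
  // Keep a negative "untag" entry, which still shows up in the results...
  instance.negative = true;
  // ...drop the value carried by the instance...
  delete instance.value;
  // ...and restore the syncon ID the token had before being tagged.
  token.synconId = token.originalSynconId;
  return token;
}

const token = {
  text: "Xmas",
  originalSynconId: 1001, // placeholder ID
  synconId: 2002,         // placeholder ID that came from the tag instance
  tags: [{ tag: "MY_TAG1", value: "Christmas", negative: false }]
};
toyUntag(token, "MY_TAG1");
console.log(token.synconId, token.tags[0]); // 1001 { tag: 'MY_TAG1', negative: true }
```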

renameTag

renameTag replaces a tag instance of a token with another by adding a new tag instance and untagging the original. The new instance inherits the level of the untagged instance.

Consider for example this definition:

TAGS
{
    @FIRST,
    @SECOND
}

If this tagging rule:

SCOPE SENTENCE 
{
    TAGGER()
    {
        @FIRST[LEMMA("good")]
    }
}

is applied to this text:

Mark is a good boy.

an instance of tag FIRST, covering token good, is added to the analysis results. Then, this JavaScript statement:

DIS.renameTag("FIRST", "SECOND", 3);

untags the instance of tag FIRST and adds an instance of tag SECOND to the fourth token of the text (good, zero-based index 3). Since the instance of tag FIRST was on level 10000, which is the default for rule-generated instances, the instance of tag SECOND sits on the same level.
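The two steps performed by renameTag can be modeled in a few lines. The sketch reuses the scenario above (tag FIRST on token 3, good, at level 10000). The instance list and the function toyRenameTag are hypothetical, and the argument order (original tag, new tag, token index) is inferred from the DIS.renameTag("FIRST", "SECOND", 3) call above.

```javascript
// Toy model only, not DIS internals: the tag instances of one text.
const instances = [
  { tag: "FIRST", token: 3, level: 10000, negative: false }
];

// Mirrors the described behavior of renameTag(oldTag, newTag, tokenIndex).
function toyRenameTag(instances, oldTag, newTag, tokenIndex) {
  const original = instances.find(
    i => i.tag === oldTag && i.token === tokenIndex && !i.negative
  );
  if (!original) {
    return instances; // no instance of oldTag on that token
  }
  original.negative = true; // step 1: untag the original instance
  instances.push({          // step 2: add the new instance on the same level
    tag: newTag,
    token: tokenIndex,
    level: original.level,
    negative: false
  });
  return instances;
}

toyRenameTag(instances, "FIRST", "SECOND", 3);
console.log(instances);
```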

getTokenTagEntry

getTokenTagEntry returns an array of objects containing the data of all the tag instances of a token.

For example, given this code:

function onTagger() {
    DIS.tagTokenWithValue(1, "MY_TAG1", "Christmas");
    DIS.tagTokenWithValue(1, "MY_TAG2", "December 25th");
}

applied to this text:

On Xmas, I will buy a new computer.

two tag instances, each with its own entry (value), are assigned to token Xmas, which is the token with zero-based index 1.

With this code:

var tagInstancesData = DIS.getTokenTagEntry(1);

variable tagInstancesData gets this value:

[
    {
        "tag": "MY_TAG1",
        "entry": "Christmas"
    },
    {
        "tag": "MY_TAG2",
        "entry": "December 25th"
    }
]

The syntax of the method is:

getTokenTagEntry(tokenIndex)

where tokenIndex is the index of the token.
The return value is an array of objects with this structure:

{
    "tag": tagName,
    "entry": tagValue
}

where each item of the array corresponds to a tag instance superimposed on the token.
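Since the return value is a plain array, ordinary JavaScript can pick out the value of a single tag. The sketch below defines a hypothetical helper, entryOf, and exercises it on the array from the Xmas example; inside a script you would pass it the result of DIS.getTokenTagEntry(tokenIndex) instead of the literal. A plain loop is used rather than newer array methods, in case the scripting engine only supports older JavaScript.

```javascript
// Hypothetical helper, not part of DIS: returns the entry (value) of the first
// instance of tagName in an array shaped like getTokenTagEntry's return value,
// or null if the token has no instance of that tag.
function entryOf(tagEntries, tagName) {
  for (var i = 0; i < tagEntries.length; i++) {
    if (tagEntries[i].tag === tagName) {
      return tagEntries[i].entry;
    }
  }
  return null;
}

// Same data as the Xmas example; in a script this would come from
// var tagInstancesData = DIS.getTokenTagEntry(1);
var tagInstancesData = [
  { tag: "MY_TAG1", entry: "Christmas" },
  { tag: "MY_TAG2", entry: "December 25th" }
];

console.log(entryOf(tagInstancesData, "MY_TAG2")); // December 25th
console.log(entryOf(tagInstancesData, "OTHER"));   // null
```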