Manage tagging
Introduction
The pre-defined DIS object provides the following methods to manage tagging:
- To add tag instances:
  - tagToken
  - tagPhrase
  - tagSentence
  - tagRange
  - tagTokenWithValue
  - tagPhraseWithValue
  - tagSentenceWithValue
  - tagRangeWithValue
- To untag:
  - untagToken
  - untagPhrase
  - untagSentence
- To rename tag instances:
  - renameTag
- To get the values of all the tag instances of a token:
  - getTokenTagEntry
Methods that add tag instances
All the methods that add tag instances have a camel case name starting with tag.
The second portion of the name (Token, Phrase, etc.) corresponds to the kind of text subdivision that gets covered by the tag instance. All the tokens in the subdivision will be covered by one instance of the tag.
The methods whose names end with WithValue also override the default value of the tag instance, which is the concatenation of the texts of the tokens, with an arbitrary value.
tagToken(), tagPhrase() and tagSentence() have this syntax:
tagtextSubdivision(subdivisionIndex, tagName)
and add an instance of tag tagName that spans all the tokens of text subdivision textSubdivision with index subdivisionIndex.
tagRange() has this syntax:
tagRange(firstToken, lastToken, tagName)
and adds an instance of tag tagName that spans all the tokens with an index between firstToken and lastToken.
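As a rough illustration of the two syntaxes above, the following sketch calls each of the basic tagging methods once. The DIS object defined here is only a hypothetical stand-in that records the calls it receives, so the snippet can run outside the product; the tag name MY_TAG and all the indexes are made up for the example.

```javascript
// Hypothetical stand-in for the pre-defined DIS object: it only records
// the calls it receives. Leave it out where the real DIS object exists.
var calls = [];
function record(name) {
    return function () {
        calls.push([name].concat(Array.prototype.slice.call(arguments)));
    };
}
var DIS = {
    tagToken: record("tagToken"),
    tagPhrase: record("tagPhrase"),
    tagSentence: record("tagSentence"),
    tagRange: record("tagRange")
};

function onTagger() {
    DIS.tagToken(0, "MY_TAG");     // token with index 0
    DIS.tagPhrase(2, "MY_TAG");    // phrase with index 2
    DIS.tagSentence(1, "MY_TAG");  // sentence with index 1
    DIS.tagRange(4, 6, "MY_TAG");  // tokens with an index between 4 and 6
}

// Called by hand here only because the stub has no engine to invoke it.
onTagger();
```

With the stub in place, calls ends up holding one entry per method call, in the order the calls were made.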
The methods whose names end with WithValue have an additional argument value, a string representing the value of the tag instance. For example, this code:
DIS.tagSentenceWithValue(1, "CUBE_INVENTOR", "Erno Rubik");
superimposes an instance of tag CUBE_INVENTOR with value Erno Rubik on all the tokens of the second sentence of the text.
The level of the tag instance depends on the function in which the method is used: inside onTagger the level is 999999999, while inside onTaggerLevel the level is the one specified as the argument of the function.
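The level rule can be pictured with a small simulation. Everything about the DIS object below is a hypothetical stand-in: the variable currentLevel and the hand-written driver at the end only mimic what the engine is described as doing, and the parameter name level and the value 500 are assumptions chosen for the example.

```javascript
// Hypothetical stand-in for the pre-defined DIS object. It stores each new
// tag instance together with the level that the simulated engine has set.
var currentLevel = null;
var recorded = [];
var DIS = {
    tagSentenceWithValue: function (sentenceIndex, tagName, value) {
        recorded.push({ tag: tagName, value: value, level: currentLevel });
    }
};

function onTagger() {
    DIS.tagSentenceWithValue(1, "CUBE_INVENTOR", "Erno Rubik");
}

// The parameter name "level" is an assumption made for this sketch.
function onTaggerLevel(level) {
    DIS.tagSentenceWithValue(1, "CUBE_INVENTOR", "Erno Rubik");
}

// Simulated driver: imitates the rule described in the text.
currentLevel = 999999999;  // level used for instances added inside onTagger
onTagger();
currentLevel = 500;        // arbitrary level passed to onTaggerLevel
onTaggerLevel(500);
```

In this simulation the first recorded instance carries level 999999999 and the second carries level 500; in the product the level is assigned by the engine, not by the script.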
Untagging methods
Untagging methods make tag instances "negative". When negative, tag instances are replaced by "untag" instances that are displayed in analysis results.
With untagging, the tokens previously covered by the tag instance lose the tag instance itself, its value and any alternative syncon ID that came from the tag instance, so the original syncon ID of the tokens is restored.
All the untagging methods have a camel case name starting with untag.
The second portion of the name (Token, Phrase, etc.) corresponds to the text subdivision that gets untagged.
The methods have this syntax:
untagtextSubdivision(subdivisionIndex, tagName)
and make the instance of tag tagName that spans all the tokens of text subdivision textSubdivision with index subdivisionIndex negative.
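A minimal sketch of an untagging call follows. The DIS object is again a hypothetical stand-in, pre-loaded with a single instance of a made-up tag FIRST on token 3, as if a tagging rule had added it; the negative flag is only this sketch's way of modeling an instance that has been turned into an "untag".

```javascript
// Hypothetical stand-in for the pre-defined DIS object. It starts with one
// instance of tag FIRST on token 3, as if a tagging rule had added it.
var instances = [{ tokenIndex: 3, tag: "FIRST", negative: false }];
var DIS = {
    untagToken: function (tokenIndex, tagName) {
        instances.forEach(function (instance) {
            if (instance.tokenIndex === tokenIndex && instance.tag === tagName) {
                instance.negative = true; // models the instance becoming an "untag"
            }
        });
    }
};

function onTagger() {
    // Same argument order as the tagging methods: index first, then tag name.
    DIS.untagToken(3, "FIRST");
}

// Called by hand here only because the stub has no engine to invoke it.
onTagger();
```

untagPhrase and untagSentence follow the same pattern, with the index referring to a phrase or a sentence instead of a token.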
renameTag
renameTag replaces a tag instance of a token with another by adding a new tag instance and untagging the original. The new instance inherits the level of the untagged instance.
Consider for example this definition:
TAGS
{
    @FIRST,
    @SECOND
}
If this tagging rule:
SCOPE SENTENCE
{
    TAGGER()
    {
        @FIRST[LEMMA("good")]
    }
}
is applied to this text:
Mark is a good boy.
an instance of tag FIRST, covering token good, is added to the analysis results. Then, this JavaScript statement:
DIS.renameTag("FIRST", "SECOND", 3);
untags the instance of the FIRST tag and adds an instance of the SECOND tag to the fourth token of the text. Since the instance of tag FIRST was on level 10000, which is the default for rule-generated instances, the instance of tag SECOND will sit on the same level.
getTokenTagEntry
getTokenTagEntry returns an array of objects containing the data of all the tag instances of a token.
For example, given this code:
function onTagger() {
    DIS.tagTokenWithValue(1, "MY_TAG1", "Christmas");
    DIS.tagTokenWithValue(1, "MY_TAG2", "December 25th");
}
applied to this text:
On Xmas, I will buy a new computer.
two tag instances, each with its own tag entry, are assigned to the token Xmas, which is the token with zero-based index 1.
With this code:
var tagInstancesData = DIS.getTokenTagEntry(1);
variable tagInstancesData gets this value:
[
    {
        "tag": "MY_TAG1",
        "entry": "Christmas"
    },
    {
        "tag": "MY_TAG2",
        "entry": "December 25th"
    }
]
The syntax of the method is:
getTokenTagEntry(tokenIndex)
where tokenIndex is the index of the token.
The return value is an array of objects with this structure:
{
    "tag": tagName,
    "entry": tagValue
}
where each item of the array corresponds to a tag instance superimposed on the token.
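Since the return value is a plain array of objects, a script can scan it for a specific tag. The helper findEntry below is a hypothetical convenience function, not part of the DIS object, and the DIS stub only returns the sample data from the example above so that the sketch can run outside the product. Returning an empty array for untagged tokens is an assumption of the stub, which is why the helper also guards against a missing result.

```javascript
// Hypothetical stand-in for the pre-defined DIS object: it returns the
// sample data for token 1 and, by assumption, an empty array otherwise.
var DIS = {
    getTokenTagEntry: function (tokenIndex) {
        if (tokenIndex === 1) {
            return [
                { tag: "MY_TAG1", entry: "Christmas" },
                { tag: "MY_TAG2", entry: "December 25th" }
            ];
        }
        return [];
    }
};

// Hypothetical helper: returns the value of the first instance of tagName
// found on the token, or null when the token carries no such instance.
function findEntry(tokenIndex, tagName) {
    var data = DIS.getTokenTagEntry(tokenIndex) || [];
    for (var i = 0; i < data.length; i++) {
        if (data[i].tag === tagName) {
            return data[i].entry;
        }
    }
    return null;
}

var found = findEntry(1, "MY_TAG2");    // "December 25th" with the stub data
var missing = findEntry(0, "MY_TAG2");  // null: the stub has no data for token 0
```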