KEYWORD attribute
Overview
The KEYWORD attribute matches the literal text of one or more consecutive tokens.
The syntax is:
KEYWORD("string1"[, "string2", ...])
where:
KEYWORD
is the attribute name and must be written in uppercase.
string#
is a sequence of letters, digits, spaces and punctuation marks.
When multiple arguments are specified, the attribute is true whenever any one of them matches the text, in its entirety, of one or more consecutive tokens.
Match quotation marks and backslashes
If you need to match quotation marks ("), escape them with the backslash character (\).
For example:
KEYWORD("\"cool\"")
matches:
That's a "cool" car
but not:
That's a cool car
Note that "cool" is recognized by the disambiguator as three consecutive tokens—punctuation characters are tokens—without separators (for example blank characters) between them:
- "
- cool
- "
so KEYWORD("\"cool\"") matches the literal text of three tokens.
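The whole-text matching described above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the disambiguator: the tokenize and keyword_matches helpers are hypothetical, the tokenizer simply treats each run of word characters and each punctuation mark as a token, and case handling is ignored.

```python
import re

def tokenize(text):
    """Toy tokenizer (assumption): each word and each punctuation mark is a token.

    Returns (token_text, start_offset, end_offset) triples.
    """
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def keyword_matches(text, keyword):
    """True if keyword equals the entire text spanned by one or more consecutive tokens."""
    tokens = tokenize(text)
    for i in range(len(tokens)):
        start = tokens[i][1]
        for j in range(i, len(tokens)):
            end = tokens[j][2]
            # Compare against the source text from the first token's start
            # to the last token's end, including any separators in between.
            if text[start:end] == keyword:
                return True
    return False

print(keyword_matches('That\'s a "cool" car', '"cool"'))  # three adjacent tokens
print(keyword_matches("That's a cool car", '"cool"'))     # no quotation marks in the text
```

In this model the quoted word is found because the span from the opening quotation mark to the closing one is exactly "cool" with its quotes, while a string that covers only part of a token never matches.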
It's not possible to match backslash characters with KEYWORD. If you need to do so, use the PATTERN attribute instead.
For example:
PATTERN("\\path")
Case sensitivity
If string# is written entirely in lowercase, the match is case insensitive.
For example:
KEYWORD("triumph")
matches:
triumph
Triumph
TRIUMPH
...
To have a case-sensitive match of lowercase text, start the string with a question mark followed by a colon (?:).
For example:
KEYWORD("?:triumph")
matches only triumph.
If string# contains at least one uppercase character, the match is case sensitive.
For example:
KEYWORD("Triumph")
matches only Triumph.
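The three case rules can be summarized with a short Python sketch. It is an illustrative model only: keyword_equals is a hypothetical helper that compares one candidate text against one string# argument, and treating everything after the ?: prefix as an exact match is an assumption for the cases this section does not describe.

```python
def keyword_equals(candidate, string):
    """Toy model of the case rules for a single string# argument."""
    if string.startswith("?:"):
        # "?:" prefix: case-sensitive match of the text that follows it.
        return candidate == string[2:]
    if any(ch.isupper() for ch in string):
        # At least one uppercase character: case-sensitive match.
        return candidate == string
    # All-lowercase string: case-insensitive match.
    return candidate.lower() == string

for candidate in ("triumph", "Triumph", "TRIUMPH"):
    print(candidate,
          keyword_equals(candidate, "triumph"),
          keyword_equals(candidate, "?:triumph"),
          keyword_equals(candidate, "Triumph"))
```

Each printed row shows, for one candidate text, the result for "triumph", "?:triumph" and "Triumph" in that order.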
Applications
The KEYWORD attribute can be used in a number of cases.
- To identify a generic string, regardless of its possible meanings and uses.
  KEYWORD("card")
  In this case, any token (or atom) whose text matches the string makes the attribute true.
  Not only does the attribute match the simple word card, but also card in credit card, card game, discount card, etc.
  On the other hand, postcard is not matched because KEYWORD only matches the text of tokens or atoms in their entirety. Use the PATTERN attribute to make partial matches using wildcards, for example PATTERN(".*card").
- To identify a proper noun or a collocation that does not exist in the knowledge graph. For example:
  KEYWORD("John Smith")
  KEYWORD("sulphite reductor", "sulphite reductors")
  KEYWORD("tdi 4.0 awd")
  Text like John Smith, sulphite reductor, sulphite reductors and TDI 4.0 AWD cannot be matched using the LEMMA attribute because they do not appear in the standard knowledge graph for English. They could, however, be matched if the knowledge graph had been customized to include the corresponding lemmas.
- To identify a particular phraseology. For example:
  KEYWORD("sulphured hydrogen reduction through Idemitsu process")
Special case: split words
Fed with this text:
I don't spot the difference between the two colors.
the disambiguator splits don't into two tokens with literal text do and n't respectively.
This operand:
KEYWORD("don't")
is true because do and n't, generated by the splitting, do not have any blank between them, so KEYWORD behaves as it does for "cool" (see above). Conversely:
KEYWORD("do")
is false. This is because the KEYWORD attribute can only match the rightmost token in a sequence of tokens generated by splitting the original word. So:
KEYWORD("n't")
is true.