
KEYWORD attribute

Overview

The KEYWORD attribute matches the literal text of one or more consecutive tokens.

The syntax is:

KEYWORD("string1"[, "string2", ...])

where:

  • KEYWORD is the attribute name and must be written in uppercase.
  • string# is a sequence of alphabetical characters, numbers, spaces and punctuation marks.

When multiple arguments are specified, the attribute is true whenever the value of any argument matches the text, in its entirety, of one or more consecutive tokens.

Match quotation marks and backslashes

If you need to match quotation marks ("), escape them with the backslash character (\).
For example:

KEYWORD("\"cool\"")

matches:

That's a "cool" car

but not:

That's a cool car

Note that "cool" is recognized by the disambiguator as three consecutive tokens—punctuation characters are tokens—without separators (for example blank characters) between them:

  1. "
  2. cool
  3. "

so KEYWORD("\"cool\"") matches the literal text of three tokens.
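
The idea of matching the text of consecutive tokens can be sketched in Python. This is only an illustrative model, not the disambiguator's implementation: the tokenize and keyword_spans functions are hypothetical, the regular-expression tokenizer is a naive stand-in for the real tokenization, and case handling is ignored.

```python
import re

def tokenize(text):
    # Naive stand-in tokenizer (an assumption): words and single
    # punctuation marks, each with its start and end offsets.
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def keyword_spans(text, string):
    # Return (first, last) token index pairs whose combined source text,
    # including any separators between the tokens, equals the string.
    tokens = tokenize(text)
    spans = []
    for i in range(len(tokens)):
        for j in range(i, len(tokens)):
            start, end = tokens[i][1], tokens[j][2]
            if text[start:end] == string:
                spans.append((i, j))
    return spans
```

Under these assumptions, keyword_spans('That\'s a "cool" car', '"cool"') finds one span covering three tokens, while the same call on the text without quotation marks finds none.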

It's not possible to match backslash characters with KEYWORD. If you need to match them, use the PATTERN attribute instead.
For example:

PATTERN("\\path")

Case sensitivity

If string# is written in lowercase, the match is case insensitive.
For example:

KEYWORD("triumph")

matches:

triumph
Triumph
TRIUMPH
...

To have a case sensitive match of lowercase text, start the string with a question mark followed by a colon (?:).
For example:

KEYWORD("?:triumph")

matches only triumph.

If string# contains at least one uppercase character, the match is case sensitive.
For example:

KEYWORD("Triumph")

matches only Triumph.
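
The three case rules above can be summarized in a short Python sketch. The keyword_matches function is hypothetical and models only the case handling described in this section for a single string compared with a candidate text; it is not how the attribute is actually implemented.

```python
def keyword_matches(string: str, text: str) -> bool:
    """Hypothetical model of KEYWORD case handling for one argument."""
    if string.startswith("?:"):
        # The "?:" prefix forces a case sensitive match of the rest.
        return text == string[2:]
    if any(ch.isupper() for ch in string):
        # At least one uppercase character: case sensitive match.
        return text == string
    # All lowercase: case insensitive match.
    return text.lower() == string
```

For example, keyword_matches("triumph", "TRIUMPH") is true, while keyword_matches("?:triumph", "Triumph") and keyword_matches("Triumph", "triumph") are both false.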

Applications

The KEYWORD attribute can be used in a number of cases.

  • To identify a generic string, regardless of its possible meanings and uses.

    KEYWORD("card")
    

    In this case, any token—or atom—with a text that matches the string makes the attribute true.
    Not only does the attribute match the simple word card, but also card in credit card, card game, discount card, etc.
    On the other hand, postcard is not matched because KEYWORD only matches the text of tokens or atoms in their entirety. Use the PATTERN attribute to make partial matches using wildcards, for example PATTERN(".*card").

  • To identify a proper noun or a collocation that does not exist in the knowledge graph. For example:

    KEYWORD("John Smith")
    KEYWORD("sulphite reductor", "sulphite reductors")
    KEYWORD("tdi 4.0 awd")
    

    Text like John Smith, sulphite reductor, sulphite reductors and TDI 4.0 AWD, in fact, cannot be matched using the LEMMA attribute because they do not appear in the standard knowledge graph for English. They could however be matched if the knowledge graph had been customized to include corresponding lemmas.

  • To identify a particular phraseology. For example:

    KEYWORD("sulphured hydrogen reduction through Idemitsu process")
    

Special case: split words

Fed with this text:

I don't spot the difference between the two colors.

the disambiguator splits don't into two tokens with literal text do and n't respectively.
This operand:

KEYWORD("don't")

is true because do and n't, generated by the splitting, do not have any blank between them, so KEYWORD behaves as it does for "cool" (see above). Conversely:

KEYWORD("do")

is false. This is because the KEYWORD attribute can only match the rightmost token in a sequence of tokens generated by splitting the original word. So:

KEYWORD("n't")

is true.
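
The behavior described for split words can be modeled with a small Python sketch. The split_word_matches function is hypothetical and covers only the cases stated above (the whole original word, or the rightmost token produced by the split). It uses exact comparison and ignores case handling.

```python
def split_word_matches(pieces, string):
    # pieces: tokens produced by splitting one original word,
    # for example ["do", "n't"] for the word don't.
    whole_word = "".join(pieces)
    rightmost = pieces[-1]
    return string == whole_word or string == rightmost
```

With pieces ["do", "n't"], the strings don't and n't match, while do does not.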