LEMMA attribute, with the difference that
ULEMMA identifies a token by specifying the base form of a word that is not contained in the Knowledge Graph.
Even though you specify the base form, inflected forms are also matched. The base form is:
- The singular form for nouns.
- The base form for adjectives (which also matches comparatives and superlatives) and adverbs.
- The infinitive form for verbs.
This attribute is recommended for German and Russian: German makes extensive use of compound words that may not be in the Knowledge Graph, and Russian words change their written form according to grammatical case.
The syntax is:
ULEMMA("string1"[, "string2", ...])
ULEMMA is the attribute name and must be written in uppercase.
string# refers to any sequence of alphabetical characters, numbers and punctuation marks. Each string to be recognized in a document can consist of one or more words and must be enclosed in quotation marks.
The match for lemmas is case sensitive. The strings must be typed as they appear in the Knowledge Graph.
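As a sketch of the multi-string form allowed by the syntax above, a single attribute can list several base forms separated by commas. The two values are borrowed from the German examples in this section, and combining them in one attribute is only an illustration:
ULEMMA("Röntgenkontrolle", "Löschungbewilligung")
Under that assumption, the attribute would match a token whose base form is either of the listed strings.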
For example, this rule with a German lemma not contained in the Knowledge Graph:
ULEMMA("Röntgenkontrolle") //X-ray check
applied to these texts:
Der Patient braucht eine Röntgenkontrolle.
Die Patienten brauchen einige Röntgenkontrollen.
will trigger on both the singular and plural forms of the lemma Röntgenkontrolle.
Note that Röntgenkontrolle is written with an initial capital letter, because German nouns are always capitalized and the match is case sensitive.
If you have compound words made with the genitive s, this rule:
ULEMMA("Löschungsbewilligung") //deletion permit
applied to this text:
Der Hausbesitzer beantragte mehrere Löschungsbewilligungen.
will not trigger on Löschungsbewilligungen, because the disambiguator recognizes the base form of the compound without the s. For the rule to trigger, the value in the rule must also be written without the genitive s. Written this way, this rule:
ULEMMA("Löschungbewilligung") //deletion permit
will trigger when applied to the text above even though the lemma in the text has the genitive s.