Combination of attributes
All available attributes can be combined. This permits more effective and complex matches on documents compared to the use of single attributes.
Attributes can be combined using the symbols + and -, which perform, respectively, the intersection and the difference between two or more attributes.
The syntax is:
attribute1(value1[, value2, ...]) +|- attribute2(value1[, value2, ...]) ...
Given two or more attributes combined in a rule, a token is matched in a document if it satisfies the first attribute and all those preceded by the + sign, but none of those preceded by the - sign.
Each attribute can be described as a set of values that must be matched by one or more tokens in a text in order for a rule to trigger. Combining two attributes with the + sign performs the intersection of the two sets of values. Combining them with the - sign performs the difference between the two sets of values.
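The set view described above can be sketched in Python. This is only an illustrative model, not the engine's implementation: the function name combine, the left-to-right evaluation order and the token identifiers t1 to t4 are all assumptions made for the example.

```python
def combine(first, *operations):
    """Apply '+' (intersection) and '-' (difference) to sets, left to right.

    Hypothetical helper: it models each attribute as the set of tokens
    that attribute would match on its own.
    """
    result = set(first)
    for sign, values in operations:
        if sign == "+":
            result &= set(values)   # keep only tokens also matched here
        elif sign == "-":
            result -= set(values)   # drop tokens matched here
        else:
            raise ValueError(f"unknown operator: {sign!r}")
    return result


# Made-up token identifiers matched by three attributes.
attribute1 = {"t1", "t2", "t3", "t4"}
attribute2 = {"t2", "t3", "t4"}
attribute3 = {"t4"}

# Models: attribute1 + attribute2 - attribute3
matched = combine(attribute1, ("+", attribute2), ("-", attribute3))
print(sorted(matched))  # ['t2', 't3']
```

A token survives only if it is in the first set, in every set introduced by +, and in no set introduced by -.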
Consider the following example:
TYPE(NPH) + ROLE(SUBJECT)
A rule containing this combination will categorize or extract the proper name of a person (TYPE(NPH)) only if this name is also recognized as the subject of a sentence or clause (ROLE(SUBJECT)).
In a sentence such as:
Dale Cregan is accused of the murders of Nicola Hughes and Fiona Bone.
only Dale Cregan would match the rule because it is the only NPH that is also the subject of the sentence. Nicola Hughes and Fiona Bone are also analyzed as NPHs, but their role is not the subject. In other words, the token Dale Cregan is considered the only intersection between the set of proper names and the set of subjects in a sentence.
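The intersection in this example can be reproduced with a small Python sketch. The Token class and the annotations are written by hand for illustration and are not real output of the analysis. The role value "OTHER" is a placeholder for any role that is not the subject.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    entity_type: str  # e.g. "NPH" for the proper name of a person
    role: str         # e.g. "SUBJECT"; "OTHER" is a placeholder value


# Hand-written annotations for the example sentence (assumed).
tokens = [
    Token("Dale Cregan", "NPH", "SUBJECT"),
    Token("Nicola Hughes", "NPH", "OTHER"),
    Token("Fiona Bone", "NPH", "OTHER"),
]

people = {t for t in tokens if t.entity_type == "NPH"}   # models TYPE(NPH)
subjects = {t for t in tokens if t.role == "SUBJECT"}    # models ROLE(SUBJECT)

matched = people & subjects  # models the + operator
print([t.text for t in matched])  # ['Dale Cregan']
```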
Now consider the second example:
SYNCON(57720) - LEMMA("ck", "chq")
A rule containing this combination will categorize or extract the concept of bank check (SYNCON(57720)) but not the lemmas ck or chq which are found in the set of synonyms forming syncon 57720. These two lemmas will then be subtracted from the set of lemmas that make up the said syncon.
In a sentence such as:
Checks can now be cashed even after 6 months from the chq. date
only the token Checks would match the given rule because it belongs to the syncon 57720 and is not one of the subtracted lemmas. The lemma chq. is recognized as a part of the syncon 57720, but the rule excludes it from the set, so it does not produce a match. In other words, only the token Checks is part of the set of values resulting from the difference between the SYNCON set and the LEMMA set.
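The difference in this example can be modeled the same way. In the sketch below, the content of the syncon is an assumed subset: only ck and chq are named in the text, and the lemma check is added for illustration. The lemma assigned to each token is also written by hand.

```python
# Assumed subset of the lemmas in syncon 57720 (illustrative only).
syncon_57720 = {"check", "ck", "chq"}

# Lemmas subtracted by LEMMA("ck", "chq").
excluded_lemmas = {"ck", "chq"}

# Models SYNCON(57720) - LEMMA("ck", "chq")
allowed_lemmas = syncon_57720 - excluded_lemmas

# (token text, lemma) pairs for the relevant tokens of the example sentence.
tokens = [("Checks", "check"), ("chq.", "chq")]

matched = [text for text, lemma in tokens if lemma in allowed_lemmas]
print(matched)  # ['Checks']
```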
The following example:
ROLE(SUBJECT) + TYPE(NPR)
recognizes subjects which are also proper nouns (TYPE(NPR)).
LEMMA("dog") - KEYWORD("dogs")
In this example, dogs is an inflected form of the lemma dog, so this keyword can be subtracted from the set of forms matched by the lemma.
ANCESTOR(100000729)//@SYN: #100000729# [animal]
+ PATTERN("[^A-Z]")
The example above will match animal names belonging to the lexical and conceptual chain of the ANCESTOR, only if they don't contain capital letters.