
Condition expressions and sub-expressions

Within extraction or categorization rules, an expression is a complex linguistic condition composed, in its most basic form, of attributes and operators:

SCOPE scopeOption
{
    DOMAIN(domainName:scoreOption)|IDENTIFY(templateName)
    {
        attribute1
        operator
        attribute2
        ...
    }
}

For further information about the specific features of an extraction rule and how they fit into the generic syntax above, see the related pages.

Advanced rule writing, however, may call for a particular use of attributes and operators, and it requires awareness of the priority scale that governs how Boolean operators are evaluated, shown in the following table:

Operator    Priority
AND NOT     1
XOR         2
AND         3
OR          4

This means that, in a rule containing more than one Boolean operator, evaluation starts from the operator with the highest priority (1 in the table) and proceeds to those with lower priority. For example, a condition such as the following:

LEMMA("flight")
OR
LEMMA("plane")
AND
LEMMA("passenger")

where AND has a higher priority than OR, would be interpreted as follows:

LEMMA("plane") AND LEMMA("passenger")
OR
LEMMA("flight")

in other words, the engine will look for the simultaneous presence of the lemmas plane and passenger or the presence of the lemma flight by itself.
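The effect of the priority table can be illustrated with a small Python sketch. It is not the engine's implementation: the names `PRIORITY`, `tokenize` and `evaluate` are hypothetical, attributes are reduced to plain names whose truth value is supplied in a dictionary, and operators of equal priority are assumed to group from left to right.

```python
import re

# Priority table from the text: a lower number binds more tightly.
PRIORITY = {"AND NOT": 1, "XOR": 2, "AND": 3, "OR": 4}

# Truth-functional meaning assumed for each operator in this sketch.
APPLY = {
    "AND NOT": lambda a, b: a and not b,
    "XOR": lambda a, b: a != b,
    "AND": lambda a, b: a and b,
    "OR": lambda a, b: a or b,
}


def tokenize(text):
    """Split into names, brackets and operators, merging AND + NOT into one token."""
    words = re.findall(r"\(|\)|\w+", text)
    tokens, i = [], 0
    while i < len(words):
        if words[i] == "AND" and i + 1 < len(words) and words[i + 1] == "NOT":
            tokens.append("AND NOT")
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens


def evaluate(text, facts):
    """Evaluate a Boolean condition; facts maps each operand name to True/False."""
    tokens = tokenize(text)
    weakest = max(PRIORITY.values())
    pos = 0

    def operand():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            value = expression(weakest)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing bracket")
            pos += 1
            return value
        return facts[token]

    def expression(level):
        # Parse everything that binds at least as tightly as `level`.
        nonlocal pos
        if level == 0:
            return operand()
        left = expression(level - 1)
        while pos < len(tokens) and PRIORITY.get(tokens[pos]) == level:
            op = tokens[pos]
            pos += 1
            left = APPLY[op](left, expression(level - 1))
        return left

    return expression(weakest)


# A text that contains only the lemma "flight" still satisfies the condition,
# because AND is applied to plane/passenger before OR is applied.
facts = {"flight": True, "plane": False, "passenger": False}
print(evaluate("flight OR plane AND passenger", facts))  # True
```

Each priority level is parsed by a call that first consumes the tighter-binding levels, which is why `plane AND passenger` is resolved before `OR` is considered.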

Round brackets can be used to override the order prescribed by the priority scale, creating what can be called "sub-expressions". For example, to rewrite the condition above so that it looks for the lemma flight or the lemma plane in combination with the lemma passenger, the condition would look like this:

(
    LEMMA("flight")
    OR
    LEMMA("plane")
)
AND
LEMMA("passenger")

This approach works with every Boolean operator and applies whenever the evaluation order must differ from the one given in the table above.
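To see when the brackets actually change the outcome, the two readings can be compared over every combination of inputs. The sketch below uses Python's own `and` and `or`, which happen to follow the same relative priority (`and` binds more tightly than `or`); the function names are hypothetical and each lemma is reduced to a True/False flag meaning "found in the text".

```python
from itertools import product


def default_reading(flight, plane, passenger):
    # flight OR plane AND passenger, as grouped by the priority scale
    return flight or (plane and passenger)


def grouped_reading(flight, plane, passenger):
    # (flight OR plane) AND passenger, as forced by the round brackets
    return (flight or plane) and passenger


def differing_cases():
    """Return the (flight, plane, passenger) combinations where the readings disagree."""
    return [
        case
        for case in product([False, True], repeat=3)
        if default_reading(*case) != grouped_reading(*case)
    ]


print(differing_cases())
# [(True, False, False), (True, True, False)]
```

The two conditions disagree only when flight is present and passenger is absent: the default reading accepts such a text, while the bracketed one rejects it.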

Another advantage of using parentheses is that every sub-expression within them can be used as an operand itself and become part of a more complex expression. That being so, the syntax describing an expression as provided at the beginning of this section can be modified as follows:

SCOPE scopeOption
{
    DOMAIN(domainName:scoreOption)|IDENTIFY(templateName)
    {
        operand1
        booleanOperator
        operand2
        ...
    }
}

where _operand#_ can be:

  • A simple attribute
  • A combination of attributes
  • A sequence of attributes (both positional and logical)
  • A sub-rule
  • A sub-expression

For example:

ANCESTOR(12828) /*12828: plane, aeroplane, airplane*/ -SYNCON(UNKNOWN)
AND
LEMMA("hijack")
AND NOT
    (
        LEMMA("fiction") > LEMMA("story")
        OR
        ANCESTOR(30419, 30451)//  30419: film, movie // 30451: television show, television program
    )

The sample expression above has three operands:

  • The first operand (first line) identifies a token in the input text using a combination of two attributes, ANCESTOR and SYNCON(UNKNOWN).
  • The second operand (third line) identifies a token in the input text using a single attribute (a LEMMA).
  • The third operand is a sub-expression composed of two elements: a positional sequence of attributes (a LEMMA followed by a LEMMA) and an ANCESTOR attribute specifying two concepts.

The expression resulting from the use of the above operands and the three operators can be read as follows:

A AND B AND NOT C

where C is

(C1 OR C2)

To sum up:

A AND B AND NOT (C1 OR C2)

In other words, the concept of airplane or any of its descendants must be found in the text together with the lemma hijack, provided that the text mentions neither a fictional story nor films, TV shows, etc.

Such a rule could be useful in a news article categorization project when looking for news of hijacked airplanes while excluding any articles about films or TV shows that tell stories of airplane hijackings.
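The logic of the sample expression can be mimicked in a few lines of Python. This is only a sketch of the Boolean structure: the function `matches` and its four flags are hypothetical stand-ins for whatever the engine finds in the text, not part of the rules language.

```python
from itertools import product


def matches(plane, hijack, fiction_story, film_or_tv):
    """A AND B AND NOT (C1 OR C2), written with Python's Boolean operators."""
    return plane and hijack and not (fiction_story or film_or_tv)


# Hypothetical findings for two articles.
news_report = dict(plane=True, hijack=True, fiction_story=False, film_or_tv=False)
film_review = dict(plane=True, hijack=True, fiction_story=False, film_or_tv=True)

print(matches(**news_report))  # True
print(matches(**film_review))  # False

# Under the priority table, AND NOT binds more tightly than AND, so the
# expression groups as A AND (B AND NOT C). Checking every combination shows
# that this grouping gives the same result as (A AND B) AND NOT C.
same_everywhere = all(
    (a and (b and not c)) == ((a and b) and not c)
    for a, b, c in product([False, True], repeat=3)
)
print(same_everywhere)  # True
```

The exhaustive check shows that, for this particular expression, the higher priority of AND NOT does not change the result, so no extra brackets are needed around A AND B.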

When specific sub-expressions or other complex operands become particularly important or useful in a project, they can be turned into sub-rules, which allow users to define an expression, give it a name and use that name in several rules to reference the expression.