PATTERN attribute overview

Syntax

The PATTERN attribute matches the text of one or more consecutive tokens by means of regular expressions.

The syntax is:

PATTERN("regularExpression1"[, "regularExpression2", ...])

where:

  • PATTERN is the attribute name and must be written in uppercase.
  • regularExpression# refers to a regular expression which must be written in quotation marks.

Behavior

The following rules determine the behavior of a PATTERN attribute:

  • The attribute will be true only if the text it matches completely covers the text of one or more consecutive tokens.
  • Regular expressions can span consecutive tokens within the rule's scope.
  • All instances of regular expressions are matched, and the one matching the highest number of tokens is chosen.
  • If a single regular expression contains alternatives and one alternative is matched, the subsequent alternatives are ignored.
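The rules above can be approximated with a short Python sketch. It is not the engine's implementation and relies on several assumptions: tokens are joined by single spaces to form the text, Python's `re` module stands in for the engine's regular expression support (it also tries alternatives from left to right and keeps the first one that matches), and the function name `best_pattern_match` is hypothetical. The sketch tries a match at the start of every token, discards matches that stop inside a token, and keeps the match that covers the most consecutive tokens.

```python
import re


def best_pattern_match(pattern, tokens):
    """Hypothetical sketch: return (first_token_index, token_count) for the match
    covering the most consecutive tokens, or None if no match covers whole tokens."""
    # Assumption: tokens are separated by a single space in the analyzed text.
    text = " ".join(tokens)
    # Record where each token starts and ends in the joined text.
    starts, ends = [], []
    pos = 0
    for tok in tokens:
        starts.append(pos)
        ends.append(pos + len(tok))
        pos += len(tok) + 1
    regex = re.compile(pattern)
    best = None
    for i, start in enumerate(starts):
        # The first alternative that matches at this position wins.
        m = regex.match(text, start)
        if m is None or m.end() not in ends:
            # No match, or the match stops inside a token: it does not count.
            continue
        count = ends.index(m.end()) - i + 1
        if best is None or count > best[1]:
            best = (i, count)
    return best


print(best_pattern_match("york|new york", ["I", "love", "new", "york"]))
```

With the tokens `I love new york`, the expression `york|new york` matches one token (`york`) starting at the last token and two tokens (`new york`) starting at the third token; the sketch keeps the two-token match and returns `(2, 2)`.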

For example, consider these two categorization rules:

SCOPE SENTENCE
{
    DOMAIN(housing)
    {
        PATTERN("hous(e|ed|es)")
    }
}

SCOPE SENTENCE
{
    DOMAIN(housing)
    {
        PATTERN("hous(es|ed|e)")
    }
}

The two PATTERN attributes seem equivalent, since both describe hous followed by one of e, ed or es, but they do not produce the same effect.

If the text is:

house
housed
houses

The regular expression in the first rule:

hous(e|ed|es)

matches all three lines but activates the rule for the first line only.
This is because the match is always produced by the first alternative (hous + e), so the other two alternatives are ignored. In other words, the match completely covers line 1, so the rule is triggered. It does not, however, cover the last character of lines 2 and 3, so the match is only partial: the PATTERN attribute is false, the whole rule's condition is therefore false and the rule is not triggered.

The sequence of operations is the following:

  1. First line (house)
    • Does hous + e match? YES! → ignore subsequent alternatives, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
  2. Second line (housed)
    • Does hous + e match? YES! → ignore subsequent alternatives, the match is partial, the PATTERN attribute is false, the condition is false, the rule is not activated.
  3. Third line (houses)
    • Does hous + e match? YES! → ignore subsequent alternatives, the match is partial, the PATTERN attribute is false, the condition is false, the rule is not activated.
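The steps above can be traced with a small hand-written Python function. It is a hypothetical illustration, not part of the product: `trace` tries each alternative in order after the fixed prefix, stops at the first one that matches, and then checks whether that match covers the whole line.

```python
def trace(prefix, alternatives, token):
    """Hypothetical trace: try alternatives in order, stop at the first match,
    then report whether the match covers the whole token."""
    for alt in alternatives:
        candidate = prefix + alt
        if token.startswith(candidate):
            # First matching alternative wins; later alternatives are ignored.
            return candidate, len(candidate) == len(token)
    return None, False


for line in ["house", "housed", "houses"]:
    matched, full = trace("hous", ["e", "ed", "es"], line)
    print(f"{line}: matched {matched!r}, PATTERN {'true' if full else 'false'}")
```

For all three lines the matched text is `house`, so only the first line is fully covered, which mirrors the three steps listed above.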

The regular expression in the second rule:

hous(es|ed|e)

matches all lines and activates the rule every time.

The sequence of operations is the following:

  1. First line (house)
    • Does hous + es match? NO.
    • Does hous + ed match? NO.
    • Does hous + e match? YES! → no subsequent alternatives to ignore, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
  2. Second line (housed)
    • Does hous + es match? NO.
    • Does hous + ed match? YES! → ignore subsequent alternatives, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
  3. Third line (houses)
    • Does hous + es match? YES! → ignore subsequent alternatives, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
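The difference between the two rules can be reproduced with a minimal Python sketch, assuming that PATTERN takes the first match found, with alternatives tried from left to right, and is true only when that match covers the whole token. Python's `re` module is used as a stand-in for the engine, and `pattern_attribute` is a hypothetical name.

```python
import re


def pattern_attribute(regex, token):
    """Hypothetical approximation of PATTERN on a single token: take the first
    match found and return True only if it covers the entire token."""
    m = re.match(regex, token)
    return m is not None and m.end() == len(token)


lines = ["house", "housed", "houses"]
for regex in ["hous(e|ed|es)", "hous(es|ed|e)"]:
    print(regex, [pattern_attribute(regex, line) for line in lines])
```

The sketch deliberately uses `re.match` followed by a coverage check rather than `re.fullmatch`: `re.fullmatch` backtracks into the remaining alternatives when the first one does not reach the end of the string, so it would accept all three lines with either ordering and would not reflect the behavior described above.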