PATTERN attribute overview
Syntax
The PATTERN attribute matches the text of one or more consecutive tokens by means of regular expressions.
The syntax is:
PATTERN("regularExpression1"[, "regularExpression2", ...])
where:
- PATTERN is the attribute name and must be written in uppercase.
- regularExpression# refers to a regular expression, which must be written in quotation marks.
Behavior
The following rules determine the behavior of a PATTERN attribute:
- The attribute will be true only if the text it matches completely covers the text of one or more consecutive tokens.
- Regular expressions can span consecutive tokens within the rule's scope.
- All instances of regular expressions are matched, and the one matching the highest number of tokens is chosen.
- If a single regular expression contains alternatives and one alternative is matched, the subsequent alternatives are ignored.
For example, consider these two categorization rules:
SCOPE SENTENCE
{
DOMAIN(housing)
{
PATTERN("hous(e|ed|es)")
}
}
SCOPE SENTENCE
{
DOMAIN(housing)
{
PATTERN("hous(es|ed|e)")
}
}
The two PATTERN attributes in the rules seem equivalent (hous followed by one of e, ed or es), but they do not produce the same effect.
If the text is:
house
housed
houses
The regular expression in the first rule:
hous(e|ed|es)
matches all the lines, but activates the rule for the first line only.
This is because the match is always triggered by the first alternative (hous + e), so the other two alternatives are ignored. The pattern completely covers line 1, so the PATTERN attribute is true and the rule triggers. On lines 2 and 3 the pattern does not cover the last character, so the match is only partial: the PATTERN attribute is false, the whole rule's condition is false, and the rule does not trigger.
The sequence of operations is the following:
- First line (house)
  - Does hous + e match? YES! → ignore subsequent alternatives, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
- Second line (housed)
  - Does hous + e match? YES! → ignore subsequent alternatives, the match is partial, the PATTERN attribute is false, the condition is false, the rule is not activated.
- Third line (houses)
  - Does hous + e match? YES! → ignore subsequent alternatives, the match is partial, the PATTERN attribute is false, the condition is false, the rule is not activated.
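The walkthrough for the first rule can be reproduced outside the rules language with a minimal Python sketch. It assumes that Python's re module, which also tries alternatives from left to right and keeps the first one that succeeds, is a fair stand-in for the behavior described here. The helper name pattern_is_true is hypothetical and is not part of the rules language.

```python
import re


def pattern_is_true(regex, text):
    """Hypothetical helper: True only if the first successful match covers all of `text`."""
    # re.match is anchored at the start only. It keeps the first alternative
    # that succeeds and, unlike re.fullmatch, is not forced to go back and try
    # later alternatives when the match stops short of the end of the text.
    m = re.match(regex, text)
    return m is not None and m.end() == len(text)


for line in ["house", "housed", "houses"]:
    # Assumption: each line is treated as a single token.
    print(line, pattern_is_true(r"hous(e|ed|es)", line))
# prints:
# house True
# housed False
# houses False
```

The comparison of m.end() with len(text) plays the role of the "full" versus "partial" check in the list above.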
The regular expression in the second rule:
hous(es|ed|e)
matches all lines and activates the rule every time.
The sequence of operations is the following:
- First line (house)
  - Does hous + es match? NO.
  - Does hous + ed match? NO.
  - Does hous + e match? YES! → no subsequent alternatives to ignore, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
- Second line (housed)
  - Does hous + es match? NO.
  - Does hous + ed match? YES! → ignore subsequent alternatives, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
- Third line (houses)
  - Does hous + es match? YES! → ignore subsequent alternatives, the match is full, the PATTERN attribute is true, the condition is true, the rule is activated.
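One practical reading of the two walkthroughs is that an alternative should never be listed before a longer alternative that starts with it. The Python sketch below builds such a pattern by sorting literal endings from longest to shortest. It is only an illustration: longest_first and covers are hypothetical helpers, the approach assumes plain literal alternatives, and Python's re module is used as a stand-in for the left-to-right alternative matching described above.

```python
import re


def longest_first(stem, endings):
    """Hypothetical helper: build stem(a|b|...) with longer literal endings first."""
    # Sorting by descending length guarantees that no ending appears before a
    # longer ending that starts with it.
    ordered = sorted(endings, key=len, reverse=True)
    return stem + "(" + "|".join(re.escape(e) for e in ordered) + ")"


def covers(regex, text):
    """True only if the first successful match spans the whole text."""
    m = re.match(regex, text)
    return m is not None and m.end() == len(text)


regex = longest_first("hous", ["e", "ed", "es"])
print(regex)  # hous(ed|es|e)

for line in ["house", "housed", "houses"]:
    print(line, covers(regex, line))  # True for all three lines
```

The resulting pattern lists ed and es before e, so it behaves like the second rule rather than the first.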