Loose sequence
The loose sequence operator (greater-than sign, >) requires that the two tokens matched by the operands on either side of it occur one after the other in the same sentence, either with no token between them or separated only by tokens with low semantic value, such as articles, adjectives, adverbs, prepositions and conjunctions. Nouns and verbs are therefore not allowed between the two matched tokens.
All types of sequences can act at either the atom level or the token level of a sentence, according to the attribute that follows them.
The syntax is:
operand1
>
operand2
Consider the following example:
SCOPE SENTENCE
{
DOMAIN(dom1:NORMAL)
{
LEMMA("disease", "disorder")
>
ANCESTOR(26151)// 26151: nervous system
}
}
The rule's condition matches any loose sequence of two tokens in which the first token has its lemma attribute set to disease or disorder and the next token is a concept descending from syncon 26151 (nervous system).
Consider the following sample text:
Major new epidemiological analyses are focusing attention on disorders of the nervous system as important causes of death and disability around the world. One in every 9 individuals dies of a nervous system disorder.
The rule is triggered just once, by disorders of the nervous system in the first sentence. The first operand of the condition matches disorders, the second matches nervous system, and the tokens in between are acceptable because their semantic value is considered low (of = preposition, the = article).
The first operand of the condition also matches disorder in the second sentence, and the second operand matches nervous system in the same sentence, but the overall condition is not met because the tokens are in the wrong order: nervous system comes before disorder, whereas the sequence requires the match of the first operand to come first.
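The behavior described for this example can be approximated with a toy model. The Python sketch below is only an illustration, not the engine's actual algorithm: the `loose_sequence` helper, the hand-assigned part-of-speech labels and the `concept` field standing in for ANCESTOR(26151) are all assumptions made for the example.

```python
# Toy model of a loose sequence check. Tokens and tags are assigned by hand;
# nothing here comes from the real linguistic engine.
LOW_VALUE = {"article", "adjective", "adverb", "preposition", "conjunction"}

def loose_sequence(tokens, first, second):
    """Return (i, j) pairs where first(tokens[i]) and second(tokens[j]) hold,
    i < j, and every token between them has a low-value part of speech."""
    hits = []
    for i, left in enumerate(tokens):
        if not first(left):
            continue
        for j in range(i + 1, len(tokens)):
            if second(tokens[j]):
                hits.append((i, j))
                break
            if tokens[j]["pos"] not in LOW_VALUE:
                break  # a noun or verb interrupts the sequence
    return hits

# Hypothetical stand-ins for LEMMA("disease", "disorder") and ANCESTOR(26151).
def is_disorder(tok):
    return tok["lemma"] in {"disease", "disorder"}

def is_nervous_system(tok):
    return tok.get("concept") == "nervous system"

# "disorders of the nervous system" (fragment of the first sentence)
sentence_1 = [
    {"lemma": "disorder", "pos": "noun"},
    {"lemma": "of", "pos": "preposition"},
    {"lemma": "the", "pos": "article"},
    {"lemma": "nervous system", "pos": "noun", "concept": "nervous system"},
]

# "a nervous system disorder" (fragment of the second sentence)
sentence_2 = [
    {"lemma": "a", "pos": "article"},
    {"lemma": "nervous system", "pos": "noun", "concept": "nervous system"},
    {"lemma": "disorder", "pos": "noun"},
]

print(loose_sequence(sentence_1, is_disorder, is_nervous_system))  # [(0, 3)]
print(loose_sequence(sentence_2, is_disorder, is_nervous_system))  # []
```

In this model the second fragment yields no match because nothing follows disorder, which mirrors the ordering constraint described above.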
Besides simple attributes, the loose sequence operator can be used with set combinations of attributes as shown in the example below.
SCOPE SENTENCE
{
IDENTIFY (TEST)
{
@company[ANCESTOR(37475) + TYPE(NPR) + ROLE(SUBJECT)]// 37475: company, enterprise, firm, house,
>
LEMMA("produce", "design") + TYPE (VER)
>
@product[ANCESTOR(78687)]// 78687: artifact, artefact
}
}
This extraction rule is meant to extract proper names of companies and the products they manufacture.
The first operand matches any concept descending from syncon 37475 (company), but is limited to proper nouns (+TYPE(NPR)), thus excluding any common noun like limited liability company.
The operand is further restricted so that only proper nouns playing the role of subject in a sentence or a clause (+ROLE(SUBJECT)) are matched.
The second operand matches the inflections of the verbs produce and design and the third matches the concepts which descend from syncon 78687 (artifact).
If the rule is run against the following sample text:
Prada produces high-end ready-to-wear clothes for men and women. In addition Prada designs a range of children's clothes, fragrances, cosmetic products and accessories for men and women, including handbags, shoes, wallets and sunglasses.
in the first sentence, Prada is matched by the first operand, produces by the second and clothes by the third. The rule is triggered because:
- Tokens are found in the expected order.
- There are no tokens between Prada and produces (zero intervening tokens is acceptable).
- There are only adjectives (high-end, ready-to-wear) between produces and clothes.
In the second sentence, Prada designs a range of children's clothes does not trigger the rule, even though all three operands find a match in the expected order, because the noun range sits between designs and clothes (nouns have high semantic value).
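The three-operand case can be sketched in the same spirit. The Python below is a simplified model under stated assumptions, not the engine's implementation: `match_chain` is a hypothetical helper, tokens are hand-tagged tuples, and the tag sets stand in for the ANCESTOR, ROLE and LEMMA attributes used in the rule.

```python
# Toy model of a chained loose sequence (operand1 > operand2 > operand3).
# Each token is (text, part_of_speech, tags); all values are assigned by hand.
LOW_VALUE = {"article", "adjective", "adverb", "preposition", "conjunction"}

def _extend(tokens, operands, prev):
    """Match the remaining operands after index prev; return indexes or None."""
    if not operands:
        return []
    for j in range(prev + 1, len(tokens)):
        if operands[0](tokens[j]):
            rest = _extend(tokens, operands[1:], j)
            return None if rest is None else [j] + rest
        if tokens[j][1] not in LOW_VALUE:
            return None  # a noun or verb breaks the chain
    return None

def match_chain(tokens, operands):
    """Return the indexes of the first chain of matches, or None."""
    for i, tok in enumerate(tokens):
        if operands[0](tok):
            rest = _extend(tokens, operands[1:], i)
            if rest is not None:
                return [i] + rest
    return None

# Hypothetical stand-ins for the three operands of the rule.
operands = [
    lambda t: t[1] == "proper noun" and {"company", "subject"} <= t[2],
    lambda t: t[1] == "verb" and bool(t[2] & {"produce", "design"}),
    lambda t: "artifact" in t[2],
]

sentence_1 = [
    ("Prada", "proper noun", {"company", "subject"}),
    ("produces", "verb", {"produce"}),
    ("high-end", "adjective", set()),
    ("ready-to-wear", "adjective", set()),
    ("clothes", "noun", {"artifact"}),
]

# Simplified fragment: "children's" is omitted, the chain already breaks at "range".
sentence_2 = [
    ("Prada", "proper noun", {"company", "subject"}),
    ("designs", "verb", {"design"}),
    ("a", "article", set()),
    ("range", "noun", set()),
    ("of", "preposition", set()),
    ("clothes", "noun", {"artifact"}),
]

print(match_chain(sentence_1, operands))  # [0, 1, 4]
print(match_chain(sentence_2, operands))  # None
```

The sketch stops at the first token that matches the next operand and does not backtrack, which is enough to reproduce the two outcomes discussed here but may differ from the real matching strategy.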
Right reference
The loose sequence operator with right reference (less-than sign, <) is equivalent to the loose sequence operator unless it is combined with negative operands. To understand the effect of the right reference on sequence interpretation, read the paragraphs under Right reference operators in the topic about negations in sequences.