Rules language

Overview

Kill lists and rule concept entities are expressions that must be interpreted as conditions that the text of a document must satisfy.

Rules, which confirm the extraction of concepts in thesaurus models, are built by combining taxonomy concepts, rule concepts, or rule concept entities with specific operators.

The building blocks of this language for writing expressions and rules are described below.

Kill lists and rule concept entities

Operands

Kill lists and rule concept entities have the same structure.

Each operand is compared with the tokens into which the text is divided. Depending on its type, an operand is true if it matches either the literal text of a token or token features discovered by NLU analysis.
The type of an operand can be:

  • Pattern: true if the token's text matches a Perl-compatible regular expression
  • Keyword: true if the token's text is equal to a string
  • Word: true if the token's text is any of the inflected forms of a lemma
  • Type: true if the token has a given grammatical type or is a named entity of a given type

All operands except Type can take a sub-condition on the word class or entity type of the token. Pattern and Keyword can also take a sub-condition on case, that is, whether the match between the value of the operand and the text of the token must be case sensitive.

For example, a condition composed in the visual builder corresponds to this portion of the expression:

WORD("firm") + TYPE(NOU)

which is true for every token that is an inflected form of the lemma "firm" and is a noun, as in:

Click here to research the best law firm
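
The operand types can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Token class, the function names, the hand-assigned lemmas and word classes, the use of Python's re module in place of a Perl-compatible engine, whole-token matching for Pattern, and the reading of "+" as a logical AND applied to the same token. It is not the product's implementation.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str        # literal text of the token
    lemma: str       # base form, assumed to come from NLU analysis
    word_class: str  # label such as "NOU" or "VER"; "" when not relevant here


def pattern(tok, regex, case_sensitive=True):
    # Pattern operand. Python's re module stands in for a PCRE engine.
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(regex, tok.text, flags) is not None


def keyword(tok, value, case_sensitive=True):
    # Keyword operand: the token's text equals a string.
    if case_sensitive:
        return tok.text == value
    return tok.text.lower() == value.lower()


def word(tok, lemma):
    # Word operand: the token is any inflected form of the lemma.
    return tok.lemma == lemma


def has_type(tok, label):
    # Type operand or sub-condition: the token has the given word class.
    return tok.word_class == label


# Hand-tagged tokens for the example sentence; a real analysis would supply these.
sentence = [
    Token("Click", "click", "VER"),
    Token("here", "here", "ADV"),
    Token("to", "to", ""),
    Token("research", "research", "VER"),
    Token("the", "the", ""),
    Token("best", "good", "ADJ"),
    Token("law", "law", "NOU"),
    Token("firm", "firm", "NOU"),
]

# WORD("firm") + TYPE(NOU), with "+" modeled as a logical AND on the same token.
matches = [t.text for t in sentence if word(t, "firm") and has_type(t, "NOU")]
print(matches)

# "firm" used as an adjective ("a firm grip") shares the lemma but fails TYPE(NOU).
adjective = Token("firm", "firm", "ADJ")
print(word(adjective, "firm") and has_type(adjective, "NOU"))
```

Under these assumptions only the final token of the sentence matches, and the adjectival "firm" is rejected by the noun sub-condition.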

Operators

Multiple operands are combined two by two with Boolean or positional operators.

The Boolean operators are:

  • AND: the combination is true only if both the operands are true.
  • AND NOT: the combination is true if the first operand is true and the second operand is false.
  • OR: the combination is true if at least one of the two operands is true.

AND and AND NOT have precedence over OR, so in:

A OR B AND C

B AND C is evaluated first, then its result is ORed with A.
Therefore, if:

  • A is true
  • B is false
  • C is false

the expression is true, since B AND C is false and A OR false is true. If OR were evaluated first, the expression would be false, since A OR B is true and true AND C is false.
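
The precedence rule can be checked with a few lines of Python, whose own `and` also binds more tightly than `or`. The variable names are illustrative only; the snippet compares the two possible groupings for the values above and then lists every assignment on which they disagree.

```python
from itertools import product

A, B, C = True, False, False

# Documented reading: AND is evaluated before OR.
and_first = A or (B and C)
# Alternative reading, shown only for contrast: OR evaluated first.
or_first = (A or B) and C

print(and_first, or_first)

# All (A, B, C) assignments where the two groupings give different results.
differ = [
    (a, b, c)
    for a, b, c in product([True, False], repeat=3)
    if (a or (b and c)) != ((a or b) and c)
]
print(differ)
```

The groupings disagree exactly when A is true and C is false, which is the situation described above.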

Positional operators are:

  • Strict sequence (>>): the tokens matched by the operands on the sides of the operator must be strictly consecutive; no other token is allowed between them.
  • Loose sequence (>): the tokens matched by the operands on the sides of the operator must be positioned one after the other, but tokens with low semantic value—adjectives, adverbs, conjunctions, articles, punctuation—are allowed between them.
  • Flexible sequence (DISTANCE in the visual builder, <Min,Max> in the expression): the tokens matched by the operands on its sides must be positioned one after the other, in the same sentence, at a distance—in tokens—that falls between the value of the Min parameter and the value of the Max parameter. Punctuation is counted as a token when computing the distance.
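
The three positional operators can be sketched as checks on token positions within one sentence. This is a rough illustration under stated assumptions: the labels CON, ART and PNT are invented here (only ADJ and ADV appear in the word class table), the tokens are tagged by hand, and the distance of a flexible sequence is taken to be the number of tokens lying between the two matches, which the text above does not spell out.

```python
# Hypothetical labels for tokens with low semantic value. ADJ and ADV appear in
# the word class table; CON, ART and PNT are invented for this sketch.
LOW_VALUE = {"ADJ", "ADV", "CON", "ART", "PNT"}

# (text, class) pairs for a single sentence, tagged by hand.
tokens = [
    ("The", "ART"), ("firm", "NOU"), ("quickly", "ADV"), ("hired", "VER"),
    ("a", "ART"), ("good", "ADJ"), ("lawyer", "NOU"), (".", "PNT"),
]


def strict(i, j):
    # Strict sequence (>>): token j immediately follows token i.
    return j == i + 1


def loose(i, j):
    # Loose sequence (>): only low-value tokens may sit between i and j.
    return i < j and all(tokens[k][1] in LOW_VALUE for k in range(i + 1, j))


def flexible(i, j, min_d, max_d):
    # Flexible sequence (<Min,Max>): assumed here to count the tokens
    # between i and j, punctuation included.
    gap = j - i - 1
    return i < j and min_d <= gap <= max_d


print(strict(1, 2), strict(1, 3))            # firm >> quickly, firm >> hired
print(loose(1, 3), loose(1, 6))              # firm > hired, firm > lawyer
print(flexible(1, 6, 0, 3), flexible(1, 6, 2, 4))
```

With these tokens, "firm" and "hired" form a loose sequence because only an adverb separates them, while "firm" and "lawyer" do not, since the verb "hired" lies between them.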

Rules

Scope

The rule scope defines the span of text in which all the rule's operands must be matched.
The scope can be:

  • A clause
  • One or more consecutive sentences
  • A paragraph

For the sentence scope, an option called multiplier specifies the number of consecutive sentences.
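
As a rough illustration of the multiplier, the sketch below lists the spans of consecutive sentences that a sentence scope would cover. The function name is hypothetical, and the choice of overlapping (sliding) windows is an assumption: the text does not say how consecutive spans are selected.

```python
def sentence_windows(sentences, multiplier):
    # Return every run of `multiplier` consecutive sentences.
    # Assumption: windows slide one sentence at a time and may overlap.
    # If the text has fewer sentences than the multiplier, a single
    # window holding all of them is returned.
    count = max(len(sentences) - multiplier + 1, 1)
    return [sentences[i:i + multiplier] for i in range(count)]


sentences = ["S1", "S2", "S3", "S4"]
for window in sentence_windows(sentences, 2):
    print(window)
```

With a multiplier of 2 and four sentences, this yields three candidate spans: S1 and S2, S2 and S3, S3 and S4.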

Operands

The operands of a rule can be taxonomy concepts, rule concepts, or rule concept entities.

Operators

Multiple operands in a rule are combined two by two with Boolean or positional operators.

The Boolean operators are:

  • AND: the combination is true only if both the operands are true.
  • AND NOT: the combination is true if the first operand is true and the second operand is false.

Positional operators are:

  • Flexible sequence (DISTANCE in the visual builder, <Min,Max> in the expression): the portion of text matched by the first operand must be followed, within the rule's scope, by the portion of text matched by the second operand at a distance—in text tokens—between the value of the Min parameter and the value of the Max parameter. Punctuation is counted as a token when computing the distance.
  • NEXT: the portion of text matched by the first operand must be followed, in any subsequent sentence within the rule's scope, by the portion of text matched by the second operand.
  • NEXT NOT: the portion of text matched by the first operand must not be followed by the portion of text matched by the second operand in any of the subsequent sentences within the rule's scope.
  • PREV: the portion of text matched by the first operand must be preceded, in any previous sentence within the rule's scope, by the portion of text matched by the second operand.
  • PREV NOT: the portion of text matched by the first operand must not be preceded by the portion of text matched by the second operand in any of the previous sentences within the rule's scope.
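
The sentence-level operators NEXT, NEXT NOT, PREV and PREV NOT can be sketched by reducing each operand to the set of sentence indices, within the scope, where it matched. The function names and this representation are assumptions for illustration. When the first operand matches in several sentences, the sketch assumes the operator holds if at least one of those matches satisfies the condition; the text above does not settle that point.

```python
def next_(first, second):
    # NEXT: some match of `second` lies in a sentence after a match of `first`.
    return any(b > a for a in first for b in second)


def next_not(first, second):
    # NEXT NOT: some match of `first` has no match of `second` in any later sentence.
    return any(all(b <= a for b in second) for a in first)


def prev(first, second):
    # PREV: some match of `second` lies in a sentence before a match of `first`.
    return any(b < a for a in first for b in second)


def prev_not(first, second):
    # PREV NOT: some match of `first` has no match of `second` in any earlier sentence.
    return any(all(b >= a for b in second) for a in first)


# Sentence indices inside a four-sentence scope (0-based, hypothetical data).
first_matches = {1}    # first operand matched in the second sentence
second_matches = {3}   # second operand matched in the fourth sentence

print(next_(first_matches, second_matches))
print(next_not(first_matches, second_matches))
print(prev(first_matches, second_matches))
print(prev_not(first_matches, second_matches))
```

Here the second operand appears only after the first, so NEXT and PREV NOT hold while NEXT NOT and PREV do not. If the first operand matches nowhere in the scope, all four functions return False.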

Word classes

A word class can be specified as a sub-condition for the Pattern, Keyword and Word operands of kill lists and rule concept entities, while it can be chosen as the value of a Type operand.

Label  Description
ADJ    Adjective
ADV    Adverb
NOU    Noun
NPR    Proper noun
VER    Verb

Named entity types

A named entity type can be specified as a sub-condition for the Pattern, Keyword and Word operands of kill lists and rule concept entities, while it can be chosen as the value of a Type operand.

Label  Description                         Example
ADR    Street address                      Who lived at 221B Baker Street?
ANM    Animal                              Felix is an anthropomorphic black cat.
BLD    Building                            While in London I attended a concert at the Royal Albert Hall.
COM    Company, business                   Tesla Inc. sold 10% of its Bitcoin holdings.
DAT    Date                                Napoleon died on May 5, 1821.
DEV    Device                              My new Galaxy smartphone has seven cameras.
DOC    Document                            I appeal to the Geneva Convention!
ENT    Generic entity                      I have five minutes left.
EVN    Event                               Felice Gimondi won the Tour de France in 1965.
FDD    Food, beverage                      Frank likes to drink Guinness beer.
GEA    Physical geographic feature         I crossed the Mississippi river with my boat.
GEO    Administrative geographic area      Alaska is the least densely populated state in the United States.
GEX    Extended geography                  The astronauts have landed on Mars.
HOU    Hours                               The eclipse reached its peak at 3pm.
LEN    Legal entity                        Of course I pay the FICA tax.
MAI    Email address                       For any questions do not hesitate to write to [email protected].
MEA    Measure                             The chest is five feet wide and 40 inches tall.
MMD    Mass media                          I read it in the Guardian.
MON    Money                               I sold half of my stocks and made six hundred thousand dollars.
NPH    Person                              Hakeem Olajuwon dunked effortlessly.
ORG    Organization, institution, society  Now they threaten to quit the United Nations if they are not heard.
PCT    Percentage                          The richest 10% of adults in the world own 85% of global wealth.
PHO    Phone number                        For poor database design, call (214) 748-3647.
PPH    Physical phenomena                  The COVID-19 infection is slowing down.
PRD    Product                             The Rolex Daytona is a wonderful watch.
VCL    Vehicle                             A Ferrari 250 GTO was the most expensive car ever sold.
WEB    Web address                         Find the best technical documentation at docs.expert.ai.
WRK    Work of human intelligence          Grease is a funny musical romantic comedy.