RELEVANT attribute

The RELEVANT attribute matches tokens that the disambiguator marked as "relevant".

The lists of relevant elements recognized by the disambiguator are:

  • Keywords
  • Lemmas
  • Syncons
  • Knowledge Graph domains
  • Sentences

However, only keywords, lemmas and syncons can be used with the RELEVANT attribute. The other two, Knowledge Graph domains and sentences, refer to linguistic elements that extend beyond the limits of a token, so they cannot be used in token matching.

The syntax is:

RELEVANT(list1[, list2, ...])

where:

  • RELEVANT is the attribute name and must be written in uppercase.
  • list# refers to one of the lists above, its possible values are:
    • LEMMA
    • SYNCON
    • KEYWORD

Warning

Please note: the RELEVANT attribute, if used alone, is hypergenerative, because it matches every element in the selected lists. It is highly recommended to use the RELEVANT attribute in conjunction with other attributes.

During the disambiguation process, a complex analysis is performed to identify the most significant elements within the text. The disambiguator identifies all lemmas, syncons and keywords contained in a document and ranks them by relevance on a percentage scale. Only elements whose score exceeds a predefined threshold are inserted into one of the three lists. The lemma and syncon lists contain only elements found in the Knowledge Graph, while the keyword list contains single terms unknown to the Knowledge Graph as well as sequences of terms (both known and unknown) that may go beyond lemma, syncon and phrase limits. Keywords, therefore, do not easily synchronize with other attributes. More precisely, RELEVANT(KEYWORD) refers to "compound terms" extracted by the disambiguator.

The RELEVANT attribute allows the use of one or more lists in a given statement. A token is matched if it corresponds to one of the elements included in any of the selected lists.
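The effect of selecting several lists can be pictured with a small Python model. This is only a sketch of the matching logic described above, not the disambiguator's implementation: the `relevant_lists` structure, the `is_relevant` function and all sample entries are assumptions made up for illustration.

```python
# Hypothetical model: each list name maps to the set of elements
# that the disambiguator marked as relevant. Entries are placeholders.
relevant_lists = {
    "LEMMA": {"hospital", "doctor"},
    "SYNCON": set(),
    "KEYWORD": {"outpatient services"},
}

def is_relevant(token, selected_lists, lists=relevant_lists):
    """Return True if the token is in at least one of the selected lists."""
    return any(token in lists.get(name, set()) for name in selected_lists)

# Roughly what RELEVANT(LEMMA) versus RELEVANT(LEMMA, KEYWORD) would accept
print(is_relevant("hospital", ["LEMMA"]))
print(is_relevant("outpatient services", ["LEMMA"]))
print(is_relevant("outpatient services", ["LEMMA", "KEYWORD"]))
```

Adding a list to the statement can only widen the set of matched tokens, which is one reason the attribute is best combined with other constraints.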

It is also possible to define a relevance threshold score to restrict the behavior of the RELEVANT attribute. The syntax is:

RELEVANT(list1:threshold)

where threshold is a percentage between 0 and 100. In this case, the RELEVANT attribute is validated only if the token to be matched in a document corresponds to one of the elements included in the selected list and its score is equal to or greater than the threshold defined in the rule.

Consider the following examples:

RELEVANT(LEMMA)

This statement matches any token found in the list of relevant lemmas.

For demonstration purposes, suppose the statement above is used by itself in a rule. In a paragraph such as:

Although Congress may leave the details of Medicare savings to be worked out next year, there is already discussion of cutting special payments to teaching hospitals and small rural hospitals. Lawmakers are also considering reducing payments to hospitals for certain outpatient services that can be performed at lower cost in doctors' offices. Medicare pays substantially higher rates for the same services when they are provided in a hospital outpatient department rather than a doctor's office. The differential added $1.5 billion to Medicare costs last year, and as hospitals buy physician practices around the country, the costs are likely to grow, the Medicare commission says.

the elements identified as relevant lemmas (along with their score) would be:

Lemma               Score
Medicare            17.9%
hospital             9.7%
doctor               7.3%
outpatient           6.8%
teaching hospital    6.7%
payment              6.3%
discussion           6.0%

Consider the same text processed with a RELEVANT attribute defining a threshold score:

RELEVANT(LEMMA:7%)

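In this case, only the first three lemmas (Medicare, hospital and doctor) would be matched, because they are the only ones with a score equal to or greater than 7%.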
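The threshold check can be reproduced with a few lines of Python. This is a minimal sketch that only mimics the comparison described above. The scores are copied from the table of relevant lemmas, while the names `lemma_scores` and `relevant_at` are hypothetical and not part of the rules language.

```python
# Scores (in percent) copied from the table of relevant lemmas above.
lemma_scores = {
    "Medicare": 17.9,
    "hospital": 9.7,
    "doctor": 7.3,
    "outpatient": 6.8,
    "teaching hospital": 6.7,
    "payment": 6.3,
    "discussion": 6.0,
}

def relevant_at(scores, threshold):
    """Keep the elements whose score is equal to or greater than the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

# Counterpart of RELEVANT(LEMMA:7%)
matched = relevant_at(lemma_scores, 7.0)
print(matched)  # ['Medicare', 'hospital', 'doctor']
```

A threshold of 0 keeps the whole list, which corresponds to using RELEVANT(LEMMA) without a threshold.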