SCOPE with domain constraints
Introduction
When a text is analyzed, in addition to providing a four-level analysis of the document, the semantic disambiguator also automatically identifies the knowledge graph domains of the text based on the words used and their meaning. These domains correspond to a closed list of standard domains and are available each time a text is analyzed.
The standard domains can be used to define the scope of both categorization and extraction rules, in order to activate or inhibit a rule if a specific domain has been associated to the entire document. In other words, there are specific options that take into consideration the context in which words occur when deciding whether a rule should be activated or inhibited.
The syntax is:
SCOPE scopeOption domainConstraint(knowledgeGraphDomain[:threshold])
{
    rules
}
where:
- scopeOption is one of the standard or custom scope options.
- domainConstraint can be IF DOMAIN, IF NOT DOMAIN or IF RELEVANT DOMAIN.
- knowledgeGraphDomain is the name of a knowledge graph domain of choice.
- threshold corresponds to either an integer or a decimal percentage. It refers to the percentage that was assigned by the disambiguator when the document was analyzed. If threshold is specified, the rule will be activated only if the document has been associated to the knowledgeGraphDomain with a score that is equal to or greater than the threshold.
Thresholds are optional. Multiple domains can be specified by separating them with commas.
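As a rough illustration of how a threshold is compared with the score assigned by the disambiguator, the following Python sketch models the check outside the rules language. It is not part of the product: the function names, the doc_scores dictionary and the sample scores are all hypothetical, and it covers only the activating case for a single domain.

```python
def parse_threshold(text):
    """Convert a threshold such as '5%' or '2.5%' to a float percentage."""
    return float(text.strip().rstrip("%"))


def domain_matches(doc_scores, domain, threshold=None):
    """Return True if the document is associated with `domain` and,
    when a threshold is given, its score is >= that threshold."""
    if domain not in doc_scores:
        return False
    if threshold is None:
        # No threshold: any association with the domain is enough.
        return True
    return doc_scores[domain] >= parse_threshold(threshold)


# Hypothetical domain scores (percentages) for one analyzed document.
doc_scores = {"football": 12.0, "game": 2.5, "military": 0.4}

print(domain_matches(doc_scores, "football"))        # True
print(domain_matches(doc_scores, "game", "2.5%"))    # True
print(domain_matches(doc_scores, "military", "5%"))  # False
```

The comparison is inclusive, so a score exactly equal to the threshold satisfies the constraint.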
IF DOMAIN
IF DOMAIN is the constraint that enables a rule only if a specific domain has been associated with the entire input document. Consider, for example, the following categorization rule:
SCOPE SENTENCE IF DOMAIN(football)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(305045) // Champions League
    }
}
This rule will be activated and will match the concept of Champions League only if the domain football has been associated with the document during the text disambiguation process. If the rule is modified as follows:
SCOPE SENTENCE IF DOMAIN(game:2.5%)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(305045) // Champions League
    }
}
the rule will be restricted to those cases in which the document has been associated with the domain game with a score of at least 2.5%.
IF NOT DOMAIN
IF NOT DOMAIN is the constraint that inhibits a rule if a specific domain has been associated with the input document. Consider, for example, the following extraction rule:
SCOPE SENTENCE IF NOT DOMAIN(military)
{
    IDENTIFY(Template1)
    {
        @Field1[LEMMA("scout")]
    }
}
It will be activated and will extract the lemma scout only if the domain military has not been associated with the document during the text disambiguation process. If the rule is modified as follows:
SCOPE SENTENCE IF NOT DOMAIN(military:5%)
{
    IDENTIFY(Template1)
    {
        @Field1[LEMMA("scout")]
    }
}
it will be inhibited only in those cases in which the document has been associated with the domain military with a score of at least 5%.
IF RELEVANT DOMAIN
IF RELEVANT DOMAIN is the constraint that enables a rule only if a specific domain has been associated with the input document and this domain belongs to the relevant information identified for that document. Consider, for example, the following categorization rule:
SCOPE SENTENCE IF RELEVANT DOMAIN(football)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(317175) // FC Barcelona
    }
}
This will be activated and will match the concept of FC Barcelona only if the domain football has been associated to the input document during the text disambiguation process and this domain is part of the relevant information for that document.
The difference between this constraint and the simple IF DOMAIN constraint is the set of domains that is considered: IF DOMAIN considers all domains associated with a document (even those with an extremely low score), whereas IF RELEVANT DOMAIN considers only those domains that have been evaluated as the most representative for a document.
If the rule above is modified as follows:
SCOPE SENTENCE IF RELEVANT DOMAIN(football:10%)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(317175) // FC Barcelona
    }
}
it is restricted to those cases in which the document has been associated with the domain football with a score of at least 10%.
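The difference between IF DOMAIN and IF RELEVANT DOMAIN described in this section can be modeled with a small Python sketch. It is only an illustration written outside the rules language: the function names, the all_scores dictionary, the relevant set and the sample scores are assumptions, not the way the disambiguator actually represents its results.

```python
def if_domain(all_scores, domain, threshold=0.0):
    """IF DOMAIN: look at every domain associated with the document,
    even those with an extremely low score."""
    return domain in all_scores and all_scores[domain] >= threshold


def if_relevant_domain(all_scores, relevant, domain, threshold=0.0):
    """IF RELEVANT DOMAIN: the domain must also belong to the subset
    evaluated as most representative for the document."""
    return domain in relevant and if_domain(all_scores, domain, threshold)


# Hypothetical analysis result; scores are percentages.
all_scores = {"football": 14.0, "tourism": 0.3}
relevant = {"football"}  # assumed subset of the most representative domains

print(if_domain(all_scores, "tourism"))                            # True
print(if_relevant_domain(all_scores, relevant, "tourism"))         # False
print(if_relevant_domain(all_scores, relevant, "football", 10.0))  # True
```

In this model a low-scoring domain such as tourism satisfies IF DOMAIN but not IF RELEVANT DOMAIN, because it is not among the domains considered representative.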
Domains with parentheses
Some of the standard domains include round brackets. In this case, the domain name must be written in quotation marks.
For example:
SCOPE SENTENCE IF DOMAIN("soccer (US)":2%, sports:2%)
{
    IDENTIFY(PERSONAL_DATA)
    {
        @SOCCER_PLAYER[TYPE(NPH)]
    }
}
- Percentages must not be included in the quotation marks.
- When standard domains without brackets co-occur with domains with brackets, the former may optionally be enclosed in quotation marks.
For example, the rule above can also be written like this:
SCOPE SENTENCE IF DOMAIN("soccer (US)":2%, "sports":2%)
{
    IDENTIFY(PERSONAL_DATA)
    {
        @SOCCER_PLAYER[TYPE(NPH)]
    }
}
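If rules are generated programmatically, the quoting convention above can be applied by a small helper. The following Python sketch is only an assumption about how such a helper might look: format_domain and format_constraint are hypothetical names and are not part of any product API. It quotes a domain name only when it contains round brackets and always keeps the percentage outside the quotation marks.

```python
def format_domain(name, threshold=None):
    """Render one domain argument, quoting names that contain round brackets."""
    needs_quotes = "(" in name or ")" in name
    text = f'"{name}"' if needs_quotes else name
    if threshold is not None:
        # The percentage always stays outside the quotation marks.
        text += f":{threshold}%"
    return text


def format_constraint(constraint, domains):
    """Render a constraint such as IF DOMAIN(...) from (name, threshold) pairs."""
    args = ", ".join(format_domain(name, thr) for name, thr in domains)
    return f"{constraint}({args})"


print(format_constraint("IF DOMAIN", [("soccer (US)", 2), ("sports", 2)]))
# IF DOMAIN("soccer (US)":2%, sports:2%)
```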