SCOPE with domain constraints
Introduction
When a text is analyzed, in addition to providing a four-level analysis of the document, the semantic disambiguator also automatically identifies the Knowledge Graph domains of the text based on the words used and their meaning. These domains correspond to a closed list of standard domains and are available each time a text is analyzed.
The standard domains can be used to define the scope of both categorization and extraction rules, in order to activate or inhibit a rule if a specific domain has been associated to the entire document. In other words, there are specific options that take into consideration the context in which words occur when deciding whether a rule should be activated or inhibited.
The syntax is:
SCOPE scopeOption domainConstraint(knowledgeGraphDomain[:threshold])
{
    rules
}
where:
scopeOption is one of the standard or custom scope options.
knowledgeGraphDomain is the name of a Knowledge Graph domain of choice.
domainConstraint can be:
IF DOMAIN
IF NOT DOMAIN
IF RELEVANT DOMAIN
threshold is either an integer or a decimal percentage. It refers to the percentage that the disambiguator assigned to the domain when the document was analyzed. If threshold is specified, the rule will be activated only if the document has been associated to the knowledgeGraphDomain with a score that is equal to or greater than the threshold.
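The threshold check described above can be sketched in a few lines of Python. This is only a conceptual model, not part of the rules language or of any product API: the function name domain_is_associated and the dictionary of per-domain percentage scores are assumptions made for illustration.

```python
from typing import Mapping, Optional


def domain_is_associated(
    domain_scores: Mapping[str, float],
    domain: str,
    threshold: Optional[float] = None,
) -> bool:
    """Check whether `domain` counts as associated to a document.

    `domain_scores` is a hypothetical representation of the disambiguator
    output: it maps each domain name to its percentage score.
    """
    if domain not in domain_scores:
        return False
    if threshold is None:
        # No threshold: any association is enough, however low the score.
        return True
    # With a threshold: the score must be equal to or greater than it.
    return domain_scores[domain] >= threshold


# Hypothetical scores for one analyzed document, in percent.
scores = {"football": 12.0, "game": 2.5}

print(domain_is_associated(scores, "football"))   # no threshold given
print(domain_is_associated(scores, "game", 2.5))  # score equals the threshold
print(domain_is_associated(scores, "game", 10))   # score below the threshold
```

Note that the comparison is inclusive, so a score exactly equal to the threshold satisfies it.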
IF DOMAIN
IF DOMAIN is the constraint that enables a rule only if a specific domain has been associated to the entire input document. Consider, for example, the following categorization rule:
SCOPE SENTENCE IF DOMAIN(football)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(305045) // Champions League
    }
}
This rule is activated and matches the concept of Champions League only if the domain football has been associated to the document during the text disambiguation process. If the rule is modified as follows:
SCOPE SENTENCE IF DOMAIN(game:2.5%)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(305045) // Champions League
    }
}
the rule is restricted to those cases in which the document has been associated to the domain game with a score of at least 2.5%.
IF NOT DOMAIN
IF NOT DOMAIN is the constraint that inhibits a rule if a specific domain has been associated to the input document. Consider, for example, the following extraction rule:
SCOPE SENTENCE IF NOT DOMAIN(military)
{
    IDENTIFY(Template1)
    {
        @Field1[LEMMA("scout")]
    }
}
This rule is activated and extracts the lemma scout only if the domain military has not been associated to the document during the text disambiguation process. If the rule is modified as follows:
SCOPE SENTENCE IF NOT DOMAIN(military:5%)
{
    IDENTIFY(Template1)
    {
        @Field1[LEMMA("scout")]
    }
}
it will be inhibited only in those cases in which the document has been associated to the domain military with a score of at least 5%; with a lower score, or without the domain, the rule stays active.
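Because IF NOT DOMAIN inverts the check, the effect of a threshold is easy to misread. The Python sketch below models it under the assumption that the threshold is applied first and the result is then negated. The function name if_not_domain and the score dictionaries are hypothetical and serve only as an illustration.

```python
from typing import Mapping, Optional


def if_not_domain(
    domain_scores: Mapping[str, float],
    domain: str,
    threshold: Optional[float] = None,
) -> bool:
    """Return True when a rule guarded by IF NOT DOMAIN would stay active.

    Assumption: the rule is inhibited when the domain is associated to the
    document and, if a threshold is given, its score reaches the threshold.
    """
    score = domain_scores.get(domain)
    if score is None:
        return True   # domain never associated: nothing inhibits the rule
    if threshold is None:
        return False  # any association inhibits the rule
    return score < threshold  # inhibited only at or above the threshold


# Hypothetical documents with percentage scores for the domain military.
weak_military = {"military": 3.0}
strong_military = {"military": 7.0}

print(if_not_domain(weak_military, "military"))       # inhibited: no threshold
print(if_not_domain(weak_military, "military", 5))    # active: 3% is below 5%
print(if_not_domain(strong_military, "military", 5))  # inhibited: 7% reaches 5%
```

Under this reading, adding a threshold to IF NOT DOMAIN makes the rule fire on more documents, not fewer, because weakly associated documents are no longer excluded.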
IF RELEVANT DOMAIN
IF RELEVANT DOMAIN is the constraint that enables a rule only if a specific domain has been associated to the input document and this domain belongs to the relevant information identified for that document. Consider, for example, the following categorization rule:
SCOPE SENTENCE IF RELEVANT DOMAIN(football)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(317175) // FC Barcelona
    }
}
This will be activated and will match the concept of FC Barcelona only if the domain football has been associated to the input document during the text disambiguation process and this domain is part of the relevant information for that document.
The difference between this constraint and the simple IF DOMAIN constraint is the set of domains that is considered: IF DOMAIN considers all domains associated to a document (even those with an extremely low score), whereas IF RELEVANT DOMAIN considers only those domains that have been evaluated as the most representative for a document.
If the rule above is modified as follows:
SCOPE SENTENCE IF RELEVANT DOMAIN(football:10%)
{
    DOMAIN(dom1:NORMAL)
    {
        SYNCON(317175) // FC Barcelona
    }
}
it is restricted to those cases in which the document has been associated to the domain football with a score of at least 10%.
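The contrast between IF DOMAIN and IF RELEVANT DOMAIN can be illustrated with a small Python model. It is a sketch under stated assumptions: the set of relevant domains is taken as a given input, since how the disambiguator selects it is not described here, and the function names, the economics domain, and the scores are all hypothetical.

```python
from typing import Mapping, Optional, Set


def if_domain(
    all_scores: Mapping[str, float],
    domain: str,
    threshold: Optional[float] = None,
) -> bool:
    """Model of IF DOMAIN: look at every domain associated to the document."""
    if domain not in all_scores:
        return False
    return threshold is None or all_scores[domain] >= threshold


def if_relevant_domain(
    all_scores: Mapping[str, float],
    relevant: Set[str],
    domain: str,
    threshold: Optional[float] = None,
) -> bool:
    """Model of IF RELEVANT DOMAIN: the domain must also be in `relevant`."""
    return domain in relevant and if_domain(all_scores, domain, threshold)


# Hypothetical document: football is present with a very low score and was
# not selected among the most representative domains.
all_scores = {"football": 0.4, "economics": 35.0}
relevant = {"economics"}

print(if_domain(all_scores, "football"))                         # associated
print(if_relevant_domain(all_scores, relevant, "football"))      # not relevant
print(if_relevant_domain(all_scores, relevant, "economics", 10)) # relevant, above 10%
```

In this model a marginal domain satisfies IF DOMAIN but not IF RELEVANT DOMAIN, which is the distinction the two constraints are meant to capture.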