Confidence score
Introduction
Text intelligence engines assign a confidence score to each extraction. The confidence score is a decimal number between 0 and 1. The default confidence score is 1.
The confidence score is assigned to both extraction instances and fields.
Score options
Score options allow you to affect the confidence score at the rule level.
The syntax for writing a score option in a rule is:
IDENTIFY(templateName:scoreOption)
where templateName is the name of the template and scoreOption is the name of the score option. If you do not specify the score option, the default score of 1 is attributed to all the extractions produced by the rule.
Standard options
Standard score options correspond to a percentage of the default score. They are:
Label | Percentage of the default score |
---|---|
LOW | 25 |
NORMAL | 50 |
HIGH | 75 |
For example, a rule like this:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA:LOW)
{
@FULL_NAME[TYPE(NPH)]
}
}
will assign a confidence score of 0.25 to all its extractions.
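The arithmetic behind the standard options can be sketched in Python. This is only an illustration of the percentages in the table above, not the engine's implementation; the names STANDARD_OPTIONS, DEFAULT_SCORE and standard_confidence are hypothetical and do not belong to the rules language.

```python
# Percentages of the default score, taken from the table of standard options.
STANDARD_OPTIONS = {"LOW": 25, "NORMAL": 50, "HIGH": 75}

# Default confidence score used when a rule specifies no score option.
DEFAULT_SCORE = 1.0


def standard_confidence(score_option=None):
    """Return the confidence score implied by a standard score option.

    Hypothetical helper: it only mirrors the documented percentages.
    """
    if score_option is None:
        # No score option in the rule: the default score applies.
        return DEFAULT_SCORE
    # A score option is a percentage of the default score.
    return DEFAULT_SCORE * STANDARD_OPTIONS[score_option] / 100


if __name__ == "__main__":
    for option in (None, "LOW", "NORMAL", "HIGH"):
        print(option, standard_confidence(option))
```

Under these assumptions, IDENTIFY(PERSONAL_DATA:LOW) corresponds to standard_confidence("LOW"), and a rule with no score option corresponds to standard_confidence().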
Custom options
It is possible to define custom score options with this syntax:
CONFIDENCE
{
@optionName1:optionValue1,
@optionName2:optionValue2,
...
@optionNameN:optionValueN
}
where:
CONFIDENCE is a language keyword and must be written in uppercase.
optionName# is the option name.
optionValue# is the option value, which is a number between 1 and 100 corresponding to a percentage of the default score.
For example, this custom score option:
CONFIDENCE
{
@VERYHIGH:80
}
used in this rule:
SCOPE SENTENCE
{
IDENTIFY(PERSONAL_DATA:VERYHIGH)
{
@FULL_NAME[TYPE(NPH)]
}
}
will produce extractions with a confidence score of 0.80.
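To make the relationship between a custom option and the resulting score concrete, the following Python sketch reads a CONFIDENCE block as plain text and computes the score for a named option. It is a rough illustration, not how the engine parses rules: the functions parse_confidence_block and custom_confidence are hypothetical, and the sketch assumes option values are whole numbers.

```python
import re


def parse_confidence_block(text):
    """Parse a CONFIDENCE block into a dict of option name -> percentage.

    Hypothetical helper; assumes integer option values.
    """
    match = re.search(r"CONFIDENCE\s*\{(.*?)\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no CONFIDENCE block found")
    options = {}
    # Each entry has the form @optionName:optionValue.
    for name, value in re.findall(r"@(\w+)\s*:\s*(\d+)", match.group(1)):
        percentage = int(value)
        # Option values are documented as numbers between 1 and 100.
        if not 1 <= percentage <= 100:
            raise ValueError(f"{name}: value {percentage} is outside 1-100")
        options[name] = percentage
    return options


def custom_confidence(options, name, default_score=1.0):
    """Return the option's percentage applied to the default score."""
    return default_score * options[name] / 100


if __name__ == "__main__":
    block = """
    CONFIDENCE
    {
        @VERYHIGH:80
    }
    """
    options = parse_confidence_block(block)
    print(options)
    print(custom_confidence(options, "VERYHIGH"))
```

With the VERYHIGH definition from the example above, the sketch maps the option value 80 to a confidence score of 0.8, matching the documented result.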