Confidence score

Introduction

Text intelligence engines assign a confidence score to each extraction. The confidence score is a decimal number between 0 and 1. The default confidence score is 1.

The confidence score is assigned to both extraction instances and fields.

Score options

Score options allow you to affect the confidence score at the rule level.

The syntax for writing a score option in a rule is:

IDENTIFY(templateName:scoreOption)

where templateName is the name of the template and scoreOption is the name of the score option. If you do not specify a score option, the default score of 1 is assigned to all the extractions produced by the rule.

Standard options

Standard score options correspond to a percentage of the default score. They are:

Label     Percentage of the default score
LOW       25
NORMAL    50
HIGH      75

For example, a rule like this:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA:LOW)
    {
        @FULL_NAME[TYPE(NPH)]
    }
}

will assign a confidence score of 0.25 to all its extractions.
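The arithmetic behind the standard options can be sketched in a few lines of Python. This is only an illustration of the percentages in the table above, not part of the rules language: the names DEFAULT_SCORE, STANDARD_OPTIONS and standard_confidence are hypothetical.

```python
# Illustrative sketch only: these names are hypothetical and do not exist
# in the rules language. They mirror the table of standard score options.
DEFAULT_SCORE = 1.0

# Standard option label -> percentage of the default score.
STANDARD_OPTIONS = {"LOW": 25, "NORMAL": 50, "HIGH": 75}


def standard_confidence(option=None):
    """Return the confidence score a rule would assign.

    With no score option, the default score applies; otherwise the score
    is the given percentage of the default score.
    """
    if option is None:
        return DEFAULT_SCORE
    return DEFAULT_SCORE * STANDARD_OPTIONS[option] / 100


print(standard_confidence("LOW"))   # IDENTIFY(PERSONAL_DATA:LOW)
print(standard_confidence())        # IDENTIFY(PERSONAL_DATA)
```

The first call corresponds to the rule above and yields 0.25; the second corresponds to a rule with no score option and yields the default score of 1.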

Custom options

It is possible to define custom score options with this syntax:

CONFIDENCE
{
    @optionName1:optionValue1,
    @optionName2:optionValue2,
    ...
    @optionNameN:optionValueN
}

where:

  • CONFIDENCE is a language keyword and must be written in uppercase.
  • optionName# is the option name.
  • optionValue# is the option value, which is a number between 1 and 100 corresponding to a percentage of the default score.

For example, this custom score option:

CONFIDENCE
{
    @VERYHIGH:80
}

used in this rule:

SCOPE SENTENCE
{
    IDENTIFY(PERSONAL_DATA:VERYHIGH)
    {
        @FULL_NAME[TYPE(NPH)]
    }
}

will produce extractions with a confidence score of 0.80.
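As a rough check of this arithmetic, the following Python sketch converts a custom option value into a confidence score. It is an illustration under stated assumptions, not engine code: the function name custom_confidence is hypothetical, and treating the bounds 1 and 100 as inclusive is an assumption about how the range is enforced.

```python
# Illustrative sketch only: custom_confidence is a hypothetical helper,
# not something provided by the rules language or the engine.
DEFAULT_SCORE = 1.0


def custom_confidence(option_value):
    """Convert a custom option value (a percentage) into a confidence score.

    Assumption: the documented range "between 1 and 100" is inclusive.
    """
    if not 1 <= option_value <= 100:
        raise ValueError("option value must be between 1 and 100")
    return DEFAULT_SCORE * option_value / 100


# Custom options as they would be declared in a CONFIDENCE block.
custom_options = {"VERYHIGH": 80}

print(custom_confidence(custom_options["VERYHIGH"]))
```

For the VERYHIGH option defined above, the sketch yields 0.8, matching the score of the extractions produced by the rule.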