Choosing the proper match strategy lets you decide which parameters an extraction must share with an annotation to count as a match, so that experiment results reflect what actually matters for your use case.
Match on categories
The match algorithm for categories is very simple, because categories have no position or type.
The only match criterion is therefore that the category labels (identifiers) are equal.
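The following is a minimal sketch of label-only matching, assuming categories are represented as plain string labels. The function name `match_categories` and the result keys are hypothetical and used only for illustration. Because only the labels are compared, matching reduces to set operations.

```python
def match_categories(predicted: set[str], annotated: set[str]) -> dict[str, set[str]]:
    """Hypothetical helper: categories match only when their labels are equal."""
    return {
        "matched": predicted & annotated,              # same label on both sides
        "unmatched_predicted": predicted - annotated,  # predicted but not annotated
        "unmatched_annotated": annotated - predicted,  # annotated but not predicted
    }


result = match_categories({"sport", "finance"}, {"finance", "politics"})
print(sorted(result["matched"]))  # ['finance']
```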
Match on extractions
An extraction is generally defined by these parameters:
- Type (in the Studio, a Template-Field pair);
- Position in the text (start + end);
- Normalized value, if any (for example "credit card" for a text containing "my card").
The basic matching algorithm requires strict equality on all of these parameters: an extraction is matched only by an annotation with the same type, the same position and the same normalized value (if any).
In some application scenarios this criterion, called the strict matching policy, is too restrictive, so two looser criteria have been introduced:
- Strict match - ignoring values: the normalized value is not checked, so only the type-position pair must match.
- Strict match - ignoring positions: the position in the text is not checked, so only the type-value pair must match.
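The three policies can be sketched as choosing which parameters form the comparison key. This is an illustrative model, not the product's implementation: the `Extraction` and `MatchPolicy` types, the `is_matched` helper and the `"Payment.Method"` type name are hypothetical, and the sketch only checks whether some annotation matches, without modeling one-to-one pairing or score computation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class Extraction:
    """Hypothetical model of an extraction or annotation."""
    type: str                    # e.g. a Template-Field pair such as "Payment.Method"
    start: int                   # start offset in the text
    end: int                     # end offset in the text (exclusive)
    value: Optional[str] = None  # normalized value, if any


class MatchPolicy(Enum):
    STRICT = "strict"                      # type + position + value
    IGNORE_VALUES = "ignore_values"        # type + position
    IGNORE_POSITIONS = "ignore_positions"  # type + value


def match_key(e: Extraction, policy: MatchPolicy) -> tuple:
    """Return only the parameters compared under the given policy."""
    if policy is MatchPolicy.IGNORE_VALUES:
        return (e.type, e.start, e.end)
    if policy is MatchPolicy.IGNORE_POSITIONS:
        return (e.type, e.value)
    return (e.type, e.start, e.end, e.value)


def is_matched(extraction: Extraction, annotations: list[Extraction],
               policy: MatchPolicy) -> bool:
    """True if at least one annotation has the same key as the extraction."""
    key = match_key(extraction, policy)
    return any(match_key(a, policy) == key for a in annotations)


# "my card" spans offsets 12-19 in "I paid with my card".
ext = Extraction("Payment.Method", 12, 19, "credit card")
ann = Extraction("Payment.Method", 12, 19, "debit card")
for policy in MatchPolicy:
    print(policy.value, is_matched(ext, [ann], policy))
# strict False
# ignore_values True
# ignore_positions False
```

Here the annotation covers the same span with the same type but a different normalized value, so only the policy that ignores values accepts it.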