Extractions from documents
The model produced with a thesaurus project extracts the occurrences of the concepts from documents.
Use the Extraction configuration tab of the Edit Concept panel to change the extraction settings.
Use the Extraction toggle switch to turn concept extraction on and off.
Turning extraction off can be useful to see how the generated model works without that concept.
In the EXTRACTION METHOD area you set the method Platform will use to determine the portions of text to extract.
Possible methods are:
- Semantic: Platform extracts all the portions of text expressing the same meaning as the concept labels, in any inflected form. For example, for label sandglass: sandglass, hourglass, sandglasses, hourglasses.
- Base form: Platform extracts all the inflected forms of the lemma of the concept labels, where the lemma is the base form of a word, such as its dictionary entry. For example, for label sandglass: sandglass, sandglasses.
- Exact label: Platform extracts text portions that literally match concept labels.
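To make the difference between the three methods concrete, here is a minimal Python sketch. It is not Platform's implementation: the `LEMMAS` and `SYNONYMS` tables are hypothetical toy data standing in for the linguistic knowledge Platform applies, and `matches` is an illustrative helper name.

```python
# Hypothetical illustration of the three extraction methods.
# LEMMAS and SYNONYMS are invented toy data, not Platform's resources.

LEMMAS = {  # inflected form -> lemma (base form)
    "sandglass": "sandglass",
    "sandglasses": "sandglass",
    "hourglass": "hourglass",
    "hourglasses": "hourglass",
}
SYNONYMS = {  # lemma -> lemmas that express the same meaning
    "sandglass": {"sandglass", "hourglass"},
    "hourglass": {"sandglass", "hourglass"},
}


def matches(token: str, label: str, method: str) -> bool:
    """Return True if token would be extracted for label under method."""
    if method == "exact":
        # Exact label: the text must literally match the label.
        return token == label
    lemma = LEMMAS.get(token)
    label_lemma = LEMMAS.get(label, label)
    if method == "base_form":
        # Base form: any inflection of the label's own lemma.
        return lemma == label_lemma
    if method == "semantic":
        # Semantic: any inflection of any lemma with the same meaning.
        return lemma in SYNONYMS.get(label_lemma, {label_lemma})
    raise ValueError(f"unknown method: {method}")


tokens = ["sandglass", "hourglasses", "sandglasses", "clock"]
for method in ("exact", "base_form", "semantic"):
    print(method, [t for t in tokens if matches(t, "sandglass", method)])
```

With these toy tables, Exact label keeps only `sandglass`, Base form adds `sandglasses`, and Semantic also picks up `hourglasses`, mirroring the examples in the list above.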
In the MANDATORY CONTEXT TERMS and FORBIDDEN CONTEXT TERMS columns, you can enter terms that, respectively, must or must not be present in the context set in the CONTEXT SETTINGS area for the extraction to take place.
For example, you may want the concept of chair to be extracted only if the term president is not also present in the same paragraph. In this case, add president to the FORBIDDEN CONTEXT TERMS column and set the context to Paragraph.
- To add a term, select the plus button below the column header, type the term and press Enter.
- To edit a term, hover over it and select Edit.
- To delete a term, hover over it and select Delete.
- To change the co-occurrence context, choose your option in CONTEXT SETTINGS.
Not all parts of a text correspond to clauses. For example, titles are not considered clauses.
Be aware that if you set Clause as the context, some portions of the document text that express the concept may not be extracted.
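The context-term rule can be sketched in a few lines of Python. This is a hypothetical simplification, not Platform's logic: it assumes that all mandatory terms must appear and that no forbidden term may appear in the chosen context, and it compares single lowercase words instead of applying Platform's linguistic analysis. `should_extract` is an illustrative name.

```python
import re


def should_extract(context_text: str, mandatory: set[str], forbidden: set[str]) -> bool:
    """Decide whether a concept match found in context_text is extracted.

    Assumptions: terms are single lowercase words, every mandatory term
    is required, and any forbidden term blocks the extraction.
    """
    # Naive word tokenization, for illustration only.
    words = set(re.findall(r"\w+", context_text.lower()))
    if not mandatory <= words:  # at least one mandatory term is missing
        return False
    if forbidden & words:  # at least one forbidden term is present
        return False
    return True


# Context = Paragraph, forbidden term = president (the chair example).
paragraphs = [
    "Please take a chair in the waiting room.",
    "The chair of the board met the president.",
]
for p in paragraphs:
    print(should_extract(p, mandatory=set(), forbidden={"president"}), p)
```

Here the first paragraph would yield an extraction of chair and the second would not, because president appears in the same paragraph.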
With the Semantic and Base form methods, Platform extracts all the inflected forms of the concept labels.
If you want some forms to be ignored, add them to the FORBIDDEN FORMS column.
- To add a form, select the plus button below the column header, type the form and press Enter.
- To edit a form, hover over it and select Edit.
- To delete a form, hover over it and select Delete.
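The effect of the FORBIDDEN FORMS column can be pictured as a filter applied to the forms the Semantic or Base form method would otherwise extract. The Python sketch below is a hypothetical illustration: the form list is toy data, `extractable_forms` is an invented helper, and exact, case-sensitive comparison is an assumption.

```python
def extractable_forms(inflected_forms: list[str], forbidden_forms: set[str]) -> list[str]:
    """Keep the inflected forms that are not listed as forbidden.

    Assumption: forms are compared exactly, including case.
    """
    return [form for form in inflected_forms if form not in forbidden_forms]


# Forms the Semantic method might produce for label sandglass (toy data).
forms = ["sandglass", "sandglasses", "hourglass", "hourglasses"]
print(extractable_forms(forms, forbidden_forms={"hourglasses"}))
```

In this example, hourglasses is ignored while the other forms are still extracted.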