Advanced extraction
Overview
Advanced extraction settings are optional rules for identifying and extracting concepts in text.
Rules are arbitrarily complex conditions on the attributes of text tokens: when the condition is satisfied, the rule is triggered. If a concept has several rules, it is enough for at least one of them to be triggered for the extraction to take place.
As with labels, mentions of the concept extracted by rules are discarded if a kill list is triggered in the same scope (for example, the same sentence) in which the mentions were found.
You manage advanced extraction rules in the Advanced extraction tab of the Edit concept panel.
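The triggering logic described above can be pictured with a minimal sketch. It is only a conceptual model, not the product's implementation: a scope is assumed to be a list of lowercase tokens, a rule is assumed to be a plain predicate over that scope, and the kill list is assumed to be a set of tokens. The names `concept_extracted`, `Scope` and `Rule` are hypothetical.

```python
from typing import Callable, List, Set

# Assumption: a scope (for example, one sentence) is a list of lowercase tokens.
Scope = List[str]
# Assumption: a rule is any predicate that inspects the tokens of a scope.
Rule = Callable[[Scope], bool]


def concept_extracted(scope: Scope, rules: List[Rule], kill_list: Set[str]) -> bool:
    """Return True if the concept would be extracted from this scope."""
    # A kill list triggered in the same scope cancels the extraction.
    if any(token in kill_list for token in scope):
        return False
    # Rules are OR-combined: one triggered rule is enough.
    return any(rule(scope) for rule in rules)


# Two hypothetical rules for the same concept.
rules = [lambda s: "invoice" in s, lambda s: "receipt" in s]
print(concept_extracted(["the", "invoice", "arrived"], rules, {"sample"}))
print(concept_extracted(["sample", "invoice"], rules, {"sample"}))
```

The first call satisfies one rule with no kill list hit, while the second satisfies the same rule but is canceled by the kill list entry found in the same scope.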
Create
To create a new rule:
- If it's the first rule, select Create rule near the center of the page.
- If other rules are already defined, select the plus button under Advanced extraction rules.
Note
If you are editing an existing rule, first select Back to the left of the rule name to return to the list of rules.
The Create rule dialog appears. Enter the name of the rule and select Save. If you omit the name, the system chooses one. The focus then moves to the rule editor.
Info
Multiple rules are implicitly combined with the Boolean OR operator: it's enough that one of them is triggered to achieve their effect.
Define
You define a rule in the editor, which opens automatically in the Advanced extraction tab of the Edit concept panel after you create a new rule or when you edit an existing one.
The scope of the rule is the span of text within which the condition is checked. It can be a clause, one or more consecutive sentences, or an entire paragraph.
Choose the scope from Rule scope. For sentences, you must also specify how many by setting Multiplier.
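As an illustration of a sentence scope with a multiplier, the sketch below groups consecutive sentences into windows within which a condition could be checked. It assumes that windows slide one sentence at a time and that a text shorter than the multiplier forms a single window; the source does not say how the product actually builds its scopes, and `sentence_windows` is a hypothetical helper.

```python
from typing import List


def sentence_windows(sentences: List[str], multiplier: int) -> List[List[str]]:
    """Group consecutive sentences into windows of `multiplier` sentences.

    Assumption: windows slide one sentence at a time.
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    if not sentences:
        return []
    # Assumption: a text shorter than the multiplier is one single window.
    if len(sentences) <= multiplier:
        return [list(sentences)]
    last_start = len(sentences) - multiplier
    return [sentences[i:i + multiplier] for i in range(last_start + 1)]


for window in sentence_windows(["S1.", "S2.", "S3."], multiplier=2):
    print(window)
```

With three sentences and a multiplier of 2, this model yields the windows S1+S2 and S2+S3, so a condition whose operands fall in adjacent sentences can still be satisfied.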
The body of the rule is its condition, which is made up of operands combined by operators. An operand can consist of one or more of the following:
- Another taxonomy concept alone or a whole sub-tree of concepts
- A rule concept
- A rule concept entity
Note
As stated above, you are free to reference rule concept entities in the rule, so it is not mandatory to group rule concept entities into rule concepts and then use rule concepts in rules.
Hence consider using rule concepts only when you need to reference the same set of rule concept entities in the advanced extraction rules of multiple concepts.
If an operand consists of multiple elements, these elements are implicitly combined with the Boolean OR operator.
- To define an element of an operand, type at least two characters in the Select field: all the taxonomy concepts, rule concepts and rule concept entities with matching names will be listed and you can choose the one you are interested in from the list.
- If you want to use any concept in a sub-tree, choose a taxonomy concept and then check Include subentities to its right.
- To delete an element of an operand, select the X icon to the right of it.
- To add an element, select the plus button above the first element of the operand: a new Select field appears, which you fill in as described above.
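The way an operand behaves can be sketched as follows. This is a conceptual model under stated assumptions, not the product's data model: the taxonomy is assumed to be a dictionary mapping each concept to its direct children, Include subentities is modeled as expanding a concept to its whole sub-tree, and the operand matches when any of its elements is found in the scope (the implicit OR). The names `expand_subtree` and `operand_matches` are hypothetical.

```python
from typing import Dict, List, Set


def expand_subtree(concept: str, children: Dict[str, List[str]]) -> Set[str]:
    """Return the concept together with all of its descendants."""
    found = {concept}
    stack = [concept]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def operand_matches(concepts_in_scope: Set[str], elements: Set[str]) -> bool:
    """Elements of an operand are OR-combined: any single hit is enough."""
    return not concepts_in_scope.isdisjoint(elements)


# Hypothetical taxonomy: vehicle -> car, truck; car -> sedan.
taxonomy = {"vehicle": ["car", "truck"], "car": ["sedan"]}
elements = expand_subtree("vehicle", taxonomy)  # models "Include subentities"
print(operand_matches({"sedan"}, elements))
print(operand_matches({"bicycle"}, elements))
```

In this model, selecting a single parent concept with the sub-tree included is equivalent to listing every descendant as a separate element of the operand.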
After defining an operand, you can choose the operator that links it to any subsequent operand. The operator is chosen from the button panel under the operand and once the choice has been made:
- The button panel is replaced by the chosen operator.
- The visual builder of the new operand appears below the operator.
To go back to the operator choice, select the operator name.
Note
The DISTANCE operator has two parameters, Min and Max, which represent the minimum and maximum distance between the text tokens corresponding to the two operands.
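A minimal sketch of how a Min/Max distance check could work is shown below. It assumes that distance is the absolute difference between token positions and that the order of the two operands does not matter; the source does not specify the unit of distance or whether order is significant, so treat both as assumptions. `distance_satisfied` is a hypothetical helper.

```python
from typing import Iterable


def distance_satisfied(
    left_positions: Iterable[int],
    right_positions: Iterable[int],
    min_dist: int,
    max_dist: int,
) -> bool:
    """Return True if any pair of matches lies within [min_dist, max_dist].

    Assumption: distance is the absolute difference of token positions.
    """
    right = list(right_positions)  # materialize so it can be scanned repeatedly
    return any(
        min_dist <= abs(r - l) <= max_dist
        for l in left_positions
        for r in right
    )


# Left operand matched at token 2; right operand matched at tokens 5 and 9.
print(distance_satisfied([2], [5, 9], min_dist=1, max_dist=3))
```

Here the pair at positions 2 and 5 is three tokens apart, which falls within the range 1 to 3, so the condition is satisfied even though the match at position 9 is too far away.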
The rule is automatically saved after each change.
When at least two operands are defined, you can delete one by selecting the trash bin icon to the right of the operand name.
Set rules as alternative to labels
To ignore concept labels and use only the rules to identify (and, possibly, extract) mentions of the concept, turn on Rule extraction only.
In this mode, all labels and the basic extraction settings are ignored.
Edit
To edit an existing rule, select the pencil icon to the right of the rule strip in the list of rules.
Note
If you are already editing a rule, select Back to the left of the rule name to return to the list of rules.
Inspect
The INSPECTOR panel, on the right edge of the page, contains two tabs:
- Guide: a guide to scopes and operators
- Inspector: contextual information about the item selected in the adjacent panel, possibly with navigation shortcuts
To toggle the INSPECTOR panel, select the expand and collapse icons at the top of the panel.
Delete
To delete a rule, select the trash bin icon to the right of the rule strip in the list of rules.