Work with categorization rules

Steps

1

Create a categorization rule

Place the cursor inside the recently created rule file.

Choose the domain you want to write a rule for, then hover over that domain in the Taxonomy tool window, right-click it and select Create rule.

The structure of the rule is generated where the cursor was placed. Notice that it already contains the reference to the domain you chose.
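As a rough illustration, the generated skeleton probably resembles the fragment below. This is a sketch under assumptions: `sports` is a hypothetical domain name, and the exact layout the tool produces (including whether the default `NORMAL` score is written out) may differ in your version.

```
SCOPE SENTENCE
{
    // "sports" is a hypothetical domain name; yours comes from the Taxonomy tool window
    DOMAIN(sports:NORMAL)
    {
        // the rule condition (for example a KEYWORD or LEMMA attribute) goes here
    }
}
```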

2

Use the KEYWORD attribute in your categorization rule

Use the shortcut `Ctrl+Shift+K` to automatically insert the structure of the `KEYWORD` attribute, then write your keyword(s) between the quotation marks. Remember that a keyword is an exact string of characters: it is matched case-insensitively when written entirely in lowercase, and case-sensitively otherwise.
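The case behavior is easier to see with two rules side by side. The fragment below is a minimal sketch, assuming the usual rule layout; the domain names `sports` and `space` and the keywords are hypothetical.

```
SCOPE SENTENCE
{
    // all-lowercase keyword: expected to match "world cup", "World Cup" and "WORLD CUP"
    DOMAIN(sports:NORMAL)
    {
        KEYWORD("world cup")
    }

    // keyword containing uppercase letters: expected to match only the exact string "NASA"
    DOMAIN(space:NORMAL)
    {
        KEYWORD("NASA")
    }
}
```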

Move to the upper-right side of the GUI and select Build to compile the project.

Pick a test file and open it in the editor, then go back to the upper-right side of the GUI and select Analyze Document.

Move to the Categorization tool window at the bottom to check whether the rule has triggered. Click the category in the results to highlight where in the text the rule has triggered.

3

Use the LEMMA attribute in your categorization rule

Use the shortcut `Ctrl+Shift+L` to automatically insert the structure of the `LEMMA` attribute. You can also write more than one lemma in the same rule by separating them with commas.
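A rule with several lemmas might look like the fragment below. It is only a sketch: the domain name `sports` and the lemmas are hypothetical, and the layout assumes the same rule structure generated in step 1.

```
SCOPE SENTENCE
{
    DOMAIN(sports:NORMAL)
    {
        // two lemmas separated by a comma; either one can trigger the rule
        LEMMA("run", "swim")
    }
}
```

With a rule like this, a test document containing "running" or "swam" should still trigger the category, because the attribute matches on the base form of the word rather than on the literal string.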

Move to the upper-right side of the GUI and select Build to compile the project.

Pick a test file and open it in the editor, then go back to the upper-right side of the GUI and select Analyze Document.

Move to the Categorization tool window at the bottom to check whether the rule has triggered. Click the category in the results to highlight where in the text the rule has triggered.

Notice that with the `LEMMA` attribute you also detect inflected forms of the words.

4

Apply your categorization rules to test documents and see results

Move to the upper-right side of the GUI, select the Analyze All Documents button and choose a name for the analysis report.

Open the Report tool window to find your analysis report. Select it: the bottom part of the tool window lists the analyzed documents with information about the results. You can filter the results, for example to show only the information about categorization.

Open the documents by double-clicking them and check the categorization results in the Categorization tool window at the bottom.

Tips & tricks

Choose the proper score for your rule

Open a file containing categorization rules; change the score inside brackets by replacing `NORMAL` (10 points) with `LOW` (3 points) or `HIGH` (15 points).

Select Build and then Analyze Document at the upper-right side of the GUI. Move to the Categorization tool window at the bottom and have a look at the results.

Consider how the category score changes according to the different scores in the rules.

Change the score to `HIGH` when the rule is particularly relevant for your category. Choose `LOW` when the rule is relevant but less so than the "normal" ones, so that the category reaches a considerable score only if the rule triggers more than once.
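The fragment below sketches how the two scores could be combined for one category. The domain name `sports` and the keywords are hypothetical, and the layout assumes the rule structure used in the steps above.

```
SCOPE SENTENCE
{
    // a strong, unambiguous signal: a single hit contributes 15 points
    DOMAIN(sports:HIGH)
    {
        KEYWORD("world cup")
    }

    // a weaker signal: a single hit contributes only 3 points
    DOMAIN(sports:LOW)
    {
        LEMMA("match")
    }
}
```

Assuming each hit adds the rule's points to the category, as the description of `LOW` suggests, the `LOW` rule would need to trigger four times (4 × 3 = 12 points) before it outweighs a single `NORMAL` hit (10 points). Check the actual totals in the Categorization tool window, since the tool's final score may take other factors into account.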