Tech versions
What are they?
In the authoring application, every project or corpus is based on a tech version.
A tech version is a set of basic instances of NL Core, that is, instances without rules and scripting, with at least one instance for each language supported by the application.
If enough computing slots are available, a tech version can include multiple instances of NL Core for a given language, so that multiple documents can be processed simultaneously.
There can be multiple tech versions, and different projects can be based on different versions.
Tech versions are managed by users belonging to the Owner role. At least one tech version must be created and activated before users can start working on projects and corpora.
Where are they used?
As stated above, every corpus or project is associated with one tech version.
An NL Core instance from the project's tech version:
- When run against a text, extracts features like named entities, main phrases, keywords, lemmas, syncons, main lemmas, main syncon labels, main topics, collectively dubbed "entities and tokens".
- Provides its knowledge graph together with all the accessory look-up and navigation functionalities.
So:
- In corpora, categorization projects, extraction projects and thesaurus projects, NL Core instances from the project's tech version extract features when documents are uploaded to the project. These features get indexed and:
  - Can be seen and used in the UI to search and drill down into documents and make discoveries about them.
  - Allow users to find related documents.
  - Constitute a virtual information layer behind the visible text. This information:
    - In categorization and extraction projects, when annotating a document, automatically enriches the annotations. For the documents of a training library, this enrichment impacts model training, because a model can also learn from invisible information instead of just plain text.
    - In extraction projects, enables automatic propagation of annotations and active learning based on the similarity of features between different portions of text, in the same or in separate documents.
- In categorization, extraction and thesaurus projects, a copy of the tech version's NL Core instance is embedded in every generated model. Whenever the model runs during an experiment or in an NL Flow workflow, this copy performs feature extraction, which precedes prediction because it creates the input for the prediction algorithm.
- In thesaurus projects, the NL Core instance for any of the project languages can suggest broader, narrower or related labels and concepts.
- In knowledge graph customization projects the tech version's instance of NL Core for the project language provides the initial knowledge graph.
Computing slots
A computing slot is the amount of computing resources (CPU and RAM) allocated to an instance of NL Core when it runs.
At installation time, a pool of computing slots is defined by specifying their number and their size, which is the same for all the slots. The pool represents the maximum amount of computing resources that can be allocated to tech versions: there can be a virtually unlimited number of tech versions, but only those that are associated with the slots of the pool are active.
An active tech version occupies at least as many computing slots as there are languages supported by the application, because at least one instance of NL Core is required for each language.
If there are slots available, however, it is possible to allocate more slots to a language, increasing the number of instances of NL Core when more computing power is needed to analyze documents during uploads and experiments.
Multiple tech versions can be active at the same time as long as there are enough available slots to accommodate all of them.
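The slot accounting described in this section can be illustrated with a small sketch. It is not part of the application and does not use any of its APIs: the class names `TechVersion` and `SlotPool`, the language codes and the pool size are all hypothetical, chosen only to show the arithmetic. It assumes, as stated above, that each NL Core instance occupies one slot and that an active tech version needs at least one instance per supported language.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TechVersion:
    """Hypothetical model of a tech version (not the application's API)."""
    name: str
    instances: dict[str, int]  # language code -> number of NL Core instances

    def slots_needed(self) -> int:
        # Assumption from the text: one computing slot per NL Core instance.
        return sum(self.instances.values())


@dataclass
class SlotPool:
    """Hypothetical pool of equally sized computing slots."""
    total_slots: int
    languages: tuple[str, ...]
    active: list[TechVersion] = field(default_factory=list)

    def free_slots(self) -> int:
        return self.total_slots - sum(tv.slots_needed() for tv in self.active)

    def can_activate(self, tv: TechVersion) -> bool:
        # At least one instance is required for every supported language,
        # and the pool must have enough free slots for all instances.
        covers_all = all(tv.instances.get(lang, 0) >= 1 for lang in self.languages)
        return covers_all and tv.slots_needed() <= self.free_slots()

    def activate(self, tv: TechVersion) -> None:
        if not self.can_activate(tv):
            raise ValueError(f"cannot activate {tv.name!r}")
        self.active.append(tv)


# Illustrative numbers only: 8 slots, three supported languages.
pool = SlotPool(total_slots=8, languages=("en", "it", "de"))
v1 = TechVersion("v1", {"en": 2, "it": 1, "de": 1})  # 4 slots: extra instance for "en"
v2 = TechVersion("v2", {"en": 1, "it": 1, "de": 1})  # 3 slots: the minimum
v3 = TechVersion("v3", {"en": 1, "it": 1, "de": 1})  # 3 slots: the minimum

pool.activate(v1)
pool.activate(v2)
print(pool.free_slots())      # 1
print(pool.can_activate(v3))  # False: only 1 slot left, 3 needed
```

In this sketch `v3` can still exist as an inactive tech version; it only becomes active once enough slots are freed, for example by deactivating `v1` or by reducing the number of English instances.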