Model types and properties
Overview
Model blocks are the key components of NL Flow workflows.
Their main task is to use a model, either ML or symbolic, to make predictions about the input text in the form of categories (the topics covered in the text or the type of document inferred from its text) or extracted information. Depending on their type, model blocks can also return other useful information.
Model types
These types of model can be used as blocks when designing a workflow:
Type | Description |
---|---|
Basic mode ML models | Platform-generated ML models, which embed NL Core for feature extraction |
Advanced mode ML models | Platform-generated ML models used without NL Core |
Symbolic models | Symbolic models, generated either with the Platform authoring application or with Studio |
Knowledge Models | NL Flow built-in symbolic models |
Basic mode ML models
ML models are generated with the Platform authoring application. Basic mode refers to the way the corresponding blocks are added to the workflow, that is, without any changes to the model structure, which embeds NL Core.
A basic mode ML model block is made of two modules, as illustrated below: one or more replicas of NL Core and one or more replicas of the predictive ML model.
The input to the block is a JSON containing plain text, unless the model has been trained with annotations referring to the position of the text in the graphical rendering of the document: in that case the input JSON must contain text enriched with graphical layout information, that is the output of the Extract Converter processor.
Input can also include options affecting NL Core.
NL Core performs NLU analysis of the text, extracting the text features that are then fed to the ML prediction model. The main output of the block consists of the predictions (categories or extractions) plus, optionally, selected portions of the NL Core output, as determined by the block's functional parameters.
Advanced mode ML models
Advanced mode ML models are ML models that are added to a workflow with the Advanced mode option. The effect of this option is to remove NL Core, leaving only the predictive model. The inner structure of an advanced mode workflow block for an ML model is illustrated below.
The input to the model block is a JSON containing text features. In fact, predictions are based on text features, but since the block doesn't contain NL Core, it cannot extract them by itself.
To provide the block with the necessary features, a symbolic model or a knowledge model like NLP Core is placed in the workflow upstream of the advanced mode ML model block and its output mapped to the input of the ML model.
The main output of the block consists of the predictions (categories or extractions) plus, optionally, the echo of the input text features.
Advanced mode is meant for workflows requiring more than one ML model: if every ML model is put in the workflow with the advanced mode option, one upstream symbolic model is enough to feed all the advanced mode ML model blocks with text features, as illustrated in the picture below.
The result is a leaner and faster workflow: leaner because advanced mode ML models, lacking the NLU engine, require fewer computing resources when published; faster because NLU analysis is performed only once per workflow activation instead of once per model.
Note
Advanced mode is available for ML models based on expert.ai ML technology version 3.0 or higher.
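The leaner-and-faster argument can be made concrete by counting NLU analyses per workflow activation. The minimal Python sketch below assumes a hypothetical workflow containing only ML model blocks plus, in advanced mode, exactly one upstream symbolic or knowledge model block; the function name is illustrative and not part of NL Flow.

```python
def nlu_analyses_per_activation(ml_model_blocks: int, advanced_mode: bool) -> int:
    """Number of NL Core (NLU) analyses for one workflow activation.

    Basic mode: each ML model block embeds NL Core, so the text is analyzed
    once per block. Advanced mode: one upstream symbolic or knowledge model
    block analyzes the text once and feeds every ML model block.
    """
    if ml_model_blocks < 0:
        raise ValueError("ml_model_blocks must be non-negative")
    if ml_model_blocks == 0:
        return 0
    return 1 if advanced_mode else ml_model_blocks


if __name__ == "__main__":
    for advanced in (False, True):
        mode = "advanced" if advanced else "basic"
        print(mode, nlu_analyses_per_activation(3, advanced))
```

For three ML models, this gives three analyses in basic mode and one in advanced mode.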
Symbolic models
Symbolic models use NL Core to get symbolic information about the input text and to evaluate categorization or extraction rules that use that symbolic information as their operands.
Note
Since the rules are human-readable, a person can always understand why a result was produced; this is why these models are called explainable.
Platform-generated symbolic models
The Platform authoring application allows for the automatic generation of symbolic models for categorization, extraction and thesaurus projects. Symbolic rules are produced as the result of model training.
The inner structure of the workflow block for a Platform-generated symbolic model is illustrated below.
The model block is made of an instance of NL Core.
It expects a JSON containing plain text as its input; any other input is ignored. The input can also include options affecting NL Core.
NL Core performs NLU analysis of the text, extracting text features that are then used by NL Core itself as the operands of automatically generated symbolic rules. The type of rules—categorization or extraction—depends on the Platform project type: categorization for categorization projects, extraction for extraction and thesaurus projects.
The main output of the block consists of the predictions (categories or extractions) plus the basic features of the text extracted by NL Core. Optionally, more of those features can be output, depending on the block's functional parameters.
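The relationship between Platform project type, generated rule type and the output array that the resulting predictions fill (`categories` or `extractions`, as described for the Apply rules property) can be summarized as a lookup. This is an illustrative Python sketch; the enum and function names are hypothetical and not part of any NL Flow API.

```python
from enum import Enum


class ProjectType(Enum):
    # Hypothetical names for the three Platform project types mentioned above.
    CATEGORIZATION = "categorization"
    EXTRACTION = "extraction"
    THESAURUS = "thesaurus"


def generated_rule_type(project_type: ProjectType) -> str:
    """Type of symbolic rules the Platform generates for a project type."""
    # Categorization projects produce categorization rules;
    # extraction and thesaurus projects both produce extraction rules.
    if project_type is ProjectType.CATEGORIZATION:
        return "categorization"
    return "extraction"


def prediction_key(project_type: ProjectType) -> str:
    """Output array filled by the predictions of the generated rules."""
    if generated_rule_type(project_type) == "categorization":
        return "categories"
    return "extractions"


if __name__ == "__main__":
    for pt in ProjectType:
        print(pt.value, generated_rule_type(pt), prediction_key(pt))
```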
Studio-generated symbolic models
If more human supervision is needed, symbolic models can be refined, or even designed from scratch, using Studio. The resulting models can then be uploaded to the Platform authoring application to test them, and to NL Flow to use them in workflows.
When needed, Studio lets you generate models that combine categorization and extraction rules, something that is not possible with Platform-generated models. Studio also allows you to fully exploit all the capabilities of NL Core, such as segmentation rules and custom JavaScript.
The inner structure of the workflow block for a Studio-generated symbolic model is illustrated below.
The block accepts a JSON containing plain text, "enriched" text (that is, text with graphical layout information, the output of the Extract Converter processor) or text divided into sections, plus optional additional input such as side-by-side document data and options. See the next article for a description of the input JSON.
NL Core performs NLU analysis of the text, producing text features that are used by NL Core itself as the operands of hand-written categorization or extraction rules. Hand-written rules can exploit the full expressiveness of the rules language, something that automatically generated rules do not.
NL Core also executes any JavaScript code with which the model developers can affect the document processing pipeline, for example by pre-processing the input text and/or post-processing the results, possibly taking into account custom options passed as input and producing extra output.
The main output of the block consists of the predictions (categories and/or extractions) plus the basic features of the text extracted by NL Core. Optionally, more text features, layout information and any extra data generated by the JavaScript code can be output, depending on the block's functional parameters.
Knowledge Models
Knowledge Models are built-in symbolic models that cover a variety of text analysis use cases.
Their inner structure is the same as that of Studio-generated symbolic models.
Learn more about them in the dedicated section of this manual.
Models availability
Every model created with the Platform authoring application is made available to NL Flow by publishing it.
In the authoring application, models can also be exported so that they can later be imported into the authoring application and NL Flow of the same or another installation.
Studio models can be imported into NL Flow. The model import functionality is available in the Models view of the main dashboard and in the My Models bar of the editor.
Knowledge Models are always available.
Block properties
Block properties can be set by editing the block.
Hosting service versions
To correctly interpret the properties of a model block, consider that for each of the available model types (knowledge model, ML, symbolic) you can choose among several versions of the software service hosting the block:
- The latest version
- Older versions, back to 1.0.0
When designing new workflows, always use the latest version: older versions are kept for backward compatibility with workflows produced with previous versions of NL Flow.
The properties of a block change slightly depending on the version of the hosting service; the differences are highlighted on a case-by-case basis below.
Note
All model block properties below are described with reference to the latest version of the software service hosting the block.
General properties
The general properties of any model block are displayed at the top of the properties pop-up. They are:
- The block name, which can be edited.
- The version of the software service that hosts the block (read only).
- For ML Models only: the version of the model.
- The block ID (read only).
Functional properties
Functional properties can be checked and set in the Functional tab of the properties pop-up.
- Apply rules: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model apply symbolic categorization and extraction rules, filling the `categories` and the `extractions` arrays based on the type of rules that are defined inside the model and are triggered by text features. If Apply rules is turned off, the `categories` and the `extractions` arrays are returned empty.
- CPK: (applies to symbolic model blocks) the symbolic model of the block.
- Desired Namespace: (applies to ML model blocks in advanced mode) overrides the value of the `namespace` property of predictions (categories and extractions).
- Knowledge Model ML Engine: output namespace: (applies to ML model blocks in basic mode) overrides the value of the `namespace` property of predictions (categories and extractions).
- ML Engine: Output non-winner categories max number: (applies to ML model blocks in basic mode) maximum number of low-score categories, that is those with the `winner` property set to `false`, to output.
- ML Engine: Output predictions explanations: (applies to ML model blocks in basic mode) enriches the output with information about the text features that led to the prediction of categories and/or extractions.
- ML Engine: Output winner categories only: (applies to ML model blocks in basic mode) outputs only categories with relatively high scores, that is those with the `winner` property set to `true`.
- ML Engine: propagate document content to output: (applies to ML model blocks in basic mode) makes the model return the `content` output key, which is the echo of the input key with the same name.
- ML Model: (applies to ML model blocks) the predictive model of the block.
- Only winners: (applies to ML model blocks in advanced mode) outputs only categories with the relatively highest scores, that is those with the `winner` property set to `true`.
- Output all decimals in scores: (applies to knowledge model blocks and symbolic model blocks with the latest version of the hosting service) returns categorization and extraction scores with all their decimal digits instead of at most two.
- Output dependency tree: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the `pos`, `dependency` and `morphology` properties for each item of the `tokens` output array.
- Output document data: (applies to knowledge model blocks with the latest version of the hosting service, symbolic model blocks, and ML model blocks in basic mode with the latest version of the hosting service¹) makes the model return the `documentData` output key.
- Output explanations: (applies to knowledge model blocks with the latest version of the hosting service, symbolic model blocks, and ML model blocks in basic mode with the latest version of the hosting service¹) enriches the output with information about the symbolic rules that were triggered and led to the prediction of categories and/or extractions.
- Output external ids: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) inside the Knowledge Graph used by NL Core, concepts, called syncons, are identified by a unique number. This number is the value of the `syncon` property of the items of the `tokens` and `knowledge` output keys.
  Syncons have further identification numbers, so-called external identifiers (one or more), that are not shown by default in the model output.
  When turned on, this property adds to each item of the `knowledge` array an `externalIds` array listing those external identifiers.
  Turning on this property is effective only if the Output knowledge property is turned on too (see below).
- Output knowledge: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the `knowledge` output key.
- Output layout information: (applies to knowledge model blocks with the latest version of the hosting service and symbolic model blocks¹) makes the model return the `layoutData` output key.
- Output namespace metadata: (applies to knowledge model blocks with the latest version of the hosting service, symbolic model blocks, and ML model blocks in basic mode with the latest version of the hosting service¹) makes the model return the `namespaces` output key.
- Output non-winner categories max number: (applies to ML model blocks in advanced mode) maximum number of low-score categories, that is those with the `winner` property set to `false`, to output.
- Output predictions explanations: (applies to ML model blocks in advanced mode) enriches the output with information about the text features that led to the prediction of categories and/or extractions.
- Output relations: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the `relations` output key.
- Output relevants: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the most important elements of the input text, that is the `topics`, `mainSentences`, `mainPhrases`, `mainSyncons` and `mainLemmas` output keys.
- Output rules extra data: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the `extraData` output key. Only models generated with Studio can optionally produce extra data.
- Output segments: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the `segments` output key. Only Studio-generated symbolic models can optionally detect and output segments.
- Output sentiment: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) when turned on, makes the model return the `sentiment` output key.
- Propagate Content to Output: (applies to ML model blocks in advanced mode) makes the model return the `content` output key, which is the echo of the input key with the same name.
- Propagate Symbolic Categories and Extractions To Output: (applies to ML model blocks in basic mode with v. 1.0.0 of the hosting service) adds any categories and extractions output by NL Core to the `categories` and `extractions` output keys, respectively.
- Propagate symbolic engine output: (applies to ML model blocks in basic mode) when this option is turned on, NL Core output is included in the overall block output.
- Propagate Symbolic to Output: (applies to ML model blocks in advanced mode) when this option is turned on, the input to the block—containing text features and coming from an upstream symbolic model block—is included in the overall block output.
- Required user properties for syncons (comma separated list): (applies to knowledge model blocks and symbolic model blocks) lets you specify the user data to include in the items of the output `knowledge` array, as a comma-separated list of user data names.
- Return all decimals numbers in scores: (applies to ML model blocks in basic mode with the latest version of the hosting service) returns categorization and extraction scores with all their decimal digits instead of at most two.
- Rules output namespace: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) overrides the value of the `namespace` property of predictions (categories and extractions).
- Rules output user properties for syncons (comma separated list): (applies to ML model blocks in basic mode) lets you specify the user data to include in the items of the output `knowledge` array generated by NL Core, as a comma-separated list of user data names.
- Sub-document segmentation strategy: (applies to ML model blocks with the latest version of the hosting service) strategy used to identify the sub-documents for which categories are predicted. Auto-ML categorization models for which the Enable strict "Sub document categorization" compatibility mode option has been enabled can predict categories for each sub-document found in their input text, returning, for each category, the boundaries of the sub-document in the overall text. This parameter determines what a sub-document is. Possible values are:
  - None: sub-documents, if any, are ignored and the predictions refer to the whole document text.
  - Extract Converter title: when the input to the model is the output of an Extract Converter block, a sub-document is:
    - Any sequence of layout blocks that begins with a block of type title and ends either at the end of the document or immediately before another block of type title.
    - Any table.
    Consecutive title-type blocks are treated as a whole; header and footer blocks are ignored.
  - Extract Converter block: when the input to the model is the output of an Extract Converter block, the sub-documents are all the blocks of the layout except those of type header and footer.
  - CPK segments: when the input to the model comes from a symbolic model originally generated with Studio and that model is able to detect and output text segments, the sub-documents are the segments present in the output of that model.
- Synchronize positions to original text: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in basic mode) model output can contain the positions (start, end) of elements in the text, for example the positions of the parts of the text that triggered a category or of the parts that correspond to extractions.
  The input text can be changed by the model itself before being analyzed: for example, sequences of new line characters can be reduced to one new line character, or multiple consecutive space characters collapsed into one. If made with Studio, the model can also perform find-and-replace operations through JavaScript before analyzing the text.
  By default, positions refer to the analyzed text, which can therefore differ from the original, and the analyzed text is returned in the `content` output key, so positions are always accurate when used to highlight parts of the analyzed text. Turn this option on only if you need positions that refer to the original input text. When the option is turned on, the model rebases positions so that they are accurate for the original text, and returns the original input text instead of the analyzed text.
  Warning
  Position rebasing can have some inaccuracies, especially if the model makes heavy changes to the input text before analyzing it.
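The Extract Converter title value of the Sub-document segmentation strategy property describes a small grouping algorithm. The Python sketch below is one possible reading of it, under stated assumptions: the `LayoutBlock` type and its `kind`, `start` and `end` fields are hypothetical stand-ins for the Extract Converter layout output, blocks before the first title are not part of any title sub-document, and a table counts both as part of the surrounding title sequence and as a sub-document of its own.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayoutBlock:
    kind: str   # hypothetical block types: "title", "text", "table", "header", "footer"
    start: int  # offset of the block's first character in the document text
    end: int    # offset just past the block's last character


IGNORED_KINDS = {"header", "footer"}


def title_subdocuments(blocks: List[LayoutBlock]) -> List[Tuple[int, int]]:
    """Sub-document boundaries under the "Extract Converter title" strategy."""
    content = [b for b in blocks if b.kind not in IGNORED_KINDS]
    spans: List[Tuple[int, int]] = []
    current: List[LayoutBlock] = []
    prev_was_title = False
    for block in content:
        if block.kind == "title" and not prev_was_title:
            # A title that does not follow another title opens a new
            # sub-document and closes the previous one.
            if current:
                spans.append((current[0].start, current[-1].end))
            current = [block]
        elif current:
            # Consecutive titles are merged; any other block extends the
            # current sub-document. Blocks before the first title are skipped.
            current.append(block)
        prev_was_title = block.kind == "title"
    if current:
        spans.append((current[0].start, current[-1].end))
    # Every table is also a sub-document on its own.
    spans.extend((b.start, b.end) for b in content if b.kind == "table")
    return sorted(set(spans))


if __name__ == "__main__":
    sample = [
        LayoutBlock("header", 0, 10),
        LayoutBlock("text", 10, 20),
        LayoutBlock("title", 20, 30),
        LayoutBlock("title", 30, 35),
        LayoutBlock("text", 35, 60),
        LayoutBlock("table", 60, 80),
        LayoutBlock("title", 80, 90),
        LayoutBlock("text", 90, 100),
        LayoutBlock("footer", 100, 110),
    ]
    print(title_subdocuments(sample))
```

In the sample, the two consecutive titles at offsets 20 and 30 open a single sub-document that runs up to the next title, while the table also yields its own span.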
If all output functional parameters are off, all model blocks based on NL Core still perform basic analysis of the input text and named entity recognition. This analysis produces the following output keys:
Input properties
The Input tab of the block properties pop-up lists the top-level keys that can be present in the JSON input of the block. The next article describes these keys in detail. All permitted properties are listed, but some of them may be mutually exclusive: if one of them is included in the input JSON, the others must be omitted. This is also explained in the next article.
If the block is the first block of a flow (or of the only flow) of the workflow and the workflow input is known to be compatible with it, the input properties are read-only and purely informative: they tell you what the workflow's input JSON must contain, that is, properties with the same names and types.
If, instead, the block is preceded by other blocks, or the format of the workflow's input has been explicitly defined for any reason, a drop-down list is added to each property to allow for input mapping.
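Conceptually, input mapping builds the block's input JSON by picking, for each input property, the upstream key selected in its drop-down list, while respecting mutually exclusive properties. The Python sketch below illustrates that idea only; it is not how NL Flow implements mapping, the function and parameter names are hypothetical, and the exclusive groups must be supplied by the caller because this article does not list them.

```python
from typing import Any, Dict, Iterable, Set


def map_block_input(
    upstream_output: Dict[str, Any],
    mapping: Dict[str, str],
    exclusive_groups: Iterable[Set[str]] = (),
) -> Dict[str, Any]:
    """Build a block's input JSON from an upstream block's output.

    mapping: block input property -> upstream output key chosen for it.
    exclusive_groups: sets of input properties that must not appear together.
    """
    # Copy each mapped value; upstream keys that are absent are skipped.
    block_input = {
        prop: upstream_output[source]
        for prop, source in mapping.items()
        if source in upstream_output
    }
    # Reject inputs that combine mutually exclusive properties.
    for group in exclusive_groups:
        present = sorted(group.intersection(block_input))
        if len(present) > 1:
            raise ValueError(f"mutually exclusive input properties: {present}")
    return block_input


if __name__ == "__main__":
    upstream = {"content": "Analyzed text", "knowledge": []}
    print(map_block_input(upstream, {"content": "content"}))
```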
Deployment properties
The block deployment properties of a model determine:
- The amount of computational resources (CPU and RAM) that each replica of the block or of each of its modules needs to work.
- The number of replicas of the block or of each of its modules.
- The processing timeout for the block or each of its modules.
- The number of threads of the software module that provides input to each replica of the block or of each of its modules when the workflow is published in asynchronous mode.
Deployment properties can be checked and changed in the Deployment tab of the block properties pop-up. For blocks with an older version of the hosting service, some properties may be found in the Type specific tab.
- Consumer Number: (applies to knowledge model blocks with the latest version of the hosting service, symbolic model blocks with the latest version of the hosting service and ML model blocks in advanced mode) number of threads of the software module that provides input to the block when the workflow is published in asynchronous mode.
- CPU: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in advanced mode) thousandths of CPU required (for example: 2000 = 2 CPUs) for each replica of the block.
- Memory: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in advanced mode) RAM required by each replica of the block, expressed in IEC units (Ki = kibibytes, Mi = mebibytes, Gi = gibibytes, etc.).
- ML Engine Consumer Number: (applies to ML model blocks in basic mode with the latest version of the hosting service) number of threads of the software module that provides input to the predictive model of the block when the workflow is published in asynchronous mode.
- ML Engine CPU: (applies to ML model blocks in basic mode) thousandths of CPU required by each replica of the predictive model of the block.
- ML Engine Memory: (applies to ML model blocks in basic mode) RAM required by each replica of the predictive model of the block, expressed in IEC units.
- ML Engine Replicas: (applies to ML model blocks in basic mode) number of replicas of the predictive model of the block.
- ML Engine Timeout: (applies to ML model blocks in basic mode) maximum time, expressed in minutes (m) or seconds (s), within which processing of input text features by the predictive model of the block must finish. If processing takes longer, the block generates an error.
- Replicas: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in advanced mode) number of replicas of the block.
- Symbolic Engine Consumer Number:
- Symbolic Engine CPU: (applies to ML model blocks in basic mode) thousandths of CPU required by each replica of the NL Core module of the block.
- Symbolic Engine Memory: (applies to ML model blocks in basic mode) RAM required by each replica of the NL Core module of the block, expressed in IEC units.
- Symbolic Engine Replicas: (applies to ML model blocks in basic mode) number of replicas of the NL Core module of the block.
- Symbolic Engine Timeout: (applies to ML model blocks in basic mode) maximum time, expressed in minutes (m) or seconds (s), within which processing of any input by the NL Core module of the block must finish. If processing takes longer, the block generates an error.
- Timeout: (applies to knowledge model blocks, symbolic model blocks and ML model blocks in advanced mode) maximum time, expressed in minutes (m) or seconds (s), within which processing of any input must finish. If processing takes longer, the block generates an error.
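Deployment values use three notations: CPU in thousandths of a CPU, memory in IEC units and timeouts in minutes or seconds. A minimal Python sketch that converts them, and totals what a block reserves across its modules and replicas, might look like the following. It assumes integer values only and per-replica requirements that simply multiply by the replica count; the function names are hypothetical helpers, not an NL Flow API.

```python
import re
from typing import Iterable, Tuple

# IEC binary multipliers; Ti is included as an assumption beyond Ki, Mi and Gi.
IEC_MULTIPLIERS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def cpu_cores(thousandths: int) -> float:
    """CPU value in thousandths of a CPU -> number of CPUs (2000 -> 2.0)."""
    return thousandths / 1000


def memory_bytes(value: str) -> int:
    """IEC memory string such as '512Mi' or '2Gi' -> bytes (integers only)."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", value.strip())
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    number, unit = match.groups()
    return int(number) * IEC_MULTIPLIERS[unit]


def timeout_seconds(value: str) -> int:
    """Timeout such as '5m' or '30s' -> seconds."""
    match = re.fullmatch(r"(\d+)([ms])", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    number, unit = match.groups()
    return int(number) * (60 if unit == "m" else 1)


def block_footprint(modules: Iterable[Tuple[int, str, int]]) -> Tuple[float, int]:
    """Total (CPUs, bytes) reserved by a block.

    modules: one (cpu_thousandths, memory, replicas) tuple per module, e.g.
    the NL Core module and the predictive model of a basic mode ML model block.
    """
    modules = list(modules)
    cpus = sum(cpu_cores(cpu) * replicas for cpu, _, replicas in modules)
    memory = sum(memory_bytes(mem) * replicas for _, mem, replicas in modules)
    return cpus, memory


if __name__ == "__main__":
    print(block_footprint([(2000, "4Gi", 2), (1000, "1Gi", 1)]))
    print(timeout_seconds("5m"))
```

For example, a basic mode ML model block with the hypothetical settings Symbolic Engine CPU 2000, Memory 4Gi, Replicas 2 and ML Engine CPU 1000, Memory 1Gi, Replicas 1 totals 5 CPUs and 9Gi of RAM.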
¹ The actual ability of the block to produce this output depends on the version of NL Core, which must be 4.12 or later. You can determine the version of NL Core for a knowledge model block or symbolic model block by selecting Show resources on the contextual menu of the block inside the editor, or by looking at the Resources area after selecting the model in the Models view of the main dashboard.
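When checking the "4.12 or later" requirement programmatically, compare version components as numbers rather than as strings, because the string "4.9" sorts after "4.12". The sketch below assumes the NL Core version is available as a dotted string of integers such as 4.12.1; the function name is hypothetical.

```python
def nl_core_at_least(version: str, minimum: tuple = (4, 12)) -> bool:
    """True if a dotted numeric NL Core version is >= minimum (default 4.12).

    Tuples of integers compare component by component, so (4, 9) < (4, 12),
    unlike the strings "4.9" and "4.12".
    """
    parts = tuple(int(p) for p in version.strip().split("."))
    return parts >= minimum


if __name__ == "__main__":
    for v in ("4.9", "4.11.9", "4.12", "4.12.3", "5.0"):
        print(v, nl_core_at_least(v))
```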