Model types and properties
Overview
Model blocks are the key components of NL Flow workflows.
Their main task is to use a model—either ML or symbolic—to make predictions about the input text in terms of categories—the topics covered in the text or the typology of the document inferred from its text—or of extracted information. Depending on their type, however, model blocks can yield other useful information as well.
Model types
These types of model can be used as blocks when designing a workflow:
Type | Block icon | Description |
---|---|---|
Basic mode ML models | ![]() | Platform-generated ML models, which embed NL Core for feature extraction |
Advanced mode ML models | ![]() | Platform-generated ML models used without NL Core |
Symbolic models | ![]() | Symbolic models, either generated with the Platform authoring application or with Studio |
Knowledge Models | ![]() | NL Flow built-in symbolic models |
Basic mode ML models
ML models are generated in the Platform authoring application. Basic mode refers to the way the corresponding blocks are put in the workflow, that is, without any changes to the model structure, which embeds NL Core.
A basic mode ML model block is made of two modules, as illustrated below: one or more replicas of NL Core and one or more replicas of the predictive ML model.
The input to the block is a JSON containing plain text, unless the model has been trained with annotations referring to the position of the text in the graphical rendering of the document. In that case the input JSON must contain text enriched with graphical layout information, that is, the output of the Extract Converter processor.
Input can also include options affecting NL Core.
NL Core performs NLU analysis of the text, extracting the text features with which the predictive ML model is then fed. The main output of the block is the predictions (categories or extractions) plus, optionally, selected portions of the output of NL Core, determined by the block's functional parameters.
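As a purely illustrative sketch, assuming a plain-text input variable named `text` and hypothetical category values, the exchange with a basic mode categorization block could look like this:

```json
{
  "text": "Quarterly revenue grew sharply thanks to new cloud subscriptions."
}
```

and the corresponding output, with the optional NL Core portions omitted:

```json
{
  "categories": [
    { "id": "20.01", "label": "Finance", "score": 92.5, "winner": true }
  ],
  "extractions": [],
  "language": "en",
  "version": "..."
}
```

The exact structure of the prediction items depends on the model.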
Advanced mode ML models
Advanced mode ML models are ML models that are put in a workflow with the Advanced mode option. The effect of this option is to remove NL Core, leaving only the predictive model. The inner structure of an advanced mode workflow block for an ML model is illustrated below.
The input to the model block is a JSON containing text features. In fact, predictions are based on text features, but since the block doesn't contain NL Core, it cannot extract them by itself.
To provide the block with the necessary features, a symbolic model or a Knowledge Model like NLP Core is placed in the workflow upstream of the advanced mode ML model block and its output is mapped to the input of the ML model.
The main output of the block is the predictions (categories or extractions) plus, optionally, the echo of the input text features.
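As a sketch only—the structure of the text features is produced by the upstream NL Core analysis and the key names shown here are hypothetical placeholders—the exchange could be imagined as:

```json
{
  "features": "<text features produced by the upstream symbolic model or Knowledge Model>"
}
```

with a possible output of:

```json
{
  "categories": [
    { "id": "20.01", "label": "Finance", "score": 88.1, "winner": true }
  ],
  "features": "<echo of the input text features, when enabled>"
}
```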
Advanced mode is meant for workflows requiring more than one ML model: if every ML model is put in the workflow with the advanced mode option, one upstream symbolic model is enough to feed all the advanced mode ML model blocks with text features, as illustrated in the picture below.
The result is a leaner and faster workflow: leaner because advanced mode ML models, lacking the NLU engine, require fewer computing resources to be published; faster because NLU analysis is performed only once per workflow activation instead of once per model.
Note
Advanced mode is available for ML models based on expert.ai ML technology version 3.0 or higher.
Symbolic models
Symbolic models use NL Core to get symbolic information about the input text and to evaluate categorization or extraction rules that use that symbolic information as their operands.
Note
Since the rules are human readable, it is always possible for a human to understand the reason for a result, hence these models are called explainable.
Platform-generated symbolic models
The Platform authoring application allows for the automatic generation of symbolic models for categorization, extraction and thesaurus projects. Symbolic rules are produced as the result of model training.
The inner structure of the workflow block for a Platform-generated symbolic model is illustrated below.
The model block is made of an instance of NL Core.
It expects a JSON containing plain text as its input; any other input is ignored. Input can also include options affecting NL Core.
NL Core performs NLU analysis of the text, extracting text features that are then used by NL Core itself as the operands of automatically generated symbolic rules. The type of rules—categorization or extraction—depends on the Platform project type: categorization for categorization projects, extraction for extraction and thesaurus projects.
The main output of the block is the predictions (categories or extractions) plus the basic features of the text extracted by NL Core. Optionally, it's possible to output more of those features, based on the block's functional parameters.
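For example, for an extraction project, the output could resemble the following sketch, where the namespace, template and field values are hypothetical and the record shape is simplified for illustration:

```json
{
  "categories": [],
  "extractions": [
    {
      "namespace": "contract_extractor",
      "template": "PARTIES",
      "fields": [
        { "name": "supplier", "value": "Acme Corp." }
      ]
    }
  ],
  "language": "en",
  "version": "..."
}
```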
Studio-generated symbolic models
If more human supervision is needed, symbolic models can be refined or even designed from scratch using Studio. Generated models can then be uploaded to the Platform authoring application, to test them, and to NL Flow, to use them in workflows.
When needed, Studio lets you generate models that combine categorization and extraction rules, something that is not possible with Platform-generated models. Studio also allows you to fully exploit all the capabilities of NL Core, like segmentation rules and custom JavaScript.
The inner structure of the workflow block for a Studio-generated symbolic model is illustrated below.
The block accepts a JSON with plain text, "enriched" text—that is, text with graphical layout information (the output of the Extract Converter processor)—or text divided into sections, plus optional additional input, like side-by-side document data and options. See the next article for a description of the input variables.
NL Core performs NLU analysis of the text, producing text features that are then used by NL Core itself as the operands of hand-written categorization or extraction rules. Hand-written rules can exploit all the expressiveness of the rules language, something that automatically generated rules do not.
NL Core also executes any JavaScript code with which the developers of the model can affect the document processing pipeline, for example pre-processing the input text and/or post-processing the results, possibly taking into account any custom options passed in input and producing extra output.
The main output of the block is the predictions (categories and/or extractions) plus the basic features of the text extracted by NL Core. Optionally, it's possible to output more text features, layout information and any extra data generated by the JavaScript code, based on the block's functional parameters.
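As a hedged sketch—values are invented and depend entirely on the rules and JavaScript code of the specific model—a Studio-generated model with the corresponding output options enabled could return something like:

```json
{
  "categories": [
    { "id": "COMPLAINT", "label": "Complaint", "winner": true }
  ],
  "extractions": [],
  "segments": [ "…segments detected by segmentation rules…" ],
  "tags": [ "…tags produced by tagging rules…" ],
  "extraData": { "note": "…any extra output produced by the model's JavaScript code…" },
  "language": "en",
  "version": "..."
}
```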
Knowledge Models
Knowledge Models are built-in symbolic models that cover a variety of text analysis use cases.
Their inner structure is the same as that of Studio-generated symbolic models.
Learn more about them in the dedicated section of this manual.
Models availability
Every model created with the Platform authoring application is made available to NL Flow by publishing it.
In the authoring application, models can also be exported to be later imported into the same or another installation of both the authoring application and NL Flow.
Studio models can be imported into NL Flow. The model import functionality is available in the Models view of the dashboard and in the My Models bar of the editor.
Knowledge Models are always available.
Block properties
Block properties can be set by editing the block.
Hosting service versions
When the workflow is published, the model of a block is run inside a hosting service that manages the communication to and from the model itself. The hosting service can be seen as a wrapper containing the actual model. In fact, you are free to edit a block and change its model without deleting the block and placing a new one: the model is a configuration property of the block.
In the inventory of components inside the editor there are multiple versions of the component for each model type (Knowledge Models, ML, symbolic) and each version corresponds to a version of the hosting service. You choose the component version, and hence the version of the hosting service, when you add the model block to the diagram. Old versions of the components are available for backward compatibility, to allow you to edit workflows designed with an older version of NL Flow.
The properties of a block change slightly depending on the version of the hosting service; the differences are highlighted on a case-by-case basis below.
When designing new workflows, always choose the latest version of the component, to have the model run in the latest version of the hosting service.
General properties
The general properties of any model block are displayed at the top of the properties pop-up. They are:
- The block name, which can be edited.
- The version of the software service that hosts the block (read only).
- For ML Models only: the version of the model (read only).
- The block ID (read only).
Functional properties
Functional properties can be checked and set in the Functional tab of the properties pop-up.
- Output entities (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `entities`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- Output tokenization (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of text subdivisions, that is `paragraphs`, `sentences`, `phrases` and `tokens`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- Output content (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `content`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- CPK (for symbolic models): the symbolic model of the block.
- Synchronize positions to original text (for symbolic models, basic mode ML models and Knowledge Models): toggles the synchronization of output positions to the original text. Many elements of the model's output can contain the position—start, end—of the portion of text they derive from, for example the parts of the text that triggered the prediction of a category or an extraction.
    Models can pre-process—and thus alter—the original input text before starting the prediction process: for example, sequences of new line characters can be reduced to one new line character or multiple consecutive space characters collapsed to one space character. Symbolic models made with Studio can also perform find-and-replace operations through JavaScript.
    By default, positions refer to the text that has been analyzed, which can therefore differ from the original. Analyzed text is returned in output in the `content` key, so if positions are used to highlight parts of the analyzed text, they are always accurate with respect to it.
    This option must be turned on only if the user needs positions referred to the original input text. When the option is turned on, the model post-processes positions so that they become accurate for the original text and returns the original input text instead of the analyzed text in the `content` key. A sketch of the difference is shown at the end of this section.

    Warning

    The process cannot be guaranteed to be error-free: the more the model pre-processes the original text, the more likely synchronized positions are to be wrong.
- Apply rules (for symbolic models, basic mode ML models and Knowledge Models): toggles the evaluation of any symbolic rules by NL Core. Rules can populate `categories`, `extractions`, `segments` and `tags`, based on their type.
- Rules output namespace (for symbolic models, basic mode ML models and Knowledge Models): value to give to the `namespace` property of predictions (categories and extractions). In ML models it only affects the output of NL Core, not that of the ML algorithm.
- Output relevants (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of the most important elements of the input text, that is `topics`, `mainSentences`, `mainPhrases`, `mainSyncons` and `mainLemmas`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- Output sentiment (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `sentiment`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- Output relations (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `relations`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- Output dependency tree (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `pos`, `dependency` and `morphology` for `tokens`. The effect of this property is subordinated to Output tokenization and, for ML models, to Propagate symbolic engine output.
- Output knowledge (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `knowledge`. For ML models, the effect of this property is subordinated to Propagate symbolic engine output.
- Output external ids (for symbolic models, basic mode ML models and Knowledge Models): toggles the addition of the `externalIds` property to the items of `knowledge`.
    Inside the Knowledge Graph used by NL Core, concepts—called syncons—are identified by a unique number. This number is the value of the `syncon` property of the items of the `tokens` and `knowledge` output keys.
    Syncons have further identification numbers—so-called external identifiers (one or more)—that are not shown by default in the model output.
    When this property is turned on, the `externalIds` array listing those identifiers is added to each item of the `knowledge` array.
    The effect of this property is subordinated to Output knowledge and, for ML models, to Propagate symbolic engine output.
- Output rules extra data (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of the `extraData` output key. Only models generated with Studio can optionally produce extra data.
- Output segments (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of the `segments` output key. Only Studio-generated symbolic models can optionally detect and output segments.
- Required user properties for syncons (comma separated list) (for symbolic models and Knowledge Models) or Rules output user properties for syncons (comma separated list) (for basic mode ML models): allows you to specify the user data you want to be included in the items of the output `knowledge` array. It's a comma separated list of user data names.
- Output explanations (for symbolic models, basic mode ML models and Knowledge Models blocks with the latest version of the hosting service¹): toggles the enrichment of the output with information about any symbolic rules that were triggered and led to the prediction of categories and/or extractions by NL Core.
- Output namespace metadata (for symbolic models, basic mode ML models and Knowledge Models blocks with the latest version of the hosting service¹): toggles the output of `namespaces`.
- Output document data (for symbolic models, basic mode ML models and Knowledge Models blocks with the latest version of the hosting service¹): toggles the output of `documentData`.
- Output layout information (for symbolic models and Knowledge Models with the latest version of the hosting service¹): toggles the output of `layoutData`.
- Output all decimals in scores (for symbolic models and Knowledge Models with the latest version of the hosting service) or Return all decimals numbers in scores (for basic mode ML models, depending on the version of the hosting service): returns categorization and extraction scores with all the decimal digits instead of a maximum of two.
- Output tags (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of `tags`.
- ML Model (for ML models, both basic mode and advanced): the predictive model of the block.
- ML Engine: propagate document content to output (for basic mode ML models) and Propagate Content to Output (for advanced mode ML models): toggles the propagation of the input document content to the block's output.
- ML Engine: output namespace (for basic mode ML models) or Desired namespace (for advanced mode ML models): toggles the overriding of the value of the `namespace` property of predictions (categories and extractions).
- ML Engine: Output winner categories only (for basic mode ML models) and Only winners (for advanced mode ML models): toggles the output of only the categories having relatively high scores, that is those with the `winner` property set to `true`.
- ML Engine: Output non-winner categories max number (for basic mode ML models) and Output non-winner categories max number (for advanced mode ML models): maximum number of low-score categories—those with the `winner` property set to `false`—to output.
- ML Engine: Output predictions explanations (for basic mode ML models) and Output predictions explanations (for advanced mode ML models): toggles the enrichment of the output with information about the text features that led to the prediction of categories and/or extractions.
- Sub-document segmentation strategy (for ML models, both in basic and in advanced mode, depending on the version of the hosting service): strategy used to identify the sub-documents on which to make category predictions.
    Auto-ML categorization models for which the Enable strict "Sub document categorization" compatibility mode option has been enabled can predict categories for each sub-document found in the input text, returning, for each category, the boundaries of the sub-document in the overall text. This parameter determines what a sub-document is. Possible values are:
    - None: sub-documents, if any, are ignored; the predictions refer to the whole document text.
    - Extract Converter title: when the input to the model is the output of an Extract Converter block, a sub-document is:
        - Any sequence of layout blocks that begins with a block of type title and ends either at the end of the document or immediately before another block of type title.
        - Any table.

        Consecutive title-type blocks are treated as a whole; header and footer blocks are ignored.
    - Extract Converter block: when the input to the model is the output of an Extract Converter block, the sub-documents are all the blocks of the layout except those of type header and footer.
    - CPK segments: when the input to the model comes from a symbolic model originally generated with Studio and that model is able to detect and output text segments, the sub-documents are the segments present in the output of that model.
- Propagate symbolic engine output (for basic mode ML models) or Propagate Symbolic to Output (for advanced mode ML models): toggles the inclusion of the NL Core output in the overall output. Basic mode ML models generate their own NL Core output, while advanced mode ML models receive it from an upstream block.
- Propagate Symbolic Categories and Extractions To Output (for basic mode ML models with v. 1.0.0 of the hosting service): toggles the addition of any categories and extractions produced by NL Core to the `categories` and `extractions` output keys.
If all output functional parameters are off, all model blocks based on NL Core still output `language` and `version`.
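For example—with invented text, a simplified extraction item and character offsets where `end` is exclusive—the effect of Synchronize positions to original text on a model that collapses repeated spaces could be sketched as follows. With the option off (default), positions refer to the analyzed text returned in `content`:

```json
{
  "content": "Payment due in 30 days.",
  "extractions": [
    { "value": "due", "positions": [ { "start": 8, "end": 11 } ] }
  ]
}
```

With the option on, `content` holds the original text (with its three consecutive spaces preserved) and the positions are re-synchronized to it:

```json
{
  "content": "Payment   due in 30 days.",
  "extractions": [
    { "value": "due", "positions": [ { "start": 10, "end": 13 } ] }
  ]
}
```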
Input properties
The input properties are managed in the Input tab of the block properties dialog. These properties correspond to the input variables.
All the variables are listed, but some of them are mutually exclusive: when some of them are set at execution time, the others must be omitted. This is explained in the next article.
Input properties are read-only—so only descriptive of the expected input—when the block is the first in a flow and the workflow's input has not been explicitly described. In that case the workflow's input JSON must contain keys whose name and type match a valid combination of input variables.
Otherwise, they are editable and must be set.
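For instance, if the first block of the flow is a symbolic model expecting plain text and the workflow's input has not been described, a minimal workflow input JSON could be the following, where `text` is assumed to be the name of the plain-text input variable (see the next article for the actual variable names):

```json
{
  "text": "The customer asked for a refund after the delivery was delayed."
}
```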
Deployment properties
The deployment properties of a model block determine:
- The amount of computational resources (CPU and RAM) that each replica of the block or of each of its modules needs to work.
- The number of replicas of the block or of each of its modules.
- The processing timeout for the block or each of its modules.
- The number of threads of the software module that provides input to each replica of the block or of each of its modules when the workflow is published in asynchronous mode.
Deployment properties are checked and changed in the Deployment tab of the block properties pop-up. In case of blocks with an old version of the hosting service, some properties may be found in the Type specific tab.
- Consumer Number (for Knowledge Model blocks with the latest version of the hosting service, symbolic model blocks with the latest version of the hosting service and ML model blocks in advanced mode): number of threads of the software module that provides input to the block when the workflow is published in asynchronous mode.
- CPU (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): thousandths of CPU required (for example: 2000 = 2 CPUs) for each replica of the block.
- Memory (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): RAM required by each replica of the block, expressed in IEC units (Ki = kibibytes, Mi = mebibytes, Gi = gibibytes, etc.).
- ML Engine Consumer Number (for basic mode ML models with the latest version of the hosting service): number of threads of the software module that provides input to the predictive model of the block when the workflow is published in asynchronous mode.
- ML Engine CPU (for basic mode ML models): thousandths of CPU required by each replica of the predictive model of the block.
- ML Engine Memory (for basic mode ML models): RAM required by each replica of the predictive model of the block, expressed in IEC units.
- ML Engine Replicas (for basic mode ML models): number of replicas of the predictive model of the block.
- ML Engine Timeout (for basic mode ML models): maximum time, expressed in minutes (m) or seconds (s), within which processing of input text features by the predictive model of the block must finish. If processing takes longer, the block generates an error.
- Replicas (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): number of replicas of the block.
- Symbolic Engine Consumer Number (for basic mode ML models with the latest version of the hosting service): number of threads of the software module that provides input to the NL Core module of the block when the workflow is published in asynchronous mode.
- Symbolic Engine CPU (for basic mode ML models): thousandths of CPU required by each replica of the NL Core module of the block.
- Symbolic Engine Memory (for basic mode ML models): RAM required by each replica of the NL Core module of the block, expressed in IEC units.
- Symbolic Engine Replicas (for basic mode ML models): number of replicas of the NL Core module of the block.
- Symbolic Engine Timeout (for basic mode ML models): maximum time, expressed in minutes (m) or seconds (s), within which processing of any input by the NL Core module of the block must finish. If processing takes longer, the block generates an error.
- Timeout (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): maximum time, expressed in minutes (m) or seconds (s), within which processing of any input must finish. If processing takes longer, the block generates an error.
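As a purely illustrative recap of the units involved—the values below are arbitrary examples, not recommendations—a symbolic model block could be configured like this:

```json
{
  "CPU": 2000,
  "Memory": "4Gi",
  "Replicas": 2,
  "Timeout": "30s",
  "Consumer Number": 4
}
```

Here 2000 thousandths of CPU correspond to 2 CPUs, 4Gi is 4 gibibytes of RAM per replica, and 30s means processing must finish within 30 seconds.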
¹ The actual ability of the block to produce this output depends on the version of NL Core, which must be 4.12 or later. You can determine the version of NL Core for a Knowledge Model block or symbolic model block by selecting Show resources on the contextual menu of the block inside the editor or by looking at the Resources area after selecting the model in the Models view of the dashboard.