Model types and properties

Overview

Model blocks are the key components of NL Flow workflows.
Their main task is to use a model, either ML or symbolic, to make predictions about the input text, either as categories (the topics covered in the text or the typology of the document inferred from its text) or as extracted information. Depending on their type, however, model blocks can yield much other useful information.

Model types

These types of model can be used as blocks when designing a workflow:

Type	Block icon	Description
Basic mode ML models		Platform-generated ML models, which embed NL Core for feature extraction
Advanced mode ML models		Platform-generated ML models used without NL Core
Symbolic models		Symbolic models, either generated with the Platform authoring application or with Studio
Knowledge Models		NL Flow built-in symbolic models

Basic mode ML models

ML models are generated in the Platform authoring application. Basic mode refers to the way the corresponding blocks are put in the workflow, that is, without any changes to the model structure, which embeds NL Core.
A basic mode ML model block is made of two modules, as illustrated below: one or more replicas of NL Core and one or more replicas of the predictive ML model.

The input to the block is a JSON containing plain text, unless the model has been trained with annotations referring to the position of the text in the graphical rendering of the document: in that case the input JSON must contain text enriched with graphical layout information, that is the output of the Extract Converter processor.
Input can also include options affecting NL Core.

NL Core performs NLU analysis of the text, extracting the text features with which the ML prediction model is then fed.

The main output of the block consists of the predictions (categories or extractions) plus, optionally, selected portions of the output of NL Core, determined by the block's functional parameters.

Advanced mode ML models

Advanced mode ML models are ML models that are put in a workflow with the Advanced mode option. The effect of this option is to remove NL Core, leaving only the predictive model. The inner structure of an advanced mode workflow block for an ML model is illustrated below.

The input to the model block is a JSON containing text features. In fact, predictions are based on text features, but since the block doesn't contain NL Core, it cannot extract them by itself.

To provide the block with the necessary features, a symbolic model or a Knowledge Model like NLP Core is placed in the workflow upstream of the advanced mode ML model block and its output is mapped to the input of the ML model.

The main output of the block consists of the predictions (categories or extractions) plus, optionally, the echo of the input text features.

Advanced mode is meant for workflows requiring more than one ML model: if every ML model is put in the workflow with the advanced mode option, one upstream symbolic model is enough to feed all the advanced mode ML model blocks with text features, as illustrated in the picture below.

The result is a leaner and faster workflow: leaner because advanced mode ML models, lacking the NLU engine, require fewer computing resources to be published; faster because NLU analysis is performed only once per workflow activation instead of once per model.
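
The saving can be made concrete by counting NLU analyses per workflow activation. The sketch below is only an illustration: the function name and its parameters are hypothetical and are not part of NL Flow, and it assumes that a single upstream symbolic model feeds every advanced mode block.

```python
def nlu_analyses_per_activation(ml_model_blocks: int, advanced_mode: bool) -> int:
    """Count how many times NLU analysis runs for one workflow activation.

    Hypothetical helper, not an NL Flow API. Assumes that in advanced mode
    a single upstream symbolic model feeds every ML model block.
    """
    if ml_model_blocks < 0:
        raise ValueError("ml_model_blocks must be non-negative")
    if ml_model_blocks == 0:
        return 0
    # Basic mode: every block embeds its own NL Core, so one analysis per block.
    # Advanced mode: one shared upstream analysis, whatever the number of blocks.
    return 1 if advanced_mode else ml_model_blocks


basic = nlu_analyses_per_activation(4, advanced_mode=False)
advanced = nlu_analyses_per_activation(4, advanced_mode=True)
print(basic, advanced)  # 4 1
```

With four ML models, basic mode runs the analysis four times while advanced mode runs it once, which is where the speed gain described above comes from.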

Note

Advanced mode is available for ML models based on expert.ai ML technology version 3.0 or higher.

Symbolic models

Symbolic models use NL Core to get symbolic information about the input text and to evaluate categorization or extraction rules that use that symbolic information as their operands.

Note

Since the rules are human readable, it is always possible for a human to understand the reason for a result, hence these models are called explainable.

Platform-generated symbolic models

The Platform authoring application allows for the automatic generation of symbolic models for categorization, extraction and thesaurus projects. Symbolic rules are produced as the result of model training.
The inner structure of the workflow block for a Platform-generated symbolic model is illustrated below.

The model block is made of an instance of NL Core.

It expects a JSON containing plain text as its input; any other input is ignored. Input can also include options affecting NL Core.

NL Core performs NLU analysis of the text, extracting text features that are then used by NL Core itself as the operands of automatically generated symbolic rules. The type of rules—categorization or extraction—depends on the Platform project type: categorization for categorization projects, extraction for extraction and thesaurus projects.

The main output of the block consists of the predictions (categories or extractions) plus the basic features of the text extracted by NL Core. Optionally, it's possible to output more of those features, based on the block's functional parameters.

Studio-generated symbolic models

If more human supervision is needed, symbolic models can be refined or even designed from scratch using Studio. Generated models can then be uploaded to the Platform's authoring application, to test them, and to NL Flow, to use them in workflows.
When needed, Studio lets you generate models that combine categorization and extraction rules, something that is not doable with Platform-generated models. Studio also allows you to fully exploit all the capabilities of NL Core, like segmentation rules and custom JavaScript.
The inner structure of the workflow block for a Studio-generated symbolic model is illustrated below.

The block accepts a JSON with plain text, "enriched" text, that is text with graphical layout information (the output of the Extract Converter processor) or text divided in sections, plus optional additional input, like side-by-side document data and options. See the next article for a description of the input variables.

NL Core performs NLU analysis of the text, producing text features that are used by NL Core itself as the operands of handwritten categorization or extraction rules. Handwritten rules can exploit all the expressiveness of the rules language, something that automatically generated rules do not.
NL Core also executes any JavaScript code with which the developers of the model can affect the document processing pipeline, for example pre-processing the input text and/or post-processing the results, possibly taking into account any custom options passed in input and producing extra output.

The main output of the block consists of the predictions (categories and/or extractions) plus the basic features of the text extracted by NL Core. Optionally, it's possible to output more text features, layout information and any extra data generated by the JavaScript code, based on the block's functional parameters.

Knowledge Models

Knowledge Models are built-in symbolic models that cover a variety of text analysis use cases.
Their inner structure is the same as that of Studio-generated symbolic models.
Learn more about them in the dedicated section of this manual.

Models availability

Every model created with the Platform authoring application is made available to NL Flow by publishing it.
In the authoring application models can also be exported to be later imported in the same or another installation of both the authoring application and NL Flow.
Studio models can be imported in NL Flow. The model import functionality is available in the Models view of the dashboard and in the My Models bar of the editor.
Knowledge Models are always available.

Block properties

Block properties can be set by editing the block.

Hosting service versions

When the workflow is published, the model of a block is run inside a hosting service that manages the communication to and from the model itself. The hosting service can be seen as a wrapper containing the actual model. In fact, you are free to edit a block and change its model without deleting the block and placing a new one: the model is a configuration property of the block.

In the inventory of components inside the editor there are multiple versions of the component for each model type (Knowledge Models, ML, symbolic) and each version corresponds to a version of the hosting service. You choose the component version, and hence the version of the hosting service, when you add the model block to the diagram. Old versions of the components are available for backward compatibility, to allow you to edit workflows designed with an older version of NL Flow.
The properties of a block change slightly depending on the version of the hosting service; the differences are highlighted on a case-by-case basis below.

When designing new workflows, always choose the latest version of the component, to have the model run in the latest version of the hosting service.

General properties

The general properties of any model block are displayed at the top of the properties pop-up. They are:

The block name, which can be edited.
The version of the software service that hosts the block (read only).
For ML Models only: the version of the model (read only).
The block ID (read only).

Functional properties

Functional properties can be checked and set in the Functional tab of the properties pop-up.

Output entities (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of entities. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
Output tokenization (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of text subdivisions, that is, paragraphs, sentences, phrases and tokens. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
Output content (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of content. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
CPK (for symbolic models): the symbolic model of the block.

Synchronize positions to original text (for symbolic models, basic mode ML models and Knowledge Models): toggles the synchronization of output positions with the input text.

Many elements of the model's output can contain the position (start and end) of the portion of text they derive from, for example the parts of the text that triggered the prediction of a category or an extraction, but also paragraphs, sentences, tokens, etc.
Models can pre-process—and thus alter—the input text before analyzing it: for example, sequences of new line characters are reduced to one new line character and multiple consecutive space characters between words are collapsed to one space character. The analyzed text can hence be shorter than the original input text.

When this parameter is off and Output content is on, the content key of the output JSON contains the result of pre-processing, that is the text that was actually analyzed. If the user needs to highlight portions of text for which positions are provided, they must use the value of the content output key instead of the input text, because positions are accurate with respect to the analyzed text, not the original text.
When this option is on, instead, positions are computed on the analyzed text but, before being returned, they are rebased so as to become accurate for the input text. There is no need to produce the content key because highlighting works fine with the input text. However, it remains possible to turn on Output content: in that case the content key is the echo of the input text.
If the content output key is produced, then, positions are always accurate with respect to its value.

To sum up:

Value of Synchronize positions to original text	Positions are accurate for:	`content` is necessary for highlighting?	`content`, when present, contains:
OFF	Analyzed text, possibly different from the input text	Yes	Analyzed text
ON	The input text	No	The input text

Warning

The position rebasing process cannot be guaranteed to be error-free: the more the model pre-processes the original text, the more likely it is that synchronized positions are wrong.
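
The difference between the two settings can be illustrated with a toy pre-processor. This is a minimal sketch, not the actual NL Core pre-processing: the functions `preprocess` and `rebase` are hypothetical, only runs of identical space or new line characters are collapsed, and positions are assumed to be half-open (start included, end excluded).

```python
from typing import List, Tuple


def preprocess(text: str) -> Tuple[str, List[int]]:
    """Collapse runs of spaces and runs of new lines (toy pre-processing).

    Returns the analyzed text and, for each analyzed character, the index
    of the character it comes from in the original text.
    """
    analyzed_chars: List[str] = []
    offsets: List[int] = []
    previous = ""
    for index, char in enumerate(text):
        if char in (" ", "\n") and char == previous:
            continue  # drop the repeated space or new line
        analyzed_chars.append(char)
        offsets.append(index)
        previous = char
    return "".join(analyzed_chars), offsets


def rebase(start: int, end: int, offsets: List[int]) -> Tuple[int, int]:
    """Map a half-open span on the analyzed text back to the original text."""
    return offsets[start], offsets[end - 1] + 1


original = "Acme  Corp\n\n\nhired   Ann"
analyzed, offsets = preprocess(original)

# Positions as computed on the analyzed text (setting OFF).
start, end = analyzed.index("Ann"), analyzed.index("Ann") + len("Ann")
# The same positions rebased on the input text (setting ON).
synced_start, synced_end = rebase(start, end, offsets)
```

In this example the span found on the analyzed text is (16, 19): it selects "Ann" in the analyzed text, but the wrong characters if applied to the original. The rebased span (21, 24) selects "Ann" in the original text, which is why `content` is needed for highlighting only when the setting is off.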

Apply rules (for symbolic models, basic mode ML models and Knowledge Models): toggles the evaluation of any symbolic rules by NL Core. Rules can populate categories, extractions, segments and tags, based on their type.
Rules output namespace (for symbolic models, basic mode ML models and Knowledge Models): value to give to the namespace property of predictions (categories and extractions). In ML models it only affects the output of NL Core, not that of the ML algorithm.
Output relevants (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of the most important elements of the input text, that is topics, mainSentences, mainPhrases, mainSyncons and mainLemmas. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
Output sentiment (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of sentiment. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
Output relations (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of relations. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
Output dependency tree (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of pos, dependency and morphology for tokens. The effect of the value of this property is subordinated to Output tokenization and, for ML models, to Propagate symbolic engine output.
Output knowledge (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of knowledge. For ML models, the effect of the values of this property is subordinated to Propagate symbolic engine output.
Output external ids (for symbolic models, basic mode ML models and Knowledge Models): toggles the addition of the externalIds property to the items of knowledge.

Inside the Knowledge Graph used by NL Core, concepts—called syncons—are identified by a unique number. This number is the value of the syncon property of the items of the tokens and of the knowledge output keys.
Syncons have further identification numbers, so-called external identifiers (one or more) that are not shown by default in the model output.
When turned on, this property determines the addition to the output, for each item of the knowledge array, of the externalIds array listing those identifiers.
The effect of the values of this property is subordinated to Output knowledge and, for ML models, to Propagate symbolic engine output.
Output rules extra data (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of the extraData output key. Only models generated with Studio can optionally produce extra data.
Output segments (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of the segments output key. Only Studio-generated symbolic models can optionally detect and output segments.
Required user properties for syncons (comma separated list) (for symbolic models and Knowledge Models) or Rules output user properties for syncons (comma separated list) (for basic mode ML models): it allows you to specify the user data you want to be included in the items of the output knowledge array. It's a comma separated list of user data names.
Output explanations (for symbolic model, basic mode ML model and Knowledge Model blocks with the latest version of the hosting service¹): toggles the enrichment of the output with information about any symbolic rules that were triggered and led to the prediction of categories and/or extractions by NL Core.
Output namespace metadata (for symbolic model, basic mode ML model and Knowledge Model blocks with the latest version of the hosting service¹): toggles the output of namespaces.
Output document data (for symbolic model, basic mode ML model and Knowledge Model blocks with the latest version of the hosting service¹): toggles the output of documentData.
Output layout information (for symbolic models and Knowledge Models with the latest version of the hosting service¹): toggles the output of layoutData.
Output all decimals in scores (for symbolic models and Knowledge Models with the latest version of the hosting service) or Return all decimals numbers in scores (for basic mode ML models, depending on the version of the hosting service): returns categorization and extraction scores with all the decimal digits instead of a maximum of two.
Output tags (for symbolic models, basic mode ML models and Knowledge Models): toggles the output of tags.
Enable RPC Mode (for symbolic models): toggles the Remote Procedure Call (RPC) mode.
When RPC mode is enabled, the software that "wraps" the model uses the model itself through a remote procedure call instead of a direct call. This is a decoupling technique that can cause a slightly higher processing time, but allows the wrapper to interrupt the model in case of timeout (see the deployment properties below). If RPC mode is disabled, a block replica can remain unresponsive until the previous processing completes, even if a timeout occurred.
ML Model (for ML models, both basic mode and advanced): the predictive model of the block.
ML Engine: propagate document content to output (for basic mode ML models) and Propagate Content to Output (for advanced mode ML models): toggles the propagation of the document content to the output.
ML Engine: output namespace (for basic mode ML models) or Desired namespace (for advanced mode ML models): toggles the overriding of the value of the namespace property of predictions (categories and extractions).
ML Engine: Output winner categories only (for basic mode ML models) and Only winners (for advanced mode ML models): toggles the output of the sole categories having relatively high scores, that is those with the winner property set to true.
ML Engine: Output non-winner categories max number (for basic mode ML models) and Output non-winner categories max number (for advanced mode ML models): maximum number of low score categories—those with the winner property set to false—to output.
ML Engine: Output predictions explanations (for basic mode ML models) and Output predictions explanations (for advanced mode ML models): toggles the enrichment of the output with information about the text features that led to the prediction of categories and/or extractions.
Sub-document segmentation strategy (for ML models, both in basic and in advanced mode, depending on the version of the hosting service): strategy used to identify the sub-documents on which to make category prediction.

Auto-ML categorization models for which the Enable strict "Sub document categorization" compatibility mode option has been enabled can predict categories for each sub-document found in their input text, returning, for each category, the boundaries of the sub-document in the overall text. This parameter determines what a sub-document is. Possible values are:
- None: sub-documents, if any, are ignored, the predictions refer to the whole document text.
- Extract Converter title: when the input to the model is the output of an Extract Converter block, a sub-document is:
  - Any sequence of layout blocks that begins with a block of type title and ends either at the end of the document or immediately before another block of type title.
  - Any table.
  Consecutive title-type blocks are treated as a whole; header and footer blocks are ignored.
- Extract Converter block: when the input to the model is the output of an Extract Converter block, the sub-documents are all the blocks of the layout except those of type header and footer.
- CPK segments: when the input to the model comes from a symbolic model originally generated with Studio and that model is able to detect and output text segments, sub-documents are the segments that are present in the output of that model.
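
The Extract Converter title strategy can be sketched as a grouping of layout blocks. The code below is an illustration under stated assumptions, not the actual implementation: the block shape (a dictionary with `type` and `text` keys) and the function name are hypothetical, a table is assumed to form its own sub-document without being part of the surrounding titled section, and blocks that precede the first title are assumed to belong to no sub-document.

```python
from typing import Dict, List, Optional, Tuple

Block = Dict[str, str]  # hypothetical shape: {"type": ..., "text": ...}

IGNORED_TYPES = {"header", "footer"}


def split_by_title(blocks: List[Block]) -> List[List[Block]]:
    """Group layout blocks into sub-documents, title strategy (sketch)."""
    sections: List[Tuple[int, List[Block]]] = []  # (start index, blocks)
    current: List[Block] = []
    current_start = -1
    previous_kind: Optional[str] = None

    for index, block in enumerate(blocks):
        kind = block["type"]
        if kind in IGNORED_TYPES:
            continue  # header and footer blocks are ignored
        if kind == "table":
            sections.append((index, [block]))  # any table is a sub-document
        elif kind == "title":
            if previous_kind == "title":
                current.append(block)  # consecutive titles are one heading
            else:
                if current:
                    sections.append((current_start, current))
                current, current_start = [block], index
        elif current:
            current.append(block)  # body block inside a titled section
        previous_kind = kind

    if current:
        sections.append((current_start, current))
    sections.sort(key=lambda item: item[0])  # restore document order
    return [section for _, section in sections]


sample = [
    {"type": "header", "text": "H"},
    {"type": "title", "text": "Intro"},
    {"type": "title", "text": "Part 1"},
    {"type": "paragraph", "text": "a"},
    {"type": "table", "text": "t"},
    {"type": "paragraph", "text": "b"},
    {"type": "footer", "text": "F"},
    {"type": "title", "text": "Next"},
    {"type": "paragraph", "text": "c"},
]
sub_documents = split_by_title(sample)
```

With the sample above, the result has three sub-documents: the section opened by the two consecutive titles, the table, and the section opened by the last title. The Extract Converter block strategy is simpler: every block that is not a header or a footer is a sub-document on its own.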
Propagate symbolic engine output (for basic mode ML models) or Propagate Symbolic to Output (for advanced mode ML models): toggles the inclusion of the NL Core output in the overall output. Basic mode ML models generate their own NL Core output, while advanced mode ML models receive it from an upstream block.
Propagate Symbolic Categories and Extractions To Output (for basic mode ML models with v. 1.0.0 of the hosting service): toggles the addition of any categories and extractions produced by NL Core to categories and extractions output keys.
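
The combined effect of the two properties about winner and non-winner categories can be sketched as a filter over the predicted categories. This is only an illustration: the function name is hypothetical, the `label` and `score` keys are assumed names, and the sketch assumes that non-winner categories are kept by descending score and that the winners-only option takes precedence over the maximum number of non-winners.

```python
from typing import Dict, List, Union

Category = Dict[str, Union[str, float, bool]]  # assumed shape of a prediction


def select_categories(
    categories: List[Category], winners_only: bool, max_non_winners: int
) -> List[Category]:
    """Keep winners, plus at most max_non_winners low score categories."""
    winners = [c for c in categories if c["winner"]]
    if winners_only:
        return winners
    non_winners = sorted(
        (c for c in categories if not c["winner"]),
        key=lambda c: c["score"],
        reverse=True,  # assumption: the highest scoring non-winners are kept
    )
    return winners + non_winners[:max_non_winners]


predicted = [
    {"label": "sports", "score": 0.91, "winner": True},
    {"label": "politics", "score": 0.40, "winner": False},
    {"label": "finance", "score": 0.55, "winner": False},
    {"label": "travel", "score": 0.10, "winner": False},
]
only_winners = select_categories(predicted, winners_only=True, max_non_winners=2)
with_runners_up = select_categories(predicted, winners_only=False, max_non_winners=2)
```

Here `only_winners` contains just the "sports" category, while `with_runners_up` also contains "finance" and "politics", the two non-winners with the highest scores.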

If all output functional parameters are off, all model blocks based on NL Core still output language and version.
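
Several output properties only take effect when another property is on. The sketch below summarizes those dependencies for a subset of the properties described in this section. The setting names are hypothetical snake_case renderings of the property labels, and the returned strings are informal labels rather than exact output keys.

```python
from typing import Dict, Set

# Hypothetical setting names mapped to informal labels of what they enable.
SIMPLE_TOGGLES = {
    "output_entities": "entities",
    "output_content": "content",
    "output_relations": "relations",
    "output_sentiment": "sentiment",
}


def effective_nl_core_output(settings: Dict[str, bool], is_ml_model: bool) -> Set[str]:
    """Return which parts of the NL Core output a block would produce (sketch)."""
    produced = {"language", "version"}  # output even when every toggle is off
    # For ML models, NL Core toggles are subordinated to the propagation switch.
    if is_ml_model and not settings.get("propagate_symbolic_engine_output", False):
        return produced
    for setting, label in SIMPLE_TOGGLES.items():
        if settings.get(setting, False):
            produced.add(label)
    if settings.get("output_tokenization", False):
        produced.add("tokens")
        # The dependency tree enriches tokens, so it needs tokenization on.
        if settings.get("output_dependency_tree", False):
            produced.add("tokens.dependency")
    if settings.get("output_knowledge", False):
        produced.add("knowledge")
        # External identifiers are added to the items of knowledge.
        if settings.get("output_external_ids", False):
            produced.add("knowledge.externalIds")
    return produced


settings = {
    "output_entities": True,
    "output_dependency_tree": True,  # no effect: tokenization is off
    "output_knowledge": True,
    "output_external_ids": True,
}
symbolic_output = effective_nl_core_output(settings, is_ml_model=False)
ml_output = effective_nl_core_output(settings, is_ml_model=True)
```

With these settings a symbolic model block produces entities and knowledge with external identifiers, but no dependency tree. A basic mode ML model block with the same settings produces only language and version, because the propagation of the symbolic engine output is off.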

Input properties

The input properties are managed in the Input tab of the block properties dialog. These properties correspond to the input variables, described in the dedicated article.

Read the article dedicated to the topic to learn how to set input properties.

Deployment properties

The block deployment properties of a model determine:

The amount of computational resources (CPU and RAM) that each replica of the block or of each of its modules needs to work.
The number of replicas of the block or of each of its modules.
The processing timeout for the block or each of its modules.
The number of threads of the software module that provides input to each replica of the block or of each of its modules when the workflow is published in asynchronous mode.

Deployment properties are checked and changed in the Deployment tab of the block properties pop-up. In case of blocks with an old version of the hosting service, some properties may be found in the Type specific tab.

Consumer Number (for Knowledge Model blocks with the latest version of the hosting service, symbolic model blocks with the latest version of the hosting service and ML model blocks in advanced mode): number of threads of the software module that provides input to the block when the workflow is published in asynchronous mode.
CPU (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): thousandths of CPU required (for example: 2000 = 2 CPUs) for each replica of the block.
Memory (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): RAM required by each replica of the block, expressed in IEC units (Ki = kibibytes, Mi = mebibytes, Gi = gibibytes, etc.).
ML Engine Consumer Number (for basic mode ML models with the latest version of the hosting service): number of threads of the software module that provides input to the predictive model of the block when the workflow is published in asynchronous mode.
ML Engine CPU (for basic mode ML models): thousandths of CPU required by each replica of the predictive model of the block.
ML Engine Memory (for basic mode ML models): RAM required by each replica of the predictive model of the block, expressed in IEC units.
ML Engine Replicas (for basic mode ML models): number of replicas of the predictive model of the block.
ML Engine Timeout (for basic mode ML models): maximum time, expressed in minutes (m) or seconds (s), within which processing of input text features by the predictive model of the block must finish. If processing takes longer, the block generates an error.
Replicas (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): number of replicas of the block.
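Symbolic Engine Consumer Number (for basic mode ML models with the latest version of the hosting service): number of threads of the software module that provides input to the NL Core module of the block when the workflow is published in asynchronous mode.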
Symbolic Engine CPU (for basic mode ML models): thousandths of CPU required by each replica of the NL Core module of the block.
Symbolic Engine Memory (for basic mode ML models): RAM required by each replica of the NL Core module of the block, expressed in IEC units.
Symbolic Engine Replicas (for basic mode ML models): number of replicas of the NL Core module of the block.
Symbolic Engine Timeout (for basic mode ML models): maximum time, expressed in minutes (m) or seconds (s), within which processing of any input by the NL Core module of the block must finish. If processing takes longer, the block generates an error.
Timeout (for Knowledge Model blocks, symbolic model blocks and ML model blocks in advanced mode): maximum time, expressed in minutes (m) or seconds (s), within which processing of any input must finish. If processing takes longer, the block generates an error.
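
The units used by the deployment properties can be made explicit with a few conversions. The helpers below are a sketch, not part of NL Flow: the function names and the dictionary keys are hypothetical, and the value formats (for example `2Gi` for memory and `5m` for a timeout) are assumed from the descriptions above.

```python
import re
from typing import Dict, Union

IEC_FACTORS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def cpu_cores(thousandths: int) -> float:
    """Convert thousandths of CPU to CPUs (2000 -> 2.0)."""
    return thousandths / 1000


def memory_bytes(value: str) -> int:
    """Convert a memory value in IEC units, such as '512Mi', to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti)", value.strip())
    if match is None:
        raise ValueError(f"unsupported memory value: {value!r}")
    return int(match.group(1)) * IEC_FACTORS[match.group(2)]


def timeout_seconds(value: str) -> int:
    """Convert a timeout in minutes (m) or seconds (s) to seconds."""
    match = re.fullmatch(r"(\d+)([ms])", value.strip())
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 60 if unit == "m" else amount


def basic_mode_totals(block: Dict[str, Union[int, str]]) -> Dict[str, float]:
    """Sum CPU and RAM over all replicas of both modules of a basic mode block."""
    total_cpu, total_memory = 0.0, 0
    for module in ("ml_engine", "symbolic_engine"):  # hypothetical key prefixes
        replicas = int(block[f"{module}_replicas"])
        total_cpu += replicas * cpu_cores(int(block[f"{module}_cpu"]))
        total_memory += replicas * memory_bytes(str(block[f"{module}_memory"]))
    return {"cpu_cores": total_cpu, "memory_bytes": total_memory}


block = {
    "ml_engine_replicas": 2, "ml_engine_cpu": 1000, "ml_engine_memory": "2Gi",
    "symbolic_engine_replicas": 1, "symbolic_engine_cpu": 2000,
    "symbolic_engine_memory": "4Gi",
}
totals = basic_mode_totals(block)
```

For this block the total is 4 CPUs and 8 GiB of RAM: resources are requested per replica and per module, so raising the number of replicas multiplies the requirement.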

¹ The actual ability of the block to produce this output depends on the version of NL Core, which must be 4.12 or later. You can determine the version of NL Core for a Knowledge Model block or symbolic model block by selecting Show resources on the contextual menu of the block inside the editor or looking at the Resources area after selecting the model in the Models view of the dashboard.