Input for model blocks

First level keys

The top-level keys of the input JSON that a model block recognizes and can use depend on whether the model contains NL Core. If the model has this component, as is the case for symbolic models and basic mode ML models, it always recognizes these keys:

text (string)
sections (array)
sectionsText (object)
documentLayout (object)
options (object)

If the symbolic component is based on NL Core version 4.12 or later, the block also recognizes this key:

documentData (array)

Tip

You can determine the version of NL Core for a symbolic model by selecting Show resources in the editor or looking at the Resources area after selecting the model in the Models view of the main dashboard.

In general, the block always expects a text to analyze, so one key among text, sectionsText and documentLayout is mandatory (see details below), while the other keys are optional.

Advanced mode ML models don't have NL Core and the only input key they recognize is:

document (object)

In this case the block doesn't expect a text to analyze: instead it expects text features, that is, the outcome of the NLU analysis of a text.

text

text is the text that must be analyzed by NL Core.
When input mapping is needed, this key is typically mapped, through the corresponding text input property, to:

The modelName.document.content key of another model block.
The content key of a TikaTesseract Converter processor or a URL Converter processor block.

text is an alternative to documentLayout: if one of these keys is present in the input, the other must be omitted.
text can be complemented by sections and sectionsText for Studio-generated symbolic models whose rules can distinguish between text sections.

documentLayout

documentLayout is an object with the same structure as the result key of the Extract Converter processor output. A model using it is therefore typically preceded by an Extract Converter block, and this key is mapped, through the corresponding documentLayout input property, to that output key.
It must be used for Studio-generated symbolic models with rules that leverage layout information and for extraction ML models trained with layout-based annotations.

Note

Any model with NL Core recognizes this key and is able to derive plain text to analyze from it, but there is no point in passing layout information to a model that is not specialized to leverage it.

If documentLayout is present in the input JSON, text, sections and sectionsText—which are alternative means of giving input text to the block—must be omitted.
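
The rules above about which keys can coexist can be checked before sending the input to a block with NL Core. The following is only a minimal sketch: check_input_keys is a hypothetical helper, not part of the product, and it covers only the constraints stated in this page.

```python
def check_input_keys(payload):
    """Return a list of problems found in the top-level keys (empty if none)."""
    problems = []

    # At least one key that carries the text to analyze must be present.
    if not any(key in payload for key in ("text", "sectionsText", "documentLayout")):
        problems.append("one of text, sectionsText or documentLayout is required")

    # documentLayout excludes the other means of providing input text.
    if "documentLayout" in payload:
        for key in ("text", "sections", "sectionsText"):
            if key in payload:
                problems.append(key + " must be omitted when documentLayout is present")

    return problems


print(check_input_keys({"text": "Some text", "options": {}}))
print(check_input_keys({"documentLayout": {}, "text": "Some text"}))
```

An empty list means that the combination of keys is consistent with the rules described here; it says nothing about the validity of the values.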

sections

The sections key is optional and complementary to text. When present, it indicates the boundaries of text sections, for example:

"sections": [
    {
        "name": "TITLE",
        "start": 0,
        "end": 61
    },
    {
        "name": "BODY",
        "start": 62,
        "end": 2407
    }
]

Currently, only symbolic models designed with Studio can contain hand-written symbolic rules that account for sections. With multiple sections, such rules can be written so that they are triggered only by the text of a given section. Platform-generated rules, by contrast, all have the same scope, the entire input text, even if the input document has sections.

sections is an array. Each item corresponds to a section and it's an object with these properties:

name: section name.
start: zero-based position of the first character in the section inside the value of text.
end: zero-based position of the first character after the section inside the value of text.

If input mapping is needed, the corresponding sections input property is expected to be mapped to a key of the workflow input or to the modelName.document.sections property of an upstream model block which in turn received sections data.
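
Since start is the position of the first character of a section and end is the position of the first character after it, the boundaries can be computed from string lengths. The sketch below assumes a title and a body joined by a single newline; the sample strings are invented for illustration.

```python
title = "Inaugural address"
body = "We shall pay any price."
text = title + "\n" + body

# end is exclusive: it points at the first character after the section.
sections = [
    {"name": "TITLE", "start": 0, "end": len(title)},
    {"name": "BODY", "start": len(title) + 1, "end": len(text)},
]

# With an exclusive end, a plain slice recovers each section exactly.
for section in sections:
    print(section["name"], repr(text[section["start"]:section["end"]]))
```

The same convention explains the example above, where TITLE ends at 61 and BODY starts at 62: the character at position 61 is the separator between the two sections.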

sectionsText

The sectionsText key is text to be analyzed divided into sections, for example:

"sectionsText": [
    {
        "name": "TITLE",
        "text": "This is a title"
    },
    {
        "name": "BODY",
        "text": "This is the body"
    }
]

sectionsText is an array of objects. Each object has these properties:

name: section name.
text: section text.

The model builds plain text to analyze by concatenating the values of the text properties of the array items using a newline character as a separator.
If the text key is also set, the text obtained from sectionsText is appended to the value of text, again using a newline character as a separator, so the model receives the concatenation of the two texts. The model also receives automatically computed section boundaries that refer to the concatenated text.

For example:

Value of text:

We shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe to assure the survival and success of liberty.

Value of sectionsText:

[
    {
        "name": "TITLE",
        "text": "President John F. Kennedy delivered his inaugural address"
    }
]

Concatenated plain text:

We shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe to assure the survival and success of liberty.
President John F. Kennedy delivered his inaugural address

Section boundaries:
- Section name: TITLE
- Start: 142
- End: 199
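
The boundaries in this example can be reproduced with a few lines of code. This is a sketch of the concatenation logic as described in this page, not the actual implementation inside the model; concatenate is a hypothetical helper name.

```python
def concatenate(text, sections_text):
    """Join text and the section texts with newlines and compute boundaries."""
    parts = [] if text is None else [text]
    # The first section starts right after text and its newline separator.
    offset = 0 if text is None else len(text) + 1
    boundaries = []
    for item in sections_text:
        start = offset
        end = start + len(item["text"])  # first character after the section
        boundaries.append({"name": item["name"], "start": start, "end": end})
        parts.append(item["text"])
        offset = end + 1  # skip the newline before the next section
    return "\n".join(parts), boundaries


text = ("We shall pay any price, bear any burden, meet any hardship, "
        "support any friend, oppose any foe to assure the survival and "
        "success of liberty.")
sections_text = [
    {"name": "TITLE",
     "text": "President John F. Kennedy delivered his inaugural address"}
]

plain_text, boundaries = concatenate(text, sections_text)
print(boundaries)  # [{'name': 'TITLE', 'start': 142, 'end': 199}]
```

The value of text is 141 characters long, the newline separator occupies position 141, so the TITLE section starts at 142 and, being 57 characters long, ends at 199.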

When input mapping is needed, the corresponding sectionsText input property is mapped to one key of the workflow input or to the modelName.document.sectionsText property of a model block which in turn received that data.

options

The options object contains optional parameters that can be passed to the model to influence its behavior. They mainly affect NL Core.

The most extensive structure that this object can have is this:

"allCategories": boolean,
"custom": object,
"disambiguation": {
  "flags": number
},
"output": object,
"rules": object

or, for old models, this:

"allCategories": boolean,
"custom": object

Old models have NL Core version 4.11 or lower.

Tip

You can determine the version of NL Core for a symbolic model by selecting Show resources in the editor or looking at the Resources area after selecting the model in the Models view of the main dashboard. For basic mode ML models, the version of NL Core is tied to that of the ML engine, which is visible when you select the model from the list.

All the components of this structure are optional; they are described below.

allCategories

Retained for backwards compatibility, this option is equivalent to the allCategories property of the rules object.

custom

Retained for backwards compatibility, this option is equivalent to the customOptions property of the rules object.

disambiguation

This is an advanced option for NL Core.
It is meant to be used with the support of your expert.ai technical contact, should they determine that tuning low-level options can improve the quality of the NLU analysis.
When used, this option contains, in its only parameter flags, a number representing one or more disambiguation options. Multiple options are combined with a bitwise OR.
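
Combining options with a bitwise OR can be sketched as follows. The flag names and values below are purely hypothetical placeholders: the real values are not listed in this page and should be provided by your expert.ai technical contact.

```python
# Hypothetical flag values, for illustration only.
FLAG_A = 0x1
FLAG_B = 0x4

# Multiple options are combined into a single number with a bitwise OR.
options = {"disambiguation": {"flags": FLAG_A | FLAG_B}}

print(options)
```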

output

The most extensive structure that this object can have is the following:

 "output": {
    "analysis": string array,
    "features": string array,
    "knowledgeProperties": string array
}

All the components of this structure are optional.
These options affect the output of NL Core. The properties specified for this object override the values of corresponding functional properties of the model block. These are the correspondences:

analysis array items:

The presence of an item in the analysis array is equivalent to turning on the corresponding functional property.

Item value	Functional property
relevants	Output relevants
sentiment	Output sentiment
relations	Output relations
segments	Output segments

features array items:

The presence of an item in the features array is equivalent to turning on the corresponding functional property.

Item value	Functional property
syncpos	Synchronize positions to original text
dependency	Output dependency tree
knowledge	Output knowledge
externalIds	Output external ids
extradata	Output rules extra data
explanations	Output explanations
namespaces	Output namespace metadata
documentData	Output document data
layout	Output layout information

knowledgeProperties array: this array replaces the value of the Required user properties for syncons functional property.
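
As an illustration, the output object can be assembled and checked against the item values listed in the correspondences above. This is a minimal sketch: build_output_options is a hypothetical helper, and the user property name passed in the usage line is a placeholder.

```python
ANALYSIS_ITEMS = {"relevants", "sentiment", "relations", "segments"}
FEATURE_ITEMS = {
    "syncpos", "dependency", "knowledge", "externalIds", "extradata",
    "explanations", "namespaces", "documentData", "layout",
}


def build_output_options(analysis=(), features=(), knowledge_properties=()):
    """Build the output object, rejecting item values not listed in this page."""
    unknown = (set(analysis) - ANALYSIS_ITEMS) | (set(features) - FEATURE_ITEMS)
    if unknown:
        raise ValueError("unknown item values: " + ", ".join(sorted(unknown)))
    output = {}
    # All components are optional, so only non-empty arrays are emitted.
    if analysis:
        output["analysis"] = list(analysis)
    if features:
        output["features"] = list(features)
    if knowledge_properties:
        output["knowledgeProperties"] = list(knowledge_properties)
    return {"output": output}


# "myProperty" is a placeholder user property name.
options = build_output_options(
    analysis=["relevants", "sentiment"],
    features=["syncpos"],
    knowledge_properties=["myProperty"],
)
print(options)
```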

rules

The most extensive structure that this object can have is the following:

"rules": {
  "allCategories": boolean,
  "applyRules": boolean,
  "customOptions": object,
  "namespace": string
}

All the components of this structure are optional; they are described below.

allCategories

When its value is false, a categorization model returns only the categories with the highest scores, that is, those with the winner property set to true. The default value is true.
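
The effect of this option can be pictured as a filter on the winner property. The sketch below only mimics the documented behavior on sample data: apart from winner, the category fields (id, score) are hypothetical placeholders, and visible_categories is not a product function.

```python
# Hypothetical categorization results; only "winner" comes from this page.
categories = [
    {"id": "sports", "score": 120, "winner": True},
    {"id": "finance", "score": 15, "winner": False},
]


def visible_categories(categories, all_categories=True):
    """Mimic the documented effect of allCategories on the returned list."""
    if all_categories:
        return list(categories)
    return [category for category in categories if category["winner"]]


print(visible_categories(categories, all_categories=False))
```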

applyRules

The value of this option overrides that of the Apply rules functional property.

customOptions

This object can be used to convey custom options to Studio-generated symbolic models and thesaurus models that access them via specific JavaScript code.

scoreConfig

Thesaurus models are based on NL Core and contain automatically generated JavaScript code that implements a scoring algorithm that can affect the confidence score of the extracted concept.
The scoring algorithm is based on a configuration which can be changed with the scoreConfig property of customOptions.
The default configuration for the scoring algorithm corresponds to this scoreConfig object:

"scoreConfig": {
    "disableScore": false,
    "defaultScore": 1,
    "normalize": 100,
    "boostByHierarchy": {
        "byParent": 1,
        "byChildren": 0.5,
        "byRelated": 0.3
    },
    "boostByFrequency": true,
    "boostByLabel": {
        "matchPrefLabel": 1,
        "matchAltLabel": 0.5,
        "lengthMeasure": 0.1,
        "ignoreCase": true
    }
}

If you pass one or more of the configuration settings above to the model with non-default values, you affect the scoring algorithm.
These are the properties of the scoreConfig object:

disableScore (boolean, default value false): if true, a Studio-like scoring algorithm is used. All the other options are ignored, so you can omit them.
defaultScore (number, default value 1): default base score for all extractions. Ignored if boostByFrequency is true.
normalize (number, default value 100): the final score of extraction will be normalized to a value in the range between 0 and the value of this parameter. Use value 0 to disable score normalization.
boostByHierarchy: this property is an object whose properties are multiplication factors that are applied to the base score based on the relationship between the extracted concept and other concepts in the thesaurus.
- byParent (number, default value 1): applied for every broader concept that is also extracted from the text. Value 0 is interpreted as no multiplication.
- byChildren (number, default value 0.5): applied for every narrower concept that is also extracted from the text. Value 0 is interpreted as no multiplication.
- byRelated (number, default value 0.3): applied for every non-hierarchically related concept that is also extracted from the text. Value 0 is interpreted as no multiplication.
boostByFrequency (boolean, default value true): when true, the base score is the concept frequency in the text.
boostByLabel: this property is an object whose properties determine how the base score is affected by the relationship between the extracted text and the concept labels.
- matchPrefLabel (number, default value 1): multiplication factor applied to the base score if the matching text is the preferred label.
- matchAltLabel (number, default value 0.5): multiplication factor applied to the base score if the matching text is one of the alternative labels.
- lengthMeasure (number, default value 0.1): multiplication factor applied to the base score, further multiplied by the number of tokens (separated by spaces) in the match.
- ignoreCase (boolean, default value true): when true, the case is ignored when matching the text and the labels of the concept.
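
Since only non-default values change the behavior of the scoring algorithm, the input can carry just the settings that differ from the defaults. The following sketch, which assumes that omitted settings keep their default values, builds such a reduced scoreConfig and nests it under customOptions inside rules; non_default is a hypothetical helper.

```python
DEFAULT_SCORE_CONFIG = {
    "disableScore": False,
    "defaultScore": 1,
    "normalize": 100,
    "boostByHierarchy": {"byParent": 1, "byChildren": 0.5, "byRelated": 0.3},
    "boostByFrequency": True,
    "boostByLabel": {
        "matchPrefLabel": 1,
        "matchAltLabel": 0.5,
        "lengthMeasure": 0.1,
        "ignoreCase": True,
    },
}


def non_default(wanted, defaults):
    """Keep only the settings whose values differ from the defaults."""
    result = {}
    for key, value in wanted.items():
        if isinstance(value, dict):
            nested = non_default(value, defaults.get(key, {}))
            if nested:
                result[key] = nested
        elif key not in defaults or defaults[key] != value:
            result[key] = value
    return result


wanted = {
    "normalize": 0,             # disable score normalization
    "boostByFrequency": True,   # same as the default, so it is dropped
    "boostByHierarchy": {"byParent": 2, "byChildren": 0.5},
}
options = {
    "rules": {
        "customOptions": {
            "scoreConfig": non_default(wanted, DEFAULT_SCORE_CONFIG)
        }
    }
}
print(options)
```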

normalizeToConceptId

The normalizeToConceptId property of the customOptions object is a boolean that, when true, makes a thesaurus model add to its output extra data containing additional thesaurus information for extracted concepts.

namespace

The value of this option overrides that of the Rules output namespace functional property.

documentData

The documentData input key is optional and, when present, contains side-by-side information about the document that can be used by a symbolic model.

It is an array, each item of which represents one piece of information. The type of information is indicated by the mandatory type property:

disambiguation: a text token or a disambiguation which, for the text ranges indicated by positions (see below), overwrites the choices made by the model's text analysis.
entity: reserved for future use.
tag: a tag which, in the positions indicated by positions (see below), is added to any other tags that a CPK developed with Studio can produce via tagging rules or JavaScript and that the same CPK can exploit in categorization or extraction rules.
annotation: reserved for future use.

If type is disambiguation, the item also has a disambiguationOptions object property.
The disambiguationOptions object has a mandatory type property, which can be either token or semantic. If it's token, it's also the only property of the object and means that positions contains the ranges of one or more tokens that are alternative to those that the text analysis would find when tokenizing the text.
If type is semantic, instead, the remaining properties of the disambiguationOptions object specify an alternative disambiguation for the text ranges indicated in positions. These properties are:

baseForm: base form, that is the lemma
entityId: a numeric ID of choice, used to identify any documentData disambiguation item referred to the same named entity
extraData: reserved for future use
parentSyncon: identification number of the "parent" syncon in the Knowledge Graph

If type is tag, the item also has a tagOptions object property which in turn has these properties:

tag: name of the tag
value: optional value of the tag; if omitted, the values of the tag are the portions of text indicated in positions

Each item of the positions array is a characters range.
In the case of information of type disambiguation, if the sub-type is token, each range corresponds to a different token; if instead it is semantic, the ranges are occurrences of the concept.
In the case of tag type information, each range corresponds to an occurrence of the tag, possibly with the same value, if specified.
Each item of the array is an object with two properties, start and end, which must be set with the same logic as the positions of output elements.
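
A documentData item of type tag could be assembled as in the sketch below. It assumes that positions is a property of the item and that end is the position of the first character after the range, as for sections; tag_item, the sample text and the tag name are hypothetical.

```python
import re


def tag_item(text, phrase, tag, value=None):
    """Build a documentData item of type tag for every occurrence of phrase."""
    positions = [
        {"start": match.start(), "end": match.end()}  # end assumed exclusive
        for match in re.finditer(re.escape(phrase), text)
    ]
    tag_options = {"tag": tag}
    if value is not None:
        # When value is omitted, the tag values are the tagged portions of text.
        tag_options["value"] = value
    return {"type": "tag", "tagOptions": tag_options, "positions": positions}


text = "Acme hired Bob. Acme grew."
document_data = [tag_item(text, "Acme", "COMPANY")]
print(document_data)
```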

document

Blocks corresponding to ML models placed in the workflow in advanced mode expect an input JSON with top-level key document. This key is an object with the same structure as the output of a symbolic model.
The reason is that the block doesn't have NL Core: it only contains the prediction model. It doesn't expect a text to analyze; it expects the features of the text, extracted by an NLU analysis that it can't perform itself. Features are the basis of the model's predictions.
Any upstream block with NL Core can be used to perform feature extraction, for example the NLP Core knowledge model. The document input property must then be mapped to the key with the same name in the output of the feature extraction block.
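
In terms of JSON, the mapping amounts to copying the document key of the upstream output into the input of the advanced mode block. The sketch below uses a heavily simplified, hypothetical upstream output; only the document key name and its content property are taken from this page.

```python
def to_advanced_ml_input(upstream_output):
    """Build the input of an advanced mode ML block from an upstream output."""
    # The only key the block recognizes is document.
    return {"document": upstream_output["document"]}


# Hypothetical, simplified output of a feature extraction block.
upstream_output = {"document": {"content": "Some analyzed text"}}

print(to_advanced_ml_input(upstream_output))
```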
