Input for model blocks

First level keys

The top level keys of the input JSON that a model block is able to interpret are:

text (string)
sections (array)
sectionsText (array)
documentLayout (object)
options (object)

Some keys are mutually exclusive, because they represent alternative ways of supplying the text to be parsed to the model block.

The sections key can be present only in combination with text, while sectionsText and text can be used alone or in combination.

text

text is the text that is analyzed by the model.
This input property is usually mapped to:

The modelName.document.content key of a model block's output.
The content key of TikaTesseract Converter processor output.
The content key of URL Converter processor output.

text is an alternative to documentLayout: if one of these keys is present, the other must be omitted.
text can be complemented by sections and sectionsText for symbolic models that use text sections.

documentLayout

documentLayout is an object with the same structure as the result key of the Extract Converter output, so a model using it is often preceded by an Extract Converter block, with this input property mapped to that output key.
It must be used for symbolic models that need layout information and for extraction ML models trained with layout-based annotations. The model parses the object to obtain the text to analyze plus the graphical layout information of the original document.

If documentLayout is present in the input JSON, then text, sections and sectionsText, which are alternative means of giving input text to the block, must be omitted.
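
The combination rules for text, sections, sectionsText and documentLayout can be summarized as a small check. The following Python function is only an illustrative sketch: check_input_keys is a hypothetical helper, not part of the product, and it encodes just the rules stated on this page.

```python
def check_input_keys(payload):
    """Return a list of problems with the top-level keys (empty if none).

    Hypothetical helper: it only mirrors the documented combination rules.
    """
    problems = []
    if "documentLayout" in payload:
        # documentLayout excludes every other way of supplying text.
        for key in ("text", "sections", "sectionsText"):
            if key in payload:
                problems.append(
                    f"'{key}' must be omitted when 'documentLayout' is present"
                )
    elif "sections" in payload and "text" not in payload:
        # sections only makes sense together with text.
        problems.append("'sections' requires 'text'")
    return problems


print(check_input_keys({"text": "Some text", "sections": []}))
print(check_input_keys({"documentLayout": {}, "text": "Some text"}))
```

The first call reports no problems, while the second reports that text must be omitted.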

sections

The sections key is complementary to text and indicates the boundaries of text sections for symbolic models that can leverage this information, for example:

"sections": [
    {
        "name": "TITLE",
        "start": 0,
        "end": 61
    },
    {
        "name": "BODY",
        "start": 62,
        "end": 2407
    }
]

Currently only symbolic models built with Studio can account for sections.
sections is an array. Each item is an object that corresponds to a section and has these properties:

name: section name.
start: zero-based position of the first character in the section inside the value of text.
end: zero-based position of the first character after the section inside the value of text.
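
Because end is the position of the first character after the section, start and end behave like a half-open range, which is the same convention used by Python slicing. The snippet below is a minimal sketch of that reading; the sample text and the section_text helper are made up for illustration.

```python
# Sample values invented for this example.
text = "Breaking news\nSomething happened today."
sections = [
    {"name": "TITLE", "start": 0, "end": 13},
    {"name": "BODY", "start": 14, "end": 39},
]


def section_text(text, section):
    """Return the portion of text covered by a section (end is exclusive)."""
    return text[section["start"]:section["end"]]


for section in sections:
    print(section["name"], "->", repr(section_text(text, section)))
```

Note that the newline at position 13 belongs to neither section in this sample.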

The corresponding input property is expected to be mapped to the workflow input or to the modelName.document.sections property of a model block that in turn received sections data.

sectionsText

The sectionsText key represents text to be analyzed divided into sections, for example:

"sectionsText": [
    {
        "name": "TITLE",
        "text": "This is a title"
    },
    {
        "name": "BODY",
        "text": "This is the body"
    }
]

sectionsText is an array of objects. Each object has these properties:

name: section name.
text: section text.

The model builds plain text to analyze by concatenating the values of the text properties of the array items using a newline character as a separator.
If the text key is also set, the text obtained from sectionsText is appended to the value of text using a newline character as a separator, so the model receives the concatenation of the two texts. The model also receives automatically computed section boundaries, expressed as positions in the concatenated text.

For example:

Value of text:

We shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe to assure the survival and success of liberty.

Value of sectionsText:

[
    {
        "name": "TITLE",
        "text": "President John F. Kennedy delivered his inaugural address"
    }
]

Concatenated plain text:

We shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe to assure the survival and success of liberty.
President John F. Kennedy delivered his inaugural address

Section boundaries:
- Section name: TITLE
- Start: 142
- End: 199
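
The concatenation and the boundaries of this example can be reproduced with a few lines of Python. This is only a sketch of the behavior described above, not the model's actual implementation: the concatenate function is hypothetical, and it assumes that the newline separators are not part of any section, as the example suggests.

```python
def concatenate(text=None, sections_text=()):
    """Join text and the sectionsText items with newlines.

    Returns the plain text and the section boundaries (end is exclusive).
    """
    parts = [] if text is None else [text]
    # Position where the next section would start; +1 skips the separator.
    offset = 0 if text is None else len(text) + 1
    boundaries = []
    for item in sections_text:
        start = offset
        end = start + len(item["text"])
        boundaries.append({"name": item["name"], "start": start, "end": end})
        parts.append(item["text"])
        offset = end + 1
    return "\n".join(parts), boundaries


text = (
    "We shall pay any price, bear any burden, meet any hardship, "
    "support any friend, oppose any foe to assure the survival "
    "and success of liberty."
)
sections_text = [
    {
        "name": "TITLE",
        "text": "President John F. Kennedy delivered his inaugural address",
    }
]

plain_text, boundaries = concatenate(text, sections_text)
print(boundaries)  # [{'name': 'TITLE', 'start': 142, 'end': 199}]
```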

The corresponding input property is expected to be mapped to the workflow input or to the modelName.document.sections property of a model block that in turn received sections data.

options

The options object corresponds to optional parameters that can be passed to the model to influence its behavior.

allCategories

The allCategories property of the options object is a boolean. When its value is false, a categorization model returns only the categories with the highest scores.

custom

The custom property of the options object is an object that can be used to convey options to the JavaScript code inside a symbolic model.

Thesaurus models contain such code, which is automatically generated together with the model.
It is also possible to insert this kind of code in Studio projects, so that the code is then included in the project model.

The JavaScript code uses the getOptions method of the predefined CTX object to access the options.

scoreConfig

With the scoreConfig property of the custom object it is possible to customize the scoring algorithm of thesaurus models.
For example:

"options": {
    "custom": {
        "scoreConfig": {
            "disableScore": false,
            "defaultScore": 1,
            "normalize": 100,
            "boostByHierarchy": {
                "byParent": 1,
                "byChildren": 0.5,
                "byRelated": 0.3
            },
            "boostByFrequency": true,
            "boostByLabel": {
                "matchPrefLabel": 1,
                "matchAltLabel": 0.5,
                "lengthMeasure": 0.1,
                "ignoreCase": true
            }
        }
    }
}

These are the properties of the scoreConfig object:

disableScore (boolean, default value false): if true, a Studio-like scoring algorithm is used. All the other options are ignored, so you can omit them.
defaultScore (number, default value 1): default base score for all extractions. Ignored if boostByFrequency is true.
normalize (number, default value 100): the final score of an extraction is normalized to a value in the range between 0 and the value of this parameter. Use 0 to disable score normalization.
boostByHierarchy: this property is an object whose properties are multiplication factors that are applied to the base score based on the relationship between the extracted concept and other concepts in the thesaurus.
- byParent (number, default value 1): applied for every broader concept that is also extracted from the text. Value 0 is interpreted as no multiplication.
- byChildren (number, default value 0.5): applied for every narrower concept that is also extracted from the text. Value 0 is interpreted as no multiplication.
- byRelated (number, default value 0.3): applied for every non-hierarchically related concept that is also extracted from the text. Value 0 is interpreted as no multiplication.
boostByFrequency (boolean, default value true): when true, the base score is the concept frequency in the text.
boostByLabel: this property is an object whose properties determine how the base score is affected by the relationship between the extracted text and the concept labels.
- matchPrefLabel (number, default value 1): multiplication factor applied to the base score if the matching text is the preferred label.
- matchAltLabel (number, default value 0.5): multiplication factor applied to the base score if the matching text is one of the alternative labels.
- lengthMeasure (number, default value 0.1): multiplication factor applied to the base score that is further multiplied by the number of space-separated tokens in the match.
- ignoreCase (boolean, default value true): when true, the case is ignored when matching the text and the labels of the concept.

You can omit the properties whose default value is fine for you. If all the default values are fine, you can omit scoreConfig, and if you don't need to specify other options, you can omit the options property altogether.
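
As an illustration of omitting defaults, the following Python sketch builds the smallest options value for a desired configuration. The DEFAULTS table is copied from the list above, while prune_defaults and build_options are hypothetical helpers. Nested objects are kept whole when they differ from the default, because this page does not say whether defaults are applied to individual nested properties.

```python
# Default values as documented above.
DEFAULTS = {
    "disableScore": False,
    "defaultScore": 1,
    "normalize": 100,
    "boostByHierarchy": {"byParent": 1, "byChildren": 0.5, "byRelated": 0.3},
    "boostByFrequency": True,
    "boostByLabel": {
        "matchPrefLabel": 1,
        "matchAltLabel": 0.5,
        "lengthMeasure": 0.1,
        "ignoreCase": True,
    },
}


def prune_defaults(score_config):
    """Drop the top-level properties whose value equals the documented default."""
    return {
        key: value
        for key, value in score_config.items()
        if key not in DEFAULTS or value != DEFAULTS[key]
    }


def build_options(score_config):
    """Return the options fragment, or an empty dict if nothing is customized."""
    pruned = prune_defaults(score_config)
    if not pruned:
        return {}
    return {"options": {"custom": {"scoreConfig": pruned}}}


# Disable normalization and frequency boosting, keep everything else.
wanted = dict(DEFAULTS, normalize=0, boostByFrequency=False)
print(build_options(wanted))
```

With these values only normalize and boostByFrequency end up in scoreConfig; passing the unmodified defaults yields an empty dictionary, meaning that the options property can be left out.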

normalizeToConceptId

The normalizeToConceptId property of the custom object is a boolean that, when true, makes a thesaurus model add to its output additional thesaurus information for extracted concepts.
For example:

"options": {
    "custom": {
        "normalizeToConceptId": true
    }
}