Input structure

First level keys

The top-level keys of the input JSON that a model block can interpret are:

text (string)
sections (array)
sectionsText (array)
documentLayout (object)
options (object)

Some keys are mutually exclusive because they represent alternative ways of supplying the text to be analyzed by the model block.

The sections key can only be used together with text, while sectionsText and text can be used alone or in combination.
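The combination rules above can be expressed as a small validation check. This minimal Python sketch encodes only the constraints stated in this section (documentLayout excludes text, sections and sectionsText; sections requires text). The function name check_input_keys is hypothetical and is not part of the platform.

```python
from typing import Dict, List

# Keys that are alternatives to documentLayout
TEXT_SOURCES = ("text", "sections", "sectionsText")


def check_input_keys(payload: Dict) -> List[str]:
    """Return the key-combination rules violated by a model block input (hypothetical helper)."""
    errors: List[str] = []
    if "documentLayout" in payload:
        # documentLayout is an alternative to all the other ways of supplying text
        for key in TEXT_SOURCES:
            if key in payload:
                errors.append(f"'{key}' must be omitted when 'documentLayout' is present")
    if "sections" in payload and "text" not in payload:
        errors.append("'sections' can only be used together with 'text'")
    return errors


print(check_input_keys({"text": "...", "sections": []}))  # → []
print(check_input_keys({"sections": []}))
# → ["'sections' can only be used together with 'text'"]
```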

text

text is the text that is analyzed by the model.
This input property is usually mapped to:

The modelName.document.content key of a model block's output.
The content key of Tika Converter processor output.
The content key of URL Converter processor output.

text is an alternative to documentLayout: if either key is present, the other must be omitted.
text can be complemented by sections and sectionsText for symbolic models that use text sections.

documentLayout

documentLayout is an object with the same structure as the result key of the Extract Converter output. For this reason, a model that uses it is often preceded by an Extract Converter block, and this input property is mapped to that output key.
It must be used for symbolic models that need layout information and for extraction ML models trained with layout-based annotations. The model can parse the object to obtain both the text to analyze and the graphical layout information of the original document.

If documentLayout is present in the input JSON, text, sections and sectionsText must be omitted, because they are alternative means of giving input text to the block.

sections

The sections key is complementary to text and indicates the boundaries of text sections for symbolic models that can leverage this information, for example:

"sections": [
    {
        "name": "TITLE",
        "start": 0,
        "end": 61
    },
    {
        "name": "BODY",
        "start": 62,
        "end": 2407
    }
]

Currently only symbolic models built with Studio can account for sections.
sections is an array. Each item corresponds to a section and is an object with these properties:

name: section name.
start: zero-based position of the first character in the section inside the value of text.
end: zero-based position of the first character after the section inside the value of text.
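Because end points to the first character after the section, each section is a half-open interval, which maps directly onto Python slicing. This minimal sketch extracts the text of each section; the function name split_sections is hypothetical, and it assumes that positions count characters the way Python strings do (Unicode code points), which may differ from how the platform counts them.

```python
from typing import Dict, List, Tuple


def split_sections(text: str, sections: List[Dict]) -> List[Tuple[str, str]]:
    """Return (name, section_text) pairs in the order the sections are listed (hypothetical helper)."""
    pairs: List[Tuple[str, str]] = []
    for section in sections:
        # start = first character of the section, end = first character after it,
        # which is exactly Python's half-open slice text[start:end]
        pairs.append((section["name"], text[section["start"]:section["end"]]))
    return pairs


text = "This is a title\nThis is the body"
sections = [
    {"name": "TITLE", "start": 0, "end": 15},
    {"name": "BODY", "start": 16, "end": 32},
]
print(split_sections(text, sections))
# → [('TITLE', 'This is a title'), ('BODY', 'This is the body')]
```

Returning a list of pairs rather than a dictionary keeps sections that share the same name.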

The corresponding input property is expected to be mapped either to the workflow input or to the modelName.document.sections property of an upstream model block that in turn received sections data.

sectionsText

The sectionsText key represents text to be analyzed divided into sections, for example:

"sectionsText": [
    {
        "name": "TITLE",
        "text": "This is a title"
    },
    {
        "name": "BODY",
        "text": "This is the body"
    }
]

sectionsText is an array of objects. Each object has these properties:

name: section name
text: section text

The model builds plain text to analyze by concatenating the values of the text properties of the array items using a newline character as a separator.
If the text key is also set, the text obtained from sectionsText is appended to the value of text, again using a newline character as a separator, so the model receives the concatenation of the two texts. The model also receives automatically computed section boundaries relative to the concatenated text.

For example:

Value of text:

We shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe to assure the survival and success of liberty.

Value of sectionsText:

[
    {
        "name": "TITLE",
        "text": "President John F. Kennedy delivered his inaugural address"
    }
]

Concatenated plain text:

We shall pay any price, bear any burden, meet any hardship, support any friend, oppose any foe to assure the survival and success of liberty.
President John F. Kennedy delivered his inaugural address

Section boundaries:
- Section name: TITLE
- Start: 142
- End: 199
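The concatenation and boundary computation described above can be sketched in Python as follows. It illustrates the stated rules and is not the platform's implementation; the function name concat_sections_text is hypothetical, and character positions are assumed to be counted as Python counts string characters. Applied to the example above, it reproduces the TITLE boundaries 142 and 199.

```python
from typing import Dict, List, Optional, Tuple


def concat_sections_text(
    sections_text: List[Dict[str, str]], text: Optional[str] = None
) -> Tuple[str, List[Dict]]:
    """Build the plain text to analyze and the section boundaries (hypothetical helper)."""
    parts: List[str] = []
    boundaries: List[Dict] = []
    offset = 0
    if text is not None:
        parts.append(text)
        offset = len(text) + 1  # the sectionsText part starts after text and a newline
    for item in sections_text:
        start = offset
        end = start + len(item["text"])  # end is exclusive: first character after the section
        boundaries.append({"name": item["name"], "start": start, "end": end})
        parts.append(item["text"])
        offset = end + 1  # skip the newline separating this section from the next
    return "\n".join(parts), boundaries


speech = (
    "We shall pay any price, bear any burden, meet any hardship, support any friend, "
    "oppose any foe to assure the survival and success of liberty."
)
plain, bounds = concat_sections_text(
    [{"name": "TITLE", "text": "President John F. Kennedy delivered his inaugural address"}],
    speech,
)
print(bounds)
# → [{'name': 'TITLE', 'start': 142, 'end': 199}]
```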

The corresponding input property is expected to be mapped either to the workflow input or to the modelName.document.sections property of an upstream model block that in turn received sections data.

options

The options object contains optional parameters that can be passed to the model to influence its behavior.

The allCategories property of the options object is a boolean. When its value is false, a categorization model returns only the categories with the highest scores.

The custom property of the options object is an object that can be used to convey options to the script inside a symbolic model developed with Studio. To access those options, the script must use the getOptions method of the predefined CTX object.
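As an illustration, this minimal Python sketch builds an input JSON that sets both options properties. The helper name build_input is hypothetical, and so are the keys inside custom (threshold, debug): a Studio script receives whatever is placed there and reads it through the getOptions method of CTX.

```python
import json
from typing import Any, Dict, Optional


def build_input(text: str, all_categories: bool, custom: Optional[Dict[str, Any]] = None) -> str:
    """Return a model block input JSON string with an options object (hypothetical helper)."""
    options: Dict[str, Any] = {"allCategories": all_categories}
    if custom is not None:
        # passed through unchanged; a Studio script reads custom options via CTX's getOptions method
        options["custom"] = custom
    return json.dumps({"text": text, "options": options}, indent=2)


# "threshold" and "debug" are hypothetical keys, meaningful only to the script that reads them
print(build_input("This is the body", all_categories=False, custom={"threshold": 0.5, "debug": True}))
```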