Text Fragmenter processor

Description

The Text Fragmenter breaks the input text into fragments such as sentences, newline-separated parts, and quotes.

Input

The processor requires the input JSON to contain at least this top level key:

"text": "text"

where text is the text to process.
An optional top level key is:

"options": options

options is an object. Its properties can be used to override the values of the block's functional properties (see below). The properties of options correspond to the functional properties of the block as follows:

Object property    Corresponding functional parameter
outputText         Propagate input text to output
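
As an illustration of the input format described above, the following Python sketch builds the input JSON with and without the optional options key. The helper name build_fragmenter_input is hypothetical, and the sketch assumes that outputText takes a boolean value, which this page does not state explicitly.

```python
import json


def build_fragmenter_input(text, output_text=None):
    """Build the input JSON for a Text Fragmenter block (hypothetical helper).

    When output_text is given, it is sent as options.outputText to override
    the "Propagate input text to output" functional property.
    Assumption: the property accepts a boolean.
    """
    payload = {"text": text}
    if output_text is not None:
        payload["options"] = {"outputText": bool(output_text)}
    return payload


# Minimal input: only the required top level key
print(json.dumps(build_fragmenter_input("First sentence. Second sentence."), indent=4))

# Input that overrides the functional property through options
print(json.dumps(build_fragmenter_input("First sentence.", output_text=True), indent=4))
```

Leaving options out keeps whatever value is set on the block itself.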

Block properties

Block properties can be set by editing the block.
Text Fragmenter workflow blocks have the following properties:

  • Common:

    • The unique block ID and the service version, displayed in the title bar (read only, also displayed in the block tooltip in the canvas).
    • Block name: the name of the block (editable).
    • Description: the description of the processor (read only).
  • Type Specific:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
  • Functional:

    • Propagate input text to output: when turned on, the input key text is echoed in the output JSON. Default: off.
  • Deployment:

    • Replicas: number of required instances.
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
  • Input

    Used for input mapping: one property for each of the top level keys of the input JSON.
    If:

    • The block is the first in a flow and the workflow input contains only the expected keys.

    Or:

    these properties do not need to be set.
    Otherwise, the properties determine which top level keys of the overall "upstream JSON" must be mapped to the block's input keys. The values of the properties must be set choosing from the compatible keys of upstream blocks' output or, if the input format of the workflow has been defined, from the keys of the $nlflow_input pseudo block.

Output

The output of a Text Fragmenter block is a JSON object with the following structure:

{
    "fragments": {}
}

The fragments object has these properties:

  • positions: an array of objects, each corresponding to the extremes of a fragment. Each object has these properties:

    • start: the zero-based position of the first character of the fragment in the text.
    • end: the zero-based position of the first character after the fragment in the text.
  • strategy: the fragmentation strategy used, for example BaseTextFragmenter. This information is intended for troubleshooting only.
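
Because end is the position of the first character after the fragment, each pair of positions maps directly onto a half-open slice of the input text. The Python sketch below shows one way to recover the fragment strings. The function name extract_fragments and the sample output are hypothetical and written by hand for illustration; the sketch also assumes that positions count characters the same way Python string indexes do.

```python
def extract_fragments(text, output):
    """Return the substrings described by output["fragments"]["positions"]."""
    positions = output["fragments"]["positions"]
    # "end" points at the first character after the fragment, so it can be
    # used unchanged as the exclusive end of a Python slice.
    return [text[p["start"]:p["end"]] for p in positions]


# Hand-written sample, not real service output
text = "Hello world. How are you?"
sample_output = {
    "fragments": {
        "positions": [
            {"start": 0, "end": 12},
            {"start": 13, "end": 25},
        ],
        "strategy": "BaseTextFragmenter",
    }
}

print(extract_fragments(text, sample_output))
# prints ['Hello world.', 'How are you?']
```

Note that the output does not contain the text itself unless Propagate input text to output is turned on, so the caller must otherwise keep the original text at hand.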

Output-input mapping processor

A Text Fragmenter block can usefully be followed by a Language Detection block, because the latter also accepts fragments as input and can make specific predictions about them.

For the two processors to work correctly together:

  • Turn on the Propagate input text to output functional property of the Text Fragmenter block.
  • In the Language Detection block, map the input properties text and fragments to the output keys of the Text Fragmenter block that have the same names.
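
In a workflow this mapping is configured through the block's input properties, not written in code. Purely to illustrate what the two steps above achieve, the following Python sketch selects the text and fragments keys from a Text Fragmenter output. The function name to_language_detection_input and the sample data are hypothetical.

```python
def to_language_detection_input(fragmenter_output):
    """Select the keys that the downstream block's input properties map to."""
    if "text" not in fragmenter_output:
        # Happens when "Propagate input text to output" is off
        raise KeyError("text is missing: turn on 'Propagate input text to output'")
    return {
        "text": fragmenter_output["text"],
        "fragments": fragmenter_output["fragments"],
    }


# Hand-written sample output with the input text propagated
fragmenter_output = {
    "text": "Hello world. Ciao mondo.",
    "fragments": {
        "positions": [{"start": 0, "end": 12}, {"start": 13, "end": 24}],
        "strategy": "BaseTextFragmenter",
    },
}

print(to_language_detection_input(fragmenter_output))
```

The check on text mirrors the first step: without the propagated text, the downstream block has no text key to map.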