Tika Converter

Description

The Tika Converter processor extracts plain text from a binary file.
It is based on Apache Tika; see that product's documentation for the list of supported file formats.

Block properties

Tika Converter workflow blocks have the following properties:

  • Common block properties:

    • The unique block ID.
    • Block name: the name of the block, which can be edited.
    • Description: the description of the processor (read only).
  • Type Specific tab:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
  • Deployment tab:

    • Replicas: number of required instances (3 maximum).
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
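
The units used by the Timeout and CPU properties can be illustrated with a short Python sketch. The helper names are hypothetical and not part of the product. The exact form of a timeout value (an integer immediately followed by m or s, such as 2m or 90s) is an assumption based on the description above.

```python
import re


def cpu_units_to_cpus(units: int) -> float:
    """Convert thousandths of a CPU to CPUs (1000 -> 1.0)."""
    return units / 1000


def timeout_to_seconds(value: str) -> int:
    """Convert a timeout such as '2m' or '90s' to seconds.

    Assumption: the value is an integer followed by 'm' (minutes)
    or 's' (seconds); the product may accept other forms.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([ms])\s*", value)
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return amount * 60 if unit == "m" else amount


print(cpu_units_to_cpus(500))    # 0.5
print(timeout_to_seconds("2m"))  # 120
```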

Input

Read the NL Flow API manual for the description of the JSON object to submit to a Tika Converter block.

Output and output-input mapping

The output of a Tika Converter block is a JSON object with the following structure:

{
    "content": "<extracted text>",
    "mime": "<media type>",
    "path": "<file name or path>"
}
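
As a rough illustration of consuming this object, the Python sketch below parses a sample output and checks that the three documented properties are present. The sample values and the `parse_tika_output` helper are hypothetical; only the property names come from the structure above.

```python
import json

# Property names documented for the Tika Converter output.
REQUIRED_KEYS = ("content", "mime", "path")


def parse_tika_output(raw: str) -> dict:
    """Parse the block output and verify the documented properties exist."""
    data = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing properties: {', '.join(missing)}")
    return data


# Hypothetical sample output, for illustration only.
sample = json.dumps({
    "content": "Quarterly report. Revenue grew in the third quarter.",
    "mime": "application/pdf",
    "path": "reports/q3.pdf",
})

result = parse_tika_output(sample)
print(result["mime"])  # application/pdf
```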

Typically, in the workflow, the Tika Converter block is followed by one model block or by several model blocks in parallel.
In these cases, in each model block's configuration, the content property of the Tika Converter block's output must be mapped to the text input property.
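
The mapping itself is set in the model block's configuration, not in code. Its effect can still be sketched in Python as follows. The function name and sample values are hypothetical, and a model block may accept other input properties that are not shown here.

```python
def map_to_model_input(tika_output: dict) -> dict:
    """Build a model block input by mapping 'content' to 'text'."""
    return {"text": tika_output["content"]}


# Hypothetical Tika Converter output, for illustration only.
tika_output = {
    "content": "Plain text extracted from the file.",
    "mime": "application/pdf",
    "path": "docs/example.pdf",
}

model_input = map_to_model_input(tika_output)
print(model_input)  # {'text': 'Plain text extracted from the file.'}
```

With several model blocks in parallel, the same mapping is repeated in each block's configuration, so every model receives the same extracted text.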