
PDF Splitter

Overview

PDF Splitter is a splitter block that produces a single-page PDF for each page of an input PDF file and runs its context once for each of those single-page PDFs. In other words, at each iteration, the first block of the context after the splitter receives as input the output of PDF Splitter that corresponds to one page of the input PDF.

Input

A PDF Splitter block has these input variables:

  • base64 (string, required): Base64 encoding of the PDF file to split.
  • path (string, required): name or path of the file. It is used only for debugging, logging, or auditing purposes, never to "read" the file: the file content is carried entirely by the value of the base64 key.
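To make the two input variables concrete, here is a minimal Python sketch of how a caller might assemble the input JSON for a PDF Splitter block. It assumes the input is passed as a plain JSON object with exactly the keys base64 and path; the function name make_splitter_input is hypothetical and not part of the product.

```python
import base64
import json


def make_splitter_input(pdf_bytes: bytes, path: str) -> str:
    """Build a JSON input for a PDF Splitter block (hypothetical helper).

    The whole file content travels in "base64"; "path" is informational only
    (debugging, logging, auditing) and is never used to read the file.
    """
    payload = {
        # Standard Base64 of the raw PDF bytes, as an ASCII string.
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "path": path,
    }
    return json.dumps(payload)
```

For a file on disk, the bytes could come from something like `pathlib.Path("23Q4Results.pdf").read_bytes()`; since path is only a label, any name that helps trace the file in logs will do.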

Block properties

The properties of a PDF Splitter block are accessed by editing the block and are divided into these groups:

  • Basic properties:

    • Block name
    • Component version (read only)
    • Block ID (read only)
  • Deployment:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
    • Replicas: number of required instances.
    • Consumer Number: number of threads of the consumer, the software module of the block that takes the input to process from the block's work queue.
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
  • Input: these properties correspond to the input variables.
    Input properties are read-only, and therefore only describe the expected input, when the block is the first in a flow and the workflow's input has not been explicitly described. In that case, the workflow's input JSON must contain keys whose names and types match those of the input variables.
    Otherwise, they are editable and must be set.

  • Output: read-only, this property is a reminder of the structure of the output JSON that the splitter produces for every page of the input PDF.

Output

A PDF Splitter block produces as many items as there are pages in the input PDF file. The block's context cycles over these items.
Each item is a JSON with this structure:

{
  "base64": Base64 encoding of the PDF corresponding to one page of the input PDF,
  "path": Filename of the single-page PDF
}
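As an illustration of how the first block of the context might consume one of these items, the following Python sketch decodes an item back into PDF bytes. It assumes the item arrives as a JSON string with the structure above; the function decode_page_item and the "%PDF-" header check are illustrative choices, not documented product behavior.

```python
import base64
import json


def decode_page_item(item_json: str) -> tuple[str, bytes]:
    """Return (path, pdf_bytes) for one splitter item (hypothetical helper)."""
    item = json.loads(item_json)
    # validate=True rejects characters outside the Base64 alphabet.
    pdf_bytes = base64.b64decode(item["base64"], validate=True)
    # A PDF file begins with the "%PDF-" header; use it as a sanity check.
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError(f"{item['path']}: decoded content is not a PDF")
    return item["path"], pdf_bytes
```

The returned path is convenient for log messages or as a file name if the page needs to be written to disk.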

The value of path has this structure:

<name of the input PDF without extension>_p####.pdf

where #### is replaced with the page number, right-aligned and zero-padded in a four-character field. For example, if the name of the input PDF file is:

23Q4Results.pdf

path will be:

  • 23Q4Results_p0001.pdf for the PDF corresponding to the first page.
  • 23Q4Results_p0002.pdf for the PDF corresponding to the second page.
  • 23Q4Results_p0003.pdf for the PDF corresponding to the third page.

and so on.
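The naming rule can be expressed in a few lines of Python, which may help when matching logged page names to input files. This is a sketch under two assumptions the text above does not settle: that any directory part of path is dropped (only the file name is used), and that pages beyond 9999 simply use more digits. The function name page_path is hypothetical.

```python
from pathlib import PurePath


def page_path(input_path: str, page_number: int) -> str:
    """Expected name of the single-page PDF for a 1-based page number."""
    # .stem drops any directory and the last extension: "a/23Q4Results.pdf" -> "23Q4Results"
    stem = PurePath(input_path).stem
    # :04d right-aligns and zero-pads the page number in a four-character field.
    return f"{stem}_p{page_number:04d}.pdf"
```

For a three-page input, `[page_path("23Q4Results.pdf", n) for n in range(1, 4)]` yields the three names listed above.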