PDF Splitter

Overview

PDF Splitter is a splitter block that produces a single-page PDF for each page of an input PDF file and runs its context once for each of those single-page PDFs. In other words, at each iteration, the first context block after the splitter receives as input the PDF Splitter output corresponding to one page of the input PDF.

Input

The input to a PDF Splitter block must be a JSON object with this structure:

{
  "base64": Base64 encoding of a PDF file,
  "path": Filename or path of the PDF file
}
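
As an illustration, the input structure above could be assembled as follows. This is a minimal sketch, not part of the product: the helper name build_splitter_input is hypothetical, and the use of the standard Base64 alphabet is an assumption, since the documentation only says "Base64 encoding".

```python
import base64
import json


def build_splitter_input(pdf_bytes: bytes, path: str) -> str:
    """Return JSON text with the two keys a PDF Splitter block expects.

    Hypothetical helper: the standard Base64 alphabet is an assumption.
    """
    payload = {
        # Base64 text of the whole PDF file
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        # Filename or path of the PDF file
        "path": path,
    }
    return json.dumps(payload)
```

In practice pdf_bytes would typically be the content of the PDF file read in binary mode, and path its filename, for example 23Q4Results.pdf.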

Block properties

The properties of a PDF Splitter block are accessed by editing the block and are divided into these groups:

  • Basic properties:

    • Block name
    • Component version (read only)
    • Block ID (read only)
  • Deployment:

    • Timeout: execution timeout expressed in minutes (m) or seconds (s).
    • Replicas: number of required instances.
    • Consumer Number: number of threads of the consumer, the software module of the block that provides input to process by taking it from the block's work queue.
    • Memory: required memory.
    • CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
  • Input: read-only, these properties are a reminder of the structure of the input JSON.

  • Output: read-only, this property is a reminder of the structure of the output JSON that the splitter produces for every page of the input PDF.

Output

A PDF Splitter block produces one item for each page of the input PDF file. The block's context cycles over these items.
Each item is a JSON object with this structure:

{
  "base64": Base64 encoding of the PDF corresponding to one page of the input PDF,
  "path": Filename of the single-page PDF
}
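
A block or script that receives one of these items could turn it back into a PDF file roughly as shown here. This is a sketch under assumptions: save_page is a hypothetical helper, the item is assumed to be already parsed into a Python dict, and the standard Base64 alphabet is assumed.

```python
import base64
import tempfile
from pathlib import Path


def save_page(item: dict, out_dir: str) -> Path:
    """Decode one splitter output item and write it into out_dir.

    Hypothetical helper: only the filename part of item["path"] is used.
    """
    target = Path(out_dir) / Path(item["path"]).name
    target.write_bytes(base64.b64decode(item["base64"]))
    return target


# Demo with placeholder bytes standing in for a real single-page PDF.
with tempfile.TemporaryDirectory() as demo_dir:
    demo_item = {
        "base64": base64.b64encode(b"%PDF-1.4 demo").decode("ascii"),
        "path": "23Q4Results_p0001.pdf",
    }
    print(save_page(demo_item, demo_dir).name)  # prints 23Q4Results_p0001.pdf
```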

The value of path has this structure:

<name of the input PDF without extension>_p####.pdf

where #### is replaced with the page number, right-aligned and zero-padded in a four-character field. For example, if the name of the input PDF file is:

23Q4Results.pdf

path will be:

  • 23Q4Results_p0001.pdf for the PDF corresponding to the first page.
  • 23Q4Results_p0002.pdf for the PDF corresponding to the second page.
  • 23Q4Results_p0003.pdf for the PDF corresponding to the third page.

and so on.
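
The naming rule can be reproduced with a few lines of Python, which may help when matching output items back to pages. This is a sketch of the rule as described above, not the block's actual code: page_path is a hypothetical name, and the behavior for inputs with a directory part or with more than 9999 pages is not stated in this documentation, so the handling of those cases here is an assumption.

```python
from pathlib import PurePath


def page_path(input_path: str, page_number: int) -> str:
    """Build the per-page filename: input name without extension + _p#### + .pdf.

    Assumptions: any directory part of input_path is dropped, and page
    numbers above 9999 simply use more than four digits.
    """
    stem = PurePath(input_path).stem  # "23Q4Results.pdf" -> "23Q4Results"
    return f"{stem}_p{page_number:04d}.pdf"


for page in (1, 2, 3):
    print(page_path("23Q4Results.pdf", page))
# prints:
# 23Q4Results_p0001.pdf
# 23Q4Results_p0002.pdf
# 23Q4Results_p0003.pdf
```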