PDF Splitter

Overview

PDF Splitter is a splitter block that produces a single-page PDF for each page of an input PDF file and runs its context once for each of those single-page PDFs. In other words, at each iteration, the first block of the context after the splitter receives as input the PDF Splitter output corresponding to one page of the input PDF.

Input

A PDF Splitter block has these input variables:

base64 (string, required): Base64 encoding of the PDF file to split.
path (string, required): Name or path of the file. It is used only for debugging, logging, or auditing purposes. It is not used to read the file, whose content is entirely represented by the value of the base64 key.
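
As an illustration of the two input variables, the following minimal sketch builds an input object from a PDF file on disk. The function name build_splitter_input is hypothetical and not part of the product, and how the resulting object is actually delivered to the block depends on the workflow, which is not covered here.

```python
import base64
import os


def build_splitter_input(pdf_path: str) -> dict:
    """Return a dict holding the two input variables described above.

    Hypothetical helper for illustration only.
    """
    with open(pdf_path, "rb") as f:
        # The whole file content travels in the "base64" key.
        encoded = base64.b64encode(f.read()).decode("ascii")
    # "path" is informational (debugging, logging, auditing); here the
    # sketch passes only the file name, which is an arbitrary choice.
    return {"base64": encoded, "path": os.path.basename(pdf_path)}
```

Calling build_splitter_input("23Q4Results.pdf") would return a dictionary whose path is "23Q4Results.pdf" and whose base64 value decodes back to the original file bytes.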

Block properties

The properties of a PDF Splitter block are accessed by editing the block and are divided into these groups:

Basic properties:
- Block name
- Component version (read only)
- Block ID (read only)
Deployment:
- Timeout: execution timeout expressed in minutes (m) or seconds (s).
- Replicas: number of required instances.
- Consumer Number: number of threads of the consumer, the software module of the block that takes the input to process from the block's work queue.
- Memory: required memory.
- CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
Input: input properties correspond to the input variables of the component (see above).
Output: read-only, the manifest of the output that the block produces for every extracted file.

Output

A PDF Splitter block produces one item for each page of the input PDF file. The block's context cycles over these items.
Each item is a JSON object with this structure:

{
  "base64": Base64 encoding of the PDF corresponding to one page of the input PDF,
  "path": Filename of the single-page PDF
}
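
To show how an item with this structure might be consumed, here is a minimal sketch that decodes one item and writes the single-page PDF to a folder. The function name save_item and the idea of saving to disk are assumptions made for illustration; blocks inside the splitter's context may use the item in any other way.

```python
import base64
from pathlib import Path


def save_item(item: dict, out_dir: str) -> Path:
    """Decode one output item and write it under out_dir.

    Hypothetical helper for illustration only.
    """
    # Keep only the file name, so the file always lands inside out_dir.
    target = Path(out_dir) / Path(item["path"]).name
    # The "base64" key carries the full content of the single-page PDF.
    target.write_bytes(base64.b64decode(item["base64"]))
    return target
```

The function returns the location of the written file, so a caller can pass it on to whatever processes the page next.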

The value of path has this structure:

<name of the input PDF without extension>_p####.pdf

where #### is replaced with the page number, right-aligned and zero-padded in a four-character field. For example, if the name of the input PDF file is:

23Q4Results.pdf

path will be:

23Q4Results_p0001.pdf for the PDF corresponding to the first page.
23Q4Results_p0002.pdf for the PDF corresponding to the second page.
23Q4Results_p0003.pdf for the PDF corresponding to the third page.

and so on.
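
The naming rule above can be sketched in a few lines. The function name page_path is hypothetical, and two behaviors are assumptions not stated in this page: that any folder component of the input path is dropped, and that page numbers above 9999 simply use more than four digits.

```python
from pathlib import PurePath


def page_path(input_path: str, page_number: int) -> str:
    """Build the path value for a given 1-based page number.

    Hypothetical helper for illustration only.
    """
    # File name without its extension (assumption: folders are dropped).
    stem = PurePath(input_path).stem
    # :04d right-aligns the number and pads it with zeros to four characters.
    return f"{stem}_p{page_number:04d}.pdf"


print(page_path("23Q4Results.pdf", 1))   # 23Q4Results_p0001.pdf
print(page_path("23Q4Results.pdf", 12))  # 23Q4Results_p0012.pdf
```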