PDF Splitter
Overview
PDF Splitter is a splitter block that produces a single-page PDF for each page of an input PDF file and runs its context once for each of those single-page PDFs. In other words, at each iteration, the first block of the context after the splitter receives as input the PDF Splitter output corresponding to one page of the input PDF.
Input
A PDF Splitter block has these input variables:
- base64 (string, required): Base64 encoding of the PDF file to split.
- path (string, required): name or path of the file. It is used only for debugging, logging or auditing purposes. It is not used to read the file, whose content is completely represented by the value of the base64 key.
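As an illustration of the two input variables, here is a minimal Python sketch that builds the input JSON from the bytes of a PDF. The function name `build_splitter_input` is hypothetical and not part of the product; only the `base64` and `path` keys come from the description above.

```python
import base64


def build_splitter_input(pdf_bytes: bytes, path: str) -> dict:
    """Build a dict with the two input keys expected by a PDF Splitter block.

    Hypothetical helper: only the key names "base64" and "path" come from
    the documentation; everything else is an illustrative assumption.
    """
    return {
        # The whole PDF content travels in this key as Base64 text.
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        # Informational only: the block does not read the file from this path.
        "path": path,
    }
```

For a file on disk, the bytes could be read with `pathlib.Path("23Q4Results.pdf").read_bytes()` and passed together with the file name. The resulting dict can then be serialized with `json.dumps`.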
Block properties
The properties of a PDF Splitter block are accessed by editing the block and are divided into these groups:
- Basic properties:
  - Block name
  - Component version (read only)
  - Block ID (read only)
- Deployment:
  - Timeout: execution timeout expressed in minutes (m) or seconds (s).
  - Replicas: number of required instances.
  - Consumer Number: number of threads of the consumer, the software module of the block that provides input to process by taking it from the block's work queue.
  - Memory: required memory.
  - CPU: thousandths of a CPU required (for example: 1000 = 1 CPU).
- Input: these properties correspond to the input variables.
  Input properties are read-only, and so only describe the expected input, when the block is the first in a flow and the workflow's input has not been explicitly described. In that case the workflow's input JSON must contain keys whose names and types match those of the input variables.
  Otherwise, they are editable and must be set.
- Output: read-only. This property is a reminder of the structure of the output JSON that the splitter produces for every page of the input PDF.
Output
A PDF Splitter block produces one item for each page of the input PDF file. The block's context cycles over these items.
Each item is a JSON with this structure:
{
"base64": Base64 encoding of the PDF corresponding to one page of the input PDF,
"path": Filename of the mono-page PDF
}
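To show how a block or script downstream might consume one such item, the following Python sketch decodes the `base64` value and writes the single-page PDF to disk. The function name `save_page_item` and the output directory are hypothetical. The sketch also assumes that only the final component of `path` should be used as the file name.

```python
import base64
from pathlib import Path


def save_page_item(item: dict, output_dir: str) -> Path:
    """Decode one PDF Splitter output item and write it as a PDF file.

    Hypothetical helper: the item is assumed to be a dict with the
    "base64" and "path" keys described above.
    """
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: keep only the file name, in case "path" carries directories.
    target = target_dir / Path(item["path"]).name
    target.write_bytes(base64.b64decode(item["base64"]))
    return target
```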
The value of path has this structure:
<name of the input PDF without extension>_p####.pdf
where #### is replaced with the page number, right-aligned and zero-padded in a four-character field. For example, if the name of the input PDF file is:
23Q4Results.pdf
path will be:
- 23Q4Results_p0001.pdf for the PDF corresponding to the first page.
- 23Q4Results_p0002.pdf for the PDF corresponding to the second page.
- 23Q4Results_p0003.pdf for the PDF corresponding to the third page.
and so on.
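The naming rule above can be sketched in Python as follows. The function name `page_path` is hypothetical. The sketch assumes that any directory part of the input name is dropped and that pages are numbered from 1. How the splitter itself handles these cases, or documents with more than 9999 pages, is not stated here.

```python
from pathlib import PurePath


def page_path(input_path: str, page_number: int) -> str:
    """Return the file name of the single-page PDF for a given page.

    Hypothetical helper that mirrors the documented pattern
    <name without extension>_p####.pdf.
    """
    # Assumption: only the file name matters, without directory or extension.
    stem = PurePath(input_path).stem
    # 04d zero-pads the page number to at least four digits.
    return f"{stem}_p{page_number:04d}.pdf"
```

For example, `page_path("23Q4Results.pdf", 2)` evaluates to `23Q4Results_p0002.pdf`, matching the second entry in the list above.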