PDF Splitter
Overview
PDF Splitter is a splitter block that produces a single-page PDF for each page of an input PDF file and runs its context once for each of those single-page PDFs. In other words, at each iteration, the first block of the splitter's context receives as input the PDF Splitter output corresponding to one page of the input PDF.
Input
The input to a PDF Splitter block must be a JSON object with this structure:
{
"base64": Base64 encoding of a PDF file,
"path": Filename or path of the PDF file
}
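The following is a minimal sketch of how an upstream step might build this input. It assumes standard Base64 (RFC 4648, not the URL-safe variant). The function names `build_splitter_input` and `build_splitter_input_from_file` are hypothetical helpers, not part of the product.

```python
import base64
import json
from pathlib import Path


def build_splitter_input(pdf_bytes: bytes, path: str) -> str:
    """Build the input JSON for a PDF Splitter block (hypothetical helper).

    pdf_bytes: raw content of the PDF file.
    path: filename or path of the PDF file, passed through unchanged.
    """
    payload = {
        # Assumption: standard Base64 alphabet, decoded to an ASCII string for JSON.
        "base64": base64.b64encode(pdf_bytes).decode("ascii"),
        "path": path,
    }
    return json.dumps(payload)


def build_splitter_input_from_file(pdf_file: str) -> str:
    """Read a PDF from disk and wrap it in the splitter's input structure."""
    return build_splitter_input(Path(pdf_file).read_bytes(), pdf_file)
```

For example, `build_splitter_input_from_file("23Q4Results.pdf")` returns a JSON string whose `path` is `23Q4Results.pdf`.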
Block properties
The properties of a PDF Splitter block are accessed by editing the block and are divided into these groups:
- Basic properties:
  - Block name
  - Component version (read-only)
  - Block ID (read-only)
- Deployment:
  - Timeout: execution timeout, expressed in minutes (m) or seconds (s).
  - Replicas: number of required instances.
  - Consumer Number: number of consumer threads. The consumer is the software module of the block that takes the input to process from the block's work queue.
  - Memory: required memory.
  - CPU: required CPU, in thousandths of a CPU (for example, 1000 = 1 CPU).
- Input: read-only; these properties are a reminder of the structure of the input JSON.
- Output: read-only; this property is a reminder of the structure of the output JSON that the splitter produces for each page of the input PDF.
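The Timeout and CPU values use compact units. The following is a small sketch of how they translate into seconds and whole CPUs. It assumes a Timeout value is a whole number followed by a single `m` or `s` (for example `5m` or `30s`). The helper names are hypothetical and do not describe how the platform itself parses these properties.

```python
import re


def timeout_to_seconds(timeout: str) -> int:
    """Convert a Timeout value such as "5m" or "30s" to seconds.

    Assumption: a whole number followed by one unit letter, m or s.
    """
    match = re.fullmatch(r"\s*(\d+)\s*([ms])\s*", timeout)
    if match is None:
        raise ValueError(f"unsupported timeout format: {timeout!r}")
    amount, unit = int(match.group(1)), match.group(2)
    # Minutes are converted to seconds; seconds are returned as-is.
    return amount * 60 if unit == "m" else amount


def millicpu_to_cpus(millicpu: int) -> float:
    """Convert the CPU property (thousandths of a CPU) to a number of CPUs."""
    if millicpu <= 0:
        raise ValueError("CPU must be a positive number of thousandths")
    return millicpu / 1000
```

With these conventions, a CPU value of 250 requests a quarter of a CPU, and a Timeout of `2m` allows 120 seconds of execution.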
Output
A PDF Splitter block produces one item for each page of the input PDF file, and the block's context cycles over these items.
Each item is a JSON object with this structure:
{
"base64": Base64 encoding of the PDF corresponding to one page of the input PDF,
"path": Filename of the single-page PDF
}
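The following is a minimal sketch of what a block inside the splitter's context might do with each item: decode the Base64 content and, optionally, save it under the item's `path`. The helper names and the choice to keep only the filename part of `path` are assumptions made for illustration.

```python
import base64
import json
from pathlib import Path


def decode_page_item(item_json: str) -> tuple[str, bytes]:
    """Return (filename, pdf_bytes) for one PDF Splitter output item."""
    item = json.loads(item_json)
    # validate=True rejects characters outside the standard Base64 alphabet.
    pdf_bytes = base64.b64decode(item["base64"], validate=True)
    return item["path"], pdf_bytes


def save_page_item(item_json: str, out_dir: str) -> Path:
    """Write the single-page PDF of an item to out_dir, named after its "path"."""
    filename, pdf_bytes = decode_page_item(item_json)
    # Assumption: only the filename part of "path" is used.
    target = Path(out_dir) / Path(filename).name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(pdf_bytes)
    return target
```

Keeping decoding separate from file writing lets a context block inspect or forward the page bytes without touching the filesystem.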
The value of path has this structure:
Name of the input PDF without extension_p####.pdf
where #### is replaced with the page number, right-aligned and zero-padded in a four-character field. For example, if the name of the input PDF file is:
23Q4Results.pdf
path will be:
- 23Q4Results_p0001.pdf for the PDF corresponding to the first page
- 23Q4Results_p0002.pdf for the PDF corresponding to the second page
- 23Q4Results_p0003.pdf for the PDF corresponding to the third page
and so on.
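The following sketch reproduces the naming rule for a 1-based page number. It assumes that any directory part of the input path is dropped, because the output `path` is described as a filename. How names look beyond page 9999 is not documented. With the `:04d` format, the field would simply widen to five digits. `page_filename` is a hypothetical helper.

```python
from pathlib import PurePath


def page_filename(input_path: str, page_number: int) -> str:
    """Build the "path" value for a 1-based page number of the input PDF."""
    if page_number < 1:
        raise ValueError("page numbers start at 1")
    # PurePath(...).stem drops any directory part and the final extension.
    stem = PurePath(input_path).stem
    # :04d right-aligns and zero-pads the page number to four characters.
    return f"{stem}_p{page_number:04d}.pdf"
```

For example, `[page_filename("23Q4Results.pdf", n) for n in range(1, 4)]` yields the three names listed above.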