Blocks

The methods of the LAY object that work on layout blocks are:

getBlocksCount

The getBlocksCount method returns the number of blocks detected in the document layout.

The syntax is:

LAY.getBlocksCount();

The method returns an integer representing the number of blocks.

Info

The bounding box of each page, which contains all of the page's blocks, is also counted as a block.
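
As a minimal sketch of how the count might be used, the helper below builds the list of candidate block IDs. It assumes that IDs are consecutive integers running from 1 up to the value returned by getBlocksCount, which this page does not state explicitly. The names listBlockIds and stubLay are hypothetical: stubLay is a stand-in with made-up data so that the example runs outside the scripting environment.

```javascript
// Hypothetical helper: builds the list of candidate block IDs.
// Assumption: IDs are consecutive integers from 1 to getBlocksCount().
function listBlockIds(lay) {
    var ids = [];
    var count = lay.getBlocksCount();
    for (var id = 1; id <= count; id++) {
        ids.push(id);
    }
    return ids;
}

// Stand-in for the real LAY object, with made-up data:
// one page block plus two blocks inside it.
var stubLay = {
    getBlocksCount: function () { return 3; }
};

var ids = listBlockIds(stubLay);
console.log(ids.join(", ")); // prints "1, 2, 3"
```

Inside a script, the same helper would be called as listBlockIds(LAY).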

getBlock

Given an ID, the getBlock method returns an object that represents the layout block with that ID.

Considering the example given in the introduction, the instruction:

var block = LAY.getBlock(3);

sets the variable block to an object corresponding to the block with ID 3. The object may look like this:

{
    "id": 3,
    "parent": 1,
    "pageNumber": 1,
    "type": "text",
    "x0": 67,
    "y0": 139,
    "x1": 152,
    "y1": 176,
    "children": [],
    "beginPos": 8,
    "endPos": 24,
    "tokenBegin": 1,
    "tokenEnd": 8,
    "wordBegin": 1,
    "wordEnd": 2,
    "label": ""
}

where:

| Field name | Description | Field type | Default value |
|---|---|---|---|
| id | A unique ID associated with the block | Integer | -1 |
| parent | The ID of the parent block | Integer | -1 |
| pageNumber | The number of the page on which the block is located | Integer | -1 |
| type | The type of block (see the list of values after this table) | String | Empty string |
| label | A label with some additional information on the block | String | Empty string |
| x0 | The x-axis coordinate of the upper-left corner of the block, relative to the page | Integer | 0 |
| y0 | The y-axis coordinate of the upper-left corner of the block, relative to the page | Integer | 0 |
| x1 | The x-axis coordinate of the lower-right corner of the block, relative to the page | Integer | 0 |
| y1 | The y-axis coordinate of the lower-right corner of the block, relative to the page | Integer | 0 |
| children | The IDs of the blocks that are children of the block (like the blocks of a page or the cells of a table) | List of Integers | Empty array |
| beginPos | The position in the text at which the block content starts | Integer | -1 |
| endPos | The position in the text at which the block content ends | Integer | -1 |
| tokenBegin | The index of the first token in the block content | Integer | -1 |
| tokenEnd | The index of the last token in the block content | Integer | -1 |
| wordBegin | The index of the first word in the block | Integer | -1 |
| wordEnd | The index of the last word in the block | Integer | -1 |

The type field can have the following values:

| Value | Block type |
|---|---|
| page | The "container" (it has no text of its own) of all the textual elements displayed on a page |
| text | A block of text (e.g. a paragraph, a text box) at the body level, i.e. not a title |
| title | A heading |
| table | The "container" (it has no text of its own) of all the elements (cells) of a table |
| cell | A table cell |
| header | A page header |
| footer | A page footer |
| toc | An item of the table of contents |

The syntax is:

LAY.getBlock(id);

where id is the block ID.
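
The fields of the returned object can be combined in simple ways. The sketch below derives a block's width and height from its corner coordinates and follows the children field to list every block nested under a given block. The helpers blockSize and collectDescendants are hypothetical names, and stubLay is a stand-in for LAY: block 3 reuses the coordinates of the sample object above, while blocks 1 and 2 are invented for illustration.

```javascript
// Hypothetical helper: width and height of a block from its corner coordinates.
function blockSize(block) {
    return { width: block.x1 - block.x0, height: block.y1 - block.y0 };
}

// Hypothetical helper: IDs of all blocks nested under the given block,
// found by following the children field recursively.
function collectDescendants(lay, id) {
    var result = [];
    var children = lay.getBlock(id).children;
    for (var i = 0; i < children.length; i++) {
        result.push(children[i]);
        result = result.concat(collectDescendants(lay, children[i]));
    }
    return result;
}

// Stand-in for the real LAY object. Block 3 uses the coordinates of the
// sample object; blocks 1 and 2 are made up.
var stubBlocks = {
    1: { id: 1, parent: -1, type: "page", x0: 0, y0: 0, x1: 612, y1: 792, children: [2, 3] },
    2: { id: 2, parent: 1, type: "title", x0: 67, y0: 90, x1: 300, y1: 120, children: [] },
    3: { id: 3, parent: 1, type: "text", x0: 67, y0: 139, x1: 152, y1: 176, children: [] }
};
var stubLay = {
    getBlock: function (id) { return stubBlocks[id]; }
};

var size = blockSize(stubLay.getBlock(3));
console.log(size.width + " x " + size.height); // prints "85 x 37"

var descendants = collectDescendants(stubLay, 1);
console.log(descendants.join(", ")); // prints "2, 3"
```

The recursion relies on every ID listed in children being a valid block ID; how getBlock behaves for an invalid ID is not covered here.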

getBlockText

The getBlockText method returns the text contained in the block with the given ID, or undefined if the ID is not valid.

Info

Block IDs are not zero-based: they start from 1.

For example, the instruction:

var blockText = LAY.getBlockText(3);

sets the variable blockText which, given the Extract example output described on the dedicated page, has the following value:

DATE: 
10/21/2015

The syntax is:

LAY.getBlockText(id);

where id is the block ID.
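
Because getBlockText returns undefined for an ID that is not valid, a script may want to guard against that case. The following is a minimal sketch under that assumption only; getBlockTextOrDefault is a hypothetical helper, and stubLay is a stand-in for LAY whose text for block 3 is modeled on the example above, with whitespace simplified.

```javascript
// Hypothetical helper: returns the block text, or a fallback value when the
// ID is not valid (getBlockText returns undefined in that case).
function getBlockTextOrDefault(lay, id, fallback) {
    var text = lay.getBlockText(id);
    return text === undefined ? fallback : text;
}

// Stand-in for the real LAY object, with made-up data.
var stubTexts = { 3: "DATE: 10/21/2015" };
var stubLay = {
    getBlockText: function (id) { return stubTexts[id]; }
};

console.log(getBlockTextOrDefault(stubLay, 3, ""));                 // prints "DATE: 10/21/2015"
console.log(getBlockTextOrDefault(stubLay, 99, "(no such block)")); // prints "(no such block)"
```

Inside a script, the helper would be called with LAY as its first argument.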