
Blocks

The methods of the LAY object that work on layout blocks are:

getBlocksCount

The getBlocksCount method returns the number of blocks detected in the document layout.

The syntax is:

LAY.getBlocksCount()

The method returns an integer representing the number of blocks.

Info

The bounding box of each page, which contains all of that page's blocks, is also counted as a block.
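Since block IDs start from 1, the value returned by getBlocksCount can serve as the upper bound of a loop over all blocks. The JavaScript below is a minimal sketch that groups block IDs by page number. It assumes that IDs run from 1 to the value of getBlocksCount without gaps, which this page does not state explicitly. The helper blockIdsByPage and the layStub object are hypothetical: layStub stands in for LAY with invented data so that the sketch can run outside the platform. Inside a script you would pass LAY instead.

```javascript
// Hypothetical stand-in for the LAY object, with invented data, so the
// sketch runs outside the platform. Blocks 1 and 4 play the role of page
// bounding boxes; the "page" type value is an assumption.
var layStub = {
  blocks: {
    1: { id: 1, parent: -1, pageNumber: 1, type: "page", children: [2, 3] },
    2: { id: 2, parent: 1, pageNumber: 1, type: "title", children: [] },
    3: { id: 3, parent: 1, pageNumber: 1, type: "text", children: [] },
    4: { id: 4, parent: -1, pageNumber: 2, type: "page", children: [5] },
    5: { id: 5, parent: 4, pageNumber: 2, type: "text", children: [] }
  },
  getBlocksCount: function () {
    return Object.keys(this.blocks).length;
  },
  getBlock: function (id) {
    return this.blocks[id];
  }
};

// Groups block IDs by page number. Assumes IDs run from 1 to
// getBlocksCount() without gaps.
function blockIdsByPage(lay) {
  var byPage = {};
  var count = lay.getBlocksCount();
  for (var id = 1; id <= count; id++) {
    var block = lay.getBlock(id);
    if (!block) {
      continue; // skip IDs that do not resolve to a block
    }
    if (!byPage[block.pageNumber]) {
      byPage[block.pageNumber] = [];
    }
    byPage[block.pageNumber].push(id);
  }
  return byPage;
}

var pages = blockIdsByPage(layStub); // inside a script: blockIdsByPage(LAY)
```

The result also contains the IDs of the page bounding boxes (1 and 4 in the stub), because they are counted as blocks too.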

getBlock

Given an ID, the getBlock method returns an object that represents the layout block with that ID.

For example, considering the Extract example output introduced on the dedicated page, the instruction:

var block = LAY.getBlock(3);

sets the variable block to an object corresponding to the block with ID 3. The object may look like this:

{
    "id": 3,
    "parent": 1,
    "pageNumber": 1,
    "type": "text",
    "x0": 67,
    "y0": 139,
    "x1": 152,
    "y1": 176,
    "children": [],
    "beginPos": 8,
    "endPos": 24,
    "tokenBegin": 1,
    "tokenEnd": 8,
    "wordBegin": 1,
    "wordEnd": 2,
    "label": ""
}

where:

Field name | Description | Field type | Default value
--- | --- | --- | ---
id | A unique ID associated with the block | Integer | -1
parent | The ID of the parent block | Integer | -1
pageNumber | The number of the page on which the block is located | Integer | -1
type | The type of block (text, title, cell, and so on) | String | ""
label | A label with additional information about the block | String | ""
x0 | The x-coordinate of the upper-left corner of the block, relative to the page | Integer | 0
y0 | The y-coordinate of the upper-left corner of the block, relative to the page | Integer | 0
x1 | The x-coordinate of the lower-right corner of the block, relative to the page | Integer | 0
y1 | The y-coordinate of the lower-right corner of the block, relative to the page | Integer | 0
children | The IDs of the block's children (such as the blocks of a page or the cells of a table) | List of integers | []
beginPos | The position in the text at which the block content starts | Integer | -1
endPos | The position in the text at which the block content ends | Integer | -1
tokenBegin | The index of the first token in the block content | Integer | -1
tokenEnd | The index of the last token in the block content | Integer | -1
wordBegin | The index of the first word in the block | Integer | -1
wordEnd | The index of the last word in the block | Integer | -1
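The coordinate fields are enough to compute a block's size or to test positions against it. The helpers below are a hypothetical sketch, not part of the LAY object. They assume, as the table describes, that (x0, y0) is the upper-left corner and (x1, y1) the lower-right corner, so y values grow downward on the page (in the sample block 3, y0 is 139 and y1 is 176). The sample coordinates come from the example object above; the unit of measure is not specified on this page.

```javascript
// Hypothetical geometry helpers for a block object returned by LAY.getBlock.
// (x0, y0) is the upper-left corner and (x1, y1) the lower-right corner,
// both relative to the page, so y grows downward.
function blockWidth(block) {
  return block.x1 - block.x0;
}

function blockHeight(block) {
  return block.y1 - block.y0;
}

// True if the point (x, y) lies inside the block or on its border.
function blockContains(block, x, y) {
  return x >= block.x0 && x <= block.x1 && y >= block.y0 && y <= block.y1;
}

// True if two blocks are on the same page and their boxes intersect
// (touching borders count as overlapping).
function blocksOverlap(a, b) {
  return a.pageNumber === b.pageNumber &&
    a.x0 <= b.x1 && b.x0 <= a.x1 &&
    a.y0 <= b.y1 && b.y0 <= a.y1;
}

// Coordinates of block 3 from the example object above.
var sample = { id: 3, pageNumber: 1, x0: 67, y0: 139, x1: 152, y1: 176 };

blockWidth(sample);  // 85
blockHeight(sample); // 37
```

Checking pageNumber in blocksOverlap matters because the coordinates are relative to each page, so boxes on different pages can share coordinates without overlapping.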

The syntax is:

LAY.getBlock(id);

where id is the block ID.
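Because every block object carries both a parent ID and a list of children IDs, getBlock is sufficient to navigate the layout as a tree. The sketch below is a hypothetical illustration: descendantIds walks the children fields depth-first, and ancestorIds follows the parent field upward. It assumes that a parent value of -1 (the default) means the block has no parent. The layStub object and its data (a page containing a table with two cells) are invented, and the "page" and "table" type values are assumptions; inside a script you would pass LAY instead.

```javascript
// Hypothetical stand-in for LAY with invented data: block 1 is a page
// containing a table (block 2) whose cells are blocks 3 and 4.
var layStub = {
  blocks: {
    1: { id: 1, parent: -1, type: "page", children: [2] },
    2: { id: 2, parent: 1, type: "table", children: [3, 4] },
    3: { id: 3, parent: 2, type: "cell", children: [] },
    4: { id: 4, parent: 2, type: "cell", children: [] }
  },
  getBlock: function (id) {
    return this.blocks[id];
  }
};

// Depth-first walk of the block tree below rootId, following the
// children field. Returns the IDs of all descendants in visit order.
function descendantIds(lay, rootId) {
  var result = [];
  var root = lay.getBlock(rootId);
  if (!root) {
    return result;
  }
  var stack = root.children.slice().reverse();
  while (stack.length > 0) {
    var id = stack.pop();
    result.push(id);
    var block = lay.getBlock(id);
    if (block) {
      // Push children in reverse so they are visited in their listed order.
      for (var i = block.children.length - 1; i >= 0; i--) {
        stack.push(block.children[i]);
      }
    }
  }
  return result;
}

// Follows the parent field upward until a block whose parent is -1
// (assumed to mean "no parent"). Returns ancestor IDs, nearest first.
function ancestorIds(lay, id) {
  var result = [];
  var block = lay.getBlock(id);
  while (block && block.parent !== -1) {
    result.push(block.parent);
    block = lay.getBlock(block.parent);
  }
  return result;
}

descendantIds(layStub, 1); // [2, 3, 4]
ancestorIds(layStub, 4);   // [2, 1]
```

An explicit stack is used instead of recursion so that deeply nested layouts cannot exhaust the call stack.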

getBlockText

The getBlockText method returns the text contained in the block with the given ID, or undefined if the ID is not valid.

Info

Block IDs are not zero-based: they start from 1.

For example, the instruction:

var blockText = LAY.getBlockText(3);

sets the variable blockText to the following value, considering the Extract example output described on the dedicated page:

DATE: 
10/21/2015

The syntax is:

LAY.getBlockText(id);

where id is the block ID.
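Since getBlockText returns undefined for an invalid ID (for example 0, because IDs start from 1), code that gathers text from several blocks should check for that value. The sketch below shows one way to do it; blockTextOr and joinBlockTexts are hypothetical helpers, and layStub reproduces only the documented behavior of returning undefined for invalid IDs. The text of block 3 is taken from the example above, while the text of block 2 is invented.

```javascript
// Hypothetical stand-in for LAY: getBlockText returns undefined for IDs
// that do not exist, as documented. Block 2's text is invented.
var layStub = {
  texts: { 2: "INVOICE", 3: "DATE: \n10/21/2015" },
  getBlockText: function (id) {
    return this.texts[id];
  }
};

// Returns the text of the block, or fallback when the ID is not valid.
function blockTextOr(lay, id, fallback) {
  var text = lay.getBlockText(id);
  return text === undefined ? fallback : text;
}

// Joins the texts of the given blocks with separator, skipping invalid IDs.
function joinBlockTexts(lay, ids, separator) {
  var parts = [];
  for (var i = 0; i < ids.length; i++) {
    var text = lay.getBlockText(ids[i]);
    if (text !== undefined) {
      parts.push(text);
    }
  }
  return parts.join(separator);
}

blockTextOr(layStub, 0, "");              // "" (0 is not a valid ID)
joinBlockTexts(layStub, [2, 0, 3], " | "); // "INVOICE | DATE: \n10/21/2015"
```

In a script the same helpers could be called with LAY, for example joinBlockTexts(LAY, LAY.getBlock(1).children, "\n") to gather the text of the blocks directly below block 1.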