
PARAGRAPH

PARAGRAPH is an extraction transformation option. Rather than normalizing the matched value, it completes it: it adds the elements surrounding the original matched data to the final extracted value.

Its action is based on the concept of "paragraph", a unit of written discourse dealing with a particular idea. A paragraph consists of one or more sentences, and its start is typically indicated by the beginning of a new line. The recognition of paragraphs takes place during the disambiguation process.

The PARAGRAPH option returns the whole paragraph containing the value matched by an attribute.
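To make the expansion concrete, here is a minimal Python sketch of the idea. It is not how the engine works internally: paragraph recognition happens during disambiguation, whereas this sketch simply assumes that every new line starts a new paragraph. The function name expand_to_paragraph and its parameters are hypothetical and used only for illustration.

```python
def expand_to_paragraph(text: str, start: int, end: int) -> str:
    """Return the newline-delimited paragraph containing text[start:end].

    Assumption: paragraphs are separated by newline characters. The real
    engine recognizes paragraphs during disambiguation instead.
    """
    if not (0 <= start < end <= len(text)):
        raise ValueError("match span is outside the text")
    # The paragraph begins just after the previous newline (or at position 0).
    para_start = text.rfind("\n", 0, start) + 1
    # It ends at the next newline (or at the end of the text).
    para_end = text.find("\n", end)
    if para_end == -1:
        para_end = len(text)
    return text[para_start:para_end].strip()


sample = (
    "Within six months a pilot of putting police officers into post offices "
    "will be unveiled.\n"
    'Byrne said: "This is about fundamental change.'
)

# Pretend that an attribute matched the name "Byrne" at this position.
match_start = sample.index("Byrne")
paragraph = expand_to_paragraph(sample, match_start, match_start + len("Byrne"))
print(paragraph)
```

The matched value covers only the name, but the returned value is the whole line-delimited paragraph around it, which mirrors what the option does to the extraction output.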

The syntax of the PARAGRAPH option is the following:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[PARAGRAPH]
    }
}

This option is useful when the extraction output needs to include the context surrounding a matched element, not just the element itself.

Consider the following example:

SCOPE SENTENCE
{
    IDENTIFY(SUBJECT)
    {
        @Subject[TYPE(NPH) + ROLE(SUBJECT)]|[PARAGRAPH]
    }
}

The purpose of this rule is to extract people's names (TYPE(NPH)) only when they are the subject of a sentence or clause (+ ROLE(SUBJECT)). When this condition is met, the PARAGRAPH transformation option expands each extracted value to the whole paragraph in which the name occurs as a subject.

Consider the extraction output if the rule above is run against the following sample text:

Scotland Yard is to close 65 police stations to the public across London and move its front desks into post offices and supermarkets as part of proposals to make £500m budget cuts.
In a blueprint for the future that will see the role of the detective at the Yard - once considered to have the finest investigators in the world - apparently downgraded, 1,200 more constables will be put into boroughs, and neighborhood teams will be boosted by 2,600 officers.
Closing police stations mapped. Click image to explore it
Eight hundred of the 1,200 extra constables will be detectives who are to be taken out of specialist squads, such as the burglary squad, and put back into uniform and on to the streets. The aim is to hand investigative powers to neighborhood constables for low-level crime. They will be led by a "sheriff" in each London borough, and will be supported by teams of special constables, PCSOs and some detectives within each of the 32 boroughs of the force.
Assistant Commissioner Simon Byrne described detectives as "constables in T-shirts and jeans" and said he wanted to end the division between uniformed officers and detectives.
While carrying out what the commissioner has in the past admitted are huge cuts to the budget, the mayor's office wants public confidence in the police to rise from 62% to about 75%, and to reduce crime in seven key areas by 20%.
The mayor's office for policing and crime confirmed that the Scotland Yard building in central London would be one of 200 sold off as part of the cuts.
Within six months a pilot of putting police officers into post offices will be unveiled.
Byrne said: "This is about fundamental change. I think we can demonstrate that we care about local priorities. The way that neighborhood policing in London is run compared with other examples of good practice is that we can do more.

The text contains two values matching the sample rule: Simon Byrne and Byrne. Both are recognized as people's names and are the subject of a clause. The PARAGRAPH transformation causes the extraction of the whole paragraphs in which the people's names were found, that is:

Assistant Commissioner Simon Byrne described detectives as "constables in T-shirts and jeans" and said he wanted to end the division between uniformed officers and detectives.

and:

Byrne said: "This is about fundamental change. I think we can demonstrate that we care about local priorities. The way that neighborhood policing in London is run compared with other examples of good practice is that we can do more.