CLAUSE

CLAUSE is an extraction transformation option that completes the matched value rather than normalizing it: it adds the elements surrounding the original matched data to the final extracted value.

Its action is based on the concept of "clause", which is the smallest grammatical unit that can express a complete proposition. The recognition of one or more clauses in a sentence takes place during the disambiguation process.

The CLAUSE option returns the whole clause containing the value matched by an attribute.

The syntax of the CLAUSE option is the following:

SCOPE scopeOption
{
    IDENTIFY(templateName)
    {
        @field[attribute]|[CLAUSE]
    }
}

This option is useful when the extraction output needs to include the context around a matched element, not only the element itself.
Consider the following example:

SCOPE SENTENCE
{
    IDENTIFY(SUBJECT)
    {
        @Subject[TYPE(NPH) + ROLE(SUBJECT)]|[CLAUSE]
    }
}

This rule extracts human proper nouns (TYPE(NPH)) only if the identified names are the subjects of a sentence or a clause (+ ROLE(SUBJECT)). If this condition is met, the CLAUSE transformation option expands every extracted value to the clause in which the person's name appears as the subject.

Consider the extraction output if the rule above is run against the following sample sentence:

Assistant Commissioner Simon Byrne described detectives as "constables in T-shirts and jeans" and said he wanted to end the division between uniformed officers and detectives.

The text contains one value matching the sample rule: Simon Byrne. This concept is recognized as a person's name and is the subject of the first clause of the sentence. Because of the CLAUSE transformation, the whole clause in which the person's name was found is extracted: Assistant Commissioner Simon Byrne described detectives as "constables in T-shirts and jeans".

Info

If there is no clause because the verb is missing, the entire sentence is extracted instead.
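
The expansion behavior described on this page, including the fallback to the entire sentence, can be illustrated with a small Python sketch. This is not how the engine is implemented: the function name `expand_to_clause` and the character-offset spans are hypothetical, and the clause boundaries are written by hand here, whereas real clause boundaries are determined during the disambiguation process.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def expand_to_clause(sentence: str, clauses: List[Span], match: Span) -> str:
    """Return the text of the clause enclosing `match`, or the whole sentence.

    Hypothetical helper: `clauses` stands in for the clause boundaries
    that disambiguation would provide.
    """
    m_start, m_end = match
    for c_start, c_end in clauses:
        if c_start <= m_start and m_end <= c_end:
            return sentence[c_start:c_end]
    # No enclosing clause (for example, the verb is missing):
    # fall back to the entire sentence.
    return sentence


sentence = (
    "Assistant Commissioner Simon Byrne described detectives as "
    '"constables in T-shirts and jeans" and said he wanted to end '
    "the division between uniformed officers and detectives."
)

# Assumed clause boundaries for the sample sentence, split by hand.
first_clause_end = sentence.index(" and said")
clauses = [(0, first_clause_end), (first_clause_end + 1, len(sentence))]

# Span of the value matched by the attribute in the sample rule.
name = "Simon Byrne"
name_start = sentence.index(name)
match = (name_start, name_start + len(name))

extracted = expand_to_clause(sentence, clauses, match)

# A verbless fragment has no clause, so the whole sentence is returned.
fragment = "Simon Byrne, Assistant Commissioner."
no_clause = expand_to_clause(fragment, [], (0, len(name)))
```

In this sketch, `extracted` holds the first clause of the sample sentence, while `no_clause` holds the whole verbless fragment, mirroring the fallback described in the note above.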