Sections and segments overview

Within a categorization and/or extraction project, sections and segments are two ways of defining and recognizing specific portions of text in a document. They are especially useful when categorization or extraction rules must act upon blocks of text other than those automatically recognized by the semantic disambiguator's text analysis (for example paragraphs, sentences, clauses and phrases).

This is typically the case for documents whose original (and usually highly recognizable) layout or structure is critical for correctly identifying and/or retrieving information. Sometimes the key element of a document should not be searched for throughout the text, in any sentence, clause or phrase. Instead, it must be pinpointed in a specific sentence (or group of sentences) that has a particular position or structure in the document considered as a whole.
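The idea of restricting a search to one portion of a document can be illustrated with a minimal Python sketch. It is not the product's rule syntax: the sample document, the section names HEADER and BODY, the blank-line splitting rule and the helper functions are all assumptions made for illustration only.

```python
import re

# Hypothetical document: the date of interest is in the header,
# but another date also appears in the body.
DOCUMENT = (
    "SENDER: Acme Corp\n"
    "DATE: 2021-03-04\n"
    "\n"
    "Dear customer,\n"
    "your order of 2020-12-25 has shipped.\n"
)

DATE_PATTERN = r"\d{4}-\d{2}-\d{2}"


def split_into_sections(text):
    """Assumed layout: everything before the first blank line is the header."""
    header, _, body = text.partition("\n\n")
    return {"HEADER": header, "BODY": body}


def find_in_section(text, section_name, pattern):
    """Return the matches of pattern found only inside the named section."""
    sections = split_into_sections(text)
    return re.findall(pattern, sections[section_name])


# Searching the whole text returns both dates, including the unwanted one.
all_dates = re.findall(DATE_PATTERN, DOCUMENT)

# Searching only the header pinpoints the date that matters.
header_dates = find_in_section(DOCUMENT, "HEADER", DATE_PATTERN)

print(all_dates)
print(header_dates)
```

In this sketch the whole-document search cannot tell the two dates apart, while the search scoped to the header returns only the one whose position in the document gives it its meaning.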

Sections and segments have different characteristics and have been designed to respond to different operational needs. The developer must determine which of the two is the better fit, based on the nature of the text blocks to be recognized and the objectives to be achieved.