Structured entities

Overview

The disambiguator recognizes entities such as people's names, dates, addresses, monetary values and measures when they occur in a text. For instance, in the sentence:

John Smith lives in a 900 sq ft apartment at 22 Green Park Street.

the disambiguator recognizes John Smith as a person's name (entity type NPH), 900 sq ft as a measure (entity type MEA) and 22 Green Park Street as an address (entity type ADR).
A rule condition can match the entities recognized in a text by using the TYPE attribute together with the entity type. Some of these entities are also called structured entities, because they are usually made of components such as numbers, letters and punctuation marks.
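
To make the idea of matching by entity type concrete, here is a minimal Python sketch. It is not the rules language and not the disambiguator's API: the Entity class, the ENTITIES list and the match_type function are hypothetical names, and the list is a hand-written stand-in for what the disambiguator would recognize in the sample sentence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical record for one recognized entity."""
    text: str
    entity_type: str  # entity type code, e.g. "NPH", "MEA", "ADR"


# Hand-written stand-in for the entities recognized in:
# "John Smith lives in a 900 sq ft apartment at 22 Green Park Street."
ENTITIES = [
    Entity("John Smith", "NPH"),
    Entity("900 sq ft", "MEA"),
    Entity("22 Green Park Street", "ADR"),
]


def match_type(entities, entity_type):
    """Return the texts of the entities whose type equals entity_type."""
    return [e.text for e in entities if e.entity_type == entity_type]
```

Calling match_type(ENTITIES, "ADR") selects only the address, which is roughly the kind of filtering a condition on the ADR entity type performs.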

Structure decomposition

As mentioned above, a structured entity is an aggregation of components. For example, a date contains at least a day and a month or a month and a year, and an address contains at least a street number and a street name.
The NPH, DAT, HOU, MEA, MON and ADR types have two properties:

  • They are always associated with a virtual supernomen which specifies what type of entity they are.
  • They can be subdivided into logical components.

The first property exists because text like 22 Green Park Street doesn't correspond to any standard Knowledge Graph syncon. The disambiguator nonetheless understands that it is an address and assigns it the meaning of street. In this way, entities unknown to the Knowledge Graph automatically receive the syncon ID of a known concept.

For example:

February 28, 1893

is recognized as a date and as an instance of the DAT type during disambiguation, but it's also assigned the virtual supernomen corresponding to syncon 65454 (date, tag_date). Because of this recognition, the disambiguator can also distinguish between the day, the month and the year, and these components can be used in rules through the TRANSFORM feature.

In another example:

900 sq ft

the text is recognized as an instance of MEA type and is assigned the virtual supernomen corresponding to syncon 58572 (square foot). Such an entity has two parts: the numeric value (900) and the unit of measurement (sq ft).
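
The two decompositions described above can be pictured with a short Python sketch. It only illustrates the idea of splitting a structured entity into components. It is neither the TRANSFORM feature nor the disambiguator's actual logic, and the regular expressions and function names are assumptions that cover just the two sample strings' formats.

```python
import re

# Assumed format: "<month name> <day>, <year>", e.g. "February 28, 1893"
DATE_RE = re.compile(r"(?P<month>[A-Za-z]+) (?P<day>\d{1,2}), (?P<year>\d{4})")

# Assumed format: "<number> <unit>", e.g. "900 sq ft"
MEASURE_RE = re.compile(r"(?P<number>\d+(?:\.\d+)?)\s+(?P<unit>.+)")


def split_date(text):
    """Split a date string into month, day and year, or return None."""
    match = DATE_RE.fullmatch(text)
    return match.groupdict() if match else None


def split_measure(text):
    """Split a measure string into numeric value and unit, or return None."""
    match = MEASURE_RE.fullmatch(text)
    return match.groupdict() if match else None
```

With these helpers, split_date("February 28, 1893") yields the month, day and year as separate strings, and split_measure("900 sq ft") yields the number 900 and the unit sq ft.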

The TRANSFORM feature lets you define how structured entities are divided into components.
Below is a list of the entity types with their virtual supernomens and the concepts of their components. Syncon IDs depend on the project language, so check the Knowledge Graph to find the ID of each concept.

Entity type | Virtual supernomen       | Components' syncons
NPH         | cat. person              | tag_first_name, tag_surname, tag_gender
DAT         | tag_date                 | tag_weekday, tag_day, tag_month, tag_year
HOU         | tag_hour                 | tag_hour, tag_minute
MEA         | cat. unit of measurement | tag_number, cat. unit of measurement
MON         | cat. money               | tag_number, tag_currency
ADR         | street                   | tag_road, tag_proper_noun, tag_street_number
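
For quick reference while writing rules, the mapping listed above can be kept as a small lookup structure. The following Python sketch is only a convenience illustration: STRUCTURED_ENTITIES, supernomen_of and components_of are hypothetical names, and the values are the concept labels from the list, not syncon IDs, which still have to be looked up in the Knowledge Graph for the project language.

```python
# Entity type -> (virtual supernomen label, component concept labels).
# Labels are copied from the list above; they are not syncon IDs.
STRUCTURED_ENTITIES = {
    "NPH": ("cat. person", ("tag_first_name", "tag_surname", "tag_gender")),
    "DAT": ("tag_date", ("tag_weekday", "tag_day", "tag_month", "tag_year")),
    "HOU": ("tag_hour", ("tag_hour", "tag_minute")),
    "MEA": ("cat. unit of measurement", ("tag_number", "cat. unit of measurement")),
    "MON": ("cat. money", ("tag_number", "tag_currency")),
    "ADR": ("street", ("tag_road", "tag_proper_noun", "tag_street_number")),
}


def supernomen_of(entity_type):
    """Return the virtual supernomen label for an entity type."""
    return STRUCTURED_ENTITIES[entity_type][0]


def components_of(entity_type):
    """Return the component concept labels for an entity type."""
    return STRUCTURED_ENTITIES[entity_type][1]
```

For example, components_of("MON") returns the two labels tag_number and tag_currency.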