Structured entities
Overview
The disambiguator recognizes entities like people's names, dates, addresses, monetary values, measures, etc., when present in a text. For instance, in the sentence:
John Smith lives in a 900 sq ft apartment at 22 Green Park Street.
the disambiguator recognizes John Smith as a person's name (entity type NPH), 900 sq ft as a measure (entity type MEA) and 22 Green Park Street as an address (entity type ADR).
Recognized entities cited in a text can be matched by a rule condition using the TYPE attribute and the entity type.
Some of them are also referred to as structured entities, because they are usually made of components like numbers, letters and punctuation marks.
Structure decomposition
As mentioned above, a structured entity is an aggregation of components. For example, a date contains at least a day and a month, or a month and a year; an address contains at least a street number and a street name.
The NPH, DAT, HOU, MEA, MON and ADR types have two properties:
- They are always associated with a virtual supernomen which specifies what type of entity they are.
- They can be subdivided into logical components.
The first property reflects the fact that tokens like 22 Green Park Street don't correspond to standard Knowledge Graph syncons. Yet the disambiguator understands that the text is an address and assigns it the meaning of street. In this way, otherwise unknown entities automatically receive the syncon ID of a known concept.
For example:
February 28, 1893
is recognized during disambiguation as a date and as an instance of the DAT type, and it's also assigned the virtual supernomen corresponding to syncon 65454 (date, tag_date). Thanks to this recognition, the disambiguator can also distinguish between the day, the month and the year, and these components can be used in rules through the TRANSFORM feature.
In another example:
900 sq ft
the text is recognized as an instance of the MEA type and is assigned the virtual supernomen corresponding to syncon 58572 (square foot). Such an entity has two parts: the numeric value (900) and the unit of measurement (sq ft).
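To make the idea of "two parts" concrete, here is a minimal Python sketch that splits a measure string into a numeric value and a unit. It is only an illustration of the decomposition concept: it is not how the disambiguator works, it does not model the TRANSFORM feature, and the names `Measure` and `split_measure` as well as the regular expression are assumptions made up for this example.

```python
import re
from typing import NamedTuple, Optional


class Measure(NamedTuple):
    """Hypothetical container for the two components of a MEA entity."""
    value: float  # numeric part, e.g. 900
    unit: str     # unit of measurement, e.g. "sq ft"


# Assumed, simplified pattern: a number followed by whitespace and a free-text unit.
_MEASURE_RE = re.compile(r"^\s*(?P<value>\d+(?:\.\d+)?)\s+(?P<unit>.+?)\s*$")


def split_measure(text: str) -> Optional[Measure]:
    """Return the (value, unit) pair, or None if the text has no leading number."""
    match = _MEASURE_RE.match(text)
    if match is None:
        return None
    return Measure(float(match.group("value")), match.group("unit"))


print(split_measure("900 sq ft"))
```

A real disambiguator relies on linguistic analysis and the Knowledge Graph rather than on a pattern like this one; the sketch only shows what the resulting components look like.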
The TRANSFORM feature allows the user to define the way in which structured entities can be divided into components.
Below is a list of the virtual supernomens and their corresponding entity types along with the concepts of their components. Check the Knowledge Graph to find the syncon ID for each of them depending on your project language.
Entity type | Virtual supernomen | Components' syncons |
---|---|---|
NPH | cat. person | tag_first_name, tag_surname, tag_gender |
DAT | tag_date | tag_weekday, tag_day, tag_month, tag_year |
HOU | tag_hour | tag_hour, tag_minute |
MEA | cat. unit of measurement | tag_number, cat. unit of measurement |
MON | cat. money | tag_number, tag_currency |
ADR | street | tag_road, tag_proper_noun, tag_street_number |
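For quick reference while writing rules, the table can also be kept as a small lookup structure. The Python sketch below simply transcribes the rows above; the names `EntityStructure`, `ENTITY_STRUCTURES` and `components_of` are hypothetical helpers for illustration, not part of any product API, and the concept labels still have to be resolved to syncon IDs in the Knowledge Graph for your project language.

```python
from typing import Dict, NamedTuple, Tuple


class EntityStructure(NamedTuple):
    """Hypothetical record: one row of the table above."""
    supernomen: str              # label of the virtual supernomen
    components: Tuple[str, ...]  # labels of the component concepts


# Transcription of the table; keys are the entity types.
ENTITY_STRUCTURES: Dict[str, EntityStructure] = {
    "NPH": EntityStructure("cat. person", ("tag_first_name", "tag_surname", "tag_gender")),
    "DAT": EntityStructure("tag_date", ("tag_weekday", "tag_day", "tag_month", "tag_year")),
    "HOU": EntityStructure("tag_hour", ("tag_hour", "tag_minute")),
    "MEA": EntityStructure("cat. unit of measurement", ("tag_number", "cat. unit of measurement")),
    "MON": EntityStructure("cat. money", ("tag_number", "tag_currency")),
    "ADR": EntityStructure("street", ("tag_road", "tag_proper_noun", "tag_street_number")),
}


def components_of(entity_type: str) -> Tuple[str, ...]:
    """Return the component labels for an entity type (case-insensitive)."""
    return ENTITY_STRUCTURES[entity_type.upper()].components


print(components_of("DAT"))
```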