PII detector output
Introduction
The PII detector API resource returns a JSON object with this format:
{
"success": Boolean success flag,
"data": {
"content": analyzed text,
"language": language code,
"version": technology version info,
"knowledge": [],
"paragraphs": [],
"sentences": [],
"phrases": [],
"tokens": [],
"entities": [],
"extractions": [],
"extraData": {}
}
}
Tip
Use the live demo to see what API responses look like. Run an analysis, then select the {...} json tab on the results page.
For the description of the content, language and version properties, see the API resources output overview.
You can ignore all the arrays except extractions, because they only serve to produce the main output, which is inside the extraData object. If you are still interested in them, note that they are the product of other API features:
- For knowledge, see the description of full analysis output.
- For paragraphs, sentences, phrases and tokens, see the description of deep linguistic analysis output.
- For mainSentences, mainPhrases, mainLemmas, mainSyncons and topics, see the description of keyphrase extraction output.
- For entities, see the description of named entity recognition output.
The extractions array and the extraData object both contain detected PII in two alternative formats.
The extractions array represents PII with a proprietary expert.ai format, while the JSON-LD property of the extraData object is a JSON-LD representation of the same information.
It's up to you to choose the format you prefer.
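As a minimal sketch of that choice, the following Python function reads either format from a response that has already been parsed into a dictionary. The function name `get_pii` and its `fmt` parameter are illustrative assumptions, not part of the API; only the property names come from the response format described above.

```python
from typing import Any


def get_pii(response: dict[str, Any], fmt: str = "json-ld") -> Any:
    """Return detected PII from a parsed response in the chosen format.

    Hypothetical helper: fmt is "json-ld" for the JSON-LD object inside
    extraData, or "extractions" for the proprietary extraction records.
    """
    if not response.get("success"):
        raise ValueError("the response does not report success")
    data = response.get("data", {})
    if fmt == "extractions":
        return data.get("extractions", [])
    if fmt == "json-ld":
        return data.get("extraData", {}).get("JSON-LD", {})
    raise ValueError(f"unknown format: {fmt!r}")


# Trimmed-down response, used only for illustration.
sample_response = {
    "success": True,
    "data": {
        "extractions": [{"template": "PII_EMAIL", "fields": []}],
        "extraData": {"JSON-LD": {"@context": {}, "@graph": []}},
    },
}

print(get_pii(sample_response, "extractions"))
print(get_pii(sample_response, "json-ld"))
```

Both branches read from the same response, so switching format later only means changing the `fmt` argument.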
Simple vs composite information
The PII detector returns simple and composite information.
Simple information, like a phone number or an e-mail address, has only one property. Composite information has two or more properties, like a postal address, which is composed of a street name, a locality, a ZIP code and a region.
extraData object
The only property of the extraData object is JSON-LD, for example:
"extraData": {
"JSON-LD": {
"@context": {
...
},
"@graph": [
{
"@id": "https://schema.org/email?email=m.gut%40bfu.edu",
"@type": "https://schema.org/email",
"email": "m.gut@bfu.edu",
"matches": [
{
"end": 211,
"name": "email",
"start": 197,
"value": "m.gut@bfu.edu"
}
]
},
{
"@id": "https://schema.org/telephone?telephone=(210)%20617-5256",
"@type": "https://schema.org/telephone",
"matches": [
{
"end": 153,
"name": "telephone",
"start": 138,
"value": "(210) 617-5256"
}
],
"telephone": "(210) 617-5256"
},
{
"@id": "https://schema.org/telephone?telephone=(210)%20949-3006",
"@type": "https://schema.org/telephone",
"matches": [
{
"end": 181,
"name": "telephone",
"start": 166,
"value": "(210) 949-3006"
}
],
"telephone": "(210) 949-3006"
},
{
"@id": "https://schema.org/PostalAddress?address=7400%20Merton%20Minter%20Blvd.%2C%20San%20Antonio%2C%20TX%2C%2078229-4404",
"@type": "https://schema.org/PostalAddress",
"address": "7400 Merton Minter Blvd., San Antonio, TX, 78229-4404",
"addressCountry": "United States of America",
"addressLocality": "San Antonio",
"addressRegion": "Texas",
"matches": [
{
"end": 88,
"name": "streetAddress",
"start": 64,
"value": "7400 Merton Minter Blvd."
},
{
"end": 123,
"name": "postalCode",
"start": 112,
"value": "78229-4404"
},
{
"end": 123,
"name": "address",
"start": 64,
"value": "7400 Merton Minter Blvd., 111E, San Antonio, TX 78229-4404"
},
{
"end": 111,
"name": "addressLocality",
"start": 96,
"value": "San Antonio, TX"
},
{
"end": 111,
"name": "addressRegion",
"start": 96,
"value": "San Antonio, TX"
},
{
"end": 111,
"name": "addressCountry",
"start": 96,
"value": "San Antonio, TX"
}
],
"postalCode": "78229-4404",
"streetAddress": "7400 Merton Minter Blvd."
},
{
"@id": "https://schema.org/Person?person=Mark%20Gutenberg",
"@type": "https://schema.org/Person",
"birthDate": "1984-12-08",
"birthPlace": "Hamburg",
"familyName": "Gutenberg",
"gender": "M",
"givenName": "Mark",
"matches": [
{
"end": 54,
"name": "familyName",
"start": 39,
"value": "Mark Gutenberg"
},
{
"end": 54,
"name": "gender",
"start": 39,
"value": "Mark Gutenberg"
},
{
"end": 54,
"name": "givenName",
"start": 39,
"value": "Mark Gutenberg"
},
{
"end": 54,
"name": "person",
"start": 39,
"value": "Mark Gutenberg"
},
{
"end": 260,
"name": "birthPlace",
"start": 243,
"value": "HAMBURG, GERMANY"
},
{
"end": 282,
"name": "birthDate",
"start": 272,
"value": "12/8/1984"
}
],
"person": "Mark Gutenberg"
}
]
}
}
The value of the JSON-LD property is the JSON-LD object.
The distinguishing feature of the JSON-LD format is that it provides linked data. Specifically, PII information types and properties are linked to schema.org public vocabulary definitions.
For example, the type of the information representing a postal address corresponds to the https://schema.org/PostalAddress definition and the type's properties correspond to schema.org definitions too.
For the description of the JSON-LD format, refer to the official documentation.
The @graph property of the JSON-LD object contains the actual PII. @graph is an array, each item of which represents a simple or composite piece of information.
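One possible way to tell simple items from composite ones is to count the properties of each @graph item. The Python sketch below is a heuristic reading of the definition given earlier, not an official rule: it assumes that keys starting with `@` are JSON-LD keywords and that `matches` only holds occurrences, so neither counts as a PII property. The function names are made up for illustration.

```python
from typing import Any


def item_properties(item: dict[str, Any]) -> dict[str, Any]:
    """Return the PII properties of a @graph item.

    Assumption: keys starting with "@" are JSON-LD keywords and "matches"
    holds occurrences, so neither is treated as a PII property.
    """
    return {
        key: value
        for key, value in item.items()
        if not key.startswith("@") and key != "matches"
    }


def is_composite(item: dict[str, Any]) -> bool:
    """Heuristic: composite information has two or more properties."""
    return len(item_properties(item)) >= 2


# Items abridged from the example above.
email_item = {
    "@id": "https://schema.org/email?email=m.gut%40bfu.edu",
    "@type": "https://schema.org/email",
    "email": "m.gut@bfu.edu",
    "matches": [],
}
address_item = {
    "@type": "https://schema.org/PostalAddress",
    "addressLocality": "San Antonio",
    "addressRegion": "Texas",
    "postalCode": "78229-4404",
    "matches": [],
}

print(is_composite(email_item))    # False
print(is_composite(address_item))  # True
```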
These are all the PII that may be present:
* dateTime is an array, since there can be more than one value associated with the person.
The matches array of each information item contains the occurrences of the properties in the text.
Each item of the array corresponds to a property. Item properties are:
- name: property name
- start: zero-based index of the first character of the occurrence in the text
- end: zero-based index of the first character after the occurrence in the text
- value: the portion of text from which the property value was taken
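Because end points to the first character after the occurrence, start and end can be used directly as the bounds of a half-open slice. The following Python sketch lists the matches of each @graph item and recovers the matched text from the analyzed text. The function name `list_matches`, the sample text and its offsets are made up for illustration, and the sketch assumes that offsets count characters the same way Python string indices do.

```python
from typing import Any, Iterator


def list_matches(text: str, graph: list[dict[str, Any]]) -> Iterator[tuple[str, str, str]]:
    """Yield (type, property name, matched text) for every match in @graph."""
    for item in graph:
        # Keep only the last path segment of the schema.org type URL.
        item_type = item["@type"].rsplit("/", 1)[-1]
        for match in item.get("matches", []):
            # end is exclusive, so it maps directly onto a Python slice.
            yield item_type, match["name"], text[match["start"]:match["end"]]


# Made-up text and offsets, used only for illustration.
sample_text = "Contact: m.gut@bfu.edu or (210) 617-5256."
sample_graph = [
    {
        "@type": "https://schema.org/email",
        "email": "m.gut@bfu.edu",
        "matches": [
            {"start": 9, "end": 22, "name": "email", "value": "m.gut@bfu.edu"}
        ],
    },
    {
        "@type": "https://schema.org/telephone",
        "telephone": "(210) 617-5256",
        "matches": [
            {"start": 26, "end": 40, "name": "telephone", "value": "(210) 617-5256"}
        ],
    },
]

for row in list_matches(sample_text, sample_graph):
    print(row)
```

Slicing the original text, rather than trusting value alone, is a simple way to check that the offsets you received refer to the text you think they do.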
extractions array
To understand the contents of the extractions array, you must know that information detection can also be seen as a process of extracting records of data from the text. Each record contains data fields, and its structure (the set of possible fields) is called a template.
A template can be compared to a table and the template fields to the columns of the table, as shown in the following figure.
So, for example, instances of the PII_PERSON template are records that contain fields like:
- familyName
- gender
- givenName
- birthPlace
- birthDate
Every item of the extractions array represents an extraction record.
For example, the following item is a record that's an instance of the PII_PERSON template:
{
"fields": [
{
"name": "familyName",
"positions": [
{
"end": 54,
"start": 39
}
],
"value": "Gutenberg"
},
{
"name": "gender",
"positions": [
{
"end": 54,
"start": 39
}
],
"value": "M"
},
{
"name": "givenName",
"positions": [
{
"end": 54,
"start": 39
}
],
"value": "Mark"
},
{
"name": "person",
"positions": [
{
"end": 54,
"start": 39
}
],
"value": "Mark Gutenberg"
},
{
"name": "birthPlace",
"positions": [
{
"end": 260,
"start": 243
}
],
"value": "Hamburg"
},
{
"name": "birthDate",
"positions": [
{
"end": 282,
"start": 272
}
],
"value": "1984-12-08"
}
],
"namespace": "pii_en_1.0",
"template": "PII_PERSON"
}
In each item:
- namespace is the name of the software module performing the extraction.
- template is the name of the template.
- fields is the array of record fields.
Each item of the fields array represents an extracted value where:
- name is the field's name.
- value is the field's value.
- positions is an array containing the extracted field's positions.
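The record structure described above can be flattened into a plain dictionary, which is often easier to work with. The Python sketch below is one way to do it, under the assumption that a field name may occur more than once in a record, so values are collected in lists. The function name `record_to_dict` is hypothetical.

```python
from typing import Any


def record_to_dict(extraction: dict[str, Any]) -> dict[str, Any]:
    """Flatten one extraction record into template, namespace and field values."""
    fields: dict[str, list[Any]] = {}
    for field in extraction.get("fields", []):
        # Assumption: a field name can repeat, so keep every value in a list.
        fields.setdefault(field["name"], []).append(field["value"])
    return {
        "template": extraction.get("template"),
        "namespace": extraction.get("namespace"),
        "fields": fields,
    }


# Record abridged from the PII_PERSON example above.
sample_extraction = {
    "fields": [
        {"name": "familyName", "positions": [{"end": 54, "start": 39}], "value": "Gutenberg"},
        {"name": "givenName", "positions": [{"end": 54, "start": 39}], "value": "Mark"},
    ],
    "namespace": "pii_en_1.0",
    "template": "PII_PERSON",
}

print(record_to_dict(sample_extraction))
```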
These are all the templates and related fields:
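Information type | Template | Field |
---|---|---|
Personal attributes | PII_PERSON | person |
Personal attributes | PII_PERSON | givenName |
Personal attributes | PII_PERSON | familyName |
Personal attributes | PII_PERSON | age |
Personal attributes | PII_PERSON | gender |
Personal attributes | PII_PERSON | nationality |
Personal attributes | PII_PERSON | birthDate |
Personal attributes | PII_PERSON | birthPlace |
Personal attributes | PII_PERSON | deathDate |
Personal attributes | PII_PERSON | deathPlace |
Personal attributes | PII_PERSON | dateTime |
Postal address | PII_ADDRESS | address |
Postal address | PII_ADDRESS | streetAddress |
Postal address | PII_ADDRESS | addressCountry |
Postal address | PII_ADDRESS | addressLocality |
Postal address | PII_ADDRESS | addressRegion |
Postal address | PII_ADDRESS | postalCode |
Postal address | PII_ADDRESS | postOfficeBoxNumber |
Bank account | PII_BANKACCOUNT | IBAN |
Bank account | PII_BANKACCOUNT | IBANcountry |
IP address | PII_IP | IP |
E-mail address | PII_EMAIL | email |
URL | PII_URL | URL |
Financial product (credit/debit card) | PII_FINANCIALPRODUCT | creditDebitNumber |
Financial product (credit/debit card) | PII_FINANCIALPRODUCT | CVV |
Financial product (credit/debit card) | PII_FINANCIALPRODUCT | expirationDate |
Phone number | PII_TELEPHONE | telephone |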
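The table above can be encoded as a lookup structure, for example to flag field names that a record is not expected to carry. The Python sketch below copies the template and field names from the table; the names `TEMPLATE_FIELDS` and `unknown_fields` are illustrative assumptions, and the check is only as complete as the table itself.

```python
from typing import Any

# Template and field names copied from the table above.
TEMPLATE_FIELDS: dict[str, frozenset[str]] = {
    "PII_PERSON": frozenset({
        "person", "givenName", "familyName", "age", "gender", "nationality",
        "birthDate", "birthPlace", "deathDate", "deathPlace", "dateTime",
    }),
    "PII_ADDRESS": frozenset({
        "address", "streetAddress", "addressCountry", "addressLocality",
        "addressRegion", "postalCode", "postOfficeBoxNumber",
    }),
    "PII_BANKACCOUNT": frozenset({"IBAN", "IBANcountry"}),
    "PII_IP": frozenset({"IP"}),
    "PII_EMAIL": frozenset({"email"}),
    "PII_URL": frozenset({"URL"}),
    "PII_FINANCIALPRODUCT": frozenset({"creditDebitNumber", "CVV", "expirationDate"}),
    "PII_TELEPHONE": frozenset({"telephone"}),
}


def unknown_fields(extraction: dict[str, Any]) -> list[str]:
    """Return the sorted field names not listed for the record's template."""
    allowed = TEMPLATE_FIELDS.get(extraction.get("template", ""), frozenset())
    names = {field["name"] for field in extraction.get("fields", [])}
    return sorted(names - allowed)


# Made-up record: "nickname" is not a field of PII_PERSON in the table.
sample_record = {
    "template": "PII_PERSON",
    "fields": [
        {"name": "givenName", "value": "Mark"},
        {"name": "nickname", "value": "Gut"},
    ],
}

print(unknown_fields(sample_record))  # ['nickname']
```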