PII detector output

Introduction

The PII detector API resource returns a JSON object with this format:

{
    "success": Boolean success flag,
    "data": {
        "content": analyzed text,
        "language": language code,
        "version": technology version info,
        "knowledge": [],
        "paragraphs": [],
        "sentences": [],
        "phrases": [],
        "tokens": [],
        "entities": [],
        "extractions": [],
        "extraData": {}
    }
}

For the description of the content, language and version properties, see the API resources output overview.

Most of the component arrays have the same structure as in the response of the resource that performs the corresponding process. In particular:

The knowledge array contains Knowledge Graph data about syncons. Its contents are described in the article about the output of full analysis.

The extractions array and the extraData object contain the same detected PII in two alternative formats.
The extractions array represents PII with a proprietary expert.ai format, while the JSON-LD property of the extraData object is a JSON-LD representation of the same information.
It's up to you to choose the format you prefer.
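
As a minimal sketch of that choice, the following Python snippet parses a response body and returns the detected PII in either format. The sample body is a hypothetical, heavily shortened response built for illustration (its @context is left empty), and the helper name pii_from_response is an assumption, not part of the API.

```python
import json

# Hypothetical, heavily shortened response body built for illustration.
# A real response carries all the arrays listed in the format above.
raw_response = """
{
  "success": true,
  "data": {
    "content": "Write to m.gut@bfu.edu",
    "language": "en",
    "extractions": [
      {"namespace": "pii_en_1.0", "template": "PII_EMAIL",
       "fields": [{"name": "email", "value": "m.gut@bfu.edu",
                   "positions": [{"start": 9, "end": 22}]}]}
    ],
    "extraData": {
      "JSON-LD": {
        "@context": {},
        "@graph": [
          {"@id": "https://schema.org/email?email=m.gut%40bfu.edu",
           "@type": "https://schema.org/email",
           "email": "m.gut@bfu.edu",
           "matches": [{"name": "email", "start": 9, "end": 22,
                        "value": "m.gut@bfu.edu"}]}
        ]
      }
    }
  }
}
"""


def pii_from_response(body: str, prefer_json_ld: bool = False) -> list:
    """Return the detected PII as a list, in the requested format."""
    response = json.loads(body)
    if not response.get("success"):
        raise ValueError("the analysis request did not succeed")
    data = response["data"]
    if prefer_json_ld:
        # JSON-LD representation: items of the @graph array.
        return data.get("extraData", {}).get("JSON-LD", {}).get("@graph", [])
    # Proprietary representation: extraction records.
    return data.get("extractions", [])


records = pii_from_response(raw_response)
graph_items = pii_from_response(raw_response, prefer_json_ld=True)
print(records[0]["template"], graph_items[0]["@type"])
```

Both calls describe the same e-mail address; only the shape of the items differs.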

Simple vs. composite information

The PII detector returns simple and composite information.
Simple information, like a phone number or an e-mail address, has only one property. Composite information has two or more properties, like a postal address, which is composed of a street name, a locality, a ZIP code and a region.

extraData object

The only property of the extraData object is JSON-LD, for example:

"extraData": {
"JSON-LD": {
  "@context": {
  ...
  },
  "@graph": [
    {
      "@id": "https://schema.org/email?email=m.gut%40bfu.edu",
      "@type": "https://schema.org/email",
      "email": "m.gut@bfu.edu",
      "matches": [
        {
          "end": 211,
          "name": "email",
          "start": 197,
          "value": "m.gut@bfu.edu"
        }
      ]
    },
    {
      "@id": "https://schema.org/telephone?telephone=(210)%20617-5256",
      "@type": "https://schema.org/telephone",
      "matches": [
        {
          "end": 153,
          "name": "telephone",
          "start": 138,
          "value": "(210) 617-5256"
        }
      ],
      "telephone": "(210) 617-5256"
    },
    {
      "@id": "https://schema.org/telephone?telephone=(210)%20949-3006",
      "@type": "https://schema.org/telephone",
      "matches": [
        {
          "end": 181,
          "name": "telephone",
          "start": 166,
          "value": "(210) 949-3006"
        }
      ],
      "telephone": "(210) 949-3006"
    },
    {
      "@id": "https://schema.org/PostalAddress?address=7400%20Merton%20Minter%20Blvd.%2C%20San%20Antonio%2C%20TX%2C%2078229-4404",
      "@type": "https://schema.org/PostalAddress",
      "address": "7400 Merton Minter Blvd., San Antonio, TX, 78229-4404",
      "addressCountry": "United States of America",
      "addressLocality": "San Antonio",
      "addressRegion": "Texas",
      "matches": [
        {
          "end": 88,
          "name": "streetAddress",
          "start": 64,
          "value": "7400 Merton Minter Blvd."
        },
        {
          "end": 123,
          "name": "postalCode",
          "start": 112,
          "value": "78229-4404"
        },
        {
          "end": 123,
          "name": "address",
          "start": 64,
          "value": "7400 Merton Minter Blvd., 111E, San Antonio, TX 78229-4404"
        },
        {
          "end": 111,
          "name": "addressLocality",
          "start": 96,
          "value": "San Antonio, TX"
        },
        {
          "end": 111,
          "name": "addressRegion",
          "start": 96,
          "value": "San Antonio, TX"
        },
        {
          "end": 111,
          "name": "addressCountry",
          "start": 96,
          "value": "San Antonio, TX"
        }
      ],
      "postalCode": "78229-4404",
      "streetAddress": "7400 Merton Minter Blvd."
    },
    {
      "@id": "https://schema.org/Person?person=Mark%20Gutenberg",
      "@type": "https://schema.org/Person",
      "birthDate": "1984-12-08",
      "birthPlace": "Hamburg",
      "familyName": "Gutenberg",
      "gender": "M",
      "givenName": "Mark",
      "matches": [
        {
          "end": 54,
          "name": "familyName",
          "start": 39,
          "value": "Mark Gutenberg"
        },
        {
          "end": 54,
          "name": "gender",
          "start": 39,
          "value": "Mark Gutenberg"
        },
        {
          "end": 54,
          "name": "givenName",
          "start": 39,
          "value": "Mark Gutenberg"
        },
        {
          "end": 54,
          "name": "person",
          "start": 39,
          "value": "Mark Gutenberg"
        },
        {
          "end": 260,
          "name": "birthPlace",
          "start": 243,
          "value": "HAMBURG, GERMANY"
        },
        {
          "end": 282,
          "name": "birthDate",
          "start": 272,
          "value": "12/8/1984"
        }
      ],
      "person": "Mark Gutenberg"
    }
  ]
}
},

The value of the JSON-LD property is the JSON-LD object.

The defining characteristic of the JSON-LD format is that it provides linked data. Specifically, PII types and their properties are linked to definitions in the public schema.org vocabulary.
For example, the type of the information representing a postal address corresponds to the https://schema.org/PostalAddress definition, and the type's properties correspond to schema.org definitions too.

For a description of the JSON-LD format, refer to the official documentation.

The @graph property of the JSON-LD object contains the actual PII. @graph is an array, each item of which represents a simple or composite piece of information.
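
The distinction between simple and composite information can be checked on @graph items with a short Python sketch. It assumes that the properties of an item are all its keys except the JSON-LD keywords (those starting with "@") and the matches array; that reading, and the helper names, are assumptions made for illustration. The two items are taken from the example above, with the matches of the second one omitted for brevity.

```python
def property_names(item: dict) -> list:
    """Return the PII property names of a @graph item.

    JSON-LD keywords (keys starting with "@") and the "matches" array
    are skipped, on the assumption that they are not PII properties.
    """
    return sorted(k for k in item if not k.startswith("@") and k != "matches")


def is_composite(item: dict) -> bool:
    """An item with two or more properties is treated as composite."""
    return len(property_names(item)) > 1


email_item = {
    "@id": "https://schema.org/email?email=m.gut%40bfu.edu",
    "@type": "https://schema.org/email",
    "email": "m.gut@bfu.edu",
    "matches": [
        {"end": 211, "name": "email", "start": 197, "value": "m.gut@bfu.edu"}
    ],
}

person_item = {
    "@id": "https://schema.org/Person?person=Mark%20Gutenberg",
    "@type": "https://schema.org/Person",
    "birthDate": "1984-12-08",
    "birthPlace": "Hamburg",
    "familyName": "Gutenberg",
    "gender": "M",
    "givenName": "Mark",
    "matches": [],  # omitted for brevity
    "person": "Mark Gutenberg",
}

for item in (email_item, person_item):
    kind = "composite" if is_composite(item) else "simple"
    print(item["@type"], kind, property_names(item))
```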

These are all the PII types and properties that may be present:

Information type                        Property              Linked data reference
Personal attributes                                           https://schema.org/Person
                                        person                https://schema.org/Person
                                        givenName             https://schema.org/givenName
                                        familyName            https://schema.org/familyName
                                        age                   https://schema.org/Number
                                        gender                https://schema.org/gender
                                        nationality           https://schema.org/nationality
                                        birthDate             https://schema.org/birthDate
                                        birthPlace            https://schema.org/birthPlace
                                        deathDate             https://schema.org/deathDate
                                        deathPlace            https://schema.org/deathPlace
                                        dateTime*             https://schema.org/Date
Postal address                                                https://schema.org/PostalAddress
                                        address               https://schema.org/Text
                                        streetAddress         https://schema.org/streetAddress
                                        addressCountry        https://schema.org/addressCountry
                                        addressLocality       https://schema.org/addressLocality
                                        addressRegion         https://schema.org/addressRegion
                                        postalCode            https://schema.org/postalCode
                                        postOfficeBoxNumber   https://schema.org/postOfficeBoxNumber
Bank account                                                  https://schema.org/BankAccount
                                        IBAN                  https://schema.org/PropertyValue
                                        IBANcountry           https://schema.org/Country
IP address                                                    https://schema.org/additionalProperty
                                        IP                    https://schema.org/Text
E-mail address                                                https://schema.org/email
                                        email                 https://schema.org/email
URL                                                           https://schema.org/URL
                                        URL                   https://schema.org/URL
Financial product (credit/debit card)                         https://schema.org/FinancialProduct
                                        creditDebitNumber     https://schema.org/Text
                                        CVV                   https://schema.org/Number
                                        expirationDate        https://schema.org/Date
Phone number                                                  https://schema.org/telephone
                                        telephone             https://schema.org/telephone

* dateTime is an array, since there can be more than one value associated with the person.

The matches array of each information item contains the occurrences of the properties in the text. Each item of the array corresponds to a property. Item properties are:

  • name: property name
  • start: zero-based index of the first character of the occurrence in the text
  • end: zero-based index of the first character after the occurrence in the text
  • value: the portion of text from which the property value was taken
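
Because start is inclusive and end is exclusive, an occurrence maps directly onto a Python slice, text[start:end]. The sketch below uses that to mask every occurrence listed in the matches arrays. The sample text and its offsets are hypothetical, hand-built to follow the semantics described above, and the redact helper is an illustration rather than part of the API.

```python
def redact(text: str, graph: list, mask: str = "#") -> str:
    """Replace every character covered by a match with the mask character."""
    # Collect the (start, end) spans of all matches; a set drops the
    # duplicates produced when several properties share one occurrence.
    spans = sorted(
        {(m["start"], m["end"]) for item in graph for m in item.get("matches", [])}
    )
    chars = list(text)
    for start, end in spans:
        # end is the index of the first character after the occurrence.
        for i in range(start, min(end, len(chars))):
            chars[i] = mask
    return "".join(chars)


# Hypothetical analyzed text and @graph items built for illustration.
text = "Call (210) 617-5256 or write to m.gut@bfu.edu."
graph = [
    {
        "@type": "https://schema.org/telephone",
        "telephone": "(210) 617-5256",
        "matches": [
            {"name": "telephone", "start": 5, "end": 19, "value": "(210) 617-5256"}
        ],
    },
    {
        "@type": "https://schema.org/email",
        "email": "m.gut@bfu.edu",
        "matches": [
            {"name": "email", "start": 32, "end": 45, "value": "m.gut@bfu.edu"}
        ],
    },
]

print(text[5:19])           # the slice addressed by the first match
print(redact(text, graph))  # same text with both occurrences masked
```

Masking character by character keeps the text length unchanged, so overlapping occurrences (such as the address and its parts) need no special handling.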

extractions array

To understand the contents of the extractions array, you must know that information detection can also be seen as a process of extracting records of data from the text. Each record contains data fields, and its structure (the set of possible fields) is called a template.
A template can be compared to a table and the template fields to the columns of the table, as shown in the following figure.

So, for example, instances of the PII_PERSON template are records that contain fields like:

  • familyName
  • gender
  • givenName
  • birthPlace
  • birthDate

Every item of the extractions array represents an extraction record.
For example, the following item is a record that's an instance of the PII_PERSON template:

{
  "fields": [
    {
      "name": "familyName",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "Gutenberg"
    },
    {
      "name": "gender",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "M"
    },
    {
      "name": "givenName",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "Mark"
    },
    {
      "name": "person",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "Mark Gutenberg"
    },
    {
      "name": "birthPlace",
      "positions": [
        {
          "end": 260,
          "start": 243
        }
      ],
      "value": "Hamburg"
    },
    {
      "name": "birthDate",
      "positions": [
        {
          "end": 282,
          "start": 272
        }
      ],
      "value": "1984-12-08"
    }
  ],
  "namespace": "pii_en_1.0",
  "template": "PII_PERSON"
}

In each item:

  • namespace is the name of the software module performing the extraction.
  • template is the name of the template.
  • fields is the array of record fields.

Each item of the fields array represents an extracted value where:

  • name is the field's name.
  • value is the field's value.
  • positions is an array containing the extracted field's positions.
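
Following the table analogy, each record can be flattened into a row that maps field names to values, and rows can be grouped by template. The Python sketch below does this for two records: a shortened version of the PII_PERSON record above and a PII_EMAIL record assembled from the same example values. The helper name is hypothetical, and the code assumes field names are unique within a record (a repeated name would overwrite the earlier value).

```python
from collections import defaultdict


def records_by_template(extractions: list) -> dict:
    """Group extraction records by template, one {field: value} row each."""
    grouped = defaultdict(list)
    for record in extractions:
        # Assumption: each field name occurs once per record.
        row = {field["name"]: field["value"] for field in record["fields"]}
        grouped[record["template"]].append(row)
    return dict(grouped)


extractions = [
    {
        "fields": [
            {"name": "familyName", "positions": [{"end": 54, "start": 39}],
             "value": "Gutenberg"},
            {"name": "givenName", "positions": [{"end": 54, "start": 39}],
             "value": "Mark"},
            {"name": "birthDate", "positions": [{"end": 282, "start": 272}],
             "value": "1984-12-08"},
        ],
        "namespace": "pii_en_1.0",
        "template": "PII_PERSON",
    },
    {
        "fields": [
            {"name": "email", "positions": [{"end": 211, "start": 197}],
             "value": "m.gut@bfu.edu"},
        ],
        "namespace": "pii_en_1.0",
        "template": "PII_EMAIL",
    },
]

tables = records_by_template(extractions)
for template, rows in tables.items():
    print(template, rows)
```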

These are all the templates and related fields:

Information type                        Template              Field
Personal attributes                     PII_PERSON
                                                              person
                                                              givenName
                                                              familyName
                                                              age
                                                              gender
                                                              nationality
                                                              birthDate
                                                              birthPlace
                                                              deathDate
                                                              deathPlace
                                                              dateTime
Postal address                          PII_ADDRESS
                                                              address
                                                              streetAddress
                                                              addressCountry
                                                              addressLocality
                                                              addressRegion
                                                              postalCode
                                                              postOfficeBoxNumber
Bank account                            PII_BANKACCOUNT
                                                              IBAN
                                                              IBANcountry
IP address                              PII_IP
                                                              IP
E-mail address                          PII_EMAIL
                                                              email
URL                                     PII_URL
                                                              URL
Financial product (credit/debit card)   PII_FINANCIALPRODUCT
                                                              creditDebitNumber
                                                              CVV
                                                              expirationDate
Phone number                            PII_TELEPHONE
                                                              telephone