
PII Knowledge Model

Overview

The PII Knowledge Model (display name: PII EN v#) detects and extracts personally identifiable information (PII) and returns it in two alternative formats: extracted records and linked data in JSON-LD format (see also https://json-ld.org/).

The PII model does not de-identify PII in the text. To do that, use the PII Pseudonymization and PII Anonymization Knowledge Models. Alternatively, you can post-process the text, using the PII model output to determine where the PII items are and what type they are, and replace them with placeholders or pseudonyms.

Information types

These are the information types the PII model can detect:

| Information type | Notes |
| --- | --- |
| Personal attributes | Of a real person or a fictional character |
| Postal address | |
| Bank account | |
| IP address | |
| E-mail address | |
| URL | |
| Financial product | Credit or debit card |
| Phone number | |

These are the properties of each information type:

| Information type | Property | Linked data reference |
| --- | --- | --- |
| Personal attributes | Full name of the person | https://schema.org/Person |
| | First name | https://schema.org/givenName |
| | Last name | https://schema.org/familyName |
| | Age | https://schema.org/Number |
| | Gender | https://schema.org/gender |
| | Nationality | https://schema.org/nationality |
| | Date of birth | https://schema.org/birthDate |
| | Place of birth | https://schema.org/birthPlace |
| | Date of death | https://schema.org/deathDate |
| | Place of death | https://schema.org/deathPlace |
| | Any date or time related to the person | https://schema.org/Date |
| Postal address | Full address | https://schema.org/Text |
| | Street name and house number | https://schema.org/streetAddress |
| | Country | https://schema.org/addressCountry |
| | Postal code | https://schema.org/postalCode |
| | Locality | https://schema.org/addressLocality |
| | Region | https://schema.org/addressRegion |
| | PO box number | https://schema.org/postOfficeBoxNumber |
| Bank account | IBAN code | https://schema.org/PropertyValue |
| | IBAN code country | https://schema.org/Country |
| IP address | Address | https://schema.org/Text |
| E-mail address | Address | https://schema.org/email |
| URL | URL | https://schema.org/URL |
| Financial product | Number of the credit/debit card | https://schema.org/Text |
| | Card Verification Value (CVV) or Card Verification Code (CVC) | https://schema.org/Number |
| | Card expiration date | https://schema.org/Date |
| Phone number | Number | https://schema.org/telephone |

Simple vs. composite information

The PII model detects both simple and composite information.
Simple information, such as a phone number or an e-mail address, has only one property. Composite information has two or more properties, like a postal address, which is composed of a street name, a locality, a ZIP code and a region.

Output structure

The model output has the same structure as any other model and is affected by the functional properties of the workflow block.
The parts of the output that are specific to this model are the result of information extraction, i.e. the extractions array, and the extraData object. To get extraData, turn on the Output rules extra data functional option of the workflow block.

The extractions array represents PII as extracted records, while the JSON-LD property of the extraData object is a JSON-LD representation of the same information.

extraData object

The only property of the extraData object is JSON-LD, for example:

"extraData": {
    "JSON-LD": {
        "@context": {
            ...
        },
        "@graph": [
            {
                "@id": "https://schema.org/email?email=m.gut%40bfu.edu",
                "@type": "https://schema.org/email",
                "email": "m.gut@bfu.edu",
                "matches": [
                    {
                        "end": 211,
                        "name": "email",
                        "start": 197,
                        "value": "m.gut@bfu.edu"
                    }
                ]
            },
            {
                "@id": "https://schema.org/telephone?telephone=(210)%20617-5256",
                "@type": "https://schema.org/telephone",
                "matches": [
                    {
                        "end": 153,
                        "name": "telephone",
                        "start": 138,
                        "value": "(210) 617-5256"
                    }
                ],
                "telephone": "(210) 617-5256"
            },
            {
                "@id": "https://schema.org/telephone?telephone=(210)%20949-3006",
                "@type": "https://schema.org/telephone",
                "matches": [
                    {
                        "end": 181,
                        "name": "telephone",
                        "start": 166,
                        "value": "(210) 949-3006"
                    }
                ],
                "telephone": "(210) 949-3006"
            },
            {
                "@id": "https://schema.org/PostalAddress?address=7400%20Merton%20Minter%20Blvd.%2C%20San%20Antonio%2C%20TX%2C%2078229-4404",
                "@type": "https://schema.org/PostalAddress",
                "address": "7400 Merton Minter Blvd., San Antonio, TX, 78229-4404",
                "addressCountry": "United States of America",
                "addressLocality": "San Antonio",
                "addressRegion": "Texas",
                "matches": [
                    {
                        "end": 88,
                        "name": "streetAddress",
                        "start": 64,
                        "value": "7400 Merton Minter Blvd."
                    },
                    {
                        "end": 123,
                        "name": "postalCode",
                        "start": 112,
                        "value": "78229-4404"
                    },
                    {
                        "end": 123,
                        "name": "address",
                        "start": 64,
                        "value": "7400 Merton Minter Blvd., 111E, San Antonio, TX 78229-4404"
                    },
                    {
                        "end": 111,
                        "name": "addressLocality",
                        "start": 96,
                        "value": "San Antonio, TX"
                    },
                    {
                        "end": 111,
                        "name": "addressRegion",
                        "start": 96,
                        "value": "San Antonio, TX"
                    },
                    {
                        "end": 111,
                        "name": "addressCountry",
                        "start": 96,
                        "value": "San Antonio, TX"
                    }
                ],
                "postalCode": "78229-4404",
                "streetAddress": "7400 Merton Minter Blvd."
            },
            {
                "@id": "https://schema.org/Person?person=Mark%20Gutenberg",
                "@type": "https://schema.org/Person",
                "birthDate": "1984-12-08",
                "birthPlace": "Hamburg",
                "familyName": "Gutenberg",
                "gender": "M",
                "givenName": "Mark",
                "matches": [
                    {
                        "end": 54,
                        "name": "familyName",
                        "start": 39,
                        "value": "Mark Gutenberg"
                    },
                    {
                        "end": 54,
                        "name": "gender",
                        "start": 39,
                        "value": "Mark Gutenberg"
                    },
                    {
                        "end": 54,
                        "name": "givenName",
                        "start": 39,
                        "value": "Mark Gutenberg"
                    },
                    {
                        "end": 54,
                        "name": "person",
                        "start": 39,
                        "value": "Mark Gutenberg"
                    },
                    {
                        "end": 260,
                        "name": "birthPlace",
                        "start": 243,
                        "value": "HAMBURG, GERMANY"
                    },
                    {
                        "end": 282,
                        "name": "birthDate",
                        "start": 272,
                        "value": "12/8/1984"
                    }
                ],
                "person": "Mark Gutenberg"
            }
        ]
    }
}

The value of the JSON-LD property is the JSON-LD object.

JSON-LD is a format for linked data. Specifically, PII information types and properties are linked to definitions in the schema.org public vocabulary.
For example, the type of the information representing a postal address corresponds to the https://schema.org/PostalAddress definition and the type's properties correspond to schema.org definitions too.
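As a small illustration of how the linked identifiers can be used, the sketch below splits an @id like the ones in the example above into its schema.org term and the decoded value. It assumes, based only on that example, that every @id has the form https://schema.org/<term>?<key>=<URL-encoded value>. parse_pii_id is a hypothetical helper name, not something provided by the model.

```python
from urllib.parse import urlsplit, parse_qs


def parse_pii_id(pii_id):
    """Split an @id into (schema.org term, {key: decoded value}).

    Assumption: the @id follows the pattern seen in the sample output,
    i.e. a schema.org URL with the value carried in the query string.
    """
    parts = urlsplit(pii_id)
    # Last path segment, e.g. "telephone" or "PostalAddress"
    term = parts.path.rsplit("/", 1)[-1]
    # parse_qs percent-decodes the values and returns lists; keep the first
    values = {key: vals[0] for key, vals in parse_qs(parts.query).items()}
    return term, values


term, values = parse_pii_id(
    "https://schema.org/telephone?telephone=(210)%20617-5256"
)
print(term, values)
```

The same function applied to the address @id returns the term PostalAddress together with the full decoded address string.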

For a description of the JSON-LD format, refer to the official documentation (https://json-ld.org/).

The @graph property of the JSON-LD object contains the actual PII. @graph is an array in which each item represents a simple or composite piece of information.

These are all the PII types and properties that may be present:

| Information type | Property | Linked data reference |
| --- | --- | --- |
| Personal attributes | | https://schema.org/Person |
| | person | https://schema.org/Person |
| | givenName | https://schema.org/givenName |
| | familyName | https://schema.org/familyName |
| | age | https://schema.org/Number |
| | gender | https://schema.org/gender |
| | nationality | https://schema.org/nationality |
| | birthDate | https://schema.org/birthDate |
| | birthPlace | https://schema.org/birthPlace |
| | deathDate | https://schema.org/deathDate |
| | deathPlace | https://schema.org/deathPlace |
| | dateTime* | https://schema.org/Date |
| Postal address | | https://schema.org/PostalAddress |
| | address | https://schema.org/Text |
| | streetAddress | https://schema.org/streetAddress |
| | addressCountry | https://schema.org/addressCountry |
| | addressLocality | https://schema.org/addressLocality |
| | addressRegion | https://schema.org/addressRegion |
| | postalCode | https://schema.org/postalCode |
| | postOfficeBoxNumber | https://schema.org/postOfficeBoxNumber |
| Bank account | | https://schema.org/BankAccount |
| | IBAN | https://schema.org/PropertyValue |
| | IBANcountry | https://schema.org/Country |
| IP address | | https://schema.org/additionalProperty |
| | IP | https://schema.org/Text |
| E-mail address | | https://schema.org/email |
| | email | https://schema.org/email |
| URL | | https://schema.org/URL |
| | URL | https://schema.org/URL |
| Financial product (credit/debit card) | | https://schema.org/FinancialProduct |
| | creditDebitNumber | https://schema.org/Text |
| | CVV | https://schema.org/Number |
| | expirationDate | https://schema.org/Date |
| Phone number | | https://schema.org/telephone |
| | telephone | https://schema.org/telephone |

* dateTime is an array, since there can be more than one value associated with the person.
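To make the shape of @graph concrete, here is a minimal sketch that walks the array and, for each item, separates the PII properties from the bookkeeping keys (@id, @type and matches). It also flags an item as composite when it carries two or more properties, following the simple vs. composite distinction described earlier. The function name summarize_graph and the trimmed sample data are illustrative assumptions, not part of the model's API.

```python
# Keys that describe the item itself rather than a PII property
RESERVED_KEYS = {"@id", "@type", "matches"}


def summarize_graph(extra_data):
    """Return one summary dict per @graph item of an extraData object."""
    graph = extra_data.get("JSON-LD", {}).get("@graph", [])
    summary = []
    for item in graph:
        properties = {k: v for k, v in item.items() if k not in RESERVED_KEYS}
        summary.append({
            # Short type name, e.g. "Person" from https://schema.org/Person
            "type": item.get("@type", "").rsplit("/", 1)[-1],
            "properties": properties,
            # Assumption: "composite" means two or more properties present
            "composite": len(properties) >= 2,
        })
    return summary


# Trimmed, hand-written sample modeled on the example output above
sample_extra_data = {
    "JSON-LD": {
        "@graph": [
            {
                "@id": "https://schema.org/telephone?telephone=(210)%20617-5256",
                "@type": "https://schema.org/telephone",
                "matches": [],
                "telephone": "(210) 617-5256",
            },
            {
                "@id": "https://schema.org/Person?person=Mark%20Gutenberg",
                "@type": "https://schema.org/Person",
                "familyName": "Gutenberg",
                "givenName": "Mark",
                "matches": [],
                "person": "Mark Gutenberg",
            },
        ]
    }
}

summary = summarize_graph(sample_extra_data)
for entry in summary:
    print(entry["type"], entry["composite"], sorted(entry["properties"]))
```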

The matches array of each information item contains the occurrences of the properties in the text. Each item of the array corresponds to a property. Item properties are:

  • name: property name
  • start: zero-based index of the first character of the occurrence in the text
  • end: zero-based index of the first character after the occurrence in the text
  • value: the portion of text from which the property value was taken
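The start and end offsets make it possible to post-process the text as mentioned in the overview, for instance to replace PII with placeholders. The following is a minimal sketch under stated assumptions: end is treated as exclusive, as described in the list above (it is worth checking the offsets against real output before relying on this), overlapping matches are merged into one span, and the placeholder is simply the short @type name in square brackets. The redact function and the sample text are hypothetical.

```python
def redact(text, graph):
    """Replace every matched span in text with a [Type] placeholder."""
    spans = []
    for item in graph:
        label = item.get("@type", "PII").rsplit("/", 1)[-1]
        for match in item.get("matches", []):
            spans.append((match["start"], match["end"], label))

    # Merge overlapping spans so that, e.g., givenName and person
    # matches covering the same characters are replaced only once.
    spans.sort()
    merged = []
    for start, end, label in spans:
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))

    # Replace from the end of the text backwards so that earlier
    # offsets stay valid while the string changes length.
    for start, end, label in reversed(merged):
        text = text[:start] + "[" + label + "]" + text[end:]
    return text


# Hand-written sample with offsets computed for this text only
sample_text = "Call Ann Lee at 555-0100."
sample_graph = [
    {
        "@type": "https://schema.org/Person",
        "matches": [
            {"name": "givenName", "start": 5, "end": 12, "value": "Ann Lee"},
            {"name": "person", "start": 5, "end": 12, "value": "Ann Lee"},
        ],
    },
    {
        "@type": "https://schema.org/telephone",
        "matches": [
            {"name": "telephone", "start": 16, "end": 24, "value": "555-0100"},
        ],
    },
]

redacted = redact(sample_text, sample_graph)
print(redacted)
```

Replacing with pseudonyms instead of placeholders would only change the last loop, for example by looking the label up in a mapping of substitute values.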

extractions array

These are all the templates and related fields:

| Information type | Template | Field |
| --- | --- | --- |
| Personal attributes | PII_PERSON | person |
| | | givenName |
| | | familyName |
| | | age |
| | | gender |
| | | nationality |
| | | birthDate |
| | | birthPlace |
| | | deathDate |
| | | deathPlace |
| | | dateTime |
| Postal address | PII_ADDRESS | address |
| | | streetAddress |
| | | addressCountry |
| | | addressLocality |
| | | addressRegion |
| | | postalCode |
| | | postOfficeBoxNumber |
| Bank account | PII_BANKACCOUNT | IBAN |
| | | IBANcountry |
| IP address | PII_IP | IP |
| E-mail address | PII_EMAIL | email |
| URL | PII_URL | URL |
| Financial product (credit/debit card) | PII_FINANCIALPRODUCT | creditDebitNumber |
| | | CVV |
| | | expirationDate |
| Phone number | PII_TELEPHONE | telephone |

Note

If you are familiar with Platform extraction projects, the template key in this model's output corresponds to the concept of group, and template fields correspond to classes.
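As an illustration of how the templates and fields listed above might be consumed, the sketch below groups extracted records by template. The exact layout of the extractions array is not shown in this page, so the code assumes that each record has a template key and a fields list whose entries have name and value keys. Both that layout and the fields_by_template helper are assumptions to verify against actual output.

```python
from collections import defaultdict


def fields_by_template(extractions):
    """Group records by template; each record becomes {field name: [values]}."""
    grouped = defaultdict(list)
    for record in extractions:
        values = {}
        # Assumed layout: record["fields"] is a list of {"name", "value"} dicts
        for field in record.get("fields", []):
            values.setdefault(field["name"], []).append(field["value"])
        grouped[record.get("template", "")].append(values)
    return dict(grouped)


# Hand-written sample in the assumed layout
sample_extractions = [
    {
        "template": "PII_TELEPHONE",
        "fields": [{"name": "telephone", "value": "(210) 617-5256"}],
    },
    {
        "template": "PII_PERSON",
        "fields": [
            {"name": "givenName", "value": "Mark"},
            {"name": "familyName", "value": "Gutenberg"},
        ],
    },
]

by_template = fields_by_template(sample_extractions)
print(by_template["PII_PERSON"])
```

Values are collected in lists because a field such as dateTime can occur more than once in the same record.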