PII detector output
Introduction
The PII detector API resource returns a JSON object with this format:
{
  "success": Boolean success flag,
  "data": {
    "content": analyzed text,
    "language": language code,
    "version": technology version info,
    "knowledge": [],
    "paragraphs": [],
    "sentences": [],
    "phrases": [],
    "tokens": [],
    "entities": [],
    "extractions": [],
    "extraData": {}
  }
}
Tip
Use the live demo to see what API responses look like. Run an analysis, then select the {...} json tab in the results page.
For the description of the content, language and version properties, see the API resources output overview.
You can ignore all the arrays except extractions: they only serve to produce the main output, which is inside the extraData object. If you are still interested, note that these arrays are the product of other API features:
- For knowledge, see the description of full analysis output.
- For paragraphs, sentences, phrases and tokens, see the description of deep linguistic analysis output.
- For mainSentences, mainPhrases, mainLemmas, mainSyncons and topics, see the description of keyphrase extraction output.
- For entities, see the description of named entity recognition output.
The extractions array and the extraData object both contain detected PII in two alternative formats.
The extractions array represents PII with a proprietary expert.ai format, while the JSON-LD property of the extraData object is a JSON-LD representation of the same information.
It's up to you to choose the format you prefer.
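As a minimal sketch of that choice, the Python below reads either representation from a response that has already been parsed into a dict with the format shown above. The function name `get_pii`, the `fmt` values and the trimmed sample `response` are illustrative assumptions, not part of the API.

```python
from typing import Any


def get_pii(response: dict[str, Any], fmt: str = "extractions") -> list[dict[str, Any]]:
    """Return the detected PII in the chosen representation.

    fmt="extractions" returns the items of data.extractions.
    fmt="json-ld" returns the items of data.extraData["JSON-LD"]["@graph"].
    """
    if not response.get("success"):
        raise ValueError("the analysis did not succeed")
    data = response.get("data", {})
    if fmt == "extractions":
        return data.get("extractions", [])
    if fmt == "json-ld":
        return data.get("extraData", {}).get("JSON-LD", {}).get("@graph", [])
    raise ValueError(f"unknown format: {fmt!r}")


# Hypothetical, heavily trimmed response used only for illustration.
response = {
    "success": True,
    "data": {
        "extractions": [
            {"template": "PII_TELEPHONE", "namespace": "pii_en_1.0", "fields": []}
        ],
        "extraData": {
            "JSON-LD": {
                "@context": {},
                "@graph": [{"@type": "https://schema.org/telephone"}],
            }
        },
    },
}

records = get_pii(response)                  # proprietary extraction records
graph_items = get_pii(response, "json-ld")   # JSON-LD items
```

Both calls read the same detection results, so an application normally needs only one of them.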
Simple vs. composite information
The PII detector returns simple and composite information.
Simple information, like a phone number or an e-mail address, has only one property. Composite information has two or more properties, like a postal address, which is composed of a street name, a locality, a ZIP code and a region.
extraData object
The only property of the extraData object is JSON-LD, for example:
"extraData": {
  "JSON-LD": {
    "@context": {
      ...
    },
    "@graph": [
      {
        "@id": "https://schema.org/email?email=m.gut%40bfu.edu",
        "@type": "https://schema.org/email",
        "email": "m.gut@bfu.edu",
        "matches": [
          {
            "end": 211,
            "name": "email",
            "start": 197,
            "value": "m.gut@bfu.edu"
          }
        ]
      },
      {
        "@id": "https://schema.org/telephone?telephone=(210)%20617-5256",
        "@type": "https://schema.org/telephone",
        "matches": [
          {
            "end": 153,
            "name": "telephone",
            "start": 138,
            "value": "(210) 617-5256"
          }
        ],
        "telephone": "(210) 617-5256"
      },
      {
        "@id": "https://schema.org/telephone?telephone=(210)%20949-3006",
        "@type": "https://schema.org/telephone",
        "matches": [
          {
            "end": 181,
            "name": "telephone",
            "start": 166,
            "value": "(210) 949-3006"
          }
        ],
        "telephone": "(210) 949-3006"
      },
      {
        "@id": "https://schema.org/PostalAddress?address=7400%20Merton%20Minter%20Blvd.%2C%20San%20Antonio%2C%20TX%2C%2078229-4404",
        "@type": "https://schema.org/PostalAddress",
        "address": "7400 Merton Minter Blvd., San Antonio, TX, 78229-4404",
        "addressCountry": "United States of America",
        "addressLocality": "San Antonio",
        "addressRegion": "Texas",
        "matches": [
          {
            "end": 88,
            "name": "streetAddress",
            "start": 64,
            "value": "7400 Merton Minter Blvd."
          },
          {
            "end": 123,
            "name": "postalCode",
            "start": 112,
            "value": "78229-4404"
          },
          {
            "end": 123,
            "name": "address",
            "start": 64,
            "value": "7400 Merton Minter Blvd., 111E, San Antonio, TX 78229-4404"
          },
          {
            "end": 111,
            "name": "addressLocality",
            "start": 96,
            "value": "San Antonio, TX"
          },
          {
            "end": 111,
            "name": "addressRegion",
            "start": 96,
            "value": "San Antonio, TX"
          },
          {
            "end": 111,
            "name": "addressCountry",
            "start": 96,
            "value": "San Antonio, TX"
          }
        ],
        "postalCode": "78229-4404",
        "streetAddress": "7400 Merton Minter Blvd."
      },
      {
        "@id": "https://schema.org/Person?person=Mark%20Gutenberg",
        "@type": "https://schema.org/Person",
        "birthDate": "1984-12-08",
        "birthPlace": "Hamburg",
        "familyName": "Gutenberg",
        "gender": "M",
        "givenName": "Mark",
        "matches": [
          {
            "end": 54,
            "name": "familyName",
            "start": 39,
            "value": "Mark Gutenberg"
          },
          {
            "end": 54,
            "name": "gender",
            "start": 39,
            "value": "Mark Gutenberg"
          },
          {
            "end": 54,
            "name": "givenName",
            "start": 39,
            "value": "Mark Gutenberg"
          },
          {
            "end": 54,
            "name": "person",
            "start": 39,
            "value": "Mark Gutenberg"
          },
          {
            "end": 260,
            "name": "birthPlace",
            "start": 243,
            "value": "HAMBURG, GERMANY"
          },
          {
            "end": 282,
            "name": "birthDate",
            "start": 272,
            "value": "12/8/1984"
          }
        ],
        "person": "Mark Gutenberg"
      }
    ]
  }
}
The value of the JSON-LD property is the JSON-LD object.
The distinguishing feature of the JSON-LD format is that it provides linked data. Specifically, PII types and their properties are linked to definitions in the public schema.org vocabulary.
For example, the type of the information representing a postal address corresponds to the https://schema.org/PostalAddress definition and the type's properties correspond to schema.org definitions too.
For the description of the JSON-LD format refer to the official documentation.
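The links can be taken apart with the Python standard library. The sketch below assumes that @type always ends with the schema.org term and that @id carries the value as a percent-encoded query string, as in the example above; neither is stated as a guarantee here, and `describe_item` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def describe_item(item: dict) -> tuple[str, dict[str, str]]:
    """Return (schema.org term, decoded @id query parameters) for a @graph item."""
    # The last path segment of @type is the schema.org term, e.g. "PostalAddress".
    term = urlsplit(item["@type"]).path.rsplit("/", 1)[-1]
    # parse_qs decodes percent-encoding and returns a list of values per key.
    query = parse_qs(urlsplit(item["@id"]).query)
    return term, {key: values[0] for key, values in query.items()}


# Item copied from the example above, without its other properties.
item = {
    "@id": "https://schema.org/telephone?telephone=(210)%20617-5256",
    "@type": "https://schema.org/telephone",
}
term, params = describe_item(item)
print(term, params)
```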
The @graph property of the JSON-LD object contains the actual PII. @graph is an array, each item of which represents a simple or composite piece of information.
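Walking the @graph array could look like the following Python sketch, which groups items by @type and tells simple from composite information. The rule used here is an assumption drawn from the example, not a documented contract: JSON-LD keywords (keys starting with @) and the matches array are not counted as PII properties. All function names are illustrative.

```python
from collections import defaultdict


def pii_properties(item: dict) -> dict:
    """Properties of a @graph item, without JSON-LD keywords and matches."""
    return {
        key: value
        for key, value in item.items()
        if not key.startswith("@") and key != "matches"
    }


def is_composite(item: dict) -> bool:
    """Assumed rule: two or more PII properties make the item composite."""
    return len(pii_properties(item)) >= 2


def group_by_type(graph: list[dict]) -> dict[str, list[dict]]:
    """Group @graph items by their @type URL."""
    groups = defaultdict(list)
    for item in graph:
        groups[item["@type"]].append(item)
    return dict(groups)


# Trimmed items based on the example above.
graph = [
    {"@type": "https://schema.org/telephone", "matches": [], "telephone": "(210) 617-5256"},
    {"@type": "https://schema.org/telephone", "matches": [], "telephone": "(210) 949-3006"},
    {"@type": "https://schema.org/Person", "matches": [], "person": "Mark Gutenberg",
     "givenName": "Mark", "familyName": "Gutenberg"},
]

groups = group_by_type(graph)
composite_flags = [is_composite(item) for item in graph]
```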
These are all the PII that may be present:
* dateTime is an array, since there can be more than one value associated with the person.
The matches array of each information item contains the occurrences of the properties in the text.
Each item of the array corresponds to a property. Item properties are:
- name: property name
- start: zero-based index of the first character of the occurrence in the text
- end: zero-based index of the first character after the occurrence in the text
- value: the portion of text from which the property value was taken
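Because end points to the first character after the occurrence, start and end can be used directly as a Python slice. The sketch below uses them to mask every occurrence in the analyzed text. The function `mask_pii`, the sample text and its offsets are made up for illustration; overlapping matches, like those of a postal address, are handled by overwriting characters in place.

```python
def mask_pii(text: str, graph: list[dict], placeholder: str = "*") -> str:
    """Overwrite every matched span of text with the placeholder character."""
    chars = list(text)
    for item in graph:
        for match in item.get("matches", []):
            # end is exclusive, so range(start, end) covers exactly the occurrence.
            for i in range(match["start"], min(match["end"], len(chars))):
                chars[i] = placeholder
    return "".join(chars)


# Hypothetical text with offsets computed for this string only.
text = "Call (210) 617-5256 now"
graph = [
    {
        "@type": "https://schema.org/telephone",
        "telephone": "(210) 617-5256",
        "matches": [
            {"name": "telephone", "start": 5, "end": 19, "value": "(210) 617-5256"}
        ],
    }
]

occurrence = text[5:19]        # the slice selected by start and end
masked = mask_pii(text, graph)
```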
extractions array
To understand the contents of the extractions array, you must know that information detection can also be seen as a process of extracting records of data from the text. Each record contains data fields, and its structure, that is the set of possible fields, is called a template.
A template can be compared to a table and the template fields to the columns of the table, as shown in the following figure.

So, for example, instances of the PII_PERSON template are records that contain fields like:
- familyName
- gender
- givenName
- birthPlace
- birthDate
Every item of the extractions array represents an extraction record.
For example, the following item is a record that's an instance of the PII_PERSON template:
{
  "fields": [
    {
      "name": "familyName",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "Gutenberg"
    },
    {
      "name": "gender",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "M"
    },
    {
      "name": "givenName",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "Mark"
    },
    {
      "name": "person",
      "positions": [
        {
          "end": 54,
          "start": 39
        }
      ],
      "value": "Mark Gutenberg"
    },
    {
      "name": "birthPlace",
      "positions": [
        {
          "end": 260,
          "start": 243
        }
      ],
      "value": "Hamburg"
    },
    {
      "name": "birthDate",
      "positions": [
        {
          "end": 282,
          "start": 272
        }
      ],
      "value": "1984-12-08"
    }
  ],
  "namespace": "pii_en_1.0",
  "template": "PII_PERSON"
}
In each item:
- namespace is the name of the software module performing the extraction.
- template is the name of the template.
- fields is the array of record fields.
Each item of the fields array represents an extracted value where:
- name is the field's name.
- value is the field's value.
- positions is an array containing the positions (start and end) of the extracted field in the text.
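Following the table analogy, a record can be flattened into a row that maps each field name to its values. This is a minimal Python sketch with a hypothetical helper, `record_to_row`. It stores values in lists on the assumption that a field name can occur more than once in a record.

```python
def record_to_row(record: dict) -> dict[str, list[str]]:
    """Map each field name of an extraction record to the list of its values."""
    row: dict[str, list[str]] = {}
    for field in record["fields"]:
        row.setdefault(field["name"], []).append(field["value"])
    return row


# Trimmed version of the PII_PERSON record shown above.
record = {
    "fields": [
        {"name": "givenName", "positions": [{"start": 39, "end": 54}], "value": "Mark"},
        {"name": "familyName", "positions": [{"start": 39, "end": 54}], "value": "Gutenberg"},
        {"name": "birthDate", "positions": [{"start": 272, "end": 282}], "value": "1984-12-08"},
    ],
    "namespace": "pii_en_1.0",
    "template": "PII_PERSON",
}

row = record_to_row(record)
```

Note that value holds the normalized value (for example 1984-12-08), while positions point at where the original wording occurs in the text.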
These are all the templates and related fields:
| Information type | Template | Field |
|---|---|---|
| Personal attributes | PII_PERSON | person |
| | | givenName |
| | | familyName |
| | | age |
| | | gender |
| | | nationality |
| | | birthDate |
| | | birthPlace |
| | | deathDate |
| | | deathPlace |
| | | dateTime |
| Postal address | PII_ADDRESS | address |
| | | streetAddress |
| | | addressCountry |
| | | addressLocality |
| | | addressRegion |
| | | postalCode |
| | | postOfficeBoxNumber |
| Bank account | PII_BANKACCOUNT | IBAN |
| | | IBANcountry |
| IP address | PII_IP | IP |
| E-mail address | PII_EMAIL | email |
| URL | PII_URL | URL |
| Financial product (credit/debit card) | PII_FINANCIALPRODUCT | creditDebitNumber |
| | | CVV |
| | | expirationDate |
| Phone number | PII_TELEPHONE | telephone |
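For client code, the template list can be kept as a lookup structure. This Python sketch transcribes the templates and fields listed above into a dict and uses it to flag field names that are not expected for a record's template. `TEMPLATE_FIELDS` and `unexpected_fields` are illustrative names, not part of the API.

```python
TEMPLATE_FIELDS: dict[str, set[str]] = {
    "PII_PERSON": {
        "person", "givenName", "familyName", "age", "gender", "nationality",
        "birthDate", "birthPlace", "deathDate", "deathPlace", "dateTime",
    },
    "PII_ADDRESS": {
        "address", "streetAddress", "addressCountry", "addressLocality",
        "addressRegion", "postalCode", "postOfficeBoxNumber",
    },
    "PII_BANKACCOUNT": {"IBAN", "IBANcountry"},
    "PII_IP": {"IP"},
    "PII_EMAIL": {"email"},
    "PII_URL": {"URL"},
    "PII_FINANCIALPRODUCT": {"creditDebitNumber", "CVV", "expirationDate"},
    "PII_TELEPHONE": {"telephone"},
}


def unexpected_fields(record: dict) -> set[str]:
    """Return the field names of a record that its template does not list."""
    allowed = TEMPLATE_FIELDS.get(record["template"], set())
    return {field["name"] for field in record["fields"]} - allowed


# Hypothetical record with one field that is not listed for its template.
record = {
    "template": "PII_BANKACCOUNT",
    "fields": [
        {"name": "IBAN", "value": "..."},
        {"name": "swiftCode", "value": "..."},
    ],
}
extra = unexpected_fields(record)
```

A non-empty result does not necessarily mean an error: it may only signal that the service has added fields since the lookup was written.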