
Named entity recognition

Named entity recognition determines which entities—persons, places, organizations, dates, addresses, etc.—are mentioned in a text.

Named entity recognition also performs knowledge linking—Wikidata, DBpedia and GeoNames references are provided for selected entities. In the case of real places, geographic coordinates are also provided.

Entities are also recognized in pronouns and in shorter forms of a name that refer to a named mention.
This by-reference recognition is called anaphoric, because entities are recognized through anaphoras.
For example, in this text:

Michael Jordan was one of the best basketball players of all time.

Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.

three mentions of Michael Jordan are recognized:

  • the full named mention: Michael Jordan
  • the anaphoras—Jordan and he—for which Michael Jordan is considered the antecedent.
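The API returns all three mentions as a single entity, with one start/end character offset pair per mention (in the sample response later on this page, "end" is exclusive). The following is a minimal Python sketch that recovers the mention strings from those offsets. It assumes the offsets index the submitted text exactly as they do in the sample. The entity dictionary is a hand-trimmed copy of that sample, and the helper name `mentions` is hypothetical.

```python
# Text submitted to the API, as in the examples on this page.
text = ("Michael Jordan was one of the best basketball players of all time. "
        "Scoring was Jordan's stand-out skill, but he still holds a defensive "
        "NBA record, with eight steals in a half.")

# The "Michael Jordan" entity as it appears in the sample response:
# one start/end pair per mention, with "end" exclusive.
entity = {
    "lemma": "Michael Jordan",
    "positions": [
        {"start": 0, "end": 14},
        {"start": 79, "end": 85},
        {"start": 109, "end": 111},
    ],
    "type": "NPH",
}

def mentions(text, entity):
    """Return the surface text of every mention of an entity."""
    return [text[p["start"]:p["end"]] for p in entity["positions"]]

# Each mention (full name or anaphora) is linked to the same lemma.
for mention in mentions(text, entity):
    print(entity["lemma"], "<-", mention)
```

The first pair covers the full named mention, and the other two cover the anaphoras Jordan and he.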

Here is an example of performing named entity recognition on a short English text:

This example is based on the Python SDK available on the expert.ai developer portal.

The SDK's API client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables to your account credentials before running the sample program below.
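To confirm that both variables are set before running the program, you can use a small standard-library check like the one below. This is a sketch: the helper name `missing_credentials` is hypothetical and not part of the SDK.

```python
import os

# Environment variables the SDK's API client reads credentials from.
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")

def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]

missing = missing_credentials()
if missing:
    print("Set these variables before running the sample:", ", ".join(missing))
else:
    print("Credentials found in the environment.")
```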

The program prints a JSON representation of the results and the list of entities with their type.

from expertai.client import ExpertAiClient
eai = ExpertAiClient()

text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
language = 'en'

# Request the "entities" resource (named entity recognition) for the text
response = eai.specific_resource_analysis(
    body={"document": {"text": text}},
    params={'language': language, 'resource': 'entities'})


# Output JSON representation

print("JSON representation:")
print(response.json)


# Tab-separated list of entities' lemma and type

print("\nTab-separated list of entities' lemma and type:")
document = response.json["data"]

for entity in document["entities"]:
    print(entity["lemma"], entity["type"], sep="\t")
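If you want the entities grouped by type instead of printed one per line, you can work directly on the `data` object. The sketch below runs on a hand-trimmed copy of the sample response shown later on this page, so it works without calling the API. The function name `lemmas_by_type` is illustrative and not part of the SDK.

```python
from collections import defaultdict

# Trimmed copy of response.json["data"] from the sample response.
document = {
    "entities": [
        {"lemma": "Michael Jordan", "type": "NPH", "syncon": -1},
        {"lemma": "National Basketball Association", "type": "ORG", "syncon": 206693},
    ]
}

def lemmas_by_type(document):
    """Map each entity type code to the lemmas of the entities of that type."""
    groups = defaultdict(list)
    for entity in document["entities"]:
        groups[entity["type"]].append(entity["lemma"])
    return dict(groups)

# One line per type: type code, then its lemmas separated by commas.
for entity_type, lemmas in sorted(lemmas_by_type(document).items()):
    print(entity_type, ", ".join(lemmas), sep="\t")
```

In the program above, you would pass `response.json["data"]` instead of the sample dictionary.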

This example is based on the Java SDK available on the expert.ai developer portal.

In the code below, replace yourusername and yourpassword with your account credentials.

The program prints a JSON representation of the results and the list of entities with their type.

import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.Credential;
import ai.expert.nlapi.v1.API;
import ai.expert.nlapi.v1.Analyzer;
import ai.expert.nlapi.v1.AnalyzerConfig;    
import ai.expert.nlapi.v1.message.ResponseDocument;
import ai.expert.nlapi.v1.model.DataModel;

public class Main {

    public static Authentication createAuthentication() throws Exception {
        Authenticator authenticator = new BasicAuthenticator(new Credential("yourusername", "yourpassword"));
        return new Authentication(authenticator);
    }

    public static Analyzer createAnalyzer() throws Exception {
        return new Analyzer(AnalyzerConfig.builder()
                .withVersion(API.Versions.V1)
                .withContext(API.Contexts.STANDARD)
                .withLanguage(API.Languages.en)
                .withAuthentication(createAuthentication())
                .build());
    }

    public static void main(String[] args) {
        try {
            String text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";

            Analyzer analyzer = createAnalyzer();

            ResponseDocument entities = analyzer.entities(text);


            // Output JSON representation

            System.out.println("JSON representation:");
            entities.prettyPrint();


            // Tab-separated list of entities' lemma and type.

            System.out.println("Tab-separated list of entities' lemma and type:");
            DataModel data = entities.getData();
            data.getEntities().stream().forEach(c -> System.out.println(c.getLemma() + "\t" + c.getType()));
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}

On Linux or macOS, the following curl command posts a document to the named entity recognition resource of the API's REST interface.
Run the command from a shell after replacing the word token in the Authorization header with your actual authorization token.

curl -X POST https://nlapi.expert.ai/v1/analyze/standard/en/entities \
    -H 'Authorization: Bearer token' \
    -H 'Content-Type: application/json; charset=utf-8' \
    -d '{
  "document": {
    "text": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan'\''s stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
  }
}'
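Where curl is not available, you can build the same request with Python's standard library. This is a sketch and not part of the SDK. It mirrors the URL, headers and body of the curl command above. `TOKEN` is a placeholder for your authorization token, and the helper name `build_entities_request` is hypothetical.

```python
import json
import urllib.request

# Endpoint used by the curl command: context "standard", resource "entities".
ENDPOINT = "https://nlapi.expert.ai/v1/analyze/standard/{language}/entities"

def build_entities_request(text, token, language="en"):
    """Build (but do not send) a POST request equivalent to the curl command."""
    body = json.dumps({"document": {"text": text}}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT.format(language=language),
        data=body,
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )

request = build_entities_request(
    "Michael Jordan was one of the best basketball players of all time.",
    token="TOKEN",  # placeholder: replace with the actual authorization token
)
print(request.get_method(), request.full_url)

# To send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
#     print(result["data"]["entities"])
```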

The server returns a JSON object like the one below.
For more information, see the related pages in the reference section.

{
    "data": {
        "content": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.",
        "entities": [
            {
                "lemma": "Michael Jordan",
                "positions": [
                    {
                        "end": 14,
                        "start": 0
                    },
                    {
                        "end": 85,
                        "start": 79
                    },
                    {
                        "end": 111,
                        "start": 109
                    }
                ],
                "syncon": -1,
                "type": "NPH"
            },
            {
                "lemma": "National Basketball Association",
                "positions": [
                    {
                        "end": 139,
                        "start": 136
                    }
                ],
                "syncon": 206693,
                "type": "ORG"
            }
        ],
        "knowledge": [
            {
                "label": "group.human_group.organization.sport_association",
                "properties": [
                    {
                        "type": "DBpediaId",
                        "value": "dbpedia.org/page/National_Basketball_Association"
                    },
                    {
                        "type": "WikiDataId",
                        "value": "Q155223"
                    }
                ],
                "syncon": 206693
            }
        ],
        "language": "en",
        "version": "sensei: 3.1.0; disambiguator: 15.0-QNTX-2016"
    },
    "success": true
}
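Entities and knowledge entries share the `syncon` field. In the sample, the NBA entity and the single knowledge entry both have syncon 206693, while Michael Jordan has syncon -1 and no knowledge entry. The following minimal sketch assumes this relationship holds in general and attaches the linked-data references (WikiDataId, DBpediaId) to each entity's lemma. The data is a hand-trimmed copy of the response above, and the helper name `linked_references` is hypothetical.

```python
# Trimmed copy of the "data" object from the sample response above.
data = {
    "entities": [
        {"lemma": "Michael Jordan", "syncon": -1, "type": "NPH"},
        {"lemma": "National Basketball Association", "syncon": 206693, "type": "ORG"},
    ],
    "knowledge": [
        {
            "label": "group.human_group.organization.sport_association",
            "properties": [
                {"type": "DBpediaId", "value": "dbpedia.org/page/National_Basketball_Association"},
                {"type": "WikiDataId", "value": "Q155223"},
            ],
            "syncon": 206693,
        }
    ],
}

def linked_references(data):
    """Map each entity lemma to the properties of its matching knowledge entry."""
    # Index knowledge entries by syncon so each entity lookup is a dict access.
    knowledge = {entry["syncon"]: entry for entry in data.get("knowledge", [])}
    links = {}
    for entity in data["entities"]:
        entry = knowledge.get(entity["syncon"])
        if entry is None:
            continue  # e.g. syncon -1 in the sample: no knowledge entry to link
        links[entity["lemma"]] = {p["type"]: p["value"] for p in entry.get("properties", [])}
    return links

for lemma, props in linked_references(data).items():
    print(lemma, props.get("WikiDataId"), props.get("DBpediaId"), sep="\t")
```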

On Windows, the following curl command posts a document to the named entity recognition resource of the API's REST interface.
Open a command prompt in the folder where you installed curl and run the command after replacing the word token in the Authorization header with your actual authorization token.

curl -X POST https://nlapi.expert.ai/v1/analyze/standard/en/entities  -H "Authorization: Bearer token" -H "Content-Type: application/json; charset=utf-8" -d "{\"document\": {\"text\": \"Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.\"}}"

The server returns the same JSON object shown above.