Information extraction

Information extraction detects meaningful parts of a document by mapping them to templates. For each defined template it can return more than one record (think of a record as an instance of a template), based on the text matched by the extraction rules.
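To make the relationship between templates and records concrete, here is a minimal Python sketch. The `Record` class, the template names, and the field values are hypothetical and do not come from the API; they only illustrate how several records can be instances of the same template.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """One instance of a template: a template name plus its filled-in fields."""
    template: str
    fields: dict = field(default_factory=dict)


def group_by_template(records):
    """Group the fields of each record under the template it instantiates."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.template].append(record.fields)
    return dict(grouped)


# Hypothetical records: two instances of one template, one of another.
records = [
    Record("PLAYER", {"name": "Michael Jordan", "sport": "basketball"}),
    Record("PLAYER", {"name": "Jordan"}),
    Record("ACHIEVEMENT", {"description": "eight steals in a half"}),
]

grouped = group_by_template(records)
for template, instances in grouped.items():
    print(template, len(instances))
```

Which templates and fields actually exist depends on the extraction rules in use, so the names above should be read as placeholders.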

The API resource carrying out information extraction has the following endpoint:

/api/analyze

In the reference section of this manual you will find all the information you need to perform information extraction.

Here is an example of performing information extraction on a short English text:

This example is based on the Python client you can find on GitHub.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.
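Before creating the client, it can help to confirm that both variables are actually set. The following is a small sketch of such a check; the `missing_credentials` helper is hypothetical and not part of the Python client.

```python
import os

# Names of the environment variables the client reads, as listed above.
REQUIRED_VARIABLES = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Credentials found in the environment.")
```

The check only looks at whether the variables are present; it does not validate the credentials themselves.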

The sample program prints the list of templates and their fields.

from expertai.nlapi.edge.client import ExpertAiClient
client = ExpertAiClient()

text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."

output = client.extraction(text)

print("List of templates and their fields:")

for extraction in output.extractions:
  print(extraction.template)
  for field in extraction.fields:
    print(field.name, field.value, sep="\t")
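If you want to process the result instead of printing it, the same loop can collect the fields into rows. The sketch below relies only on the attributes used in the sample above (`extractions`, `template`, `fields`, `name`, `value`); the `extraction_rows` helper and the stand-in response are hypothetical, so that the snippet runs without calling the service.

```python
from types import SimpleNamespace


def extraction_rows(output):
    """Return one (template, field name, field value) tuple per extracted field."""
    return [
        (extraction.template, field.name, field.value)
        for extraction in output.extractions
        for field in extraction.fields
    ]


# Stand-in for a client response; the template and field names are made up.
sample_output = SimpleNamespace(extractions=[
    SimpleNamespace(template="EXAMPLE_TEMPLATE", fields=[
        SimpleNamespace(name="example_field", value="Michael Jordan"),
    ]),
])

rows = extraction_rows(sample_output)
print(rows)
```

With a real response, pass the `output` object returned by the client in place of `sample_output`.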

This example is based on the Java client you can find on GitHub.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

The program prints the JSON response.

import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.DefaultCredentialsProvider;
import ai.expert.nlapi.v2.API;
import ai.expert.nlapi.v2.edge.Analyzer;
import ai.expert.nlapi.v2.edge.AnalyzerConfig;
import ai.expert.nlapi.v2.message.AnalyzeResponse;
import ai.expert.nlapi.v2.model.AnalyzeDocument;
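import ai.expert.nlapi.v2.model.Extraction;

import java.util.List;

// The class name is an arbitrary choice for this sample
public class Main {

    public static Authentication createAuthentication() throws Exception {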
        DefaultCredentialsProvider credentialsProvider = new DefaultCredentialsProvider();
        Authenticator authenticator = new BasicAuthenticator(credentialsProvider);
        return new Authentication(authenticator);
    }

    public static Analyzer createAnalyzer() throws Exception {
        return new Analyzer(AnalyzerConfig.builder()
                .withVersion(API.Versions.V2)
                .withHost(API.DEFAULT_EDGE_HOST)
                .withAuthentication(createAuthentication())
                .build());
    }

    public static void main(String[] args) {
        try {
            String text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";

            Analyzer analyzer = createAnalyzer();

            AnalyzeResponse analysis = analyzer.extraction(text);

            // Print the JSON representation of the response
            System.out.println("JSON representation:");
            analysis.prettyPrint();

            // The extraction records are also available as Java objects
            List<Extraction> extractions = analysis.getData().getExtractions();
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}