Information extraction
Information extraction detects meaningful parts of the document by mapping them to templates. For each defined template it can return more than one record, based on the text matched by extraction rules. You can think of a record as an instance of a template.
The API resource carrying out information extraction has the following endpoint:
/api/analyze
In the reference section of this manual you will find all the information you need to perform information extraction.
Here is an example of performing information extraction on a short English text:
This example is based on the Python client you can find on GitHub.
The client gets user credentials from two environment variables:
EAI_USERNAME
EAI_PASSWORD
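A missing variable is an easy mistake to make, so a small pre-flight check can help. The sketch below is not part of the client: `missing_credentials` and `REQUIRED_VARS` are hypothetical names introduced here, and only the two variable names come from this page.

```python
import os

# Variable names as listed on this page; the helper itself is illustrative.
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Both credential variables are set.")
```

The function accepts an optional mapping so it can be exercised without touching the real environment; called with no argument it inspects `os.environ`.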
Set those variables with your account credentials before running the sample program below.
The program prints the list of templates and their fields.
from expertai.nlapi.edge.client import ExpertAiClient
client = ExpertAiClient()
text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
output = client.extraction(text)
print("List of templates and their fields:")
for extraction in output.extractions:
    print(extraction.template)
    for field in extraction.fields:
        print(field.name, field.value, sep="\t")
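Because one template can yield several records, it may be convenient to collect the output by template instead of printing it. The following is a minimal sketch under stated assumptions: it relies only on the attributes the sample above uses (`template`, `fields`, `name`, `value`), the helper `group_records_by_template` is hypothetical, and the stand-in data uses invented template and field names, not real API output.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_records_by_template(extractions):
    """Map each template name to a list of records ({field name: value} dicts)."""
    grouped = defaultdict(list)
    for extraction in extractions:
        # One extraction is treated as one record of its template.
        record = {field.name: field.value for field in extraction.fields}
        grouped[extraction.template].append(record)
    return dict(grouped)


# Stand-in data shaped like the objects the sample iterates over.
# Template and field names are invented for illustration only.
sample = [
    SimpleNamespace(template="TEMPLATE_A",
                    fields=[SimpleNamespace(name="field_1", value="x")]),
    SimpleNamespace(template="TEMPLATE_A",
                    fields=[SimpleNamespace(name="field_1", value="y")]),
    SimpleNamespace(template="TEMPLATE_B",
                    fields=[SimpleNamespace(name="field_2", value="z")]),
]

grouped = group_records_by_template(sample)
for template, records in grouped.items():
    print(template, len(records))
```

With a real response, `output.extractions` would be passed in place of `sample`. Note that this sketch keeps only the last value if a field name occurs more than once within a single record.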
This example is based on the Java client you can find on GitHub.
The client gets user credentials from two environment variables:
EAI_USERNAME
EAI_PASSWORD
Set those variables with your account credentials before running the sample program below.
The program prints the JSON response.
import java.util.List;

import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.DefaultCredentialsProvider;
import ai.expert.nlapi.v2.API;
import ai.expert.nlapi.v2.edge.Analyzer;
import ai.expert.nlapi.v2.edge.AnalyzerConfig;
import ai.expert.nlapi.v2.message.AnalyzeResponse;
import ai.expert.nlapi.v2.model.AnalyzeDocument;
import ai.expert.nlapi.v2.model.Extraction;

// Wrapper class; the name is illustrative.
public class ExtractionSample {

    // Builds the authentication object from the credentials in the environment variables.
    public static Authentication createAuthentication() throws Exception {
        DefaultCredentialsProvider credentialsProvider = new DefaultCredentialsProvider();
        Authenticator authenticator = new BasicAuthenticator(credentialsProvider);
        return new Authentication(authenticator);
    }

    // Configures an analyzer for the default edge host.
    public static Analyzer createAnalyzer() throws Exception {
        return new Analyzer(AnalyzerConfig.builder()
            .withVersion(API.Versions.V2)
            .withHost(API.DEFAULT_EDGE_HOST)
            .withAuthentication(createAuthentication())
            .build());
    }

    public static void main(String[] args) {
        try {
            String text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";
            Analyzer analyzer = createAnalyzer();
            AnalyzeResponse analysis = analyzer.extraction(text);

            // Output JSON representation
            System.out.println("JSON representation:");
            analysis.prettyPrint();

            // The extractions are also available as objects for further processing
            List<Extraction> extractions = analysis.getData().getExtractions();
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}