Document classification
Document classification determines what the document text is about by mapping it to the nodes of a taxonomy.
Here is an example of performing document classification on a short English text:
This example is based on the Python SDK available in the expert.ai developer portal.
The SDK's API client gets user credentials from two environment variables:
EAI_USERNAME
EAI_PASSWORD
Set those variables with your account credentials before running the sample program below.
The program prints a JSON representation of the results and a tab-separated list of each category's ID and hierarchy.
from expertai.client import ExpertAiClient
eai = ExpertAiClient()
text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
language = 'en'
response = eai.iptc_media_topics_classification(body={"document": {"text": text}}, params={'language': language})
# Output JSON representation
print("JSON representation:")
print(response.json)
# Tab separated list of categories' id and hierarchy.
print("\nTab separated list of categories' id and hierarchy:")
document = response.json["data"]
for category in document["categories"]:
    print(category["id"], category["hierarchy"], sep="\t")
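Before running the program, it can help to confirm that both environment variables are actually set. The following is a minimal sketch that uses only the Python standard library; the helper name missing_credentials is hypothetical and is not part of the expert.ai SDK.

```python
import os

# Names of the environment variables the SDK's API client reads (from the text above).
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of the expert.ai SDK.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Set these variables first:", ", ".join(missing))
    else:
        print("Credentials found in the environment.")
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without touching your shell settings.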
This example is based on the Java SDK available in the expert.ai developer portal.
In the code below, replace yourusername and yourpassword with your account credentials.
The program prints a JSON representation of the results and a tab-separated list of each category's ID and hierarchy.
import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.Credential;
import ai.expert.nlapi.v1.API;
import ai.expert.nlapi.v1.Categorizer;
import ai.expert.nlapi.v1.CategorizerConfig;
import ai.expert.nlapi.v1.message.ResponseDocument;
import ai.expert.nlapi.v1.model.DataModel;
public class Main {

    public static Authentication createAuthentication() throws Exception {
        Authenticator authenticator = new BasicAuthenticator(new Credential("yourusername", "yourpassword"));
        return new Authentication(authenticator);
    }

    public static Categorizer createCategorizer() throws Exception {
        return new Categorizer(CategorizerConfig.builder()
                .withVersion(API.Versions.V1)
                .withTaxonomy(API.Taxonomies.IPTC)
                .withLanguage(API.Languages.en)
                .withAuthentication(createAuthentication())
                .build());
    }

    public static void main(String[] args) {
        try {
            String text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";
            Categorizer categorizer = createCategorizer();
            ResponseDocument categorization = categorizer.categorize(text);
            // Output JSON representation
            System.out.println("JSON representation:");
            categorization.prettyPrint();
            // Tab separated list of categories' id and hierarchy.
            System.out.println("Tab separated list of categories' id and hierarchy:");
            DataModel data = categorization.getData();
            data.getCategories().stream().forEach(c -> System.out.println(c.getId() + "\t" + c.getHierarchy()));
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}
The following curl command posts a document to the document classification resource of the API's REST interface.
Run the command from a shell after replacing token with the actual authorization token.
curl -X POST https://nlapi.expert.ai/v1/categorize/iptc/en \
-H 'Authorization: Bearer token' \
-H 'Content-Type: application/json; charset=utf-8' \
-d '{
"document": {
"text": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan'\''s stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
}
}'
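The same request can also be assembled in Python without the SDK. The sketch below uses only the standard library and mirrors the URL, headers, and body of the curl command; the function name build_request is hypothetical, and token remains a placeholder for a real authorization token. It builds the request object without sending it.

```python
import json
import urllib.request

# Endpoint taken from the curl command above.
URL = "https://nlapi.expert.ai/v1/categorize/iptc/en"


def build_request(token, text):
    """Build (but do not send) the POST request for document classification.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"document": {"text": text}}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json; charset=utf-8",
        },
    )


if __name__ == "__main__":
    request = build_request(
        "token",
        "Michael Jordan was one of the best basketball players of all time.",
    )
    print(request.get_method(), request.full_url)
    # To actually send it, pass the request to urllib.request.urlopen()
    # after replacing "token" with a valid authorization token.
```

Letting json.dumps serialize the body avoids the manual quote escaping that the curl command needs.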
The server returns a JSON object like the one below.
{
    "data": {
        "categories": [
            {
                "frequency": 70.62,
                "hierarchy": [
                    "Sport",
                    "Competition discipline",
                    "Basketball"
                ],
                "id": "20000851",
                "label": "Basketball",
                "namespace": "iptc_en_1.0",
                "positions": [
                    {
                        "end": 14,
                        "start": 0
                    },
                    {
                        "end": 53,
                        "start": 35
                    },
                    {
                        "end": 139,
                        "start": 136
                    }
                ],
                "score": 4005.0,
                "winner": true
            }
        ],
        "content": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.",
        "language": "en",
        "version": "sensei: 3.1.0; disambiguator: 14.5-QNTX-2016"
    },
    "success": true
}
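To illustrate how the response can be consumed, the sketch below walks the categories array and recovers the text behind each entry in positions. It assumes that start and end are character offsets into content with an exclusive end, which is consistent with the sample values but is not stated on this page. The function name summarize is hypothetical, and the dictionary is an abridged copy of the response above.

```python
# Abridged copy of the sample response (the "frequency", "namespace",
# "language", and "version" fields are omitted for brevity).
response = {
    "data": {
        "categories": [
            {
                "hierarchy": ["Sport", "Competition discipline", "Basketball"],
                "id": "20000851",
                "label": "Basketball",
                "positions": [
                    {"end": 14, "start": 0},
                    {"end": 53, "start": 35},
                    {"end": 139, "start": 136},
                ],
                "score": 4005.0,
                "winner": True,
            }
        ],
        "content": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.",
    },
    "success": True,
}


def summarize(response):
    """Return (id, hierarchy path, matched text spans) for each category.

    Assumption: "start"/"end" are character offsets into "content",
    with "end" exclusive.
    """
    data = response["data"]
    content = data["content"]
    rows = []
    for category in data["categories"]:
        path = " > ".join(category["hierarchy"])
        spans = [content[p["start"]:p["end"]] for p in category["positions"]]
        rows.append((category["id"], path, spans))
    return rows


for category_id, path, spans in summarize(response):
    print(category_id, path, spans, sep="\t")
```

For the sample document, the three positions correspond to "Michael Jordan", "basketball players", and "NBA".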
The following curl command posts the same document to the document classification resource of the API's REST interface, this time quoted for a command prompt rather than a shell.
Open a command prompt in the folder where you installed curl and run the command after replacing token with the actual authorization token.
curl -X POST https://nlapi.expert.ai/v1/categorize/iptc/en -H "Authorization: Bearer token" -H "Content-Type: application/json; charset=utf-8" -d "{\"document\": {\"text\": \"Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.\"}}"
The server returns the same JSON object as the one shown above.