Document classification

Classification and taxonomies

Document classification determines what a text is about by assigning it categories from a taxonomy.

Available taxonomies are:

Taxonomy English Spanish French German Italian
iptc
geotax
emotional-traits
behavioral-traits

In the Natural Language API terminology, taxonomy "x" is both a specific set of categories and the name of the API resources capable of classifying a text according to that set.

Taxonomies' resources have paths like this:

categorize/{taxonomy name}/{language code}

The parts in braces are placeholders, so for example:

https://nlapi.expert.ai/v2/categorize/iptc/en

is the URL of the iptc resource capable of performing the IPTC Media Topics classification of English texts.
These resources must be requested with the POST method, submitting the text to classify.
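As a minimal sketch of the path pattern and the POST requirement described above, the following Python snippet fills in the two placeholders and builds a request without sending it. The helper name build_categorize_request is hypothetical, the base URL is taken from the example URL above, and the request body and headers are assumed to match the curl commands shown later on this page.

```python
import json
import urllib.request

# Base URL taken from the example resource URL on this page.
BASE_URL = "https://nlapi.expert.ai/v2"


def build_categorize_request(taxonomy, language, text, token):
    """Build (but do not send) a POST request for a classification resource.

    Hypothetical helper for illustration; it is not part of any official client.
    """
    # Fill in the two placeholders of the resource path.
    url = f"{BASE_URL}/categorize/{taxonomy}/{language}"
    # Body shape assumed from the curl examples: {"document": {"text": ...}}
    body = json.dumps({"document": {"text": text}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
    )


# "token" is a placeholder, not a working authorization token.
request = build_categorize_request("iptc", "en", "Some text to classify.", "token")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request; that step is left out here because it needs a valid authorization token.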

In the reference section of this manual you will find all the information you need to perform document classification using the API's RESTful interface.

Note

Even if you consume the API through a ready-to-use client that hides low-level requests and responses, knowing the output format helps you understand and navigate the results.

Here is an example of performing IPTC Media Topics classification of a short English text:

This example code uses expertai-nlapi, the open-source Python client corresponding to the nlapi-python GitHub project.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

For each output category, the program prints the ID and the hierarchy.

from expertai.nlapi.cloud.client import ExpertAiClient
client = ExpertAiClient()

text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
taxonomy = 'iptc'
language = 'en'

output = client.classification(body={"document": {"text": text}}, params={'taxonomy': taxonomy, 'language': language})

print("Tab separated list of categories:")

for category in output.categories:
    print(category.id_, category.hierarchy, sep="\t")
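Because the client reads credentials from the environment, a quick check before creating it can make a missing variable easier to spot. The following is only a sketch: the function name missing_credentials is hypothetical, and the check relies on nothing beyond the two variable names listed above.

```python
import os

# Variable names the client is documented to read credentials from.
REQUIRED_VARIABLES = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of the client.
    """
    return [name for name in REQUIRED_VARIABLES if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
else:
    print("Credentials found; the client can be created.")
```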

This example code uses @expertai/nlapi, the open-source NodeJS client corresponding to the nlapi-nodejs GitHub project.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

For each output category, the program prints the ID and the hierarchy.

import {NLClient, Language} from "@expertai/nlapi";

const nlClient = new NLClient();

var text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";

nlClient.categorize(text, {
  taxonomy: "iptc",
  language: Language.EN
}).then((result) => {
    console.log("Categories:");
    for (const category of result.data.categories) {
        console.log(category.id + " = " + category.hierarchy.join(" > "));
    }
})

This example code uses nlapi-java-sdk, the open-source Java client corresponding to the nlapi-java GitHub project.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

The program prints the JSON response, followed by a tab-separated list with the ID and the hierarchy of each output category.

import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.DefaultCredentialsProvider;
import ai.expert.nlapi.v2.API;
import ai.expert.nlapi.v2.cloud.Categorizer;
import ai.expert.nlapi.v2.cloud.CategorizerConfig;
import ai.expert.nlapi.v2.message.CategorizeResponse;
import ai.expert.nlapi.v2.model.CategorizeDocument;

public class Main {

    public static Authentication createAuthentication() throws Exception {
        DefaultCredentialsProvider credentialsProvider = new DefaultCredentialsProvider();
        Authenticator authenticator = new BasicAuthenticator(credentialsProvider);
        return new Authentication(authenticator);
    }

    public static Categorizer createCategorizer() throws Exception {
        return new Categorizer(CategorizerConfig.builder()
                                                .withVersion(API.Versions.V2)
                                                .withTaxonomy("iptc")
                                                .withLanguage(API.Languages.en)
                                                .withAuthentication(createAuthentication())
                                                .build());
    }

    public static void main(String[] args) {
        try {
            String text = "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.";

            Categorizer categorizer = createCategorizer();

            CategorizeResponse categorization = categorizer.categorize(text);


            // Output JSON representation

            System.out.println("JSON representation:");
            categorization.prettyPrint();


            // Tab separated list of categories.

            System.out.println("Tab separated list of categories:");
            CategorizeDocument data = categorization.getData();

            data.getCategories().stream().forEach(c -> System.out.println(c.getId() + "\t" + c.getHierarchy()));
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}

The following curl command posts a document to the document classification resource of the API's REST interface.
Run the command from a Unix-like shell such as Bash after replacing token with the actual authorization token.

curl -X POST https://nlapi.expert.ai/v2/categorize/iptc/en \
    -H 'Authorization: Bearer token' \
    -H 'Content-Type: application/json; charset=utf-8' \
    -d '{
  "document": {
    "text": "Michael Jordan was one of the best basketball players of all time. Scoring was Jordan'\''s stand-out skill, but he still holds a defensive NBA record, with eight steals in a half."
  }
}'

The server returns a JSON object.

The following curl command is the Windows Command Prompt version of the same request: it posts a document to the document classification resource of the API's REST interface.
Open a command prompt in the folder where you installed curl and run the command after replacing token with the actual authorization token.

curl -X POST https://nlapi.expert.ai/v2/categorize/iptc/en -H "Authorization: Bearer token" -H "Content-Type: application/json; charset=utf-8" -d "{\"document\": {\"text\": \"Michael Jordan was one of the best basketball players of all time. Scoring was Jordan's stand-out skill, but he still holds a defensive NBA record, with eight steals in a half.\"}}"

The server returns a JSON object.
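To get from the JSON object returned by the server to a readable list, a small amount of parsing is enough. The sketch below assumes the raw response has the same shape the NodeJS example reads (a data object holding a categories array, each entry with an id and a hierarchy list). The sample text is hand-written for illustration and is not real API output, and the function name format_categories is hypothetical.

```python
import json

# Hand-written sample for illustration only; not real API output.
# Field names are assumed from the NodeJS example (data.categories, id, hierarchy).
sample_response = """
{
  "data": {
    "categories": [
      {"id": "20000851", "hierarchy": ["Sport", "Competition discipline", "Basketball"]}
    ]
  }
}
"""


def format_categories(response_text):
    """Return one 'id<TAB>hierarchy' line per category in the response."""
    payload = json.loads(response_text)
    # Tolerate a response without categories by falling back to an empty list.
    categories = payload.get("data", {}).get("categories", [])
    return [
        f"{category['id']}\t{' > '.join(category['hierarchy'])}"
        for category in categories
    ]


for line in format_categories(sample_response):
    print(line)
```

Consult the output format reference for the complete set of fields; the sketch reads only the two that the client examples on this page print.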

The following articles describe the capabilities of the available taxonomies.

Self-documentation resources

The API provides self-documentation resources to programmatically discover available taxonomies and their features. Learn more about these resources in the dedicated article.