
Taxonomies

Introduction

A taxonomy is the set of categories that a document classification resource can recognize.
In the expert.ai Natural Language API, the taxonomy name is used to identify document classification resources.

Several classification resources can exist for the same taxonomy, each supporting a different language. The complete endpoint of a classification resource must thus contain both the taxonomy name and the language.
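Here is a minimal sketch, in Python like the client examples later on this page, of how the two parts combine into one endpoint. The base URL comes from the self-documentation URLs on this page. The `categorize` path segment and the helper name `classification_endpoint` are assumptions made for illustration, so check the reference section for the actual classification path.

```python
from urllib.parse import quote

# Base URL taken from the self-documentation URLs on this page.
BASE_URL = "https://nlapi.expert.ai/v2"

# ASSUMPTION: classification resources live under a "categorize" path segment.
CLASSIFY_SEGMENT = "categorize"


def classification_endpoint(taxonomy: str, language: str) -> str:
    """Build a classification endpoint URL from a taxonomy name and a language code."""
    if not taxonomy or not language:
        raise ValueError("both taxonomy and language are required")
    # safe="" also escapes "/", so a value cannot add extra path segments.
    return f"{BASE_URL}/{CLASSIFY_SEGMENT}/{quote(taxonomy, safe='')}/{quote(language, safe='')}"


print(classification_endpoint("iptc", "en"))
# prints https://nlapi.expert.ai/v2/categorize/iptc/en
```

Because each resource is tied to one language, changing only the language code (for example `"it"` instead of `"en"`) addresses a different resource for the same taxonomy.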

To date, the API exposes classification resources for four taxonomies.
The table below shows taxonomy names and the supported languages.

Taxonomy name English Spanish French German Italian
iptc
geotax
emotional-traits
behavioral-traits

For more information about categories, see the article in the reference section.

iptc

The classification resources corresponding to the iptc taxonomy classify texts in terms of IPTC Media Topics subject codes.
IPTC is the global standards body of the news media, so this type of classification is geared towards news media.

Use the self-documentation resources to get the complete list of recognized categories.

geotax

The classification resources corresponding to the geotax taxonomy classify texts according to the countries they mention, either directly or indirectly (for example, through the name of a city).
Use the self-documentation resources to get a list of the recognized countries.

For places in the US and the UK, both the state (US) or constituent country (UK) and the enclosing federation or kingdom are returned. For example, given this text:

Chicago is the birthplace of many celebrities such as Walt Disney.

Both Illinois and United States of America are returned.
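If an application only needs the most specific place, it can drop the enclosing federation or kingdom on the client side. The sketch below assumes you already have the returned category labels as a list of strings. The `PARENT` lookup table and the `most_specific` helper are hypothetical and are not part of the API.

```python
# Hypothetical client-side lookup: region -> enclosing federation or kingdom.
# The API does not return this table; extend it for the regions you care about.
PARENT = {
    "Illinois": "United States of America",
}


def most_specific(labels: list[str]) -> list[str]:
    """Drop every label that encloses another returned label, keeping the original order."""
    returned = set(labels)
    enclosing = {PARENT[label] for label in returned if label in PARENT}
    return [label for label in labels if label not in enclosing]


print(most_specific(["Illinois", "United States of America"]))  # ['Illinois']
```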

The same resources, when requested with a specific query-string parameter, also return countries' information as GeoJSON data.
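This page does not name the query-string parameter. The sketch below assumes it is `features=geojson` and that the classification path is `categorize/geotax/<language>`. Both are assumptions to verify in the reference section. The sketch only shows how such a parameter would be appended to the endpoint URL.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://nlapi.expert.ai/v2"


def geotax_url(language: str, geojson: bool = False) -> str:
    """Build a geotax classification URL, optionally asking for GeoJSON data."""
    # ASSUMPTION: path segment "categorize" and parameter "features=geojson".
    url = f"{BASE_URL}/categorize/geotax/{quote(language, safe='')}"
    if geojson:
        url += "?" + urlencode({"features": "geojson"})
    return url


print(geotax_url("en", geojson=True))
# prints https://nlapi.expert.ai/v2/categorize/geotax/en?features=geojson
```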

emotional-traits

The classification resources corresponding to the emotional-traits taxonomy classify documents in terms of the feelings—joy, surprise, irritation, etc.—expressed in the text. They can recognize 39 different emotional traits divided into eight groups.

During the design phase, the choice of emotional traits to identify was guided by the needs of the developer community for this API extension and by the literature on the subject, including some recent research publications [1].

You can find the category tree for this taxonomy in the reference section.
Here is an excerpt of the category tree for English:

...
    Group Dejection
        Sadness
        Torment
        Suffering
        Sorrow
        Disappointment
        Disillusion
        Resignation
    Group Surprise
        Surprise
    Group Delight
        Happiness
        Excitement
        Joy
        Amusement
        Well-Being
        Satisfaction
        Relief
    ...

Classification resources return leaf categories, that is, second-level categories like Excitement and Disillusion. If requested with a special parameter, they also return the main groups of emotional traits. Main groups are the taxonomy groups corresponding to the most relevant emotional traits expressed in the text. They provide an easy-to-read indication of the clusters of emotional traits the text is mostly about, similar to an abstract.
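The sketch below illustrates the idea of main groups. It maps leaf categories to their groups using the partial English tree shown above and keeps the groups of the most relevant leaves. It is an approximation for illustration only and is not the API's actual method. It assumes the leaves are already sorted by relevance, and `main_groups` and `top_n` are hypothetical names. To get the real main groups, use the special parameter described in the reference section.

```python
# Group -> leaf mapping copied from the partial English tree above (not the full taxonomy).
GROUPS = {
    "Group Dejection": ["Sadness", "Torment", "Suffering", "Sorrow",
                        "Disappointment", "Disillusion", "Resignation"],
    "Group Surprise": ["Surprise"],
    "Group Delight": ["Happiness", "Excitement", "Joy", "Amusement",
                      "Well-Being", "Satisfaction", "Relief"],
}

# Invert the mapping once: leaf label -> group label.
LEAF_TO_GROUP = {leaf: group for group, leaves in GROUPS.items() for leaf in leaves}


def main_groups(leaves_by_relevance: list[str], top_n: int = 3) -> list[str]:
    """Return the groups of the top_n most relevant leaves, without duplicates."""
    groups: list[str] = []
    for leaf in leaves_by_relevance[:top_n]:
        group = LEAF_TO_GROUP.get(leaf)  # None for leaves outside this partial tree
        if group is not None and group not in groups:
            groups.append(group)
    return groups


print(main_groups(["Excitement", "Joy", "Disillusion", "Relief"]))
# prints ['Group Delight', 'Group Dejection']
```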

behavioral-traits

The classification resources corresponding to the behavioral-traits taxonomy recognize references to the personality traits (like curiosity, honesty and negativity) of the people mentioned in the text. They can recognize 72 different traits divided into seven super-groups with three sub-groups each.

In the design phase, the behavioral traits to identify were chosen based on the Big Five personality model, real-world applications and recent papers on the subject [2].

You can find the category tree for this taxonomy in the reference section.
Here is an excerpt:

...
Action
    Action low
        Sedentariness
        Passivity
    Action fair
        Calmness
    Action high
        Initiative
        Dynamism
Openness
    Openness low
        Rejection
        Apathy
        Apprehension
        Traditionalism
        Conformism
        Negativity
        Bias
    Openness fair
        Cautiousness
    Openness high
        Progressiveness
        Acceptance
        Courage
        Positivity
        Curiosity
...

Categories at the first and second levels of the hierarchy function as super-groups and sub-groups, respectively.

Grouping is purely ontological. The names of the groups have no positive or negative connotations. In particular, the high, fair and low qualifiers in sub-group names are not to be interpreted as a moral judgment, but as the intensity with which the behavioral traits they contain represent the super-group.

Only the categories at the third level of the hierarchy—the leaf categories—are returned in the output, but their full path starting from the super-group is returned as an additional property for each category.

Self-documentation resources

List of available taxonomies

The API provides a self-documentation resource to discover available taxonomies and their features. It has this path:

taxonomies

Therefore, the complete URL is:

https://nlapi.expert.ai/v2/taxonomies

It must be requested with the GET method.
It returns the list of available taxonomies along with the supported languages—as in the above table.
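For callers that do not use a client, here is a minimal sketch that uses Python's standard `urllib.request` to prepare the same GET request as the curl examples later on this page, with the token passed in a Bearer `Authorization` header. The helper name `taxonomies_request` is hypothetical, and the request is only built, not sent.

```python
import urllib.request

TAXONOMIES_URL = "https://nlapi.expert.ai/v2/taxonomies"


def taxonomies_request(token: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the taxonomies resource."""
    return urllib.request.Request(
        TAXONOMIES_URL,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


# Replace "token" with a real authorization token.
req = taxonomies_request("token")
# To actually call the API and print the JSON response:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```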

In the reference section of this manual you will find everything you need to get taxonomy information through the API's RESTful interface, specifically:

Even if you use the API through a client that hides the REST interface, whether you wrote it or expert.ai provides it, the last piece of information is useful because it helps you understand the data returned by the API.

Here is an example of how to get taxonomy information:

This example is based on the Python client you can find on GitHub.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

The program prints the list of taxonomies with the languages they support.

from expertai.nlapi.cloud.client import ExpertAiClient

client = ExpertAiClient()

output = client.taxonomies()

print("Taxonomies:")

for taxonomy in output.taxonomies:
    print(taxonomy.name)
    print("\tLanguages:")
    for language in taxonomy.languages:
        print("\t", language.code)
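The client reads `EAI_USERNAME` and `EAI_PASSWORD` by itself. If you want a clearer error before creating it, you can run a small pre-flight check such as the hypothetical `missing_credentials` helper below (not part of the client). It lists the variables that are unset or empty.

```python
import os

# Variable names read by the client, as described above.
REQUIRED_VARS = ("EAI_USERNAME", "EAI_PASSWORD")


def missing_credentials(environ=os.environ) -> list[str]:
    """Return the required variables that are unset or empty in environ."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```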

This example is based on the Java client you can find on GitHub.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

The program prints the JSON response.

import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.DefaultCredentialsProvider;
import ai.expert.nlapi.v2.API;
import ai.expert.nlapi.v2.cloud.InfoAPI;
import ai.expert.nlapi.v2.cloud.InfoAPIConfig;
import ai.expert.nlapi.v2.message.TaxonomiesResponse;

public class Main {

    public static Authentication createAuthentication() throws Exception {
        DefaultCredentialsProvider credentialsProvider = new DefaultCredentialsProvider();
        Authenticator authenticator = new BasicAuthenticator(credentialsProvider);
        return new Authentication(authenticator);
    }

    public static void main(String[] args) {
        try {
            InfoAPI infoAPI = new InfoAPI(InfoAPIConfig.builder()
                                                       .withAuthentication(createAuthentication())
                                                       .withVersion(API.Versions.V2)
                                                       .build());

            TaxonomiesResponse taxonomies = infoAPI.getTaxonomies();
            taxonomies.prettyPrint();
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}

The following curl command gets the taxonomies documentation resource of the API's REST interface.
Run the command from a shell after replacing token with the actual authorization token.

curl -X GET https://nlapi.expert.ai/v2/taxonomies \
    -H 'Authorization: Bearer token'

The server returns a JSON object.

The following curl command gets the taxonomies documentation resource of the API's REST interface.
Open a command prompt in the folder where you installed curl and run the command after replacing token with the actual authorization token.

curl -X GET https://nlapi.expert.ai/v2/taxonomies -H "Authorization: Bearer token"

The server returns a JSON object.

Category tree

The API also provides self-documentation resources that return a specific taxonomy for a given language in the form of a category tree.
These resources have paths like this:

taxonomies/taxonomy/language

Here, taxonomy and language are placeholders for the taxonomy name and the language code.
For example:

https://nlapi.expert.ai/v2/taxonomies/iptc/en

is the URL of the resource returning the category tree for the iptc taxonomy for the English language.
These resources must be requested with the GET method.
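Here is a minimal sketch that fills in the two placeholders and checks the taxonomy name against the four names listed on this page. The helper name `category_tree_url` is hypothetical. It does not check whether a given taxonomy supports a given language; the taxonomies resource is the authoritative source for that.

```python
from urllib.parse import quote

BASE_URL = "https://nlapi.expert.ai/v2"

# Taxonomy names listed on this page.
TAXONOMIES = {"iptc", "geotax", "emotional-traits", "behavioral-traits"}


def category_tree_url(taxonomy: str, language: str) -> str:
    """Fill the taxonomies/<taxonomy>/<language> placeholders."""
    if taxonomy not in TAXONOMIES:
        raise ValueError(f"unknown taxonomy: {taxonomy!r}")
    return f"{BASE_URL}/taxonomies/{quote(taxonomy, safe='')}/{quote(language, safe='')}"


print(category_tree_url("iptc", "en"))  # https://nlapi.expert.ai/v2/taxonomies/iptc/en
```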

In the reference section of this manual you will find all the information you need to get these resources using the API's RESTful interface, specifically:

Even if you use the API through a client that hides the REST interface, whether you wrote it or expert.ai provides it, the last piece of information is useful because it helps you understand the data returned by the API.

For example, here is how to get the category tree of the geotax taxonomy for English:

This example is based on the Python client you can find on GitHub.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

The program prints the category tree.

from expertai.nlapi.cloud.client import ExpertAiClient

def printCategory(level, category):
    tabs = "\t" * level
    print(tabs, category.id, "(", category.label, ")")
    for nestedCategory in category.categories:
        printCategory(level + 1, nestedCategory)

client = ExpertAiClient()

taxonomy='geotax'
language='en'

output = client.taxonomy(params={'taxonomy': taxonomy, 'language': language})

print("geotax categories' tree:")

for category in output.taxonomy[0].categories:
    printCategory(1, category)
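If you read the raw JSON response instead of the client objects, you can do the same depth-first walk without recursion. The sketch assumes each node is a dictionary whose `label` and `categories` keys mirror the attribute names used by the Python client above. That JSON shape, the `flatten` helper and the sample labels are hypothetical.

```python
def flatten(categories):
    """Yield (depth, label) pairs in depth-first order using an explicit stack."""
    # Push in reverse so that siblings come out in their original order.
    stack = [(1, node) for node in reversed(categories)]
    while stack:
        depth, node = stack.pop()
        yield depth, node["label"]
        children = node.get("categories", [])
        stack.extend((depth + 1, child) for child in reversed(children))


# Hypothetical tree fragment with placeholder labels.
tree = [
    {"label": "Category A", "categories": [{"label": "Category A.1", "categories": []}]},
    {"label": "Category B", "categories": []},
]

for depth, label in flatten(tree):
    print("\t" * depth, label)
```

The explicit stack avoids Python's recursion limit. Category trees are usually shallow enough for the recursive version above to work fine, so this choice matters mainly for unusually deep data.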

This example is based on the Java client you can find on GitHub.

The client gets user credentials from two environment variables:

EAI_USERNAME
EAI_PASSWORD

Set those variables with your account credentials before running the sample program below.

The program prints the JSON response.

import ai.expert.nlapi.security.Authentication;
import ai.expert.nlapi.security.Authenticator;
import ai.expert.nlapi.security.BasicAuthenticator;
import ai.expert.nlapi.security.DefaultCredentialsProvider;
import ai.expert.nlapi.v2.API;
import ai.expert.nlapi.v2.cloud.InfoAPI;
import ai.expert.nlapi.v2.cloud.InfoAPIConfig;
import ai.expert.nlapi.v2.message.TaxonomyResponse;

public class Main {

    public static Authentication createAuthentication() throws Exception {
        DefaultCredentialsProvider credentialsProvider = new DefaultCredentialsProvider();
        Authenticator authenticator = new BasicAuthenticator(credentialsProvider);
        return new Authentication(authenticator);
    }

    public static void main(String[] args) {
        try {
            InfoAPI infoAPI = new InfoAPI(InfoAPIConfig.builder()
                                                       .withAuthentication(createAuthentication())
                                                       .withVersion(API.Versions.V2)
                                                       .build());

            TaxonomyResponse taxonomy = infoAPI.getTaxonomy("geotax", API.Languages.en);
            taxonomy.prettyPrint();
        }
        catch(Exception ex) {
            ex.printStackTrace();
        }
    }
}

The following curl command gets the resource of the API's REST interface that returns the category tree of the English geotax taxonomy. Run the command from a shell after replacing token with the actual authorization token.

curl -X GET https://nlapi.expert.ai/v2/taxonomies/geotax/en \
    -H 'Authorization: Bearer token'

The server returns a JSON object.

The following curl command gets the resource of the API's REST interface that returns the category tree of the English geotax taxonomy. Open a command prompt in the folder where you installed curl and run the command after replacing token with the actual authorization token.

curl -X GET https://nlapi.expert.ai/v2/taxonomies/geotax/en -H "Authorization: Bearer token"

The server returns a JSON object.


1. A. Dabrowski, "Emotions in philosophy. A short introduction", Studia Humana, 2016, 5:3, 8-20.
   E. Kim, R. Klinger, "A Survey on Sentiment and Emotion Analysis for Computational Literary Studies", Institut für Maschinelle Sprachverarbeitung, University of Stuttgart, 2018.
   A. Yadollahi, A. G. Shahraki, O. S. Zaiane, "Current State of Text Sentiment Analysis from Opinion to Emotion Mining", University of Alberta, 2017.
   R. Donovan, A. Johnson, A. deRoiste, R. O'Reilly, "Quantifying the Links between Personality Sub-Traits and the Basic Emotions", Computer Science and its Applications, 2020.

2. N. Najm, "Big Five Traits: A Critical Review", Gadjah Mada International Journal of Business, September 2019.
   R. M. S. Ramos, G. B. S. Neto, B. B. C. Silva, D. S. Monteiro, I. Paraboni, R. F. S. Dias, "Building a Corpus for Personality-dependent Natural Language Understanding and Generation", LREC Conference Proceedings, 2018.
   K. Luyckx, W. Daelemans, "Personae: a corpus for author and personality prediction from text", LREC Conference Proceedings, 2008.
   F. Celli, "Adaptive Personality Recognition from Text", PhD Thesis, University of Trento, 2012.