Document classification output

The API resources that perform document classification return a JSON object in this format:

{
    "success": Boolean success flag,
    "data": {
        "content": analyzed text,
        "language": language code,
        "version": technology version info,
        "categories": []
    }
}

For a description of the content, language and version properties, see the API resources output overview.
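As a rough illustration of how a client might read this envelope, the sketch below parses a response body and returns the categories array. The function name extract_categories, the sample values, and the choice to raise an error when success is false are assumptions made for the example; they are not part of the API.

```python
import json


def extract_categories(raw_response: str) -> list:
    """Return the "categories" array from a classification response body.

    Hypothetical helper: raising ValueError when "success" is false is an
    assumption of this sketch, not documented API behavior.
    """
    payload = json.loads(raw_response)
    if not payload.get("success"):
        raise ValueError("classification request was not successful")
    # "data" wraps the analyzed text, language, version and categories.
    return payload.get("data", {}).get("categories", [])


# Placeholder response with made-up values, shaped like the format above.
sample_response = json.dumps({
    "success": True,
    "data": {
        "content": "Basketball is a team sport.",
        "language": "en",
        "version": "placeholder",
        "categories": [],
    },
})

categories = extract_categories(sample_response)
print(len(categories))
```

An empty categories array is treated here as a normal result, not as an error.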

Each item of the categories array represents a category, for example:

{
    "frequency": 70.62,
    "hierarchy": [
        "Sport",
        "Competition discipline",
        "Basketball"
    ],
    "id": "20000851",
    "label": "Basketball",
    "namespace": "iptc_en_1.0",
    "positions": [
        {
            "end": 14,
            "start": 0
        },
        {
            "end": 53,
            "start": 35
        },
        {
            "end": 139,
            "start": 136
        }
    ],
    "score": 4005.0,
    "winner": true
}
  • namespace is the name of the software package containing the reference taxonomy.
  • id, label and hierarchy identify the category.
  • score is the cumulative score that was attributed to the category.
  • frequency is a percentage and an alternative measure of score that's easier to interpret when results need to be filtered based on the relative score difference. For example, if category #1 has frequency 50, category #2 has frequency 40 and category #3 has frequency 10, a filtering criterion such as "exclude categories with a frequency that's more than 10% lower than the highest" would reject category #3.
  • winner is a Boolean flag set to true if the category was considered particularly relevant.
  • positions is an array containing the positions of the text blocks that contributed to the category score.
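
The frequency filter and the positions array described above can be sketched as follows. This is a minimal example under stated assumptions: the helper names filter_by_frequency and contributing_text are hypothetical, the "10% lower" rule is read as a gap of 10 percentage points from the highest frequency, and end is assumed to be the offset just past the last character of a block (so that content[start:end] yields the block).

```python
def filter_by_frequency(categories, max_gap=10.0):
    """Keep categories whose frequency is within max_gap points of the highest.

    Assumption: the gap is measured in percentage points.
    """
    if not categories:
        return []
    top = max(c["frequency"] for c in categories)
    return [c for c in categories if top - c["frequency"] <= max_gap]


def contributing_text(content, category):
    """Return the text blocks that contributed to the category score.

    Assumption: "end" is exclusive, so a plain slice extracts each block.
    """
    return [content[p["start"]:p["end"]] for p in category["positions"]]


# The three categories from the frequency example, with made-up labels.
categories = [
    {"label": "#1", "frequency": 50.0},
    {"label": "#2", "frequency": 40.0},
    {"label": "#3", "frequency": 10.0},
]
kept = [c["label"] for c in filter_by_frequency(categories)]
print(kept)  # → ['#1', '#2']

# Made-up text and position, used only to show the slicing.
content = "Basketball is a team sport."
category = {"label": "Basketball", "positions": [{"start": 0, "end": 10}]}
print(contributing_text(content, category))  # → ['Basketball']
```

Filtering on frequency rather than score keeps the threshold independent of how large the raw scores happen to be for a given document.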