
Hate speech detector output

Introduction

The Hate speech detector API resource returns a JSON object with this format:

{
    "success": Boolean success flag,
    "data": {
        "content": analyzed text,
        "language": language code,
        "version": technology version info,
        "categories": [],
        "extractions": []
    }
}

For a description of the content, language and version properties, see the API resources output overview.

categories is the output of categorization, while extractions is the output of information extraction.
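The following Python sketch shows one way a client might unpack this object: it checks the success flag and returns the two arrays. The function name split_response and the sample values are hypothetical; only the property names come from the format above, and treating a missing array as empty is an assumption.

```python
import json
from typing import Any


def split_response(raw: str) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Parse a Hate speech detector response and return (categories, extractions).

    Raises ValueError when the success flag is not true.
    """
    response = json.loads(raw)
    if not response.get("success"):
        raise ValueError("the API call did not succeed")
    data = response["data"]
    # Assumption: a missing array is treated the same as an empty one.
    return data.get("categories", []), data.get("extractions", [])


# Hypothetical response in which nothing was detected.
sample = """
{
    "success": true,
    "data": {
        "content": "Example text",
        "language": "en",
        "version": "hypothetical version string",
        "categories": [],
        "extractions": []
    }
}
"""

categories, extractions = split_response(sample)
print(len(categories), len(extractions))  # 0 0
```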

categories array

Each item of the categories array represents a detected category, for example:

{
    "frequency": 6.96,
    "hierarchy": [
        "Sexism"
    ],
    "id": "2200",
    "label": "Sexism",
    "namespace": "hate_speech",
    "positions": [
        {
            "end": 15,
            "start": 10
        }
    ],
    "score": 3,
    "winner": true
}

where:

  • namespace is the name of the software module containing the reference taxonomy.
  • id, label and hierarchy identify the node in the category tree:

    • id is the identifying code.
    • label is the description.
    • hierarchy is the path of the category in the category tree. The path is the sequence of categories that goes from the farthest ancestor to the category itself. hierarchy is an array containing the values of the label property for all the categories along the path.
  • score is the cumulative score that was attributed to the category.
  • frequency is the percentage ratio of the category score to the sum of all categories' scores.
  • winner is a Boolean flag set to true if the category was considered particularly relevant.
  • positions is an array containing the positions of the text blocks that contributed to the category score. Each position is an object with start and end properties.
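The definitions above can be illustrated with a minimal Python sketch. The helper names and the two sample categories are hypothetical. The sketch recomputes frequency as described (score divided by the sum of all scores, times 100), collects the labels of winner categories and slices the analyzed text at each position. It assumes that start is a zero-based character offset and that end is exclusive, which is consistent with the extraction example below, where a seven-character value spans start 7 to end 14. Because returned scores may be rounded, a recomputed frequency can differ slightly from the one in the response.

```python
from typing import Any


def recompute_frequencies(categories: list[dict[str, Any]]) -> dict[str, float]:
    """Map each category id to its score as a percentage of the sum of all scores."""
    total = sum(c["score"] for c in categories)
    if total == 0:
        return {c["id"]: 0.0 for c in categories}
    # Two decimals, as in the documented example value 6.96 (assumption).
    return {c["id"]: round(c["score"] / total * 100, 2) for c in categories}


def winners(categories: list[dict[str, Any]]) -> list[str]:
    """Labels of the categories whose winner flag is true."""
    return [c["label"] for c in categories if c.get("winner")]


def contributing_text(content: str, category: dict[str, Any]) -> list[str]:
    """Text blocks that contributed to a category's score.

    Assumes start is a zero-based character offset and end is exclusive.
    """
    return [content[p["start"]:p["end"]] for p in category.get("positions", [])]


# Hypothetical sample data; ids, labels and scores are made up.
content = "some text with words"
sample_categories = [
    {"id": "A", "label": "Category A", "hierarchy": ["Category A"],
     "score": 3, "winner": True, "positions": [{"start": 5, "end": 9}]},
    {"id": "B", "label": "Category B", "hierarchy": ["Category B"],
     "score": 1, "winner": False, "positions": []},
]

print(recompute_frequencies(sample_categories))  # {'A': 75.0, 'B': 25.0}
print(winners(sample_categories))  # ['Category A']
print(contributing_text(content, sample_categories[0]))  # ['text']
```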

extractions array

Every item of the extractions array represents an extraction record.
For example, the following item is a record that's an instance of the Hate_speech_detection template:

{
    "namespace": "hate-speech_en_1.1",
    "template": "Hate_speech_detection",
    "fields": [
        {
            "name": "full_instance",
            "value": "niggers",
            "positions": [
                {
                    "start": 7,
                    "end": 14
                }
            ]
        },
        {
            "name": "target_5",
            "value": "ethnic group",
            "positions": [
                {
                    "start": 7,
                    "end": 14
                }
            ]
        }
    ]
}

See the information extraction article in the guide section to learn about the concept of information extraction and all the possible record templates and their fields.

In each item:

  • namespace is the name of the software module performing the extraction.
  • template is the name of the template.
  • fields is the array of record fields.
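As a minimal sketch, a client that receives several records might group them by template before reading their fields. The function names and the two sample records below are hypothetical; the template and namespace values are taken from the example above.

```python
from collections import defaultdict
from typing import Any


def group_by_template(extractions: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Group extraction records by the name of their template."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in extractions:
        groups[record["template"]].append(record)
    return dict(groups)


def namespaces(extractions: list[dict[str, Any]]) -> set[str]:
    """Names of the software modules that produced the records."""
    return {record["namespace"] for record in extractions}


# Hypothetical sample: two records with their fields omitted.
sample_records = [
    {"namespace": "hate-speech_en_1.1", "template": "Hate_speech_detection", "fields": []},
    {"namespace": "hate-speech_en_1.1", "template": "Hate_speech_detection", "fields": []},
]

grouped = group_by_template(sample_records)
print({template: len(records) for template, records in grouped.items()})
# {'Hate_speech_detection': 2}
print(namespaces(sample_records))  # {'hate-speech_en_1.1'}
```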

Each item of the fields array represents an extracted value, where:

  • name is the field's name.
  • value is the field's value.
  • positions is an array containing the extracted field's positions.
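The following hypothetical Python helpers show how the fields of one record could be read: field_values maps each field name to its values, and field_spans recovers the text each field position points to. The sample content and record are made up (a neutral word replaces the original example value), and the offsets are assumed to be zero-based with an exclusive end, as in the example above.

```python
from collections import defaultdict
from typing import Any


def field_values(record: dict[str, Any]) -> dict[str, list[str]]:
    """Map each field name to its values (a list, in case a name repeats)."""
    values: dict[str, list[str]] = defaultdict(list)
    for field in record.get("fields", []):
        values[field["name"]].append(field["value"])
    return dict(values)


def field_spans(content: str, record: dict[str, Any]) -> list[tuple[str, str]]:
    """(field name, text span) pairs for every position of every field.

    Assumes zero-based offsets with an exclusive end.
    """
    return [
        (field["name"], content[pos["start"]:pos["end"]])
        for field in record.get("fields", [])
        for pos in field.get("positions", [])
    ]


# Hypothetical record shaped like the example above, with a neutral word.
content = "This is an example sentence."
record = {
    "namespace": "hate-speech_en_1.1",
    "template": "Hate_speech_detection",
    "fields": [
        {"name": "full_instance", "value": "example",
         "positions": [{"start": 11, "end": 18}]},
        {"name": "target_5", "value": "ethnic group",
         "positions": [{"start": 11, "end": 18}]},
    ],
}

print(field_values(record))
# {'full_instance': ['example'], 'target_5': ['ethnic group']}
print(field_spans(content, record))
# [('full_instance', 'example'), ('target_5', 'example')]
```

In the documented example, the target_5 field shares the positions of full_instance, while its value is a description ("ethnic group") rather than the text itself; comparing each value with the sliced span makes that difference visible.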