Hate Speech Knowledge Model
Overview
The Hate Speech Knowledge Model (display name: Hate Speech EN v#) detects and classifies instances of direct hate speech in private messages, comments, social media posts and other short English texts.
More specifically, it is designed both to extract individual instances of offensive and violent language and to categorize each instance according to different hate speech categories.
Taxonomy and categorization
The model is able to identify three main categories of hate speech based on purpose:
- Personal insult
- Discrimination and harassment
- Threat and violence
The category tree is:
1000 Personal Insult
2000 Discrimination and Harassment
2100 Racism
2200 Sexism
2300 Ableism
2400 Religious Hatred
2500 Homophobia
2600 Classism
2700 Body Shaming
3000 Threat and Violence
There are three main categories; Discrimination and Harassment has seven sub-categories that indicate the kind of discrimination.
For example, when analyzing this text:
We should hang John Doe.
the output category is 3000 (Threat and Violence).
Each category is associated with specific extractions (see below).
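The category tree above can be represented as a simple lookup structure in client code. The sketch below is illustrative only and is not part of the model or its API; the CATEGORIES dictionary and the hierarchy helper are hypothetical names used to show how an ID such as 2200 resolves to its parent category in the tree.

```python
# Illustrative sketch: the category tree as a lookup table (not part of the model's API).
# Each entry maps a category ID to its label and its parent ID (None for top-level categories).
CATEGORIES = {
    "1000": ("Personal Insult", None),
    "2000": ("Discrimination and Harassment", None),
    "2100": ("Racism", "2000"),
    "2200": ("Sexism", "2000"),
    "2300": ("Ableism", "2000"),
    "2400": ("Religious Hatred", "2000"),
    "2500": ("Homophobia", "2000"),
    "2600": ("Classism", "2000"),
    "2700": ("Body Shaming", "2000"),
    "3000": ("Threat and Violence", None),
}

def hierarchy(category_id: str) -> list[str]:
    """Return the labels from the top-level ancestor down to the given category ID."""
    labels = []
    while category_id is not None:
        label, category_id = CATEGORIES[category_id]
        labels.append(label)
    return list(reversed(labels))

print(hierarchy("2200"))  # ['Discrimination and Harassment', 'Sexism']
```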
Extraction groups and classes
The Hate_speech_detection group has these classes:
Name | Description | Example | Normalized value |
---|---|---|---|
full_instance | Stereotypes, generalizations or hateful messages. | Girls can't drive! | |
target | Recipients of violent messages, sexual harassment or personal insults. | We should hang John Doe | individual or animal |
target_1 | Recipients of ableist discrimination. | John Doe is a retard | people with disabilities |
target_2 | Recipients of body shaming. | Fatties are ugly | individuals |
target_3 | Recipients of classist discrimination. | Rednecks have very low IQ | social class |
target_4 | Recipients of homophobic discrimination. | All gays should be eliminated | LGBT group |
target_5 | Recipients of racist discrimination. | Nigga stink | ethnic group |
target_6 | Recipients of religious hatred. | Believe me, Christians should be crucified. | religious group |
target_7 | Recipients of sexist discrimination. | Girls can't drive! | women |
sexual_harassment | Direct abusive communications characterized by sexual contents, appreciations or purposes. | I'd like to grab her tits | |
violence | Threats or violent purposes. | Let's bomb the government | |
cyberbullying | Direct abusive language, typically posted online, especially if it contains personal insults or body shaming. It is usually paired with the target or target_2 classes. | You are a bitch | |
When an extracted value is recognized as a slur or as a reference to a specific discriminated social group, the extracted text is replaced with a standard value (see the Normalized value column in the table above) in the extraction output. Normalization does not apply, for example, to:
Let's bomb the government.
where government is extracted without any normalization as the value of the target class.
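Because the field positions always point into the analyzed text, a consumer can recover the verbatim span and compare it with the (possibly normalized) field value. The snippet below is a minimal sketch; the content string and the position offsets are assumed for illustration and are not taken from an actual model response.

```python
# Illustrative sketch (not part of the model's API): recover the verbatim text a
# field refers to, alongside its value. Content and offsets are assumed for illustration.
content = "Let's bomb the government.\n"
field = {"name": "target", "positions": [{"start": 15, "end": 25}], "value": "government"}

for pos in field["positions"]:
    span = content[pos["start"]:pos["end"]]
    # Here span == "government" == field["value"]: no normalization was applied.
    # For a normalized field (e.g. value "individual"), span and value would differ.
    print(field["name"], repr(span), "->", field["value"])
```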
The extraction of all fields except full_instance is related to the categorization (see above) according to these relationships:
Category | Classes |
---|---|
1000 Personal Insult | target, cyberbullying |
2100 Racism | target_5 |
2200 Sexism | target_7, target |
2300 Ableism | target_1 |
2400 Religious Hatred | target_6 |
2500 Homophobia | target_4 |
2600 Classism | target_3 |
2700 Body Shaming | target_2 |
3000 Threat and Violence | target, target_1, target_2, target_3, target_4, target_5, target_6, target_7, sexual_harassment, violence |
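In client code, the category-to-class relationships above can be expressed as a simple mapping, for instance to check which extraction classes may accompany a given set of winning categories. The sketch below is illustrative only; CATEGORY_CLASSES and expected_classes are hypothetical names, not part of the model's API.

```python
# Illustrative mapping of category IDs to their related extraction classes,
# taken from the table above (full_instance is not tied to any category).
CATEGORY_CLASSES = {
    "1000": {"target", "cyberbullying"},
    "2100": {"target_5"},
    "2200": {"target_7", "target"},
    "2300": {"target_1"},
    "2400": {"target_6"},
    "2500": {"target_4"},
    "2600": {"target_3"},
    "2700": {"target_2"},
    "3000": {"target", "target_1", "target_2", "target_3", "target_4",
             "target_5", "target_6", "target_7", "sexual_harassment", "violence"},
}

def expected_classes(category_ids):
    """Union of the extraction classes related to the given category IDs."""
    return set().union(*(CATEGORY_CLASSES.get(cid, set()) for cid in category_ids))

print(expected_classes(["1000", "2200"]))  # {'target', 'cyberbullying', 'target_7'}
```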
Output structure
The model output has the same structure as that of any other model and is affected by the functional properties of the workflow block.
The parts of the output specific to this model are the result of categorization, i.e. the categories array, and the result of information extraction, i.e. the extractions array.
Example
Considering the text:
You are a bitch
The JSON output for categorization and extraction is:
"categories": [
{
"frequency": 93.01,
"hierarchy": [
"Personal Insult"
],
"id": "1000",
"label": "Personal Insult",
"namespace": "hate_speech",
"positions": [
{
"end": 3,
"start": 0
},
{
"end": 15,
"start": 10
}
],
"score": 40,
"winner": true
},
{
"frequency": 6.96,
"hierarchy": [
"Sexism"
],
"id": "2200",
"label": "Sexism",
"namespace": "hate_speech",
"positions": [
{
"end": 15,
"start": 10
}
],
"score": 3,
"winner": true
}
],
"content": "you are a bitch\n",
"entities": [],
"extractions": [
{
"fields": [
{
"name": "target",
"positions": [
{
"end": 3,
"start": 0
}
],
"value": "individual"
},
{
"name": "cyberbullying",
"positions": [
{
"end": 15,
"start": 10
}
],
"value": "you are a bitch"
}
],
"namespace": "hate_speech",
"template": "Hate_speech_detection"
}
]
where 1000 (Personal Insult) is associated with 2200 (Sexism), which is a sub-category of 2000 (Discrimination and Harassment).
The extractions show the combination of the target and cyberbullying classes.
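A consumer of this output might read the winning categories and the extracted fields as sketched below. This is a minimal, hypothetical client snippet, not an official one; result holds a pared-down version of the JSON document shown above.

```python
# Illustrative sketch (hypothetical client code): reading winning categories and
# extraction fields from the model output shown above.
result = {
    "categories": [
        {"id": "1000", "label": "Personal Insult",
         "hierarchy": ["Personal Insult"], "winner": True},
        {"id": "2200", "label": "Sexism", "hierarchy": ["Sexism"], "winner": True},
    ],
    "content": "you are a bitch\n",
    "extractions": [
        {"template": "Hate_speech_detection", "fields": [
            {"name": "target", "positions": [{"start": 0, "end": 3}],
             "value": "individual"},
            {"name": "cyberbullying", "positions": [{"start": 10, "end": 15}],
             "value": "you are a bitch"},
        ]},
    ],
}

# Winning categories, e.g. "1000 Personal Insult" and "2200 Sexism".
for category in result["categories"]:
    if category.get("winner"):
        print(category["id"], category["label"])

# Extraction fields: the value may be normalized (e.g. "individual"), while the
# positions always point back to verbatim spans inside result["content"].
for extraction in result["extractions"]:
    for field in extraction["fields"]:
        spans = [result["content"][p["start"]:p["end"]] for p in field["positions"]]
        print(field["name"], "=", field["value"], "| spans:", spans)
```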