Hate speech detection
The Hate Speech detector detects and classifies instances of direct hate speech delivered through private messages, comments, social media posts and other short texts.
More specifically, it is designed both to extract individual instances of offensive and violent language and to categorize each instance according to different hate speech categories.
Categorization works in a similar way to document classification and is based on a taxonomy.
Unlike the API resources dedicated to document classification, this is an information detector: the category tree of its taxonomy cannot be obtained through the API self-documentation resources available for document classification taxonomies, so it is reported below.
The detector is able to identify three main categories of hate speech based on purpose:
- Personal insult
- Discrimination and harassment
- Threat and violence
Discrimination and harassment can be further divided into seven specific sub-categories that give information about the kind of discrimination perpetrated in the hate speech instances.
The full category tree is:
- 1000 Personal Insult
- 2000 Discrimination and Harassment
    - 2100 Racism
    - 2200 Sexism
    - 2300 Ableism
    - 2400 Religious Hatred
    - 2500 Homophobia
    - 2600 Classism
    - 2700 Body Shaming
- 3000 Threat and Violence
As shown, there are three main categories, and only Discrimination and Harassment has sub-categories, which indicate the kind of discrimination.
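If you need to handle the taxonomy programmatically, a minimal sketch of the category tree as a Python mapping could look like the following; the variable and key names are illustrative assumptions, not objects returned by the API.

```python
# Hate speech category tree, transcribed from the taxonomy above.
# The dictionary layout is only an illustrative convenience, not an API object.
HATE_SPEECH_TAXONOMY = {
    "1000": {"label": "Personal Insult", "children": {}},
    "2000": {
        "label": "Discrimination and Harassment",
        "children": {
            "2100": "Racism",
            "2200": "Sexism",
            "2300": "Ableism",
            "2400": "Religious Hatred",
            "2500": "Homophobia",
            "2600": "Classism",
            "2700": "Body Shaming",
        },
    },
    "3000": {"label": "Threat and Violence", "children": {}},
}
```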
For example, when analyzing this text:
We should hang John Doe.
the output category is 3000 (Threat and Violence).
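As an illustration of how such a result might be requested and read, here is a hedged sketch using Python and the requests library; the endpoint URL, authentication header and payload layout are placeholders, not the documented API contract.

```python
import requests

# Hypothetical request sketch: the endpoint URL, header names and payload
# layout below are placeholders, not the documented API contract.
API_URL = "https://api.example.com/detect/hate-speech"  # placeholder endpoint

response = requests.post(
    API_URL,
    headers={"Authorization": "Bearer <token>"},  # placeholder auth scheme
    json={"document": {"text": "We should hang John Doe."}},
)
response.raise_for_status()

# For this text, a category with id 3000 (Threat and Violence) is expected
# among the detected categories.
for category in response.json().get("categories", []):
    print(category)  # e.g. {"id": "3000", "label": "Threat and Violence"}
```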
Each category is associated with specific extractions.
The information extraction activity of the detector finds and returns records of extracted information. Each record contains data fields, and its structure, that is the set of possible fields, is called a template.
A template can be compared to a table and the template fields to the columns of the table.
Records of the Hate_speech_detection template can have these fields:
| Field | Description | Example |
|-------|-------------|---------|
| full_instance | Stereotypes, generalizations or hateful messages. | Girls can't drive! |
| target | Recipients of violent messages, sexual harassment, personal insults. | We should hang John Doe |
| target_1 | | John Doe is a retard |
| target_2 | | Fatties are ugly |
| target_3 | | Rednecks have very low IQ |
| target_4 | | All gays should be eliminated |
| target_6 | | Believe me, Christians should be crucified. |
| target_7 | | Girls can't drive! |
| sexual_harassment | Direct abusive communications characterized by sexual contents, appreciations or purposes. | I'd like to grab her tits |
| violence | Threats or violent purposes. | Let's bomb the government |
| cyberbullying | Direct abusive language, typically posted online, especially if it contains personal insults or body shaming. It is usually paired with instances of the target or target_2 fields. | You are a bitch |
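For illustration only, an extraction record of this template could be modeled in client code like the sketch below; the class and variable names are assumptions, and only the field names come from the table above.

```python
from dataclasses import dataclass, field
from typing import Dict

# Illustrative record structure for the Hate_speech_detection template;
# the class name is an assumption, the field names come from the table above.
@dataclass
class ExtractionRecord:
    template: str                          # e.g. "Hate_speech_detection"
    fields: Dict[str, str] = field(default_factory=dict)

record = ExtractionRecord(
    template="Hate_speech_detection",
    fields={
        "full_instance": "Girls can't drive!",
        "target_7": "Girls can't drive!",
    },
)
print(record.fields["target_7"])
```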
When an extracted value is recognized as a slur or as a reference to a specific discriminated social group, the extracted text is replaced with a standard, normalized value in the extraction output. This does not apply, for example, to:

Let's bomb the government.

where government is extracted without any normalization as the value of the target field.
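The following sketch only illustrates the idea of that normalization behavior; the lookup table contents are invented placeholders, and in practice the replacement is performed by the detector itself.

```python
# Conceptual sketch of the normalization described above; the lookup table
# contents are placeholders, not the detector's actual normalized values.
NORMALIZATION_TABLE = {
    "fatties": "<standard value for the discriminated group>",  # placeholder entry
}

def normalize_target(extracted_text: str) -> str:
    """Return the standard value for a recognized slur or group, else the literal text."""
    return NORMALIZATION_TABLE.get(extracted_text.lower(), extracted_text)

print(normalize_target("government"))  # -> "government" (no normalization applied)
```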
The extraction of all fields except full_instance is related to the categorization according to these relationships:
| Category | Related fields |
|----------|----------------|
| 1000 Personal Insult | target, cyberbullying |
| 2200 Sexism | target_7, target |
| 2400 Religious Hatred | target_6 |
| 2700 Body Shaming | target_2 |
| 3000 Threat and Violence | target, target_1, target_2, target_3, target_4, target_5, target_6, target_7, sexual_harassment, violence |
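If you process the output programmatically, these relationships can be kept as a simple lookup to decide which fields to inspect for a detected category; the structure below is an illustrative convenience, not an API object.

```python
# Category -> related extraction fields, transcribed from the table above.
CATEGORY_FIELDS = {
    "1000": ["target", "cyberbullying"],                        # Personal Insult
    "2200": ["target_7", "target"],                             # Sexism
    "2400": ["target_6"],                                       # Religious Hatred
    "2700": ["target_2"],                                       # Body Shaming
    "3000": ["target", "target_1", "target_2", "target_3",     # Threat and Violence
             "target_4", "target_5", "target_6", "target_7",
             "sexual_harassment", "violence"],
}

def fields_for_category(category_id: str) -> list:
    """Return the extraction fields associated with a detected category."""
    return CATEGORY_FIELDS.get(category_id, [])

print(fields_for_category("2700"))  # -> ['target_2']
```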
To use the detector you also need to know:

- How to request information detection API resources.
- How to interpret the output of information detection API resources.