This document summarizes the construction and analysis of a Korean hate speech corpus. It discusses how hate speech was defined and annotated, including the guidelines developed for identifying social bias and measuring toxicity. Over 10,000 online comments were annotated. The analysis found that toxicity usually accompanies biased comments, and that gender-related bias tended to coincide with more toxic expressions than other biases. The corpus was created to support real-world hate speech detection in Korean and to address gaps in previous work.
1. Hate Speech as Toxic and Biased Words:
Construction and Analysis of
Korean Hate Speech Corpus
Won Ik Cho (SNU ECE)
2021. 6. 4 @JWLLP
2. Contents
• Introduction
• Source Corpus
• Guideline and Annotation
• Analysis
• Conclusion
Caution! This presentation contains content that may be offensive
3. Introduction
• Hate speech
What are the aspects of hate speech?
• Hate speech and hatred
• Bad words and insulting
• Discrimination and bias
Various projects are underway in the name of ...
• Abusive language, Toxic words, etc.
There is social agreement that prevalent hate speech `matters’ a lot
However, debates remain on:
• What really is `hate speech’?
• Can certain expressions be called `hate speech’?
• Is hate speech really hateful?
5. Introduction
• Hate speech
Hate speech detection in practice
• Finding and blinding malicious expressions in game or broadcasting chat
• Blinding posts/comments of Youtube, Facebook or Twitter based on detecting
system
Do current practical studies consider theoretical/social discussions?
• Current practical studies in Korean hate speech detection
– Detecting swear words and profanity terms: usually dictionary-based
– Define the sentences that contain such terms as `hate speech’
– OR sometimes define the expressions from certain communities as hate speech
– Little study involving human annotation of the utterances
7. Introduction
• Hate speech
In literature (and in other languages)
• Waseem and Hovy (2016)
– Tags English Twitter posts with around ten or more characteristics that imply hate speech
• Davidson et al. (2017)
– Mentions the discrepancy between the theoretical definition and real world
expressions of hate speech
– Puts `offensive’ expressions in between `hate’ and `non-hate’, to incorporate the
expressions that are in the grey area
• Sanguinetti et al. (2018)
– Investigates hate speech in Italian posts about immigrants
» Beyond hate speech, detects whether the post is offensive, aggressive, or intense, contains irony or sarcasm, or shows a stereotype
» `Stereotype’ as a factor that can be a clue to discrimination
9. Introduction
• Hate speech
Research Questions
• RQ1
– How is hate speech displayed in Korean online comments?
» What is bias and which categories are included in?
» How can we represent the amount of toxicity of expressions?
• RQ2
– What characteristics does the Korean hate speech corpus incorporate?
» Does bias accompany the toxicity of expression?
» Does toxicity vary with the type of bias shown?
10. Source Corpus
• Comments from the most popular Korean entertainment news
platform
Jan. 2018 ~ Feb. 2020
10,403,368 comments from 23,700 articles
Sampling and Filtering
Top 20 comments ranked by Wilson score on downvotes for each of
1,580 articles acquired by stratified sampling
• Filter out duplicates and keep comments having more than a single
token and fewer than 100 characters
• 10K comments were selected
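The sampling and filtering steps above can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: `wilson_lower_bound` and `filter_comments` are assumed names, and the confidence level (z = 1.96) is an assumption.

```python
import math

def wilson_lower_bound(downvotes, total, z=1.96):
    # Lower bound of the Wilson score interval for the downvote
    # proportion; comments with stronger evidence of downvoting
    # rank higher than ones with few total votes.
    if total == 0:
        return 0.0
    p = downvotes / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * total)) / total)
    return (centre - margin) / (1 + z * z / total)

def filter_comments(comments):
    # Deduplicate, then keep comments with more than a single token
    # and fewer than 100 characters, as described above.
    seen, kept = set(), []
    for c in comments:
        if c in seen:
            continue
        seen.add(c)
        if len(c.split()) > 1 and len(c) < 100:
            kept.append(c)
    return kept
```

The Wilson lower bound rewards both a high downvote ratio and a large number of votes, so 90 downvotes out of 100 ranks above 9 out of 10 despite the identical ratio.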
11. Guideline and Annotation
• Formulation
Hate speech
• Discussion based on 1,000 comments out of the total 10,000
• Which factors make a comment `hate speech’?
– Bias
» `People with a specific characteristic may behave in some way’
» May differ from individual judgment
– Hate
» Hostility towards a specific group or individual
» Can be expressed with profanity terms, but such terms do not necessarily imply hate
– Insult
» Expressions that can harm the prestige of an individual or group
» Various profanity terms are included
– Offensive expressions
» Do not count as hate or insult, but may offend readers
» Includes sarcasm, irony, malicious speculation, and unethical expressions
13. Guideline and Annotation
• Formulation
Social bias + Toxicity
• Detection of bias (ternary)
– Gender-related bias (Why?)
– Other biases
– None
» Close to the problem of `detection’
» Why concentrated on gender issue?
• Measuring toxicity (ternary)
– Severe hate or insult
– Not hateful but offensive or sarcastic
– None
» Close to the problem of `amount’
» Why formulated as a problem of intensity?
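One annotated instance under the two ternary attributes above can be sketched as a small record. The field and label names below are illustrative, not necessarily those of the released corpus.

```python
# Illustrative label sets for the two ternary attributes described above.
BIAS_LABELS = {"gender", "others", "none"}   # detection of bias
HATE_LABELS = {"hate", "offensive", "none"}  # amount of toxicity

def validate(record):
    # A record is well-formed when it carries exactly one label
    # from each attribute's label set.
    return record["bias"] in BIAS_LABELS and record["hate"] in HATE_LABELS
```

For example, a comment tagged with gender-related bias and offensive (but not hateful) toxicity would be `{"bias": "gender", "hate": "offensive"}`.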
17. Guideline and Annotation
• Guideline
Multi-label tagging
• 3 classes for bias
• 3 classes for toxicity
Given a comment (without context), the annotator should tag each
attribute
Each comment was provided to three random annotators
• Total 32 participants (in pilot and main tagging phase)
• Female : male = 6 : 4 / 20s : 30s : 40s = 3 : 2 : 1
1. What kind of bias does the comment contain?
- Gender bias, Other biases, or None
2. Which is the adequate category for the comment in terms of toxicity?
- Hate, Offensive, or None
18. Guideline and Annotation
• Pilot tagging – Which workers would fit?
Human checked
• Ethical standard not too far from the guideline?
• Is feedback effective for the rejected samples?
Automatically checked
• Enough taggings done?
• Too frequent cases of skipping the annotation?
19. Guideline and Annotation
• Crowd-sourcing – With selected workers
Per-annotator feedback was not conducted in the sourcing phase
20. Analysis
• Data Post-processing
After whole annotation (8,000 instances)
• Commonly checked for social bias and toxicity
– If all three annotators differ
» Task managers decide the final label after adjudication
• For toxicity
– Since the problem concerns ‘intensity’, only the (o) and (x) cases need to be reorganized
» Final decision after adjudication
• Instances where a majority vote was impossible were discarded
Annotator agreement (Krippendorff’s alpha): overall moderate
• Bias (binary) – 0.767 (Existence of gender-related bias is relatively explicit)
• Bias (ternary) – 0.492
• Hate (ternary) – 0.496
22. Analysis
• Final data
Data split
• Discarded 659 out of 10,000
• Split the rest into train/valid/test
Data composition
• Test: 974
– Data tagged while constructing the guideline (best aligned with the intention of the guideline)
• Valid: 471
– Data which went through tagging/review/reject-and-accept in the pilot phase, done with a large number of annotators (roughly aligned with the guideline)
• Train: 7,896
– Data crowd-sourced with the selected annotators, not fully reviewed but adjudicated for some special cases
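A quick sanity check on the figures above: the three splits account exactly for the comments that survived adjudication.

```python
# Split sizes reported above: 659 of 10,000 comments were discarded,
# and the remainder forms the train/valid/test splits.
total, discarded = 10_000, 659
splits = {"train": 7_896, "valid": 471, "test": 974}
assert sum(splits.values()) == total - discarded  # 9,341 retained
```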
23. Analysis
• Final data
Characteristics
• Toxic comments take a slightly larger portion than `None’
• For bias, the same does not hold
Something to remark
• ‘Lots of toxic expressions in the celebrity news domain’?
– Though we sampled in the order of downvotes, the overall portion does not necessarily reflect the toxicity of random comments
• ‘Higher portion of toxic comments compared to bias’?
– Though the results suggest so, biases are usually implicit and might not have been visible to the users
» So they were not accurately reflected in up/downvotes
25. Analysis
• Final data
Bias and toxicity
• Toxicity is observed in most texts
with gender-related or other biases
– Gender-related bias?
» 93.76% toxic
– Other biases?
» 90.42% toxic
• In contrast, toxic comments do not necessarily contain biases
The category of bias and amount of toxicity
• Gender-related bias appears about 1.4 times as often as other biases in `hate’
– In `offensive’, the portion of gender-related bias drops to about half that of other biases
• Possibly influenced largely by our guideline, but this still suggests that the amount of toxicity in the celebrity news domain is closely tied to gender-related content
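The co-occurrence figures above (e.g. 93.76% of gender-biased comments being toxic) are shares of the following form. The function below is an illustrative sketch over records carrying the two attributes, not the authors' analysis code.

```python
def toxic_share(rows, bias):
    # Share of comments labeled `hate` or `offensive` among those
    # carrying the given bias label.
    biased = [r for r in rows if r["bias"] == bias]
    toxic = [r for r in biased if r["hate"] in ("hate", "offensive")]
    return len(toxic) / len(biased)

rows = [
    {"bias": "gender", "hate": "hate"},
    {"bias": "gender", "hate": "offensive"},
    {"bias": "gender", "hate": "none"},
    {"bias": "others", "hate": "none"},
]
toxic_share(rows, "gender")  # 2 of 3 gender-biased rows are toxic
```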
27. Analysis
• Research questions
RQ1
• How is hate speech displayed
in Korean online comments?
– Social bias and Toxicity
RQ2
• What characteristics does the
Korean hate speech corpus
incorporate?
– Bias usually accompanies toxicity
– Gender-related bias seems to
accompany more toxic expressions
28. Conclusion
• Discussions on hate speech span diverse viewpoints, from academia to society and industry
• Constructing a hate speech corpus in Korean links these discussions so as to be useful for real-world hate speech detection
• We observed bias and toxicity in Korean hate speech, weighted toward gender-related factors in celebrity news comments
• Our future work includes building hate speech corpora for various domains of text, from formal to colloquial, to cover the remaining cases
29. Conclusion
• Model and data release
Annotation guideline
• https://www.notion.so/c1ecb7cc52d446cc93d928d172ef8442
Kaggle competition
• https://www.kaggle.com/c/korean-gender-bias-detection
• https://www.kaggle.com/c/korean-bias-detection/
• https://www.kaggle.com/c/korean-hate-speech-detection/
Github repository
• https://github.com/kocohub/korean-hate-speech
• For easier data importing
Koco package
• https://github.com/inmoonlight/koco
– Library to easily access kocohub datasets
– Kocohub contains KOrean COrpus for natural language processing
» https://github.com/kocohub