1. Building a Dataset to Measure
Toxicity and Social Bias within Language:
A Low-Resource Perspective
Won Ik Cho (SNU ECE)
2022. 6. 22 @FAccT, Seoul, Korea
2. Introduction
• CHO, Won Ik (조원익)
▪ B.S. in EE/Mathematics (SNU, '10–'14)
▪ Ph.D. student (SNU ECE, '14–)
• Academic interests
▪ Built Korean NLP datasets in various spoken language understanding areas
▪ Currently interested in computational approaches to:
• Dialogue analysis
• AI for social good
3. Contents
• Introduction
• Hate speech in real and cyber spaces
▪ What is hate speech and why does it matter?
▪ Study on hate speech detection
• In English – dataset and analysis
• Notable approaches in other languages
• Low-resource perspective: Creating a hate speech corpus from scratch
▪ Analysis of existing language resources
▪ Hate speech as bias detection and toxicity measurement
▪ Building a guideline for data annotation
▪ Worker pilot, crowdsourcing, and agreement
• Challenges of hate speech corpus construction
• Conclusion
4. Contents
Caution! This presentation may contain content that can be offensive to certain groups of people, such as gender bias, racism, or other unethical content, including multimodal materials.
5. Contents
• Handled in this tutorial
▪ How to build a hate speech detection dataset in a specific setting (language, text domain, etc.)
▪ How to check the validity of the created hate speech corpus
• Less handled in this tutorial
▪ Comprehensive definitions of hate speech and social bias in the literature
▪ Reliability of specific ethical guidelines for hate speech corpus construction
6. Hate speech in real and cyber spaces
• What is hate speech and why does it matter?
▪ Difficulty of defining hate speech
• A political and legal term, not just a theoretical one
• Has no unified/universal definition accepted by all
• The definition differs across language, culture, domain, discipline, etc.
▪ Definition given by the United Nations
• "Any kind of communication in speech, writing or behaviour, that attacks or uses pejorative or discriminatory language with reference to a person or a group on the basis of who they are, in other words, based on their religion, ethnicity, nationality, race, colour, descent, gender or other identity factor."
– Not a legal definition
– Broader than the notion of "incitement to discrimination, hostility or violence" prohibited under international human rights law
https://www.un.org/en/hate-speech/understanding-hate-speech/what-is-hate-speech?
7. Hate speech in real and cyber spaces
• What is hate speech and why does it matter?
▪ Hate speech in cyber spaces
• Its definition is deductive, but its detection is inductive
• Hate speech appears online as various expressions, including:
– Offensive language
– Pejorative expressions
– Discriminative words
– Profanity terms
– Insults ... etc.
• Whether to include specific terms or expressions in the category of 'hate speech' is a tricky issue
– What if a pejorative expression or profanity term does not target any group or individual?
– What if (sexual) harassment is considered offensive by readers but not by the target figure?
8. Hate speech in real and cyber spaces
• Discussion on hate speech detection
▪ Studies for English
• Waseem and Hovy (2016)
– Annotates tweets based on around 10 features that make a post offensive

A tweet is offensive if it
1. uses a sexist or racial slur.
2. attacks a minority.
3. seeks to silence a minority.
4. criticizes a minority (without a well founded argument).
5. promotes, but does not directly use, hate speech or violent crime.
6. criticizes a minority and uses a straw man argument.
7. blatantly misrepresents truth or seeks to distort views on a minority with unfounded claims.
8. shows support of problematic hash tags. E.g. "#BanIslam", "#whoriental", "#whitegenocide"
9. negatively stereotypes a minority.
10. defends xenophobia or sexism.
11. contains a screen name that is offensive, as per the previous criteria, the tweet is ambiguous (at best), and the tweet is on a topic that satisfies any of the above criteria
Waseem and Hovy, Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter, 2016.
9. Hate speech in real and cyber spaces
• Discussion on hate speech detection
▪ Studies for English
• Davidson et al. (2017)
– Mentions the discrepancy between the theoretical definition and real-world expressions of hate speech
– Puts 'offensive' expressions between 'hate' and 'non-hate', to incorporate expressions that are in the grey area
– Incorporates profanity terms prevalent in social media, which do not necessarily target a minority but induce offensiveness
Davidson et al., Automated Hate Speech Detection and the Problem of Offensive Language, 2017.
10. Hate speech in real and cyber spaces
• Discussion on hate speech detection
▪ Notable approaches in other languages
• Sanguinetti et al. (2018)
– Investigates hate speech in posts about immigrants in Italy
– Beyond hate speech, tags whether the post is offensive or aggressive, whether it contains irony or sarcasm, whether it shows stereotype, and its intensity
– 'Stereotype' as a factor that can be a clue to discrimination
Sanguinetti et al., An Italian Twitter Corpus of Hate Speech against Immigrants, 2018.
• hate speech: no – yes
• aggressiveness: no – weak – strong
• offensiveness: no – weak – strong
• irony: no – yes
• stereotype: no – yes
• intensity: 0 – 1 – 2 – 3 – 4
11. Hate speech in real and cyber spaces
• Discussion on hate speech detection
▪ Notable approaches in other languages
• Assimakopoulos et al. (2020)
– Motivated by critical analysis of posts made in reaction to news reports on the Mediterranean migration crisis and LGBTIQ+ matters in Malta
– Annotates Maltese web texts
– Investigates the attitude (positive/negative) of the text, asks for the target if negative, and asks how the negativity is conveyed

1. Does the post communicate a positive, negative or neutral attitude? [Positive / Negative / Neutral]
2. If negative, who does this attitude target? [Individual / Group]
• (a) If it targets an individual, does it do so because of the individual's affiliation to a group? [Yes / No] If yes, name the group.
• (b) If it targets a group, name the group.
3. How is the attitude expressed in relation to the target group? Select all that apply. [Derogatory term / Generalisation / Insult / Sarcasm (including jokes and trolling) / Stereotyping / Suggestion / Threat]
4. If the post involves a suggestion, is it a suggestion that calls for violence against the target group? [Yes / No]
Assimakopoulos et al., Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis, 2020.
12. Hate speech in real and cyber spaces
• Discussion on hate speech detection
▪ Notable approaches in other languages
• Moon et al. (2020)
– Annotation on Korean celebrity news comments
– Investigates the existence of social bias and the degree of toxicity
» Social bias → gender-related bias and other biases
» Toxicity → hate/offensive/none (following Davidson et al. 2017)
Moon et al., BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection, 2020.
Detecting social bias
• Is there a gender-related bias, either explicit or implicit, in the text?
• Are there any other kinds of bias in the text?
• A comment that does not incorporate bias
Measuring toxicity
• Is strong hate or insult towards the article's target or related figures, writers of the article or comments, etc. displayed in a comment?
• Although a comment is not as hateful or insulting as the above, does it make the target or the reader feel offended?
• A comment that does not incorporate any hatred or insult
13. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ ASSUMPTION: There is no manually created hate speech detection corpus so far for the Korean language (this was true before July 2020...)
• Generally, clear motivation is required for hate speech corpus construction
– Why?
» It takes resources (time and money)
» Potential mental harm
» Potential attacks towards the researchers
– Nonetheless, it is required in some circumstances
» Detecting offensive language in services
» When severe harm has been displayed publicly
14. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 1: Is there anything available? Analysis of existing language resources
• Hate speech detection is related to various other, similar language resources (though slightly different in definition and goal)
– Dictionaries of profanity terms (e.g., hatebase.org)
– Sarcasm detection datasets
– Sentiment analysis datasets
– Offensive language detection datasets
• Why should we search existing resources?
– To lessen the consumption of time and money
– To make the problem easier by building upon existing datasets
– To confirm what we should aim for by creating a new dataset
15. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 1: Is there anything available? Analysis of existing language resources
• Dictionary of profanity terms
– e.g., https://github.com/doublems/korean-bad-words
• Sarcasm detection dataset
– e.g., https://github.com/SpellOnYou/korean-sarcasm
• Sentiment analysis dataset
– e.g., https://github.com/e9t/nsmc
▪ These datasets may not completely overlap with a hate speech corpus, but at least they can be a good source of annotation :)
• Here, one should think of:
– Text style
– Text domain
– Appearing types of toxicity and bias
16. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 1: Is there anything available? Analysis of existing language resources
• Text style
– Written/spoken/web text?
• Text domain
– News/wiki/tweets/chat/comments?
• Appearing types of toxicity and bias
– Gender-related?
– Politics/religion?
– Region/nationality/ethnicity?
• Appearing amount of toxicity and bias
17. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 1: Is there anything available? Analysis of existing language resources
• Data collection example (BEEP!)
– Comments from the most popular Korean entertainment news platform
» Jan. 2018 – Feb. 2020
» 10,403,368 comments from 23,700 articles
» 1,580 articles acquired by stratified sampling
» Top 20 comments per article, ranked by the Wilson score on the downvotes
– Filtered duplicates, keeping comments with more than a single token and fewer than 100 characters
– 10K comments were selected
• The data sampling process strongly affects the final distribution of the dataset!
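The Wilson-score ranking used in the collection step above can be sketched as follows (a minimal sketch of the standard Wilson lower bound; the exact scoring variant used by the platform is not specified in the slides):

```python
import math

def wilson_lower_bound(pos, n, z=1.96):
    """Lower bound of the Wilson score interval for a Bernoulli proportion.

    Ranks items by vote ratio while penalizing small vote counts,
    so a 9/10 item scores lower than a 90/100 item.
    z=1.96 corresponds to a 95% confidence level.
    """
    if n == 0:
        return 0.0
    phat = pos / n
    denom = 1 + z * z / n
    centre = phat + z * z / (2 * n)
    margin = z * math.sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / denom
```

Sorting comments by this score and taking the top 20 per article reproduces the kind of ranking described above.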
18. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 2: What should we define first? Hate speech as bias detection and toxicity measurement
• Local definition of hate speech discussed by the Korean sociolinguistics community
– Definition of hate speech
» Expressions that discriminate/hate or incite discrimination/hatred/violence towards some individual or group of people because they have characteristics of a social minority
– Types of hate speech
» Discriminative bullying
» Discrimination
» Public insult/threatening
» Inciting hatred
Hong et al., Study on the State and Regulation of Hate Speech, 2016.
19. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 2: What should we define first? Hate speech as bias detection and toxicity measurement
• Set up criteria
– Analyze "discriminate/hate or incite discrimination/hatred/violence" as a combination of 'social bias' and 'toxicity'
– Further discussion is required on social minorities
» 'Gender, age, profession, religion, nationality, skin color, political stance' and all other factors that comprise one's identity
» Criteria for a social minority vs. who will be acknowledged as a social minority
20. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 2: What should we define first? Hate speech as bias detection and toxicity measurement
• Set up criteria for bias detection
– 'People with a specific characteristic may behave in some way'
– Differs from a judgment
» Gender-related bias
» Other biases
» None
Cho and Moon, How Does the Hate Speech Corpus Concern Sociolinguistic Discussions? A Case Study on Korean Online News Comments, 2021.
21. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 2: What should we define first? Hate speech as bias detection and toxicity measurement
• Set up criteria for toxicity measurement
– Hate
» Hostility towards a specific group or individual
» Can be represented by some profanity terms, but such terms do not necessarily imply hate
– Insult
» Expressions that can harm the prestige of individuals or groups
» Various profanity terms are included
– Offensive expressions
» Do not count as hate or insult, but may make readers feel offended
» Include sarcasm, irony, unfounded speculation, and unethical expressions
22. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 2: What should we define first? Hate speech as bias detection and toxicity measurement
• Set up criteria for toxicity measurement
» Severe hate or insult
» Not hateful but offensive or sarcastic
» None
Cho and Moon, How Does the Hate Speech Corpus Concern Sociolinguistic Discussions? A Case Study on Korean Online News Comments, 2021.
23. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 3: What is required for the annotation? Building a guideline for data annotation
• Stakeholders
– Researchers
– Moderators (crowdsourcing platform)
– Workers
• How is the guideline used?
– Setting up the research direction (for researchers)
– Task understanding (for moderators)
– Data annotation (for workers)
24. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 3: What is required for the annotation? Building a guideline for data annotation
• A guideline is not built at once!
– Usual process
» Making a draft guideline based on the source corpus
» Pilot study by researchers & guideline update (n iterations)
» Moderators' and researchers' alignment on the guideline
» Worker recruitment & pilot tagging
» Guideline update with worker feedback (cautions & exceptions)
» Final guideline (for the main annotation)
25. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 3: What is required for the annotation? Building a guideline for data annotation
• Draft guideline
– Built upon a small portion of the source corpus (hundreds of instances)
– Researchers' intuition is heavily involved
– Concept-based description
» e.g., for 'bias', 'People with a specific characteristic may behave in some way' (instead of listing all stereotyped expressions)
• Pilot study
– Researchers tag a slightly larger portion of the source corpus (~1K instances)
– Fitting researchers' intuitions to the proposed concepts
» e.g., "Does this expression contain bias or toxicity?" (discussion is important, but don't fight!)
– Update descriptions or add examples
– Labeling, re-labeling, re-re-labeling...
26. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 3: What is required for the annotation? Building a guideline for data annotation
• Pilot study
– Labeling, re-labeling, re-re-labeling... + agreement?
– Inter-annotator agreement (IAA)
» Calculates the reliability of the annotation
» Cohen's kappa for two annotators
» Fleiss' kappa for more than two annotators
– Sufficiently high agreement? (> 0.6?)
» Let's go annotating in the wild!
Pustejovsky and Stubbs, Natural Language Annotation, 2012.
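As a sketch, Fleiss' kappa for a pilot round can be computed from per-item category counts (a minimal implementation, assuming every item received the same number of annotations):

```python
def fleiss_kappa(table):
    """Fleiss' kappa over a list of per-item category counts.

    table[i][k] = number of annotators who assigned category k to item i;
    every item must have the same total number of annotators.
    """
    n_items = len(table)
    n_raters = sum(table[0])
    n_cats = len(table[0])
    # overall proportion of assignments falling into each category
    p = [sum(row[k] for row in table) / (n_items * n_raters)
         for k in range(n_cats)]
    # per-item observed agreement
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in table]
    P_bar = sum(P_i) / n_items          # mean observed agreement
    P_e = sum(pk * pk for pk in p)      # chance agreement
    return (P_bar - P_e) / (1 - P_e)
```

With three annotators per sample, a value above ~0.6 would pass the rough threshold mentioned above.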
27. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Finding a crowdsourcing platform
– Moderator
» Usually an expert in data creation and management
» Comprehends the task and gives feedback from the workers' point of view
» Helps communication between researchers and workers
» Instructs, and sometimes hurries, workers to meet the timeline
» Manages financial or legal issues
» Lets researchers concentrate on the task itself
– Without a moderator?
» The researchers are the moderator! (unless there are some automated functions in the platform)
– With a moderator?
» The closest partner of the researchers
28. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Finding a crowdsourcing platform
– Existence and experience of the moderator
» Experience with similar dataset construction
» Comprehension of the task & proper feedback
» Sufficient worker pool
» Trust between the moderator and workers
– Reasonable cost estimation
» Appropriateness of the price per tagging or reviewing
» Appropriateness of worker compensation
» Fit with the budget
29. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Finding a crowdsourcing platform
– Usefulness of the platform UI
» Progress status (In progress, Submitted, Waiting for review... etc.)
» Statistics: the number of workers and reviewers, average work/review duration...
» Demographics, worker history by individual & in total...
30. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Pilot tagging (by workers)
– Goals of the worker pilot
» Guideline update from the workers' view (especially on cautions & exceptions)
» Worker selection
– Procedure
» Advertisement or recruitment
» Worker tagging
» Researchers' (or moderators') review & rejection
» Workers' revision & resubmission
31. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Details of the worker selection process
– Human checking
» Is the worker's ethical standard not too far from the guideline?
» Is feedback effective for the rejected samples?
– Automatic checking
» Enough taggings done?
» Too frequent cases of skipping the annotation?
UI screenshots provided by Deep Natural AI.
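The automatic checks above can be sketched as a simple filter over per-worker statistics (field names and thresholds are illustrative assumptions, not values from the slides):

```python
def select_workers(stats, min_tagged=100, max_skip_rate=0.2):
    """Keep workers who tagged enough items and did not skip too often.

    stats: list of dicts like {"tagged": int, "skipped": int}.
    Thresholds are illustrative; real projects tune them per task.
    """
    kept = []
    for w in stats:
        total = w["tagged"] + w["skipped"]
        skip_rate = w["skipped"] / total if total else 0.0
        if w["tagged"] >= min_tagged and skip_rate <= max_skip_rate:
            kept.append(w)
    return kept
```

Human checking (ethical standard, response to feedback) stays manual; only the volume/skip heuristics lend themselves to automation like this.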
32. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Crowdsourcing: a simplified scheme is required for crowd annotation!
– Multi-class, multi-attribute tagging
» 3 classes for bias
» 3 classes for toxicity
– Given a comment (without context), the annotator tags each attribute
– A detailed guideline (with examples, cautions, and exceptions) is provided separately

1. What kind of bias does the comment contain?
– Gender bias, Other biases, or None
2. Which is the adequate category for the comment in terms of toxicity?
– Hate, Offensive, or None
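The two-question scheme above can be captured as a small record type that rejects labels outside the two fixed label sets (a sketch; the identifier-style label names are assumptions):

```python
from dataclasses import dataclass

BIAS_LABELS = ("gender", "others", "none")        # assumed names for the 3 bias classes
TOXICITY_LABELS = ("hate", "offensive", "none")   # the 3 toxicity classes

@dataclass
class Annotation:
    """One worker's tags for one comment: one bias label, one toxicity label."""
    comment: str
    bias: str
    toxicity: str

    def __post_init__(self):
        # Validate at construction time so malformed submissions fail early
        if self.bias not in BIAS_LABELS:
            raise ValueError(f"unknown bias label: {self.bias}")
        if self.toxicity not in TOXICITY_LABELS:
            raise ValueError(f"unknown toxicity label: {self.toxicity}")
```

Keeping the crowd-facing schema this small is the point: the nuance lives in the separate detailed guideline, not in the label set.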
33. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Main annotation
– Based on the final version of the guideline
» 3–5 annotators (per sample) for usual classification tasks
– Tagging done by selected workers
» Worker selection and education
» A short quiz (if workers are not pre-selected)
– Annotation toolkit
» Assigns samples randomly to workers, with multiple annotators per sample
» Interface developed or provided by the platform (usually takes budget)
» Open-source interfaces (e.g., Label Studio)
– Data check for a further guarantee of quality
» What if there are sufficiently many annotators per sample?
» And if not...?
34. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Data selection after the main annotation (8,000 samples)
– The data reviewing strategy may differ by subtask
– Researchers decide the final label after adjudication
– Common for bias and toxicity
» Cases where all three annotators differ
– Only for toxicity
» Since the problem regards a continuum of degree, cases with only hate (o) and none (x) need to be investigated again
– Failure to decide (no majority vote) → discarded
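The majority-vote step can be sketched as follows (a minimal sketch; samples without a strict majority go to adjudication or are discarded, as described above):

```python
from collections import Counter

def adjudicate(labels):
    """Strict majority vote over one sample's annotations.

    Returns the winning label, or None when no label is chosen by
    more than half of the annotators (e.g. a 1-1-1 split among three).
    """
    top_label, top_count = Counter(labels).most_common(1)[0]
    if top_count > len(labels) / 2:
        return top_label
    return None
```

With three annotators, a 2-1 split resolves automatically; a 1-1-1 split returns None and is routed to researcher adjudication.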
35. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Step 4: How is the annotation process conducted and evaluated? Worker pilot, crowdsourcing, and agreement
• Final decision
– Test: 974
» Data tagged while constructing the guideline (mostly adjusted to the intention of the guideline)
– Validation: 471
» Data which went through tag/review/reject/accept in the pilot phase, done with a large number of annotators (roughly aligned with the guideline)
– Train: 7,896
» Data crowdsourced with the selected workers, not fully reviewed but adjudicated only for some special cases
• Agreement
– 0.492 for bias detection, 0.496 for toxicity measurement
Moon et al., BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection, 2020.
36. Low-resource perspective
• Creating a hate speech corpus from scratch
▪ Beyond creation: model training and deployment
• Model training
– Traditionally
» High performance → relatively easy?
» Low performance → relatively challenging?
– But in PLM-based training these days...
» Pretraining corpora
» Model size
» Model architecture
– Model deployment
» Performance & size
» User feedback
Yang, Transformer-based Korean Pretrained Language Models: A Survey on Three Years of Progress, 2021.
37. Challenges
• Challenges of hate speech corpus construction
▪ Context-dependency
• News comments – articles
• Tweets – threads
• Web community comments – posts
▪ Multi-modal or noisy inputs
• Image and audio
– Kiela et al. (2020): Hateful Memes Challenge
• Perturbed texts
– Cho and Kim (2021): Leetspeak, Yaminjeongeum
Kiela et al., The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes, 2020.
Cho and Kim, Google-trickers, Yaminjeongeum, and Leetspeak: An Empirical Taxonomy for Intentionally Noisy User-Generated Text, 2021.
38. Challenges
• Challenges of hate speech corpus construction
▪ Categorical or binary output has limitations
• Limitation of categorizing the degree of intensity
– Hate/offensive/none categorization is sub-optimal
– Poletto et al. (2019): scale-based annotation with an Unbalanced Rating Scale
» Used to determine the label (or used as a target score?)
Poletto et al., Annotating Hate Speech: Three Schemes at Comparison, 2019.
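Turning averaged scale ratings into a single categorical label might look like the sketch below (the 0–4 scale follows the intensity range shown earlier; the thresholds are illustrative assumptions, not values from Poletto et al.):

```python
def scale_to_label(ratings, thresholds=(0.5, 2.0)):
    """Map averaged per-annotator ratings (e.g. on a 0-4 scale) to a coarse label.

    Thresholds are illustrative; an unbalanced scale deliberately places
    the category boundaries asymmetrically along the range.
    """
    score = sum(ratings) / len(ratings)
    if score < thresholds[0]:
        return "none"
    if score < thresholds[1]:
        return "offensive"
    return "hate"
```

Alternatively, the averaged score itself can be kept as a regression target instead of being collapsed into a label, as the slide's parenthetical suggests.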
39. Challenges
• Challenges of hate speech corpus construction
▪ Annotation requires multiple labels
• The aspect of discrimination may differ by attribute
– Gender, race, nationality, ageism ...
• Tagging 'all the target attributes' that appear?
– Kang et al. (2022)
» Detailed guideline with terms and concepts defined for each attribute

     W&F  Male  SexMin  Race&Nat  Age  Region  Relig  Other  Malic  None
S1    1    0     0        0       1    0       0      0      0      0
S2    0    0     0        0       0    0       0      0      1      0
S3    0    0     0        1       0    0       1      0      0      0
S4    0    0     0        0       0    0       0      0      0      1

(W&F = Women & family, SexMin = Sexual minorities, Race&Nat = Race & nationality, Age = Ageism, Region = Regionalism, Relig = Religion, Malic = Malicious)
Kang et al., Korean Online Hate Speech Dataset for Multilabel Classification - How Can Social Science Improve Dataset on Hate Speech?, 2022.
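Multilabel rows like the ones above are naturally represented as multi-hot vectors (the identifier-style attribute names below are chosen here for illustration):

```python
# Target attributes in Kang et al. (2022); identifier names chosen for this sketch
ATTRIBUTES = ["women_family", "male", "sexual_minorities", "race_nationality",
              "ageism", "regionalism", "religion", "other", "malicious", "none"]

def encode(tags):
    """Multi-hot encoding: 1 for each attribute tagged on the sample."""
    return [1 if a in tags else 0 for a in ATTRIBUTES]
```

For example, a sample tagged with both "women & family" and "ageism" (like S1 above) encodes to a vector with two 1s, which a multilabel classifier can be trained against with a per-attribute binary loss.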
40. Challenges
• Challenges of hate speech corpus construction
▪ Privacy and license issues
• Privacy and licenses can be violated by text crawling
• A hate speech corpus may contain personal information on (public) figures
• Text could have been brought from elsewhere (copy & paste)
▪ How about creating hate (and non-hate) speech from scratch?
• Yang et al. (2022): recruit workers and enable 'anonymous' text generation!
Yang et al., APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets, 2022.
41. Challenges
• Ambiguity is inevitable
▪ Text may admit various ways of interpretation
• Text may involve omission or replacement to trick monitoring
• The intention is apparent only when considering the context
• Temporal diachronicity of hate speech
▪ Non-hate speech from the past can be interpreted as hate speech these days
▪ Diachronicity may limit the utility of prediction systems
• e.g., [a name of a celebrity who committed a crime] before 20xx / after 20xx
• Boundary between hate speech and freedom of speech
▪ A grey area that cannot be resolved
• Some readers are offended by false positives
• Some users are offended by false negatives
42. Conclusion
• Hate speech is prevalent in real and cyber spaces
▪ Discussions on hate speech take diverse viewpoints, from academia to society and industry – and these are reflected in dataset construction
• No corpus is built perfectly from the beginning
▪ ... and hate speech is one of the most difficult kinds of corpora to create
• Considerations in low-resource hate speech corpus construction
▪ Why? How? How much? How well?
• Still more challenges left
▪ Context, input noise, output format, indecisiveness ...
• Takeaways
▪ There is a discrepancy between the theoretical and practical definitions of hate speech, and their aims may differ
▪ There is no hate speech detection guideline that satisfies ALL, so let's find the boundary that satisfies the most and improve it
43. Reference
• Waseem and Hovy, Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter, 2016.
• Davidson et al., Automated Hate Speech Detection and the Problem of Offensive Language, 2017.
• Sanguinetti et al., An Italian Twitter Corpus of Hate Speech against Immigrants, 2018.
• Assimakopoulos et al., Annotating for Hate Speech: The MaNeCo Corpus and Some Input from Critical Discourse Analysis, 2020.
• Moon et al., BEEP! Korean Corpus of Online News Comments for Toxic Speech Detection, 2020.
• Hong et al., Study on the State and Regulation of Hate Speech, 2016.
• Cho and Moon, How Does the Hate Speech Corpus Concern Sociolinguistic Discussions? A Case Study on Korean Online News Comments, 2021.
• Pustejovsky and Stubbs, Natural Language Annotation, 2012.
• Yang, Transformer-based Korean Pretrained Language Models: A Survey on Three Years of Progress, 2021.
• Kiela et al., The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes, 2020.
• Cho and Kim, Google-trickers, Yaminjeongeum, and Leetspeak: An Empirical Taxonomy for Intentionally Noisy User-Generated Text, 2021.
• Poletto et al., Annotating Hate Speech: Three Schemes at Comparison, 2019.
• Kang et al., Korean Online Hate Speech Dataset for Multilabel Classification – How Can Social Science Improve Dataset on Hate Speech?, 2022.
• Yang et al., APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets, 2022.