[kakaobrain]beep

BEEP! Korean Corpus of Online News
Comments for Toxic Speech Detection
Jihyung Moon*, Won Ik Cho*, Junbum Lee
SocialNLP@ACL2020
KOCO

Introduction
• Topic: Korean Corpus of Online News Comments for Toxic Speech Detection
• Main contributions
• Release the first Korean corpus manually annotated for toxic attributes
• Set up the annotation guideline
• Provide benchmarks for both bias classification and hate speech detection
• Boost research on this topic

Motivation
• Korea recently suffered a series of tragic incidents involving two young celebrities, presumed to have been caused by toxic comments on entertainment news platforms.
• Toxic comments on online platforms have emerged as a critical social issue.
• They cause mental harm to the target (usually explicit)
• They offend many comment viewers

Motivation
• Current approaches
• Close the comment system in entertainment news
• (pros) Eliminates the problem entirely
• (cons) Not a fundamental solution
• Term-based toxic speech detection
• (pros) Easy to apply
• (cons) False predictions; not all hate speech is detectable using terms
• Learning algorithm-based toxic speech detection
• (pros) Better detection ability
• (cons) Requires labeled data
However, no such Korean corpus existed yet.

Motivation
• We need to build a corpus!
• that can help reduce toxic speech in entertainment news comments
• and that is manually labeled
• based on a guideline
• reflecting not only toxic speech features
• but also socio-linguistic features of Korean online texts

Data Construction
• Data Collection
• Data Annotation

Data Collection
• Comments from the most popular Korean entertainment news platform
• Most-viewed articles from Jan. 2018 to Feb. 2020
• 10,403,368 comments from 23,700 articles
• Sampling and filtering
• Stratified sampling of articles yields 1,580 articles
• For each of the 1,580 articles, take the top 20 comments ranked by Wilson score on downvotes
• Remove duplicates and keep comments with more than one token and fewer than 100 characters
→ Randomly sample 10K comments
[Figure: comment example]
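
The sampling and filtering steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the "Wilson score" is the lower bound of the Wilson interval on the downvote ratio at 95% confidence (z = 1.96), that tokens are whitespace-separated, and that each comment is a dict with hypothetical `text`, `up`, and `down` fields.

```python
import math


def wilson_lower_bound(positive, total, z=1.96):
    """Lower bound of the Wilson score interval for a proportion.

    Here `positive` is the number of downvotes and `total` the number of
    all votes. z = 1.96 (95% confidence) is an assumption.
    """
    if total == 0:
        return 0.0
    p = positive / total
    denom = 1 + z * z / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / denom


def top_downvoted(comments, k=20):
    """Return the k comments with the highest Wilson score on downvotes.

    `comments` is a list of dicts with hypothetical keys
    'text', 'up', and 'down'.
    """
    return sorted(
        comments,
        key=lambda c: wilson_lower_bound(c["down"], c["up"] + c["down"]),
        reverse=True,
    )[:k]


def keep_comment(text):
    """More than one whitespace-separated token, fewer than 100 characters."""
    return len(text.split()) > 1 and len(text) < 100


def dedupe_and_filter(texts):
    """Drop exact duplicates and comments failing the length rule."""
    seen = set()
    kept = []
    for text in texts:
        if text in seen or not keep_comment(text):
            continue
        seen.add(text)
        kept.append(text)
    return kept
```

Ranking by the interval's lower bound rather than the raw downvote ratio keeps a comment with 1 downvote out of 1 vote from outranking one with 90 out of 100.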

Data Annotation
• Overall Process
• Set up a guideline
• We examined 1K sample sentences and wrote a guideline
• Work with crowd-workers (at least 3 annotators per sentence)
• Pilot study with crowd-workers
• To train and find crowd-workers whose annotation sense is similar to ours
• Batch annotation by the selected crowd-workers
• Filter out cases where no majority result exists

Data Annotation: Guideline
• What features characterize the comments we consider malicious?
• Bias
• Hate
• Insult

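Data Annotation: Guideline
• Bias: "People of such-and-such a kind" "will be like this / should be like this."
• Not a judgment about the individual
• Equating some (visible) trait of an individual with a trait of the group they belong to, thereby projecting beliefs about the group onto the individual
• e.g. (to a female celebrity) 가정을 지키고 싶으면 요리배우고 살빼야할듯. ("If you want to keep your family together, you should learn to cook and lose weight.")
• Behind this commenter's thinking are the assumptions that 1) a woman's role in the household is to provide tasty meals for her partner and 2) a woman must be beautiful
• Hate
• Insult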

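Data Annotation: Guideline
• Bias
• Hate ≈ hostility
• When accompanied by bias, hatred toward the group the individual belongs to is projected onto the individual
• e.g. 한남은 믿거해야쥬 ("한남" is a derogatory term for Korean men; roughly, "such men should be avoided on sight")
• e.g. 동성애가 무슨 자랑도 아니고 역겹네 XXXX들 ("Homosexuality is nothing to brag about; disgusting, those XXXX")
• Far from reasonable criticism
• e.g. OOO은 빼라..목소리 자체가 짜증이다 ("Drop OOO.. the voice itself is annoying")
• Can also be expressed through profanity terms, but the mere appearance of such a term does not by itself convey hate
• e.g. ㅅㅂㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋㅋ (an abbreviated expletive followed by laughter)
• Insult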

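Data Annotation: Guideline
• Bias
• Hate
• Insult: a remark that can make the target feel insulted or humiliated
• Expressing an abstract judgment or contemptuous feeling likely to lower the target's social standing 1)
• e.g. 저뮤탈리스크같이 생긴애는 왜자꾸나옴?ㅎ ("Why does that one who looks like a Mutalisk keep showing up? heh")
• May not be accompanied by hate
• e.g. 스타킹 벗겨서 발가락빨구시퍼용. ("I want to take off her stockings and suck her toes.")
• Also includes insult through statement of fact
• e.g. ㅋ 접대와 조작의 아이콘 아이즈원 엑스원 ("Heh, IZ*ONE and X1, the icons of backroom entertaining and manipulation")
1) http://www.law.go.kr/%ED%8C%90%EB%A1%80/(2015%EB%8F%842229)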

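Data Annotation: Guideline
• Bias
• Closer to a matter of category
• Is it bias about gender?
• Is it bias about region?
• Is it bias about race?
• …
• Hate and insult
• Closer to a matter of degree
• Insulting expressions other than sexual harassment or sexual objectification are closely tied to hate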

Data Annotation: Guideline
• Bias
• Gender-related bias
• Other biases
• None
• Hate and insult
• Severe hate or insult
• Not hateful but offensive or sarcastic
• None
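
One way to picture the two label axes above is as a small data structure. This is only a sketch: the class names and the string values (`gender`, `others`, `none`, `hate`, `offensive`) are illustrative assumptions, not something the slides specify.

```python
from dataclasses import dataclass
from enum import Enum


class Bias(Enum):
    """Bias axis: a matter of category (values are assumed names)."""
    GENDER = "gender"   # gender-related bias
    OTHERS = "others"   # other biases
    NONE = "none"


class Hate(Enum):
    """Hate and insult axis: a matter of degree (values are assumed names)."""
    HATE = "hate"            # severe hate or insult
    OFFENSIVE = "offensive"  # not hateful but offensive or sarcastic
    NONE = "none"


@dataclass(frozen=True)
class LabeledComment:
    """Hypothetical container for one annotated comment."""
    text: str
    bias: Bias
    hate: Hate

    @property
    def contains_gender_bias(self):
        # Binary view of the bias axis: gender vs. (others or none).
        return self.bias is Bias.GENDER
```

Each comment carries one label per axis, so a comment can be biased without being hateful and vice versa.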

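Data Annotation: Guideline
• Guideline details
• Exception cases
• The comment is incomplete or too unclear to understand
• e.g. 이거 보는 주 시청자가 ("The main viewers who watch this")
• e.g. 귄있는 스타일 ("귄있는 style")
• The comment cannot be judged from its content alone
• e.g. 역시.. 왠지 그럴 것 같았음 ("As expected.. I somehow thought so")
• The judgment differs depending on whether this comment appears under a dating-news article or a #MeToo article
• The comment is written in a language other than Korean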

Data Annotation: Guideline
• Bias: Gender-related bias
• Bias: Other biases
• Hate and insult: Severe hate or insult
• Hate and insult: Not hateful but offensive or sarcastic
• Hate and insult: None
• Example questions

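Data Annotation: Crowdsource
• Pilot study
• Goal: select workers whose annotation sense is similar to ours, and train them
• No guideline can perfectly cover every case
• We need workers who can assign labels similar to ours for cases not described in the guideline
• The platform let us reject submissions while giving feedback
• Workers who gave good labels on difficult items were given extra weight
• Workers who incorporated feedback quickly were given extra weight
• Workers whose sensitivity to gender issues and to hate differed from ours were dropped in this process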

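Data Annotation: Crowdsource & Filtering
• Batch annotation
• Assigned only to the workers selected in the pilot study
• Since two people cannot review all 8,000 sentences, we trusted the workers' output and accepted it
• Filtering
• Cases where all three annotators disagree
• Cases where the annotators' opinions differ widely (hate present vs. no hate)
• 659 sentences were removed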
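
The filtering rules can be illustrated with a small aggregation function. This is a sketch under stated assumptions: the label strings are hypothetical, and "opinions differ widely" is read here as "at least one annotator chose hate while another chose none", which is one possible interpretation rather than the authors' documented rule.

```python
from collections import Counter


def aggregate_hate(votes):
    """Return the majority hate label, or None if the item is filtered out.

    `votes` is a sequence of label strings, one per annotator, drawn from
    the assumed set {"hate", "offensive", "none"}.
    """
    if not votes:
        return None
    counts = Counter(votes)
    label, top = counts.most_common(1)[0]
    # Rule 1: no strict majority (e.g. all three annotators disagree).
    if top <= len(votes) / 2:
        return None
    # Rule 2 (assumed reading): hate present vs. no hate in the same item.
    if "hate" in counts and "none" in counts:
        return None
    return label
```

Under this reading, `["hate", "hate", "offensive"]` resolves to `hate`, while `["hate", "hate", "none"]` is removed even though a majority exists.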

Data Statistics
• Data composition
• Train: 7,896
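• Result of the batch annotation
• Validation: 471
• Result of the pilot study, where we inspected the tagging results directly and went through a reject/approve process
• Test: 974
• The data that best reflects the intent of the guideline authors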

Data Statistics
• Inter-annotator agreement (Krippendorff’s alpha)
• Gender-related Bias (Gender / Others ∪ None) : 0.767
• Bias (Gender / Others / None) : 0.492
• Hate (Hate / Offensive / None) : 0.496
• Label distribution
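
For readers unfamiliar with the agreement measure above, here is a minimal sketch of Krippendorff's alpha for nominal labels, built from the coincidence matrix. The slides do not say which distance metric was used, so the nominal metric is an assumption, and the function name and input layout are illustrative.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha with the nominal distance metric.

    `units` is a list of rating lists, one list per item; use None for a
    missing rating. Items with fewer than two ratings are ignored.
    """
    coincidence = Counter()
    for ratings in units:
        ratings = [r for r in ratings if r is not None]
        m = len(ratings)
        if m < 2:
            continue
        # Each ordered pair of ratings from different annotators
        # contributes 1 / (m - 1) to the coincidence matrix.
        for a, b in permutations(ratings, 2):
            coincidence[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _), weight in coincidence.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more ratings")

    observed = sum(w for (a, b), w in coincidence.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")
    return 1 - observed / expected
```

The binary gender-bias score would correspond to mapping "others" and "none" to a single label before calling the function, while the three-way scores use the labels as they are.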

Benchmark Results
• CharCNN vs. BiLSTM vs. BERT
• Models trained on our corpus outperform the term matching method
• BERT shows the best performance on all tasks
• The pretrained model's linguistic knowledge and semantic information help
• Bias is useful information for detecting hate speech
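
The term matching baseline referred to above can be pictured as a simple lexicon lookup. This is a sketch only: the binary output, the substring matching, and the placeholder lexicon are assumptions, not the configuration used in the benchmark.

```python
def term_match_predict(comment, lexicon):
    """Flag a comment as "hate" if it contains any term from the lexicon.

    `lexicon` is any iterable of strings; matching is plain substring
    search, which is an assumption made for illustration.
    """
    return "hate" if any(term in comment for term in lexicon) else "none"


# Placeholder lexicon; a real one would list actual profanity terms.
PLACEHOLDER_LEXICON = {"badterm"}

# A comment containing a listed term is flagged regardless of intent.
flagged = term_match_predict("badterm lol", PLACEHOLDER_LEXICON)
# A hostile comment that avoids every listed term is missed.
missed = term_match_predict("nobody wants to see you here", PLACEHOLDER_LEXICON)
```

Both failure modes match the weaknesses noted for term-based detection: a listed term does not always signal hate, and hate is not always expressed with listed terms.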

Release and Competitions
• GitHub repository (dataset and guideline)
• https://github.com/kocohub/korean-hate-speech
• Kaggle competition
• https://www.kaggle.com/c/korean-gender-bias-detection
• https://www.kaggle.com/c/korean-bias-detection/
• https://www.kaggle.com/c/korean-hate-speech-detection/
• Used for an official government AI hackathon (6/17 to 6/30)
• http://www.aichallenge.or.kr/task/detail.do?taskId=T000018

Data Import Package
• For more convenient data loading, we released the koco package
• https://github.com/inmoonlight/koco

Takeaways
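• It is still hard to give a clear-cut answer to "What is hate?"
• It seems to lie somewhere on the boundary between the writer's freedom of expression (which the writer claims is reasonable criticism) and the reader's discomfort.
• The criteria for hate should depend on how the data will be used.
• Rather than weighing the writer's freedom of expression, we set our criteria from the standpoint of the people who read the comment and of its target.
• A company, however, would need to give much more weight to its users' freedom of expression.
• Comments are remarkably creative.
• The article content is not the only context. The social context at the time the comment was written and meta-information about the celebrity also matter.
• e.g. 살빠진 승리같애 ("Looks like a slimmed-down Seungri")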

Acknowledgment
Motivation
Pilot study
Guideline
Writing
Modeling
GitHub
Management
Kaggle
Funding
Proofreading
