Slides from our paper presentation at MKWI 2018 in Lüneburg. The study discusses the potential value of automatically analyzing German-language texts to detect hate speech. The paper discusses the results with respect to their potential for media organizations, as well as considerations about moderation techniques and algorithmic transparency. More information at www.hatemining.de. The paper can be accessed via http://mkwi2018.leuphana.de/programm/sessions/
MKWI 2018 - Discussing the Value of Hate Speech Detection
1. Köffer, Riehle, Höhenberger, Becker (2018) Discussing the Value of Automatic Hate Speech Detection in Online Debates
Multikonferenz Wirtschaftsinformatik, Lüneburg
1
08.03.2018
DISCUSSING THE VALUE OF
AUTOMATIC HATE SPEECH DETECTION
IN ONLINE DEBATES
SEBASTIAN KÖFFER, DENNIS RIEHLE,
STEFFEN HÖHENBERGER, JÖRG BECKER
#HateMining
RESEARCH QUESTION
Can we use analytics to reduce hate speech on the web?
1) Yes, but it's complicated.
2) If so, should we do this?
EASY STUFF: GETTING ABUSIVE COMMENTS
Platform         Articles   Comments   % Hate
Compact               328     11,764       30
Focus               3,959     75,857       18
Freie Welt          1,944     13,628       41
Junge Freiheit        333      2,745       35
Rheinische Post       991      3,678       28
Welt                1,921    182,625       23
Zeit                5,812     25,792       12
…
Total              21,740    376,143       27
Source: www.hatemining.de
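As a quick consistency check on the table, the following Python sketch totals the seven listed platforms and computes a comment-weighted hate share. It assumes that "% Hate" is the percentage of a platform's comments labeled as hate (the slide does not define the column), and it leaves out the rows elided as "…".

```python
# Rows copied from the slide; the elided "…" rows are NOT included.
# Assumption: percent_hate = share of a platform's comments labeled as hate.
PLATFORMS = [
    # (platform, articles, comments, percent_hate)
    ("Compact", 328, 11_764, 30),
    ("Focus", 3_959, 75_857, 18),
    ("Freie Welt", 1_944, 13_628, 41),
    ("Junge Freiheit", 333, 2_745, 35),
    ("Rheinische Post", 991, 3_678, 28),
    ("Welt", 1_921, 182_625, 23),
    ("Zeit", 5_812, 25_792, 12),
]


def summarize(rows):
    """Return (total articles, total comments, comment-weighted hate share)."""
    articles = sum(r[1] for r in rows)
    comments = sum(r[2] for r in rows)
    # Weight each platform's percentage by its comment volume, so large
    # platforms such as Welt dominate the aggregate as they do in the data.
    hate_comments = sum(r[2] * r[3] / 100 for r in rows)
    return articles, comments, hate_comments / comments


articles, comments, share = summarize(PLATFORMS)
print(f"{articles:,} articles, {comments:,} comments, {share:.1%} hate")
```

For these seven platforms the sketch prints 15,288 articles, 316,089 comments and about 22.1% hate. The 27% in the last row of the table presumably also reflects the elided platforms, so the two figures are not expected to match.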
DIFFICULT STUFF: FEATURE EXTRACTION AND SUPERVISED LEARNING
Accuracy / F-score by feature group
Feature group      Our paper     Nobata et al. (2016)
Bag of Words       0.68 / 0.51   0.75 / 0.54
Character 2-grams  0.62 / 0.64   -
Character 3-grams  0.66 / 0.65   0.90 / 0.77
Linguistic         0.57 / 0.53   0.64 / 0.51
Word2Vec           0.67 / 0.67   0.84 / 0.67
Doc2Vec            0.65 / 0.63   0.85 / 0.67
Best approach      0.71 / 0.70   0.90 / 0.78
Source: Nobata et al. (2016) Abusive Language Detection in Online User Content
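The slide does not say which classifier sits behind the feature groups. As a hedged, dependency-free illustration of one row (character 3-grams), the sketch below pairs 3-gram counts with a multinomial Naive Bayes classifier, a stand-in chosen for brevity and not necessarily the model used in the paper, and computes accuracy plus the F-score of the hate class (assuming F-score means F1 for the positive class, which the slide does not specify). The class name, helper names and toy comments are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count the character n-grams of a lower-cased, space-padded comment."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


class CharNgramNaiveBayes:
    """Multinomial Naive Bayes over character n-gram counts (Laplace smoothing)."""

    def fit(self, docs, labels, n=3):
        self.n = n
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(char_ngrams(doc, n))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        features = char_ngrams(doc, self.n)

        def log_posterior(c):
            # Unseen n-grams get count 0 + 1 thanks to Laplace smoothing.
            denom = self.totals[c] + self.vocab_size
            return self.log_prior[c] + sum(
                k * math.log((self.counts[c][g] + 1) / denom) for g, k in features.items()
            )

        return max(self.classes, key=log_posterior)


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy over all comments and F1 for the positive (hate) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical toy data: 1 = hate, 0 = non-hate.
train = ["dummes pack", "pack raus", "danke schön", "danke für den artikel"]
labels = [1, 1, 0, 0]
model = CharNgramNaiveBayes().fit(train, labels)
test_docs, test_labels = ["dummes pack raus", "danke schön"], [1, 0]
predictions = [model.predict(d) for d in test_docs]
print(predictions, accuracy_and_f1(test_labels, predictions))
```

Swapping `n=3` for `n=2` gives the character 2-gram variant; a real evaluation would of course use held-out data rather than comments that overlap the training set.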
REALLY HARD STUFF:
HOW TO GET LABELED COMMENT DATA?
HOW TO LABEL THE COMMENTS?
2016 study
Binary classification: hate / non-hate
Council of Europe definition of hate speech used as guidance
Multiple ratings per comment
Results
27% of comments classified as hate
Many discussions
2018 (current work)
Multi-label classification of problematic comments
No definitions provided
Multiple ratings per comment
Results
68% of comments problematic
17% hate speech
Richer information about the data
Still a lot of disagreement
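"Still a lot of disagreement" can be quantified with an inter-rater agreement statistic. The slide does not name one; Fleiss' kappa is a common choice when every comment receives the same number of ratings from several raters, so the sketch below assumes it. Each row of the input counts how many raters put one comment into each category; the ratings shown are hypothetical.

```python
def fleiss_kappa(table):
    """Fleiss' kappa. table[i][j] = number of raters who put comment i into
    category j; every comment must have the same number of ratings (>= 2)."""
    n_items = len(table)
    n_raters = sum(table[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in table):
        raise ValueError("need the same number (>= 2) of ratings per comment")
    # Observed agreement: share of agreeing rater pairs per comment, averaged.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in table
    ) / n_items
    # Chance agreement from the overall share of ratings in each category.
    n_categories = len(table[0])
    shares = [sum(row[j] for row in table) / (n_items * n_raters) for j in range(n_categories)]
    p_e = sum(s * s for s in shares)
    if p_e == 1:
        raise ValueError("kappa is undefined when all ratings fall into one category")
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical ratings: 10 raters per comment, columns = (hate, non-hate).
ratings = [[7, 3], [2, 8], [5, 5]]
print(round(fleiss_kappa(ratings), 3))
```

The example prints 0.077: a value near 0 means the raters agree only slightly more often than chance would predict, while 1 would mean perfect agreement.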
EXAMPLES OF THE ENHANCED COMMENT LABELING PROCEDURE
"smartphones given by the government, paid for with tax money! who pays for the contracts! apartment paid for by us! food paid for by us! courses paid for by us! transport paid for by us! ..by the way, morocco emptied its prisons at our expense!! ...and THEY COMMIT CRIMES! TERROR! WANT TO DESTROY US! LET THEM finally START AT THE CHANCELLERY!" (translated from German)
10 / 10
Problematic
7 hate
3 language
2 insult
3 threat
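The vote counts above can be produced by tallying each rater's label set. The sketch below assumes ten raters (reading "10 / 10" as all ten rating the comment problematic), builds one hypothetical assignment of labels to raters that matches the counts on the slide, and applies a simple majority rule. The threshold, the label strings and the `aggregate` helper are illustrative assumptions, not the project's procedure.

```python
from collections import Counter

LABELS = ("problematic", "hate", "language", "insult", "threat")


def aggregate(ratings, threshold=0.5):
    """Tally label votes for one comment; ratings holds one label set per rater.
    Labels chosen by more than `threshold` of the raters are accepted."""
    votes = Counter(label for rater_labels in ratings for label in rater_labels)
    counts = {label: votes[label] for label in LABELS}
    accepted = [label for label in LABELS if votes[label] / len(ratings) > threshold]
    return counts, accepted


# One hypothetical assignment of labels to 10 raters matching the slide's counts.
ratings = [{"problematic"} for _ in range(10)]
for i in range(7):
    ratings[i].add("hate")
for i in (0, 4, 8):
    ratings[i].add("language")
for i in (1, 9):
    ratings[i].add("insult")
for i in (2, 5, 7):
    ratings[i].add("threat")

counts, accepted = aggregate(ratings)
print(counts)    # {'problematic': 10, 'hate': 7, 'language': 3, 'insult': 2, 'threat': 3}
print(accepted)  # ['problematic', 'hate']
```

Storing each rater's labels as a set means a rater can vote for a label at most once, which is what makes the per-label counts comparable to the number of raters.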
RESEARCH OBJECTIVE
Can we use analytics to reduce hate speech on the web?
1) Yes, but it's complicated.
2) If so, should we do this?
OBJECTIVE: BRINGING TOGETHER POPULAR ARGUMENTS IN THE DEBATE ON HOW TO COMBAT HATE SPEECH
If technology is part of the problem, then it should also be part of the solution.
Society must be tolerant of a broad spectrum of opinions.
Automatic deletion/filtering of comments is excessive censorship.
Manual deletion/filtering is no longer feasible.
Traditional media need software tools to curate the wisdom of the debate.
We need enhanced media literacy to better evaluate debate content.
WHAT PERCENTAGE OF USERS WRITES 50 PERCENT OF ALL COMMENTS?
Source: www.hatemining.de
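The chart answering this question is not part of this text version. The calculation behind such a figure is simple: sort users by activity and count how many of the most active users it takes to reach half of all comments. The sketch below uses hypothetical per-user comment counts, not the project's data, and `min_user_share` is an illustrative name.

```python
def min_user_share(comments_per_user, target=0.5):
    """Smallest fraction of users who together wrote at least `target` of all
    comments, taking the most active users first."""
    if not comments_per_user:
        raise ValueError("no users")
    counts = sorted(comments_per_user, reverse=True)
    needed = target * sum(counts)
    running = 0
    for n_users, count in enumerate(counts, start=1):
        running += count
        if running >= needed:
            return n_users / len(counts)
    return 1.0  # only reached if target > 1


# Hypothetical comment counts for ten users (not the project's data).
per_user = [3, 40, 2, 8, 15, 6, 10, 4, 7, 5]
print(f"{min_user_share(per_user):.0%} of users write half of all comments")
```

Here the two most active users (40 and 15 of 100 comments) already pass the halfway mark, so the sketch prints "20% of users write half of all comments".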
THREE BUILDING BLOCKS OF FUTURE RESEARCH ON HATE SPEECH DETECTION
Sound methodology:
Large and representative evaluation dataset
Multi-label classification methods with highly accurate prediction estimates
Algorithmic transparency:
Visualize the decision criteria of algorithms, e.g. subcategories, toxic words, etc.
Open science paradigm
Human involvement:
Semi-automated approaches necessary for higher acceptance and adaptability
Let professional journalists steer the debate
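The human-involvement and transparency building blocks can be combined in a semi-automated moderation step. The sketch below is a hypothetical design, not the project's system: a linear word-weight model scores a comment, nothing is ever deleted automatically, comments above a threshold are routed to a moderator, and the words that raised the score are returned as evidence. The weights, bias, thresholds, route names and the `triage` function are all made up for illustration.

```python
import math
from dataclasses import dataclass, field

# Hypothetical word weights of a linear hate-speech model (positive = more hateful).
WEIGHTS = {"pack": 2.0, "raus": 1.5, "vernichten": 2.5, "danke": -1.5, "artikel": -0.5}
BIAS = -1.0


@dataclass
class Decision:
    probability: float  # model estimate that the comment is hate speech
    route: str          # "publish", "review" or "priority_review"; never "delete"
    evidence: list = field(default_factory=list)  # words that raised the score most


def triage(comment, review_at=0.3, priority_at=0.8, k=3):
    tokens = comment.lower().split()  # naive whitespace tokenization
    score = BIAS + sum(WEIGHTS.get(t, 0.0) for t in tokens)
    probability = 1 / (1 + math.exp(-score))
    # Transparency: collect each word's positive contribution for the moderator.
    contributions = {}
    for t in tokens:
        if WEIGHTS.get(t, 0.0) > 0:
            contributions[t] = contributions.get(t, 0.0) + WEIGHTS[t]
    evidence = sorted(contributions, key=contributions.get, reverse=True)[:k]
    # Human involvement: the model only prioritizes; a journalist decides.
    if probability >= priority_at:
        route = "priority_review"
    elif probability >= review_at:
        route = "review"
    else:
        route = "publish"
    return Decision(probability, route, evidence)


for text in ["raus mit dem pack", "das pack", "danke für den artikel"]:
    d = triage(text)
    print(f"{text!r}: {d.route} (p={d.probability:.2f}, evidence={d.evidence})")
```

Keeping the final decision with a moderator reflects the argument that automatic deletion amounts to censorship, while the score still helps prioritize a volume of comments that is too large to read manually.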
WE ARE ACTIVELY LOOKING FOR COOPERATION TO CONTINUE OUR RESEARCH PROJECT
Valuable contacts in the media industry
Funding for labeling datasets
Partners for joint research funding applications
Joint data generation and analysis
Contact
Research project "Cyberhate-Mining" at the European Research Center for Information Systems
Web: www.hatemining.de
Email: team@hatemining.de
#HateMining