MKWI 2018 - Discussing the Value of Hate Speech Detection

Köffer, Riehle, Höhenberger, Becker (2018) Discussing the Value of Automatic Hate Speech Detection in Online Debates
Multikonferenz Wirtschaftsinformatik, Lüneburg
08.03.2018
DISCUSSING THE VALUE OF
AUTOMATIC HATE SPEECH DETECTION
IN ONLINE DEBATES
SEBASTIAN KÖFFER, DENNIS RIEHLE,
STEFFEN HÖHENBERGER, JÖRG BECKER
#HateMining

RESEARCH QUESTION
Can we use analytics to reduce hate speech on the web?
1) Yes, but it's complicated.
2) If so, should we do this?

EASY STUFF:
GETTING ABUSIVE COMMENTS
Platform          Articles   Comments   % Hate
Compact                328     11,764       30
Focus                3,959     75,857       18
Freie Welt           1,944     13,628       41
Junge Freiheit         333      2,745       35
Rheinische Post        991      3,678       28
Welt                 1,921    182,625       23
Zeit                 5,812     25,792       12
…
Sum                 21,740    376,143       27
Source: www.hatemining.de
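The slide does not say how the overall hate share in the "Sum" row was computed. A minimal sketch of the two obvious candidates, a comment-weighted share and a plain mean of the per-platform percentages, using only the seven platforms listed above. The rows elided by "…" are missing, so neither value is expected to reproduce the 27 in the "Sum" row, and the code assumes (this is not stated in the source) that each percentage applies to all comments of a platform.

```python
# (platform, comments, hate share in %) copied from the table above.
# The rows hidden behind "…" are not available and therefore not included.
ROWS = [
    ("Compact", 11_764, 30),
    ("Focus", 75_857, 18),
    ("Freie Welt", 13_628, 41),
    ("Junge Freiheit", 2_745, 35),
    ("Rheinische Post", 3_678, 28),
    ("Welt", 182_625, 23),
    ("Zeit", 25_792, 12),
]


def weighted_hate_share(rows):
    """Hate share in percent, weighting each platform by its comment count."""
    total_comments = sum(comments for _, comments, _ in rows)
    hate_comments = sum(comments * pct / 100 for _, comments, pct in rows)
    return 100 * hate_comments / total_comments


def unweighted_hate_share(rows):
    """Plain mean of the per-platform percentages."""
    return sum(pct for _, _, pct in rows) / len(rows)


if __name__ == "__main__":
    print(f"weighted:   {weighted_hate_share(ROWS):.1f} %")
    print(f"unweighted: {unweighted_hate_share(ROWS):.1f} %")
```

Large platforms with a comparatively low hate share (here Welt and Focus) pull the weighted value below the plain mean.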

DIFFICULT STUFF:
FEATURE EXTRACTION AND
SUPERVISED LEARNING
Feature group        Our paper               Nobata et al. (2016)
                     (Accuracy / F-score)    (Accuracy / F-score)
Bag of Words         0.68 / 0.51             0.75 / 0.54
Character 2-grams    0.62 / 0.64             -
Character 3-grams    0.66 / 0.65             0.90 / 0.77
Linguistics          0.57 / 0.53             0.64 / 0.51
Word2Vec             0.67 / 0.67             0.84 / 0.67
Doc2Vec              0.65 / 0.63             0.85 / 0.67
Best approach        0.71 / 0.70             0.90 / 0.78
Source: Nobata et al. (2016) Abusive Language Detection in Online User Content
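To make the table concrete, here is a minimal sketch of one feature group (character n-grams) and of the two reported metrics. It is an illustration, not the pipeline used in the paper: the function names are hypothetical, lowercasing is an assumption, and the F-score is computed as F1 for the positive (hate) class, which the slide does not specify.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams of a lowercased comment."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(char_ngrams("Terror!", 3))
    print(accuracy_and_f1([1, 0, 1, 0], [1, 1, 0, 0]))
```

The n-gram counts would serve as input features for a supervised classifier; accuracy and F-score are then computed on held-out labeled comments.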

REALLY HARD STUFF:
HOW TO GET LABELED COMMENT DATA?
HOW TO LABEL THE COMMENTS?
2016 study
• Binary classification: hate / non-hate
• Hate definition from the Council of Europe as orientation
• Multiple ratings per comment
Results
• 27% of comments classified as hate
• Many discussions

2018 current work
• Multi-label classification for problematic comments
• No definitions
• Multiple ratings per comment
Results
• 68% problematic comments
• 17% hate speech
• More information about data
• Still a lot of disagreement
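With multiple ratings per comment, disagreement can be quantified. One common measure is Fleiss' kappa; the slides do not say which measure, if any, the authors used, so the following is only a sketch under that assumption. The input layout (one row per comment, one count per category, same number of raters for every comment) is also an assumption.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table where counts[i][j] is the number of
    raters who assigned comment i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("every comment needs the same number (>= 2) of ratings")
    n_categories = len(counts[0])

    # Share of all ratings that fall into each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per comment, then averaged over comments.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items
    # Agreement expected by chance.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        # All ratings in a single category: kappa is undefined.
        # Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (p_bar - p_exp) / (1 - p_exp)


if __name__ == "__main__":
    # Hypothetical toy data: 3 raters, categories (hate, non-hate).
    print(fleiss_kappa([[3, 0], [0, 3], [2, 1]]))
```

Values near 1 indicate strong agreement; values near 0 or below indicate agreement no better than chance.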

EXAMPLES OF THE ENHANCED COMMENT
LABELING PROCEDURE
"smartphones given away by the government,
paid for with tax money! who pays for the
contracts! apartment paid for by us! food paid for
by us! courses paid for by us! transport paid for
by us! ..by the way, morocco has emptied its prisons
at our expense!!...and THEY
CRIMES! TERROR! WANT TO DESTROY
US! THEY SHOULD finally START IN THE
CHANCELLERY!"
Ratings: 10 / 10 problematic
7 hate
3 language
2 insult
3 threat
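The vote counts above can be turned into per-label shares and then into a final multi-label decision. The sketch below assumes that each of the 10 raters could tick several labels and that a label is kept when a minimum share of raters chose it; the 0.5 threshold and the function names are hypothetical and not taken from the project.

```python
# Votes for the example comment, as shown on the slide (10 raters).
EXAMPLE_VOTES = {
    "problematic": 10,
    "hate": 7,
    "language": 3,
    "insult": 2,
    "threat": 3,
}


def label_shares(votes, n_raters):
    """Share of raters who chose each label."""
    return {label: count / n_raters for label, count in votes.items()}


def assign_labels(votes, n_raters, threshold=0.5):
    """Keep every label chosen by at least `threshold` of the raters."""
    shares = label_shares(votes, n_raters)
    return sorted(label for label, share in shares.items() if share >= threshold)


if __name__ == "__main__":
    print(label_shares(EXAMPLE_VOTES, 10))
    print(assign_labels(EXAMPLE_VOTES, 10))
```

Lowering the threshold keeps minority labels such as "threat"; raising it keeps only labels with broad agreement.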

RESEARCH OBJECTIVE
Can we use analytics to reduce hate speech on the web?
1) Yes, but it's complicated.
2) If so, should we do this?

OBJECTIVE: BRINGING TOGETHER
POPULAR ARGUMENTS IN THE DEBATE
ON HOW TO COMBAT HATE SPEECH
• If technology is part of the problem, then it should also be part of the solution.
• Society must be tolerant towards a broad spectrum of opinions.
• Automatic deletion/filtering of comments is excessive censorship.
• Manual deletion/filtering is not feasible any more.
• Traditional media need software tools to curate the wisdom of the debate.
• We need enhanced media literacy to better evaluate debate content.

WHAT PERCENTAGE OF USERS WRITES 50
PERCENT OF ALL COMMENTS?
Source: www.hatemining.de
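The question can be answered from per-user comment counts: sort users by activity and count how many of the most active ones are needed to reach half of all comments. A minimal sketch follows; the toy counts are invented for illustration and are not data from the project.

```python
def share_of_users_for_half(comment_counts, target=0.5):
    """Smallest share of users (most active first) who together wrote
    at least `target` of all comments."""
    total = sum(comment_counts)
    if total == 0:
        raise ValueError("no comments")
    running = 0
    ranked = sorted(comment_counts, reverse=True)
    for rank, count in enumerate(ranked, start=1):
        running += count
        if running >= target * total:
            return rank / len(comment_counts)


if __name__ == "__main__":
    # Hypothetical comment counts for ten users.
    toy_counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
    print(share_of_users_for_half(toy_counts))
```

A small result means that a few very active users dominate the debate.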

THREE BUILDING BLOCKS OF FUTURE
RESEARCH ON HATE SPEECH DETECTION
Sound methodology
• Large and representative evaluation dataset
• Multi-label classification methods with high accuracy of prediction estimates
Algorithmic transparency
• Visualize the decision criteria of algorithms, e.g. subcategories, toxic words, etc.
• Open science paradigm
Human involvement
• Semi-automated approaches necessary for higher acceptance and adaptability
• Let professional journalists steer the debate
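One way to read "semi-automated" together with "algorithmic transparency" is a routing step that never deletes anything on its own: the classifier only flags comments for a human moderator and shows which words triggered the flag. The sketch below is purely illustrative; the function name, the 0.5 threshold and the idea of passing in a list of toxic words are assumptions, not part of the project.

```python
def route_comment(hate_score, toxic_words, review_threshold=0.5):
    """Decide what happens to a comment given a classifier score in [0, 1].

    The function never deletes automatically. It either publishes the
    comment or hands it to a human moderator with the reasons attached.
    """
    if not 0.0 <= hate_score <= 1.0:
        raise ValueError("hate_score must be between 0 and 1")
    if hate_score >= review_threshold:
        return {"action": "human_review", "reasons": sorted(toxic_words)}
    return {"action": "publish", "reasons": []}


if __name__ == "__main__":
    print(route_comment(0.82, ["terror", "destroy"]))
    print(route_comment(0.10, []))
```

The threshold controls how much work lands on moderators; the final decision stays with a person.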

WE ARE ACTIVELY LOOKING FOR
COOPERATION TO CONTINUE OUR
RESEARCH PROJECT
• Valuable contacts to the media industry
• Money to fund labeling of datasets
• Partners for joint research fund applications
• Joint data generation and analysis
Contact
Research project "Cyberhate-Mining" at the
European Research Center for Information Systems
Web: www.hatemining.de
E-Mail: team@hatemining.de
#HateMining
