Köffer, Riehle, Höhenberger, Becker (2018) Discussing the Value of Automatic Hate Speech Detection in Online Debates
Multikonferenz Wirtschaftsinformatik, Lüneburg
08.03.2018
DISCUSSING THE VALUE OF AUTOMATIC HATE SPEECH DETECTION IN ONLINE DEBATES
SEBASTIAN KÖFFER, DENNIS RIEHLE, STEFFEN HÖHENBERGER, JÖRG BECKER
#HateMining

RESEARCH QUESTION
Can we use analytics to reduce hate speech on the web?
1) Yes, but it's complicated.
2) If so, should we do this?

EASY STUFF: GETTING ABUSIVE COMMENTS

Platform          Articles   Comments   % Hate
Compact                328     11,764       30
Focus                3,959     75,857       18
Freie Welt           1,944     13,628       41
Junge Freiheit         333      2,745       35
Rheinische Post        991      3,678       28
Welt                 1,921    182,625       23
Zeit                 5,812     25,792       12
…
Sum                 21,740    376,143       27

Source: www.hatemining.de

DIFFICULT STUFF: FEATURE EXTRACTION AND SUPERVISED LEARNING

Accuracy / F-score by feature group

Feature group        Our paper      Nobata et al. (2016)
Bag of Words         0.68 / 0.51    0.75 / 0.54
Character 2-grams    0.62 / 0.64    -
Character 3-grams    0.66 / 0.65    0.90 / 0.77
Linguistics          0.57 / 0.53    0.64 / 0.51
Word2Vec             0.67 / 0.67    0.84 / 0.67
Doc2Vec              0.65 / 0.63    0.85 / 0.67
Best approach        0.71 / 0.70    0.90 / 0.78

Source: Nobata et al. (2016) Abusive Language Detection in Online User Content
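
To make two of the terms in the table concrete, here is a minimal sketch of character n-gram extraction and of how an accuracy / F-score pair can be computed from predictions. It is not the authors' pipeline: the function names `char_ngrams` and `accuracy_and_f1` are hypothetical, the labels are made up, and it assumes the F-score is the binary F1 on the hate class, which the slides do not specify.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased comment."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def accuracy_and_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 marks the hate class.

    Assumption: F1 is computed on the positive (hate) class only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Character 3-grams of a short, harmless example string.
print(char_ngrams("online debate", n=3).most_common(3))

# Made-up gold labels and predictions for five comments.
acc, f1 = accuracy_and_f1([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(f"accuracy={acc:.2f}, f1={f1:.2f}")
```

Reporting both numbers matters when classes are imbalanced: a classifier that never predicts hate can still reach a high accuracy while its F-score on the hate class is zero.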

REALLY HARD STUFF:
HOW TO GET LABELED COMMENT DATA?
HOW TO LABEL THE COMMENTS?

2016 study
• Binary classification: hate / non-hate
• Hate definition from the Council of Europe as orientation
• Multiple ratings per comment
Results
• 27% of comments classified as hate
• Many discussions

2018 current work
• Multi-label classification for problematic comments
• No definitions
• Multiple ratings per comment
Results
• 68% problematic comments
• 17% hate speech
• More information about data
• Still a lot of disagreement
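
Disagreement between raters can be quantified. The sketch below uses Fleiss' kappa, one common agreement measure for a fixed number of raters per item; the slides do not say which measure, if any, the authors used, so this is an assumption. The function name `fleiss_kappa` and the rating counts are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of raters
    who assigned comment i to category j.

    Every comment must be rated by the same number of raters (at least two).
    Returns NaN when all ratings fall into a single category, because the
    statistic is undefined in that case.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each comment needs the same number (>= 2) of ratings")

    # Observed agreement: share of rater pairs that agree, averaged over comments.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(per_item) / n_items

    # Expected agreement by chance, from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up example: four comments, ten raters each, columns = [hate, non-hate].
ratings = [[7, 3], [10, 0], [2, 8], [5, 5]]
print(f"kappa = {fleiss_kappa(ratings):.3f}")
```

A value near 1 means raters agree far more than chance would predict; values near 0 mean the labels are close to what random assignment would produce.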

EXAMPLES OF THE ENHANCED COMMENT LABELING PROCEDURE

Example comment (translated from German):
"smartphones gifted by the government, paid for with tax money! who pays for the contracts! apartment paid for by us! food paid for by us! courses paid for by us! transport paid for by us!.. by the way, morocco has emptied its prisons at our expense!!... and THEY, CRIME! TERROR! WANT TO DESTROY US! THEY SHOULD finally START IN THE CHANCELLERY!"

Ratings:
10 / 10 problematic
7 hate
3 language
2 insult
3 threat
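
The counts above can be turned into per-label shares and a set of majority labels. This is a minimal sketch, assuming that "10 / 10" means ten of ten raters and that a label is accepted when more than half of the raters chose it; the threshold and the function names `label_shares` and `majority_labels` are hypothetical and not taken from the slides.

```python
from collections import Counter


def label_shares(votes: Counter, n_raters: int) -> dict[str, float]:
    """Fraction of raters who assigned each label to the comment."""
    return {label: count / n_raters for label, count in votes.items()}


def majority_labels(votes: Counter, n_raters: int, threshold: float = 0.5) -> list[str]:
    """Labels chosen by more than `threshold` of the raters, sorted by name."""
    shares = label_shares(votes, n_raters)
    return sorted(label for label, share in shares.items() if share > threshold)


# Vote counts from the example comment on this slide.
votes = Counter({"problematic": 10, "hate": 7, "language": 3, "insult": 2, "threat": 3})

for label, share in label_shares(votes, n_raters=10).items():
    print(f"{label}: {share:.0%}")

print(majority_labels(votes, n_raters=10))
```

Keeping the shares instead of only the majority labels preserves the disagreement itself, which a multi-label classifier could use as a soft target.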

RESEARCH OBJECTIVE
Can we use analytics to reduce hate speech on the web?
1) Yes, but it's complicated.
2) If so, should we do this?

OBJECTIVE: BRINGING TOGETHER POPULAR ARGUMENTS IN THE DEBATE ON HOW TO COMBAT HATE SPEECH
• If technology is part of the problem, then it should also be part of the solution.
• Society must be tolerant towards a broad spectrum of opinions.
• Automatic deletion/filtering of comments is excessive censorship.
• Manual deletion/filtering is not feasible any more.
• Traditional media need software tools to curate the wisdom of the debate.
• We need enhanced media literacy to better evaluate debate content.

WHAT PERCENTAGE OF USERS WRITE 50 PERCENT OF ALL COMMENTS?
Source: www.hatemining.de
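
The question on this slide can be answered from per-user comment counts. The sketch below sorts users by activity and finds the smallest group that accounts for half of all comments; the function name `share_of_users_for_half` and the counts are hypothetical, and the slide's own figure is not reproduced here.

```python
def share_of_users_for_half(comment_counts: list[int], target: float = 0.5) -> float:
    """Smallest fraction of users whose comments add up to at least
    `target` of all comments, taking the most active users first."""
    total = sum(comment_counts)
    if total == 0:
        raise ValueError("no comments to analyse")
    running = 0
    for rank, count in enumerate(sorted(comment_counts, reverse=True), start=1):
        running += count
        if running >= target * total:
            return rank / len(comment_counts)
    return 1.0


# Made-up comment counts for ten users.
counts = [50, 20, 10, 5, 5, 4, 3, 1, 1, 1]
print(f"{share_of_users_for_half(counts):.0%} of users write half of the comments")
```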

THREE BUILDING BLOCKS OF FUTURE RESEARCH ON HATE SPEECH DETECTION

Sound methodology
• Large and representative evaluation dataset
• Multi-label classification methods with high accuracy of prediction estimates

Algorithmic transparency
• Visualize the decision criteria of algorithms, e.g. subcategories, toxic words, etc.
• Open science paradigm

Human involvement
• Semi-automated approaches necessary for higher acceptance and adaptability
• Let professional journalists steer the debate

WE ARE ACTIVELY LOOKING FOR COOPERATION TO CONTINUE OUR RESEARCH PROJECT
• Valuable contacts to the media industry
• Money to fund labeling of datasets
• Partners for joint research fund applications
• Joint data generation and analysis

Contact
Research project "Cyberhate-Mining" at the European Research Center for Information Systems
Web: www.hatemining.de
E-Mail: team@hatemining.de
#HateMining
