Online Hate Speech
Towards automated moderation
Emily Y. Spahn
Galvanize Data Science Immersive, Seattle, Mar 2016
What is Hate Speech?
Definition: speech advocating incitement to harm based on the target's membership in a group
The Problem with Hate Speech Online
Alienates Users & Costs Time
What can we do about online hate speech?
AUTOMATE!
⇒ Build a model to predict whether a comment is hate speech and, if so, against what group.
DATA SOURCE: May 2015 Reddit comments
Data Used: subreddits & data labeling
Reddit Comments from May 2015
54.5+ million comments with metadata, from 50,138 subreddits
Hateful Subreddits
11 hateful subreddits
565,494 hateful comments:
● 56% body size
● 33.6% gender
● 9.4% race
● 1% religion
Not Hateful Subreddits
13 not hateful subreddits
1,012,052 not hateful comments:
● 75% sometimes controversial but well-moderated subreddits
● 11.2% gender
● 7.7% religion
● 5.4% body size
● 0.4% race
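A minimal sketch of how comments could be labeled by the subreddit they were posted in, which is what the split above appears to rely on. The mapping is an abbreviated copy of the subreddit tables in this deck; the label strings, the "not_hateful" class name, the function name label_comment and the sample comments are illustrative assumptions, not the author's code.

```python
# Abbreviated subreddit -> label mapping, taken from the subreddit tables in this deck.
# The label strings and the "not_hateful" class name are illustrative assumptions.
HATEFUL_SUBREDDITS = {
    "fatpeoplehate": "size",
    "TheRedPill": "gender",
    "KotakuInAction": "gender",
    "WhiteRights": "race",
    "IslamUnveiled": "religion",
}
NOT_HATEFUL_SUBREDDITS = {"politics", "worldnews", "history", "lgbt", "fatlogic"}


def label_comment(subreddit):
    """Return a class label for a comment based only on where it was posted.

    Returns None for subreddits outside both lists, so those comments
    can be dropped from the training set.
    """
    if subreddit in HATEFUL_SUBREDDITS:
        return HATEFUL_SUBREDDITS[subreddit]
    if subreddit in NOT_HATEFUL_SUBREDDITS:
        return "not_hateful"
    return None


# Hypothetical comment records; the real dump carries more metadata fields.
comments = [
    {"subreddit": "fatpeoplehate", "body": "..."},
    {"subreddit": "politics", "body": "..."},
    {"subreddit": "aww", "body": "..."},
]
labeled = [
    (c["body"], label_comment(c["subreddit"]))
    for c in comments
    if label_comment(c["subreddit"]) is not None
]
```

Labeling by subreddit is cheap at this scale, but it treats every comment in a subreddit as carrying that subreddit's label, so individual comments can be mislabeled.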
Tools Used
Computing & Analysis
Natural Language Processing & Classification Modeling
NLTK
Modeling
TF-IDF on 1.1 million comments
XGBoost multi-class classifier
Word2Vec for word embeddings
TF-IDF: Term Frequency-Inverse Document Frequency
Words in comments ⇒ matrix of numbers (i: the word, j: the document)
Bag of words + a factor that weights rarely occurring words more than common ones
Image from http://brandonrose.org/clustering
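The weighting described above can be sketched in a few lines of plain Python. This assumes one common variant, weight(i, j) = tf(i, j) × log(N / df(i)); the slide does not say which variant was used, and libraries such as scikit-learn apply smoothed versions by default. The function name tf_idf and the sample documents are illustrative.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: weight} dict per document.

    weight(i, j) = tf(i, j) * log(N / df(i)), where tf is the count of
    word i in document j, df is the number of documents containing
    word i, and N is the number of documents.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: count each word once per document it appears in.
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        )
    return weights


# Hypothetical toy documents standing in for comments.
docs = ["the game was fun", "the movie was long", "the game ran long"]
weights = tf_idf(docs)
```

In this toy corpus "the" appears in every document and gets weight zero, while "fun", which appears in only one, outweighs "game", which appears in two.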
Gradient Boosted Trees Classifier
Decision trees and tree ensembles (figures from the XGBoost documentation)
Working on labeled data:
Create one tree & run the model
Find the residuals (differences between the model's results & the labeled data)
Create a 2nd tree to fit the residuals
New results = results from the 1st tree + those from the 2nd tree
Find the new residuals
Repeat, adding a tree to the model each time to fit the residuals, until you reach a cutoff criterion.
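The loop above can be sketched with one-split trees (stumps) on a toy one-dimensional problem. This is a simplified squared-error illustration, not XGBoost's actual algorithm: XGBoost fits gradients of a chosen loss with regularization, and the learning_rate shrinkage here is an addition that is not on the slide. All names and data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the one-split tree (stump) that best fits the residuals."""
    best = None
    # Try every split point except the largest x, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(trees, x, learning_rate):
    """Sum the shrunken outputs of every tree in the ensemble."""
    return sum(learning_rate * (left if x <= threshold else right)
               for threshold, left, right in trees)


def boost(xs, ys, n_trees=50, learning_rate=0.5, tolerance=1e-6):
    """Add stumps one at a time, each fitted to the current residuals."""
    trees = []
    for _ in range(n_trees):  # cutoff 1: tree budget
        residuals = [y - predict(trees, x, learning_rate)
                     for x, y in zip(xs, ys)]
        if max(abs(r) for r in residuals) < tolerance:  # cutoff 2: residuals vanish
            break
        trees.append(fit_stump(xs, residuals))
    return trees


# Toy labeled data: the label switches from 0 to 1 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
trees = boost(xs, ys)
```

Each new stump only has to explain what the earlier ones left unexplained, so the ensemble's predictions move steadily toward the labels.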
ROC Curve: Examine classification model success
Most important features (stemmed tokens):
fat
like
peopl
just
white
dont
fuck
im
becaus
game
jew
women
weight
say
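A minimal sketch of how the points of an ROC curve, named in this slide's title, can be computed from model scores. The slide does not say how the curve was built for the multi-class model, so this assumes the binary case (hateful versus not hateful) with distinct scores; the labels and scores are made up for illustration.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs.

    labels are 1 (hateful) or 0 (not hateful); scores are the model's
    confidence that a comment is hateful. Assumes distinct scores.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the decision threshold one comment at a time, highest score first.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and scores for six comments.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(labels, scores)
area = auc(points)
```

An area of 1.0 means every hateful comment scored above every non-hateful one; 0.5 is what random scoring would give.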
Potential Use Cases for the Predictive Model
More time for the mods!
User posts hateful comment
Model flags comment as hateful
Comment is in limbo until a human moderator reads it
Human evaluates comment and publishes or deletes it
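The moderation flow above could be wired up roughly as follows. This is a sketch under stated assumptions: is_hateful is a hypothetical placeholder for the trained classifier, and the in-memory list and queue stand in for whatever storage a real site would use.

```python
from collections import deque


def is_hateful(comment):
    """Hypothetical stand-in for the trained classifier's prediction."""
    return "hateful" in comment.lower()  # placeholder rule, not a real model


published = []          # comments visible on the site
review_queue = deque()  # flagged comments held "in limbo"


def submit(comment):
    """Publish unflagged comments at once; hold flagged ones for a human."""
    if is_hateful(comment):
        review_queue.append(comment)
    else:
        published.append(comment)


def review(approve):
    """A moderator reads the oldest held comment and publishes or deletes it."""
    comment = review_queue.popleft()
    if approve:
        published.append(comment)
    return comment


submit("nice photo")
submit("some hateful remark")
held = review(approve=False)  # the moderator deletes the flagged comment
```

Moderators then read only the comments the model flags, which is where the time saving comes from.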
Power to the People!
Indicate via user icons or status information those who have a recent history of hateful comments.
Let site users decide if they want to read what this person has to say.
Word2Vec: Most Similar Words
“fat”
skinny, ugly, lazy, lard, fatshit, fatass, slenderman, gtbanned, stupid, hamplanet
skinny, overweight, obese, underweight, and, muscular, that, body, is, anorexic
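A "most similar words" lookup ranks every other word by how close its vector is to the query word's vector, usually by cosine similarity. The sketch below shows that ranking on hypothetical 3-dimensional vectors; the numbers are made up, and real Word2Vec embeddings are learned from the comment text and have far more dimensions.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
vectors = {
    "fat": [0.9, 0.1, 0.3],
    "obese": [0.8, 0.2, 0.3],
    "skinny": [0.7, 0.1, 0.5],
    "game": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, top_n=2):
    """Return the top_n other words ranked by cosine similarity to word."""
    scores = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


neighbors = most_similar("fat")
```

Because the vectors are learned from the training text, the same query word can return different neighbors depending on which comments the model was trained on.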
Thank You!
Emily Y Spahn
spahn@uw.edu
@eyspahn
https://github.com/eyspahn/OnlineHateSpeech
Clip art in the presentation from https://openclipart.org/
Example Comment
Data Used: subreddits
Hateful Subreddits
Subreddit Name    Comment Count    Hate Type
CoonTown          51979            Race
WhiteRights       1352             Race
Transfags         2362             Gender
SlutJustice       209              Gender
TheRedPill        59145            Gender
KotakuInAction    128156           Gender
IslamUnveiled     110              Religion
GasTheKikes       919              Religion
AntiPOZi          4740             Religion
fatpeoplehate     311183           Size
TalesofFatHate    5239             Size
Not Hateful Subreddits
Subreddit Name
politics           DebateReligion
worldnews          religion
history            islam
blackladies        Judaism
lgbt               BodyAcceptance
TransSpace         fatlogic
TwoXChromosomes    women
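The hate-type shares quoted on the data slide can be re-derived from the comment counts in the hateful-subreddit table above. One caveat: the table's counts sum to 565,394, which is 100 fewer than the 565,494 quoted on the data slide, so this sketch uses the table sum. The variable names are illustrative.

```python
# Comment counts per hateful subreddit, grouped by hate type (from the table above).
counts = {
    "race": [51979, 1352],
    "gender": [2362, 209, 59145, 128156],
    "religion": [110, 919, 4740],
    "size": [311183, 5239],
}

totals = {hate_type: sum(values) for hate_type, values in counts.items()}
grand_total = sum(totals.values())

# Share of each hate type as a percentage, rounded to one decimal place.
shares = {hate_type: round(100 * total / grand_total, 1)
          for hate_type, total in totals.items()}
```

The rounded shares match the slide's figures of roughly 56% body size, 33.6% gender, 9.4% race and 1% religion.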
