Hateful comments on online forums
Reddit blog post, May 2015
#1 reason users don’t recommend the site: “avoid exposing friends to hate and offensive content”
Comment example: score = upvotes - downvotes = -64
“Shut up. That’s oversimplifying this whole issue...”
The comment it replies to: score = 1261
“Thing is, even people against [offensive subreddit] are leaving...”
Think of the moderators: they need to sift through a growing number of flagged comments
Data and models
Data
Take the good (score > 15) and bad (score < 0) comments
Each class is 10% (∼12000 comments) of the full data set
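A minimal sketch of the labelling rule above, assuming each comment arrives as a (text, score) pair with score = upvotes − downvotes. Comments with 0 ≤ score ≤ 15 are dropped, as the thresholds imply. The function names are hypothetical, not taken from the original code.

```python
from typing import Iterable, Optional

# Thresholds from the slide: good if score > 15, bad if score < 0.
GOOD_THRESHOLD = 15
BAD_THRESHOLD = 0


def comment_score(upvotes: int, downvotes: int) -> int:
    """Net score as shown on the site: upvotes minus downvotes."""
    return upvotes - downvotes


def label(score: int) -> Optional[str]:
    """Return 'good', 'bad', or None for comments in the ambiguous middle band."""
    if score > GOOD_THRESHOLD:
        return "good"
    if score < BAD_THRESHOLD:
        return "bad"
    return None  # 0 <= score <= 15: not used for training


def build_dataset(comments: Iterable[tuple[str, int]]) -> list[tuple[str, str]]:
    """Keep only clearly good or clearly bad comments, paired with their label."""
    dataset = []
    for text, score in comments:
        lab = label(score)
        if lab is not None:
            dataset.append((text, lab))
    return dataset


if __name__ == "__main__":
    examples = [
        ("Thing is, even people against [offensive subreddit] are leaving...", 1261),
        ("Shut up. That's oversimplifying this whole issue...", -64),
        ("Meh.", 3),  # hypothetical middle-band comment, dropped
    ]
    for text, lab in build_dataset(examples):
        print(lab, "|", text)
```

Dropping the middle band keeps the two classes well separated, at the cost of never training on borderline comments.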
Two algorithms
Logistic regression on three features:
% uppercase (ALL CAPS)
% internet-speak (rofl)
% profanity (****)
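A sketch of the three-feature logistic regression, assuming each feature is a fraction in [0, 1]: uppercase letters over all letters, and internet-speak or profane tokens over all tokens. The word lists `INTERNETSPEAK` and `PROFANITY` are hypothetical placeholders, since the slides do not say which lists were used. The model is fitted with plain batch gradient descent on the log-loss, in place of whatever library the author used.

```python
import math
import re

# Hypothetical word lists; a real system would use much larger curated lists.
INTERNETSPEAK = {"rofl", "lol", "lmao", "omg", "wtf", "imo", "tbh"}
PROFANITY = {"damn", "crap"}  # placeholder entries only

TOKEN_RE = re.compile(r"[A-Za-z']+")


def features(text: str) -> list[float]:
    """[% uppercase letters, % internet-speak tokens, % profane tokens], as fractions."""
    letters = [c for c in text if c.isalpha()]
    upper = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    n = len(tokens) or 1  # avoid division by zero on empty comments
    slang = sum(t in INTERNETSPEAK for t in tokens) / n
    profane = sum(t in PROFANITY for t in tokens) / n
    return [upper, slang, profane]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log-loss; y[i] = 1 for 'bad', 0 for 'good'."""
    w = [0.0] * len(X[0])
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_proba(w, b, x) -> float:
    """Estimated probability that a comment is bad."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


if __name__ == "__main__":
    # Toy, made-up training data (1 = bad, 0 = good).
    train = [
        ("OMG ROFL SHUT UP", 1),
        ("wtf lol damn", 1),
        ("Thanks, that is a helpful explanation.", 0),
        ("I think the argument holds up well.", 0),
    ]
    w, b = fit_logistic([features(t) for t, _ in train], [lab for _, lab in train])
    for text, _ in train:
        print(f"{predict_proba(w, b, features(text)):.2f}  {text}")
```

With only three hand-crafted features the model is cheap to train and easy to inspect: each weight says how strongly shouting, slang, or profanity pushes a comment towards "bad".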
Naive Bayes on term frequencies (1-grams to 4-grams)
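A sketch of multinomial naive Bayes over word n-gram counts (n = 1 to 4), assuming lowercase whitespace tokenization and add-one (Laplace) smoothing. Both choices are assumptions, as the slides do not specify them. The class name `NaiveBayes` and the toy data are hypothetical.

```python
import math
from collections import Counter


def ngrams(text: str, n_min: int = 1, n_max: int = 4) -> list[str]:
    """All word n-grams with n_min <= n <= n_max, using lowercase whitespace tokens."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


class NaiveBayes:
    """Multinomial naive Bayes on n-gram counts with add-alpha smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.term_counts = {c: Counter() for c in self.classes}
        for doc, lab in zip(docs, labels):
            self.term_counts[lab].update(ngrams(doc))
        self.vocab = set()
        for counts in self.term_counts.values():
            self.vocab.update(counts)  # adds every n-gram seen in that class
        self.totals = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc: str, c: str) -> float:
        """Unnormalised log P(c | doc) = log P(c) + sum of log P(gram | c)."""
        logp = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        denom = self.totals[c] + self.alpha * len(self.vocab)
        for g in ngrams(doc):
            if g in self.vocab:  # n-grams never seen in training are skipped
                logp += math.log((self.term_counts[c][g] + self.alpha) / denom)
        return logp

    def predict(self, doc: str) -> str:
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


if __name__ == "__main__":
    docs = ["shut up you idiot", "shut up already",
            "thanks for the helpful answer", "great answer thanks"]
    labels = ["bad", "bad", "good", "good"]
    nb = NaiveBayes().fit(docs, labels)
    print(nb.predict("shut up"), nb.predict("helpful answer"))
```

Including 2- to 4-grams lets phrases such as "shut up" carry evidence of their own, beyond the individual words.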
Evaluation
Logistic regression and naive Bayes are similarly accurate (about 62%)
BUT accuracy is not quite what I’m aiming for:
Logistic regression
            pred. bad   pred. good
actual bad       1214          630
actual good       577          825
Naive Bayes
            pred. bad   pred. good
actual bad       1670         1149
actual good       121          306
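A short sketch that recomputes accuracy, precision, and recall from the two confusion matrices above. It reads them exactly as labelled (rows = actual class, columns = predicted class, order bad then good). The helper names are illustrative only.

```python
# Confusion matrices from the slide, read as labelled:
# rows = actual class, columns = predicted class, order (bad, good).
LOGISTIC = [[1214, 630],
            [577, 825]]
NAIVE_BAYES = [[1670, 1149],
               [121, 306]]
CLASSES = ["bad", "good"]


def accuracy(cm) -> float:
    """Fraction of all comments on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def recall(cm, cls: str) -> float:
    """Of the comments actually in `cls`, the fraction predicted as `cls` (row-wise)."""
    i = CLASSES.index(cls)
    return cm[i][i] / sum(cm[i])


def precision(cm, cls: str) -> float:
    """Of the comments predicted as `cls`, the fraction actually in `cls` (column-wise)."""
    i = CLASSES.index(cls)
    return cm[i][i] / sum(row[i] for row in cm)


if __name__ == "__main__":
    for name, cm in [("logistic", LOGISTIC), ("naive Bayes", NAIVE_BAYES)]:
        print(f"{name}: accuracy={accuracy(cm):.3f}, "
              f"recall(bad)={recall(cm, 'bad'):.3f}, "
              f"precision(bad)={precision(cm, 'bad'):.3f}")
```

Accuracy comes out at about 0.628 for logistic regression and 0.609 for naive Bayes, both close to the slide's figure. Per-class precision and recall differ much more between the two models, and for a moderator's queue of flagged comments those numbers matter more than overall accuracy.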