cReddit score
Gautam Sisodia
June 30, 2015
Hateful comments on online forums
Reddit blog post, May 2015
#1 reason users don’t recommend the site: “avoid exposing
friends to hate and offensive content”
Comment example: score = upvotes - downvotes = -64
“Shut up. That’s oversimplifying this whole issue...”
The referenced comment: score 1261
“Thing is, even people against [offensive subreddit] are
leaving...”
Think of the moderators
Need to sift through growing number of flagged comments
cReddit score
Enter credditscore.me: flag hurtful comments before they’re
posted
Data and models
Data
Take the good (score > 15) and bad (score < 0) comments
Each class is 10% (∼12000 comments) of the full data set
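The labelling rule above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the author's code: the thresholds come from the slide, while the function names and the sample scores are hypothetical.

```python
from typing import Optional

GOOD_THRESHOLD = 15  # from the slide: "good" means score > 15
BAD_THRESHOLD = 0    # from the slide: "bad" means score < 0


def comment_score(upvotes: int, downvotes: int) -> int:
    """Score as defined earlier in the deck: upvotes minus downvotes."""
    return upvotes - downvotes


def label_comment(score: int) -> Optional[str]:
    """Return 'good', 'bad', or None for comments in the unused middle band."""
    if score > GOOD_THRESHOLD:
        return "good"
    if score < BAD_THRESHOLD:
        return "bad"
    return None


# Hypothetical scores for illustration; -64 and 1261 are the two examples from the deck.
scores = [42, 3, -7, 0, 15, 16, -64, 1261]
labelled = [(s, label_comment(s)) for s in scores if label_comment(s) is not None]
```

Comments scoring between 0 and 15 inclusive are dropped, which keeps only the two clearly separated classes.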
Two algorithms
Logistic regression on
% uppercase (e.g. “ALL CAPS”)
% internetspeak (e.g. “rofl”)
% profanity (e.g. “****”)
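One way the three percentage features might be computed is sketched below. The slides do not say how the percentages were defined or which word lists were used, so the choices here are assumptions: uppercase is measured per letter, the other two per word, and `INTERNET_SPEAK` and `PROFANITY` are hypothetical placeholder lexicons.

```python
import re

# Hypothetical word lists; the slides do not say which lexicons were used.
INTERNET_SPEAK = {"rofl", "lol", "lmao", "omg"}
PROFANITY = {"damn", "hell"}  # mild placeholders for illustration


def percent(part: int, whole: int) -> float:
    """Percentage of part in whole, or 0.0 for empty input."""
    return 100.0 * part / whole if whole else 0.0


def comment_features(text: str) -> dict:
    """Compute the three features named on the slide for one comment."""
    letters = [ch for ch in text if ch.isalpha()]
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    return {
        # Assumption: share of letters that are uppercase.
        "pct_uppercase": percent(sum(ch.isupper() for ch in letters), len(letters)),
        # Assumption: share of words found in each word list.
        "pct_internetspeak": percent(sum(w in INTERNET_SPEAK for w in words), len(words)),
        "pct_profanity": percent(sum(w in PROFANITY for w in words), len(words)),
    }


features = comment_features("ROFL that is DAMN funny")
```

The resulting three numbers per comment would then be the inputs to a logistic regression model, for example scikit-learn's `LogisticRegression`.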
Naive Bayes on term frequencies (1-grams to 4-grams)
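A naive Bayes classifier over 1-gram to 4-gram term frequencies could look roughly like the sketch below. It is a generic multinomial naive Bayes with add-one smoothing written in plain Python, not the author's implementation; the whitespace tokenizer, the class name `NaiveBayes`, and the four training comments are all made up for illustration.

```python
import math
from collections import Counter


def ngrams(text: str, n_min: int = 1, n_max: int = 4) -> list:
    """All word n-grams of length n_min..n_max, using a simple whitespace tokenizer."""
    tokens = text.lower().split()
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


class NaiveBayes:
    """Multinomial naive Bayes on n-gram counts with additive smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.term_counts = {}        # label -> Counter of n-gram frequencies
        self.doc_counts = Counter()  # label -> number of training comments
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = ngrams(text)
            self.term_counts.setdefault(label, Counter()).update(grams)
            self.doc_counts[label] += 1
            self.vocab.update(grams)
        return self

    def predict(self, text: str):
        total_docs = sum(self.doc_counts.values())
        best_label, best_logp = None, -math.inf
        for label, counts in self.term_counts.items():
            logp = math.log(self.doc_counts[label] / total_docs)  # class prior
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            for gram in ngrams(text):
                if gram in self.vocab:  # ignore n-grams never seen in training
                    logp += math.log((counts[gram] + self.alpha) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Made-up training comments, for illustration only.
texts = ["shut up you idiot", "thanks for the thoughtful explanation",
         "you idiot shut up", "great explanation thanks"]
labels = ["bad", "good", "bad", "good"]
model = NaiveBayes().fit(texts, labels)
```

Longer n-grams let the model pick up phrases such as "shut up" that single words miss, at the cost of a much larger and sparser vocabulary.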
Evaluation
They perform similarly
ALL CAPS, no caps
About me!
The effect of metadata on score
Evaluation
Logistic regression and naive Bayes are similarly accurate (about 62%)
BUT accuracy is not quite what I’m aiming for:
Logistic regression
               pred. bad   pred. good
actual bad          1214          630
actual good          577          825
Naive Bayes
               pred. bad   pred. good
actual bad          1670         1149
actual good          121          306
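To see why accuracy alone may not tell the whole story, the two confusion matrices can be turned into accuracy, precision, and recall for the "bad" class. The sketch below takes the row and column labels at face value; the function and variable names are hypothetical, and the slides do not state which metric the author preferred.

```python
def metrics(bad_as_bad: int, bad_as_good: int, good_as_bad: int, good_as_good: int) -> dict:
    """Accuracy plus precision and recall for the 'bad' class.

    Arguments follow the table layout: rows are actual classes,
    columns are predicted classes.
    """
    total = bad_as_bad + bad_as_good + good_as_bad + good_as_good
    return {
        "accuracy": (bad_as_bad + good_as_good) / total,
        # Of the comments flagged as bad, how many really were bad.
        "precision_bad": bad_as_bad / (bad_as_bad + good_as_bad),
        # Of the truly bad comments, how many were flagged.
        "recall_bad": bad_as_bad / (bad_as_bad + bad_as_good),
    }


# Counts copied from the two tables above.
logistic = metrics(1214, 630, 577, 825)
naive_bayes = metrics(1670, 1149, 121, 306)
```

With these counts the accuracies come out at roughly 0.63 and 0.61, while precision on the "bad" class is roughly 0.68 for logistic regression and 0.93 for naive Bayes, so the two models differ much more in how often a flag is correct than in overall accuracy.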
