Industrialize Sentiment Analysis for Comment Moderation
  • 1. Industrialize Sentiment Analysis for Comment Moderation
      • Maggie Xiong, Huffington Post
  • 2. Basic Comment Moderation Process
      • A user comments on an article.
      • A moderator publishes or rejects the comment based on a set of guidelines (the "10 commandments").
      • Comments on different articles arrive every second; moderating them all by hand would take a small army.
      • Sample guidelines:
          • The comment should contribute to the discussion, conveying a respectful message, thought, or idea, whether or not it agrees with another user or the author.
          • The comment should not intentionally misspell words, use non-alphabetic characters, or use extra or missing spaces to bypass moderation.
          • The comment should not attack, demean, belittle, or stereotype any person or group.
          • ...
  • 3. JuLiA to the Rescue
      • JuLiA: a sentiment analysis suite
      • Supports various preprocessing options
          • Stemming, stopword removal, etc.
      • Includes a number of popular ML algorithms
          • SVM, naïve Bayes, AdaBoost (decision tree), etc.
      • Uses Hadoop to parallelize the training of different models and the exploration of the parameter space
          • Train thousands of models with different parameter setups in parallel
      • Pick the winner for production
      • Ensemble the different winners for even higher accuracy
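The preprocessing options listed for JuLiA (stemming, stopword removal) can be illustrated with a minimal Python sketch. This is not JuLiA's code: the stopword set is a small hypothetical sample, and the suffix stripper is a crude stand-in for a real stemmer such as Porter's. The two boolean flags mirror the idea of switching preprocessing options on or off per parameter setup.

```python
import re

# Hypothetical stopword sample for illustration; not JuLiA's actual list.
STOPWORDS = {"a", "an", "and", "are", "as", "he", "in", "is", "it", "of", "the", "to"}

# Crude suffix stripping as a stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Strip only when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str, use_stemming: bool = True, remove_stopwords: bool = True) -> list[str]:
    # Lowercase and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    if use_stemming:
        tokens = [stem(t) for t in tokens]
    return tokens


print(preprocess("Investments are taxed as capital gains"))  # ['investment', 'tax', 'capital', 'gain']
```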
  • 4. Training Data
      • Goldset
          • About 20,000 comments (~13,000 train, ~7,000 holdout)
          • Publish-or-reject votes from 3 moderators
      • Sample goldset comments, each under its article headline:
          • Christian and Gay? One Politician's Personal Interview (VIDEO)
            I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's your interpretation of the scripture then make sure you abide by it.
          • Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
            what an angry petty little man he is. issues too. lots of issues he needs to work on. He certainly has nothing of value to offer or to say. he's a screwed up little prick
          • Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
            You seem to take a negative view of democrats and draw reference to a study "I co-authored with Robert Book".....sort of like a Muslim professor writing a book on Christianity your biases disqualify you from offering anything other than a self serving of course I'm just using republican/fox news logic here
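With three moderator votes per goldset comment, one plausible way to turn the votes into a single label is a majority vote, followed by a fixed split into training and holdout sets. The slides do not say how labels or the split were actually derived; this sketch assumes votes encoded as 1 = publish and 0 = reject, and a hypothetical 35% holdout fraction chosen to approximate the ~13,000/~7,000 split.

```python
import random


def majority_label(votes: list[int]) -> int:
    # Assumed encoding: 1 = publish, 0 = reject. With 3 votes there are no ties.
    return 1 if 2 * sum(votes) > len(votes) else 0


def split_goldset(comments, holdout_fraction: float = 0.35, seed: int = 0):
    # Shuffle deterministically, then carve off the holdout set.
    shuffled = list(comments)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]


train, holdout = split_goldset(range(20000))
print(len(train), len(holdout))  # 13000 7000
print(majority_label([1, 1, 0]), majority_label([0, 1, 0]))  # 1 0
```

A fixed seed keeps the holdout set identical across runs, so every model in a parameter sweep is scored on the same comments.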
  • 5. Training Process
      • Train request (one parameter set per line):
          73 923 balanced_winnow 5 1 10 …
          73 923 balanced_winnow 5 2 10 …
          73 923 balanced_winnow 5 3 10 …
          73 923 balanced_winnow 5 1 20 …
          73 923 balanced_winnow 5 2 20 …
          73 923 balanced_winnow 5 3 20 …
          …
      • Training data (comment text followed by a 0/1 label):
          Investments are taxed as capital gains..... 1
          It was the overleveraged and underregulated banks … 1
          I am afraid we may be headed for … 1
          In the famous words of Homer Simpson, “it takes 2 to lie …” 0
          …
      • Hadoop cluster: trains Model 1, Model 2, Model 3, Model 4, Model 5, …, Model k in parallel
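The train request lists one parameter set per line, and a grid like the one on the slide can be generated with `itertools.product`. The slides do not explain the fields (`73 923` and the three numbers after `balanced_winnow`), so this sketch treats them as opaque values under the hypothetical names `first`, `second`, and `third`, and reproduces only the visible fields, not the elided `…` columns. The hypothetical helper `parse_training_line` splits a training-data line into comment text and its trailing 0/1 label, assuming whitespace separates them.

```python
from itertools import product


def train_request_lines(prefix, algorithm, first_values, second_values, third_values):
    # Field meanings are not given in the slides; names here are hypothetical.
    # product() varies its last argument fastest, so passing (third, first, second)
    # reproduces the slide's ordering: the second value cycles fastest, the third slowest.
    return [
        f"{prefix} {algorithm} {first} {second} {third}"
        for third, first, second in product(third_values, first_values, second_values)
    ]


def parse_training_line(line: str):
    # Assumes the comment text and its 0/1 label are separated by whitespace.
    text, label = line.rstrip().rsplit(maxsplit=1)
    return text, int(label)


for line in train_request_lines("73 923", "balanced_winnow", [5], [1, 2, 3], [10, 20]):
    print(line)

print(parse_training_line("Investments are taxed as capital gains..... 1"))
```

Because each request line is self-contained, every line can be handed to a separate task on the cluster, which is what lets thousands of parameter setups train independently.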
  • 6. Results
      • Single best model: Naïve Bayes
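Naïve Bayes came out as the single best model. For reference, a textbook multinomial naïve Bayes classifier with additive (Laplace) smoothing fits in a few lines. This is a generic sketch over token lists, not JuLiA's implementation; the toy documents and the 1 = publish / 0 = reject labels are hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Textbook multinomial naive Bayes over token lists, with additive smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[int]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        # Log prior: fraction of training documents in each class.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc: list[str], c: int) -> float:
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for word in doc:
            if word in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, doc: list[str]) -> int:
        return max(self.classes, key=lambda c: self.log_score(doc, c))


# Hypothetical toy data: 1 = publish, 0 = reject.
docs = [["great", "point"], ["thoughtful", "point"], ["stupid", "idiot"], ["angry", "stupid"]]
labels = [1, 1, 0, 0]
model = MultinomialNB().fit(docs, labels)
print(model.predict(["stupid"]), model.predict(["point"]))  # 0 1
```

Working in log space avoids underflow when multiplying many small per-word probabilities on long comments.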
  • 7. Results
      • Chart: Model decision on goldset approved comments
      • Chart: Model decision on goldset rejected comments
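The charts themselves are not recoverable from the slide text. One simple way to summarize model decisions split by goldset label is to measure how often the model agrees with the moderators on approved comments and on rejected comments. The function name `decision_rates` and the 1 = publish / 0 = reject encoding are assumptions for illustration.

```python
from collections import Counter


def decision_rates(gold_labels, model_decisions):
    # Assumed encoding for both inputs: 1 = publish, 0 = reject.
    counts = Counter(zip(gold_labels, model_decisions))
    approved = counts[(1, 1)] + counts[(1, 0)]
    rejected = counts[(0, 0)] + counts[(0, 1)]
    return {
        # Share of moderator-approved comments the model would also publish.
        "approved_published": counts[(1, 1)] / approved if approved else 0.0,
        # Share of moderator-rejected comments the model would also reject.
        "rejected_rejected": counts[(0, 0)] / rejected if rejected else 0.0,
    }


print(decision_rates([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```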
  • 8. Pool for Better Results
      • Logistic regression over the outputs of multiple models
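Pooling with logistic regression is a form of stacking: each base model's output becomes one input feature, and the logistic regression learns how much weight each model deserves. This sketch fits the weights with plain batch gradient descent. It assumes each base model emits a score in [0, 1] per comment; the learning rate, epoch count, function names, and toy scores are hypothetical, not values from the talk.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stacker(model_scores, labels, lr: float = 0.5, epochs: int = 2000):
    """Fit logistic-regression weights on base-model scores (one row per comment)."""
    n, d = len(model_scores), len(model_scores[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, y in zip(model_scores, labels):
            err = sigmoid(bias + sum(w * x for w, x in zip(weights, row))) - y
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        # Batch gradient step on the mean log-loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def stacked_probability(row, weights, bias) -> float:
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


# Hypothetical scores from three base models for four comments (1 = publish).
scores = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
labels = [1, 1, 0, 0]
weights, bias = fit_stacker(scores, labels)
print(stacked_probability([0.85, 0.85, 0.65], weights, bias) > 0.5)  # True
```

The pooling model is best fit on base-model outputs for comments the base models were not trained on (for example, holdout data); otherwise it tends to over-trust models that simply memorized their training set.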
  • 9. Pool for Better Results
      • Chart: Model decision on goldset approved comments
      • Chart: Model decision on goldset rejected comments
  • 10. Further Steps
      • Improve the training data set
          • Gather data within moderators' normal workflow
          • More votes per comment
          • More comments
      • Per-vertical models
      • Incorporate comment-to-article similarity
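Comment-to-article similarity is listed as a future step without a specified method. One common baseline is cosine similarity between bag-of-words term-count vectors of the comment and the article; this sketch shows that baseline with a hypothetical tokenizer, and its score could serve as one more input feature alongside the model outputs.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    # Hypothetical tokenizer: lowercase runs of letters/apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(comment: str, article: str) -> float:
    a, b = term_counts(comment), term_counts(article)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # empty text has no direction
    return dot / (norm_a * norm_b)


article = "Paul Ryan spending cuts face backlash from moderate Republicans"
print(round(cosine_similarity("These spending cuts hurt moderate voters", article), 3))
print(cosine_similarity("In the famous words of Homer Simpson", article))  # 0.0
```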
  • 11. In addition to saving his own life, Zimmerman likely save a couple other lives as well.
  • 12. Thanks!
      • Conversation and Machine Learning teams
      • We are hiring!