Industrialize Sentiment Analysis
for Comment Moderation
Maggie Xiong
Huffington Post
Basic Comment Moderation Process

User comments on an article

Moderator publishes or rejects a comment based on a
set of guidelines

“10 commandments”

Comments for different articles come in every second.
We would need a small army to handle the moderation.
The comment should contribute to the discussion, conveying a respectful message, thought 
or idea, whether or not it agrees with another user or the author.
The comment should not intentionally misspell words, use non-alphabetic characters, or use 
extra or missing spaces to bypass moderation.
The comment should not attack, demean, belittle, or stereotype any person or group.
...
JuLiA to the Rescue

Sentiment analysis suite - JuLiA

Supports various preprocessing options

Stemming, stopwords, etc

Includes a number of popular ML algorithms

SVM, naïve Bayes, AdaBoost (decision tree), etc

Uses Hadoop to parallelize the training of different
models and the exploration of the parameter space

Train thousands of models with different parameter settings in parallel

Pick the winner for production

Ensemble the different winners for even higher accuracy
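
The sweep-and-pick-the-winner idea above can be sketched in a few lines. This is only an illustration under stated assumptions, not JuLiA's code: a local thread pool stands in for the Hadoop cluster, the classifier is a tiny hand-written naive Bayes, and the data, the parameter names (`alpha`, `min_len`), and all function names are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical toy data: (comment text, label), 1 = publish, 0 = reject.
TRAIN = [
    ("thank you for a thoughtful article", 1),
    ("i respectfully disagree with the author", 1),
    ("interesting point about tax policy", 1),
    ("you are an idiot", 0),
    ("what a stupid idiot author", 0),
    ("shut up you moron", 0),
]
HOLDOUT = [
    ("a thoughtful point about policy", 1),
    ("stupid moron", 0),
]


def tokenize(text, min_len):
    """Lowercase, split on whitespace, drop tokens shorter than min_len."""
    return [t for t in text.lower().split() if len(t) >= min_len]


def train_nb(data, alpha, min_len):
    """Collect per-class token counts for a multinomial naive Bayes model."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in data:
        docs[label] += 1
        counts[label].update(tokenize(text, min_len))
    vocab = set(counts[0]) | set(counts[1])
    return counts, docs, vocab, alpha, min_len


def predict(model, text):
    """Return the class with the highest smoothed log-probability."""
    counts, docs, vocab, alpha, min_len = model
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for tok in tokenize(text, min_len):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log(
                    (counts[label][tok] + alpha) / (total + alpha * len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(params):
    """Train one model for one parameter set and score it on the holdout."""
    alpha, min_len = params
    model = train_nb(TRAIN, alpha, min_len)
    correct = sum(predict(model, text) == label for text, label in HOLDOUT)
    return correct / len(HOLDOUT), params


def sweep(grid):
    """Evaluate every parameter set in parallel and return the best one."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(evaluate, grid))
    return max(results, key=lambda r: r[0])


# Every combination of smoothing strength and minimum token length.
GRID = list(product([0.1, 1.0], [1, 3]))
best_accuracy, best_params = sweep(GRID)
```

Each parameter set is independent of the others, which is why this kind of search spreads easily across many workers.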
Training Data

Goldset

About 20000 comments (~13000 train, ~7000 holdout)

Publish-or-reject votes from 3 moderators
Christian and Gay? One Politician's Personal Interview (VIDEO)
I'm curious if you have ever watched the film "For The Bible Tells Me So" or if you have
read the book "Torn" by Justin Lee. Bottom line: Biblical interpretation varies. If that's
your interpretation of the scripture then make sure you abide by it.
Rick Santorum On Middle Class: 'That's Marxism Talk,' 'There's No Class In America'
what an angry petty little man he is. issues too. lots of issues he needs to work on. He
certainly has nothing of value to offer or to say. he's a screwed up little prick
Paul Ryan Spending Cuts Face Backlash From Moderate Republicans
You seem to take a negative view of democrats and draw reference to a study "I co-
authored with Robert Book".....sort of like a Muslim professor writing a book on
Christianity your biases disqualify you from offering anything other than a self serving
opinion....now of course I'm just using republican/fox news logic here"
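
The slides do not say how the three moderator votes become a single training label or how the holdout was drawn. One plausible approach, shown purely as an assumption, is a majority vote followed by a random split at roughly the 13000/7000 ratio. The comment IDs, votes, and function names below are hypothetical.

```python
import random


def majority_label(votes):
    """Return 1 (publish) if more than half of the 0/1 votes are publish."""
    votes = list(votes)
    return 1 if sum(votes) * 2 > len(votes) else 0


def split_goldset(items, train_fraction=0.65, seed=0):
    """Shuffle deterministically, then cut into train and holdout parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical votes: 1 = publish, 0 = reject, one entry per moderator.
GOLDSET_VOTES = {
    "comment-a": (1, 1, 0),
    "comment-b": (0, 0, 0),
    "comment-c": (1, 1, 1),
    "comment-d": (0, 1, 0),
}

labelled = [(cid, majority_label(v)) for cid, v in GOLDSET_VOTES.items()]
train, holdout = split_goldset(labelled)
```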
Training Process
Train Request (a parameter set per line)
73 923 balanced_winnow 5 1 10 …
73 923 balanced_winnow 5 2 10 …
73 923 balanced_winnow 5 3 10 …
73 923 balanced_winnow 5 1 20 …
73 923 balanced_winnow 5 2 20 …
73 923 balanced_winnow 5 3 20 …
…

Training Data
Investments are taxed as capital gains..... 1
It was the overleveraged and underregulated banks … 1
I am afraid we may be headed for … 1
In the famous words of Homer Simpson, “it takes 2 to lie …” 0
…

Hadoop Cluster
Model 1
Model 2
Model 3
Model 4
Model 5
Model k
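
A train request of this shape, one parameter set per line, can be generated by enumerating value combinations. The sketch below reproduces only the visible pattern of the sample lines. The slides do not explain what each column means, so the columns are treated as opaque values, the elided trailing fields are left out, and `build_train_request` is a hypothetical helper.

```python
from itertools import product


def build_train_request(prefix, fifth_values, sixth_values):
    """Emit one space-separated line per (fifth, sixth) column combination.

    The sixth column is the outer loop so the fifth column varies fastest,
    matching the order of the sample lines on the slide.
    """
    lines = []
    for sixth, fifth in product(sixth_values, fifth_values):
        lines.append(" ".join(str(v) for v in (*prefix, fifth, sixth)))
    return lines


request = build_train_request(
    ("73", "923", "balanced_winnow", "5"),  # columns held fixed in the sample
    [1, 2, 3],                              # fifth column values seen on the slide
    [10, 20],                               # sixth column values seen on the slide
)
```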
Results

Single best model: Naïve Bayes
Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
Pool for Better Results

Logistic regression using multiple model results
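
Pooling of this kind is often called stacking: the outputs of several base models become the input features of a second-level logistic regression. The following is a minimal sketch, assuming each base model emits a publish probability. The numbers are invented for illustration, and the plain gradient-descent fit is a stand-in for whatever solver the team actually used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this row drives the log-loss gradient.
            err = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def pooled_probability(weights, bias, x):
    """Combine base-model outputs into one publish probability."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical publish probabilities from three base models, one row per comment.
BASE_OUTPUTS = [
    [0.9, 0.8, 0.6],
    [0.8, 0.7, 0.9],
    [0.7, 0.9, 0.4],
    [0.2, 0.3, 0.6],
    [0.1, 0.4, 0.5],
    [0.3, 0.2, 0.7],
]
LABELS = [1, 1, 1, 0, 0, 0]  # moderator decisions: 1 = publish, 0 = reject

weights, bias = fit_logistic(BASE_OUTPUTS, LABELS)
```

In this toy data the third model carries little signal, so the fit should lean mostly on the first two, which is the practical appeal of learning the pooling weights instead of averaging.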
Pool for Better Results

Model decision on goldset approved comments

Model decision on goldset rejected comments
Further Steps

Improve the training data set

Data gathered within moderators' normal workflow

More votes per comment

More comments

Per-vertical models

Incorporate comment-to-article similarity
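
One simple way to measure comment-to-article similarity is the cosine similarity of bag-of-words vectors. The slides do not name a method, so this is an assumed choice for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine_similarity(comment, article):
    """Cosine of the angle between the two token-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(comment), bag_of_words(article)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical usage: the score could be fed to the classifier as one more feature.
on_topic = cosine_similarity("spending cuts hurt", "spending cuts face backlash")
off_topic = cosine_similarity("buy cheap watches", "spending cuts face backlash")
```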
In addition to saving his
own life, Zimmerman likely
save a couple other lives
as well.
Thanks!

Conversation and Machine Learning teams

We are hiring!
– maggie.xiong@huffingtonpost.com
