Empirical Sentiment Accuracy Bounds

Published in: Technology, Business

Transcript

  • 1. On Empirical Sentiment Accuracy Bounds. Shawn Rutledge, Chief Scientist
  • 2. Visible’s Sentiment Approach. Visible was one of the first social media monitoring solutions in the market. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; social NLP & context. Data: massive proprietary data. Copyright © 2011 Visible. All rights reserved.
  • 3. Visible’s Sentiment Approach. We have 10s of millions of human annotated social media posts. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; social NLP & context. Data: massive proprietary data.
  • 4. Visible’s Sentiment Approach. Basically all breakthroughs in the last two decades have come from better data. A sentiment model based on years of labeling social data for enterprises: 10^7+ labels, 10^5+ topics, 10^2+ enterprises. Algorithms: state of the art; beyond overhyped NLP. Features: deep experience; social NLP & context. Data: massive proprietary data.
  • 5. Sentiment, The Accuracy Disconnect. There is a disconnect between the hype and the experience in the marketplace. Claims: “We have 97% Accuracy”. Experience: “The best vendor tested had 50% accuracy at the post level”. Experience: sentiment accuracy is the most dissatisfying feature according to Forrester research; only 45% are satisfied with vendor sentiment accuracy.
  • 6. Key Findings. After spending several years of research with the best available data, here are some of the key findings. 1. Solve relevance first, sentiment second. 2. Accuracy is the wrong measure to optimize. 3. Sentiment is more subjective than you think it is.
  • 7. Key Findings. 1. Solve relevance first, sentiment second. 2. Accuracy is the wrong measure to optimize. 3. Sentiment is more subjective than you think it is. We won’t have time to cover the first two. The third could be an alternate title for this talk.
  • 8. Audit Findings, Large Financial Institution. A typical study. Double Blind, Multi-Reviewer Study: 1. Same posts labeled by both human labeling practice and automation. 2. At least two auditors grade each label. Blind to label source. Result: no statistically significant difference between human labeled and AI labeled sentiment. Reviewers can’t tell the difference between Visible’s statistical models and human annotators.
  • 9. Audit Findings, Large Financial Institution. Double Blind, Multi-Reviewer Study: 1. Same posts labeled by both human labeling practice and automation. 2. At least two auditors grade each label. Blind to label source. Result: no statistically significant difference between human labeled and AI labeled sentiment. So is Sentiment “solved”? No. Auditors think people and automation are both poor. And they don’t agree with each other. But… auditors agree with each other only 73% of the time [95% CI: 69%-77%].
  • 10. Key Audit Findings, Large Financial Institution. Social Media Professionals Grading Human Annotations. Another way of looking at the same study. Both auditors agree with label only 58% of the time (proxy for “hard” graders). At least one auditor agrees with label 91% of the time (proxy for “easy” graders). 58% - 91% is a huge range.
  • 11. True Across a Wide Variety of Problems. Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates. About 81% Inter-Annotator Agreement [IQR: 78% - 83%]. This talk promised bounds and here they are.
  • 12. True Across a Wide Variety of Problems. Multi-Reviewer 3rd party audits across a variety of Brands consistently show relatively low agreement rates. About 81% Inter-Annotator Agreement [IQR: 78% - 83%]. 80% is also consistent with academic research.
  • 13. Take Aways. 1. Yes, your team 2. Evaluating sentiment takes care 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit) 4. It will take effort to get your team in tight agreement on sentiment definitions 5. Real breakthroughs in sentiment accuracy will come from personalization. We all think we’re better than average drivers. Similarly, although most of us have heard something like the 80% agreement statistic, we don’t think it applies to us. The main thing I want you to take away from this talk is that it does apply to you. People within your department, your team, sitting in the cube next to you, disagree with you about 20% of the time.
  • 14. Take Aways. 1. Yes, your team 2. Evaluating sentiment takes care 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit) 4. It will take effort to get your team in tight agreement on sentiment definitions 5. Real breakthroughs in sentiment accuracy will come from personalization. The implications are also worth taking to heart. When people claim accuracies much higher than 80% they are either lying or they don’t know what they are doing (overfit to one dataset).
  • 15. Take Aways. 1. Yes, your team 2. Evaluating sentiment takes care 3. Accuracy claims in the 90s are either exaggerated or naïve (over-fit) 4. It will take effort to get your team in tight agreement on sentiment definitions 5. Real breakthroughs in sentiment accuracy will come from personalization. Similar to what has happened in Search, real breakthroughs will come through personalization. Deeper linguistics (dealing with sarcasm, humor, contextual knowledge) are interesting but can’t help break the 80% barrier. If teams put the work into getting tight, consistent sentiment definitions (with >80% agreement), only then do algorithms have a chance to do that well.
  • 16. Thank You! @shawnrut @Visible VisibleTechnologies.com
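
The agreement figures quoted in slides 9 and 10 can be illustrated with a minimal sketch. It assumes each label was graded by exactly two auditors and that a grade is a simple accept/reject boolean; the function names, the toy data, and the choice of a Wilson score interval are assumptions made for illustration, since the talk does not say how its confidence interval was computed.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("need at least one observation")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def audit_rates(grades):
    """Summarize a two-auditor audit.

    grades: list of (auditor_a_accepts, auditor_b_accepts) booleans,
    one pair per graded sentiment label (hypothetical input format).
    """
    n = len(grades)
    same = sum(a == b for a, b in grades)    # auditors agree with each other
    both = sum(a and b for a, b in grades)   # "hard" grading: both accept the label
    either = sum(a or b for a, b in grades)  # "easy" grading: at least one accepts
    return {
        "inter_auditor_agreement": same / n,
        "agreement_ci95": wilson_interval(same, n),
        "both_accept": both / n,
        "at_least_one_accepts": either / n,
    }

# Toy grades for illustration only; these are not data from the study.
toy = [(True, True), (True, False), (False, False), (True, True), (False, True)]
rates = audit_rates(toy)
print(rates)
```

With a real audit, the gap between `both_accept` and `at_least_one_accepts` shows how much the reported "accuracy" depends on how strict the grading rule is.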

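The cross-brand summary in slides 11 and 12 (a typical agreement rate with an interquartile range) can be sketched the same way. The per-audit rates below are hypothetical placeholders, and the "inclusive" quantile method is an assumption; the talk does not state which quartile convention was used.

```python
import statistics

def median_and_iqr(agreement_rates):
    """Return (median, (q1, q3)) for a list of per-audit agreement rates."""
    q1, q2, q3 = statistics.quantiles(agreement_rates, n=4, method="inclusive")
    return q2, (q1, q3)

# Hypothetical per-audit agreement rates, for illustration only.
toy_rates = [0.76, 0.78, 0.80, 0.81, 0.82, 0.83, 0.85]
median, (q1, q3) = median_and_iqr(toy_rates)
print(f"median={median:.2f}, IQR=[{q1:.3f}, {q3:.3f}]")
```

Reporting the interquartile range alongside the typical value shows how tightly the audits cluster, which is what makes the roughly 80% figure usable as a bound.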