Sentiment, News, and the Polarity Problem, Leslie Barrett


Although sentiment analysis has a strong history of success on customer feedback and certain blogs and editorials, accuracy results are mixed for data in the absence of an opinion holder. In particular, news data poses some unique challenges for sentiment analysis due to the blending of what I will call "objective" polarity with opinion-based polarity. How is (document-level) sentiment to be determined, for example, in an article about the Haitian earthquake that discusses humanitarian aid? Similarly, an article about Bernard Madoff's jail sentence shows a highly negative "objective polarity" somehow mitigated by a subsequent action. And how can we tease an author's opinion from the semantics of objective polarity where both exist in news data? Author opinion (often referred to as "bias") in news data is, by design, subtle in its expression. This talk discusses the grounding of the concept of "sentiment" within the greater context of the Semantics of Opposition.



  1. Sentiment, News and the Polarity Problem
     Leslie Barrett
     April 13, 2010
  2. Sentiment and Opinion
     - Are sentiment and opinion the same? Are feelings the same as beliefs?
     - Sentiment can be applied to opinion, but not the other way around (Kim and Hovy 2004).
     - The question is: should it apply to anything else? Does it make sense in narrative, exposition, or news data?
     - How much text should we apply it to?
  3. Sources
     - Sentiment analysis has been applied where opinion is the norm: blogs and tweets.
     - It has also been applied where opinion is designed to be subtle, if expressed at all: news data.
     - So maybe news data is never really objective, or else maybe sentiment is really used as simple polarity, separating the world into human ideas of positive and negative "buckets", blind to objectivity.
  4. Polarity
     - Polarity is the stuff through which sentiment is measured.
     - Sentiment is usually considered to have the "poles" positive and negative.
     - These are most often "translated" into "good" and "bad".
     - Sentiment analysis is considered useful for telling us what is "good" and "bad" in our information stream.
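The positive/negative "bucketing" described on this slide can be sketched as a minimal lexicon-based scorer. This is an illustrative sketch only — the word lists and function names are my own, not from the talk:

```python
# Minimal lexicon-based polarity scorer: counts positive and negative
# cue words and maps the balance onto "good"/"bad" buckets.
POSITIVE = {"gain", "growth", "relief", "recovery", "beat"}
NEGATIVE = {"crash", "downturn", "disaster", "loss", "fraud"}

def polarity(text: str) -> str:
    words = [w.strip(".,!?\"'").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Markets crash as downturn deepens"))  # negative
print(polarity("Strong growth beats expectations"))   # positive
```

Real systems use far larger lexicons and weighting, but the slide's point survives even in this toy version: the scorer is blind to who holds the opinion and what the text is about.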
  5. The "Machine"
     - So the sentiment analysis machine takes in some text and tells us whether that text says something "good" or "bad".
     - OK... but before we unveil our machine, we need to ask some important but often overlooked questions:
       - What text is going in?
       - Where does "good" stop and "bad" begin?
       - What is the text "about"?
  6. Why do we need Sentiment Analysis, Beavis?
     - So we'll know what we're thinking!
  7. Let's Try Feeding the Machine News Data!
     - News headlines sound like a pretty straightforward text type to apply sentiment to, given what we've just said.
     - Even though news is supposed to be "objective", headlines sell papers and can often be dramatic.
     - Keywords like "crash", "downturn" and "disaster" are abundant and strong sentiment indicators.
       - But are headlines enough? We may want document-level sentiment for news.
       - Does it matter what the news is "about"?
  8. Some "real" headlines
     - Short-lived
     - Coup
     - Disappoints
     - Bears
  9. Beware of Headlines in Financial News
     - Financial news especially is really a genre unto itself.
     - Its polarity perspective is constantly skewed by pundit "benchmarking".
     - Beating bad expectations is better than a good quarter that falls short, in pundit opinion.
  10. Can Sentiment Analysis "Beat Expectations"?
     - All kinds of negatives here, but the document-level sentiment should be positive: that's how an analyst would see it.
     - So if you skew to this, what about other news?
  11. Objectively "Bad" Events Happen
     - Some events don't require an opinion holder.
     - They simply have a generally agreed-upon negative or positive polarity.
     - And we need to get them right because they affect other events (e.g. crop yields).
  12. When Bad Things Happen to Positive Sentiment
     - But objectively bad events have their own problems, even in the absence of "expectations".
     - The problem with polarity measures outside the presence of an opinion holder is topic drift.
     - An editorial or blog is likely to stick to one sentiment, but bad events can have the dreaded "silver lining".
  13. Disaster + Relief Can Spell Trouble
     - Despite some strong negative polarity indicators like "traumatized", "disaster" and "tsunami", this article has an overall positive theme.
  14. Don't Quote Me!
     - Another problem in news data is "opinion blend".
     - Often you have an author's opinion alongside other opinions that may differ, directly or indirectly cited.
     - Or an author using quotes to showcase two different opinions.
     - Coverage of a "debate", for example, can be very difficult for even a human to judge.
  15. Attribution vs. Quoting
     - The author clearly does not believe the positive topic of the article.
     - But Clinton believes it.
     - So is this positive sentiment about Clinton?
  16. Pundits vs. Authors vs. Topics
     - How can I be sure that "bad news" about my client is about my client?
       - Make sure the named entity in question is a topic of the document; so-called "document mates" don't matter.
     - Do author names matter? Should I extract them?
       - Yes! Over time, if you classify by author name against other entities, you might detect bias.
       - Do the same for known "pundits" on a topic; the same result may emerge.
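The author-tracking idea on this slide can be approximated by accumulating polarity scores per (author, entity) pair over time; a persistently skewed mean hints at bias. A sketch under my own assumptions — the names and scores below are hypothetical, and I assume documents have already been scored elsewhere:

```python
from collections import defaultdict

# Accumulate polarity scores per (author, entity) pair; a consistently
# skewed mean across many documents suggests possible author bias.
scores = defaultdict(list)

def record(author: str, entity: str, score: float) -> None:
    scores[(author, entity)].append(score)

def mean_bias(author: str, entity: str) -> float:
    vals = scores[(author, entity)]
    return sum(vals) / len(vals) if vals else 0.0

record("A. Pundit", "AcmeCorp", -0.8)
record("A. Pundit", "AcmeCorp", -0.6)
record("A. Pundit", "AcmeCorp", -0.7)
print(mean_bias("A. Pundit", "AcmeCorp"))  # ≈ -0.7
```

The same tally works for known pundits on a topic, as the slide suggests; the interesting signal is the skew relative to other authors covering the same entity.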
  17. 17. What’s it all About?<br />Some data just tends to be multi-thematic or non-thematic<br />In particular, market and financial reports, which often make their way into news feeds, tend to be this way.<br />It is very hard to get a reasonable sentiment reading on either type of document. <br />
  18. 18. SEC Reports: too big, too many sections<br />There is the Management Discussion, which can have appropriate sentiment scores<br />But there are so many other sections, no single theme<br />Many sections have boilerplate, such as the accounting review<br />
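When sections are tagged, restricting scoring to the Management Discussion is straightforward. A sketch assuming a simplified, hypothetical tag scheme (real SEC filings use richer markup than this):

```python
import re

# Pull only the Management Discussion section out of a tagged filing,
# dropping boilerplate sections before any sentiment scoring.
filing = """
<section name="risk_factors">Standard boilerplate ...</section>
<section name="management_discussion">Revenue grew strongly ...</section>
<section name="accounting_review">Standard boilerplate ...</section>
"""

def extract_section(text: str, name: str) -> str:
    pattern = rf'<section name="{name}">(.*?)</section>'
    m = re.search(pattern, text, flags=re.DOTALL)
    return m.group(1).strip() if m else ""

print(extract_section(filing, "management_discussion"))  # Revenue grew strongly ...
```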
  19. Scraping
     - Your data is only as good as your news feed.
     - Sometimes a site will deliver excess content that creeps into the text field of a feed.
     - That content could be an ad or even another article, skewing the sentiment reading for the expected article and hurting topic detection too.
  20. Field Overlap from a Typical News Page
  21. What to Do?
     - Stop doing sentiment analysis on news data? NO!
     - News data is very valuable for reputation management.
     - It can also be valuable for investment firms *if* you can tease out the jargon and pundit-speak.
     - Document-level is still OK!
  22. Best Practices
     - Good topic detection: see what's closely aligned with a theme and eliminate non-thematic or weakly thematic documents.
     - Good feed maintenance: you or your feed provider need to spot-check for scraping problems.
  23. Tricks & Tips
     - Data extraction for problem documents:
       - If document sections are identified with tags, use them (this is true for SEC reports) and extract the "good" data (see Pang and Lee 2004 on extracting document portions).
       - Write regular-expression libraries to find quoted and cited material; remove it or use it separately.
     - Topic drift is harder, but...
       - You can extract the first n paragraphs; the main topical material in news is generally in the top 25% of the document.
       - Secondary topics don't carry the same weight.
  
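Two of the tricks on this slide — regex filtering of quoted material and keeping only the leading paragraphs — might look like the sketch below. The patterns and the 25% threshold follow the slide, but everything else is illustrative:

```python
import re

# Quoted spans; a real library would also cover smart quotes and citation verbs.
QUOTE_RE = re.compile(r'"[^"]*"')

def strip_quotes(text: str) -> str:
    """Remove directly quoted material so cited opinions don't skew the score."""
    return QUOTE_RE.sub("", text)

def leading_portion(paragraphs: list[str], fraction: float = 0.25) -> list[str]:
    """Keep the top fraction of paragraphs, where the main topic usually lives."""
    n = max(1, round(len(paragraphs) * fraction))
    return paragraphs[:n]

doc = ["Lead paragraph.", "Second.", "Third.", "Fourth.",
       "Fifth.", "Sixth.", "Seventh.", "Eighth."]
print(leading_portion(doc))                           # ['Lead paragraph.', 'Second.']
print(strip_quotes('He said "it is great" today.'))   # He said  today.
```

As the slide notes, quoted material can also be scored separately rather than discarded, e.g. to attribute each quoted opinion to its holder.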
  24. 24. What’s Next for Polarity?<br />Future directions for news-based sentiments analysis are based on looking outside of Positive and Negative poles<br />Think about all the “opposites” in the world<br />Sweet/sour<br />Cold/hot<br />Inside/outside<br />Wet/dry<br />Hard/soft<br />
  25. 25. Leverage the Semantics of Opposition<br />There are many types of opposition to study and they can be used in different ways<br />Complementary opposites (male,female)<br />Reversatives (backwards, forwards)<br />Scalar opposites (tall, short)<br />A good deal of semantic research that has yet to be leveraged for opinion analysis and classification (Mettinger, Pustejovsky, Kennedy, Miller, inter alia…)<br />
  26. 26. Opposites and Opinions<br />Let’s think of some opinions that fit into poles not definable in terms of “positive” and “negative”<br />Conserative vs. Liberal<br />Government Expansion vs. Privatization<br />Can these positions be detected automatically?<br />………..<br />
  27. 27. Appendix/Bibliography<br />Kim, Soo-Min and Eduard Hovy. 2004. Determining the Sentiment of Opinions. Proceedings of COlING-04. pp. 1367--1373. Geneva, Switzerland. <br />James Pustejovsky, "Events and the Semantics of Opposition" in Events as Grammatical Objects , C. Tenny and J. Pustejovsky (eds.), 2000, CSLI Publications. <br />Arthur Mettinger, Aspects of Semantic Opposition in English, Clarendon Press, Oxford, 1994<br />Bo Pang and Lillian Lee, “A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts”, In Proceedings of the Association for Computational Linguistics, 2004<br />