Practical Sentiment Analysis

An informative tutorial on practical sentiment analysis, natural language processing, and semi-supervised learning. Learn how your company can leverage the crowd for sentiment analysis of structured and unstructured content.


Dr. Jason Baldridge is co-founder and Chief Scientist at People Pattern, and Associate Professor of Linguistics at the University of Texas at Austin.


Practical Sentiment Analysis

  1. 1. Practical Sentiment Analysis Tutorial Jason Baldridge @jasonbaldridge Sentiment Analysis Symposium 2014 Associate Professor Co-founder & Chief Scientist Wednesday, March 5, 14
  2. 2. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 About the presenter Associate Professor, Linguistics Department, The University of Texas at Austin (2005-present) Ph.D., Informatics, The University of Edinburgh, 2002 MA (Linguistics), MSE (Computer Science), The University of Pennsylvania, 1998 Co-founder & Chief Scientist, People Pattern (2013-present) Built Converseon’s Convey text analytics engine, with Philip Resnik and Converseon programmers. 2 Wednesday, March 5, 14
  3. 3. Why NLP is hard Sentiment analysis overview Document classification break Aspect-based sentiment analysis Visualization Semi-supervised learning break Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  4. 4. Why NLP is hard Sentiment analysis overview Document classification Aspect-based sentiment analysis Visualization Semi-supervised learning Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  5. 5. Text is pervasive = big opportunities Wednesday, March 5, 14
  6. 6. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Texts as bags of words (with apologies) (http://www.wordle.net/) 6 Wednesday, March 5, 14
  7. 7. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Texts as bags of words (with apologies) (http://www.wordle.net/) http://www.wired.com/magazine/2010/12/ff_ai_essay_airevolution/ 6 Wednesday, March 5, 14
  8. 8. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 That is of course not the full story... Texts are not just bags-of-words. Order and syntax affect interpretation of utterances. 7 [figure: the words of “the dog bit the man on the leg” scattered as an unordered bag] Wednesday, March 5, 14
  12. 12. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 That is of course not the full story... Texts are not just bags-of-words. Order and syntax affect interpretation of utterances. 7 [figure: one reading of the sentence, with Subject, Object, and Modifier relations marked] Wednesday, March 5, 14
  13. 13. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 That is of course not the full story... Texts are not just bags-of-words. Order and syntax affect interpretation of utterances. 7 [figure: an alternative reading, with Subject, Object, and Location relations marked] Wednesday, March 5, 14
  14. 14. What does this sentence mean? I saw her duck with a telescope. Slide by Lillian Lee Wednesday, March 5, 14
  15. 15. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] I saw her duck with a telescope. Slide by Lillian Lee Wednesday, March 5, 14
  16. 16. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] I saw her duck with a telescope. verb Slide by Lillian Lee Wednesday, March 5, 14
  21. 21. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. verb Slide by Lillian Lee Wednesday, March 5, 14
  22. 22. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. verb noun Slide by Lillian Lee Wednesday, March 5, 14
  27. 27. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. verb noun [http://www.clker.com/clipart-green-eyes-3.html] Slide by Lillian Lee Wednesday, March 5, 14
  28. 28. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. verb noun [http://www.clker.com/clipart-3163.html] Slide by Lillian Lee Wednesday, March 5, 14
  29. 29. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. verb noun [http://www.simonpalfrader.com/category/tournament-poker] Slide by Lillian Lee Wednesday, March 5, 14
  30. 30. What does this sentence mean? [http://casablancapa.blogspot.com/2010/05/fore.html] [http://www.supercoloring.com/pages/duck-outline/] I saw her duck with a telescope. verb noun [http://casablancapa.blogspot.com/2010/05/fore.html] Slide by Lillian Lee Wednesday, March 5, 14
  31. 31. Ambiguity is pervasive the a are of I [Steve Abney] Slide by Lillian Lee Wednesday, March 5, 14
  32. 32. Ambiguity is pervasive the a are of I [Steve Abney] an “are” (100 m2) another “are” Slide by Lillian Lee Wednesday, March 5, 14
  33. 33. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 And it goes further... Rhetorical structure affects the interpretation of the text as a whole. 10 Max fell. John pushed him. Wednesday, March 5, 14
  34. 34. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 And it goes further... Rhetorical structure affects the interpretation of the text as a whole. 10 Max fell. John pushed him. Max fell. John pushed him. Wednesday, March 5, 14
  35. 35. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 And it goes further... Rhetorical structure affects the interpretation of the text as a whole. 10 Max fell. John pushed him.(Because) Explanation Max fell. John pushed him. Wednesday, March 5, 14
  36. 36. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 And it goes further... Rhetorical structure affects the interpretation of the text as a whole. 10 Max fell. John pushed him.(Because) Explanation Max fell. John pushed him.(Then) Continuation Wednesday, March 5, 14
  37. 37. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 And it goes further... Rhetorical structure affects the interpretation of the text as a whole. 10 Max fell. John pushed him.(Because) Explanation Max fell. John pushed him.(Then) Continuation pushing precedes falling Wednesday, March 5, 14
  38. 38. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 And it goes further... Rhetorical structure affects the interpretation of the text as a whole. 10 Max fell. John pushed him.(Because) Explanation Max fell. John pushed him.(Then) Continuation pushing precedes falling falling precedes pushing Wednesday, March 5, 14
  39. 39. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  40. 40. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] To get a spare tire (donut) for his car? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  41. 41. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] store where donuts shop? or is run by donuts? or looks like a big donut? or made of donut? or has an emptiness at its core? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  42. 42. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] I stopped smoking freshman year, but John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  43. 43. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] Describes where the store is? Or when he stopped? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  44. 44. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] Well, actually, he stopped there from hunger and exhaustion, not just from work. John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  45. 45. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] At that moment, or habitually? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  46. 46. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] That’s how often he thought it? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  47. 47. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] But actually, a coffee only stays good for about 10 minutes before it gets cold. John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  48. 48. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] Similarly: In America a woman has a baby every 15 minutes. Our job is to find that woman and stop her. John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  49. 49. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] the particular coffee that was good every few hours? the donut store? the situation? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  50. 50. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] too expensive for what? what are we supposed to conclude about what John did? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  51. 51. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 201411 What’s hard about this story? [Slide from Jason Eisner] how do we connect “it” to “expensive”? John stopped at the donut store on his way home from work. He thought a coffee was good every few hours. But it turned out to be too expensive there. Wednesday, March 5, 14
  52. 52. NLP has come a long way Wednesday, March 5, 14
  53. 53. Sentiment analysis overview Why NLP is hard Document classification Aspect-based sentiment analysis Visualization Semi-supervised learning Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  54. 54. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Sentiment analysis: background [slide from Lillian Lee] People search for and are affected by online opinions. TripAdvisor, Rotten Tomatoes, Yelp, Amazon, eBay, YouTube, blogs, Q&A and discussion sites According to a Comscore ’07 report and an ’08 Pew survey: 60% of US residents have done online product research, and 15% do so on a typical day. 73%-87% of US readers of online reviews of services say the reviews were significant influences. (more on economics later) But, 58% of US internet users report that online information was missing, impossible to find, confusing, and/or overwhelming. Creating technologies that find and analyze reviews would answer a tremendous information need. 14 Wednesday, March 5, 14
  55. 55. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Broader implications: economics [slide from Lillian Lee] Consumers report being willing to pay from 20% to 99% more for a 5-star-rated item than a 4-star-rated item. [comScore] But, does the polarity and/or volume of reviews have measurable, significant influence on actual consumer purchasing? Implications for bang-for-the-buck, manipulation, etc. 15 Wednesday, March 5, 14
  56. 56. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Social media analytics: acting on sentiment 16 Richard Lawrence, Prem Melville, Claudia Perlich, Vikas Sindhwani, Estepan Meliksetian et al. In ORMS Today, Volume 37, Number 1, February, 2010. Wednesday, March 5, 14
  57. 57. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Polarity classification [slide from Lillian Lee] Consider just classifying an avowedly subjective text unit as either positive or negative (“thumbs up” or “thumbs down”). One application: review summarization. Elvis Mitchell, May 12, 2000: It may be a bit early to make such judgments, but Battlefield Earth may well turn out to be the worst movie of this century. Can’t we just look for words like “great”, “terrible”, “worst”? Yes, but ... learning a sufficient set of such words or phrases is an active challenge. 17 Wednesday, March 5, 14
  58. 58. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Using a lexicon [slide from Lillian Lee] From a small scale human study: 18 Proposed word lists Accuracy Subject 1 Positive: dazzling, brilliant, phenomenal, excellent, fantastic Negative: suck, terrible, awful, unwatchable, hideous 58% Subject 2 Positive: gripping, mesmerizing, riveting, spectacular, cool, awesome, thrilling, badass, excellent, moving, exciting Negative: bad, cliched, sucks, boring, stupid, slow 64% Automatically determined (from data) Positive: love, wonderful, best, great, superb, beautiful, still Negative: bad, worst, stupid, waste, boring, ?, ! 69% Wednesday, March 5, 14
  59. 59. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Polarity words are not enough [slide from Lillian Lee] Can’t we just look for words like “great” or “terrible”? Yes, but ... This laptop is a great deal. A great deal of media attention surrounded the release of the new laptop. This laptop is a great deal ... and I’ve got a nice bridge you might be interested in. 19 Wednesday, March 5, 14
  60. 60. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Polarity words are not enough Polarity flippers: some words change positive expressions into negative ones and vice versa. Negation: America still needs to be focused on job creation. Not among Obama's great accomplishments since coming to office !! [From a tweet in 2010] Contrastive discourse connectives: I used to HATE it. But this stuff is yummmmmy :) [From a tweet in 2011 -- the tweeter had already bolded “HATE” and “But”!] Multiword expressions: other words in context can make a negative word positive: That movie was shit. [negative] That movie was the shit. [positive] (American slang from the 1990’s) 20 Wednesday, March 5, 14
  61. 61. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 More subtle sentiment (from Pang and Lee) With many texts, no ostensibly negative words occur, yet they indicate strong negative polarity. “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.” (review by Luca Turin and Tania Sanchez of the Givenchy perfume Amarige, in Perfumes: The Guide, Viking 2008.) “She runs the gamut of emotions from A to B.” (Dorothy Parker, speaking about Katharine Hepburn.) “Jane Austen’s books madden me so that I can’t conceal my frenzy from the reader. Every time I read ‘Pride and Prejudice’ I want to dig her up and beat her over the skull with her own shin-bone.” (Mark Twain.) 21 Wednesday, March 5, 14
  62. 62. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Thwarted expectations (from Pang and Lee) 22 There are also highly negative texts that use lots of positive words, but ultimately are reversed by the final sentence. For example: This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up. This is referred to as a thwarted expectations narrative because in the final sentence the author sets up a deliberate contrast to the preceding discourse, giving it more impact. Wednesday, March 5, 14
  63. 63. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Polarity classification: it’s more than positive and negative Positive: “As a used vehicle, the Ford Focus represents a solid pick.” Negative: “Still, the Focus' interior doesn't quite measure up to those offered by some of its competitors, both in terms of materials quality and design aesthetic.” Neutral: “The Ford Focus has been Ford's entry-level car since the start of the new millennium.” Mixed: “The current Focus has much to offer in the area of value, if not refinement.” 23 http://www.edmunds.com/ford/focus/review.html Wednesday, March 5, 14
  64. 64. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Other dimensions of sentiment analysis Subjectivity: is an opinion even being expressed? Many statements are simply factual. Target: what exactly is an opinion being expressed about? Important for aggregating interesting and meaningful statistics about sentiment. Also, it affects how the language used indicates polarity: e.g., unpredictable is usually positive for movie reviews, but is very negative for a car’s steering. Ratings: rather than a binary decision, it is often of interest to provide or interpret predictions about sentiment on a scale, such as a 5-star system. 24 Wednesday, March 5, 14
  65. 65. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Other dimensions of sentiment analysis Perspective: an opinion can be positive or negative depending on who is saying it entry-level could be good or bad for different people it also affects how an author describes a topic: e.g. pro-choice vs pro-life, affordable health care vs obamacare. Authority: was the text written by someone whose opinion matters more than others? it is more important to identify and address negative sentiment expressed by a popular blogger than a one-off commenter or supplier of a product reviewer on a sales site follower graphs (where applicable) are very useful in this regard Spam: is the text even valid or at least something of interest? many tweets and blog post comments are just spammers trying to drive traffic to their sites 25 Wednesday, March 5, 14
  66. 66. Document Classification Why NLP is hard Sentiment analysis overview Aspect-based sentiment analysis Visualization Semi-supervised learning Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  67. 67. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Text analysis, in brief 27 f( , ,... ) = [ , ,... ] Wednesday, March 5, 14
  68. 68. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Text analysis, in brief 27 f( , ,... ) = [ , ,... ] Sentiment labels Parts-of-speech Named Entities Topic assignments Geo-coordinates Syntactic structures Translations Wednesday, March 5, 14
  69. 69. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Text analysis, in brief 27 f( , ,... ) = [ , ,... ] Sentiment labels Parts-of-speech Named Entities Topic assignments Geo-coordinates Syntactic structures Translations Rules Annotation & Learning - annotated examples - annotated knowledge - interactive annotation and learning Scalable human annotation Wednesday, March 5, 14
  70. 70. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Document classification: automatically label some text Language identification: determine the language that a text is written in Spam filtering: label emails, tweets, blog comments as spam (undesired) or ham (desired) Routing: label emails to an organization based on which department should respond to them (e.g. complaints, tech support, order status) Sentiment analysis: label some text as being positive or negative (polarity classification) Georeferencing: identify the location (latitude and longitude) associated with a text 28 Wednesday, March 5, 14
  71. 71. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Desiderata for text analysis function f task is well-defined outputs are meaningful precision, recall, etc. are measurable and sufficient for desired use 29 Performant Wednesday, March 5, 14
  72. 72. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Desiderata for text analysis function f task is well-defined outputs are meaningful precision, recall, etc. are measurable and sufficient for desired use 29 Performant affordable access to annotated examples and/or knowledge sources able to exploit indirect or noisy annotations access to unlabeled examples and ability to exploit them tools to learn f are available or can be built within budget Reasonable cost (time & money) Wednesday, March 5, 14
  73. 73. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Four sentiment datasets 30 Dataset Topic Year # Train # Dev #Test Reference Debate08 Obama vs McCain debate 2008 795 795 795 Shamma et al. (2009) "Tweet the Debates: Understanding Community Annotation of Uncollected Sources." HCR Health care reform 2010 839 838 839 Speriosu et al. (2011) "Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph." STS (Stanford) Twitter Sentiment 2009 - 216 - Go et al. (2009) "Twitter sentiment classification using distant supervision" IMDB IMDB movie reviews 2011 25,000 25,000 - Maas et al. (2011) "Learning Word Vectors for Sentiment Analysis" Wednesday, March 5, 14
  74. 74. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Rule-based classification Identify words and patterns that are indicative of positive or negative sentiment: polarity words: e.g. good, great, love; bad, terrible, hate polarity ngrams: the shit (+), must buy (+), could care less (-) casing: uppercase often indicates subjectivity punctuation: lots of ! and ? indicates subjectivity (often negative) emoticons: smiles like :) are generally positive, while frowns like :( are generally negative Use each pattern as a rule; if present in the text, the rule indicates whether the text is positive or negative. How to deal with conflicts? (E.g. multiple rules apply, but indicate both positive and negative?) Simple: count number of matching rules and take the max. 31 Wednesday, March 5, 14
  75. 75. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Simplest polarity classifier ever? 32 def polarity(document) = if (document contains “good”) positive else if (document contains “bad”) negative else neutral Debate08 HCR STS IMDB 20.5 21.6 19.4 27.4 No better than flipping a (three-way) coin? Code and data here: https://github.com/utcompling/sastut Wednesday, March 5, 14
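A runnable rendering of that "simplest classifier" pseudocode: the following is a minimal Python sketch for illustration only; the tutorial's own code and data live at the GitHub link on the slide.

```python
def polarity(document):
    """Label a document by two hard-coded keyword rules, as on the slide."""
    tokens = document.lower().split()
    if "good" in tokens:
        return "positive"
    elif "bad" in tokens:
        return "negative"
    else:
        return "neutral"

print(polarity("That was a good debate"))     # positive
print(polarity("Bad call by the moderator"))  # negative
print(polarity("Watching the debate now"))    # neutral
```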
  76. 76. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 + is positive, - is negative, ~ is neutral Wednesday, March 5, 14
  77. 77. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Corpus labels + is positive, - is negative, ~ is neutral Wednesday, March 5, 14
  78. 78. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Corpus labels Machine predictions + is positive, - is negative, ~ is neutral Wednesday, March 5, 14
  79. 79. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Total count of documents in the corpus Corpus labels Machine predictions + is positive, - is negative, ~ is neutral Wednesday, March 5, 14
  80. 80. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Total count of documents in the corpus Corpus labels Machine predictions Correct predictions + is positive, - is negative, ~ is neutral Wednesday, March 5, 14
  81. 81. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Total count of documents in the corpus Corpus labels Machine predictions Correct predictions + is positive, - is negative, ~ is neutral (5+140+18)/795 = 0.205 Wednesday, March 5, 14
  82. 82. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Total count of documents in the corpus Corpus labels Machine predictions Correct predictions Incorrect predictions + is positive, - is negative, ~ is neutral (5+140+18)/795 = 0.205 Wednesday, March 5, 14
  83. 83. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Total count of documents in the corpus Corpus labels Machine predictions Column showing outcomes of documents labeled negative by the machine Correct predictions Incorrect predictions + is positive, - is negative, ~ is neutral (5+140+18)/795 = 0.205 Wednesday, March 5, 14
  84. 84. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The confusion matrix We need to look at the confusion matrix and breakdowns for each label. For example, here it is for Debate08: 33 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 Total count of documents in the corpus Corpus labels Machine predictions Row showing outcomes of documents labeled negative in the corpus Column showing outcomes of documents labeled negative by the machine Correct predictions Incorrect predictions + is positive, - is negative, ~ is neutral (5+140+18)/795 = 0.205 Wednesday, March 5, 14
  85. 85. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Precision, Recall, and F-score: per category scores Precision: the number of correct guesses (true positives) for the category divided by all guesses for it (true positives and false positives) Recall: the number of correct guesses (true positives) for the category divided by all the true documents in that category (true positives plus false negatives) F-score: derived measure combining precision and recall. 34 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 P R F - ~ + Avg 83.3 1.1 2.2 18.3 99.3 30.1 72.0 9.0 16.0 57.9 36.5 16.4 P = TP/(TP+FP) R = TP/(TP+FN) F = 2PR/(P+R) P~ = 140/(140+442+182) = .183 R- = 5/(5+442+7) = .011 F+ = (2 × .72 × .09)/(.72+.09) = .16 Wednesday, March 5, 14
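The per-category scores above can be recomputed directly from the confusion matrix; a small Python sketch using the Debate08 counts from the slide (rows are corpus labels, columns are machine predictions):

```python
# Debate08 confusion matrix; rows = corpus (gold) labels, columns = machine predictions.
labels = ["-", "~", "+"]
confusion = [
    [5, 442, 7],    # gold negative
    [1, 140, 0],    # gold neutral
    [0, 182, 18],   # gold positive
]

for i, label in enumerate(labels):
    tp = confusion[i][i]
    fp = sum(confusion[r][i] for r in range(len(labels))) - tp  # column total minus TP
    fn = sum(confusion[i]) - tp                                 # row total minus TP
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    # Prints precision, recall, and F for each label (cf. the table above).
    print(f"{label}  P={p:.3f}  R={r:.3f}  F={f:.3f}")
```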
  86. 86. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 What does it tell us? Overall accuracy is low, because the model overpredicts neutral. Precision is pretty good for negative, and okay for positive. This means the simple rules “has the word ‘good’” and “has the word ‘bad’” are good predictors. 35 - ~ + - ~ + 5 442 7 454 1 140 0 141 0 182 18 200 6 764 25 795 P R F - ~ + Avg 83.3 1.1 2.2 18.3 99.3 30.1 72.0 9.0 16.0 57.9 36.5 16.4 Wednesday, March 5, 14
  87. 87. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Where do the rules go wrong? Confusion matrix for STS: 36 The one negative-labeled tweet that is actually positive, using the very positive expression “bad ass” (thus matching “bad”). Booz Allen Hamilton has a bad ass homegrown social collaboration platform.Way cool! #ttiv - ~ + - ~ + 0 73 2 75 0 31 2 33 1 96 11 108 1 200 15 216 Wednesday, March 5, 14
  88. 88. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 A bigger lexicon (rule set) and a better rule Good improvements for five minutes of effort! Why such a large improvement for IMDB? 37 pos_words = {"good","awesome","great","fantastic","wonderful"} neg_words = {"bad","terrible","worst","sucks","awful","dumb"} def polarity(document) = num_pos = count of words in document also in pos_words num_neg = count of words in document also in neg_words if (num_pos == 0 and num_neg == 0) neutral else if (num_pos > num_neg) positive else negative Debate08 HCR STS IMDB Super simple Small lexicon 20.5 21.6 19.4 27.4 21.5 22.1 25.5 51.4 Wednesday, March 5, 14
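The counting rule above, made runnable as a Python sketch (ties with at least one lexicon hit fall to negative, matching the pseudocode):

```python
pos_words = {"good", "awesome", "great", "fantastic", "wonderful"}
neg_words = {"bad", "terrible", "worst", "sucks", "awful", "dumb"}

def polarity(document):
    """Count positive vs. negative lexicon hits and take the majority."""
    tokens = document.lower().split()
    num_pos = sum(1 for t in tokens if t in pos_words)
    num_neg = sum(1 for t in tokens if t in neg_words)
    if num_pos == 0 and num_neg == 0:
        return "neutral"
    return "positive" if num_pos > num_neg else "negative"

print(polarity("awesome plot but terrible acting and a dumb ending"))  # negative
```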
  89. 89. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 IMDB: no neutrals! Data is from 10 star movie ratings (>=7 are pos, <= 4 are neg) Compare the confusion matrices! 38 “Good/Bad” rule Small lexicon with counting rule - ~ + - ~ + 2324 5476 4700 12500 0 0 0 0 651 7325 4524 12500 2975 12801 9224 25000 Accuracy: 27.4 - ~ + - ~ + 5744 3316 3440 12500 0 0 0 0 1147 4247 7106 12500 6891 7563 10546 25000 Accuracy: 51.4 Wednesday, March 5, 14
  90. 90. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Sentiment lexicons: Bing Liu’s opinion lexicon Bing Liu maintains and freely distributes a sentiment lexicon consisting of lists of strings. Distribution page (direct link to rar archive) Positive words: 2006 Negative words: 4783 Useful properties: includes mis-spellings, morphological variants, slang, and social-media mark-up Note: may be used for academic and commercial purposes. 39 Slide by Chris Potts Wednesday, March 5, 14
  91. 91. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Sentiment lexicons: MPQA The MPQA (Multi-Perspective Question Answering) Subjectivity Lexicon is maintained by Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (Wiebe, Wilson, and Cardie 2005). Note: distributed under a GNU Public License (not suitable for most commercial uses). 40 Slide by Chris Potts Wednesday, March 5, 14
  92. 92. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Other sentiment lexicons SentiWordNet (Baccianella, Esuli, and Sebastiani 2010) attaches positive and negative real-valued sentiment scores to WordNet synsets (Fellbaum 1998). Note: recently changed license to permissive, commercial-friendly terms. Harvard General Inquirer is a lexicon attaching syntactic, semantic, and pragmatic information to part-of-speech tagged words (Stone, Dunphy, Smith, and Ogilvie 1966). Linguistic Inquiry and Word Count (LIWC) is a proprietary database consisting of a lot of categorized regular expressions. Its classifications are highly correlated with those of the Harvard General Inquirer. 41 Slide by Chris Potts Wednesday, March 5, 14
  93. 93. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 When you have a big lexicon, use it! 42 Debate08 HCR STS IMDB Super simple Small lexicon Opinion lexicon 20.5 21.6 19.4 27.4 21.5 22.1 25.5 51.4 47.8 42.3 62.0 73.6 Using Bing Liu’s Opinion Lexicon, scores across all datasets go up dramatically. Well above (three-way) coin-flipping! Wednesday, March 5, 14
  94. 94. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 If you don’t have a big lexicon, bootstrap one There is a reasonably large literature on creating sentiment lexicons, using various sources such as WordNet (knowledge source) and review data (domain-specific data source). Advantage of review data: often able to obtain easily for many languages. See Chris Potts’ 2011 SAS tutorial for more details: http://sentiment.christopherpotts.net/lexicons.html A simple, intuitive measure is the log-likelihood ratio, which I’ll show for IMDB data. 43 Wednesday, March 5, 14
  95. 95. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Log-likelihood ratio: basic recipe Given: a corpus of positive texts, negative texts, and a held out corpus For each word in the vocabulary, calculate its probability in each corpus. E.g. for the positive corpus: P(w | pos) = count(w, pos) / total word count of the positive corpus Compute its log-likelihood ratio for positive vs negative documents: LLR(w) = log( P(w | pos) / P(w | neg) ) Rank all words from highest LLR to lowest. 44 Wednesday, March 5, 14
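A Python sketch of the recipe; the add-alpha smoothing for unseen words is an assumption on my part, since the slide does not specify its estimator:

```python
import math
from collections import Counter

def llr_ranking(pos_docs, neg_docs, alpha=1.0):
    """Rank vocabulary by LLR(w) = log P(w|pos) - log P(w|neg).

    pos_docs / neg_docs are lists of token lists; alpha is add-alpha smoothing."""
    pos_counts = Counter(tok for doc in pos_docs for tok in doc)
    neg_counts = Counter(tok for doc in neg_docs for tok in doc)
    vocab = set(pos_counts) | set(neg_counts)
    pos_total = sum(pos_counts.values()) + alpha * len(vocab)
    neg_total = sum(neg_counts.values()) + alpha * len(vocab)
    llr = {
        w: math.log((pos_counts[w] + alpha) / pos_total)
           - math.log((neg_counts[w] + alpha) / neg_total)
        for w in vocab
    }
    return sorted(llr.items(), key=lambda kv: -kv[1])  # most positive-leaning first

# ranked[:1000] and ranked[-1000:] would give IMDB1000-style positive/negative lists.
```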
  96. 96. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 LLR examples computed from IMDB reviews 45 edie 16.069394855429000 antwone 15.85538381213240 din 15.747494864581600 goldsworthy 15.552434312463400 gunga 15.536930128612500 kornbluth -15.090106131301700 kareena -15.11542393212590 tashan -15.233206936755000 hobgoblins -15.233206936755000 slater -15.318364724832600 + - Wednesday, March 5, 14
  97. 97. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 LLR examples computed from IMDB reviews 45 edie 16.069394855429000 antwone 15.85538381213240 din 15.747494864581600 goldsworthy 15.552434312463400 gunga 15.536930128612500 kornbluth -15.090106131301700 kareena -15.11542393212590 tashan -15.233206936755000 hobgoblins -15.233206936755000 slater -15.318364724832600 + - Filter: Wednesday, March 5, 14
  98. 98. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 LLR examples computed from IMDB reviews 45 edie 16.069394855429000 antwone 15.85538381213240 din 15.747494864581600 goldsworthy 15.552434312463400 gunga 15.536930128612500 kornbluth -15.090106131301700 kareena -15.11542393212590 tashan -15.233206936755000 hobgoblins -15.233206936755000 slater -15.318364724832600 + - perfection 2.204227744897310 captures 2.0551924704260400 wonderfully 2.020824971323010 powell 1.9933170865620900 refreshing 1.867299924519800 pointless -2.477406360270270 blah -2.57814744950696 waste -2.673668672544840 unfunny -2.7084876042405500 seagal -3.6618321047833000 Filter: Wednesday, March 5, 14
  99. 99. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Top 25 filtered positive and negative words using LLR on IMDB 46 perfection captures wonderfully powell refreshing flynn delightful gripping beautifully underrated superb delight welles unforgettable touching favorites extraordinary stewart brilliantly friendship wonderful magnificent finest marie jackie horrible unconvincing uninteresting insult uninspired sucks miserably boredom cannibal godzilla lame wasting remotely awful poorly laughable worst lousy redeeming atrocious pointless blah waste unfunny seagal + - Some obvious film domain dependence, but also lots of generally good valence determinations. Wednesday, March 5, 14
  100. 100. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Using the learned lexicon There are various ways to use the LLR ranks: Take the top N of positive and negative and use them as the positive and negative sets. Combine the top N with another lexicon (e.g. the super small one or the Opinion Lexicon). Take the top N and manually prune words that are not generally applicable. Use the LLR values as the input to a more complex (and presumably more capable) algorithm. Here we’ll try three things: IMDB100: the top 100 positive and 100 negative filtered words IMDB1000: the top 1000 positive and 1000 negative filtered words Opinion Lexicon + IMDB1000: take the union of positive terms in Opinion Lexicon and IMDB1000, and same for the negative terms. 47 Wednesday, March 5, 14
  101. 101. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Better lexicons can get pretty big improvements! 48 Debate08 HCR STS IMDB Super simple Small lexicon Opinion lexicon IMDB100 IMDB1000 Opinion Lexicon + IMDB1000 20.5 21.6 19.4 27.4 21.5 22.1 25.5 51.4 47.8 42.3 62.0 73.6 24.1 22.6 35.7 77.9 58.0 45.6 50.5 66.0 62.4 49.1 56.0 66.1 Wednesday, March 5, 14
  102. 102. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Better lexicons can get pretty big improvements! 48 Debate08 HCR STS IMDB Super simple Small lexicon Opinion lexicon IMDB100 IMDB1000 Opinion Lexicon + IMDB1000 20.5 21.6 19.4 27.4 21.5 22.1 25.5 51.4 47.8 42.3 62.0 73.6 24.1 22.6 35.7 77.9 58.0 45.6 50.5 66.0 62.4 49.1 56.0 66.1 Nonetheless: for the reasons mentioned previously, this strategy eventually runs out of steam. It is a starting point. Wednesday, March 5, 14
  103. 103. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Machine learning for classification The rule-based approach requires defining a set of ad hoc rules and explicitly managing their interaction. If we instead have lots of examples of texts of different categories, we can learn a function that maps new texts to one category or the other. What were rules become features that are extracted from the input; their importance is extracted from statistics in a labeled training set. These features are dimensions; their values for a given text plot it into space. 49 Wednesday, March 5, 14
  104. 104. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Machine learning for classification Idea: software learns from examples it has seen. Find the boundary between different classes of things, such as spam versus not-spam emails. 50 Ham Spam Wednesday, March 5, 14
  108. 108. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Machine learning for classification Given a set of labeled points, there are many standard methods for learning linear classifiers. Some popular ones are: Naive Bayes Logistic Regression / Maximum Entropy Perceptrons Support Vector Machines (SVMs) The properties of these classifier types are widely covered in tutorials, code, and homework problems. There are various reasons to prefer one or the other of these, depending on amount of training material, tolerance for longer training times, and the complexity of features used. 51 Wednesday, March 5, 14
  109. 109. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for document classification All of the linear classifiers require documents to be represented as points in some n-dimensional space. Each dimension corresponds to a feature, or observation about a subpart of a document. A feature’s value is typically the number of times it occurs. Ex: Consider the document “That new 300 movie looks sooo friggin bad ass. Totally BAD ASS!” The feature “the lowercase form of the word ‘bad’” has a value of 2, and the feature “is_negative_word” would be 4 (“bad”,“ass”,“BAD”,“ASS”). For many document classification tasks (e.g. spam classification), bag-of-words features are unreasonably effective. However, for more subtle tasks, including polarity classification, we usually employ more interesting features. 52 Wednesday, March 5, 14
  110. 110. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . Wednesday, March 5, 14
  111. 111. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass Wednesday, March 5, 14
  112. 112. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass w=so Wednesday, March 5, 14
  113. 113. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> Wednesday, March 5, 14
  114. 114. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass art adj noun noun verb adv adv adj noun punc w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> Wednesday, March 5, 14
  115. 115. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass art adj noun noun verb adv adv adj noun punc w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> wt=that_art wt=new_adj wt=300_noun wt=movie_noun wt=looks_verb wt=sooo_adv wt=friggin_adv wt=bad_adj wt=ass_noun Wednesday, March 5, 14
  116. 116. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass art adj noun noun verb adv adv adj noun punc w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> wt=that_art wt=new_adj wt=300_noun wt=movie_noun wt=looks_verb wt=sooo_adv wt=friggin_adv wt=bad_adj wt=ass_noun NP NP NP NP NP NP VP S Wednesday, March 5, 14
  117. 117. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass art adj noun noun verb adv adv adj noun punc w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> wt=that_art wt=new_adj wt=300_noun wt=movie_noun wt=looks_verb wt=sooo_adv wt=friggin_adv wt=bad_adj wt=ass_noun NP NP NP NP NP NP VP S subtree=S_NP_movie-S_VP_looks-S_VP_NP_bad_ass Wednesday, March 5, 14
  118. 118. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass art adj noun noun verb adv adv adj noun punc w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> wt=that_art wt=new_adj wt=300_noun wt=movie_noun wt=looks_verb wt=sooo_adv wt=friggin_adv wt=bad_adj wt=ass_noun NP NP NP NP NP NP VP S subtree=NP_sooo_bad_ass subtree=S_NP_movie-S_VP_looks-S_VP_NP_bad_ass Wednesday, March 5, 14
  119. 119. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Features for classification 53 That new 300 movie looks sooo friggin BAD ASS . w=that w=new w=300 w=movie w=looks w=sooo w=friggin w=bad w=ass art adj noun noun verb adv adv adj noun punc w=so bi=<START>_that bi=that_new bi=new_300 bi=300_movie bi=movie_looks bi=looks_sooo bi=sooo_friggin bi=friggin_bad bi=bad_ass bi=ass_. bi=._<END> wt=that_art wt=new_adj wt=300_noun wt=movie_noun wt=looks_verb wt=sooo_adv wt=friggin_adv wt=bad_adj wt=ass_noun NP NP NP NP NP NP VP S subtree=NP_sooo_bad_ass subtree=S_NP_movie-S_VP_looks-S_VP_NP_bad_ass FEATURE ENGINEERING... (deep learning might help ease the burden) Wednesday, March 5, 14
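The unigram (w=), bigram (bi=), and word+tag (wt=) templates shown in the build above could be extracted with a sketch like this (a hypothetical helper; a real system would use a proper tokenizer and POS tagger rather than whitespace splitting):

```python
from collections import Counter

def extract_features(tokens, tags=None):
    """Map a tokenized (optionally POS-tagged) text to sparse feature counts."""
    feats = Counter()
    lowered = [t.lower() for t in tokens]
    for w in lowered:
        feats[f"w={w}"] += 1                       # unigram features
    padded = ["<START>"] + lowered + ["<END>"]
    for a, b in zip(padded, padded[1:]):
        feats[f"bi={a}_{b}"] += 1                  # bigram features
    if tags is not None:
        for w, t in zip(lowered, tags):
            feats[f"wt={w}_{t}"] += 1              # word + part-of-speech features
    return feats

print(extract_features("That new 300 movie looks sooo friggin BAD ASS .".split()))
```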
  120. 120. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Complexity of features Features can be defined on very deep aspects of the linguistic content, including syntactic and rhetorical structure. The models for these can be quite complex, and often require significant training material to learn them, which means it is harder to employ them for languages without such resources. I’ll show an example for part-of-speech tagging in a bit. Also: the more fine-grained the feature, the more likely it is rare to see in one’s training corpus. This requires more training data, or effective semi-supervised learning methods. 54 Wednesday, March 5, 14
  121. 121. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Recall the four sentiment datasets 55 Dataset Topic Year # Train # Dev #Test Reference Debate08 Obama vs McCain debate 2008 795 795 795 Shamma et al. (2009) "Tweet the Debates: Understanding Community Annotation of Uncollected Sources." HCR Health care reform 2010 839 838 839 Speriosu et al. (2011) "Twitter Polarity Classification with Label Propagation over Lexical Links and the Follower Graph." STS (Stanford) Twitter Sentiment 2009 - 216 - Go et al. (2009) "Twitter sentiment classification using distant supervision" IMDB IMDB movie reviews 2011 25,000 25,000 - Maas et al. (2011) "Learning Word Vectors for Sentiment Analysis" Wednesday, March 5, 14
  122. 122. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Logistic regression, in domain 56 Debate08 HCR STS IMDB Opinion Lexicon + IMDB1000 Logistic Regression w/ bag-of-words Logistic Regression w/ extended features 62.4 49.1 56.0 66.1 60.9 56.0 (no labeled training set) 86.7 70.2 60.5 - When training on labeled documents from the same corpus. Models trained with Liblinear (via ScalaNLP Nak) Note: for IMDB, the logistic regression classifier only predicts positive or negative (because there are no neutral training examples), effectively making it easier than for the lexicon-based method. Wednesday, March 5, 14
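For readers who want to reproduce a bag-of-words baseline like the one above, here is a rough scikit-learn equivalent; this is a stand-in toolkit (the slide's numbers come from Liblinear via ScalaNLP Nak), and the training texts below are toy placeholders rather than the actual corpora:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins; in practice, load the Debate08/HCR/STS/IMDB training splits.
train_texts = ["pass the damn bill now", "worst debate answer ever", "watching the debate tonight"]
train_labels = ["positive", "negative", "neutral"]

model = make_pipeline(CountVectorizer(lowercase=True), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)
print(model.predict(["that was a damn good answer"]))
```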
  123. 123. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Logistic regression (using extended features), cross-domain 57 Debate08 HCR STS Debate08 HCR Debate08+HCR 70.2 51.3 56.5 56.4 60.5 54.2 70.3 61.2 59.7 Training corpora Evaluation corpora In-domain training examples add 10-15% absolute accuracy (56.4 -> 70.2 for Debate08, and 51.3 -> 60.5 for HCR). More labeled examples almost always help, especially if you have no in-domain training data (e.g. 56.5/54.2 -> 59.7 for STS). Wednesday, March 5, 14
  124. 124. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Accuracy isn’t enough, part 1 The class balance can shift considerably without affecting the accuracy! 58 (58+24+47)/216 = 59.7 D08+HCR on STS - ~ + - ~ + 58 12 5 75 7 24 2 33 34 27 47 108 99 63 54 216 (8+15+106)/216 = 59.7 (Made up) Positive-heavy classifier - ~ + - ~ + 8 12 55 75 7 15 11 33 1 1 106 108 16 28 172 216 Wednesday, March 5, 14
  125. 125. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Accuracy isn’t enough, part 1 Need to also consider the per-category precision, recall, and f-score. 59 - ~ + - ~ + 58 12 5 75 7 24 2 33 34 27 47 108 99 63 54 216 P R F - ~ + Avg 58.6 77.3 66.7 38.1 72.7 50.0 87.0 43.5 58.0 61.2 64.5 58.2 Acc: 59.7 Big differences in precision for the three categories! Wednesday, March 5, 14
  126. 126. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Accuracy isn’t enough, part 2 Errors on neutrals are typically less grievous than positive/negative errors, yet raw accuracy makes one pay the same penalty. 60 D08+HCR on STS One solution: allow varying penalties such that no points are awarded for positive/negative errors, but some partial credit is given for positive/neutral and negative/neutral ones. - ~ + - ~ + 58 12 5 75 7 24 2 33 34 27 47 108 99 63 54 216 Wednesday, March 5, 14
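One concrete form of that partial-credit idea is to score predictions against a payoff matrix instead of 0/1 accuracy. The weights below (1.0 for correct, 0.5 for neutral confusions, 0.0 for polarity flips) are illustrative assumptions, not values from the tutorial:

```python
# Payoff for (gold, predicted) pairs: full credit when correct, partial credit
# for neutral confusions, none for positive/negative flips (weights are assumptions).
PAYOFF = {
    ("positive", "positive"): 1.0, ("negative", "negative"): 1.0, ("neutral", "neutral"): 1.0,
    ("positive", "neutral"): 0.5, ("neutral", "positive"): 0.5,
    ("negative", "neutral"): 0.5, ("neutral", "negative"): 0.5,
    ("positive", "negative"): 0.0, ("negative", "positive"): 0.0,
}

def weighted_accuracy(gold, predicted):
    return sum(PAYOFF[(g, p)] for g, p in zip(gold, predicted)) / len(gold)

print(weighted_accuracy(["positive", "neutral", "negative"],
                        ["positive", "negative", "neutral"]))  # ~0.67
```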
  127. 127. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Accuracy isn’t enough, part 3 Who says the gold standard is correct? There is often significant variation among human annotators, especially for positive vs neutral and negative vs neutral. Solution one: work on your annotations (including creating conventions) until you get very high inter-annotator agreement. This arguably reduces the linguistic variability/subtlety characterized in the annotations. Also, humans often fail to get the intended sentiment, e.g. sarcasm. Solution two: measure performance differently. For example, given a set of examples annotated by three or more human annotators and the machine, is the machine distinguishable from the humans in terms of the amount it disagrees with their annotations? 61 Wednesday, March 5, 14
  128. 128. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Accuracy isn’t enough, part 4 Often, what is of interest is an aggregate sentiment for some topic or target. E.g. given a corpus of tweets about cars, 80% of the mentions of the Ford Focus are positive while 70% of the mentions of the Chevy Malibu are positive. Note: you can get the sentiment value wrong for some of the documents while still getting the overall, aggregate sentiment correct (as errors can cancel each other). Note also: generally, this requires aspect-based analysis (more later). 62 Wednesday, March 5, 14
  129. 129. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Caveat emptor, part 1 In measuring accuracy, the methodology can vary dramatically from vendor to vendor, at times in unclear ways. For example, some seem to measure accuracy by presenting a human judge with examples annotated by a machine. The human then marks which examples they believe were incorrect. Accuracy is then num_correct/num_examples. Problem: people get lazy and often end up giving the machine the benefit of the doubt. I have even heard that some vendors take their high-confidence examples and do the above exercise. This is basically cheating: high-confidence machine label assignments are on average more correct than low-confidence ones. 63 Wednesday, March 5, 14
  130. 130. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Caveat emptor, part 2 Performance on in-domain data is nearly always better than out-of-domain (see the previous experiments). The nature of the world is that the language of today is a step away from the language of yesterday (when you developed your algorithm or trained your model). Also, because there are so many things to talk about (and because people talk about everything), a given model is usually going to end up employed in domains it never saw in its training data. 64 Wednesday, March 5, 14
  131. 131. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Caveat emptor, part 3 With nice, controlled datasets like those given previously, the experimenter has total control over which documents her algorithm is applied to. However, a deployed system will likely confront many irrelevant documents, e.g. documents written in other languages; Sprint the company wants tweets from its customers, but also gets many tweets from people talking about the activity of sprinting; documents that match, but which are not about the target of interest; documents that should have matched, but were missed in retrieval. Thus, identification of relevant documents, and even sub-documents with relevant targets, is an important component of end-to-end sentiment solutions. 65 Wednesday, March 5, 14
  132. 132. Aspect-based sentiment analysis Why NLP is hard Sentiment analysis overview Document classification Visualization Semi-supervised learning Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  133. 133. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Is it coherent to ask what the sentiment of a document is? Documents tend to discuss many entities and ideas, and they can express varying opinions, even toward the same entity. This is true even in tweets, e.g. positive towards the HCR bill, negative towards Mitch McConnell 67 Here's a #hcr proposal short enough for Mitch McConnell to read: pass the damn bill now Wednesday, March 5, 14
  134. 134. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Fine-grained sentiment Two products, iPhone and Blackberry Overall positive to iPhone, negative to Blackberry Positive aspects/features of iPhone: touch screen, voice quality. Negative (for the mother): expensive. 68 Slide adapted from Bing Liu Wednesday, March 5, 14
  135. 135. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Components of fine-grained analysis Opinion targets: entities and their features/aspects Sentiment orientations: positive, negative, or neutral Opinion holders: persons holding the opinions Time: when the opinions are expressed 69 Slide adapted from Bing Liu Wednesday, March 5, 14
  136. 136. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 An entity e is a product, person, event, organization, or topic. e is represented as a hierarchy of components, sub-components, and so on. Each node represents a component and is associated with a set of attributes of the component. An opinion can be expressed on any node or attribute of the node. For simplicity, we use the term aspects (features) to represent both components and attributes. Entity and aspect (Hu and Liu, 2004; Liu, 2006) 70 iPhone screen battery {cost,size,appearance,...} {battery_life,size,...}{...} ... Slide adapted from Bing Liu Wednesday, March 5, 14
  137. 137. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Opinion definition (Liu, Ch. in NLP handbook, 2010) An opinion is a quintuple (e, a, so, h, t) where: e is a target entity. a is an aspect/feature of the entity e. so is the sentiment value of the opinion from the opinion holder h on feature a of entity e at time t. so is positive, negative or neutral (or more granular ratings). h is an opinion holder. t is the time when the opinion is expressed. Examples from the previous passage: 71 (iPhone, GENERAL, +, Abc123, 5-1-2008) (iPhone, touch_screen, +, Abc123, 5-1-2008) (iPhone, cost, -, mother_of(Abc123), 5-1-2008) Slide adapted from Bing Liu Wednesday, March 5, 14
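The quintuple is easy to mirror as a structured record. A minimal Python sketch (illustrative only, not the tutorial's or Liu's code) using the examples from the slide:

```python
# Minimal sketch: the (e, a, so, h, t) opinion quintuple as a structured record,
# populated with the slide's examples.
from collections import Counter, namedtuple

Opinion = namedtuple('Opinion', ['entity', 'aspect', 'sentiment', 'holder', 'time'])

opinions = [
    Opinion('iPhone', 'GENERAL',      '+', 'Abc123',            '5-1-2008'),
    Opinion('iPhone', 'touch_screen', '+', 'Abc123',            '5-1-2008'),
    Opinion('iPhone', 'cost',         '-', 'mother_of(Abc123)', '5-1-2008'),
]

# Once opinions are structured, ordinary analysis tools apply, e.g. counting
# sentiment per (entity, aspect) pair.
by_aspect = Counter((o.entity, o.aspect, o.sentiment) for o in opinions)
print(by_aspect)
```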
  138. 138. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 The goal: turn unstructured text into structured opinions Given an opinionated document (or set of documents) discover all quintuples (e, a, so, h, t) or solve a simpler form of it, such as the document-level task considered earlier Having extracted the quintuples, we can feed them into traditional visualization and analysis tools. 72 Slide adapted from Bing Liu Wednesday, March 5, 14
  139. 139. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Several sub-problems e is a target entity: Named Entity Extraction (more) a is an aspect of e: Information Extraction so is sentiment: Sentiment Identification h is an opinion holder: Information/Data Extraction t is the time: Information/Data Extraction 73 Slide adapted from Bing Liu All of these tasks can make use of deep language processing methods, including parsing, coreference, word sense disambiguation, etc. Wednesday, March 5, 14
  140. 140. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Named entity recognition Given a document, identify all text spans that mention an entity (person, place, organization, or other named thing). Requires having performed tokenization, and possibly part-of-speech tagging. Though it is a bracketing task, it can be transformed into a sequence task using BIO labels (Begin, Inside, Outside) Usually, discriminative sequence models like Maxent Markov Models and Conditional Random Fields are trained on such sequences, and used for prediction. 74 Mr. [John Smith]Person traveled to [New York City]Location to visit [ABC Corporation]Organization. Mr. John Smith traveled to New York City to visit ABC Corporation O B-PER I-PER O O B-LOC I-LOC I-LOC O O B-ORG I-ORG Wednesday, March 5, 14
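The bracketing-to-BIO transformation mentioned above is mechanical. A minimal sketch (a hypothetical helper, not from OpenNLP or the tutorial) that reproduces the slide's label sequence:

```python
# Minimal sketch: converting bracketed entity spans into per-token BIO labels.
# Token indices are over the whitespace tokenization of the slide's sentence.
def to_bio(tokens, spans):
    """spans: list of (start, end, type) with end exclusive."""
    labels = ['O'] * len(tokens)
    for start, end, etype in spans:
        labels[start] = 'B-' + etype
        for i in range(start + 1, end):
            labels[i] = 'I-' + etype
    return labels

tokens = 'Mr. John Smith traveled to New York City to visit ABC Corporation'.split()
spans = [(1, 3, 'PER'), (5, 8, 'LOC'), (10, 12, 'ORG')]
print(list(zip(tokens, to_bio(tokens, spans))))
# -> O B-PER I-PER O O B-LOC I-LOC I-LOC O O B-ORG I-ORG
```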
  141. 141. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 OpenNLP Pipeline demo Sentence detection Tokenization Part-of-speech tagging Chunking NER: persons and organizations 75 Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate. Wednesday, March 5, 14
  142. 142. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Things are tricky in Twitterland - need domain adaptation 76 .@Peters4Michigan's camp tells me he'll appear w Obama in MI tomorrow. Not as scared as other Sen Dems of the prez: .[@Peters4Michigan]PER 's camp tells me he'll appear w [Obama]PER in [MI]LOC tomorrow. Not as scared as other Sen [Dems]ORG of the prez: Named entities referred to with @-mentions (makes things easier, but also harder for a model solely trained on newswire text) Tokenization: many new conventions, including ".@account" at the beginning of a tweet (which blocks it being an @-reply to that account) Abbreviations mess with features learned on standard text, e.g. "w" for with (as above), or even for George W. Bush: And who changed that? Remember Dems, many on foreign soil, criticizing W vehemently? Speaking of rooting against a Prez .... Wednesday, March 5, 14
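As a rough illustration of the tokenization point, here is a toy tweet-aware tokenizer: a single regular expression (not the tutorial's or OpenNLP's tokenizer) that keeps @-mentions, hashtags, and URLs as whole tokens so that conventions like a leading ".@account" survive tokenization.

```python
# Minimal sketch: a toy regex tokenizer that preserves Twitter-specific tokens.
import re

TOKEN_RE = re.compile(r"""
    https?://\S+           # URLs
  | [@#]\w+                # @-mentions and hashtags
  | \w+(?:'\w+)?           # words, with simple contractions like he'll
  | [^\w\s]                # any other single non-space character
""", re.VERBOSE)

def tokenize_tweet(text):
    return TOKEN_RE.findall(text)

print(tokenize_tweet(".@Peters4Michigan's camp tells me he'll appear w Obama in MI tomorrow."))
```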
  143. 143. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Identifying targets and aspects We can specify targets, their sub-components, and their attributes: But language is varied and evolving, so we are likely to miss many ways to refer to targets and their aspects. E.g. A person declaring knowledge about phones might forget (or not even know) that “juice” is a way of referring to power consumption. Also: there are many ways of referring to product lines (and their various releases, e.g. iPhone 4s) and their competitors, and we often want to identify these semi-automatically. Much research has worked on bootstrapping these. See Bing Liu’s tutorial for an excellent overview: http://www.cs.uic.edu/~liub/FBS/Sentiment-Analysis-tutorial-AAAI-2011.pdf 77 iPhone screen battery {cost,size,appearance,...} {battery_life,size,...}{...} ... Wednesday, March 5, 14
  144. 144. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Target-based feature engineering Given a sentence like “We love how the Porsche Panamera drives, but its bulbous exterior is unfortunately ugly.” NER to identify the “Porsche Panamera” as the target Aspect identification to see that opinions are being expressed about the car’s driving and styling. Sentiment analysis to identify positive sentiment toward the driving and negative toward the styling. Targeted sentiment analysis requires positional features that use the string relationship to the target or aspect, or features from a parse of the sentence (if you can get it) 78 Wednesday, March 5, 14
  145. 145. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 In addition to the standard document-level features used previously, we build features particularized for each target. These are just a subset of the many possible features. Positional features 79 We love how the Porsche Panamera drives, but its bulbous exterior is unfortunately ugly. Wednesday, March 5, 14
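A minimal sketch of what such target-relative positional features might look like, assuming the target span is already known. The feature names and window size are illustrative placeholders, not the tutorial's actual feature set:

```python
# Minimal sketch: features keyed on a token's position relative to the target.
def positional_features(tokens, target_start, target_end, window=3):
    feats = []
    left = tokens[max(0, target_start - window):target_start]
    right = tokens[target_end:target_end + window]
    for i, tok in enumerate(reversed(left), 1):
        feats.append(f'left_{i}={tok.lower()}')   # i tokens to the left of the target
    for i, tok in enumerate(right, 1):
        feats.append(f'right_{i}={tok.lower()}')  # i tokens to the right of the target
    return feats

tokens = ('We love how the Porsche Panamera drives , but its bulbous exterior '
          'is unfortunately ugly .').split()
# "Porsche Panamera" occupies token positions 4-5 (end exclusive = 6).
print(positional_features(tokens, 4, 6))
```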
  149. 149. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Challenges Positional features greatly expand the space of possible features. We need more training data to estimate parameters for such features. Highly specific features increase the risk of overfitting to whatever training data you have. Deep learning has a lot of potential to help with learning feature representations that are effective for the task by reducing the need for careful feature engineering. But obviously: we need to be able to use this sort of evidence in order to do the job well via automated means. 80 Wednesday, March 5, 14
  150. 150. Visualization Why NLP is hard Sentiment analysis overview Document classification Aspect-based sentiment analysis Semi-supervised learning Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  151. 151. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Visualize, but be careful when doing so It's often the case that a visualization can capture nuances in the data that numerical or linguistic summaries cannot easily convey. Visualization is an art and a science in its own right. The following advice from Tufte (2001, 2006) is easy to keep in mind (if only so that your violations of it are conscious and motivated): Draw attention to the data, not the visualization. Use a minimum of ink. Avoid creating graphical puzzles. Use tables where possible. 82 Slide by Chris Potts Wednesday, March 5, 14
  152. 152. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Sentiment lexicons: SentiWordNet 83 Slide by Chris Potts Wednesday, March 5, 14
  153. 153. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Twitter Sentiment results for Netflix. 84 Slide by Chris Potts Wednesday, March 5, 14
  154. 154. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Twitrratr blends the data and summarization together 85 Slide by Chris Potts Wednesday, March 5, 14
  155. 155. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Relationships between modifiers in WordNet similar-to graph 86 Slide by Chris Potts Wednesday, March 5, 14
  156. 156. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Relationships between modifiers in WordNet similar-to graph 87 Slide by Chris Potts Wednesday, March 5, 14
  157. 157. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Visualizing discussions: Wikipedia deletions [http://notabilia.net/] 88 Could be used as a visualization for evolving sentiment over time in a discussion among many individuals. Wednesday, March 5, 14
  158. 158. Semi-supervised Learning Why NLP is hard Sentiment analysis overview Document classification Aspect-based sentiment analysis Visualization Stylistics & author modeling Beyond text Wrap up Wednesday, March 5, 14
  161. 161. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Scaling 90 Scaling for text analysis tasks typically requires more than big computation or big data. ➡ Most interesting tasks involve representations “below” the text itself. Being “big” helps when you know what you are computing and how you can compute it. ➡ GIGO, and “big” garbage is still garbage. Scaling often requires being creative about how to learn f from relatively little explicit information about the task. ➡ Semi-supervised methods and indirect supervision to the rescue. Wednesday, March 5, 14
  162. 162. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Scaling annotations 91 Wednesday, March 5, 14
  172. 172. © 2013 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Accurate tool Extremely low annotation ? 92 Annotation is relatively expensive We lack sufficient resources for most languages, most domains and most problems. Semi-supervised learning approaches become essential. ➡ See Philip Resnik’s SAS 2011 keynote: http://vimeo.com/32506363 Wednesday, March 5, 14
  176. 176. © 2013 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Example: Learning part-of-speech taggers 93 They often book flights . The red book fell . POS Taggers are usually trained on hundreds of thousands of annotated word tokens. What if we have almost nothing? N Adv V N PUNC D Adj N V PUNC Wednesday, March 5, 14
  183. 183. © 2013 Jason M Baldridge Sentiment Analysis Symposium, March 2014 annotation HMM Model Minimization Tag Dict Generalization EM cover the vocabulary remove noise train 94 The overall strategy: grow, shrink, learn Wednesday, March 5, 14
  184. 184. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Extremely low annotation scenario [Garrette & Baldridge 2013] Obtain word types or tokens annotated with their parts-of-speech by a linguist in under two hours 95 the D book N, V often Adv red Adj, N Types: construct a tag dictionary from scratch (not simulated) Tokens: standard word-by-word annotation They often book flights . N Adv V N PUNC The red book fell . D Adj N V PUNC Wednesday, March 5, 14
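A minimal sketch of how such type and token annotations could be folded into an initial tag dictionary (illustrative Python, not the Garrette & Baldridge implementation), using the slide's tiny examples:

```python
# Minimal sketch: building a tag dictionary from type and token annotations.
type_annotations = {
    'the': {'D'},
    'book': {'N', 'V'},
    'often': {'Adv'},
    'red': {'Adj', 'N'},
}

token_annotations = [
    [('They', 'N'), ('often', 'Adv'), ('book', 'V'), ('flights', 'N'), ('.', 'PUNC')],
    [('The', 'D'), ('red', 'Adj'), ('book', 'N'), ('fell', 'V'), ('.', 'PUNC')],
]

# Start from the type annotations, then let token annotations add further tags.
tag_dict = {w: set(tags) for w, tags in type_annotations.items()}
for sentence in token_annotations:
    for word, tag in sentence:
        tag_dict.setdefault(word.lower(), set()).add(tag)

print(sorted(tag_dict.items()))
```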
  185. 185. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Strategy: connect annotations to raw corpus and propagate them 96 Raw Corpus Tokens Types Wednesday, March 5, 14
  200. 200. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Label propagation for video recommendation, in brief 97 Alice Bob Eve Basil Marceaux for Tennessee Governor Jimmy Fallon: Whip My Hair Radiohead: Paranoid Android Pink Floyd: The Wall (Full Movie) Local updates, so scales easily! Wednesday, March 5, 14
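A toy version of label propagation with clamped seed nodes, to make the "local updates" idea concrete. The graph, labels, and averaging update below are illustrative simplifications (hypothetical names), not the algorithm used in the cited work:

```python
# Minimal sketch: seed nodes keep their label distributions; every other node
# repeatedly takes the average of its neighbors' distributions.
def propagate(edges, seeds, labels=('pos', 'neg'), iters=20):
    nodes = {n for e in edges for n in e}
    neigh = {n: set() for n in nodes}
    for a, b in edges:
        neigh[a].add(b)
        neigh[b].add(a)
    uniform = {l: 1.0 / len(labels) for l in labels}
    dist = {n: dict(seeds.get(n, uniform)) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            if n in seeds:                    # seeds are clamped to their labels
                new[n] = dict(seeds[n])
            else:                             # local update: average the neighbors
                new[n] = {l: sum(dist[m][l] for m in neigh[n]) / len(neigh[n])
                          for l in labels}
        dist = new
    return dist

edges = [('Alice', 'video1'), ('Alice', 'video2'), ('Bob', 'video2'),
         ('Bob', 'video3'), ('Eve', 'video3'), ('Eve', 'video4')]
seeds = {'video1': {'pos': 1.0, 'neg': 0.0}, 'video4': {'pos': 0.0, 'neg': 1.0}}
print(propagate(edges, seeds))
```

Because each update only touches a node's immediate neighborhood, the same scheme parallelizes naturally over a large graph, which is the scaling point made on the slide.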
  201. 201. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 TOK_the_1 TOK_dog_2TOK_the_4 TOK_thug_5 NEXT_walksPREV_<b> PREV_the PRE1_tPRE2_th SUF1_g TYPE_the TYPE_thug TYPE_dog Tag dictionary generalization 98 Wednesday, March 5, 14
  202. 202. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 TOK_the_1 TOK_dog_2TOK_the_4 TOK_thug_5 NEXT_walksPREV_<b> PREV_the PRE1_tPRE2_th SUF1_g TYPE_the TYPE_thug TYPE_dog Type Annotations ____________ ____________ the dog DT NN Tag dictionary generalization 98 Wednesday, March 5, 14
  204. 204. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 TOK_the_1 TOK_dog_2TOK_the_4 TOK_thug_5 NEXT_walksPREV_<b> PREV_the PRE1_tPRE2_th SUF1_g TYPE_the TYPE_thug TYPE_dog Token Annotations ____________ ____________ Type Annotations ____________ ____________ the dog the dog walks DT NN VBZ DT NN Tag dictionary generalization 98 Wednesday, March 5, 14
  205. 205. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 TOK_the_1 TOK_dog_2TOK_the_4 TOK_thug_5 NEXT_walksPREV_<b> PREV_the PRE1_tPRE2_th SUF1_g TYPE_the TYPE_thug TYPE_dog Token Annotations ____________ ____________ Type Annotations ____________ ____________ the dog the dog walks DT NN DT NN Tag dictionary generalization 98 Wednesday, March 5, 14
  206. 206. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 TOK_the_1 TOK_dog_2TOK_the_4 TOK_thug_5 NEXT_walksPREV_<b> PREV_the PRE1_tPRE2_th SUF1_g TYPE_the TYPE_thug TYPE_dog DT NN DT NN Tag dictionary generalization 98 Wednesday, March 5, 14
  226. 226. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Result: • a tag distribution on every token • an expanded tag dictionary (non-zero tags) TOK_the_1 TOK_dog_2TOK_the_4 TOK_thug_5 Tag dictionary generalization 105 Wednesday, March 5, 14
  232. 232. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 0 25 50 75 100 English Kinyarwanda Malagasy EM only EM only + Our approach + Our approach Tokens Types EM only EM only + Our approach + Our approach EM only EM only + Our approach + Our approach EM only EM only + Our approach + Our approach EM only EM only + Our approach + Our approach Total Accuracy 106 Results (two hours of annotation) With 4 hours + a bit more ➡ 90% [Garrette, Mielens, & Baldridge 2013] Wednesday, March 5, 14
  235. 235. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Polarity classification for Twitter 107 Obama looks good. #tweetdebate #current+ - McCain is not answering the questions #tweetdebate Sen McCain would be a very popular President - $5000 tax refund per family! #tweetdebate+ - "it's like you can see Obama trying to remember all the "talking points" and get his slogans out there #tweetdebate" Logistic regression... and... done! What if instance labels aren’t there? Wednesday, March 5, 14
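A minimal sketch of the "logistic regression... and... done!" baseline, assuming scikit-learn is available. The four tweets and their +/- labels are just the slide's examples, so this is only a toy:

```python
# Minimal sketch: bag-of-words logistic regression over labeled tweets.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "Obama looks good. #tweetdebate #current",
    "McCain is not answering the questions #tweetdebate",
    "Sen McCain would be a very popular President - $5000 tax refund per family! #tweetdebate",
    "it's like you can see Obama trying to remember all the talking points #tweetdebate",
]
labels = ['pos', 'neg', 'pos', 'neg']   # the slide's +/- annotations

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)
print(model.predict(["Obama was very popular in the debate"]))
```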
  240. 240. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 No explicitly labeled examples? 108 Positive/negative ratio using polarity lexicon. ➡ Easy & works okay for many cases, but fails spectacularly elsewhere. Emoticons as labels + logistic regression. ➡ Easy, but emoticon to polarity mapping is actually vexed. Label propagation using the above as seeds. ➡ Noisy labels provide soft indicators, the graph smooths things out. If you have annotations, you can use those too. ➡ Including ordered labels like star ratings: see Talukdar & Crammer 2009 Wednesday, March 5, 14
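Two of the no-label strategies above are easy to sketch. The tiny lexicon and emoticon sets below are illustrative placeholders, not a real resource such as OpinionFinder, and the abstention behavior is one choice among many:

```python
# Minimal sketch: a polarity-lexicon ratio and emoticon-derived noisy labels.
POS_WORDS = {'good', 'great', 'love', 'popular', 'happy'}
NEG_WORDS = {'bad', 'hate', 'ugly', 'scared', 'petty'}
POS_EMOTICONS = {':)', ':-)', ':D'}
NEG_EMOTICONS = {':(', ':-('}

def lexicon_polarity(tokens):
    pos = sum(t.lower() in POS_WORDS for t in tokens)
    neg = sum(t.lower() in NEG_WORDS for t in tokens)
    if pos == neg:
        return None                 # abstain when the ratio is uninformative
    return 'pos' if pos > neg else 'neg'

def emoticon_label(tokens):
    if any(t in POS_EMOTICONS for t in tokens):
        return 'pos'
    if any(t in NEG_EMOTICONS for t in tokens):
        return 'neg'
    return None

tweet = "I love #NY! :)".split()
print(lexicon_polarity(tweet), emoticon_label(tweet))   # pos pos
```

Either output can then serve as a noisy seed for label propagation, as listed on the slide.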
  248. 248. Using social interaction: Twitter sentiment Obama is making the repubs look silly and petty bird images from http://www.mytwitterlayout.com/ http://starwars.wikia.com/wiki/R2-D2 is happy Obama is president Obama’s doing great! = (hopefully) “Obama”, “silly”, “petty” Papers: Speriosu et al. 2011; Tan et al. KDD 2011 Wednesday, March 5, 14
  258. 258. © 2014 Jason M Baldridge Sentiment Analysis Symposium, March 2014 Twitter polarity graph with knowledge and noisy seeds 110 OpinionFinder care hate love Word n-grams we can’t love ny i love Emoticons ;-) :( :) Alice I love #NY! :) Ahhh #Obamacare Bob We can’t pass this :( #killthebill I hate #Obamacare! #killthebill Eve We need health care! Let’s get it passed! :) Hashtags killthebill obamacare ny + + - - + + Wednesday, March 5, 14
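A hedged sketch of how such a graph might be assembled: tweets link to the hashtag, n-gram, and emoticon features they contain, and a few feature nodes carry noisy seed polarities. The feature extraction is deliberately crude and illustrative, and the node names are hypothetical; the resulting edges and seeds could be handed to a label-propagation routine like the toy one sketched earlier.

```python
# Minimal sketch: tweet-to-feature edges plus noisy emoticon seeds for a
# polarity graph. Feature extraction here is a toy, not the cited systems'.
def tweet_features(text):
    tokens = text.lower().split()
    feats = set()
    for tok in tokens:
        if tok.startswith('#'):
            feats.add(('hashtag', tok.lstrip('#').strip('!.,')))
        if tok in {':)', ':(', ';-)'}:
            feats.add(('emoticon', tok))
    for a, b in zip(tokens, tokens[1:]):
        feats.add(('bigram', f'{a} {b}'))
    return feats

tweets = {
    'alice_1': "I love #NY! :)",
    'bob_1':   "We can't pass this :( #killthebill",
}
edges = [(tid, feat) for tid, text in tweets.items() for feat in tweet_features(text)]
seeds = {('emoticon', ':)'): 'pos', ('emoticon', ':('): 'neg'}
print(edges)
print(seeds)
```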
