Intro to sentiment analysis


Presented on 15 March 2014 at Women Techmakers Vienna


  1. “Information about people’s opinions plays important role for more accurate decision making in a number of domains. There are number of opinion-rich resources available on the internet e.g. personal blogs, reviews, specialized websites which helps in understanding the opinions of others.” Yenkar, Pranali and Sawarkar, S.D. - A conceptual study of Sentiment Mining, 2012
  3. “The term opinion mining appears in a paper by Dave et al. that was published in the proceedings of the 2003 WWW conference; the publication venue may explain the popularity of the term within communities strongly associated with Web search or information retrieval. According to Dave et al., the ideal opinion-mining tool would “process a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good)”. Much of the subsequent research self-identified as opinion mining fits this description in its emphasis on extracting and analyzing judgments on various aspects of given items. However, the term has recently also been interpreted more broadly to include many different types of analysis of evaluative text. The history of the phrase sentiment analysis parallels that of “opinion mining” in certain respects. The term “sentiment” used in reference to the automatic analysis of evaluative text and tracking of the predictive judgments therein appears in 2001 papers by Das and Chen [66] and Tong [297], due to these authors’ interest in analyzing market sentiment. It subsequently occurred within 2002 papers by Turney [299] and Pang et al. [235], which were published in the proceedings of the annual meeting of the Association for Computational Linguistics (ACL) and the annual conference on Empirical Methods in Natural Language Processing (EMNLP). Moreover, Nasukawa and Yi [221] entitled their 2003 paper, “Sentiment analysis: Capturing favorability using natural language processing”, and a paper in the same year by Yi et al. [324] was named “Sentiment Analyzer: Extracting sentiments about a given topic using natural language processing techniques”. These events together may explain the popularity of “sentiment analysis” among communities self-identified as focused on NLP.
     A sizeable number of papers mentioning “sentiment analysis” focus on the specific application of classifying reviews as to their polarity (either positive or negative), a fact that appears to have caused some authors to suggest that the phrase refers specifically to this narrowly defined task. However, nowadays many construe the term more broadly to mean the computational treatment of opinion, sentiment, and subjectivity in text.” B. Pang, L. Lee - Opinion Mining and Sentiment Analysis, 2008
  4. References:
     • Sanjiv Das and Mike Chen. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA), 2001.
     • Richard M. Tong. An operational system for detecting and tracking opinions in on-line discussion. In Proceedings of the Workshop on Operational Text Classification (OTC), 2001.
     • Satoshi Morinaga, Kenji Yamanishi, Kenji Tateishi, and Toshikazu Fukushima. Mining product reputations on the web. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pages 341–349, 2002. Industry track.
     • Kushal Dave, Steve Lawrence, and David M. Pennock. Mining the peanut gallery: Opinion extraction and semantic classification of product reviews. In Proceedings of WWW, pages 519–528, 2003.
     • Tetsuya Nasukawa and Jeonghee Yi. Sentiment analysis: Capturing favorability using natural language processing. In Proceedings of the Conference on Knowledge Capture (K-CAP), 2003.
  5. • real time
     • uses ANEW (Affective Norms for English Words) in a Python implementation
     • green means a happy tweet, red a sad one
  6. This is a widely publicized application that shows a real-time Twitter mood map, this time for the US. Like the previous Twitter UK mood map, it is based on the ANEW word list. The algorithm does not simply average the sentiment values of a sentence's words but works with relative fractions. An example of how this works can be found in the research paper by O’Connor and Balasubramanyan: to determine the emotion of a sentence, they take the number of positive words in that sentence and divide it by the total number of words in the sentence. The real-time mood map displays dark blue and orange circles: dark blue circles indicate negative posts and orange circles indicate positive posts. The diameter of each circle reflects the number of posts in that area. On this mood map, the actual tweets cannot be read.
     • B. O’Connor and R. Balasubramanyan. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. ICWSM, 11:122–129, 2010.
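The relative-fraction score described for slide 6 (positive words divided by total words) can be sketched in a few lines of Python. This is a minimal sketch, assuming simple lowercase whitespace tokenization and a tiny hypothetical set `POSITIVE_WORDS` standing in for the positive part of a real lexicon; it is not O’Connor and Balasubramanyan's code.

```python
# Hypothetical stand-in for the positive words of a real lexicon such as ANEW.
POSITIVE_WORDS = {"happy", "love", "great", "fast"}


def positive_fraction(sentence: str) -> float:
    """Return (number of positive words) / (total number of words)."""
    # Assumption: lowercase whitespace tokenization with surrounding punctuation stripped.
    words = [w.strip(".,!?;:\"'").lower() for w in sentence.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0  # an empty sentence carries no emotion
    positive = sum(1 for w in words if w in POSITIVE_WORDS)
    return positive / len(words)


print(positive_fraction("I love this great day"))  # 2 of 5 words -> 0.4
```

Unlike a plain average of word ratings, this score only counts how many words are positive, so sentence length directly dilutes it.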
  8. Machine learning is everywhere in our lives now:
     • the simplest example is Amazon and the product recommendations it shows you: it learns from the products you bought previously and recommends new items
     • more generally, wherever there are large amounts of data, a machine can try to make sense of them, depending on which answers we need: learning about diseases from medical records, about customer behavior from web click data, and so on
     • machine learning can be used to recognize your handwriting on an envelope
     • text classification
  9. ANEW contains around 2,500 rated English words, each rated for valence (pleasantness), arousal, and dominance.
  10. Valence ratings (1-9 scale): 1-4 negative, 4-6 neutral, 6-9 positive
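The ranges on slide 10 overlap at 4 and 6, so any implementation has to decide which band those values fall into. A minimal sketch, assuming values below 4 count as negative, values above 6 as positive, and everything from 4 to 6 inclusive as neutral; the function name `valence_label` is hypothetical.

```python
def valence_label(valence: float) -> str:
    """Map a valence rating on the 1-9 scale to a coarse label."""
    if not 1.0 <= valence <= 9.0:
        raise ValueError(f"valence must be between 1 and 9, got {valence}")
    # Assumption: the boundary values 4 and 6 belong to the neutral band.
    if valence < 4.0:
        return "negative"
    if valence > 6.0:
        return "positive"
    return "neutral"


for v in (2.5, 5.0, 8.2):
    print(v, valence_label(v))
```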
  11. This real-time mood map for the UK is based on Twitter posts. You can watch how Twitter users feel at this very moment and also see what they post. Behind the scenes, the algorithm is based on the ANEW word list and is implemented in Python. Take as an example the sentence "I have an example sentence have." Splitting it into words gives [i, have, an, example, sentence, have], so its token list, pairing each word with the number of times it appears, is [(i,1), (have,2), (an,1), (example,1), (sentence,1)]. The sentence is rated by applying the following update to every token in turn, starting from emotion = 0 and count = 0:
        emotion = emotion + valence(word) * (number of times the word appears)
        count = count + (number of times the word appears)
      and finally:
        sentence emotion = emotion / count
      The tweets are displayed on the UK map on a color scale from red (negative) to dark green (positive). Clicking a point shows the live tweet that was just posted.
      • P. Patel and D. Saul. Mood map python code. URL priyeshpatel/moodmap/blob/master/moodmap/ Accessed: 2014-02-16.
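The update rule on slide 11 amounts to a frequency-weighted average of word valences. The Python sketch below assumes a tiny hypothetical `VALENCE` dictionary in place of the real ANEW list, and it assumes that tokens without a rating are skipped; it illustrates the formula and is not the moodmap project's actual code.

```python
from collections import Counter
from typing import Optional

# Hypothetical valence ratings (1-9 scale) standing in for the ANEW word list.
VALENCE = {"have": 5.0, "example": 6.0, "sentence": 5.5}


def sentence_emotion(sentence: str) -> Optional[float]:
    """Frequency-weighted mean valence of the rated words in a sentence."""
    # Tokenize and count repetitions, e.g. "have" appears twice below.
    tokens = Counter(w.strip(".,!?").lower() for w in sentence.split())
    emotion = 0.0
    count = 0
    for word, times in tokens.items():
        if word in VALENCE:  # assumption: unrated words are skipped
            emotion += VALENCE[word] * times
            count += times
    # No rated words means no score rather than a division by zero.
    return emotion / count if count else None


print(sentence_emotion("I have an example sentence have."))  # 5.375
```

With these made-up ratings the sum is 5.0 * 2 + 6.0 + 5.5 = 21.5 over 4 rated occurrences, giving 5.375.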
  12. What does a bad road mean? How can we rate the sentence or text?
  13. Davidov, Dmitry, Oren Tsur, and Ari Rappoport. "Semi-supervised recognition of sarcastic sentences in twitter and amazon." Proceedings of the Fourteenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 2010.
      1. “thank you Janet Jackson for yet another year of Super Bowl classic rock!” (Twitter)
      2. “He’s with his other woman: XBox 360. It’s 4:30 fool. Sure I can sleep through the gunfire” (Twitter)
      3. “Wow GPRS data speeds are blazing fast.” (Twitter)
      4. “[I] Love The Cover” (book, amazon)
      5. “Defective by design” (music player, amazon)
  15. • Precision - how right are we? Of the items we labeled as positive, how many really are positive?
      • Recall - did we find all the relevant items?
      • Accuracy - the percentage of correct responses out of the total.
      Source: Wikipedia
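The three measures on slide 15 can be computed from counts of true positives, false positives, and false negatives. A minimal sketch for a binary positive/negative sentiment classifier, assuming "positive" is the class of interest; the function name `evaluate` and the label lists are made-up illustration data, not results from any system.

```python
from typing import List, Tuple


def evaluate(gold: List[str], predicted: List[str],
             positive: str = "positive") -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) for one target class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == positive and g == positive)
    fp = sum(1 for g, p in pairs if p == positive and g != positive)
    fn = sum(1 for g, p in pairs if p != positive and g == positive)
    correct = sum(1 for g, p in pairs if g == p)
    # Precision: of everything we labeled positive, how much was right?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything that is truly positive, how much did we find?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Accuracy: share of all responses that were correct.
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


gold = ["positive", "positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "negative", "positive", "positive"]
print(evaluate(gold, pred))  # precision = recall = 2/3, accuracy = 0.6
```

Here two tweets are correctly labeled positive, one negative tweet is wrongly called positive, and one positive tweet is missed, which is why precision and recall can both fall short even though most answers are right.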