Sentiment Analysis and Social Media: How and Why
 

    Presentation Transcript

    • Sentiment Analysis in Social Media: How and Why. Davide Feltoni Gurini. Sistemi Intelligenti per Internet (Intelligent Systems for the Internet), academic year 2012/2013
    • About Me and Contacts
      • First-year Ph.D. student in Computer Engineering at Roma Tre University
      • Via della Vasca Navale 79, Rome, A.I. Lab, Room 2.02a
      • Contacts: feltoni@dia.uniroma3.it, http://about.me/davidefeltoni
    • Outline Social Networks and Web 2.0• Sentiment analysis: what is it?• Sentiment analysis: applications• Sentiment analysis: an inside look• TwitterSA• TwitterSA Soccer Match Analysis• TwitterSA Predicting Elections 3
    • Social Networks and Web 2.0: Common Uses
      • Spend time on the Internet!
      • Keep a virtual network of friends
      • Gather information about something
      • Share what we do with other people
      • Know what other people do
      • Know what is happening in the world
      • …
    • Social Networks and Web 2.0: Behind the Scenes
      • Data analyst: statistical studies; how to collect and store big data
      • Business analyst: social marketing; web advertising
      • Web and app developer: integrate social features in games and apps
      • Ph.D. student and researcher: research methods and publishing articles
    • Social Networks and Web 2.0: Evolution of the Web
      • From 1.0 (web for experts) to 2.0 (web for everyone)
      • Users are not only reading
      • The Web is a data container
      • Users now generate more content
        • Blogs, social networks, reviews, …
        • Collaborative sites (Wikipedia, communities, forums)
    • Social Networks: Overview of Italy (figure; source: Nielsen 2012)
    • Social Networks: Some Statistics (source: www.huffingtonpost.com)
      • Facebook monthly active users now total nearly 850 million
      • People spend about 11 hours a month on Facebook alone
      • 175 million tweets were sent on Twitter every day in 2012
      • The 2012 election broke records with 31.7 million political tweets
      • 625,000 new users join Google+ every day
      • 27% of small and 34% of medium businesses use social media for business (+20% YoY)
    • Social Networks and Web 2.0: Big Audience, Big Data
      • Even older people use social networks
      • Important for web data analysis
    • Social Networks: Real-Time Updates
      • Example: 2012 Greece clashes
      • Fresh news: search engine vs. social network
    • Social Networks: Simultaneous Use (figure; source: Nielsen 2012)
    • Social Networks and Web 2.0: Why Are They So Important?
      • Know what most people on the Web think
      • Big data available almost for free
      • A large audience covering a big slice of the real population
      • Real-time feedback on news
      • Real-time comments about TV shows or events
      • Share content with millions of people
      • Cluster users by age, interests, page likes, …
      • …
    • Outline (current section: Sentiment analysis: what is it?)
    • What is Sentiment Analysis?
      • Theory: «Sentiment Analysis or Opinion Mining is the computational study of opinions, sentiments and emotions expressed in text.» (Bing Liu, 2010, «Sentiment Analysis and Subjectivity»)
      • In practice: using NLP, statistics, or machine learning methods to extract and identify how positive or negative the sentiment is in a review, blog, discussion, news item, comment or any other document.
    • Why are online opinions so important?
      • «Opinions» influence our behaviour.
      • Before «making a decision», we usually seek out the opinions of others: buying a product, renting a car, reserving a hotel room, looking for a good restaurant…
        • Individuals: seek opinions from friends and family
        • Organizations: use surveys, opinion polls, consultants
    • Why are online opinions so important?
    • Sentiment Analysis Application Areas
      • Organization/brand
        • Know the reputation of the organization or brand on the Web
        • Know consumers' opinions about a product
        • Understand consumers' needs
        • …
      • Individuals
        • Make a decision before buying something
        • Know the aggregate sentiment of a product's reviews
        • Find public opinion about a person, a politician, …
      • Research studies
        • Predict political results
        • Predict box office
        • Citizen polls
    • Sentiment Analysis Application Areas
      • Marketing 2.0
        • Advertisement placement: place an ad when someone praises a product; place a competitor's ad when someone criticizes a product
        • Combine sentiment analysis with recommendation systems: know what kind of people praise a product (age, interests, …)
        • Measure how people respond to a product release or ad campaign
      • Social TV
        • Know the audience of a TV show and what people think about it
        • Run interactive polls during TV news broadcasts
    • Sentiment Analysis Application Areas
    • Sentiment Analysis Application Areas: some tools can also measure the overall sentiment expressed in blogs and social networks. Example: an earthquake produced a lot of negative sentiment.
    • Sentiment Analysis Application Areas: sentiment integration with search engines
    • Sentiment Analysis Application Areas
    • Also a fascinating problem
      • Intellectually challenging, with many applications
        • A popular research topic in recent years (Shanahan, Qu, and Wiebe, 2006 (edited book); surveys: Pang and Lee 2008; Liu, 2006 and 2011; 2010)
        • More than 100 companies in the USA alone
        • Many workshops and conferences: http://sentimentsymposium.com/, www.gplsi.dlsi.ua.es/congresos/wassa2012/
      • A large research area
        • Opinion mining, text mining
        • Sentiment and subjectivity analysis
        • Artificial intelligence
        • Natural language processing
        • Computational linguistics
        • Etc.
    • Sentiment Analysis and Social Web: How to do that?
      • Easy: search the Web and find a sentiment analysis tool
        • http://www.twitalyzer.com/index.asp
        • http://twendz.waggeneredstrom.com/
        • http://www.sentiment140.com/
        • http://www.blogmeter.it
        • http://twitrratr.com/
        • http://www.socialmention.com
        • http://www.lovewillconquer.co.uk/
        • Hundreds more…
      • And professional sites for companies
        • www.radian6.com
        • www.sysomos.com
    • Sentiment Analysis online tools: but you will find that Rita Levi Montalcini wasn’t very popular
    • Sentiment Analysis online tools: or was she?
    • So again, how to do that?
    • Outline (current section: Sentiment analysis: an inside look)
    • What is an opinion (1)?
      • Example review: “I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”
      • Given this review, it is possible to perform:
        • Document-level sentiment analysis: is this review + or -?
        • Sentence-level sentiment analysis: is each sentence + or -?
        • Entity-level sentiment analysis: is iPhone + or -?
    • What is an opinion (2)? An opinion is a quintuple $(e_j, a_{jk}, so_{ijkl}, h_i, t_l)$ where
      • $e_j$ is the target entity (person, product, organization, event or a generic topic)
      • $a_{jk}$ is an aspect/feature of the entity
      • $so_{ijkl}$ is the sentiment value of the opinion (polarity: usually positive, negative or neutral)
      • $h_i$ is the opinion holder
      • $t_l$ is the time when the opinion is expressed
    • What is an opinion (3)? Entity, Feature, Polarity, Opinion Holder, Time
      • ‘I bought an iPhone and the touch screen is really cool.’ (Positive)
      • ‘My old Blackberry, which was a terrible phone and so difficult to type with its tiny keys’ (Negative)
      • As quintuples:
        • (iPhone, touch screen, positive, Author, review date)
        • (Blackberry, keys, negative, Author, review date)
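The quintuple above maps naturally onto a small record type. The following is a minimal sketch, not the author's implementation: the class name `Opinion`, its field names, and the helper `aggregate_by_entity` are assumptions introduced for illustration, and the `time` field holds a placeholder string because the slide gives no actual date.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    entity: str    # e_j: target entity
    aspect: str    # a_jk: aspect/feature of the entity
    polarity: str  # so_ijkl: "positive", "negative" or "neutral"
    holder: str    # h_i: opinion holder
    time: str      # t_l: when the opinion was expressed (placeholder here)


# The two quintuples from the slide.
opinions = [
    Opinion("iPhone", "touch screen", "positive", "Author", "review date"),
    Opinion("Blackberry", "keys", "negative", "Author", "review date"),
]


def aggregate_by_entity(ops):
    """Sum +1 per positive and -1 per negative opinion for each entity."""
    scores = {}
    for op in ops:
        delta = {"positive": 1, "negative": -1}.get(op.polarity, 0)
        scores[op.entity] = scores.get(op.entity, 0) + delta
    return scores


print(aggregate_by_entity(opinions))
```

Keeping the aspect as its own field is what makes entity-level and aspect-level aggregation possible later; a document-level label alone would lose that information.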
    • Sentiment Analysis is hard (1)!
      • Managing negation
        • Direct negation: ‘I don't like my new iPhone’
        • Ambiguous negation: ‘Not only is this phone expensive but it's also heavy and difficult to use’
        • Indirect negation: ‘Perhaps it is a great phone, but I fail to see why’
      • Co-reference resolution
        • ‘We watched the movie and went to dinner; it was awful’. What does ‘it’ refer to?
      • Slang and writing errors
        • Short forms: nite (night), sayin (saying)
        • Acronyms: lol (laugh out loud), iirc (if I remember correctly)
        • Writing errors: wouls (would), rediculous (ridiculous)
        • Punctuation errors: im (I'm), dont (don't)
        • Slang: that was well mint (that was very good)
        • Repeated letters: that was soooooo greeeat (that was so great)
        • Alphanumeric words: 2night (tonight), str8 (straight)
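A few of the writing-error cases above can be handled with simple token rewriting. This is a rough sketch under stated assumptions: the lookup table `SLANG` is hypothetical and contains only the slide's own examples, and collapsing every run of three or more identical characters to a single character is a deliberately crude rule (it would turn "cooool" into "col").

```python
import re

# Hypothetical lookup table built only from the examples on the slide.
SLANG = {
    "nite": "night",
    "sayin": "saying",
    "wouls": "would",
    "rediculous": "ridiculous",
    "im": "I'm",
    "dont": "don't",
    "2night": "tonight",
    "str8": "straight",
}


def squeeze_repeats(token):
    """Collapse runs of three or more identical characters to one."""
    return re.sub(r"(.)\1{2,}", r"\1", token)


def normalize(text):
    """Lowercase, squeeze repeated letters, then map known short forms."""
    tokens = []
    for token in text.lower().split():
        token = squeeze_repeats(token)
        tokens.append(SLANG.get(token, token))
    return " ".join(tokens)


print(normalize("that was soooooo greeeat"))
print(normalize("see you 2night"))
```

Negation scope, co-reference and sarcasm are not addressed by this kind of rewriting and need linguistic analysis.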
    • Sentiment Analysis is hard (2)! Entity disambiguation?
    • Sentiment Analysis is hard (3)!
      • Managing comparatives: ‘Federer is better than Nadal’ gives Federer (+), Nadal (-)
      • Domain-dependent opinions
        • ‘The battery life is long’ (+)
        • ‘The waiting time to enter the restaurant was too long’ (-)
      • More challenges
        • Opinion spam
        • Sarcasm
        • The general complexity of natural language
        • …
    • Sentiment Analysis is hard (4)! A company posted an ad for writing fake reviews on amazon.com (65 cents per review)
    • Sentiment Analysis: Known Approaches
      • Lexical methods (building an opinion-word lexicon)
        • Manual approach
        • Dictionary-based approach (Hu and Liu, 2004; Andreevskaia and Bergler, 2006; Dragut et al., 2010)
        • Corpus-based approach (Hatzivassiloglou and McKeown, 1997; Turney, 2002; Yu and Hatzivassiloglou, 2003; Kanayama and Nasukawa, 2006; Ding, Liu and Yu, 2008)
      • Machine learning
        • Unsupervised learning (Hatzivassiloglou and McKeown, 1997; Yu and Hatzivassiloglou, 2003)
        • Supervised learning (Alec Go et al., 2009; Pang and Lee, 2002; Pak and Paroubek, 2010)
        • Semi-supervised learning (Andreevskaia and Bergler, 2006; Esuli and Sebastiani, 2005)
    • Sentiment Analysis: Known Approaches. Building an opinion-word lexicon
      • Manual approach
        • Pro: precision, no rules to define
        • Cons: no automation, time needed to set up the lexicon
      • Dictionary-based approach
        • Start from a manual or prepared dictionary of positive and negative words, then expand it with synonyms and antonyms
        • Pro: faster, semi-automated
        • Cons: low precision (synonyms of great include excellent and admirable, but also large, big, fat)
      • Corpus-based approach
        • Start from a seed set of positive and negative adjectives (for example)
        • Expand this set using grammatical constructions
        • Example: ‘this car is beautiful and spacious’; if beautiful is known to be positive, then spacious is positive too
        • Pro: high automation, moderate precision
        • Cons: grammar rules need care, the word set is never complete
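The corpus-based idea can be illustrated with a tiny propagation loop over "X and Y" pairs. This is a sketch only: the function name `expand_lexicon` and the extra example sentences are made up for illustration, and the regular expression matches any two words joined by "and", whereas a real system would use a POS tagger to keep only adjective pairs and would also treat "but" as a polarity flip.

```python
import re


def expand_lexicon(seed, sentences):
    """Propagate polarity across 'X and Y' word pairs until nothing changes."""
    lexicon = dict(seed)
    pairs = []
    for sentence in sentences:
        pairs.extend(re.findall(r"\b(\w+) and (\w+)\b", sentence.lower()))
    changed = True
    while changed:
        changed = False
        for left, right in pairs:
            if left in lexicon and right not in lexicon:
                lexicon[right] = lexicon[left]
                changed = True
            elif right in lexicon and left not in lexicon:
                lexicon[left] = lexicon[right]
                changed = True
    return lexicon


seed = {"beautiful": "positive", "cramped": "negative"}
sentences = [
    "This car is beautiful and spacious",    # example from the slide
    "The cabin is bright and spacious",      # hypothetical
    "The back seats are cramped and noisy",  # hypothetical
]
lexicon = expand_lexicon(seed, sentences)
print(lexicon)
```

The second sentence shows why the loop repeats: "bright" only gets a label after "spacious" has been learned from the first sentence.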
    • Outline (current section: TwitterSA)
    • Twitter: The Social Network
      • Statuses are at most 140 characters long
      • URLs with multimedia can be added
      • 99% of statuses are public
      • No friends: followers and following
      • Hashtags (#)
    • TwitterSA
    • TwitterSA: Machine Learning
      • Goal: classify input text as Positive or Negative
      • Supervised algorithm
        • We must provide a set of inputs (text phrases) and the appropriate output class (Positive or Negative) for each input
        • The learning algorithm trains on those inputs; afterwards it can classify a new instance
    • TwitterSA: Multinomial Naive Bayes. Bayes' theorem: $P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$
      • $X$ = new text instance to classify
      • $C_1, \dots, C_n$ = possible classes (e.g. Positive, Negative, …)
      • $P(X \mid C_i)$ = product of the probabilities that the single attributes of instance $X$ belong to class $C_i$
      • $P(C_i \mid X)$ = probability that the new instance $X$ belongs to class $C_i$
    • Multinomial Naive Bayes: A Worked Example (data)
      • Training doc 1: words = Love; class $C_1$ = Pos
      • Training doc 2: words = Almost hate; class $C_2$ = Neg
      • Training doc 3: words = Love; class $C_1$ = Pos
      • Test doc 4: words = Almost Love; class = ?
    • Multinomial Naive Bayes: A Worked Example (priors)
      • Vocabulary size: $|V| = 3$ words
      • $P(C_i) = \frac{N_c}{N}$
      • $P(x \mid C_i) = \frac{\mathrm{count}(x, C_i) + 1}{\mathrm{count}(C_i) + |V|}$ for each word $x$
      • $P(C_1 = \text{pos}) = \frac{2}{3}$, $P(C_2 = \text{neg}) = \frac{1}{3}$
    • Multinomial Naive Bayes: A Worked Example (conditional probabilities)
      • $P(\text{Love} \mid \text{Pos}) = \frac{2+1}{2+3} = \frac{3}{5}$
      • $P(\text{Almost} \mid \text{Pos}) = \frac{0+1}{2+3} = \frac{1}{5}$
      • $P(\text{Love} \mid \text{Neg}) = \frac{0+1}{2+3} = \frac{1}{5}$
      • $P(\text{Almost} \mid \text{Neg}) = \frac{1+1}{2+3} = \frac{2}{5}$
    • Multinomial Naive Bayes: A Worked Example (choosing a class)
      • $P(\text{pos} \mid \text{doc 4}) \propto \frac{2}{3} \cdot \frac{1}{5} \cdot \frac{3}{5} = 0.08$
      • $P(\text{neg} \mid \text{doc 4}) \propto \frac{1}{3} \cdot \frac{2}{5} \cdot \frac{1}{5} \approx 0.027$
      • The positive score is higher, so doc 4 is classified as Pos
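The worked example can be checked with a few lines of code. This is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, not the TwitterSA implementation; the function names `train_nb` and `score` are assumptions. Exact fractions are used so the results can be compared directly with the slide.

```python
from collections import Counter
from fractions import Fraction

# Training and test documents from the worked example (lowercased).
train = [(["love"], "pos"), (["almost", "hate"], "neg"), (["love"], "pos")]
test_doc = ["almost", "love"]


def train_nb(docs):
    """Return class priors, per-class word counts and the vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_docs}
    for words, label in docs:
        word_counts[label].update(words)
    vocab = {w for words, _ in docs for w in words}
    priors = {c: Fraction(n, len(docs)) for c, n in class_docs.items()}
    return priors, word_counts, vocab


def score(words, priors, word_counts, vocab):
    """Unnormalized posterior: prior times smoothed word likelihoods."""
    scores = {}
    for c, prior in priors.items():
        total = sum(word_counts[c].values())  # count(C_i)
        p = prior
        for w in words:
            p *= Fraction(word_counts[c][w] + 1, total + len(vocab))
        scores[c] = p
    return scores


priors, word_counts, vocab = train_nb(train)
scores = score(test_doc, priors, word_counts, vocab)
prediction = max(scores, key=scores.get)
print({c: float(p) for c, p in scores.items()}, prediction)
```

On longer documents the product of many small probabilities underflows, so practical implementations sum log-probabilities instead of multiplying.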
    • Multinomial Naive Bayes: A Worked Example (screenshots: input corpus with attributes and classes; training weights)
    • TwitterSA: Collecting Corpus
      • A big corpus with label annotations is needed for training!
      • A different method from the corpus-based or dictionary-based approaches
      • Collect a big sentiment corpus starting from noisy labels
        • Bag of words for training the Bayesian learning algorithm
        • Found that iPhone and :) can carry positive sentiment, and :( the opposite
        • Discovered that hashtags can also be used as noisy labels
          • ‘Recently I've started developing a love for indie music ... #loveit’
          • ‘I have to say, I am so impressed with this iPhone5. I will never ever go back to a Droid. #loveit #happy’
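Noisy labelling of this kind can be sketched as follows. The marker sets below are assumptions assembled from the emoticons and hashtags mentioned on the slide, not the lists TwitterSA actually used, and the function names are hypothetical. Tweets with no marker or with conflicting markers are skipped, and the marker is removed from the text so that a classifier cannot simply memorize it.

```python
# Assumed marker sets, taken from the examples on the slide.
POSITIVE_MARKERS = {":)", "#loveit", "#happy"}
NEGATIVE_MARKERS = {":("}
ALL_MARKERS = POSITIVE_MARKERS | NEGATIVE_MARKERS


def noisy_label(tweet):
    """Return 'pos', 'neg', or None when markers are absent or conflicting."""
    tokens = set(tweet.lower().split())
    has_pos = bool(tokens & POSITIVE_MARKERS)
    has_neg = bool(tokens & NEGATIVE_MARKERS)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def build_corpus(tweets):
    """Keep only labelled tweets and strip the markers from their text."""
    corpus = []
    for tweet in tweets:
        label = noisy_label(tweet)
        if label is None:
            continue
        text = " ".join(w for w in tweet.split() if w.lower() not in ALL_MARKERS)
        corpus.append((text, label))
    return corpus


tweets = ["great day :)", "missed my train :(", "just a tweet", "mixed :) :("]
print(build_corpus(tweets))
```

Whitespace tokenization means a marker glued to punctuation (for example "#loveit!") is missed; a tweet-aware tokenizer would be needed in practice.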
    • TwitterSA: Text Processing and Normalization. The TwitterSA process has many modules:
      • Normalize repeated letters and alphanumeric words
      • Discard terms with high entropy and low salience
      • Manage negation for sentiment training
      • Convert slang words to their normal form
      • Use unigrams and bigrams for training
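Two of the modules listed above, negation handling and n-gram extraction, can be sketched briefly. The slide does not say how TwitterSA manages negation; prefixing the tokens that follow a negation word with `NOT_` until the next punctuation token is a common technique and is used here as an assumption. The word list `NEGATIONS` and all function names are hypothetical.

```python
NEGATIONS = {"not", "no", "never", "don't", "dont"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def mark_negation(tokens):
    """Prefix tokens after a negation word with NOT_ until punctuation."""
    marked, negated = [], False
    for token in tokens:
        if token in PUNCTUATION:
            negated = False
            marked.append(token)
        elif token in NEGATIONS:
            negated = True
            marked.append(token)
        else:
            marked.append("NOT_" + token if negated else token)
    return marked


def ngrams(tokens, n):
    """Return all contiguous n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(text):
    """Unigram and bigram features over negation-marked tokens."""
    tokens = mark_negation(text.lower().split())
    return ngrams(tokens, 1) + ngrams(tokens, 2)


print(features("i dont like it"))
```

With this marking, "like" and "NOT_like" become distinct features, so a bag-of-words classifier can learn opposite weights for them.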
    • TwitterSA: Feature Vector (1): Lexicon Features
      • Input: ‘Happy Birthday Steve Jobs your iPhone is amazing’
      • MPQA Dictionary: {Pos, Neg} = {1, 0}
      • LIWC Dictionary (linguistic categories): LIWC categories = {posEmo, affect, …}
    • TwitterSA: Feature Vector (2): POS Tagger
      • Input: He is the best
      • Output: He|PRP is|VBP the|DT best|JJS
      • POS tags (tag: description; example)
        • CC: conjunction; and, but, or, &
        • CD: cardinal number; 1, three
        • DT: determiner; the
        • JJ: adjective; green
        • JJR: adjective, comparative; greener
        • JJS: adjective, superlative; greenest
        • …
      • Chart: tag occurrence in positive and negative sentences, one bar per POS tag, with callouts for personal pronouns and possessives, adjectives and superlative adverbs, comparative adjectives, proper nouns, and verbs in past tense
    • TwitterSA: Feature Vector (3): Pattern Mask, ‘the combination of one or more adjacent tags’
      • Input: He|PRP is|VBP the|DT best|JJS
      • Example output: PRP|VBP ; PRP|VBP|DT ; PRP|VBP|DT|JJS ; etc.
      • Find the most frequent pattern masks in positive and negative sentences
      • For example, the input PRP|VBP|DT|JJS occurs almost only in positive sentences, giving {Pos, Neg} = {1, 0}
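Pattern masks can be read as contiguous n-grams over the POS tag sequence. The sketch below follows that reading; the function names and the default minimum length of two tags (chosen to match the slide's example output) are assumptions, and the tagged input is taken as given rather than produced by a real tagger.

```python
from collections import Counter


def pattern_masks(tagged, min_len=2):
    """Return every contiguous run of at least min_len POS tags."""
    tags = [item.split("|")[1] for item in tagged.split()]
    masks = []
    for start in range(len(tags)):
        for end in range(start + min_len, len(tags) + 1):
            masks.append("|".join(tags[start:end]))
    return masks


def mask_counts(labelled_sentences):
    """Count pattern masks separately for each sentiment label."""
    counts = {}
    for tagged, label in labelled_sentences:
        counts.setdefault(label, Counter()).update(pattern_masks(tagged))
    return counts


masks = pattern_masks("He|PRP is|VBP the|DT best|JJS")
print(masks)
```

Comparing the per-label counts from `mask_counts` is one way to find masks that occur mostly in positive or mostly in negative sentences.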
    • TwitterSA: How Accurate Is It?
      • N-fold cross-validation (average of the results)
      • Corpus split: 70% training, 30% test
      • Manually labelled corpus for testing
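Both evaluation schemes are easy to sketch without any library. The helper names `train_test_split` and `k_fold` and the fixed shuffle seed are assumptions for illustration; the slide does not say how TwitterSA performs the split.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of the items and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold(items, k):
    """Yield (train, test) pairs; each item is in exactly one test fold."""
    for i in range(k):
        test = items[i::k]
        train = [x for j, x in enumerate(items) if j % k != i]
        yield train, test


items = list(range(10))
train_part, test_part = train_test_split(items)
folds = list(k_fold(items, 5))
print(len(train_part), len(test_part), len(folds))
```

With cross-validation the classifier is trained and scored once per fold and the scores are averaged, which gives a more stable estimate than a single 70/30 split.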
    • TwitterSA: Testing. This is a classification problem (not retrieval)
      • Precision (positive predictive value): the proportion of instances classified as positive that are actually positive, TP / (TP + FP)
      • Recall (sensitivity): the proportion of actual positives that are correctly identified as such, TP / (TP + FN)
      • Confusion matrix
        • Actual positive: classified positive = TP, classified negative = FN
        • Actual negative: classified positive = FP, classified negative = TN
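The two measures follow directly from the confusion-matrix counts. A small sketch, with hypothetical function names and made-up labels used only to exercise the code:

```python
def confusion_counts(actual, predicted, positive="pos"):
    """Return (TP, FP, FN, TN) for the given positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for illustration only.
actual = ["pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg"]
predicted = ["pos", "pos", "pos", "neg", "pos", "pos", "neg", "neg"]
tp, fp, fn, tn = confusion_counts(actual, predicted)
precision, recall = precision_recall(tp, fp, fn)
print(tp, fp, fn, tn, precision, recall)
```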
    • Outline (current section: TwitterSA Soccer Match Analysis)
    • TwitterSA: Soccer Match Analysis
      • What: monitor a soccer match on Twitter (Milan-Inter, Serie A, season 2011-2012)
      • Goal: understand how the match went using automatic sentiment analysis. Who wins? How many goals?
    • TwitterSA: Soccer Match Analysis (chart: volume of tweets about Inter and Milan; y-axis: tweets, 0 to 1,600; x-axis: Mon, Fri, Sun, Mon; annotations added over three slides: Match Start, Coaches Interview, Goal: which team?)
    • TwitterSA: Soccer Match Analysis (chart: Sentiment Analysis; y-axis: % positive tweets, 50 to 90; series: Milan pos, Inter pos; x-axis (timeline): match day before, tweet peak, match end, the day after; annotation: Goal: which team?)
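The series plotted in such a chart is the share of positive tweets per team and time bucket. The sketch below assumes each tweet has already been classified and tagged with a team; the function name and the toy records are hypothetical and are not the data behind the slide.

```python
from collections import defaultdict


def percent_positive(classified):
    """Map (time bucket, team) to the percentage of tweets labelled 'pos'."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for bucket, team, label in classified:
        totals[(bucket, team)] += 1
        if label == "pos":
            positives[(bucket, team)] += 1
    return {key: 100.0 * positives[key] / totals[key] for key in totals}


# Toy records for illustration only: (time bucket, team, predicted label).
classified = [
    ("before", "Milan", "pos"),
    ("before", "Milan", "neg"),
    ("after", "Milan", "pos"),
    ("after", "Inter", "neg"),
]
print(percent_positive(classified))
```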
    • TwitterSA: Soccer Match Analysis
      • Who won?
      • How many goals?
    • Outline (current section: TwitterSA Predicting Elections)
    • TwitterSA: Predicting Elections
    • TwitterSA: Predicting Elections. Results of the elections: full article and infographics at http://davidefeltoni.wordpress.com
    • References
      • Dr. Diana Maynard: Practical Sentiment Analysis
      • Seth Grimes: Sentiment Analysis Symposium 2012
      • B. Liu's reference list is available here: http://www.cs.uic.edu/~liub/FBS/AAAI-2011-tutorial-references.pdf
      • Best survey on sentiment analysis: B. Liu, ‘Sentiment Analysis and Subjectivity’; the chapter is available here: http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf
      • Great tutorial on sentiment analysis: http://sentiment.christopherpotts.net/
      • Some images and statistics are taken from www.basistech.com and www.nielsen.com
    • ‘When in this feeling you are truly happy, call it whatever you like: call it happiness, heart, love. I have no name for it. Feeling is everything! The word is only sound and smoke.’ Johann Wolfgang von Goethe