Sentiment Analysis and Social Media: How and Why
1. Sentiment Analysis in Social Media: How and Why
Davide Feltoni Gurini
Sistemi Intelligenti per Internet (Intelligent Systems for the Internet), academic year 2012/2013
2. About me and Contacts
First-year Ph.D. student in Computer Engineering at Roma Tre University
Via della Vasca Navale 79, Rome
A.I. Lab, Room 2.02a
Contacts:
• feltoni@dia.uniroma3.it
• http://about.me/davidefeltoni
3. Outline
✓ Social Networks and Web 2.0
• Sentiment analysis: what is it?
• Sentiment analysis: applications
• Sentiment analysis: an inside look
• TwitterSA
• TwitterSA Soccer Match Analysis
• TwitterSA Predicting Elections
4. Social Networks and Web 2.0
Common Uses:
• Spend time on the internet!
• Keep a virtual network of friends
• Gather information about something
• Share what we do with other people
• Know what other people do
• Know what is happening in the world
• …
5. Social Networks and Web 2.0
Behind the scenes
• Data Analyst
  – Statistical studies
  – How to collect and store big data
• Business Analyst
  – Social marketing
  – Web advertising
• Web and App Developer
  – Integrate social features into games and apps
• Ph.D. and Researcher
  – Research methods and publishing articles
6. Social Networks and Web 2.0
Evolution of the Web
• From 1.0 (web for experts) to 2.0 (web for everyone)
• Users are not only reading
• The Web is a data container
• Users now generate more content
• Blogs, social networks, reviews, …
• Collaborative sites (Wikipedia, communities, forums)
8. Social Networks: Some Statistics
Facebook monthly active users now total nearly 850 million
People spend about 11 hours per month on Facebook alone
175 million tweets were sent on Twitter every day in 2012
The 2012 election broke records with 31.7 million political tweets
625,000 new users join Google+ every day
27% of small and 34% of medium businesses use social media for business (+20% YoY)
Source: www.huffingtonpost.com
9. Social Networks and Web 2.0: Big Audience, Big Data
• Even older people use social networks
• Important for web data analysis
10. Social Networks: Real Time Update
2012 Greece clashes
• Fresh news: search engine vs social network
12. Social Networks and Web 2.0: Why Are They So Important?
• Know what most people on the Web think
• Big data available almost for free
• A large audience covering a big slice of the real population
• Real-time feedback on news
• Real-time comments about TV shows or events
• Share content with millions of people
• Cluster users by age, interests, page likes, …
• …
13. Outline
• Social Networks and Web 2.0
✓ Sentiment analysis: what is it?
• Sentiment analysis: applications
• Sentiment analysis: an inside look
• TwitterSA
• TwitterSA Soccer Match Analysis
• TwitterSA Predicting Elections
14. What is Sentiment Analysis?
Theory
«Sentiment Analysis or Opinion Mining is the computational study of opinions, sentiments and emotions expressed in text.»
-- Bing Liu, 2010, «Sentiment Analysis and Subjectivity»
Practical
Using NLP, statistics, or machine learning methods to extract and identify how positive or negative the sentiment expressed in a review, blog, discussion, news item, comment, or any other document is.
15. Why are online opinions so important?
• «Opinions» influence our behaviour.
• Before «making a decision», we usually seek out the opinions of others:
  – Buying a product, renting a car, reserving a hotel room, looking for a good restaurant…
• Individuals: seek opinions from friends and family
• Organizations: use surveys, opinion polls, consultants
17. Sentiment Analysis Application Areas
– Organization/brand
  • Know the organization's or brand's reputation on the Web
  • Know consumers' opinions about a product
  • Understand consumers' needs
  • …
– Individuals
  • Make a decision before buying something
  • Know the aggregate sentiment of a product's reviews
  • Find public opinion about a person, a politician, …
– Research Studies
  • Predict political results
  • Predict box office
  • Citizen polls
18. Sentiment Analysis Application Areas
– Marketing 2.0
  • Advertisement placement:
    – Place an ad when someone praises a product
    – Place a competitor's ad when someone dislikes a product
  • Combine sentiment with recommendation systems
    – Know what kind of people praise a product (age, interests, …)
  • See how people respond to a product release or ad campaign
– Social TV
  • Know what people think about a TV show, and who its audience is
  • Run interactive polls during TV news programmes
20. Sentiment Analysis Application Areas
Some tools can also measure the overall sentiment expressed in blogs and social networks.
Example: an earthquake produced a lot of negative sentiment.
23. Also a fascinating problem
• Intellectually challenging, with many applications
• A popular research topic in recent years (Shanahan, Qu, and Wiebe, 2006 (edited book); surveys: Pang and Lee 2008; Liu, 2006 and 2011; 2010)
• More than 100 companies in the USA alone
• Many workshops and conferences
  – http://sentimentsymposium.com/
  – www.gplsi.dlsi.ua.es/congresos/wassa2012/
• A large research area
  – Opinion mining, text mining
  – Sentiment and subjectivity analysis
  – Artificial intelligence
  – Natural language processing
  – Computational linguistics
  – Etc.
24. Sentiment Analysis and Social Web
How to do that?
Easy: search the Web and find a sentiment analysis tool
• http://www.twitalyzer.com/index.asp
• http://twendz.waggeneredstrom.com/
• http://www.sentiment140.com/
• http://www.blogmeter.it
• http://twitrratr.com/
• http://www.socialmention.com
• http://www.lovewillconquer.co.uk/
• Hundreds more…
And professional sites for companies
• www.radian6.com
• www.sysomos.com
25. Sentiment Analysis online tools
But you will find that Rita Levi Montalcini wasn't very popular
28. Outline
• Social Networks and Web 2.0
• Sentiment analysis: what is it?
• Sentiment analysis: applications
✓ Sentiment analysis: an inside look
• TwitterSA
• TwitterSA Soccer Match Analysis
• TwitterSA Predicting Elections
29. What is an opinion (1)?
“I bought an iPhone a few days ago. It is such a nice phone. The touch screen is really cool. The voice quality is clear too. It is much better than my old Blackberry, which was a terrible phone and so difficult to type with its tiny keys. However, my mother was mad with me as I did not tell her before I bought the phone. She also thought the phone was too expensive, …”
Looking at this review, it is possible to perform:
• Document-level sentiment analysis: is this review + or -?
• Sentence-level sentiment analysis: is each sentence + or -?
• Entity-level sentiment analysis: is iPhone + or -?
30. What is an opinion (2)?
• An opinion is a quintuple
  $(e_j, a_{jk}, so_{ijkl}, h_i, t_l)$
where
  – $e_j$ is the target entity (person, product, organization, event or a generic topic)
  – $a_{jk}$ is an aspect/feature of the entity
  – $so_{ijkl}$ is the sentiment value of the opinion (its polarity): usually positive, negative or neutral
  – $h_i$ is the opinion holder
  – $t_l$ is the time when the opinion is expressed
31. What is an opinion (3)?
Entity – Feature – Polarity – Opinion Holder – Time
• I bought an iPhone and the touch screen is really cool. (Positive)
• My old Blackberry, which was a terrible phone and so difficult to type with its tiny keys (Negative)
• In quintuples
(iPhone, touch screen, positive, Author, review date)
(Blackberry, keys, negative, Author, review date)
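The quintuple maps naturally onto a small record type. The following is a minimal sketch, not part of TwitterSA: the class name `Opinion`, its field names, and the helper `polarity_by_entity` are illustrative assumptions, and the time field is kept as the placeholder string used on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion quintuple: (entity, aspect, polarity, holder, time)."""
    entity: str    # target entity
    aspect: str    # aspect/feature of the entity
    polarity: str  # "positive", "negative" or "neutral"
    holder: str    # opinion holder
    time: str      # when the opinion was expressed (placeholder here)


# The two quintuples extracted from the iPhone review.
opinions = [
    Opinion("iPhone", "touch screen", "positive", "Author", "review date"),
    Opinion("Blackberry", "keys", "negative", "Author", "review date"),
]


def polarity_by_entity(items):
    """Group polarities by entity, e.g. for entity-level sentiment analysis."""
    grouped = {}
    for opinion in items:
        grouped.setdefault(opinion.entity, []).append(opinion.polarity)
    return grouped
```

Grouping by `entity` (or by `(entity, aspect)`) is what turns a list of extracted quintuples into an entity-level or aspect-level summary.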
32. Sentiment Analysis is hard (1)!
Manage Negations
• Direct negation: “I don't like my new iPhone”
• Ambiguous negation: “Not only is this phone expensive but it's also heavy and difficult to use”
• Indirect negation: “Perhaps it is a great phone, but I fail to see why”
Co-reference Resolution
• “We watched the movie and went to dinner; it was awful”
  What does “it” refer to?
Slang and Writing Errors
• Short forms: nite (night), sayin (saying).
• Acronyms: lol (laugh out loud), iirc (if I remember correctly).
• Writing errors: wouls (would), rediculous (ridiculous).
• Punctuation errors: im (I'm), dont (don't).
• Slang: that was well mint (that was very good).
• Repeated letters: that was soooooo greeeat (that was so great).
• Alphanumeric words: 2night (tonight), str8 (straight).
34. Sentiment Analysis is hard (3)!
Manage Comparatives
• “Federer is better than Nadal”
  Federer (+)
  Nadal (-)
Domain-Dependent Opinions
• “The battery life is long” (+)
• “The waiting time to enter the restaurant was too long” (-)
More Challenges
• Opinion spam
• Sarcasm
• The general complexity of natural language
• …
35. Sentiment Analysis is hard (4)!
• A company posted an ad for writing fake reviews on amazon.com (65 cents per review)
36. Sentiment Analysis: Known Approaches
Building an opinion word lexicon
• Lexical Methods
  – Manual approach
  – Dictionary-based approach (Hu and Liu, 2004; Andreevskaia and Bergler, 2006; Dragut et al., 2010)
  – Corpus-based approach (Hatzivassiloglou and McKeown, 1997; Turney, 2002; Yu and Hatzivassiloglou, 2003; Kanayama and Nasukawa, 2006; Ding, Liu and Yu, 2008)
• Machine Learning
  – Unsupervised learning (Hatzivassiloglou and McKeown, 1997; Yu and Hatzivassiloglou, 2003)
  – Supervised learning (Go et al., 2009; Pang and Lee, 2002; Pak and Paroubek, 2010)
  – Semi-supervised learning (Andreevskaia and Bergler, 2006; Esuli and Sebastiani, 2005)
37. Sentiment Analysis: Known Approaches
Building an opinion word lexicon
• Manual approach
  – Pro: precision, no rules to define
  – Cons: no automation, time needed to set up the lexicon
• Dictionary-based approach
  – Start from a manual or prepared dictionary of positive and negative words. Expand the dictionary with synonyms and antonyms.
  – Pro: faster, semi-automated
  – Cons: low precision (synonyms: great -> excellent and admirable, but also -> large, big, fat)
• Corpus-based approach
  – Start from a seed set of positive and negative adjectives (for example)
  – Expand this set using grammatical links between words (e.g. conjunctions)
  – Example: “this car is beautiful and spacious”; if beautiful is known to be positive, then spacious is also positive.
  – Pro: high automation, moderate precision
  – Cons: requires attention to grammar rules; the word set is never complete
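The corpus-based idea can be sketched in a few lines. This is a toy illustration under strong assumptions, not the method of any of the cited papers: the function name `expand_lexicon` is hypothetical, only the conjunction "and" is handled, and no part-of-speech check is made, so any two words joined by "and" share a polarity.

```python
import re


def expand_lexicon(seed, sentences):
    """Propagate polarity across words joined by 'and'.

    seed: dict mapping word -> polarity label, e.g. {"beautiful": "pos"}.
    sentences: iterable of raw sentences from the corpus.
    Repeats until no new word can be labelled.
    """
    lexicon = dict(seed)
    changed = True
    while changed:
        changed = False
        for sentence in sentences:
            for left, right in re.findall(r"(\w+) and (\w+)", sentence.lower()):
                # Copy the label in whichever direction it is known.
                for known, other in ((left, right), (right, left)):
                    if known in lexicon and other not in lexicon:
                        lexicon[other] = lexicon[known]
                        changed = True
    return lexicon
```

A more careful version would restrict the rule to adjectives and flip the polarity for contrastive connectives such as "but", which is where the "attention to grammar rules" cost comes from.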
38. Outline
• Social Networks and Web 2.0
• Sentiment analysis: what is it?
• Sentiment analysis: applications
• S.A. an inside look
✓ TwitterSA
• TwitterSA Soccer Match Analysis
• TwitterSA Predicting Elections
39. Twitter: The Social Network
• Status updates of at most 140 characters
• Can add URLs with multimedia
• 99% of statuses are public
• No friends: followers and following
• Hashtags (#)
41. TwitterSA: Machine Learning
Goal: classify input text as Positive or Negative
Supervised Algorithm
• We must provide a set of inputs (text phrases) and the appropriate output class (Positive or Negative) for each of those inputs.
• The learning algorithm trains on those examples. After that, it is able to classify a new instance.
42. TwitterSA: Multinomial Naive Bayes
Naive Bayes Theorem
$X$ = new text instance to classify
$C_1, \dots, C_n$ = possible classes (e.g. Positive, Negative, ...)
$P(X \mid C_i)$ = product of the probabilities that the single attributes of instance $X$ belong to class $C_i$
$P(C_i \mid X)$ = probability that the new instance $X$ belongs to class $C_i$
$P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}$
43. Multinomial Naive Bayes: A Worked Example
          Doc  Words vector  Class ($C_i$)
Training  1    Love          $C_1$ = Pos
Training  2    Almost hate   $C_2$ = Neg
Training  3    Love          $C_1$ = Pos
Test      4    Almost Love   ?
44. Multinomial Naive Bayes: A Worked Example
Vocabulary: $|V| = 3$ words (Love, Almost, hate)
Prior: $P(C_i) = \frac{N_i}{N}$, where $N_i$ is the number of training documents in class $C_i$ and $N$ is the total number of training documents
Conditional probability (add-one smoothing): $P(w \mid C_i) = \frac{count(w, C_i) + 1}{count(C_i) + |V|}$
Priors: $P(C_1 = Pos) = \frac{2}{3}$, $P(C_2 = Neg) = \frac{1}{3}$
45. Multinomial Naive Bayes: A Worked Example
Conditional Probabilities
$P(Love \mid Pos) = \frac{2+1}{2+3} = \frac{3}{5}$
$P(Almost \mid Pos) = \frac{0+1}{2+3} = \frac{1}{5}$
$P(Love \mid Neg) = \frac{0+1}{2+3} = \frac{1}{5}$
$P(Almost \mid Neg) = \frac{1+1}{2+3} = \frac{2}{5}$
46. Multinomial Naive Bayes: A Worked Example
Choosing a class
$P(Pos \mid doc_4) \propto \frac{2}{3} \cdot \frac{1}{5} \cdot \frac{3}{5} = 0.08$
$P(Neg \mid doc_4) \propto \frac{1}{3} \cdot \frac{1}{5} \cdot \frac{2}{5} \approx 0.027$
The Positive score is larger, so document 4 (“Almost Love”) is classified as Positive.
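The worked example can be reproduced with a few lines of code. This is a minimal sketch of multinomial Naive Bayes with add-one smoothing, written for this example only; the function names `train` and `score` are illustrative and are not taken from TwitterSA. Exact fractions are used so the results can be compared with the slide by hand.

```python
from collections import Counter
from fractions import Fraction


def train(docs):
    """docs: list of (tokens, label). Returns (priors, word_counts, vocab)."""
    label_counts = Counter(label for _, label in docs)
    priors = {c: Fraction(n, len(docs)) for c, n in label_counts.items()}
    word_counts = {c: Counter() for c in label_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    return priors, word_counts, vocab


def score(tokens, label, priors, word_counts, vocab):
    """Unnormalized posterior: P(label) * product of smoothed P(w | label)."""
    total_words = sum(word_counts[label].values())
    result = priors[label]
    for w in tokens:
        result *= Fraction(word_counts[label][w] + 1, total_words + len(vocab))
    return result


training = [(["love"], "pos"), (["almost", "hate"], "neg"), (["love"], "pos")]
model = train(training)
scores = {c: score(["almost", "love"], c, *model) for c in ("pos", "neg")}
predicted = max(scores, key=scores.get)
```

Here `scores["pos"]` is 2/25 (0.08) and `scores["neg"]` is 2/75 (about 0.027), so `predicted` is `"pos"`. On real corpora the product is usually replaced by a sum of logarithms to avoid numeric underflow.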
47. Multinomial Naive Bayes: A Worked Example
Input corpus: attributes and classes
Training weights
48. TwitterSA: Collecting Corpus
• A big corpus with label annotations is needed for training!
• A different method from the corpus-based or dictionary-based approach
• Collect a big sentiment corpus starting from noisy labels
  – Bag of words for training the Bayesian learning algorithm
  – Found that “iPhone” and :) can indicate positive sentiment, and :( the contrary
  – Discovered that hashtags can also be used as noisy labels
    o “Recently I've started developing a love for indie music ... #loveit”
    o “I have to say, I am so impressed with this iPhone5. I will never ever go back to a Droid. #loveit #happy”
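Noisy labelling of this kind can be sketched as follows. The marker sets below contain only the emoticons and hashtags quoted on the slide; the function names and the rule of discarding tweets with mixed or missing markers are assumptions made for illustration, not the documented TwitterSA behaviour.

```python
# Markers taken from the slide; a real system would use longer lists.
POSITIVE_MARKERS = {":)", "#loveit", "#happy"}
NEGATIVE_MARKERS = {":("}


def noisy_label(tweet):
    """Return "pos", "neg", or None when markers are absent or contradictory."""
    tokens = set(tweet.lower().split())
    has_pos = bool(tokens & POSITIVE_MARKERS)
    has_neg = bool(tokens & NEGATIVE_MARKERS)
    if has_pos == has_neg:  # no marker at all, or both kinds present
        return None
    return "pos" if has_pos else "neg"


def strip_markers(tweet):
    """Remove the markers so the classifier cannot simply memorize them."""
    markers = POSITIVE_MARKERS | NEGATIVE_MARKERS
    return " ".join(t for t in tweet.split() if t.lower() not in markers)
```

Stripping the markers before training matters: otherwise the label leaks into the features and the measured accuracy is inflated.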
49. TwitterSA: Text Processing and Normalization
TwitterSA process: many modules
• Normalize repeated letters and alphanumeric words
• Discard terms with high entropy and low salience
• Handle negation for sentiment training
• Convert slang words to their normal form
• Use unigrams and bigrams for training
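Some of these modules can be sketched compactly. The code below is an illustration under stated assumptions, not the TwitterSA implementation: the slang table holds only examples from the earlier slide, runs of three or more identical letters are collapsed to one (which would damage words such as "coool"), and negation is handled with the common trick of prefixing `NOT_` to every token after a negation word until the next punctuation mark. Entropy and salience filtering are not shown.

```python
import re

# Hypothetical lookup tables built from the examples on the slides.
SLANG = {"nite": "night", "sayin": "saying", "2night": "tonight",
         "str8": "straight", "dont": "don't", "im": "i'm"}
NEGATIONS = {"not", "no", "never", "don't"}
PUNCTUATION = {".", ",", "!", "?", ";"}


def normalize(text):
    """Lowercase, collapse repeated letters, map slang, mark negated tokens."""
    tokens = re.findall(r"[a-z0-9']+|[.,!?;]", text.lower())
    out = []
    negated = False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False  # a negation scope ends at punctuation
            continue
        tok = re.sub(r"(.)\1{2,}", r"\1", tok)  # "soooooo" -> "so"
        tok = SLANG.get(tok, tok)
        out.append("NOT_" + tok if negated else tok)
        if tok in NEGATIONS:
            negated = True
    return out


def unigrams_and_bigrams(tokens):
    """Feature list: all unigrams followed by all adjacent bigrams."""
    return tokens + [a + " " + b for a, b in zip(tokens, tokens[1:])]
```

For example, `normalize("that was soooooo greeeat")` gives `["that", "was", "so", "great"]`.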
50. TwitterSA: Vector of Feature (1)
Dictionaries: LIWC Dictionary, MPQA Dictionary
Linguistic categories
Input: “Happy Birthday Steve Jobs your iPhone is amazing”
Output:
{Pos, Neg}  LIWC Categories
{1,0}       {posEmo, affect, ..}
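A lexicon lookup of this kind might look like the sketch below. The tiny `TOY_LEXICON` is invented for illustration and is not an excerpt of LIWC or MPQA; the category names follow the slide. The `{Pos, Neg}` pair is treated as a binary indicator, which is an assumption based on the slide showing `{1,0}` for a sentence with two positive words.

```python
# Hypothetical mini-lexicon: word -> (polarity, linguistic categories).
TOY_LEXICON = {
    "happy": ("pos", {"posEmo", "affect"}),
    "amazing": ("pos", {"posEmo", "affect"}),
    "terrible": ("neg", {"negEmo", "affect"}),
}


def lexicon_features(text):
    """Return ((pos_flag, neg_flag), set_of_categories) for a text."""
    pos_flag = neg_flag = 0
    categories = set()
    for word in text.lower().split():
        entry = TOY_LEXICON.get(word)
        if entry is None:
            continue
        polarity, word_categories = entry
        if polarity == "pos":
            pos_flag = 1
        else:
            neg_flag = 1
        categories |= word_categories
    return (pos_flag, neg_flag), categories
```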
51. TwitterSA: Vector of Feature (2)
POS Tagger
Input: He is the best
Output: He|PRP is|VBP the|DT best|JJS

POS Tag  Description             Example
CC       conjunction             and, but, or, &
CD       cardinal number         1, three
DT       determiner              the
JJ       adjective               green
JJR      adjective, comparative  greener
JJS      adjective, superlative  greenest
…        …                       …

[Figure: tag occurrence in positive and negative sentences, one bar per POS tag, on a scale from -1 to 1]
Negative sentences: personal pronouns and possessives; comparative adjectives; verbs in past tense
Positive sentences: adjectives and superlative adverbs; proper nouns
52. TwitterSA: Vector of Feature (3)
Pattern Mask
“The combination of one or more adjacent tags”
• Input
  – He|PRP is|VBP the|DT best|JJS
• Example output
  – PRP|VBP ; PRP|VBP|DT ; PRP|VBP|DT|JJS ; etc.
• Discover the most frequent pattern masks in positive and negative sentences.
For example, the pattern PRP|VBP|DT|JJS occurs mostly in positive sentences: {Pos, Neg} = {1,0}
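Pattern masks can be enumerated as contiguous runs of tags. The sketch below is one plausible reading of the slide, not the TwitterSA code: the function name `pattern_masks` and the `min_len` parameter are assumptions, with a default of 2 because the example output starts at two-tag patterns.

```python
def pattern_masks(tagged_sentence, min_len=2):
    """List every contiguous tag sequence of at least min_len tags.

    tagged_sentence: tokens in "word|TAG" form separated by spaces,
    e.g. "He|PRP is|VBP the|DT best|JJS".
    """
    tags = [token.rsplit("|", 1)[1] for token in tagged_sentence.split()]
    masks = []
    for start in range(len(tags)):
        for end in range(start + min_len, len(tags) + 1):
            masks.append("|".join(tags[start:end]))
    return masks
```

Counting how often each mask appears in positive versus negative training sentences then gives the `{Pos, Neg}` feature for a new sentence.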
53. TwitterSA: How Accurate Is It?
• N-fold cross-validation (average results)
• Split corpus: 70% training; 30% test
• Manually annotated corpus for testing
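Both evaluation schemes are easy to sketch without any library. The helper names `train_test_split` and `k_folds` below are illustrative, and the fixed random seed is an assumption made so that the split is reproducible.

```python
import random


def train_test_split(items, train_fraction=0.7, seed=0):
    """Shuffle and split into (training, test), 70% / 30% by default."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_folds(items, k):
    """Yield (training, test) pairs; each item is in exactly one test fold."""
    items = list(items)
    for fold in range(k):
        test = items[fold::k]
        training = [x for index, x in enumerate(items) if index % k != fold]
        yield training, test
```

With cross-validation, the classifier is trained and scored once per fold and the scores are averaged, which is the "average results" mentioned on the slide.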
54. TwitterSA: Testing
Classification problem (not retrieval)
Precision (positive predictive value): the proportion of instances classified as positive that are actually positive, TP / (TP + FP)
Recall (sensitivity): the proportion of actual positives that are correctly identified as such, TP / (TP + FN)
Confusion Matrix
                 Classified Positive  Classified Negative
Actual Positive  TP                   FN
Actual Negative  FP                   TN
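The two measures follow directly from the confusion matrix counts. This is a minimal sketch with illustrative function names; the labels in the usage example are made up to exercise the code and are not TwitterSA results.

```python
from fractions import Fraction


def confusion_counts(actual, predicted, positive="pos"):
    """Return (TP, FN, FP, TN) for the given positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def precision_recall(tp, fn, fp):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return Fraction(tp, tp + fp), Fraction(tp, tp + fn)


# Toy data, invented for illustration.
actual = ["pos", "pos", "pos", "neg", "neg"]
predicted = ["pos", "pos", "neg", "pos", "neg"]
tp, fn, fp, tn = confusion_counts(actual, predicted)
precision, recall = precision_recall(tp, fn, fp)
```

Note that `precision_recall` raises `ZeroDivisionError` when nothing is classified as positive or when there are no actual positives; a production version would need to decide how to report those cases.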
55. Outline
• Social Networks and Web 2.0
• Sentiment analysis: what is it?
• Sentiment analysis: applications
• S.A. an inside look
• TwitterSA
✓ TwitterSA Soccer Match Analysis
• TwitterSA Predicting Elections
56. TwitterSA: Soccer Match Analysis
What
Monitor a soccer match on Twitter
Milan – Inter, Serie A, season 2011-2012
Goal
• Use automatic sentiment analysis to understand how the match unfolds
• Who wins? How many goals?
57. TwitterSA: Soccer Match Analysis
[Figure: volume of tweets about Inter and Milan, from Monday to the following Monday; y-axis: tweets (0 to 1600); annotation: Match Start]
58. TwitterSA: Soccer Match Analysis
[Figure: volume of tweets about Inter and Milan, from Monday to the following Monday; y-axis: tweets (0 to 1600); annotations: Match Start, Coaches Interview]
59. TwitterSA: Soccer Match Analysis
[Figure: volume of tweets about Inter and Milan, from Monday to the following Monday; y-axis: tweets (0 to 1600); annotations: Match Start, Coaches Interview, Goal: which team?]
60. TwitterSA: Soccer Match Analysis
Sentiment Analysis
[Figure: % positive tweets (50 to 90) for Milan and Inter along the timeline: match day before, tweet peak, match end, the day after; annotation: Goal: which team?]
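A chart of this kind is built by aggregating classifier output per team and per time bucket. The sketch below shows one way to do that; the function name `percent_positive`, the tuple layout, and the bucket names are assumptions, and the sample rows are invented to exercise the code rather than taken from the match data.

```python
from collections import defaultdict


def percent_positive(classified):
    """classified: iterable of (team, time_bucket, label) with label "pos"/"neg".

    Returns {(team, time_bucket): percentage of positive tweets}.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for team, bucket, label in classified:
        key = (team, bucket)
        totals[key] += 1
        if label == "pos":
            positives[key] += 1
    return {key: 100.0 * positives[key] / totals[key] for key in totals}


# Invented sample rows, for illustration only.
sample = [
    ("Milan", "match end", "pos"), ("Milan", "match end", "pos"),
    ("Milan", "match end", "pos"), ("Milan", "match end", "neg"),
    ("Inter", "match end", "pos"), ("Inter", "match end", "neg"),
]
result = percent_positive(sample)
```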
63. Outline
• Social Networks and Web 2.0
• Sentiment analysis: what is it?
• Sentiment analysis: applications
• S.A. an inside look
• TwitterSA
• TwitterSA Soccer Match Analysis
✓ TwitterSA Predicting Elections
71. References
• Dr. Diana Maynard: Practical Sentiment Analysis
• Seth Grimes: Sentiment Analysis Symposium 2012
• B. Liu's reference list is available here: http://www.cs.uic.edu/~liub/FBS/AAAI-2011-tutorial-references.pdf
• Best survey on sentiment analysis: B. Liu, “Sentiment Analysis and Subjectivity” (chapter), available here: http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf
• Great tutorial on sentiment analysis: http://sentiment.christopherpotts.net/
• Some images and statistics are taken from www.basistech.com and www.nielsen.com
72. “When in that feeling you feel truly happy, call it what you will: call it happiness, heart, love. I have no name for it. Feeling is everything! The word is only sound and smoke.”
Johann Wolfgang von Goethe