Yunchao He
Chin-Sheng Yang, Liang-Chih Yu, K. Robert Lai and Weiyi Liu.
• “unbelievably disappointing”
• “Full of zany characters and richly applied satire, and some great
  plot twists”
• “this is the greatest screwball comedy ever filmed”
• “It was pathetic. The worst part about it was the boxing scenes.”
• Sentiment Analysis
  • Using NLP, statistics, or machine learning methods to extract, identify, or
    otherwise characterize the sentiment content of a text unit
  • Sometimes called opinion mining, although the emphasis in this case is on
    extraction
  • Other names: opinion extraction, sentiment mining, subjectivity analysis

• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair
  increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from
  sentiment

• People express opinions in complex ways
• In opinion texts, lexical content alone can be misleading
• Intra-textual and sub-sentential reversals, negation, and topic changes
  are common
• Rhetorical devices and modes such as sarcasm, irony, and implication

• Tokenization
• Feature extraction: n-grams, semantic features, syntactic features, etc.
• Classification using different classifiers
  • Naïve Bayes
  • MaxEnt
  • SVM
• Drawbacks
  • Sparsity
    S1: I really like this movie
    [...0 0 1 1 1 1 1 0 0 ... ]
  • Context independence
    S1: This phone has a good keypad
    S2: He will move and leave her for good
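
The two drawbacks can be illustrated with a minimal bag-of-unigrams sketch. The helper names (`tokenize`, `build_vocab`, `binary_vector`) and the four-sentence toy corpus are assumptions made for illustration, not part of the original slides. Even with a 19-word vocabulary the vector for S1 is mostly zeros, and the word "good" lands in the same column for both of its senses.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_vocab(texts):
    """Map every distinct token to a column index, in first-seen order."""
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def binary_vector(text, vocab):
    """Return a 0/1 vector marking which vocabulary words occur in the text."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] = 1
    return vec

# Toy corpus (hypothetical): sentences taken from the slides.
corpus = [
    "I really like this movie",
    "This phone has a good keypad",
    "He will move and leave her for good",
    "unbelievably disappointing",
]
vocab = build_vocab(corpus)

# Sparsity: only 5 of the 19 columns are non-zero for this short text.
vec = binary_vector("I really like this movie", vocab)
density = sum(vec) / len(vec)

# Context independence: both senses of "good" share one column.
good_column = vocab["good"]
keypad_vec = binary_vector(corpus[1], vocab)
leave_vec = binary_vector(corpus[2], vocab)
```

On a realistic corpus the vocabulary has tens of thousands of entries, so the density of a single short text is far lower than in this toy case.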
• Use a clustering algorithm to aggregate short texts into longer clusters,
  each sharing one topic and one sentiment polarity. This reduces the
  sparsity of the short-text representation while keeping it interpretable.
S1: it works perfectly! Love this product
S2: very pleased! Super easy to, I love it
S3: I recommend it
Vocabulary: it works perfectly love this product very pleased super easy to I recommend
S1: [1 1 1 1 1 1 0 0 0 0 0 0 0]
S2: [1 0 0 1 0 0 1 1 1 1 1 1 0]
S3: [1 0 0 0 0 0 0 0 0 0 0 1 1]
S1+S2+S3: [...0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0...]
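
The aggregation step can be sketched as an element-wise OR over the binary vectors of the texts in one cluster. The 13-word vocabulary and the three sentences come from the slide; the function names `binary_vector` and `merge` are assumptions for illustration, and OR-merging is only one plausible way to combine the vectors.

```python
import re

# Vocabulary in the order shown on the slide (lowercased).
VOCAB = ("it works perfectly love this product "
         "very pleased super easy to i recommend").split()

def binary_vector(text):
    """0/1 vector over VOCAB marking the words that occur in the text."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in tokens else 0 for word in VOCAB]

def merge(vectors):
    """Element-wise OR: a column is 1 if any text in the cluster has it."""
    return [max(column) for column in zip(*vectors)]

s1 = binary_vector("it works perfectly! Love this product")
s2 = binary_vector("very pleased! Super easy to, I love it")
s3 = binary_vector("I recommend it")
cluster = merge([s1, s2, s3])

# Fraction of non-zero columns before and after merging.
densities = [sum(v) / len(v) for v in (s1, s2, s3, cluster)]
```

Each single text fills at most 8 of the 13 columns, while the merged cluster vector fills all of them.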
• Training data labeled with positive and negative polarity
• The K-means clustering algorithm is used to cluster positive and
  negative texts separately.
• K-means, KNN, LDA…

Labeled training texts:
  works perfectly! Love this product
  completely useless, return policy
  very pleased! Super easy to, I am pleased
  was very poor, it has failed
  highly recommend it, high recommended!
  it totally unacceptable, is so bad

Topical clusters:
  works perfectly! Love this product
  very pleased! Super easy to, I am pleased
  highly recommend it, high recommended!

  completely useless, return policy
  was very poor, it has failed
  it totally unacceptable, is so bad
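
Clustering each polarity separately can be sketched with a small, dependency-free K-means. This is a simplified illustration under stated assumptions: centroids are initialized from the first `k` points (to keep the run deterministic), the number of iterations is fixed, and the 4-dimensional toy vectors are hypothetical. The names `kmeans` and `cluster_by_polarity` are not from the slides.

```python
import math

def mean(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def kmeans(points, k, iterations=20):
    """Plain K-means; returns (cluster index per point, list of centroids)."""
    centroids = [list(p) for p in points[:k]]  # assumption: first-k initialization
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members, if it has any.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids

def cluster_by_polarity(labeled_vectors, k):
    """Run K-means on positive and negative vectors separately."""
    result = {}
    for polarity in ("positive", "negative"):
        points = [v for v, label in labeled_vectors if label == polarity]
        result[polarity] = kmeans(points, k)
    return result

# Hypothetical toy feature vectors with polarity labels.
training = [
    ([1, 0, 0, 0], "positive"), ([0, 1, 0, 1], "negative"),
    ([0, 0, 1, 0], "positive"), ([1, 0, 1, 0], "negative"),
    ([1, 1, 0, 0], "positive"), ([0, 1, 0, 0], "negative"),
    ([0, 0, 1, 1], "positive"), ([1, 0, 0, 0], "negative"),
]
clusters = cluster_by_polarity(training, k=2)
```

Because the two polarities never share a K-means run, every resulting cluster contains texts of a single polarity by construction.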
• Topical consistency: the texts in each cluster share a similar topic
• Reduced sparsity: the representation of a topical cluster is denser
  than that of a single text
• The idea is easy to apply to other areas

Classifier: Multinomial Naïve Bayes
Probabilistic classifier: get the probability of a label given a clustered
text

$$\hat{s} = \arg\max_{s \in S} P(s \mid C_i) = \arg\max_{s \in S} P(s) \prod_{j=1}^{N} P(C_{i,j} \mid s)$$

$$P(s) = \frac{N_s}{N}$$

$$P(C_{i,j} \mid s) = \frac{N(C_{i,j}, s) + 1}{\sum_{x \in V} N(x, s) + |V|}$$

The first equality applies Bayes' theorem; the product form follows from the
independence assumption.
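
A minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing, matching the form of the estimates above. The toy training data and the names `train`, `log_posterior`, and `classify` are assumptions for illustration. Log probabilities are used to avoid numerical underflow on long clustered texts; that is a common implementation choice, not something stated on the slides.

```python
import math
from collections import Counter

def train(labeled_texts):
    """Count labels and per-label word frequencies from (tokens, label) pairs."""
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, label in labeled_texts:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return label_counts, word_counts, vocab

def log_posterior(tokens, label, model):
    """log P(s) + sum_j log P(token_j | s), with add-one smoothing."""
    label_counts, word_counts, vocab = model
    total_texts = sum(label_counts.values())
    total_words = sum(word_counts[label].values())
    score = math.log(label_counts[label] / total_texts)
    for tok in tokens:
        smoothed = (word_counts[label][tok] + 1) / (total_words + len(vocab))
        score += math.log(smoothed)
    return score

def classify(tokens, model):
    """Return the label with the highest (unnormalized) log posterior."""
    return max(model[0], key=lambda label: log_posterior(tokens, label, model))

# Hypothetical toy training set.
model = train([
    (["love", "this", "product"], "pos"),
    (["works", "perfectly"], "pos"),
    (["completely", "useless"], "neg"),
    (["very", "poor"], "neg"),
])
prediction = classify(["love", "works"], model)
```

In the clustered setting the token list passed to `classify` would be the concatenated text of a cluster rather than a single short text.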
• Given an unlabeled text $x_j$, we use Euclidean distance to find the
  most similar positive cluster $C_m^{+}$ and the most similar negative
  cluster $C_n^{-}$.
• The sentiment of $x_j$ is estimated from the change in probability
  of the two clusters when each is merged with $x_j$ (vs. KNN).
• This operation is called the two-stage-merging method, because each
  unlabeled text is merged twice.
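
$$f(x_j) = \begin{cases} 0, & \left| P(NC_m^{+}) - P(C_m^{+}) \right| > \left| P(NC_n^{-}) - P(C_n^{-}) \right| \\ 1, & \text{otherwise} \end{cases}$$

Here $NC_m^{+}$ and $NC_n^{-}$ denote the clusters $C_m^{+}$ and $C_n^{-}$ after merging with $x_j$.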
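
The two steps of the merging method can be sketched as a nearest-cluster lookup followed by a comparison of probability changes. This is only an illustration: the function names are hypothetical, the cluster probabilities are passed in as plain numbers rather than computed by a classifier, and the direction of the comparison (returning 0 when the positive cluster changes more) is an assumption.

```python
import math

def nearest_cluster(x, centroids):
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))

def two_stage_decision(p_pos_before, p_pos_after, p_neg_before, p_neg_after):
    """Compare how much each cluster's probability changes after merging.

    Assumption: return 0 when the positive cluster's probability changes
    more than the negative cluster's, and 1 otherwise.
    """
    change_pos = abs(p_pos_after - p_pos_before)
    change_neg = abs(p_neg_after - p_neg_before)
    return 0 if change_pos > change_neg else 1

# Hypothetical centroids and probabilities, for illustration only.
closest = nearest_cluster([1.0, 0.0], [[0.0, 0.0], [1.0, 1.0], [1.0, 0.1]])
label_a = two_stage_decision(0.90, 0.60, 0.80, 0.75)  # positive cluster shifts more
label_b = two_stage_decision(0.90, 0.88, 0.80, 0.50)  # negative cluster shifts more
```

Unlike KNN, which votes over individual neighbouring texts, this rule looks only at the single nearest cluster of each polarity.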
• Dataset: Stanford Twitter Sentiment Corpus (STS)
• Baseline: bag of unigrams and bigrams without clustering
• Evaluation metrics: accuracy, precision, recall
• The average precision and accuracy are 1.7% and 1.3% higher than
  the baseline method, respectively.

Methods      Accuracy   Precision   Recall
Our Method   0.816      0.82        0.813
Bigrams      0.805      0.807       0.802
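
For reference, the three evaluation metrics can be computed from gold and predicted labels as sketched here. The label lists are hypothetical toy data and do not reproduce the reported results; treating label 1 as the positive class is an assumption.

```python
def binary_metrics(gold, predicted, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = sum(1 for g, p in pairs if g != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical toy labels: 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
gold = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall = binary_metrics(gold, predicted)
```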
• We introduce a clustering-based method to reduce the sparsity problem
  in sentiment classification of short texts.
• The idea can be applied to other areas.
• The method is a prototype, and several components can be improved,
  including the clustering algorithm, distributed representations, and
  the two-stage-merging method.
• Future work:
  • Extend the model to use the top-n similar clusters.
  • Use distributed representations.
  • Explore deep learning models.

何云超 yunchaohe@gmail.com
Thank you
Q&A
Yunchao he icot2015
