SlideShare a Scribd company logo
1 of 32
Alleviating Data Sparsity For
Twitter Sentiment Analysis

        Hassan Saif, Yulan He & Harith Alani
Knowledge Media Institute, The Open University, Milton
             Keynes, United Kingdom



Making Sense of Microblogs – WWW2012 Conference
                   Lyon - France
Outline
•   Hello World
•   Motivation
•   Related Work
•   Semantic Features
•   Topic-Sentiment Features
•   Evaluation
•   Demos
•   The Future
Sentiment Analysis
“Sentiment analysis is the task of identifying
positive and negative opinions, emotions and
evaluations in text”


 The main dish was                         The main dish was
                     It is a Syrian dish
     delicious                             salty and horrible


    Opinion             Fact                   Opinion

                                                                3
Microblogging
• Service which allows subscribers to post short
  updates online and broadcast them

• Answers the question: What are you doing
  now?

• Twitter, Plurk, sfeed, Yammer, BlueTwi, etc.


                                                   4
Together!
Sense?
Sense?
                                           1400000


AGARWAL et al.
                                           1200000            1143562

BARBOSA, L., AND FENG, J.
                                           1000000
BIFET, A., AND FRANK, E.
                                            800000
DIAKOPOULOS, N., AND SHAMMA, D.                                                713178



GO et al.                                   600000



He & Saif                                   400000



PAK & PAROUBEK                              200000



                                   And           0
                                                                         1
                                  Many               Objective Tweets   Subjective Tweets

                                  Others
                                               UK General Elections Corpus
Why


                         Private Sectors


Because It is Critical
    df                                     Keep In Touch


                         Public Sectors
Related Work
Sentiment Analysis

                                        Lexical Based Approach
Text Classification Problem




                  Right Features
                                                Word Polarity




                                   Building Better Dictionary
   Machine Learning Approach
Twitter Sentiment Analysis

Challenges

    – The short length of status update

    – Language Variations

    – Open Social Environment
Twitter Sentiment Analysis

Related Work

  • Distant Supervision
      – Supervised classifiers trained from noisy labels

      – Tweets messages are labeled using emoticons

      – Data filtering process

  Go et al., (2009) - Barbosa and Fengl. (2010) – Pak and Paroubek (2010)
Twitter Sentiment Analysis

Related Work

  • Followers Graph & Label Propagation
       – Twitter follower graph (users, tweets, unigrams
         and hashtags)
       – Start with small number of labeled tweets
       – Applied label propagation method throughout the
         graph.

  Speriosu et al., (2009)
Twitter Sentiment Analysis

Related Work

    • Feature Engineering
         – Unigrams, bigrams, POS
         – Microblogging features
              • Hashtags
              • Emoticons
              • Abbreviations & Intensifiers


  Agsrwal et al., (2011) – Kouloumpis et al (2011)
So?
What Does sparsity mean?




  Training data contains many infrequent terms
What Does sparsity mean?




       Word frequency statistics
How!


Sentiment Topic Features
                                  Extracts semantically hidden concepts
                                  from      tweets     data     and    then
                                  incorporates them into supervised
Extract latent topics and         classifier training by interpolation
the associated topic
sentiment      from    the
tweets data which are
subsequently added into
the original feature space    Semantic Features
for supervised classifier
training
Semantic Features
Shallow Semantic Method


                                                                Sport
         @Stace_meister Ya, I have Rugby in an hour




    Sushi time for fabulous Jesse's last day on dragons den    Person




   Dear eBay, if I win I owe you a total 580.63 bye paycheck
                                                               Company
Sematic Features
Interpolation Method
Topic-Sentiment Features
Joint Sentiment Topic Model
JST1 is a four-layer generative model
which allows the detection of both
sentiment and topic simultaneously
from text.

The only supervision is word prior
polarity information which can be
obtained from MPQA subjectivity
lexicon.




Lin & He. 2009
Twitter Sentiment Corpus
                             Stanford University



Collected: the 6th of April & the 25th of June 2009


Training Set: 1.6 million tweets (Balanced)

Testing Set: 177 negative tweets & 182 positive tweets




http://twittersentiment.appspot.com/
Our Sentiment Corpus
• Training Set: 60K tweets
• Testing Set: 1000 tweets




• Annotated additional 640 tweets using
  Tweenator
Evaluation
                                                   Extended Test Set


Method                          Accuracy

Unigrams                        80.7%

Semantic replacement            76.3%

Semantic interpolation          84.0%

Sentiment-topic features        82.3%



  Sentiment classification results on the 1000-tweets test set
Evaluation
                                                              Stanford Dataset


       Method                            Accuracy
       Unigrams                          81.0%
       Semantic replacement              77.3%
       Sematic augmentation              80.45%
       Semantic interpolation            84.1%
       Sentiment-topic features          86.3%


       (Go et al., 2009)                 83%
       (Speriosu et al., 2011)           84.7%

Sentiment classification results on the original Stanford Twitter Sentiment test set
Evaluation

Semantic Features V.s Sentiment-Topic Features
Tweenator




http://tweenator.com
Conclusion
• Twitter sentiment analysis faces data sparsity problem due to
  some special characteristics of Twitter

• Semantic & topic-sentiment features reduce the sparsity
  problem and increase the performance significantly

• Sentiment-topic features should be preferred over semantic
  features for the sentiment classification task since it gives
  much better results with far less features.



                                                                  30
The Future
Future Work

                                                   Sentiment-Topic Model

        Enhance Entity Extraction Methods




                                 Attaching weight to extracted features

Semantic Smoothing Model



                                                  Statistical replacement
References
[1] AGARWAL, A., XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of twitter data. In Proceedings of
the ACL 2011 Workshop on Languages in Social Media (2011), pp. 30–38.

[2] BARBOSA, L., AND FENG, J. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of COLING
(2010), pp. 36–44.

[3] BIFET, A., AND FRANK, E. Sentiment knowledge discovery in twitter streaming data. In Discovery Science (2010), Springer, pp.
1–15.

[4] GO, A., BHAYANI, R., AND HUANG, L. Twitter sentiment classification using distant supervision. CS224N Project
Report, Stanford (2009).

[5] KOULOUMPIS, E., WILSON, T., AND MOORE, J. Twitter sentiment analysis: The good the bad and the omg! In Proceedings of
the ICWSM (2011).

[5]LIN, C., AND HE, Y. Joint sentiment/topic model for sentiment analysis. In Proceeding of the 18th ACM conference on
Information and knowledge management (2009), ACM, pp. 375–384.
[6] PAK, A., AND PAROUBEK, P. Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of LREC 2010 (2010).

[7]SAIF, H., HE, Y., AND ALANI, H. Semantic Smoothing for Twitter Sentiment Analysis. In Proceeding of the 10th International
Semantic Web Conference (ISWC) (2011).
Thank




        You

More Related Content

What's hot

Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATAParvathy Devaraj
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Dev Sahu
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSmritiAgarwal26
 
MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]Sagar Ahire
 
Sentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataSentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataIswarya M
 
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and TweetsSentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and Tweets🧑‍💻 Manuel Coppotelli
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisSagar Ahire
 
Ontology based sentiment analysis
Ontology based sentiment analysisOntology based sentiment analysis
Ontology based sentiment analysisprathako
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysisAshish Mundra
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarRavi Kumar
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysisharit66
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...Prateek Singh
 
SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisAditya Joshi
 

What's hot (20)

Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]MTech Seminar Presentation [IIT-Bombay]
MTech Seminar Presentation [IIT-Bombay]
 
Sentiment analysis
Sentiment analysisSentiment analysis
Sentiment analysis
 
Sentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big DataSentiment analysis in Twitter on Big Data
Sentiment analysis in Twitter on Big Data
 
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and TweetsSentiCheNews - Sentiment Analysis on Newspapers and Tweets
SentiCheNews - Sentiment Analysis on Newspapers and Tweets
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Ontology based sentiment analysis
Ontology based sentiment analysisOntology based sentiment analysis
Ontology based sentiment analysis
 
Sentimental Analysis of twitter data .
Sentimental Analysis of twitter data .Sentimental Analysis of twitter data .
Sentimental Analysis of twitter data .
 
social network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysissocial network analysis project twitter sentimental analysis
social network analysis project twitter sentimental analysis
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis...
Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis...Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis...
Adapting Sentiment Lexicons using Contextual Semantics for Sentiment Analysis...
 
sentiment analysis
sentiment analysis sentiment analysis
sentiment analysis
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Final deck
Final deckFinal deck
Final deck
 
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
Sentiment mining- The Design and Implementation of an Internet PublicOpinion...Sentiment mining- The Design and Implementation of an Internet PublicOpinion...
Sentiment mining- The Design and Implementation of an Internet Public Opinion...
 
SemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment AnalysisSemEval - Aspect Based Sentiment Analysis
SemEval - Aspect Based Sentiment Analysis
 

Viewers also liked

Negative Sentiment (or "Sentiment Analysis is Sh*te")
Negative Sentiment (or "Sentiment Analysis is Sh*te")Negative Sentiment (or "Sentiment Analysis is Sh*te")
Negative Sentiment (or "Sentiment Analysis is Sh*te")Mat Morrison
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...Srivatsan Ramanujam
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012Michael Wilde
 
Political sentiment analysis using twitter data
Political sentiment analysis using twitter dataPolitical sentiment analysis using twitter data
Political sentiment analysis using twitter dataAmal Mahmoud
 
Supervised Learning Based Approach to Aspect Based Sentiment Analysis
Supervised Learning Based Approach to Aspect Based Sentiment AnalysisSupervised Learning Based Approach to Aspect Based Sentiment Analysis
Supervised Learning Based Approach to Aspect Based Sentiment AnalysisTharindu Kumara
 
Sentiment analysis-by-nltk
Sentiment analysis-by-nltkSentiment analysis-by-nltk
Sentiment analysis-by-nltkWei-Ting Kuo
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Jen Aman
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis worksCJ Jenkins
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment AnalysisJaganadh Gopinadhan
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisFabio Benedetti
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in TwitterAyushi Dalmia
 
Realtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseRealtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseDataWorks Summit
 
Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis
Lecture 3: Structuring Unstructured Texts Through Sentiment AnalysisLecture 3: Structuring Unstructured Texts Through Sentiment Analysis
Lecture 3: Structuring Unstructured Texts Through Sentiment AnalysisMarina Santini
 
NLP based Mining on Movie Critics
NLP based Mining on Movie Critics NLP based Mining on Movie Critics
NLP based Mining on Movie Critics supraja reddy
 
Sentiments Improvement
Sentiments ImprovementSentiments Improvement
Sentiments ImprovementMisha Kozik
 
Challenges of using Twitter for sentiment analysis
Challenges of using Twitter for sentiment analysisChallenges of using Twitter for sentiment analysis
Challenges of using Twitter for sentiment analysisAna Canhoto
 
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...Cataldo Musto
 

Viewers also liked (20)

Negative Sentiment (or "Sentiment Analysis is Sh*te")
Negative Sentiment (or "Sentiment Analysis is Sh*te")Negative Sentiment (or "Sentiment Analysis is Sh*te")
Negative Sentiment (or "Sentiment Analysis is Sh*te")
 
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
A Pipeline for Distributed Topic and Sentiment Analysis of Tweets on Pivotal ...
 
Social media & sentiment analysis splunk conf2012
Social media & sentiment analysis   splunk conf2012Social media & sentiment analysis   splunk conf2012
Social media & sentiment analysis splunk conf2012
 
Political sentiment analysis using twitter data
Political sentiment analysis using twitter dataPolitical sentiment analysis using twitter data
Political sentiment analysis using twitter data
 
Supervised Learning Based Approach to Aspect Based Sentiment Analysis
Supervised Learning Based Approach to Aspect Based Sentiment AnalysisSupervised Learning Based Approach to Aspect Based Sentiment Analysis
Supervised Learning Based Approach to Aspect Based Sentiment Analysis
 
Sentiment analysis-by-nltk
Sentiment analysis-by-nltkSentiment analysis-by-nltk
Sentiment analysis-by-nltk
 
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
Embrace Sparsity At Web Scale: Apache Spark MLlib Algorithms Optimization For...
 
How Sentiment Analysis works
How Sentiment Analysis worksHow Sentiment Analysis works
How Sentiment Analysis works
 
Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Tutorial of Sentiment Analysis
Tutorial of Sentiment AnalysisTutorial of Sentiment Analysis
Tutorial of Sentiment Analysis
 
Sentiment Analysis in Twitter
Sentiment Analysis in TwitterSentiment Analysis in Twitter
Sentiment Analysis in Twitter
 
Realtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBaseRealtime Sentiment Analysis Application Using Hadoop and HBase
Realtime Sentiment Analysis Application Using Hadoop and HBase
 
On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter
On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of  TwitterOn Stopwords, Filtering and Data Sparsity for Sentiment Analysis of  Twitter
On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter
 
Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis
Lecture 3: Structuring Unstructured Texts Through Sentiment AnalysisLecture 3: Structuring Unstructured Texts Through Sentiment Analysis
Lecture 3: Structuring Unstructured Texts Through Sentiment Analysis
 
Sentiment analysis taxonomy_apr-12-2011
Sentiment analysis taxonomy_apr-12-2011Sentiment analysis taxonomy_apr-12-2011
Sentiment analysis taxonomy_apr-12-2011
 
NLP based Mining on Movie Critics
NLP based Mining on Movie Critics NLP based Mining on Movie Critics
NLP based Mining on Movie Critics
 
Sentiments Improvement
Sentiments ImprovementSentiments Improvement
Sentiments Improvement
 
Challenges of using Twitter for sentiment analysis
Challenges of using Twitter for sentiment analysisChallenges of using Twitter for sentiment analysis
Challenges of using Twitter for sentiment analysis
 
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog ...
 
Data mining tasks
Data mining tasksData mining tasks
Data mining tasks
 

Similar to Alleviating Data Sparsity for Twitter Sentiment Analysis

Classifying Non-Referential It for Question Answer Pairs
Classifying Non-Referential It for Question Answer PairsClassifying Non-Referential It for Question Answer Pairs
Classifying Non-Referential It for Question Answer PairsJinho Choi
 
NAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive AnalyticsNAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive Analyticsshengjing 孙胜晶
 
IRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET Journal
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using mlPravin Katiyar
 
Putting research to work for you sabatier
Putting research to work for you sabatierPutting research to work for you sabatier
Putting research to work for you sabatierLouannsabatier
 
Integrating digital traces into a semantic enriched data
Integrating digital traces into a semantic enriched dataIntegrating digital traces into a semantic enriched data
Integrating digital traces into a semantic enriched dataDhaval Thakker
 
Watson DevCon 2016 - From Jeopardy! to the Future
Watson DevCon 2016 - From Jeopardy! to the FutureWatson DevCon 2016 - From Jeopardy! to the Future
Watson DevCon 2016 - From Jeopardy! to the FutureIBM Watson
 
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptxSampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx20211a05p7
 
Extracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication ExchangesExtracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication ExchangesSuvodeep Mazumdar
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine LearningIRJET Journal
 
Wimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity ReportWimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity ReportFabien Gandon
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
IRJET - Cyberbulling Detection Model
IRJET -  	  Cyberbulling Detection ModelIRJET -  	  Cyberbulling Detection Model
IRJET - Cyberbulling Detection ModelIRJET Journal
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmKhushboo Gupta
 
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET Journal
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questionsmoresmile
 

Similar to Alleviating Data Sparsity for Twitter Sentiment Analysis (20)

Classifying Non-Referential It for Question Answer Pairs
Classifying Non-Referential It for Question Answer PairsClassifying Non-Referential It for Question Answer Pairs
Classifying Non-Referential It for Question Answer Pairs
 
NAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive AnalyticsNAACL Tutorial
Social Media Predictive Analytics
NAACL Tutorial
Social Media Predictive Analytics
 
IRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in TwitterIRJET-Sentiment Analysis in Twitter
IRJET-Sentiment Analysis in Twitter
 
Sentiment analysis using ml
Sentiment analysis using mlSentiment analysis using ml
Sentiment analysis using ml
 
Putting Research to Work for You
Putting Research to Work for YouPutting Research to Work for You
Putting Research to Work for You
 
Putting research to work for you sabatier
Putting research to work for you sabatierPutting research to work for you sabatier
Putting research to work for you sabatier
 
Integrating digital traces into a semantic enriched data
Integrating digital traces into a semantic enriched dataIntegrating digital traces into a semantic enriched data
Integrating digital traces into a semantic enriched data
 
Estimating the overall sentiment score by inferring modus ponens law
Estimating the overall sentiment score by inferring modus ponens lawEstimating the overall sentiment score by inferring modus ponens law
Estimating the overall sentiment score by inferring modus ponens law
 
Extracting Semantic
Extracting Semantic Extracting Semantic
Extracting Semantic
 
Watson DevCon 2016 - From Jeopardy! to the Future
Watson DevCon 2016 - From Jeopardy! to the FutureWatson DevCon 2016 - From Jeopardy! to the Future
Watson DevCon 2016 - From Jeopardy! to the Future
 
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptxSampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
SampleLiteratureReviewTemplate_IVBTechIISEM_MajorProject.pptx
 
Extracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication ExchangesExtracting Semantic User Networks from Informal Communication Exchanges
Extracting Semantic User Networks from Informal Communication Exchanges
 
IRJET - Twitter Sentiment Analysis using Machine Learning
IRJET -  	  Twitter Sentiment Analysis using Machine LearningIRJET -  	  Twitter Sentiment Analysis using Machine Learning
IRJET - Twitter Sentiment Analysis using Machine Learning
 
Wimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity ReportWimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity Report
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
IRJET - Cyberbulling Detection Model
IRJET -  	  Cyberbulling Detection ModelIRJET -  	  Cyberbulling Detection Model
IRJET - Cyberbulling Detection Model
 
sent_analysis_report
sent_analysis_reportsent_analysis_report
sent_analysis_report
 
Sentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes AlgorithmSentimental Analysis - Naive Bayes Algorithm
Sentimental Analysis - Naive Bayes Algorithm
 
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...IRJET-  	  A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
IRJET- A Review on: Sentiment Polarity Analysis on Twitter Data from Diff...
 
Questions about questions
Questions about questionsQuestions about questions
Questions about questions
 

Recently uploaded

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 

Recently uploaded (20)

04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 

Alleviating Data Sparsity for Twitter Sentiment Analysis

  • 1. Alleviating Data Sparsity For Twitter Sentiment Analysis Hassan Saif, Yulan He & Harith Alani Knowledge Media Institute, The Open University, Milton Keynes, United Kingdom Making Sense of Microblogs – WWW2012 Conference Lyon - France
  • 2. Outline • Hello World • Motivation • Related Work • Semantic Features • Topic-Sentiment Features • Evaluation • Demos • The Future
  • 3. Sentiment Analysis “Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text” The main dish was The main dish was It is a Syrian dish delicious salty and horrible Opinion Fact Opinion 3
  • 4. Microblogging • Service which allows subscribers to post short updates online and broadcast them • Answers the question: What are you doing now? • Twitter, Plurk, sfeed, Yammer, BlueTwi, etc. 4
  • 7. Sense? 1400000 AGARWAL et al. 1200000 1143562 BARBOSA, L., AND FENG, J. 1000000 BIFET, A., AND FRANK, E. 800000 DIAKOPOULOS, N., AND SHAMMA, D. 713178 GO et al. 600000 He & Saif 400000 PAK & PAROUBEK 200000 And 0 1 Many Objective Tweets Subjective Tweets Others UK General Elections Corpus
  • 8. Why Private Sectors Because It is Critical df Keep In Touch Public Sectors
  • 10. Sentiment Analysis Lexical Based Approach Text Classification Problem Right Features Word Polarity Building Better Dictionary Machine Learning Approach
  • 11. Twitter Sentiment Analysis Challenges – The short length of status update – Language Variations – Open Social Environment
  • 12. Twitter Sentiment Analysis Related Work • Distant Supervision – Supervised classifiers trained from noisy labels – Tweets messages are labeled using emoticons – Data filtering process Go et al., (2009) - Barbosa and Fengl. (2010) – Pak and Paroubek (2010)
  • 13. Twitter Sentiment Analysis Related Work • Followers Graph & Label Propagation – Twitter follower graph (users, tweets, unigrams and hashtags) – Start with small number of labeled tweets – Applied label propagation method throughout the graph. Speriosu et al., (2009)
  • 14. Twitter Sentiment Analysis Related Work • Feature Engineering – Unigrams, bigrams, POS – Microblogging features • Hashtags • Emoticons • Abbreviations & Intensifiers Agsrwal et al., (2011) – Kouloumpis et al (2011)
  • 15. So?
  • 16. What Does sparsity mean? Training data contains many infrequent terms
  • 17. What Does sparsity mean? Word frequency statistics
  • 18. How! Sentiment Topic Features Extracts semantically hidden concepts from tweets data and then incorporates them into supervised Extract latent topics and classifier training by interpolation the associated topic sentiment from the tweets data which are subsequently added into the original feature space Semantic Features for supervised classifier training
  • 19. Semantic Features Shallow Semantic Method Sport @Stace_meister Ya, I have Rugby in an hour Sushi time for fabulous Jesse's last day on dragons den Person Dear eBay, if I win I owe you a total 580.63 bye paycheck Company
  • 21. Topic-Sentiment Features Joint Sentiment Topic Model JST1 is a four-layer generative model which allows the detection of both sentiment and topic simultaneously from text. The only supervision is word prior polarity information which can be obtained from MPQA subjectivity lexicon. Lin & He. 2009
  • 22. Twitter Sentiment Corpus Stanford University Collected: the 6th of April & the 25th of June 2009 Training Set: 1.6 million tweets (Balanced) Testing Set: 177 negative tweets & 182 positive tweets http://twittersentiment.appspot.com/
  • 23. Our Sentiment Corpus • Training Set: 60K tweets • Testing Set: 1000 tweets • Annotated additional 640 tweets using Tweenator
  • 24. Evaluation Extended Test Set Method Accuracy Unigrams 80.7% Semantic replacement 76.3% Semantic interpolation 84.0% Sentiment-topic features 82.3% Sentiment classification results on the 1000-tweets test set
  • 25. Evaluation Stanford Dataset Method Accuracy Unigrams 81.0% Semantic replacement 77.3% Sematic augmentation 80.45% Semantic interpolation 84.1% Sentiment-topic features 86.3% (Go et al., 2009) 83% (Speriosu et al., 2011) 84.7% Sentiment classification results on the original Stanford Twitter Sentiment test set
  • 26. Evaluation Semantic Features V.s Sentiment-Topic Features
  • 28. Conclusion • Twitter sentiment analysis faces data sparsity problem due to some special characteristics of Twitter • Semantic & topic-sentiment features reduce the sparsity problem and increase the performance significantly • Sentiment-topic features should be preferred over semantic features for the sentiment classification task since it gives much better results with far less features. 30
  • 30. Future Work Sentiment-Topic Model Enhance Entity Extraction Methods Attaching weight to extracted features Semantic Smoothing Model Statistical replacement
  • 31. References [1] AGARWAL, A., XIE, B., VOVSHA, I., RAMBOW, O., AND PASSONNEAU, R. Sentiment analysis of twitter data. In Proceedings of the ACL 2011 Workshop on Languages in Social Media (2011), pp. 30–38. [2] BARBOSA, L., AND FENG, J. Robust sentiment detection on twitter from biased and noisy data. In Proceedings of COLING (2010), pp. 36–44. [3] BIFET, A., AND FRANK, E. Sentiment knowledge discovery in twitter streaming data. In Discovery Science (2010), Springer, pp. 1–15. [4] GO, A., BHAYANI, R., AND HUANG, L. Twitter sentiment classification using distant supervision. CS224N Project Report, Stanford (2009). [5] KOULOUMPIS, E., WILSON, T., AND MOORE, J. Twitter sentiment analysis: The good the bad and the omg! In Proceedings of the ICWSM (2011). [5]LIN, C., AND HE, Y. Joint sentiment/topic model for sentiment analysis. In Proceeding of the 18th ACM conference on Information and knowledge management (2009), ACM, pp. 375–384. [6] PAK, A., AND PAROUBEK, P. Twitter as a corpus for sentiment analysis and opinion mining. Proceedings of LREC 2010 (2010). [7]SAIF, H., HE, Y., AND ALANI, H. Semantic Smoothing for Twitter Sentiment Analysis. In Proceeding of the 10th International Semantic Web Conference (ISWC) (2011).
  • 32. Thank You

Editor's Notes

  1. Can we extract sentiment from microblogs?Provide list of people who said users express more opinions on microblogscustomers are less controlled of their emotions when interacting in a social environment.So, yes we can do sentiment analysis over microblogs.
  2. Now, Why do we need to do it?Millionsof status updates and tweets messages are shared everyday among usersFor public sectors: social data can be used to predict public political opinion which make easier to policy makers to put political strategies based on that.For private sector: Companies can track public opinion about their products and services
  3. Two Main ApproachesLexical-based ApproachBuilding a better dictionaryFocuses on words and phrases as bearers of semantic orientationMachine Learning Approach Finding the Right FeaturesYet another form of text classification
  4. To overcome the noisy label data
  5. To overcome the noisy label data
  6. To overcome the noisy label data
  7. Previous work tried to overcome the sparsity problem partially and indirectly by either apply some data filtering or depending on different set of microblogs features. However,None of them studied the affect of data sparsity on the classification performance.
  8. Why: many words and terms in the testing corpus doesn’t appear in the training corpus.
  9. Compares the word frequency statistics of the tweets data we used in our experiments and the movie review dataX-axis shows the word frequency interval, e.g., words occur up to 10 times. (1-10), more than 10 times but up to 20 times (10-20), etc.Y-axis shows the percentage of words falls within certain word frequency interval.It can be observed that the tweets data are sparser than the movie review data since the former contain more infrequent words, with 93% of the words in the tweets data occurring less than 10 times (cf. 78% in the movie review data).
  10. §
  11. §