A Probabilistic Approach to Tweets' Sentiment Classification - ACII 2013 Conference


  1. Francesco Colace, Massimo De Santo, Luca Greco
     DIEM – Università degli Studi di Salerno
     {fcolace, desanto, lgreco}@unisa.it
     ACII 2013 – Geneva, 2-5 September 2013
  2. • Web 2.0 (or Web X.Y) rules!
     • Social networks, blogs, microblogs, and review-collector sites: a huge quantity of heterogeneous and opinionated data
  3. • Open issues:
       o How to manage this information?
       o How to extract the sentiment inside the data?
       o How to understand something about the users?
       o How to evaluate people's opinions about given topics or products?
     • Sentiment Analysis
  4. • Brief introduction to Sentiment Analysis
       o Related works
     • Towards a Sentiment Analysis framework
       o The proposed approach
         • The LDA approach
         • The mixed Graph of Terms
         • A sentiment mining algorithm
     • Experimental results
     • Conclusions and future works
  5. • Sentiment:
       o a thought, view, or attitude, especially one based mainly on emotion instead of reason
     • Sentiment Analysis (also known as Opinion Mining):
       o the use of Natural Language Processing (NLP) and computational techniques to automate the extraction and classification of sentiment from unstructured texts
  6. • Consumer information
       o Product reviews (Amazon, e-Bay, …)
     • Marketing
       o Consumer attitudes
       o Trends
     • Politics
       o Politicians want to know voters' points of view
       o Voters want to know politicians' stances and who else supports them
     • Social
       o Find like-minded individuals or communities
  7. • Which features to adopt?
       o Words
       o Sentences
     • How to interpret features for sentiment detection?
       o As a bag of words
       o By using annotated lexicons
       o According to syntactic patterns
       o By analyzing the paragraph structure
  8. • Naïve Bayes
     • Maximum Entropy Classifier
     • SVM
     • Markov Blanket Classifier
     • …
     • Latent Dirichlet Allocation (LDA)
  9. • With the Bag of Words approach, a document can be represented as an ordered set of words
     • Problems:
       o Which words best express the sentiment of a text?
       o How to compare different «bags of words» derived from texts with the same sentiment?
       o Can a bag of words represent the documents' domain of interest?
  10. • The mixed Graph of Terms is a «graph based» representation of documents
      • In the proposed approach, a mixed Graph of Terms is obtained by automatically extracting words with probabilistic clustering techniques such as Latent Dirichlet Allocation (LDA)
      • In a mixed Graph of Terms the words are linked according to their mutual occurrence probability, and «aggregating_word» and «aggregated_words» can be recognized
      • Our proposal: a mixed Graph of Terms can be used as a «sentiment filter»
  11. • In the proposed approach, two different layers can be recognized in a mixed Graph of Terms:
        o The Aggregator Layer: the words with a higher degree of interconnection with the other words in the documents
        o The “Aggregated Words” Layer: the words that have a higher degree of interconnection with one or more aggregator words
  12. • In natural language processing, Latent Dirichlet Allocation (LDA) is a generative model that allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar
      • For example, if observations are words collected into documents, it posits that each document is a mixture of a small number of topics and that each word's creation is attributable to one of the document's topics
      • The basic idea is that the documents are represented as random mixtures over latent topics, where a topic is characterized by a distribution over words
      • With the Latent Dirichlet Allocation technique, a set of documents can be represented as a mixed Graph of Terms
  15. • Step 1: Learn a mixed Graph of Terms from labelled documents (i.e. Positive or Negative), obtaining:
        o mGT positive
        o mGT negative
      • Step 2: Use the mixed Graphs of Terms as a filter to classify the sentiment of texts
        o Comparing concepts that appear both in the mGTs and in the text
        o Comparing words that appear both in the mGTs and in the text
  17. • Dataset: Movie Reviews

      Approach                   Accuracy (%)
      Support Vector Machine*    82.90
      Naive Bayes*               81.50
      Maximum Entropy*           81.00
      mGT-LDA                    88.50

      * [Bo Pang, 2002]
  18. • Dataset: Real Tweets related to Politics
      • Training Set: 3980 Tweets
      • Test Set: 32185 Tweets

      Approach       Accuracy (%)
      mGT-LDA        87.10
      SVM            79.20
      Naive Bayes    76.60
  19. http://193.205.190.209/elezioni2013/
  20. [Figure: accuracy plotted against days]
  21. Masterchef - http://193.205.190.209/tvshow/masterchef/
  22. • Pros:
        o Language independent
        o Fast classification
        o Continuous update
        o Small training set
      • Cons:
        o In general, the mGT building process takes a long time
        o An annotated lexicon is needed
  23. • To improve the classification through continuous updates of the training set
      • To introduce SentiWordNet as the annotated lexicon
      • To adopt an ontological formalism for a better representation of the mGT
      • To build a bigger dataset of tweets
  24. Don’t forget to tweet your sentiment!!! ☺
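
The generative story that slide 12 gives for LDA (each document is a mixture of topics, and each word is attributed to one of the document's topics) can be illustrated with a small sampling sketch. It only samples from the model and does no inference. The vocabulary, the two topic-word distributions, and the Dirichlet parameter `alpha` are toy values assumed for illustration, not taken from the slides.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topic_word, vocab, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    theta = sample_dirichlet(alpha, rng)              # per-document topic mixture
    words = []
    for _ in range(length):
        z = sample_categorical(theta, rng)            # pick a topic for this word
        w = sample_categorical(topic_word[z], rng)    # pick a word from that topic
        words.append(vocab[w])
    return theta, words

# Toy values (assumptions): a four-word vocabulary and two topics.
VOCAB = ["great", "love", "boring", "awful"]
TOPIC_WORD = [
    [0.5, 0.5, 0.0, 0.0],   # topic 0 only emits "great" and "love"
    [0.0, 0.0, 0.5, 0.5],   # topic 1 only emits "boring" and "awful"
]

rng = random.Random(0)
theta, doc = generate_document(TOPIC_WORD, VOCAB, alpha=[0.5, 0.5], length=8, rng=rng)
print(theta, doc)
```

Fitting LDA to real documents runs this story in reverse: the topic mixtures and topic-word distributions are estimated from the observed words.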

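
The two-step procedure on slide 15 can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the two graphs are hand-written rather than learned with LDA, and the scoring rule (1.0 per matched aggregator word plus the link weight of each matched aggregated word) is an assumption. All names and weights are hypothetical.

```python
import re

# Hypothetical, hand-written graphs: aggregator word -> {aggregated word: link weight}.
# In the approach described on the slides these would be learned from labelled documents.
MGT_POSITIVE = {
    "good": {"movie": 0.8, "acting": 0.6},
    "love": {"story": 0.7},
}
MGT_NEGATIVE = {
    "bad": {"movie": 0.8, "plot": 0.7},
    "boring": {"story": 0.6},
}

def tokenize(text):
    """Lowercase the text and return the set of alphabetic tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))

def graph_score(tokens, graph):
    """Score how strongly a token set matches a graph.

    Each aggregator word found in the text counts 1.0 (an assumed weight);
    each aggregated word found together with its aggregator adds its link weight.
    """
    score = 0.0
    for aggregator, linked in graph.items():
        if aggregator in tokens:
            score += 1.0
            score += sum(w for word, w in linked.items() if word in tokens)
    return score

def classify(text):
    """Use the two graphs as a filter and return the better-matching label."""
    tokens = tokenize(text)
    pos = graph_score(tokens, MGT_POSITIVE)
    neg = graph_score(tokens, MGT_NEGATIVE)
    if pos == neg:
        return "undecided"
    return "positive" if pos > neg else "negative"

print(classify("I love this story, good acting"))
print(classify("Bad plot and a boring movie"))
```

Keeping one graph per label means a text is scored against each graph independently, so a text that matches neither graph is reported as undecided instead of being forced into a class.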