An Approach to Subjectivity Detection on Twitter Using the Structured Information
1. Improving the Sentiment Analysis ...
DeustoTech - Deusto Institute of Technology, University of Deusto
http://www.morelab.deusto.es
September 28, 2016
An Approach to Subjectivity Detection on Twitter
Using the Structured Information
ICCCI 2016 - 8th International Conference on Computational Collective Intelligence
Juan Sixto, Aitor Almeida and Diego López-de-Ipiña
Overview
Introduction & Motivation
Related Work
Sentiment Analysis of Twitter Data
Structured and Unstructured Information
Experiments
Conclusions & Future Work
Introduction & Motivation
► User-generated information from social networks
  ► New algorithms and methods for its classification.
  ► Sentiment Analysis (SA) methods.
  ► Ranking algorithms as resources.
► Microblogging and Twitter
  ► One of the largest sources of textual data.
  ► Platform-specific characteristics (short messages, hashtags, retweets).
Introduction & Motivation
► Can the Structured Information of Twitter be used for sentiment analysis at the global level?
► How is the Structured Information of Twitter classified?
► Which structural features are useful for the subjectivity detection task?
Related Work
► Contextual applications in Sentiment Analysis
  ► [Pennacchiotti and Popescu, 2011] Linguistic and social network features.
  ► [De Choudhury et al., 2013] User behavior to predict emotional states.
► Classification algorithms
  ► [Cortes and Vapnik, 1995] Support Vector Machine (SVM)
  ► [Cox, 1958] Logistic Regression (LR)
  ► [Friedman, 2001] Gradient Boosting Classifier (GBC)
► Training and test dataset
  ► [Villena-Román et al., 2015] TASS'15 General Corpus.
  ► 7,219 tweets (11%) for training / 60,798 (89%) for testing.
  ► Six different polarity labels: P+, P, N+, N, NEU, NONE.
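Since the talk frames the task as subjectivity detection, the six TASS labels have to be collapsed into two classes at some point. The following is a minimal Python sketch, assuming NONE (no sentiment expressed) is the objective class and the other five labels, NEU included, count as subjective; the helper name `to_subjectivity` is hypothetical. It also checks the 11% / 89% split from the corpus sizes above.

```python
# The six TASS'15 polarity labels listed on the slide.
TASS_LABELS = ("P+", "P", "N+", "N", "NEU", "NONE")


def to_subjectivity(label: str) -> int:
    """Map a TASS label to binary subjectivity (hypothetical helper).

    Assumption: NONE = objective (0); P+, P, N+, N and NEU = subjective (1).
    """
    if label not in TASS_LABELS:
        raise ValueError(f"unknown TASS label: {label!r}")
    return 0 if label == "NONE" else 1


# Corpus sizes from the slide (training / test).
train_size, test_size = 7219, 60798
total = train_size + test_size  # 68,017 tweets in all
train_share = train_size / total
test_share = test_size / total

print([to_subjectivity(label) for label in TASS_LABELS])  # [1, 1, 1, 1, 1, 0]
print(f"train {train_share:.1%} / test {test_share:.1%}")  # train 10.6% / test 89.4%
```

Rounded to whole percentages, the shares give the 11% / 89% quoted on the slide.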
Sentiment Analysis of Twitter Data
► Sentiment Analysis (or Opinion Mining) is the task of finding the opinions of authors about specific entities [Feldman, 2013].
► Characteristics of Twitter text corpora:
  ► Heterogeneous, user-generated content
  ► Open domain
  ► Noisy text
Structured and Unstructured Information
Adaptation of the algorithm
► Four categories of attributes:
  ► Text attributes
    ► Hashtags, links, emoticons, punctuation, RT markers, ...
  ► Tweet attributes
    ► Number of retweets, creation date/time, associated place, ...
  ► User attributes
    ► Location, political affiliation, posting habits, ...
  ► Topographic attributes
    ► Modularity class of the user, in-degree, out-degree, communities, ...
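The in-degree and out-degree listed under topographic attributes can be read directly off the "Follow" graph. This is a minimal Python sketch, assuming the graph is available as (follower, followee) pairs; the function name and the sample accounts are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def follow_degrees(edges: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[int, int]]:
    """Return {user: (in_degree, out_degree)} for a directed "follow" graph.

    Each edge (a, b) means "a follows b": it adds 1 to a's out-degree
    (accounts a follows) and 1 to b's in-degree (a's followers count for b).
    """
    in_deg: Counter = Counter()
    out_deg: Counter = Counter()
    users = set()
    for follower, followee in edges:
        out_deg[follower] += 1
        in_deg[followee] += 1
        users.update((follower, followee))
    # Counter returns 0 for users that never appear on one side of an edge.
    return {u: (in_deg[u], out_deg[u]) for u in users}


# Hypothetical accounts: "ana" and "carl" follow "bob", and "bob" follows "ana".
degrees = follow_degrees([("ana", "bob"), ("carl", "bob"), ("bob", "ana")])
print(degrees["bob"])   # (2, 1)
print(degrees["carl"])  # (0, 1)
```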
Experiments
► Features selected to train a classifier:
  ► [Barbosa and Feng, 2010]
  ► URL
  ► Exclamation marks
  ► Emoticons
  ► Uppercase words
  ► Uppercase percentage
  ► Favorites
  ► Modularity class
    ► Directed graph relations based on "Follow"
    ► Three communities formed by left/right/neutral ideologies.
  ► Graph degrees (in-degree, out-degree)
  ► Retweets (RTs)
  ► Ellipsis
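The selected features could be computed per tweet roughly as follows. This is a minimal Python sketch, not the authors' extractor: the regular expressions (especially the small emoticon pattern), the one-hot encoding of the modularity class, the feature order, and all function names are assumptions. Favorites, retweets and graph degrees are taken as already-known metadata.

```python
import re
import string
from typing import Dict, List

URL_RE = re.compile(r"https?://\S+")
# Assumption: only a few common Western emoticons such as :) ;-) =D :P
EMOTICON_RE = re.compile(r"[:;=]-?[)(DPp]")
ELLIPSIS_RE = re.compile(r"\.\.\.|\u2026")  # "..." or the one-character ellipsis

TEXT_FEATURES = ("url", "exclamation_marks", "emoticons",
                 "uppercase_words", "uppercase_percent", "ellipsis")


def text_features(text: str) -> Dict[str, float]:
    """Surface features computed from the tweet text alone."""
    tokens = [t.strip(string.punctuation) for t in text.split()]
    # Words of two or more letters written entirely in capitals, e.g. "LOVE".
    upper_words = [t for t in tokens if len(t) > 1 and t.isalpha() and t.isupper()]
    letters = [c for c in text if c.isalpha()]
    upper_pct = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "url": float(bool(URL_RE.search(text))),
        "exclamation_marks": float(text.count("!")),
        "emoticons": float(len(EMOTICON_RE.findall(text))),
        "uppercase_words": float(len(upper_words)),
        "uppercase_percent": upper_pct,
        "ellipsis": float(bool(ELLIPSIS_RE.search(text))),
    }


def feature_vector(text: str, favorites: int, retweets: int, community: int,
                   in_degree: int, out_degree: int, n_communities: int = 3) -> List[float]:
    """Concatenate text features with tweet and user metadata.

    Assumption: the modularity class is one-hot encoded over three
    communities (for example 0 = left, 1 = right, 2 = neutral).
    """
    feats = text_features(text)
    one_hot = [1.0 if c == community else 0.0 for c in range(n_communities)]
    return ([feats[name] for name in TEXT_FEATURES]
            + [float(favorites), float(retweets), float(in_degree), float(out_degree)]
            + one_hot)


sample = "I LOVE this!!! :) http://t.co/x ..."
vec = feature_vector(sample, favorites=2, retweets=5, community=1,
                     in_degree=10, out_degree=3)
print(len(vec))  # 13
```

Stacking one such row per tweet gives the "matrix representation of structural features" that a classifier can consume.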
Experiments
► Meta-Information classifier
  ► Gradient Boosting model
► Bag-of-Words classifier
  ► Logistic Regression model
► Meta-Information and Bag-of-Words classifier
  ► Matrix representation of structural features
  ► Gradient Boosting model
► Meta-Information and Bag-of-Words Stacking Classifier
  ► Both models (Meta-Information and Bag-of-Words) form the array of level-0 models.
  ► A Logistic Regression model combines their outputs.
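The stacking idea in the last item (level-0 models whose outputs feed a Logistic Regression) can be sketched with the Python standard library alone. To stay self-contained, this sketch substitutes a plain gradient-descent logistic regression for the Gradient Boosting level-0 model, and it feeds the combiner k-fold out-of-fold predictions, a common stacking practice assumed here because the slides do not say how level-1 inputs were produced. The data are hypothetical toy vectors: one normalized meta-information feature, and counts of two words ("love", "report") as the bag-of-words view; label 1 = subjective, 0 = NONE.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 epochs: int = 1000, lr: float = 0.5) -> Model:
    """Binary logistic regression fitted by batch gradient descent.

    Returns a function mapping a feature vector to P(y = 1).
    """
    n_feat = len(X[0])
    w: List[float] = [0.0] * n_feat
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_feat
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_stacking(X_meta: Sequence[Vector], X_bow: Sequence[Vector],
                 y: Sequence[int], k: int = 3) -> Callable[[Vector, Vector], float]:
    """Two level-0 models (one per view) plus a level-1 logistic regression."""
    n = len(y)
    # Level-1 inputs are out-of-fold probabilities, so the combiner never sees
    # predictions a level-0 model made on its own training items.
    oof: List[List[float]] = [[0.0, 0.0] for _ in range(n)]
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        y_tr = [y[i] for i in train]
        meta_model = train_logreg([X_meta[i] for i in train], y_tr)
        bow_model = train_logreg([X_bow[i] for i in train], y_tr)
        for i in held_out:
            oof[i] = [meta_model(X_meta[i]), bow_model(X_bow[i])]
    combiner = train_logreg(oof, y)
    # Refit both level-0 models on all training data for prediction time.
    meta_model = train_logreg(X_meta, y)
    bow_model = train_logreg(X_bow, y)
    return lambda xm, xb: combiner([meta_model(xm), bow_model(xb)])


# Hypothetical toy data: 1 = subjective, 0 = NONE (objective).
X_meta = [[1.0], [0.8], [0.9], [0.0], [0.2], [0.1]]
X_bow = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]  # counts of "love", "report"
y = [1, 1, 1, 0, 0, 0]

stacked = fit_stacking(X_meta, X_bow, y)
print(round(stacked([1.0], [1, 0]), 3), round(stacked([0.0], [0, 1]), 3))
```

With real data, the Gradient Boosting level-0 model would simply replace one of the `train_logreg` calls; the stacking logic stays the same.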
Experiments
► Test dataset: 60,798 items.
► 6 categories: NONE, NEU, P, N, P+, N+
► NONE: 20.54% (train) and 12.30% (test).
► Performance measures:
  ► Accuracy: correct predictions / total number of items.
  ► Macro-averaged F1: per-class F1 (harmonic mean of precision and recall), averaged over classes.
  ► NONE-F1: micro-averaged F1 of the True labels.
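The measures above can be computed as follows. This is a minimal Python sketch: reading NONE-F1 as the per-class F1 of the NONE label is an assumption, the toy labels are hypothetical, and macro-F1 is taken over the labels that appear in the gold or predicted data unless `labels` is given. Since NONE covers only 12.30% of the test set, accuracy alone can hide poor performance on that class, which is one reason to report F1-based measures as well.

```python
from typing import Dict, Hashable, Optional, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Correct predictions divided by the number of items."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 labels: Optional[Sequence[Hashable]] = None) -> Dict[Hashable, float]:
    """F1 (harmonic mean of precision and recall) for each label."""
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             labels: Optional[Sequence[Hashable]] = None) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


# Hypothetical gold labels and predictions for six tweets.
gold = ["NONE", "P", "N", "NONE", "NEU", "P"]
pred = ["NONE", "P", "P", "P", "NEU", "P"]
print(round(accuracy(gold, pred), 3))                # 0.667
print(round(macro_f1(gold, pred), 3))                # 0.583
print(round(per_class_f1(gold, pred)["NONE"], 3))    # 0.667
```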
Conclusions
► We have proposed a method that:
  ► Adapts contextual data to the global polarity detection task.
  ► Adds new ways to use contextual information.
► We presented a classification of contextual data.
► We combined structured and unstructured information to complement the classification task.
Future Work
► Improve the present system by including:
  ► More Twitter components and their relation to polarity.
  ► Lexicons and semantic resources.
► Extend the classifier to a global polarity task.
► Study the relation between structural data and other user features.
All rights to images are reserved by their original
owners*; the rest of the content is licensed under a
Creative Commons by-sa 3.0 license.
DeustoTech - Deusto Institute of Technology, University of Deusto
http://www.morelab.deusto.es
An Approach to Subjectivity Detection on Twitter
Using the Structured Information
Juan Sixto, Aitor Almeida and Diego López-de-Ipiña
{jsixto, aitor.almeida, dipina}@deusto.es