Sentiment Extraction From Tweets
Multilingual Challenges
Nantia Makrynioti and Vasilis Vassalos
Athens University of Economics and Business
DaWaK 2015, Valencia, Spain
Sentiment Extraction or Sentiment Analysis
The task of identifying opinion polarity in a text unit
Examples
"This guide provides a thoughtful and thorough solution to planning a vacation. Lots of great advice on every aspect of the trip." (Amazon review)
"Experts in France are due to begin examining part of a wing that washed up on the island of Reunion to see if it came from Flight MH370." (Sentence from an article)
"Hey @delta !!! You promised to deliver my luggage that you lost -3 hours ago!!?? I'm tired and I need my stufffff" (Tweet)
Applications
Brand monitoring
Political campaigns
Public opinion on significant events, such as the Arab Spring
Sentiment analysis of text
data in languages other
than English is important.
Related Work
• Sentiment analysis for English:
• Supervised machine learning approaches with various
features.
• Use of polarity lexicons and WordNet for synonyms/antonyms.
• Plenty of application domains: tweets, movie reviews, blog
posts, etc.
• Sentiment analysis for other languages: Dutch, French, Arabic,
Chinese.
• Comparative studies of sentiment analysis methods.
Contributions
1. A new supervised method for classification of tweets
into positive, negative and neutral.
‣ Significant time performance improvements over state-of-the-art methods of comparable quality.
2. A case study of sentiment analysis in Greek.
3. Annotation of a corpus of tweets in Greek to be used as training and test data for this purpose.¹
4. Extensive evaluation results and comparisons with three
published methods on both Greek and English datasets.
¹ The Greek data will be available at https://github.com/nantiamak/greek-tweets-sentiment in a few days or by emailing the author (makriniotik@aueb.gr).
Approach
• Main steps of the proposed approach:
• Preprocessing of text
• Features
• Sentiment prediction
• Negation identification
Preprocessing (1)
Removal of URL links, mentions, RT, stop words
Original text: I will be participating in a Google+ Hangout with @TheEconomist tomorrow! I'm SUPER excited! Check it out: https://plus.google.com/u/0/events/cjead60p8q24qqtmu95cighgeag
Preprocessed text: I will participating Google + Hangout tomorrow! SUPER excited! Check:
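As a rough sketch of this step (not the authors' exact implementation), a regex-based cleanup in Python could look as follows; the stop-word set is a tiny illustrative sample, not a real stop-word lexicon.

```python
import re

# Hypothetical, minimal stop-word list; a real system would use a full
# stop-word lexicon for the target language.
STOP_WORDS = {"be", "in", "a", "with", "it", "out"}

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)   # remove URL links
    text = re.sub(r"@\w+", "", text)           # remove @mentions
    text = re.sub(r"\bRT\b", "", text)         # remove the retweet marker
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens)

print(clean_tweet("RT @TheEconomist I will be participating in a Google+ Hangout tomorrow!"))
```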
Preprocessing (2)
Replacement of positive/negative emoticons and hashtags with two specific emoticons
Original text: I just had a fantastic meal at the lovely Vanessa's house! <3 Then off to the middle of nowhere aka New Hampshire tomorrow! #excited
Preprocessed text: I just had a fantastic meal at the lovely Vanessa's house! :) Then off to the middle of nowhere aka New Hampshire tomorrow! :)
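A minimal sketch of this replacement step; the emoticon and hashtag lists below are illustrative assumptions, not the lexicons used in the paper.

```python
# Illustrative samples only; the actual emoticon/hashtag lists are assumptions.
POSITIVE_EMOTICONS = {"<3", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":-(", ":'(", "D:"}
POSITIVE_HASHTAGS = {"#excited", "#happy"}
NEGATIVE_HASHTAGS = {"#sad", "#angry"}

def normalize_emoticons_and_hashtags(text: str) -> str:
    out = []
    for token in text.split():
        if token in POSITIVE_EMOTICONS or token.lower() in POSITIVE_HASHTAGS:
            out.append(":)")   # one canonical positive emoticon
        elif token in NEGATIVE_EMOTICONS or token.lower() in NEGATIVE_HASHTAGS:
            out.append(":(")   # one canonical negative emoticon
        else:
            out.append(token)
    return " ".join(out)

print(normalize_emoticons_and_hashtags("fantastic meal <3 off to New Hampshire tomorrow! #excited"))
```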
Preprocessing (3)
Removal of accent marks
Original text: καλημέρα ("good morning")
Preprocessed text: καλημερα
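One standard way to strip Greek accent marks in Python (not necessarily how the authors implemented it) is Unicode decomposition:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Decompose each character (e.g. έ -> ε + combining acute accent)
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("καλημέρα"))  # -> καλημερα
```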
Preprocessing (4)
Stemming
Original text: Games played during the week are now more interesting than Super Sunday!
Preprocessed text: Game play dur the week are now more interest than super Sunday!
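The slides do not name the stemmers used for English and Greek; as an illustration only, English stemming with NLTK's Porter stemmer could look like this (the exact stems it produces may differ slightly from the slide's example).

```python
from nltk.stem import PorterStemmer  # assumes NLTK is installed

stemmer = PorterStemmer()

def stem_text(text: str) -> str:
    # Stem each whitespace-separated token independently.
    return " ".join(stemmer.stem(token) for token in text.split())

print(stem_text("Games played during the week are now more interesting than Super Sunday!"))
```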
- Let’s train a classification model!
- But machine learning algorithms need numeric data.
We will translate each text to a numeric feature vector.
Features
• Unigram features generated from training data
(bag-of-words representation).
• Lexicon features based on subjective lexicons of
positive and negative words.
‣ Example: the number of words from the positive or the
negative lexicon.
Unigram Features
Vocabulary of training data: 4th, and, best, Eli, freezing!, game!, in, is, it, like, manning, QB, quarter, Rugby, the, tonight, …
“Eli manning is the best 4th quarter QB in the game!”, positive
“Rugby tonight and it is like freezing! omg I may die”, negative
“i recorded some Diablo III . And i will edit and upload it tomorrow”, neutral
<1,0,1,1,0,1,1,1,0,0,1,1,1,0,1,0…,positive>
<0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,1…,negative>
Feature selection to keep a subset of unigrams
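A compact sketch of binary unigram features plus feature selection, using scikit-learn; the library choice and the value of k are assumptions for illustration, not settings from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

tweets = [
    "Eli manning is the best 4th quarter QB in the game!",
    "Rugby tonight and it is like freezing! omg I may die",
    "i recorded some Diablo III . And i will edit and upload it tomorrow",
]
labels = ["positive", "negative", "neutral"]

# Binary bag-of-words: 1 if a unigram occurs in the tweet, 0 otherwise.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(tweets)

# Feature selection keeps a subset of unigrams (k=10 is an arbitrary example).
selector = SelectKBest(chi2, k=10)
X_selected = selector.fit_transform(X, labels)
print(X_selected.shape)  # (3 tweets, 10 selected unigrams)
```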
Lexicon features
Positive lexicon: admire, charm, beautiful, love, divine, best, marvelous, effective, …
Negative lexicon: suffer, monotonous, dead, pathetic, liar, afraid, vicious, weak, …
Two boolean features:
1. presence of words from the
positive lexicon,
2. presence of words from the
negative lexicon
<1,0,1,1,0,1,1,1,0,0,1,1,1,0,1,0…,1,0,positive>
<0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,1…,0,0,negative>
“Eli manning best 4th quarter QB in the game!”, positive
“Rugby tonight and it's like freezing! omg I may die”, negative
“i recorded some Diablo III . And i will edit and upload it tomorrow”, neutral
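A sketch of the two boolean lexicon features; the mini-lexicons here just reuse the sample words above, whereas the paper uses full subjective lexicons.

```python
# Sample words from the slide; real subjective lexicons are much larger.
POSITIVE_LEXICON = {"admire", "charm", "beautiful", "love", "divine",
                    "best", "marvelous", "effective"}
NEGATIVE_LEXICON = {"suffer", "monotonous", "dead", "pathetic", "liar",
                    "afraid", "vicious", "weak"}

def lexicon_features(text: str) -> list[int]:
    tokens = set(text.lower().split())
    has_positive = int(bool(tokens & POSITIVE_LEXICON))  # 1 if any positive word present
    has_negative = int(bool(tokens & NEGATIVE_LEXICON))  # 1 if any negative word present
    return [has_positive, has_negative]

print(lexicon_features("Eli manning best 4th quarter QB in the game!"))  # [1, 0]
```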
Other features
…we tried but didn’t contribute that much.
• Part-of-speech features: number of adjectives,
number of verbs etc.
• Emoticon features: number of emoticons, number of
positive/negative emoticons.
• URL features: number/presence of URL links.
Numeric Feature Vectors -> Training of Classification Model (SVM, Logistic Regression) -> Sentiment Predictions on Unseen Data
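A minimal end-to-end sketch with scikit-learn, reusing the binary unigram features from the earlier example; hyperparameters are library defaults, not the settings reported in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

train_tweets = [
    "Eli manning is the best 4th quarter QB in the game!",
    "Rugby tonight and it is like freezing! omg I may die",
    "i recorded some Diablo III . And i will edit and upload it tomorrow",
]
train_labels = ["positive", "negative", "neutral"]

vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_tweets)

# Two variants, mirroring #Sentiment_v1 (SVM) and #Sentiment_v2 (Logistic Regression).
svm_model = LinearSVC().fit(X_train, train_labels)
logreg_model = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

X_new = vectorizer.transform(["what a marvelous game tonight"])
print(svm_model.predict(X_new), logreg_model.predict(X_new))
```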
Negation Identification (1)
• A post-processing step, after the prediction of the
classifier.
• Based on patterns of part-of-speech tags and
negation words.
• Reversal of prediction from positive to negative and
vice versa, if one of these patterns is identified in a
tweet.
Negation Identification (2)
Pattern: (not)<verb><adjective>
Example: "The movie does not seem good."
Check whether "good" is a feature of the vector with value 1. If yes and the sentiment prediction is positive -> the prediction becomes negative.
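A rough sketch of such a pattern check using NLTK part-of-speech tags (the tagger data must be downloaded); the negation word list is an assumption, and the check that the adjective is also an active unigram feature is omitted for brevity.

```python
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data

NEGATION_WORDS = {"not", "n't", "never", "no"}  # illustrative sample

def has_negated_adjective(tweet: str) -> bool:
    """Rough check for a (negation)<verb><adjective> pattern."""
    tags = nltk.pos_tag(nltk.word_tokenize(tweet))
    for i in range(len(tags) - 2):
        if tags[i][0].lower() in NEGATION_WORDS:
            if tags[i + 1][1].startswith("VB") and tags[i + 2][1].startswith("JJ"):
                return True
    return False

def postprocess(prediction: str, tweet: str) -> str:
    # Reverse positive <-> negative if a negation pattern is found.
    if has_negated_adjective(tweet):
        if prediction == "positive":
            return "negative"
        if prediction == "negative":
            return "positive"
    return prediction

print(postprocess("positive", "The movie does not seem good."))  # -> negative
```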
Experiments
Dataset Positive Negative Neutral Total
GR-train 1870 2940 3190 8000
GR-test 261 249 378 888
Sem-train 3287 1601 4175 9063
Sem-test 1572 601 1640 3813
• Two versions of the proposed method:
1. #Sentiment_v1 (SVM)
2. #Sentiment_v2 (Logistic Regression)
• Three published methods for comparison:²
1. Barbosa_method (2)
2. Mohammad_method (1)
3. Go_method (3)
² Since no code was available, we reimplemented all three methods to the best of our ability.
Metrics
Example:

                   Predicted "Positive"   Predicted "Negative"
True "Positive"            a                      b
True "Negative"            c                      d

Precision_pos = a / (a + c)
Recall_pos = a / (a + b)
We compute the average of each metric for the three sentiment classes.
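With scikit-learn (one possible tooling choice, not necessarily the authors'), the macro average over the three classes can be computed as follows; the labels are made up purely for illustration.

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up labels, only to illustrate the computation.
y_true = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
y_pred = ["positive", "neutral",  "neutral", "negative", "neutral", "negative"]

# average="macro": compute each metric per class, then average over the classes.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Avg Precision {precision:.2f}, Avg Recall {recall:.2f}, Avg F1 {f1:.2f}")
```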
Evaluation on Greek data
[Bar chart: Avg Precision %, Avg Recall %, and Avg F1 % on the Greek test set for #Sentiment_v1, #Sentiment_v2, Mohammad_method, Go_method, and Barbosa_method. #Sentiment_v1 reaches an average F1 of 68.6% (see also the Sensitivity Analysis table).]
#Sentiment_v1 and Mohammad_method are statistically
indistinguishable.
The difference between #Sentiment_v1 and #Sentiment_v2 is statistically significant.
Thus, we keep only #Sentiment_v1 for upcoming results.
Statistical Significance
McNemar’s Test: a statistical test to assess the significance of the difference between two proportions.
• Assume we train and test two classifiers on the same data.
‣ n00 = The number of instances misclassified by both classifiers
‣ n01 = The number of instances misclassified by classifier1 but
correctly classified by classifier2
‣ n10 = The number of instances misclassified by classifier2 but
correctly classified by classifier1
‣ n11 = The number of instances correctly classified by both
classifiers
• Compare McNemar’s statistic against the chi-squared distribution with 1 degree of freedom to check whether the classifiers differ.
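The slides do not show the formula, so this sketch uses the textbook statistic with continuity correction; note that only the discordant counts n01 and n10 enter it.

```python
from scipy.stats import chi2

def mcnemar_differs(n01: int, n10: int, alpha: float = 0.05) -> bool:
    """McNemar's test with continuity correction.

    n01 / n10: instances misclassified by only one of the two classifiers.
    Returns True if the classifiers differ significantly at level alpha.
    """
    statistic = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    critical = chi2.ppf(1 - alpha, df=1)  # chi-squared with 1 degree of freedom
    return statistic > critical

print(mcnemar_differs(n01=40, n10=20))  # True: 6.02 > 3.84
```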
Evaluation on English data
[Bar chart: Avg Precision %, Avg Recall %, and Avg F1 % on the English test set (Sem-test) for #Sentiment_v1, Mohammad_method, Go_method, and Barbosa_method. #Sentiment_v1 reaches an average F1 of 64.2% (see also the Sensitivity Analysis table).]
#Sentiment_v1 and Mohammad_method are
again statistically indistinguishable.
Time Performance
Method            Predicted tweets/sec    Training time (min)
#Sentiment_v1     16                      8.45
Mohammad_method   9                       14.91
Go_method         807                     5.9
Barbosa_method    8                       15
Setup: single machine with Intel Core i5 at 2.6 GHz, 16 GB RAM
Sensitivity Analysis
Modification                      Avg F-score on Greek    Avg F-score on English
No modification                   68.6%                   64.2%
Without negation identification   68.7%                   64.1%
Without feature selection         66.7%                   62.2%
Without stemming                  63.1%                   62.2%
Conclusions and Future Work
• Comparable quality to the state-of-the-art Mohammad_method, with 43% less training time and 44% less prediction time.
• New pre/post-processing techniques needed for different languages.
• Preprocessing steps affect performance differently on Greek and English.
• Future work: assignment of sentiment to the correct entity in a
tweet.
Thank you!
Questions?
References
(1) Mohammad, S., Kiritchenko, S., Zhu, X.: NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, pp. 321–327 (2013)
(2) Barbosa, L., Feng, J.: Robust sentiment detection on Twitter from biased and noisy data. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 36–44. Association for Computational Linguistics (2010)
(3) Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. Processing 150(12), 1–6 (2009)