Sentiment Extraction From Tweets
Multilingual Challenges
Nantia Makrynioti and Vasilis Vassalos
Athens University of Economics and Business
DaWaK 2015, Valencia, Spain
Sentiment Extraction or Sentiment Analysis
The task of identifying opinion polarity in a text unit
Examples
"This guide provides a thoughtful and thorough solution to planning a vacation. Lots of great advice on every aspect of the trip." (Amazon review)
"Experts in France are due to begin examining part of a wing that washed up on the island of Reunion to see if it came from Flight MH370." (Sentence from an article)
"Hey @delta !!! You promised to deliver my luggage that you lost -3 hours ago!!?? I'm tired and I need my stufffff" (Tweet)
Applications
Brand monitoring
Political campaigns
Public opinion on significant events, such as the Arab Spring
Sentiment analysis of text
data in languages other
than English is important.
Related Work
• Sentiment analysis for English:
• Supervised machine learning approaches with various
features.
• Use of polarity lexicons and WordNet for synonyms/antonyms.
• Plenty of application domains: tweets, movie reviews, blog
posts, etc.
• Sentiment analysis for other languages: Dutch, French, Arabic,
Chinese.
• Comparative studies of sentiment analysis methods.
Contributions
1. A new supervised method for classification of tweets
into positive, negative and neutral.
‣ Significant time performance improvements over state-of-the-art methods of comparable quality.
2. A case study of sentiment analysis in Greek.
3. Annotation of a corpus of tweets in Greek to be used as training and test data for this purpose.¹
4. Extensive evaluation results and comparisons with three
published methods on both Greek and English datasets.
¹ The Greek data will be available at https://github.com/nantiamak/greek-tweets-sentiment in a few days or by emailing the author (makriniotik@aueb.gr).
Approach
• Main steps of the proposed approach:
• Preprocessing of text
• Features
• Sentiment prediction
• Negation identification
Preprocessing (1)
Removal of URL links, mentions, RT, stop words
Original text: I will be participating in a Google+ Hangout with @TheEconomist tomorrow! I'm SUPER excited! Check it out: https://plus.google.com/u/0/events/cjead60p8q24qqtmu95cighgeag
Preprocessed text: I will participating Google + Hangout tomorrow! SUPER excited! Check:
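As a rough sketch of this step (not the authors' exact implementation), a regex-based cleanup in Python could look as follows; the stop-word set is a tiny illustrative sample, not a real stop-word lexicon.

```python
import re

# Hypothetical, minimal stop-word list; a real system would use a full
# stop-word lexicon for the target language.
STOP_WORDS = {"be", "in", "a", "with", "it", "out"}

def clean_tweet(text: str) -> str:
    text = re.sub(r"https?://\S+", "", text)   # remove URL links
    text = re.sub(r"@\w+", "", text)           # remove @mentions
    text = re.sub(r"\bRT\b", "", text)         # remove the retweet marker
    tokens = [t for t in text.split() if t.lower() not in STOP_WORDS]
    return " ".join(tokens)

print(clean_tweet("RT @TheEconomist I will be participating in a Google+ Hangout tomorrow!"))
```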
Preprocessing (2)
Replacement of positive/negative emoticons and hashtags with two specific emoticons
Original text: I just had a fantastic meal at the lovely Vanessa's house! <3 Then off to the middle of nowhere aka New Hampshire tomorrow! #excited
Preprocessed text: I just had a fantastic meal at the lovely Vanessa's house! :) Then off to the middle of nowhere aka New Hampshire tomorrow! :)
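A minimal sketch of this replacement step; the emoticon and hashtag lists below are illustrative assumptions, not the lexicons used in the paper.

```python
# Illustrative samples only; the actual emoticon/hashtag lists are assumptions.
POSITIVE_EMOTICONS = {"<3", ":-)", ":D", ";)"}
NEGATIVE_EMOTICONS = {":-(", ":'(", "D:"}
POSITIVE_HASHTAGS = {"#excited", "#happy"}
NEGATIVE_HASHTAGS = {"#sad", "#angry"}

def normalize_emoticons_and_hashtags(text: str) -> str:
    out = []
    for token in text.split():
        if token in POSITIVE_EMOTICONS or token.lower() in POSITIVE_HASHTAGS:
            out.append(":)")   # one canonical positive emoticon
        elif token in NEGATIVE_EMOTICONS or token.lower() in NEGATIVE_HASHTAGS:
            out.append(":(")   # one canonical negative emoticon
        else:
            out.append(token)
    return " ".join(out)

print(normalize_emoticons_and_hashtags("fantastic meal <3 off to New Hampshire tomorrow! #excited"))
```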
Preprocessing (3)
Removal of accent marks
Original text: καλημέρα ("good morning")
Preprocessed text: καλημερα
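One standard way to strip Greek accent marks in Python (not necessarily how the authors implemented it) is Unicode decomposition:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Decompose each character (e.g. έ -> ε + combining acute accent)
    # and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("καλημέρα"))  # -> καλημερα
```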
Preprocessing (4)
Stemming
Original text: Games played during the week are now more interesting than Super Sunday!
Preprocessed text: Game play dur the week are now more interest than super Sunday!
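The slides do not name the stemmers used for English and Greek; as an illustration only, English stemming with NLTK's Porter stemmer could look like this (the exact stems it produces may differ slightly from the slide's example).

```python
from nltk.stem import PorterStemmer  # assumes NLTK is installed

stemmer = PorterStemmer()

def stem_text(text: str) -> str:
    # Stem each whitespace-separated token independently.
    return " ".join(stemmer.stem(token) for token in text.split())

print(stem_text("Games played during the week are now more interesting than Super Sunday!"))
```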
- Let’s train a classification model!
- But machine learning algorithms need numeric data.
We will translate each text to a numeric feature vector.
Features
• Unigram features generated from training data
(bag-of-words representation).
• Lexicon features based on subjective lexicons of
positive and negative words.
‣ Example: the number of words from the positive or the
negative lexicon.
Unigram Features
Vocabulary of training data: 4th, and, best, Eli, freezing!, game!, in, is, it, like, manning, QB, quarter, Rugby, the, tonight, …
“Eli manning is the best 4th quarter QB in the game!”, positive
“Rugby tonight and it is like freezing! omg I may die”, negative
“i recorded some Diablo III . And i will edit and upload it tomorrow”, neutral
<1,0,1,1,0,1,1,1,0,0,1,1,1,0,1,0…,positive>
<0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,1…,negative>
Feature selection to keep a subset of unigrams
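A compact sketch of binary unigram features plus feature selection, using scikit-learn; the library choice and the value of k are assumptions for illustration, not settings from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

tweets = [
    "Eli manning is the best 4th quarter QB in the game!",
    "Rugby tonight and it is like freezing! omg I may die",
    "i recorded some Diablo III . And i will edit and upload it tomorrow",
]
labels = ["positive", "negative", "neutral"]

# Binary bag-of-words: 1 if a unigram occurs in the tweet, 0 otherwise.
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(tweets)

# Feature selection keeps a subset of unigrams (k=10 is an arbitrary example).
selector = SelectKBest(chi2, k=10)
X_selected = selector.fit_transform(X, labels)
print(X_selected.shape)  # (3 tweets, 10 selected unigrams)
```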
Lexicon features
Positive lexicon: admire, charm, beautiful, love, divine, best, marvelous, effective, …
Negative lexicon: suffer, monotonous, dead, pathetic, liar, afraid, vicious, weak, …
Two boolean features:
1. presence of words from the
positive lexicon,
2. presence of words from the
negative lexicon
<1,0,1,1,0,1,1,1,0,0,1,1,1,0,1,0…,1,0,positive>
<0,1,0,0,1,0,0,1,1,1,0,0,0,1,0,1…,0,0,negative>
“Eli manning best 4th quarter QB in the game!”, positive
“Rugby tonight and it's like freezing! omg I may die”, negative
“i recorded some Diablo III . And i will edit and upload it tomorrow”, neutral
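A sketch of the two boolean lexicon features; the mini-lexicons here just reuse the sample words above, whereas the paper uses full subjective lexicons.

```python
# Sample words from the slide; real subjective lexicons are much larger.
POSITIVE_LEXICON = {"admire", "charm", "beautiful", "love", "divine",
                    "best", "marvelous", "effective"}
NEGATIVE_LEXICON = {"suffer", "monotonous", "dead", "pathetic", "liar",
                    "afraid", "vicious", "weak"}

def lexicon_features(text: str) -> list[int]:
    tokens = set(text.lower().split())
    has_positive = int(bool(tokens & POSITIVE_LEXICON))  # 1 if any positive word present
    has_negative = int(bool(tokens & NEGATIVE_LEXICON))  # 1 if any negative word present
    return [has_positive, has_negative]

print(lexicon_features("Eli manning best 4th quarter QB in the game!"))  # [1, 0]
```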
Other features
…we tried but didn’t contribute that much.
• Part-of-speech features: number of adjectives,
number of verbs etc.
• Emoticon features: number of emoticons, number of
positive/negative emoticons.
• URL features: number/presence of URL links.
Numeric Feature Vectors -> Training of Classification Model (SVM, Logistic Regression) -> Sentiment Predictions on Unseen Data
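A minimal end-to-end sketch with scikit-learn, reusing the binary unigram features from the earlier example; hyperparameters are library defaults, not the settings reported in the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression

train_tweets = [
    "Eli manning is the best 4th quarter QB in the game!",
    "Rugby tonight and it is like freezing! omg I may die",
    "i recorded some Diablo III . And i will edit and upload it tomorrow",
]
train_labels = ["positive", "negative", "neutral"]

vectorizer = CountVectorizer(binary=True)
X_train = vectorizer.fit_transform(train_tweets)

# Two variants, mirroring #Sentiment_v1 (SVM) and #Sentiment_v2 (Logistic Regression).
svm_model = LinearSVC().fit(X_train, train_labels)
logreg_model = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

X_new = vectorizer.transform(["what a marvelous game tonight"])
print(svm_model.predict(X_new), logreg_model.predict(X_new))
```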
Negation Identification (1)
• A post-processing step, after the prediction of the
classifier.
• Based on patterns of part-of-speech tags and
negation words.
• Reversal of prediction from positive to negative and
vice versa, if one of these patterns is identified in a
tweet.
Negation Identification (2)
Pattern: (not)<verb><adjective>
Example: "The movie does not seem good."
Check whether "good" is a feature of the vector with value 1. If yes and the sentiment prediction is positive -> the prediction becomes negative.
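A rough sketch of such a pattern check using NLTK part-of-speech tags (the tagger data must be downloaded); the negation word list is an assumption, and the check that the adjective is also an active unigram feature is omitted for brevity.

```python
import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data

NEGATION_WORDS = {"not", "n't", "never", "no"}  # illustrative sample

def has_negated_adjective(tweet: str) -> bool:
    """Rough check for a (negation)<verb><adjective> pattern."""
    tags = nltk.pos_tag(nltk.word_tokenize(tweet))
    for i in range(len(tags) - 2):
        if tags[i][0].lower() in NEGATION_WORDS:
            if tags[i + 1][1].startswith("VB") and tags[i + 2][1].startswith("JJ"):
                return True
    return False

def postprocess(prediction: str, tweet: str) -> str:
    # Reverse positive <-> negative if a negation pattern is found.
    if has_negated_adjective(tweet):
        if prediction == "positive":
            return "negative"
        if prediction == "negative":
            return "positive"
    return prediction

print(postprocess("positive", "The movie does not seem good."))  # -> negative
```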
Experiments
Dataset Positive Negative Neutral Total
GR-train 1870 2940 3190 8000
GR-test 261 249 378 888
Sem-train 3287 1601 4175 9063
Sem-test 1572 601 1640 3813
• Two versions of the proposed method:
1. #Sentiment_v1 (SVM)
2. #Sentiment_v2 (Logistic Regression)
• Three published methods for comparison:²
1. Barbosa_method (2)
2. Mohammad_method (1)
3. Go_method (3)
² Since no code was available, we reimplemented all three methods to the best of our ability.
Metrics
Example:

                   Predicted "Positive"   Predicted "Negative"
True "Positive"            a                      b
True "Negative"            c                      d

Precision_pos = a / (a + c)
Recall_pos = a / (a + b)
We compute the average of each metric for the three sentiment classes.
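With scikit-learn (one possible tooling choice, not necessarily the authors'), the macro average over the three classes can be computed as follows; the labels are made up purely for illustration.

```python
from sklearn.metrics import precision_recall_fscore_support

# Made-up labels, only to illustrate the computation.
y_true = ["positive", "negative", "neutral", "positive", "neutral", "negative"]
y_pred = ["positive", "neutral",  "neutral", "negative", "neutral", "negative"]

# average="macro": compute each metric per class, then average over the classes.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
print(f"Avg Precision {precision:.2f}, Avg Recall {recall:.2f}, Avg F1 {f1:.2f}")
```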
Evaluation on Greek data
[Bar chart: Avg Precision %, Avg Recall %, and Avg F1 % on the Greek test set for #Sentiment_v1, #Sentiment_v2, Mohammad_method, Go_method, and Barbosa_method. #Sentiment_v1 reaches an average F1 of 68.6% (see also the Sensitivity Analysis table).]
#Sentiment_v1 and Mohammad_method are statistically
indistinguishable.
The difference between #Sentiment_v1 and #Sentiment_v2 is statistically significant.
Thus, we keep only #Sentiment_v1 for upcoming results.
Statistical Significance
McNemar’s Test: a statistical test to assess the significance of the difference between two proportions.
• Assume we train and test two classifiers on the same data.
‣ n00 = The number of instances misclassified by both classifiers
‣ n01 = The number of instances misclassified by classifier1 but
correctly classified by classifier2
‣ n10 = The number of instances misclassified by classifier2 but
correctly classified by classifier1
‣ n11 = The number of instances correctly classified by both
classifiers
• Compare McNemar’s statistic against the chi-squared distribution with 1 degree of freedom to check whether the classifiers differ.
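The slides do not show the formula, so this sketch uses the textbook statistic with continuity correction; note that only the discordant counts n01 and n10 enter it.

```python
from scipy.stats import chi2

def mcnemar_differs(n01: int, n10: int, alpha: float = 0.05) -> bool:
    """McNemar's test with continuity correction.

    n01 / n10: instances misclassified by only one of the two classifiers.
    Returns True if the classifiers differ significantly at level alpha.
    """
    statistic = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    critical = chi2.ppf(1 - alpha, df=1)  # chi-squared with 1 degree of freedom
    return statistic > critical

print(mcnemar_differs(n01=40, n10=20))  # True: 6.02 > 3.84
```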
Evaluation on English data
[Bar chart: Avg Precision %, Avg Recall %, and Avg F1 % on the English test set (Sem-test) for #Sentiment_v1, Mohammad_method, Go_method, and Barbosa_method. #Sentiment_v1 reaches an average F1 of 64.2% (see also the Sensitivity Analysis table).]
#Sentiment_v1 and Mohammad_method are
again statistically indistinguishable.
Time Performance
Method            Predicted tweets/sec    Training time (min)
#Sentiment_v1     16                      8.45
Mohammad_method   9                       14.91
Go_method         807                     5.9
Barbosa_method    8                       15
Setup: single machine with Intel Core i5 at 2.6 GHz, 16 GB RAM
Sensitivity Analysis
Modification                      Avg F-score on Greek    Avg F-score on English
No modification                   68.6%                   64.2%
Without negation identification   68.7%                   64.1%
Without feature selection         66.7%                   62.2%
Without stemming                  63.1%                   62.2%
Conclusions and Future Work
• Comparable quality to the state-of-the-art Mohammad_method, with 43% less training time and 44% less prediction time.
• New pre/post-processing techniques needed for different languages.
• Preprocessing steps affect performance differently on Greek and English.
• Future work: assignment of sentiment to the correct entity in a
tweet.
Thank you!
Questions?
References
(1) Mohammad, S., Kiritchenko, S., Zhu, X.: NRC-Canada: Building the state-of-the-art in sentiment analysis of tweets. In: Second Joint Conference on Lexical and Computational Semantics (*SEM), Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), vol. 2, pp. 321–327 (2013)
(2) Barbosa, L., Feng, J.: Robust sentiment detection on Twitter from biased and noisy data. In: Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp. 36–44. Association for Computational Linguistics (2010)
(3) Go, A., Bhayani, R., Huang, L.: Twitter sentiment classification using distant supervision. Processing 150(12), 1–6 (2009)