Entity-oriented sentiment analysis of tweets: results and problems
Natalia Loukachevitch
Lomonosov Moscow State University
Yuliya Rubtsova
A.P. Ershov Institute of Informatics Systems
Entity-Oriented analysis of tweets: reputation monitoring
Sentiment analysis:
• In general
• Entity-oriented
SentiRuEval 2014-2015: testing of sentiment analysis systems for Russian texts
Aspect-oriented analysis of reviews
• Restaurants
• Cars
Entity-oriented analysis of tweets: reputation monitoring
• Banks [8]
• Telecom companies [7]
SentiRuEval: Entity-Oriented analysis of tweets
Task: to determine the sentiment towards the mentioned company
A reputation-oriented tweet may express:
• a positive or negative opinion about a company
• a positive or negative fact concerning a company
Participation: 9 participants, 33 runs
SentiRuEval: Entity-Oriented analysis of tweets
Training collection (July-August 2014)
• 5000 banking tweets
• 5000 telecom tweets
Test collection (December 2013-February 2014)
• 4549 banking tweets
• 3845 telecom tweets
Expert annotation labels
• 0: neutral tweet
• 1: positive fact or opinion
• -1: negative fact or opinion
• +-: positive and negative sentiment in the same tweet
• --: meaningless
Annotation problem
Test data were annotated using the voting scheme (agreement between 2 or 3 annotators).

Domain | Tweets with the same label from at least 2 assessors | Full agreement | Final number of tweets in the test collection
Telecom | 4 503 (90.06%) | 2 233 (44.66%) | 3 845
Banks | 4 915 (98.3%) | 3 818 (76.36%) | 4 549
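The voting scheme can be illustrated with a minimal sketch. It rests on assumptions not stated in the slides: each tweet has 2 or 3 labels using the codes from the expert annotation slide, a tweet is kept only if at least two annotators agree, and tweets whose majority label is "--" (meaningless) are dropped, following the speaker note that irrelevant or unclear messages were removed. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Sequence

MEANINGLESS = "--"  # label code from the expert annotation slide


def vote(labels: Sequence[str]) -> Optional[str]:
    """Label given by at least two annotators, or None if there is no majority."""
    if not 2 <= len(labels) <= 3:
        raise ValueError("expected labels from 2 or 3 annotators")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def full_agreement(labels: Sequence[str]) -> bool:
    """True when every annotator assigned the same label."""
    return len(set(labels)) == 1


def build_gold(annotations: Dict[str, Sequence[str]]) -> Dict[str, str]:
    """Keep tweets with a majority label; drop ties and meaningless tweets (assumed)."""
    gold = {}
    for tweet_id, labels in annotations.items():
        label = vote(labels)
        if label is not None and label != MEANINGLESS:
            gold[tweet_id] = label
    return gold


annotations = {
    "t1": ["1", "1", "0"],     # 2 of 3 agree, kept as "1"
    "t2": ["1", "0", "-1"],    # no majority, dropped
    "t3": ["--", "--", "0"],   # majority "meaningless", dropped
    "t4": ["-1", "-1", "-1"],  # full agreement
}
print(build_gold(annotations))  # {'t1': '1', 't4': '-1'}
```

Counting `full_agreement` over the kept tweets gives the "full agreement" column, while counting tweets with a non-None `vote` gives the "at least 2 assessors" column.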
Distribution of messages in the collections by sentiment class

Domain, collection | Neutral | Positive | Negative
Telecom, training collection | 2397 | 973 | 1667
Telecom, gold standard test collection | 2816 | 413 | 944
Banks, training collection | 3569 | 410 | 2138
Banks, gold standard test collection | 3592 | 350 | 670
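Quality measure
Main measure: macro-average F-measure over the two sentiment classes
$F_{macro} = \frac{F_{pos} + F_{neg}}{2}$
where $F_{pos}$ and $F_{neg}$ are the F-measures of the positive and negative classes; the F-measure of the neutral class is ignored.
This does not reduce the task to two-class prediction: a neutral tweet labeled positive or negative still lowers the precision of that class.
Additionally, micro-average F-measures were calculated for the two sentiment classes.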
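A minimal Python sketch of the two measures, assuming gold and predicted labels are given as the strings "positive", "negative" and "neutral". It uses the standard F1 = 2TP / (2TP + FP + FN); the function names are illustrative and not the organizers' evaluation script. The toy example shows why neutral tweets still matter: one neutral tweet predicted as positive lowers both scores.

```python
SENTIMENT_CLASSES = ("positive", "negative")  # the neutral F-measure is not averaged


def class_counts(gold, pred, cls):
    """True positives, false positives and false negatives for one class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    return tp, fp, fn


def f_measure(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); 0.0 if the class is neither present nor predicted."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f(gold, pred):
    """Average of the positive-class and negative-class F-measures."""
    scores = [f_measure(*class_counts(gold, pred, c)) for c in SENTIMENT_CLASSES]
    return sum(scores) / len(scores)


def micro_f(gold, pred):
    """F-measure from TP/FP/FN pooled over the positive and negative classes."""
    tp = fp = fn = 0
    for c in SENTIMENT_CLASSES:
        c_tp, c_fp, c_fn = class_counts(gold, pred, c)
        tp, fp, fn = tp + c_tp, fp + c_fp, fn + c_fn
    return f_measure(tp, fp, fn)


gold = ["positive", "negative", "neutral", "neutral"]
pred = ["positive", "negative", "positive", "neutral"]  # one neutral tweet marked positive
print(round(macro_f(gold, pred), 4), round(micro_f(gold, pred), 4))  # 0.8333 0.8
```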
Results

Top 3 results for telecom tweets
Run id | Macro F | Micro F
Baseline | 0.1823 | 0.337
2 | 0.4882 | 0.5355
3 | 0.4804 | 0.5094
4 | 0.467 | 0.506

Top 3 results for bank tweets
Run id | Macro F | Micro F
Baseline | 0.1267 | 0.2377
4 | 0.3598 | 0.343
10 | 0.352 | 0.337
2 | 0.3354 | 0.3656

Manual labeling of the telecom tweets by one of the participants: Macro-F 0.703, Micro-F 0.7487
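According to the speaker notes, the baselines assign every tweet the majority reputation-oriented category (negative). The sketch below computes the scores of such an all-negative baseline directly from class counts. It assumes the bank gold-standard test counts read from the distribution chart (350 positive, 670 negative, 3592 neutral) and one label per counted item; the function name is hypothetical. The result is close to, but not exactly, the reported 0.1267 and 0.2377, since the slides do not say exactly how items were counted.

```python
def f_measure(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN), 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def all_negative_baseline(n_pos, n_neg, n_neu):
    """Macro-F and micro-F of a baseline that labels every tweet as negative."""
    total = n_pos + n_neg + n_neu
    # Negative class: all negative tweets found; every other tweet is a false positive.
    neg = (n_neg, total - n_neg, 0)
    # Positive class: never predicted, so every positive tweet is a false negative.
    pos = (0, 0, n_pos)
    macro = (f_measure(*pos) + f_measure(*neg)) / 2
    micro = f_measure(pos[0] + neg[0], pos[1] + neg[1], pos[2] + neg[2])
    return macro, micro


# Bank gold-standard test counts from the distribution slide.
macro, micro = all_negative_baseline(n_pos=350, n_neg=670, n_neu=3592)
print(f"macro-F={macro:.4f} micro-F={micro:.4f}")  # macro-F=0.1268 micro-F=0.2379
```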
Classification methods (by run id)
• Run 2: lemmas and syntactic links represented as triples (head word, dependent word, type of relation)
• Run 3: rule-based approach accounting for syntactic relations between sentiment words and the target entities
• Run 4: maximum entropy method based on word n-grams, character n-grams, and topic modeling results
• Run 10: word n-grams, letter n-grams, emoticons, punctuation marks, smilies, a manual sentiment vocabulary, and an automatically generated sentiment list based on the pointwise mutual information (PMI) of word occurrences in the positive and negative training subsets
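Run 10's automatically generated sentiment list is described only as based on PMI of word occurrences in the positive and negative training subsets. One common way to do this, assumed here rather than taken from the participant's system, scores a word as PMI(w, positive) minus PMI(w, negative), which simplifies to log2 P(w | positive) / P(w | negative). Whitespace tokenization, add-one smoothing and the name `pmi_lexicon` are also assumptions.

```python
import math
from collections import Counter


def pmi_lexicon(pos_texts, neg_texts, min_count=1):
    """score(w) = PMI(w, positive) - PMI(w, negative) = log2 P(w|pos) / P(w|neg)."""
    pos = Counter(w for t in pos_texts for w in t.lower().split())
    neg = Counter(w for t in neg_texts for w in t.lower().split())
    vocab = set(pos) | set(neg)
    # Add-one smoothing keeps words seen in only one subset finite.
    n_pos = sum(pos.values()) + len(vocab)
    n_neg = sum(neg.values()) + len(vocab)
    scores = {}
    for w in vocab:
        if pos[w] + neg[w] >= min_count:
            scores[w] = math.log2(((pos[w] + 1) / n_pos) / ((neg[w] + 1) / n_neg))
    return scores


scores = pmi_lexicon(["great service great", "great bank"],
                     ["awful service", "awful awful bank"])
print(round(scores["great"], 2), round(scores["awful"], 2))  # 2.0 -2.0
```

Words with a score near zero (here "service" and "bank") occur equally often in both subsets and would not enter the sentiment list.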
Classification methods (summary)
• SVM + syntactic relations
• Linguistic syntax-based patterns (without machine learning)
• Maxent, SVM using various features
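Run 2 represents tweets by lemmas and syntactic triples (head word, dependent word, type of relation). A hypothetical feature extractor is sketched below. It assumes a dependency parse is already available as a list of tokens with a lemma, a head index (-1 for the root) and a relation label, and it does not reproduce any particular parser's output format. The resulting feature counts could then be vectorized (for example with scikit-learn's `DictVectorizer`) and passed to an SVM.

```python
from collections import Counter
from typing import List, NamedTuple


class Token(NamedTuple):
    lemma: str
    head: int      # index of the head token in the sentence, -1 for the root
    relation: str  # dependency label (label set depends on the assumed parser)


def triple_features(tokens: List[Token]) -> Counter:
    """Lemma features plus one (head, dependent, relation) triple per dependency."""
    feats = Counter(f"lemma={t.lemma}" for t in tokens)
    for t in tokens:
        if t.head >= 0:
            feats[f"triple={tokens[t.head].lemma}|{t.lemma}|{t.relation}"] += 1
    return feats


# Hypothetical parse of "bank blocked card" (English glosses for readability).
parse = [Token("bank", 1, "nsubj"), Token("block", -1, "root"), Token("card", 1, "obj")]
print(sorted(triple_features(parse)))
```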
Explaining the difference in performance between the two domains
The best results in the banking and telecom domains differ: 0.36 vs. 0.488 (macro-F).
The difference between the training and test collections was measured with the Kullback-Leibler divergence.
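The comparison can be sketched as D_KL(P_test || P_train) over word unigram distributions, following the speaker note that the test collections were compared in relation to the training collections. Tokenization (lowercased whitespace split), add-alpha smoothing over the joint vocabulary and the natural logarithm are assumptions. Without smoothing, the divergence is infinite whenever a test word never occurs in training, which is exactly the situation these collections produce.

```python
import math
from collections import Counter
from typing import Iterable


def unigram_counts(texts: Iterable[str]) -> Counter:
    """Word counts from a lowercased whitespace split (assumed tokenization)."""
    return Counter(w for t in texts for w in t.lower().split())


def kl_divergence(test_texts, train_texts, alpha: float = 1.0) -> float:
    """D_KL(P_test || P_train) over word unigrams with add-alpha smoothing."""
    test, train = unigram_counts(test_texts), unigram_counts(train_texts)
    vocab = set(test) | set(train)
    z_test = sum(test.values()) + alpha * len(vocab)
    z_train = sum(train.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (test[w] + alpha) / z_test
        q = (train[w] + alpha) / z_train
        kl += p * math.log(p / q)
    return kl


train = ["bank card loan", "card blocked again"]
# A test set with unseen event words diverges more from the training data.
print(kl_divergence(["card loan bank"], train) < kl_divergence(["sanctions crimea bank"], train))  # True
```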
Explaining the difference in performance between the two domains
The topics of reputation-oriented tweets depend strongly on positive or negative events concerning the target entities.
Problems of reputation analysis of tweets
Events influencing reputation can occur at any moment, so they are absent from the training data.
• Test collections (December 2013-February 2014): the Ukraine events did not influence the target entities.
• Training collections in both domains (July-August 2014): collected after the Ukraine events of 2013-2014; sanctions against banks, problems with communication in Crimea.
Analyzing difficult tweets
• 71 tweets in the banking domain were wrongly classified by all participants
• 85 tweets in the telecom domain were difficult for almost all participants (at most 2 systems were correct)
First group, 1.1
Tweets contain evident sentiment words (such as понравиться, "to like") that were absent from the training set.
A general vocabulary of Russian sentiment words could help.
First group, 1.2
Tweets contain words denoting well-known positive or negative situations, such as theft or murder, that are absent from the training collection.
A general vocabulary of connotative words would be useful.
First group, 1.3
Tweets contain words and phrases describing current events from the news flow.
Possible remedy: parallel analysis of the current news, revealing correlations between tweet words and general sentiment and connotation vocabularies in news texts.
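One way to realize this idea, offered as a hypothetical sketch rather than a method from the talk: estimate a connotation score for each word from how often it shares a news sentence with words from a general positive or negative sentiment vocabulary. The vocabularies, the sentence-level co-occurrence window, the `min_cooc` threshold and the function name are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def news_connotation(news_sentences: Iterable[str],
                     pos_vocab: Set[str],
                     neg_vocab: Set[str],
                     min_cooc: int = 2) -> Dict[str, float]:
    """Score in [-1, 1] per word: (pos co-occurrences - neg co-occurrences) / total."""
    pos_co: Dict[str, int] = defaultdict(int)
    neg_co: Dict[str, int] = defaultdict(int)
    for sentence in news_sentences:
        words = set(sentence.lower().split())
        n_pos, n_neg = len(words & pos_vocab), len(words & neg_vocab)
        for w in words - pos_vocab - neg_vocab:
            pos_co[w] += n_pos
            neg_co[w] += n_neg
    scores = {}
    for w in pos_co:  # pos_co and neg_co always share the same keys
        total = pos_co[w] + neg_co[w]
        if total >= min_cooc:
            scores[w] = (pos_co[w] - neg_co[w]) / total
    return scores


news = ["sanctions hit bank badly",
        "sanctions cause terrible losses",
        "new tariff is great"]
scores = news_connotation(news, pos_vocab={"great"}, neg_vocab={"badly", "terrible"})
print(scores)  # {'sanctions': -1.0}
```

A word such as "sanctions" would then carry a negative connotation into tweet classification even if it never occurred in the training tweets.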
Second group
Misclassified tweets include tweets that are genuinely complicated:
• they mention more than one entity with different attitudes
• they contain several sentiment words with different polarity
• they contain irony
Vocabularies in the M-L (machine learning) framework
• 30% of tweets in the bank collection
• 15% of tweets in the telecom collection
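Integrating vocabularies into a machine-learning framework can be as simple as adding lexicon-count features next to the bag of words. In this hypothetical sketch, both vocabularies are dicts mapping a word to +1 or -1, and the feature names are invented for illustration. A word absent from the training data can still fire `lex_pos` or `conn_neg`, whose weights the classifier learns from other vocabulary words.

```python
from collections import Counter
from typing import Dict


def tweet_features(text: str,
                   sentiment_vocab: Dict[str, int],
                   connotation_vocab: Dict[str, int]) -> Counter:
    """Bag-of-words features plus counts of general-vocabulary hits."""
    words = text.lower().split()
    feats = Counter(f"w={w}" for w in words)
    for w in words:
        if w in sentiment_vocab:
            feats["lex_pos" if sentiment_vocab[w] > 0 else "lex_neg"] += 1
        if w in connotation_vocab:
            feats["conn_pos" if connotation_vocab[w] > 0 else "conn_neg"] += 1
    return feats


# Toy vocabularies (hypothetical): "theft" is a connotative, not an opinion, word.
sentiment_vocab = {"liked": 1, "awful": -1}
connotation_vocab = {"theft": -1, "award": 1}
feats = tweet_features("theft at the bank branch", sentiment_vocab, connotation_vocab)
print(feats["conn_neg"], feats["lex_pos"])  # 1 0
```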
Were the systems entity-oriented?
Test tweets mentioning two or more entities:
• 58 tweets in the banking domain (15 tweets with different polarity labels),
• 232 tweets in the telecom domain (71 tweets with different polarity labels)
3 of 9 participants treated the task as entity-oriented
• The other participants always assigned the same polarity class to all entities mentioned in a tweet
Performance on these tweets
• Worse on average than for all tweets
• Entity-oriented approaches did not achieve better results
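The counts above and the check for entity orientation can be sketched as follows, assuming annotations and system outputs are stored per tweet as a dict from entity name to polarity label. The function names and the toy entities ("BankA", "BankB") are hypothetical.

```python
from typing import Dict, Tuple

Labels = Dict[str, Dict[str, int]]  # tweet id -> {entity: polarity}


def multi_entity_counts(gold: Labels) -> Tuple[int, int]:
    """(# tweets mentioning 2+ entities, # of those whose entities differ in polarity)."""
    multi = [ents for ents in gold.values() if len(ents) >= 2]
    mixed = [ents for ents in multi if len(set(ents.values())) > 1]
    return len(multi), len(mixed)


def is_entity_oriented(pred: Labels) -> bool:
    """True if a system ever gives different labels to entities in the same tweet."""
    return any(len(set(ents.values())) > 1 for ents in pred.values())


gold = {
    "t1": {"BankA": -1, "BankB": 1},  # different attitudes to two banks
    "t2": {"BankA": 0, "BankB": 0},
    "t3": {"BankA": 1},
}
# A tweet-level system copies one label to every entity in the tweet.
tweet_level = {tid: {e: -1 for e in ents} for tid, ents in gold.items()}
print(multi_entity_counts(gold), is_entity_oriented(tweet_level))  # (2, 1) False
```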
Conclusion
We described the tasks, approaches and results of the SentiRuEval evaluation:
– High dependence on the training collections
– High impact of current dramatic events
– The capability to perform entity-oriented analysis is quite restricted
– Integrating a general sentiment vocabulary and a general vocabulary of connotative words could substantially improve results
– Most participants solved the general task of tweet classification
– Entity-oriented approaches did not achieve better results
All prepared materials are available for research purposes: http://goo.gl/qHeAVo
Thank you!
You can help us assess tweets for SentiRuEval-2016: http://sentimeter.ru/assess/texts/
Yuliya Rubtsova


Editor's Notes

  • #3 In general: sentiment of the whole document, fragment, or sentence. Entity-oriented: sentiment about a specific entity (a politician, a political party, a company, etc.) or about specific parts or properties of an entity (aspects). Example: "Switch to Beeline. «Все за 300» ("Everything for 300") is an excellent tariff!"
  • #5 The goal of the Twitter sentiment analysis task at SentiRuEval was to find tweets influencing the reputation of a company in two domains.
  • #6 The datasets were collected with the Twitter Streaming API.
  • #7 To prepare the datasets, 20,000 messages were labeled, including 5,000 messages in each domain for the training and test collections. Each collection was labeled by at least two assessors; the gold standard test collections were labeled by three assessors. Irrelevant or unclear messages were removed from the training and test sets.
  • #8 To avoid inconsistencies and disputes, the voting scheme was applied when labeling the test collections.
  • #9 We noticed that users sometimes do not want to be rude and add positive emoticons to clearly negative or ironic messages. That is why simple methods based on extracting emoticons, which are used for classification at the whole-tweet level, do not work well.
  • #10 Main quality measure:
  • #11 The baselines assign the majority reputation-oriented category (the negative one in this case). One of the participants carried out an independent expert labeling of the telecom tweets, which can be considered the maximum possible performance of automated systems in this task.
  • #12  Most participants used the SVM classification method.
  • #14 We computed the Kullback-Leibler divergence to compare the word probability distributions of the test collections with those of the training collections.
  • #18 The first group includes tweets that were misclassified because of the restricted size of the training collection, which did not contain appropriate training
  • #19 These words are usually considered neutral and non-opinionated, but they have positive or negative associations (so-called connotations). To solve these problems, a general vocabulary of connotative words would be useful, because the appearance of such words in connection with a company influences its reputation.
  • #20 Problematic tweets contain words and phrases describing current events from the news flow. The appearance of some events and their influence on a company's reputation are very difficult to predict, so mentions of them will always be absent from the training collection. In this case, parallel analysis of the current news, revealing correlations between tweet words and general sentiment and connotation vocabularies in news texts, can help.
  • #22 This means that integrating various vocabularies into the machine-learning framework can improve the performance of reputation-oriented automatic systems.