
Entity-oriented sentiment analysis of tweets: results and problems

This is a summary of the results of the reputation-oriented Twitter task, which was held as part of the SentiRuEval evaluation of Russian sentiment-analysis systems. Tweets from two domains, telecom companies and banks, were included in the evaluation. The task was to determine whether the author of a tweet has a positive or negative attitude towards a company mentioned in the message. The main aim of this paper is to analyze the current state and problems of the approaches applied by the participants.


  1. Entity-oriented sentiment analysis of tweets: results and problems. Natalia Loukachevitch, Lomonosov Moscow State University; Yuliya Rubtsova, A.P. Ershov Institute of Informatics Systems
  2. Entity-oriented analysis of tweets: reputation monitoring. Sentiment analysis can be performed in general or in an entity-oriented way.
  3. SentiRuEval 2014-2015: testing of sentiment analysis systems for Russian texts.
     • Aspect-oriented analysis of reviews: restaurants, cars
     • Entity-oriented analysis of tweets (reputation monitoring): banks [8], telecom companies [7]
  4. SentiRuEval: entity-oriented analysis of tweets. A reputation-oriented tweet may express a positive or negative opinion about a company, or a positive or negative fact concerning a company. Task: to determine sentiment towards the mentioned company. Participation: 9 participants, 33 runs.
  5. SentiRuEval: entity-oriented analysis of tweets.
     • Training collection (December 2013 to February 2014): 5000 banking tweets, 5000 telecom tweets
     • Test collection (July to August 2014): 4549 banking tweets, 3845 telecom tweets
  6. Expert annotation labels:
     • 0: tweet considered neutral
     • 1: positive fact or opinion
     • -1: negative fact or opinion
     • +-: positive and negative sentiments in the same tweet
     • --: meaningless
  7. Annotation problem. Test data were annotated using a voting scheme (agreement between 2 or 3 annotators).
     • Telecom: 4 503 tweets (90.06%) with the same label from at least 2 assessors, 2 233 (44.66%) with full agreement; final number of tweets in the test collection: 3 845
     • Banks: 4 915 tweets (98.3%) with the same label from at least 2 assessors, 3 818 (76.36%) with full agreement; final number of tweets in the test collection: 4 549
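The voting scheme on slide 7 can be sketched in a few lines: a tweet is kept only when at least two of the three assessors assigned it the same label. The function and data here are illustrative, not the evaluation's actual tooling.

```python
from collections import Counter

def majority_label(labels, min_votes=2):
    """Return the label chosen by at least `min_votes` assessors, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_votes else None

# Hypothetical annotations: three assessors per tweet.
annotations = [
    ["1", "1", "0"],    # two of three agree -> kept with label "1"
    ["1", "0", "-1"],   # no majority -> discarded from the test collection
    ["0", "0", "0"],    # full agreement
]
kept = [majority_label(a) for a in annotations if majority_label(a) is not None]
print(kept)  # ['1', '0']
```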
  8. Distribution of messages in the collections according to sentiment classes.
     • Telecom, training collection: 2397 neutral, 973 positive, 1667 negative; gold-standard test collection: 2816 neutral, 413 positive, 944 negative
     • Banks, training collection: 3569 neutral, 410 positive, 2138 negative; gold-standard test collection: 3592 neutral, 350 positive, 670 negative
  9. Quality measure: macro-average F-measure, i.e. (F-measure of the positive class + F-measure of the negative class) / 2. The F-measure of the neutral class is ignored, but this does not reduce the task to two-class prediction. Additionally, micro-average F-measures were calculated for the two sentiment classes.
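A minimal sketch of this quality measure: macro-averaged F over the positive and negative classes only, while neutral remains a valid prediction, so the task stays three-class. Function and label names are illustrative.

```python
def f_measure(tp, fp, fn):
    """Standard F1 from true positives, false positives, false negatives."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

def macro_f(gold, predicted, classes=("positive", "negative")):
    """Macro-average F over the two sentiment classes; neutral is ignored
    in the averaging but still counts as a (wrong or right) prediction."""
    scores = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, predicted))
        fp = sum(g != c and p == c for g, p in zip(gold, predicted))
        fn = sum(g == c and p != c for g, p in zip(gold, predicted))
        scores.append(f_measure(tp, fp, fn))
    return sum(scores) / len(scores)
```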
  10. Results.
     • Top 3 results for telecom tweets (run id: Macro-F, Micro-F): baseline: 0.1823, 0.337; run 2: 0.4882, 0.5355; run 3: 0.4804, 0.5094; run 4: 0.467, 0.506
     • Top 3 results for bank tweets: baseline: 0.1267, 0.2377; run 4: 0.3598, 0.343; run 10: 0.352, 0.337; run 2: 0.3354, 0.3656
     • Manual labeling by a participant for the telecom domain: Macro-F 0.703, Micro-F 0.7487
  11. Classification methods (by run id):
     • Run 2: lemmas and syntactic links presented as triples (head word, dependent word, type of relation)
     • Run 3: rule-based approach accounting for syntactic relations between sentiment words and the target entities
     • Run 4: maximum entropy method based on word n-grams, character n-grams, and topic modeling results
     • Run 10: word n-grams, letter n-grams, emoticons, punctuation marks, smilies, a manual sentiment vocabulary, and an automatically generated sentiment list based on pointwise mutual information (PMI) of word occurrences in the positive or negative training subsets
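The PMI-based sentiment list mentioned for run 10 can be approximated as follows: each word is scored by the smoothed PMI difference between the positive and negative training subsets, so positive scores suggest positive words and vice versa. This is a sketch of the general technique, not the participant's actual code; the add-k smoothing is an assumption.

```python
import math
from collections import Counter

def pmi_polarity(pos_tweets, neg_tweets, k=1.0):
    """score(w) = PMI(w, pos) - PMI(w, neg) over word frequencies in the
    positive vs. negative training subsets, with add-k smoothing."""
    pos = Counter(w for t in pos_tweets for w in t.split())
    neg = Counter(w for t in neg_tweets for w in t.split())
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    vocab = set(pos) | set(neg)
    return {w: math.log(((pos[w] + k) / (n_pos + k * len(vocab))) /
                        ((neg[w] + k) / (n_neg + k * len(vocab))))
            for w in vocab}
```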
  12. Classification methods: SVM with syntactic relations; linguistic syntax-based patterns (without machine learning); MaxEnt and SVM using various features.
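The word and character n-gram features used by several participants can be sketched like this (the feature-naming scheme is illustrative); the resulting counts would feed a MaxEnt or SVM classifier.

```python
from collections import Counter

def ngram_features(text, word_n=(1, 2), char_n=(3,)):
    """Bag of word n-grams and character n-grams as a feature counter."""
    feats = Counter()
    words = text.lower().split()
    for n in word_n:                      # word n-grams, prefixed "w:"
        for i in range(len(words) - n + 1):
            feats["w:" + " ".join(words[i:i + n])] += 1
    for n in char_n:                      # character n-grams, prefixed "c:"
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n].lower()] += 1
    return feats
```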
  13. Explaining the difference in performance between the two domains. The best results in the banking and telecom domains differ considerably: 0.36 vs. 0.488. The difference between the training and test collections was measured with the Kullback-Leibler divergence.
  14. Explaining the difference in performance between the two domains. The topics of reputation-oriented tweets greatly depend on positive or negative events involving the target entities.
  15. Problems of reputation analysis of tweets. Events that influence reputation can occur at any moment, and are therefore absent from the training data. The training collections (December 2013 to February 2014) were gathered before the Ukraine events influenced the target entities. The test collections in both domains (July to August 2014) were gathered after the 2013-2014 Ukraine events: sanctions against banks, problems with communication in Crimea.
  16. Analyzing difficult tweets: 71 tweets in the banking domain were wrongly classified by all participants; 85 tweets in the telecom domain were difficult for almost all participants (at most 2 systems classified them correctly).
  17. First group, 1.1: tweets contain evident sentiment words (such as понравиться, "to like") that were absent from the training set. A general vocabulary of Russian sentiment words could help.
  18. First group, 1.2: tweets contain words expressing well-known positive or negative situations, such as theft or murder, that are absent from the training collection. A general vocabulary of connotative words would be useful.
  19. First group, 1.3: tweets contain words and phrases describing current events from the ongoing news flow. Parallel analysis of the current news, revealing correlations between tweet words and general sentiment and connotation vocabularies in news texts, could help.
  20. Second group: misclassified tweets that are genuinely complicated. They mention more than one entity with different attitudes, contain several sentiment words with different polarity orientations, or contain irony.
  21. Vocabularies within a machine-learning framework: about 30% of the difficult tweets in the bank collection, 15% in the telecom collection.
  22. Were the systems entity-oriented?
     • Test tweets mentioning two or more entities: 58 tweets in the banking domain (15 tweets with different polarity labels), 232 tweets in the telecom domain (71 tweets with different polarity labels)
     • 3 of 9 participants treated the task as an entity-oriented one; the other participants always assigned the same polarity class to all entities mentioned in a tweet
     • Performance on these tweets was worse than on all tweets on average, and entity-oriented approaches did not achieve better results
  23. Conclusion. We described the tasks, approaches and results of the SentiRuEval testing:
     • High dependence on the training collections
     • High impact of current dramatic events
     • The capability to do entity-oriented analysis is quite restricted
     • A large improvement could come from integrating a general sentiment vocabulary and a general vocabulary of connotative words
     • Most participants solved the general task of tweet classification
     • Entity-oriented approaches did not achieve better results
     All prepared materials are accessible for research purposes.
  24. Thank you! You can help us to assess tweets for SentiRuEval-2016. Yuliya Rubtsova