Rule-based approach to sentiment analysis at ROMIP’11 (slides)


Slides for presentation at Dialogue'12


  1. Rule-based approach to sentiment analysis at ROMIP’11
     • Dmitry Kan, Twitter: @DmitryKan
     • AlphaSense Inc
     • Dialogue, 2012
  2. Outline
     • Problem definition
     • Base level for accuracy
     • Towards shallow parsing of input text
     • Rule-based algorithm
     • Object-oriented sentiment detection
     • Performance
     • Open problems
  3. Problem definition
     • What sentiment means to people:
       – Mood of the author? Mood of the reader? Personal attitude?
       – Opinion about the target object (a product, etc.)?
       – Something else, defined by an annotator’s boss?
     • What sentiment means to a computer:
       – General polarity background
       – General opinion mining
       – Object (product) oriented opinion mining
       – Polarity strength detection
  4. Base level for accuracy
     • Inter-annotator agreement is 80% [1]
     • The real performance of a system is what it shows on unannotated data
     • Real example: “CEO of the company turned 50” (was marked as positive: why?)
     • Some machine learning (ML) methods reach 90% or more on test data
     • Object-oriented sentiment detection is hard, if not impossible, with ML
  5. Towards shallow parsing of input text
     • Sentence score:
       $$\mathrm{totalSentimentScore} = \mathrm{totalPositiveScore} - \mathrm{totalNegativeScore} - \begin{cases} \tfrac{1}{2}\,\mathrm{sentimentCount}, & \text{if an opposite conjunction is found} \\ 0, & \text{otherwise} \end{cases}$$
     • Negation: NOT(polarity) = opposite_polarity
     • Example 1: “Majority likes this, but I do not like this”
       – Subclause 1: “Majority likes this”
       – Opposite conjunction: “but”
       – Subclause 2: “I do not like this” (contains the negation “not”)
     • Example 2: “I liked new iPhone, but GalaxyS is not easy to use”
       – Subclause 1: object iPhone, sentiment positive
       – Subclause 2 (contains a negation): object GalaxyS, sentiment negative
       – Sentence as a whole (no object): sentiment neutral (mixed)
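The sentence-score rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the positive score, negative score and sentiment-word count are already computed; the function names and the "zero means neutral" mapping are hypothetical, not taken from the original system.

```python
def total_sentiment_score(total_positive, total_negative,
                          sentiment_count, has_opposite_conjunction):
    """Positive minus negative score, minus half of the sentiment-word
    count when the sentence contains an opposite conjunction."""
    correction = sentiment_count / 2 if has_opposite_conjunction else 0.0
    return total_positive - total_negative - correction


def polarity_label(score):
    # Assumption: a score of exactly zero is treated as neutral.
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Same word counts, with and without an opposite conjunction.
print(total_sentiment_score(2, 0, 2, has_opposite_conjunction=False))  # 2.0
print(total_sentiment_score(2, 0, 2, has_opposite_conjunction=True))   # 1.0
```

The conjunction term pulls the score toward zero, which reflects that the two subclauses partly cancel each other out.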
  6. Rule-based algorithm flow on an example sentence: “Majority likes this, but I do not like this.”
     • Phase 1 (negations): posScore = 0 - negation weight = -2
     • Phase 2 (individual words):
       – Word “likes”: posScore = -2 + 1 = -1
       – Word “not”: negScore = 0 + 1 = 1
       – Word “like”: posScore = -1 + 1 = 0
     • Phase 3 (opposite conjunctions): sentimentCount = 3
       totalScore = posScore - negScore - ½ * sentimentCount = 0 - 1 - 3/2 = -5/2
     • Sentiment: Negative
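The three phases can be replayed with a small Python sketch. It assumes toy English word lists (`POSITIVE`, `NEGATIVE`, `NEGATIONS` and `OPPOSITE_CONJUNCTIONS` are hypothetical stand-ins for the Russian dictionaries) and a negation weight of 2, as implied by Phase 1. Following the slide, “not” is counted both as a negation and as a negative word.

```python
import re

# Hypothetical toy dictionaries; the real system uses lemmatized Russian lists.
POSITIVE = {"like", "likes"}
NEGATIVE = {"not"}            # the slide counts "not" in negScore
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2           # implied by Phase 1: 0 - weight = -2


def classify(sentence):
    tokens = re.findall(r"\w+", sentence.lower())

    # Phase 1: each negation subtracts the negation weight from posScore.
    pos_score = -NEGATION_WEIGHT * sum(t in NEGATIONS for t in tokens)

    # Phase 2: count individual polarity words.
    neg_score = 0
    sentiment_count = 0
    for t in tokens:
        if t in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif t in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3: an opposite conjunction removes half of the polarity mass.
    total = pos_score - neg_score
    if any(t in OPPOSITE_CONJUNCTIONS for t in tokens):
        total -= sentiment_count / 2

    label = "positive" if total > 0 else "negative" if total < 0 else "neutral"
    return pos_score, neg_score, sentiment_count, total, label


print(classify("Majority likes this, but I do not like this."))
# (0, 1, 3, -2.5, 'negative')
```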
  7. Rule-based algorithm #1/3
     • Suits micro-posts (Twitter) or individual sentences
     • Polarity dictionaries for Russian (1739 positive and 2338 negative words)
     • All words are lemmatized (A. Zaliznyak’s grammatical dictionary [2])
     • A set of Russian negations that noticeably affect the polarity of the connected word(s): не плохо (not bad); gaps between the negation and the word are also handled correctly, e.g. Я не сильно люблю это (I do not strongly like this)
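A Python sketch of one way a negation can flip the polarity of a following word even when other words sit in between. The lemma table, the word lists and the `MAX_GAP` window are hypothetical stand-ins. The real system lemmatizes with a Zaliznyak-based dictionary and uses the full polarity lists, and the slide does not say how large a gap is allowed.

```python
import re

# Hypothetical stand-ins for the lemmatizer and the polarity dictionaries.
LEMMAS = {"люблю": "любить", "любит": "любить"}
POSITIVE = {"любить", "хорошо"}
NEGATIVE = {"плохо"}
NEGATIONS = {"не"}
MAX_GAP = 2  # assumed: at most two words between the negation and its target


def word_polarities(text):
    """Return (lemma, polarity) pairs; a negation flips the next polarity word."""
    tokens = [LEMMAS.get(t, t) for t in re.findall(r"\w+", text.lower())]
    result = []
    negation_at = None  # position of the most recent unresolved negation
    for i, tok in enumerate(tokens):
        if tok in NEGATIONS:
            negation_at = i
            continue
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity == 0:
            continue  # neutral words such as "сильно" do not consume the negation
        if negation_at is not None and i - negation_at - 1 <= MAX_GAP:
            polarity = -polarity  # NOT(polarity) = opposite_polarity
        negation_at = None
        result.append((tok, polarity))
    return result


print(word_polarities("не плохо"))               # [('плохо', 1)]
print(word_polarities("Я не сильно люблю это"))  # [('любить', -1)]
```

Resetting the negation after the first polarity word keeps a single “не” from flipping every later word in the sentence.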
  8. Rule-based algorithm #2/3
     • A set of Russian opposite conjunctions, which affect the polarity of a sentence’s subclauses relative to each other: Большинству это всё нравится, а мне нет (Majority likes this, but I do not)
     • totalScore = positiveScore - negativeScore - oppositeConjunctionSentimentScore, where oppositeConjunctionSentimentScore = sentimentWordCount / 2 for a sentence containing an opposite conjunction (0 otherwise); it removes polarity mass from such a sentence
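Detecting an opposite conjunction and splitting the sentence into subclauses around it can be sketched with a regular expression in Python. The conjunction list is a hypothetical subset (the slides do not give the actual set), and only the first conjunction in the sentence is used.

```python
import re

# Hypothetical subset of opposite conjunctions (Russian plus English "but").
OPPOSITE_CONJUNCTIONS = ("а", "но", "однако", "but")
# \b is Unicode-aware for str patterns, so "а" only matches as a whole word.
PATTERN = re.compile(r"\b(" + "|".join(OPPOSITE_CONJUNCTIONS) + r")\b",
                     re.IGNORECASE)


def split_subclauses(sentence):
    """Split at the first opposite conjunction; return (subclauses, conjunction)."""
    parts = PATTERN.split(sentence, maxsplit=1)
    if len(parts) == 1:
        return [sentence.strip(" .,!?")], None
    left, conjunction, right = parts  # the capture group keeps the conjunction
    return [left.strip(" .,!?"), right.strip(" .,!?")], conjunction.lower()


print(split_subclauses("Большинству это всё нравится, а мне нет"))
# (['Большинству это всё нравится', 'мне нет'], 'а')
```

A non-`None` conjunction is what switches on the sentimentWordCount / 2 term in totalScore.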
  9. Rule-based algorithm #3/3
     • Object-oriented sentiment detection
     • First, each sentence of the input text is examined for the presence of the object’s keywords
     • If such a sentence is found, it is checked for conjunctions or other subclause boundaries (such as punctuation)
     • If no boundary is found, the sentiment of the entire sentence is detected with the algorithm described above
     • If there is a boundary, the subclause containing the keywords is identified and its sentiment is detected with the algorithm described above
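The object-oriented steps can be sketched in Python as follows. The lexicons, the boundary pattern and the simplified clause scorer (a negation flips the next polarity word) are hypothetical stand-ins for the full rule set. Each sentence that mentions the object gets its own label, since combining them is listed as an open problem.

```python
import re

# Hypothetical toy lexicons; the real system uses the Russian dictionaries.
POSITIVE = {"liked", "like", "easy", "good"}
NEGATIVE = {"bad", "slow"}
NEGATIONS = {"not", "не"}
# Assumed subclause boundaries: punctuation or an opposite conjunction.
BOUNDARY = re.compile(r"[,;:]|\b(?:but|а|но)\b", re.IGNORECASE)


def score_clause(clause):
    """Sum word polarities; a negation flips the next polarity word."""
    score, negate = 0, False
    for tok in re.findall(r"\w+", clause.lower()):
        if tok in NEGATIONS:
            negate = True
            continue
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    return score


def object_sentiment(text, keywords):
    """Return one label per sentence that mentions the object."""
    keys = [k.lower() for k in keywords]
    labels = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not any(k in sentence.lower() for k in keys):
            continue  # the sentence does not mention the object
        # With no boundary, split() yields the whole sentence as one piece.
        clause = next((c for c in BOUNDARY.split(sentence)
                       if any(k in c.lower() for k in keys)), sentence)
        s = score_clause(clause)
        labels.append("positive" if s > 0 else "negative" if s < 0 else "neutral")
    return labels


text = "I liked new iPhone, but GalaxyS is not easy to use"
print(object_sentiment(text, ["iPhone"]))   # ['positive']
print(object_sentiment(text, ["GalaxyS"]))  # ['negative']
```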
  10. Performance
      • Test data: text reviews (many sentences each)
      • Accuracy: 64%
      • 92% precision and 69% recall for the positive class on texts where two annotators agreed
      • Much lower precision and recall for the negative class (not enough dictionary entries; how to define sentiment at the text level is still open)
      • The two-way classifier ensemble with Multinomial Naive Bayes [3] worked slightly better
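The positive-class figures are measured only on texts where two annotators agreed. A Python sketch of that kind of evaluation follows, assuming one label per annotator and one predicted label per text. The function name and the toy labels are illustrative and are not ROMIP data.

```python
def per_class_scores(annotator_a, annotator_b, predicted, cls):
    """Precision and recall for `cls`, using only texts where both annotators agree."""
    agreed = [(gold, pred)
              for gold, other, pred in zip(annotator_a, annotator_b, predicted)
              if gold == other]
    tp = sum(1 for gold, pred in agreed if pred == cls and gold == cls)
    fp = sum(1 for gold, pred in agreed if pred == cls and gold != cls)
    fn = sum(1 for gold, pred in agreed if pred != cls and gold == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels only; the fourth text is dropped because the annotators disagree.
a = ["pos", "pos", "neg", "pos", "neg"]
b = ["pos", "pos", "neg", "neg", "neg"]
pred = ["pos", "pos", "pos", "pos", "neg"]
print(per_class_scores(a, b, pred, "pos"))  # (0.6666666666666666, 1.0)
print(per_class_scores(a, b, pred, "neg"))  # (1.0, 0.5)
```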
  11. Open problems
      • Multi-sentence sentiment detection
      • Domain adaptation: mining polarity words [4]
      • Adding more rules for shallow parsing
      • Trying out formal syntactic parsing
      • Automatic detection of product names (Named Entity Recognition)
  12. Questions? Thank you!
  13. Bibliography
      • [1] Bermingham, A. and Smeaton, A. F. (2009). A study of interannotator agreement for opinion retrieval. In SIGIR, 784-785.
      • [2] Zaliznyak, A. (1977). Grammaticheskij slovar russkogo jazyka [Grammatical dictionary of the Russian language]. Moscow. Further editions: 1980, 1987, 2003.
      • [3] Poroshin, V. (2012). Proof of concept statistical sentiment classification at ROMIP 2011. In Dialogue.
      • [4] Chetverkin, I. and Loukachevitch, N. (2010). Automatic Extraction of Domain-specific Opinion Words. In Dialogue.
      • [5] Hu, M. and Liu, B. (2004). Mining and summarizing customer reviews. In Proc. of the tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.