Rule-based approach to sentiment analysis at ROMIP’11 (slides)
Slides for presentation at Dialogue'12

API (free access for devs available): https://mashape.com/dmitrykey/russiansentimentanalyzer

Published in: Technology, Education

Transcript

  • 1. Rule-based approach to sentiment analysis at ROMIP’11
    Dmitry Kan, dmitry.kan@gmail.com, Twitter: @DmitryKan
    AlphaSense Inc
    Dialogue, 2012
  • 2. Outline
      • Problem definition
      • Base level for accuracy
      • Towards shallow parsing of input text
      • Rule-based algorithm
      • Object-oriented sentiment detection
      • Performance
      • Open problems
  • 3. Problem definition
      • What is sentiment for people:
          – Mood of the author? Mood of the reader? Personal attitude?
          – Opinion about the target object (product etc.)?
          – Something else, defined by an annotator’s boss?
      • What is sentiment for a computer:
          – General polarity background
          – General opinion mining
          – Object (product) oriented opinion mining
          – Polarity strength detection
  • 4. Base level for accuracy
      • Cross-annotator agreement is 80% [1]
      • The real performance of a system is what it shows on un-annotated data
      • Real example: "CEO of the company turned 50" (was marked as positive; why?)
      • Some machine learning (ML) methods can give 90% and more on test data
      • Object-oriented sentiment detection is hard (if not impossible) to do with ML
  • 5. Towards shallow parsing of input text
      • Example 1: "Majority likes this, but I do not like this"
          – Subclause 1: "Majority likes this"; Subclause 2: "I do not like this"
          – "but" is an opposite conjunction; "not" is a negation
          – NOT(polarity) = opposite_polarity
          – totalSentimentScore = totalPositiveScore – totalNegativeScore – { ½ * sentimentCount, if an opposite conjunction is found; 0, if no opposite conjunction is found }
      • Example 2: "I liked new iPhone, but GalaxyS is not easy to use"
          – Subclause 1: Object: iPhone, Sentiment: positive
          – Subclause 2 (contains a negation): Object: GalaxyS, Sentiment: negative
          – Whole sentence: Object: -, Sentiment: neutral (mixed)
  • 6. Rule-based algorithm flow on an example sentence
    Majority likes this, but I do not like this.
      • Phase 1 (negations): posScore = 0 – negation weight = -2
      • Phase 2 (individual words):
          – Word "likes": posScore = -2 + 1 = -1
          – Word "not": negScore = 0 + 1 = 1
          – Word "like": posScore = -1 + 1 = 0
      • Phase 3 (opposite conjunctions): sentimentCount = 3
        totalScore = posScore – negScore – ½ * sentimentCount = 0 – 1 – 3/2 = -5/2
      • Sentiment: Negative
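The three phases on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the tiny English lexicon, the names `score_sentence` and `NEGATION_WEIGHT`, and the choice to subtract the negation weight once per negation word are all hypothetical, chosen so that the trace matches the numbers on the slide.

```python
# Toy English lexicon, assumed for illustration only; the actual system
# uses lemmatized Russian polarity dictionaries.
POSITIVE = {"like", "likes"}
NEGATIVE = {"not"}               # the slide's trace counts "not" as a negative word
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
NEGATION_WEIGHT = 2              # value read off Phase 1 of the slide


def score_sentence(sentence):
    """Return (totalScore, label) following the three phases of the slide."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    tokens = [t for t in tokens if t]
    pos_score = 0
    neg_score = 0
    sentiment_count = 0

    # Phase 1: every negation subtracts the negation weight from posScore
    # (assumption: the weight is applied once per negation word).
    for tok in tokens:
        if tok in NEGATIONS:
            pos_score -= NEGATION_WEIGHT

    # Phase 2: individual polar words each add 1 to their own score.
    for tok in tokens:
        if tok in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif tok in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3: an opposite conjunction removes half of the sentiment count.
    has_opposite = any(tok in OPPOSITE_CONJUNCTIONS for tok in tokens)
    penalty = sentiment_count / 2 if has_opposite else 0
    total = pos_score - neg_score - penalty

    if total > 0:
        label = "positive"
    elif total < 0:
        label = "negative"
    else:
        label = "neutral"
    return total, label


print(score_sentence("Majority likes this, but I do not like this."))
```

With the lexicon above, the example sentence goes through posScore = -2, -1, 0 and negScore = 1, and ends at 0 – 1 – 3/2 = -5/2, as on the slide.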
  • 7. Rule-based algorithm #1/3
      • Suits micro-posts (Twitter) or individual sentences
      • Polarity dictionaries for Russian (1739 positive and 2338 negative words)
      • All words are lemmatized (A. Zaliznyak [2])
      • A set of Russian negations that noticeably affect the polarity of the connected word(s): не плохо (not bad). A gap between the negation and the word it affects is also handled correctly, for example: Я не сильно люблю это (I do not strongly like this)
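Negation handling with a gap between the negation and the polar word might look roughly like the sketch below. It uses the NOT(polarity) = opposite_polarity rule from the shallow-parsing slide. The two-word lexicon, the `MAX_GAP` window size and the function name `negated_polarities` are assumptions made for illustration; the slides do not say how wide a gap the system tolerates, and the real system works on lemmas rather than surface forms.

```python
# Toy surface-form lexicon (assumption); the real dictionaries are lemmatized.
POLARITY = {"плохо": -1, "люблю": 1}
NEGATIONS = {"не"}
MAX_GAP = 2  # hypothetical limit on words allowed between negation and polar word


def negated_polarities(tokens):
    """Return (word, polarity) pairs, flipping polarity after a nearby negation."""
    result = []
    for i, tok in enumerate(tokens):
        if tok not in POLARITY:
            continue
        polarity = POLARITY[tok]
        # Look back over the preceding words, allowing up to MAX_GAP
        # intervening words between the negation and the polar word.
        window = tokens[max(0, i - MAX_GAP - 1):i]
        if any(w in NEGATIONS for w in window):
            polarity = -polarity
        result.append((tok, polarity))
    return result


print(negated_polarities("не плохо".split()))
print(negated_polarities("я не сильно люблю это".split()))
```

In the second sentence the word сильно sits between не and люблю, and the look-back window still finds the negation.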
  • 8. Rule-based algorithm #2/3
      • A set of Russian opposite conjunctions that affect the polarity of a sentence’s subclauses in relation to each other: Большинству это всё нравится, а мне нет (Majority likes this, but I do not)
      • totalScore = positiveScore – negativeScore – oppositeConjuctionSentimentScore, where oppositeConjuctionSentimentScore removes polarity mass from a sentence that contains an opposite conjunction and equals sentimentWordCount / 2
  • 9. Rule-based algorithm #3/3
      • Object-oriented sentiment detection
      • First, each sentence of the input text is examined for the presence of the object’s keywords
      • If such a sentence is found, it is checked for conjunctions or other subclause boundaries (such as punctuation)
      • If no boundary is found, the sentiment of the entire sentence is detected with the algorithm described above
      • If there is a boundary, the subclause containing the keywords is identified and its sentiment is detected with the algorithm described above
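The steps above can be sketched as follows, using the iPhone/GalaxyS sentence from the shallow-parsing slide. This is a simplified illustration, not the system's code: the toy lexicon, the boundary pattern (comma, semicolon, "but") and the names `clause_score` and `object_sentiment` are assumptions, and the clause scorer is a stripped-down stand-in in which a negation simply flips the next polar word.

```python
import re

# Toy lexicon and boundary markers, assumed for illustration only.
POLARITY = {"liked": 1, "easy": 1}
NEGATIONS = {"not"}
BOUNDARY = re.compile(r"\s*(?:,|;|\bbut\b)\s*", re.IGNORECASE)


def clause_score(clause):
    """Sum word polarities; a negation flips the next polar word in the clause."""
    score = 0
    negate = False
    for tok in re.findall(r"\w+", clause.lower()):
        if tok in NEGATIONS:
            negate = True
        elif tok in POLARITY:
            score += -POLARITY[tok] if negate else POLARITY[tok]
            negate = False
    return score


def object_sentiment(text, keyword):
    """Return the sentiment label for the clause mentioning keyword, or None."""
    key = keyword.lower()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if key not in sentence.lower():
            continue
        # Without a boundary this yields one clause: the whole sentence.
        clauses = [c for c in BOUNDARY.split(sentence) if c.strip()]
        for clause in clauses:
            if key in clause.lower():
                score = clause_score(clause)
                if score > 0:
                    return "positive"
                if score < 0:
                    return "negative"
                return "neutral"
    return None


text = "I liked new iPhone, but GalaxyS is not easy to use."
print(object_sentiment(text, "iPhone"), object_sentiment(text, "GalaxyS"))
```

Each object gets the sentiment of its own subclause only, which is why the two products in one sentence can receive opposite labels.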
  • 10. Performance
      • Test data: text reviews (many sentences)
      • Accuracy of 64%
      • 92% precision and 69% recall for the positive class on items where two annotators agreed
      • Much lower precision and recall for the negative class (not enough dictionary entries; text-level sentiment still to be defined)
      • Worked slightly better as a 2-way classifier ensemble with Multinomial Naive Bayes [3]
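For reference, the metrics quoted above follow the standard definitions of accuracy, per-class precision and per-class recall. The sketch below computes them on a handful of made-up labels; the data and the function names are hypothetical and are not the ROMIP evaluation set or its results.

```python
def accuracy(gold, predicted):
    """Share of items whose predicted label equals the gold label."""
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)


def precision_recall(gold, predicted, target):
    """Precision and recall for one class, e.g. the positive class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if p == target and g == target)
    fp = sum(1 for g, p in pairs if p == target and g != target)
    fn = sum(1 for g, p in pairs if p != target and g == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up toy labels, for illustration only.
gold = ["pos", "pos", "neg", "pos"]
predicted = ["pos", "neg", "neg", "pos"]
print(accuracy(gold, predicted))
print(precision_recall(gold, predicted, "pos"))
```

High precision with lower recall, as reported for the positive class, means that texts labeled positive are usually right while a share of truly positive texts is missed.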
  • 11. Open problems
      • Multi-sentence sentiment detection
      • Domain adaptation: mining polarity words [4]
      • Adding more rules for shallow parsing
      • Trying out formal syntactic parsing
      • Automatic detection of product names (Named Entity Recognition)
  • 12. Questions? Thank you!
  • 13. Bibliography
      • [1] Bermingham, A. and Smeaton, A.F. (2009). A study of interannotator agreement for opinion retrieval. In SIGIR, 784-785.
      • [2] Zaliznyak, A. (1977). Grammaticheskij slovar russkogo jazyka. Moskva (further editions: 1980, 1987, 2003).
      • [3] Poroshin, V. (2012). Proof of concept statistical sentiment classification at ROMIP 2011. In Dialogue.
  • 14. Bibliography
      • [4] Chetverkin, I. and Loukachevitch, N. (2010). Automatic Extraction of Domain-specific Opinion Words. In Dialogue.
      • [5] Hu, M. and Liu, B. (2004). Mining and summarizing customer reviews. In Proc. of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.