Rule-based approach to sentiment analysis at ROMIP’11 Dmitry Kan email@example.com Twitter: @DmitryKan AlphaSense Inc Dialogue, 2012
Outline• Problem definition• Base level for accuracy• Towards shallow parsing of input text• Rule-based algorithm• Object-oriented sentiment detection• Performance• Open problems
Problem definition• What is sentiment for people: – Mood of the author? Mood of the reader? Personal attitude? – Opinion about the target object (product, etc.)? – Something else, defined by an annotator’s boss?• What is sentiment for a computer: – General polarity background – General opinion mining – Object (product) oriented opinion mining – Polarity strength detection
Base level for accuracy• Cross-annotator agreement gives 80%• The real performance of a system is what it shows on un-annotated data• Real example: “CEO of the company turned 50” (was marked as positive -> why?)• Some machine learning (ML) methods can give 90% and more on test data• Hard (if not impossible) to do object-oriented sentiment detection with ML
Towards shallow parsing of input text• Example 1: “Majority likes this, but I do not like this” – Subclause 1: “Majority likes this” – Opposite conjunction: “but” – Subclause 2: “I do not like this”, with negation “not”• NOT(polarity) = opposite_polarity• totalSentimentScore = totalPositiveScore – totalNegativeScore – (½ * sentimentCount if an opposite conjunction is found, 0 otherwise)• Example 2: “I liked new iPhone, but GalaxyS is not easy to use” – Subclause 1: Object: iPhone, Sentiment: positive – Opposite conjunction: “but” – Subclause 2 (with negation “not”): Object: GalaxyS, Sentiment: negative – Whole sentence: Object: -, Sentiment: neutral (mixed)
Rule-based algorithm flow on an example sentence: “Majority likes this, but I do not like this.”• Phase 1 (negations): posScore = 0 – negation weight = -2• Phase 2 (individual words): – Word “likes”: posScore = -2 + 1 = -1 – Word “not”: negScore = 0 + 1 = 1 – Word “like”: posScore = -1 + 1 = 0• Phase 3 (oppositeConjuctions): sentimentCount = 3; totalScore = posScore – negScore – ½ * sentimentCount = 0 – 1 – 3/2 = -5/2• Sentiment: Negative
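The three phases on this slide can be retraced with a minimal Python sketch. Everything here is an illustrative assumption rather than the system's actual code: the toy English word lists stand in for the Russian dictionaries, the `LEMMAS` table stands in for real lemmatization, the negation weight of 2 is inferred from "0 – negation weight = -2", a negation is assumed to affect only the next word, negated negative words are assumed to be handled symmetrically, and the mapping from score sign to label is a guess.

```python
import re

# Toy English resources standing in for the Russian dictionaries (assumptions).
POSITIVE = {"like"}
NEGATIVE = {"not"}               # the slide counts "not" as a negative word
NEGATIONS = {"not"}
OPPOSITE_CONJUNCTIONS = {"but"}
LEMMAS = {"likes": "like"}       # stand-in for real lemmatization
NEGATION_WEIGHT = 2              # inferred from "0 - negation weight = -2"


def total_score(sentence):
    """Score one sentence following the three phases of the example slide."""
    tokens = [LEMMAS.get(t, t) for t in re.findall(r"\w+", sentence.lower())]
    pos_score = neg_score = 0.0

    # Phase 1 (negations): a negation takes NEGATION_WEIGHT away from the
    # polarity of the word that directly follows it (simplifying assumption).
    for negation, target in zip(tokens, tokens[1:]):
        if negation in NEGATIONS:
            if target in POSITIVE:
                pos_score -= NEGATION_WEIGHT
            elif target in NEGATIVE:
                neg_score -= NEGATION_WEIGHT

    # Phase 2 (individual words): each dictionary word adds 1 to its polarity.
    sentiment_count = 0
    for tok in tokens:
        if tok in POSITIVE:
            pos_score += 1
            sentiment_count += 1
        elif tok in NEGATIVE:
            neg_score += 1
            sentiment_count += 1

    # Phase 3 (opposite conjunctions): subtract half the sentiment word count
    # if the sentence contains an opposite conjunction, otherwise nothing.
    has_conjunction = any(t in OPPOSITE_CONJUNCTIONS for t in tokens)
    penalty = sentiment_count / 2 if has_conjunction else 0.0
    return pos_score - neg_score - penalty


def label(score):
    """Map a score to a label; the zero threshold is an assumption."""
    return "negative" if score < 0 else "positive" if score > 0 else "neutral"


score = total_score("Majority likes this, but I do not like this.")
print(score, label(score))
```

For the example sentence this reproduces the slide's arithmetic: 0 – 1 – 3/2 = -5/2, labelled negative.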
Rule-based algorithm #1/3• Suits micro-posts (Twitter) or individual sentences• Polarity dictionaries for Russian (1739 positive and 2338 negative words)• All words are lemmatized (using A. Zaliznyak’s grammatical dictionary)• A set of Russian negations that tend to noticeably affect the polarity of the connected word(s), e.g. не плохо (not bad); gaps between the negation and the affected word are also handled, for example: Я не сильно люблю это (I do not strongly like this)
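The negation handling with a gap between words might look roughly like the sketch below. It is not the author's implementation: the one-entry word lists and lemma table are toy stand-ins for the real dictionaries and the Zaliznyak-based lemmatizer, and `max_gap` (how many words may separate a negation from the word it affects) is a hypothetical parameter.

```python
import re

# Toy stand-ins for the real Russian resources (assumptions).
NEGATIONS = {"не"}
LEMMAS = {"люблю": "любить"}     # stand-in for dictionary-based lemmatization
POSITIVE = {"любить"}
NEGATIVE = {"плохо"}


def negated_sentiment_words(sentence, max_gap=1):
    """Return (word, resulting polarity) pairs for negated sentiment words.

    A negation applies to the first dictionary word found within
    max_gap + 1 tokens after it, and NOT(polarity) = opposite_polarity.
    """
    tokens = [LEMMAS.get(t, t) for t in re.findall(r"\w+", sentence.lower())]
    hits = []
    for i, tok in enumerate(tokens):
        if tok not in NEGATIONS:
            continue
        for target in tokens[i + 1 : i + 2 + max_gap]:
            if target in POSITIVE:
                hits.append((target, "negative"))
                break
            if target in NEGATIVE:
                hits.append((target, "positive"))
                break
    return hits


print(negated_sentiment_words("не плохо"))
print(negated_sentiment_words("Я не сильно люблю это"))
```

With `max_gap=0` the second sentence yields no match, because "сильно" sits between the negation and "люблю"; allowing a gap of one word lets the negation reach it.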
Rule-based algorithm #2/3• A set of Russian opposite conjunctions, which affect the polarity of a sentence’s subclauses in relation to each other: Большинству это всё нравится, а мне нет (Majority likes this, but I do not)• totalScore = positiveScore – negativeScore – oppositeConjuctionSentimentScore, where oppositeConjuctionSentimentScore removes polarity mass from a sentence containing an opposite conjunction and equals sentimentWordCount / 2 (0 if the sentence has no opposite conjunction)
Rule-based algorithm #3/3• Object-oriented sentiment detection• First, each sentence of the input text is examined for the presence of the object’s keywords• If such a sentence is found, it is checked for conjunctions or other subclause boundaries (such as punctuation)• If no boundary is found, the sentiment of the entire sentence is detected with the algorithm described above• If there is a boundary, the subclause containing the keywords is identified and its sentiment is detected with the algorithm described above
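The steps above can be sketched as follows, using the iPhone / GalaxyS sentence from the shallow-parsing slide. This is a simplified illustration under stated assumptions: the boundary set (commas, semicolons, colons and the conjunction "but"), the sentence splitter, and the `toy_polarity` scorer are all hypothetical stand-ins, and the first sentence that mentions the object decides the result.

```python
import re

# Toy English stand-ins for the real resources (assumptions).
OPPOSITE_CONJUNCTIONS = {"but"}
BOUNDARY_PUNCTUATION = {",", ";", ":"}
POSITIVE = {"liked", "easy"}
NEGATIVE = {"bad"}
NEGATIONS = {"not"}


def toy_polarity(tokens):
    """Stand-in scorer: +1/-1 per dictionary word, flipped after a negation."""
    score, flip = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            flip = True
            continue
        value = (tok in POSITIVE) - (tok in NEGATIVE)
        if value:
            score += -value if flip else value
            flip = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def split_subclauses(sentence):
    """Split a sentence into token lists at punctuation and conjunctions."""
    clauses, current = [], []
    for tok in re.findall(r"\w+|[,;:]", sentence.lower()):
        if tok in OPPOSITE_CONJUNCTIONS or tok in BOUNDARY_PUNCTUATION:
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        clauses.append(current)
    return clauses


def object_sentiment(text, keywords, scorer=toy_polarity):
    """Return the sentiment towards the object, or None if it is not mentioned."""
    keywords = {k.lower() for k in keywords}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        tokens = re.findall(r"\w+", sentence.lower())
        if not keywords & set(tokens):
            continue                      # sentence does not mention the object
        clauses = split_subclauses(sentence)
        if len(clauses) == 1:
            return scorer(tokens)         # no boundary: score the whole sentence
        for clause in clauses:
            if keywords & set(clause):
                return scorer(clause)     # score only the keyword's subclause
    return None


text = "I liked new iPhone, but GalaxyS is not easy to use."
print(object_sentiment(text, ["iPhone"]))
print(object_sentiment(text, ["GalaxyS"]))
```

Passing the scorer as a parameter keeps the object lookup separate from the polarity rules, so the full three-phase scoring could be plugged in instead of the toy scorer.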
Performance• Test data: text reviews (many sentences)• Accuracy of 64%• 92% precision and 69% recall for the positive class on items where two annotators agreed• Much lower precision and recall for the negative class (not enough dictionary entries; text-level sentiment still to be defined)• Worked slightly better in a 2-way classifier ensemble with Multinomial Naive Bayes
Open problems• Multi-sentence sentiment detection• Domain adaptation: mining polarity words • Adding more rules for shallow parsing• Trying out formal syntactic parsing• Automatic detection of product names (Named Entity Recognition)
Bibliography• Bermingham, A. and Smeaton, A.F. (2009). A study of interannotator agreement for opinion retrieval. In SIGIR, 784-785.• Zaliznyak, A. (1977). Grammaticheskij slovar russkogo jazyka. Moskva (further editions: 1980, 1987, 2003).• Poroshin, V. (2012). Proof of concept statistical sentiment classification at ROMIP 2011. In Dialogue.
Bibliography• Chetverkin, I. and Loukachevitch, N. (2010). Automatic Extraction of Domain-specific Opinion Words. In Dialogue.• Hu, M. and Liu, B. (2004). Mining and summarizing customer reviews. In Proc. of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining.