Sentiment Analysis


  1. Thumbs up? Sentiment Classification using Machine Learning Techniques
     Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan
  2. What is it?
     • Input: raw text about some topic
     • Output: an opinion (positive, negative, or neutral)
     • Why is it hard? The task is to determine the opinion expressed by the text as a whole, not merely to identify the topic it discusses. Let's look at the problem more closely.
  3. We know…
     • The Web: an enormous amount of data
     • Topical categorization: an active research area
  4. Rise of blogs, forums…
     • "Web 2.0 is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web." (source: Wikipedia)
  5. Why is it interesting?
     • Represents the voice of a broader audience on a particular topic
     • Examples: product reviews, movie reviews, book reviews
     • Important for business-intelligence applications, e.g., "What do people (dis)like about the Nikon D40?"
  6. What this paper does
     • Examines the effectiveness of applying machine learning techniques to the sentiment classification problem
     • Challenging: while topics are often identifiable by keywords alone, sentiment can be expressed in a more subtle manner
  7. Dataset: the movie-review domain
     Why this domain?
     • Large online collections of reviews exist
     • Reviews come with machine-extractable rating indicators (e.g., star ratings), so labeled data for supervised learning can be collected without manual annotation
     Corpus: 1,301 positive and 752 negative reviews, with 144 reviewers represented in total
  8. Naïve approach
     • Idea: people tend to use certain words to express strong sentiments; produce such word lists by hand and rely on them to classify text (see the sketch below)
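
A minimal sketch of this baseline; the word lists here are hypothetical stand-ins, not the lists actually proposed in the paper's preliminary experiment:

```python
# Naive word-list baseline: count matches against each hand-made list
# and pick the side with more hits. The lists below are illustrative only.
POSITIVE = {"good", "great", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "terrible", "boring", "hate", "awful"}

def naive_classify(text: str) -> str:
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "tie"

print(naive_classify("a great movie with an excellent cast"))  # positive
```
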
  9. Machine learning methods
     • Let {f1, f2, …, fm} be a predefined set of m features that can appear in a document, e.g., the unigram "still" or the bigram "really stinks"
     • ni(d): the number of times fi occurs in document d
     • Document vector: d = (n1(d), n2(d), …, nm(d))
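
A sketch under those definitions; the feature set and document below are invented for illustration:

```python
# Map a document to the vector (n1(d), ..., nm(d)) over a small,
# predefined feature set of unigrams and bigrams.
features = ["still", "really stinks", "good"]

def document_vector(text: str) -> list[int]:
    text = text.lower()
    # crude substring counting; a real tokenizer would avoid partial matches
    return [text.count(f) for f in features]

print(document_vector("Still, the plot really stinks."))  # [1, 1, 0]
```
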
  10. Naïve Bayes
     • Assign to a given document d the class c* = arg max_c P(c | d)
     • Naïve Bayes rule: P_NB(c | d) = P(c) · ( ∏_{i=1..m} P(fi | c)^{ni(d)} ) / P(d)
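
A minimal sketch of this classifier using scikit-learn's MultinomialNB (alpha=1.0 corresponds to the add-one smoothing the paper uses to estimate P(fi | c)); the two training documents are made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["a wonderful, moving film", "a boring and predictable plot"]
train_labels = ["positive", "negative"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(train_texts)   # rows are the vectors ni(d)
clf = MultinomialNB(alpha=1.0)              # add-one smoothing for P(fi | c)
clf.fit(X, train_labels)

print(clf.predict(vectorizer.transform(["a wonderful film"])))  # ['positive']
```
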
  11. Maximum Entropy
     • The idea is to make the fewest assumptions about the data while still remaining consistent with it
     • Unlike Naïve Bayes, it makes no assumption that the features are conditionally independent
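
For this binary task, a maximum-entropy classifier over feature indicators coincides with logistic regression, so a hedged sketch can lean on scikit-learn's LogisticRegression; the training data is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["an excellent and moving film", "a dull, lifeless mess"]
train_labels = ["positive", "negative"]

vectorizer = CountVectorizer(binary=True)   # feature presence, not frequency
X = vectorizer.fit_transform(train_texts)
clf = LogisticRegression(max_iter=1000)     # maximum entropy = logistic regression
clf.fit(X, train_labels)

print(clf.predict(vectorizer.transform(["an excellent film"])))  # ['positive']
```
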
  12. Support Vector Machines (SVMs)
     • Large-margin, non-probabilistic classifiers, in contrast to Naïve Bayes and Maximum Entropy
     • Letting cj ∈ {1, −1} (corresponding to positive and negative) be the correct class of document dj, training finds the hyperplane that separates the two classes with the largest possible margin
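
A sketch of the same setup with scikit-learn's LinearSVC; the training documents are invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

train_texts = ["thumbs up, a great ride", "thumbs down, a tedious bore"]
train_labels = [1, -1]   # cj in {1, -1}: positive / negative

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(train_texts)
clf = LinearSVC()        # finds the maximum-margin separating hyperplane
clf.fit(X, train_labels)

print(clf.predict(vectorizer.transform(["a great ride"])))  # [1]
```
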
  13. Evaluation
     • Randomly selected 700 positive and 700 negative sentiment documents
     • Automatically removed the rating indicators and extracted the textual content from the original HTML
     • Added the tag NOT_ to every word between a negation word ("not", "isn't", "didn't", …) and the first punctuation mark that follows it (see the sketch below)
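
A sketch of that negation-tagging step; the negation list here is illustrative and far from complete:

```python
import re

NEGATIONS = {"not", "isn't", "didn't", "no", "never"}
PUNCT = re.compile(r"[.,;:!?]")

def add_not_tags(text: str) -> str:
    out, negating = [], False
    for token in text.split():
        out.append("NOT_" + token if negating else token)
        if token.lower().strip(".,;:!?") in NEGATIONS:
            negating = True            # start of a negation scope
        if PUNCT.search(token):
            negating = False           # scope ends at the first punctuation
    return " ".join(out)

print(add_not_tags("I didn't like this movie, but the cast was fine"))
# I didn't NOT_like NOT_this NOT_movie, but the cast was fine
```
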
  14. Results
  15. Conclusion
     • Unigram presence information turned out to be the most effective feature set
     • The superiority of presence information over feature frequency suggests a difference between sentiment classification and topic categorization
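
The presence/frequency distinction is easy to see in code: presence clips each count to 0 or 1 (toy document invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer

doc = ["good good good plot, bad ending"]
freq = CountVectorizer().fit(doc)               # feature frequency
pres = CountVectorizer(binary=True).fit(doc)    # feature presence
print(freq.transform(doc).toarray())  # [[1 1 3 1]]  (bad, ending, good, plot)
print(pres.transform(doc).toarray())  # [[1 1 1 1]]
```
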
