Sentiment Analysis









Upload Details

Uploaded via as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Sentiment Analysis: Presentation Transcript

  • Thumbs up? Sentiment Classification using Machine Learning Techniques
    - Bo Pang and Lillian Lee
    - Shivakumar Vaithyanathan
  • What is it?
    Input – raw text on some topic
    Output – opinion (+ve, -ve or neutral)
    It is hard. Why?
    - The task is to determine the opinion expressed by the text as a whole, not just to identify its topic
    - Let's understand the problem
  • We know …
    Web – enormous amount of data
    Topical categorization – active research
  • Rise of blogs, forums …
    Web 2.0 is commonly associated with web applications that facilitate interactive information sharing, interoperability, user-centered design, and collaboration on the World Wide Web (source: Wikipedia)
  • Why is it interesting?
    Represents the voice of a broad audience on a particular topic
    Example: product reviews, movie reviews, book reviews
    Important to business intelligence applications
    - What do people (dis)like about the Nikon D40?
  • What this paper does
    Examines the effectiveness of applying machine learning techniques to the sentiment classification problem
    Challenging: while topics are often identifiable by keywords alone, sentiment can be expressed in a more subtle manner
  • Dataset : Movie-Review Domain
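    Reasons:
    Large online collections of reviews are available
    Reviewers often summarize their overall sentiment with a machine-extractable rating indicator (such as a number of stars), so there is no need to hand-label data for supervised learning
    Corpus of 752 -ve and 1301 +ve reviews, with a total of 144 reviewers represented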
  • Naïve approach
    Idea: people tend to use certain words to express strong sentiments, so produce a list of such words and rely on it to classify text
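A minimal sketch of this word-list baseline follows. The word lists are hypothetical and chosen only for illustration (they are not the lists used in the paper), and the tokenizer is a simple assumption.

```python
import re

# Hypothetical, hand-picked word lists for illustration only.
POSITIVE_WORDS = {"dazzling", "brilliant", "great", "excellent", "love"}
NEGATIVE_WORDS = {"bad", "awful", "terrible", "boring", "stinks"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def classify_by_word_list(text):
    """Return '+ve', '-ve', or 'tie' by comparing counts of listed words."""
    tokens = tokenize(text)
    pos = sum(1 for t in tokens if t in POSITIVE_WORDS)
    neg = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    if pos > neg:
        return "+ve"
    if neg > pos:
        return "-ve"
    return "tie"
```

For example, `classify_by_word_list("This movie really stinks. Boring!")` returns `"-ve"`, while a text containing none of the listed words yields `"tie"`, which is one weakness of relying on a fixed list.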
  • Machine Learning methods
    Let {f1, f2, …, fm} be a predefined set of m features that can appear in a document. Example: the word “still” or the bigram “really stinks”
    ni(d) – number of times fi occurs in document d
    Document vector(d) = (n1(d), n2(d), …, nm(d))
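The document vector can be sketched as below. The feature list `FEATURES` and the whitespace tokenization are assumptions made for the example; only unigrams and bigrams are considered.

```python
from collections import Counter

# Hypothetical feature list: two unigrams and one bigram.
FEATURES = ["still", "great", "really stinks"]


def extract_ngrams(tokens):
    """Return all unigrams and bigrams of a token list as strings."""
    unigrams = list(tokens)
    bigrams = [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]
    return unigrams + bigrams


def document_vector(text, features=FEATURES):
    """Map a document to (n_1(d), ..., n_m(d)), the count of each feature."""
    tokens = text.lower().split()
    counts = Counter(extract_ngrams(tokens))
    return [counts[f] for f in features]
```

With these features, `document_vector("It still really stinks but still great")` gives `[2, 1, 1]`: "still" occurs twice, "great" once, and the bigram "really stinks" once.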
  • Naïve Bayes
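    Assign to a given document d the class $c^* = \arg\max_c P(c \mid d)$
    Naïve Bayes rule: $P_{NB}(c \mid d) = \dfrac{P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}$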
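A minimal sketch of a Naïve Bayes classifier over count vectors (n1(d), …, nm(d)) follows. It assumes add-one smoothing for the feature probabilities and works in log space. P(d) is dropped because it is the same for every class and does not affect the arg max. Function names are illustrative.

```python
import math


def train_naive_bayes(vectors, labels):
    """Estimate log P(c) and log P(f_i | c) from count vectors with add-one smoothing."""
    classes = sorted(set(labels))
    m = len(vectors[0])
    log_prior, log_likelihood = {}, {}
    for c in classes:
        docs = [v for v, y in zip(vectors, labels) if y == c]
        log_prior[c] = math.log(len(docs) / len(vectors))
        # Total count of each feature over all documents of class c.
        totals = [sum(v[i] for v in docs) for i in range(m)]
        denom = sum(totals) + m  # add-one smoothing (an assumption of this sketch)
        log_likelihood[c] = [math.log((t + 1) / denom) for t in totals]
    return log_prior, log_likelihood


def predict_naive_bayes(model, vector):
    """Return argmax_c of log P(c) + sum_i n_i(d) * log P(f_i | c)."""
    log_prior, log_likelihood = model
    scores = {
        c: log_prior[c] + sum(n * lp for n, lp in zip(vector, log_likelihood[c]))
        for c in log_prior
    }
    return max(scores, key=scores.get)
```

Usage: train on a list of count vectors with their class labels, then call `predict_naive_bayes(model, vector)` on the count vector of a new document.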
  • Maximum Entropy
    The idea is to make the fewest assumptions about the data while still being consistent with it
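For reference, maximum entropy classifiers are usually written in the exponential form below. The symbols Z(d), \lambda_{i,c} and F_{i,c} do not appear in the transcript; they are introduced here as assumptions following the standard formulation, with a presence-based feature function.

```latex
P_{ME}(c \mid d) = \frac{1}{Z(d)} \exp\Big( \sum_{i} \lambda_{i,c} \, F_{i,c}(d, c) \Big),
\qquad
F_{i,c}(d, c') =
\begin{cases}
1 & \text{if } n_i(d) > 0 \text{ and } c' = c \\
0 & \text{otherwise}
\end{cases}
```

Here Z(d) is a normalization term that makes the probabilities sum to one over classes, and each weight \lambda_{i,c} indicates how strongly feature f_i counts as evidence for class c.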
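  • Support Vector Machines (SVM)
    Are large-margin, non-probabilistic classifiers, in contrast to Naïve Bayes and Maximum Entropy
    Letting $c_j \in \{1, -1\}$ (corresponding to +ve, -ve) be the correct class of document $d_j$, the solution can be written as $\vec{w} = \sum_j \alpha_j c_j \vec{d}_j$, with $\alpha_j \ge 0$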
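In the standard dual formulation of a linear SVM, the separating hyperplane is a weighted sum of training documents, w = Σ_j α_j c_j d_j with α_j ≥ 0, and a new document is classified by the side of the hyperplane it falls on. The sketch below assumes the α_j values have already been found by a solver (not shown). The function names and the optional `bias` term are illustrative.

```python
def weight_vector(alphas, classes, documents):
    """Compute w = sum_j alpha_j * c_j * d_j over the training documents."""
    m = len(documents[0])
    w = [0.0] * m
    for alpha, c, d in zip(alphas, classes, documents):
        for i in range(m):
            w[i] += alpha * c * d[i]
    return w


def classify(w, d, bias=0.0):
    """Return +1 or -1 depending on which side of the hyperplane d falls."""
    score = sum(wi * di for wi, di in zip(w, d)) + bias
    return 1 if score >= 0 else -1
```

Only documents with α_j > 0 (the support vectors) contribute to w, which is why the sum can be restricted to them in practice.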
  • Evaluations
    Randomly selected 700 positive-sentiment and 700 negative-sentiment documents
    Automatically removed rating indicators and extracted the textual information from the original HTML
    Added the tag NOT_ to every word between a negation word (“not”, “isn’t”) and the first punctuation mark following it
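The negation tagging step can be sketched as follows. The negation word list and the punctuation set are assumptions for the example, and only straight apostrophes are handled.

```python
import re

# Illustrative lists; a real system would likely use longer ones.
NEGATION_WORDS = {"not", "isn't", "didn't", "no", "never"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}


def tag_negation(text):
    """Prefix NOT_ to each word after a negation word, up to the next punctuation mark."""
    # Split into word tokens (keeping apostrophes) and single punctuation marks.
    tokens = re.findall(r"[\w']+|[.,;:!?]", text.lower())
    tagged, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False  # punctuation ends the negation scope
            tagged.append(tok)
        elif negating:
            tagged.append("NOT_" + tok)
        else:
            tagged.append(tok)
            if tok in NEGATION_WORDS:
                negating = True
    return tagged
```

For "I didn't like this movie, but it is fun." the words "like", "this" and "movie" become `NOT_like`, `NOT_this` and `NOT_movie`, while everything after the comma is left unchanged.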
  • Results
  • Conclusion
    Unigram presence information turned out to be the most effective
    The superiority of presence information in comparison to feature frequency indicates a difference between sentiment and topic categorization.
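The difference between frequency and presence features is small in code: presence keeps only whether each feature occurs at all. A minimal sketch, with an illustrative function name:

```python
def to_presence(count_vector):
    """Convert feature counts n_i(d) into binary presence indicators."""
    return [1 if n > 0 else 0 for n in count_vector]
```

For example, the count vector `[3, 0, 1]` becomes `[1, 0, 1]`, so a word repeated three times carries no more weight than a word used once.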