Opinion mining



document sentiment classification,
sentence subjectivity and sentiment classification



Opinion mining Presentation Transcript

  • 1. SUBMITTED BY: Heena Gupta (2013EMS02)
  • 2. DEFINITION: Document sentiment classification
    • classifies an opinion document as expressing a positive or negative opinion or sentiment;
    • treats the whole document as the basic unit of information.
  • 3. PROBLEM DEFINITION
    Given an opinion document d evaluating an entity, determine the overall sentiment s of the opinion holder about the entity, i.e., determine s expressed on aspect GENERAL in the quintuple (_, GENERAL, s, _, _), where the entity e, opinion holder h, and time of opinion t are assumed known or irrelevant (do not care).
    • If s takes categorical values, e.g., positive and negative, it is a classification problem.
    • If s takes numeric values or ordinal scores within a given range, e.g., 1 to 5, it is a regression problem.
    ASSUMPTION
    “The opinion document d expresses opinions on a single entity e and contains opinions from a single opinion holder h.”
  • 4. Sentiment Classification Using Supervised Learning
    • Usually a two-class classification problem: positive vs. negative
    • If ratings are used (1-5 stars): 1-2 stars are negative, 4-5 stars are positive, and 3 stars is neutral
    • Essentially a text classification problem
    • Many supervised learning techniques apply, e.g., naïve Bayes classification and support vector machines (SVM)
    Key features used in sentiment classification
    • Terms and their frequency
    • Part of speech (POS)
    • Sentiment words and phrases
    • Rules of opinion
    • Sentiment shifters
    • Syntactic dependency
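The supervised setup on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: it assumes a tiny, made-up set of rated reviews, uses only term frequencies as features, and implements multinomial naïve Bayes with add-one smoothing. The names `label_from_stars` and `NaiveBayes` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def label_from_stars(stars):
    """Map a 1-5 star rating to a class; 3 stars is treated as neutral."""
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"


class NaiveBayes:
    """Multinomial naive Bayes over term frequencies with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc.lower().split())
        self.vocab = {t for c in self.term_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for term in doc.lower().split():
                if term in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical rated reviews (illustrative data only).
rated = [
    ("great sound and great build", 5),
    ("excellent keys lovely tone", 4),
    ("poor build and terrible sound", 1),
    ("bad keys awful tone", 2),
    ("it is a piano", 3),
]
# Drop 3-star reviews so the task stays a two-class problem.
train = [(text, label_from_stars(s)) for text, s in rated if s != 3]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])
prediction = model.predict("great tone")
```

In practice the other features listed on the slide (POS tags, sentiment words, shifters, dependencies) would be added to the term features, and an SVM could replace the naïve Bayes model.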
  • 5. Sentiment Classification Using Unsupervised Learning
    Algorithm
    • Two consecutive words are extracted if their POS tags conform to any of a set of predefined patterns.
    Example:
      This  piano  produces  beautiful  sounds
      DT    NN     VBZ       JJ         NNS
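The transcript does not reproduce the pattern table, so the sketch below has to assume one. It uses the five two-word patterns commonly cited for Turney's unsupervised method (for instance, an adjective followed by a noun), and it takes already-tagged input with Penn Treebank tags rather than calling a tagger. `PATTERNS` and `extract_phrases` are hypothetical names.

```python
NOUN = {"NN", "NNS"}
ADJ = {"JJ"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}

# Assumed pattern table:
# (tags allowed for word 1, tags allowed for word 2, tags word 3 must NOT have)
PATTERNS = [
    (ADJ, NOUN, set()),
    (ADV, ADJ, NOUN),
    (ADJ, ADJ, NOUN),
    (NOUN, ADJ, NOUN),
    (ADV, VERB, set()),
]


def extract_phrases(tagged):
    """Return two-word phrases whose POS tags match one of PATTERNS."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the following word, or None at the end of the sentence.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        for first, second, third_excluded in PATTERNS:
            if t1 in first and t2 in second and t3 not in third_excluded:
                phrases.append(f"{w1} {w2}")
                break
    return phrases


tagged_sentence = [
    ("This", "DT"), ("piano", "NN"), ("produces", "VBZ"),
    ("beautiful", "JJ"), ("sounds", "NNS"),
]
phrases = extract_phrases(tagged_sentence)
```

For the example sentence, only the adjective-noun pair "beautiful sounds" matches.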
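  • 6. • Estimates the sentiment orientation (SO) of the extracted phrases using the pointwise mutual information (PMI) measure:
      $\mathrm{PMI}(term_1, term_2) = \log_2 \frac{\Pr(term_1 \wedge term_2)}{\Pr(term_1)\,\Pr(term_2)}$
    • PMI measures the degree of statistical dependence between two terms.
    • $\Pr(term_1 \wedge term_2)$ is the actual co-occurrence probability of term1 and term2.
    • $\Pr(term_1)\,\Pr(term_2)$ is the co-occurrence probability of the two terms if they are statistically independent.
    • The sentiment orientation of a phrase is
      $\mathrm{SO}(phrase) = \mathrm{PMI}(phrase, \text{“excellent”}) - \mathrm{PMI}(phrase, \text{“poor”})$
    • Estimated from search hit counts, this becomes
      $\mathrm{SO}(phrase) = \log_2 \frac{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{“excellent”})\,\mathrm{hits}(\text{“poor”})}{\mathrm{hits}(phrase\ \mathrm{NEAR}\ \text{“poor”})\,\mathrm{hits}(\text{“excellent”})}$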
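The two formulas on this slide translate directly into code. The sketch below is illustrative: the hit counts are made up, and the small `smoothing` constant added to the NEAR counts is an assumption to avoid division by zero, not something stated in the transcript.

```python
import math


def pmi(p_joint, p1, p2):
    """PMI in bits: log2 of observed co-occurrence over the independence baseline."""
    return math.log2(p_joint / (p1 * p2))


def sentiment_orientation(near_excellent, near_poor,
                          hits_excellent, hits_poor, smoothing=0.01):
    """SO(phrase) from hit counts, following the hit-count form of the formula.

    `smoothing` (an assumption) is added to the NEAR counts so that a phrase
    never seen near one of the reference words does not break the ratio.
    """
    numerator = (near_excellent + smoothing) * hits_poor
    denominator = (near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical hit counts for one phrase.
so = sentiment_orientation(near_excellent=80, near_poor=20,
                           hits_excellent=1000, hits_poor=1000)
```

A positive `so` means the phrase co-occurs more strongly with "excellent" than with "poor"; a negative value means the opposite.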
  • 7. • Given a review, the algorithm computes the average SO of all phrases in the review and classifies the review as positive if the average SO is positive and negative otherwise.
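The decision rule on this slide is small enough to show in full. A minimal sketch, assuming the SO scores of a review's phrases have already been computed; `classify_review` is a hypothetical name, and raising an error for a review with no extracted phrases is a choice made here, since the slide does not cover that case.

```python
def classify_review(phrase_scores):
    """Classify a review from the SO scores of its extracted phrases."""
    if not phrase_scores:
        # The slide does not say what to do when no phrase was extracted.
        raise ValueError("review has no extracted phrases")
    average = sum(phrase_scores) / len(phrase_scores)
    # Positive only when the average is strictly above zero; negative otherwise.
    return "positive" if average > 0 else "negative"


label = classify_review([2.0, -0.5, 1.0])
```

Note that an average of exactly zero falls under "negative otherwise".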
  • 8. Sentiment Rating Prediction (Regression Problem)
    Rating prediction is modeled as a graph-based semi-supervised learning problem that uses
    • labeled reviews (with ratings) and
    • unlabeled reviews (without ratings).
    The unlabeled reviews are also the test reviews whose ratings need to be predicted. In the graph,
    • each node is a document (review), and
    • the link between two nodes is weighted by the similarity between the two documents.
    The algorithm assumes that a separate learner has already predicted numerical ratings for the unlabeled documents. The graph-based method only improves them: it revises the ratings by solving an optimization problem that forces the ratings to be smooth throughout the graph with regard to both the ratings and the link weights.
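The transcript does not give the optimization problem itself, so the sketch below assumes a simple one: keep each unlabeled rating close to the separate learner's initial prediction while penalizing differences between linked nodes in proportion to their similarity, i.e. minimize sum_i (f_i - y_i)^2 + alpha * sum_(i,j) w_ij (f_i - f_j)^2 with labeled ratings held fixed. `smooth_ratings`, `alpha`, and the example graph are all hypothetical.

```python
def smooth_ratings(initial, labeled, weights, alpha=1.0, iterations=100):
    """Revise initial ratings so they vary smoothly over the similarity graph.

    initial: dict node -> rating predicted by a separate learner (unlabeled nodes)
    labeled: dict node -> known rating (held fixed)
    weights: dict (node_a, node_b) -> similarity of an undirected link;
             every node named here must appear in `initial` or `labeled`
    """
    neighbors = {}
    for (a, b), w in weights.items():
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    ratings = {**initial, **labeled}
    for _ in range(iterations):
        for node, first_guess in initial.items():
            if node in labeled:
                continue  # known ratings are never revised
            links = neighbors.get(node, [])
            weighted_sum = sum(w * ratings[other] for other, w in links)
            total_weight = sum(w for _, w in links)
            # Closed-form minimizer of the assumed objective for this node,
            # with all other ratings held at their current values.
            ratings[node] = (first_guess + alpha * weighted_sum) / (1 + alpha * total_weight)
    return ratings


# Hypothetical graph: one unlabeled review that is very similar to a 5-star review.
result = smooth_ratings(
    initial={"u1": 3.0},
    labeled={"r1": 5.0, "r2": 1.0},
    weights={("u1", "r1"): 0.9, ("u1", "r2"): 0.1},
)
```

Here the initial prediction for `u1` is pulled upward, toward the rating of the labeled review it most resembles. Larger `alpha` values trust the graph more and the initial learner less.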
  • 9. Cross Domain Sentiment Classification
    Sentiment classification is highly sensitive to the domain from which the training data is extracted.
    Two types of domains
    • Source domain: the original domain, with labeled training data
    • Target domain: the new domain, which is used for testing
    Four strategies
    1. Training on a mixture of labeled reviews from other domains where such data are available, and testing on the target domain
    2. Training a classifier as above, but limiting the set of features to those observed in the target domain
    3. Using ensembles of classifiers from domains with available labeled data, and testing on the target domain
    4. Combining small amounts of labeled data with large amounts of unlabeled data in the target domain
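The second strategy, restricting features to those seen in the target domain, can be illustrated with plain token filtering. This is a minimal sketch under the assumption that features are lowercase whitespace-separated tokens; `restrict_to_target_vocabulary` and the example reviews are hypothetical.

```python
def restrict_to_target_vocabulary(source_docs, target_docs):
    """Keep only those source-domain tokens that also occur in the target domain."""
    target_vocab = {tok for doc in target_docs for tok in doc.lower().split()}
    return [
        [tok for tok in doc.lower().split() if tok in target_vocab]
        for doc in source_docs
    ]


# Hypothetical data: labeled book reviews (source), unlabeled kitchen reviews (target).
source_reviews = ["gripping plot but poor ending"]
target_reviews = ["poor blade quality", "sharp blade great value"]
filtered = restrict_to_target_vocabulary(source_reviews, target_reviews)
```

Only "poor" survives the filter, because book-specific terms such as "plot" never occur in the kitchen reviews. A classifier trained on the filtered features cannot rely on source-only vocabulary.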
  • 10. Cross Language Sentiment Classification
    Cross-language sentiment classification means performing sentiment classification of opinion documents in multiple languages.
    Example: to classify Chinese reviews using sentiment resources in English, the following algorithm is used:
    • Translate each Chinese review into English using multiple translators, which produce different English versions.
    • Classify each translated English version with a lexicon-based approach. The lexicon consists of a set of positive terms, a set of negative terms, a set of negation terms, and a set of intensifiers.
    • Sum up the sentiment scores of the terms in the review, taking negations and intensifiers into account.
    • If the final score is less than 0, the review is negative; otherwise it is positive.
    • For the final classification of each review, combine the scores of the different translated versions using an ensemble method, e.g., average, max, weighted average, or voting.
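The lexicon-based scoring and the ensemble step can be sketched as follows. Everything here is illustrative: the four word lists are tiny placeholders, the translations are written by hand instead of coming from real translators, and the rule that a negation or intensifier applies to the next sentiment word is a simplifying assumption.

```python
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "terrible"}
NEGATIONS = {"not", "never", "no"}
INTENSIFIERS = {"very": 2.0, "extremely": 3.0}


def lexicon_score(text):
    """Sum term scores; negations flip and intensifiers scale the next sentiment word."""
    score, negate, boost = 0.0, False, 1.0
    for token in text.lower().split():
        if token in NEGATIONS:
            negate = True
        elif token in INTENSIFIERS:
            boost *= INTENSIFIERS[token]
        elif token in POSITIVE or token in NEGATIVE:
            polarity = 1.0 if token in POSITIVE else -1.0
            score += (-polarity if negate else polarity) * boost
            negate, boost = False, 1.0  # modifiers are consumed by this word
    return score


def classify(score):
    """Negative if the score is below 0, positive otherwise."""
    return "negative" if score < 0 else "positive"


def ensemble_average(scores):
    return classify(sum(scores) / len(scores))


def ensemble_vote(scores):
    labels = [classify(s) for s in scores]
    # Ties go to "positive" (an assumption, mirroring the score < 0 rule).
    return "negative" if labels.count("negative") > labels.count("positive") else "positive"


# Hand-written stand-ins for three machine translations of one review.
translations = [
    "the phone is very good",
    "this phone is not bad",
    "the phone is poor",
]
scores = [lexicon_score(t) for t in translations]
by_average = ensemble_average(scores)
by_vote = ensemble_vote(scores)
```

Averaging and voting need not agree: with scores of -1, -2, and 5, the average is positive while the majority vote is negative, which is one reason several ensemble methods are compared.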
  • 11. SUBMITTED BY: Heena Gupta (2013EMS02)
  • 12. INTRODUCTION
    Sentences are short documents. Sentence-level analysis classifies the sentiment expressed in each sentence.
    ASSUMPTION
    Researchers often assume that a sentence usually contains a single opinion.
    PROBLEM DEFINITION
    Given a sentence x, determine whether x expresses a positive, negative, or neutral (or no) opinion.
    SENTENCE SENTIMENT CLASSIFICATION CAN BE SOLVED AS two separate classification problems:
    1. Classify whether a sentence expresses an opinion or not (subjectivity classification)
    2. Classify the opinion sentences into positive and negative classes
  • 13. SUBJECTIVITY CLASSIFICATION
    Sentences are classified into two types:
    • Subjective (give personal views and opinions)
    • Objective (give factual information)
    • Subjectivity classification is based on supervised learning.
    • Gradability is a semantic property that enables a word to appear in a comparative construct and to accept modifying expressions that act as intensifiers or diminishers. Example: a small planet is usually much larger than a large house.
    • Sentence similarity is measured based on shared words and phrases.
  • 14. One of the bottlenecks in applying supervised learning is the manual effort involved in annotating a large number of training examples.
    Solution: a bootstrapping approach that labels training data automatically.
    • The algorithm first uses two high-precision classifiers to automatically identify some subjective and objective sentences.
    • The high-precision classifiers use lists of lexical items (single words or n-grams) that are good subjectivity clues.
    • HP-Subj classifies a sentence as subjective if it contains two or more strong subjective clues.
    • HP-Obj classifies a sentence as objective if it contains no strong subjective clues.
    • The extracted sentences are then added to the training data to learn patterns.
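The two high-precision classifiers can be sketched as a single labeling function. This is a minimal illustration of the rules as stated on the slide: the clue list is a made-up placeholder, clues are single words only, and `bootstrap_label` is a hypothetical name.

```python
# Hypothetical list of strong subjectivity clues (single words only).
STRONG_CLUES = {"terrible", "love", "hate", "wonderful", "awful"}


def count_strong_clues(sentence):
    """Count tokens that are strong subjectivity clues, ignoring edge punctuation."""
    tokens = sentence.lower().split()
    return sum(1 for tok in tokens if tok.strip(".,!?") in STRONG_CLUES)


def bootstrap_label(sentence):
    """Apply the HP-Subj and HP-Obj rules; return None when neither fires."""
    n = count_strong_clues(sentence)
    if n >= 2:
        return "subjective"  # HP-Subj: two or more strong clues
    if n == 0:
        return "objective"   # HP-Obj: no strong clues
    return None              # exactly one clue: left unlabeled


sentences = [
    "I love this wonderful piano!",
    "The piano has 88 keys.",
    "The awful truth is that it has 61 keys.",
]
labels = [bootstrap_label(s) for s in sentences]
```

Sentences that receive a label become automatically labeled training data; the unlabeled remainder is what the learned patterns are later meant to cover.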
  • 15. SENTENCE SENTIMENT CLASSIFICATION
    ASSUMPTION
    A sentence expresses a single sentiment from a single opinion holder.
    METHOD
    • For sentiment classification of subjective sentences, a large set of seed adjectives is used.
    • A modified log-likelihood ratio determines the positive or negative orientation of each adjective, adverb, noun, and verb.
    • Each sentence is assigned an orientation from the average log-likelihood score of its words.
    • Two thresholds are chosen using the training data and applied to determine whether the sentence has a positive, negative, or neutral orientation.
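The last two steps of the method, averaging word scores and applying two thresholds, can be sketched as below. The word scores and threshold values are invented for illustration; in the method described, the scores come from the modified log-likelihood ratio and the thresholds are tuned on training data. `sentence_orientation` is a hypothetical name.

```python
def sentence_orientation(tokens, word_scores, low=-0.5, high=0.5):
    """Classify a sentence from the average score of its scored words.

    `low` and `high` are placeholder thresholds; the method chooses them
    from training data.
    """
    scores = [word_scores[t] for t in tokens if t in word_scores]
    if not scores:
        return "neutral"  # assumption: no scored words means no orientation
    average = sum(scores) / len(scores)
    if average > high:
        return "positive"
    if average < low:
        return "negative"
    return "neutral"


# Hypothetical per-word orientation scores.
word_scores = {"reliable": 1.2, "buy": 0.4, "broken": -1.5, "car": 0.0}
orientation = sentence_orientation(["a", "reliable", "car"], word_scores)
```

Using two thresholds instead of one creates a neutral band around zero, so weakly scored sentences are not forced into a positive or negative class.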
  • 16. DEALING WITH CONDITIONAL SENTENCES
    • Conditional sentences describe implications or hypothetical situations and their consequences.
    • Such a sentence typically contains two clauses that depend on each other: the condition clause and the consequent clause.
    • The relationship between the clauses has a significant impact on whether the sentence expresses a positive or negative sentiment.
    • EXAMPLE: “If someone makes a reliable car, I will buy it”
  • 17. CROSS LANGUAGE SUBJECTIVITY CLASSIFICATION
    • Translate test sentences in the target language into the source language and classify them using a source language classifier.
    • Translate a source language training corpus into the target language and build a corpus-based classifier in the target language.
    • Translate a sentiment or subjectivity lexicon in the source language to the target language and build a lexicon-based classifier in the target language.