H.A.T. Kumara – 2011CS006
Supervisor
Mr. Viraj Welgama
Co-Supervisor
Dr. A. R. Weerasinghe
Supervised Learning Based Approach To
Aspect Based Sentiment Analysis
Outline
• Proposal Wrap-up
• Background
• Existing Approaches
• Research Aims
• Scope & Limitations
• Design & Methodology
• Current Progress
• Evaluation
PROPOSAL WRAP-UP
Introduction: “What do people think?”
• “Which laptop should I buy?”
• “Which restaurant should I go to?”
• “Which food should I order?”
• “Which service should I use?”
Introduction
Opinion Mining
Every day, a large number of opinion-related documents are posted on the Internet.
People post:
• Product reviews
• Political views
• Feelings
Introduction
Opinion Mining
Opinion mining, or sentiment analysis, aims to determine the attitude of a speaker with respect to some topic, or the overall contextual polarity of a document.
[Figure: Sentiment Analysis determines the attitude of the speaker]
Introduction
Aspect Based Sentiment Analysis
In aspect-based sentiment analysis (ABSA), the aim is to identify the aspects of entities and the sentiment expressed towards each aspect.
Aspect Based Sentiment Analysis
• Aspect Category Extraction
The Shrimp was awesome, but over-priced.
{Entity#Attribute} → {Food#Quality, Food#Prices}
• Sentiment Polarity
The Shrimp was awesome, but over-priced.
{Entity#Attribute, Polarity} → {Food#Quality, Positive}, {Food#Prices, Negative}
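One way to hold the output of both sub-tasks in code is a small record per Entity#Attribute pair. The sketch below is a hypothetical Python representation (the class names `AspectCategory` and `AspectOpinion` are illustrative, not part of the proposed design); it encodes the shrimp example from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AspectCategory:
    entity: str     # e.g. "Food"
    attribute: str  # e.g. "Quality"; meaningful only together with the entity

    @classmethod
    def parse(cls, label: str) -> "AspectCategory":
        # Split an "Entity#Attribute" label at the first '#'
        entity, attribute = label.split("#", 1)
        return cls(entity, attribute)

    def __str__(self) -> str:
        return f"{self.entity}#{self.attribute}"


@dataclass(frozen=True)
class AspectOpinion:
    category: AspectCategory
    polarity: str  # "Positive" or "Negative" under the design assumptions


# Annotation of "The Shrimp was awesome, but over-priced." as given on the slide
opinions = [
    AspectOpinion(AspectCategory.parse("Food#Quality"), "Positive"),
    AspectOpinion(AspectCategory.parse("Food#Prices"), "Negative"),
]
for op in opinions:
    print(op.category, op.polarity)
```

Keeping entity and attribute as separate fields makes it easy to group opinions by entity alone (all Food aspects) or by the full pair.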
EXISTING APPROACHES
Existing Approaches
Aspect Based Sentiment Analysis
• Sentiment Classification
• Aspect Extraction
Aspect Extraction
• Topic Model Based Approaches
• Frequency Based Approaches
• Supervised Learning Based Approaches
Sentiment Classification
System | Technique | Model | Features
Wagner J. et al. | Supervised | SVM | SentiWordNet, General Inquirer and Bing Liu (2004) lexicons; normalized lexicon scores
Sentinue | Supervised | MaxEnt | Lexical features; lexicon features; domain-specific features
B. Pang study | Supervised | SVM, Naïve Bayes, MaxEnt | Unigrams, bigrams, adjectives, position of words
Harb et al. study | Unsupervised | Association rule | Adjectives and adverbs
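The editor's notes describe how Wagner J. et al. combined lexicons: each lexicon's scores are normalized to [-1, 1] and the normalized scores for a word are summed, giving a range of [-4, 4] with four lexicons. Below is a minimal Python sketch of that combination, assuming a linear min-max normalization and treating a word missing from a lexicon as 0 (both assumptions, not stated in the source). The toy lexicon values are invented for illustration only.

```python
def normalize(score, lo, hi):
    """Linearly map a score from the lexicon's native range [lo, hi] to [-1, 1]."""
    return 2.0 * (score - lo) / (hi - lo) - 1.0


def combined_score(word, lexicons):
    """Sum the normalized scores of `word` over all lexicons.

    `lexicons` is a list of (entries, (lo, hi)) pairs. A word absent from a
    lexicon contributes 0, so with four lexicons the result lies in [-4, 4].
    """
    total = 0.0
    for entries, (lo, hi) in lexicons:
        if word in entries:
            total += normalize(entries[word], lo, hi)
    return total


# Toy lexicons with invented scores, for illustration only
lexicons = [
    ({"awesome": 1, "watery": -1}, (-1, 1)),        # polarity-only lexicon
    ({"awesome": 4, "over-priced": -3}, (-5, 5)),   # graded lexicon
    ({"mouthwatering": 1}, (-1, 1)),                # manually added domain words
]
print(combined_score("awesome", lexicons))
```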
Aspect Extraction
System | Technique | Model | Features
NRC Canada | Supervised | SVM | MPQA, General Inquirer and Bing Liu lexicons; NRC Hashtag lexicon
NLANGP | Supervised | SVM | Word clusters, POS tags, head words
Sentinue | Supervised | MaxEnt | Text words and lemmas
Hu and Liu | Unsupervised | – | Noun frequency; association rule mining
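Hu and Liu's frequency-based idea can be illustrated by counting nouns across sentences and keeping those above a minimum support. The sketch below is a simplification: it assumes the input is already POS-tagged with Penn Treebank tags and only finds single-noun candidates, leaving out the multi-word itemsets that full association rule mining would produce. The function name and the `min_support` default are hypothetical.

```python
from collections import Counter


def frequent_noun_aspects(tagged_sentences, min_support=0.01):
    """Return nouns that occur in at least `min_support` of the sentences.

    tagged_sentences: list of sentences, each a list of (word, Penn tag) pairs.
    Only single-item frequent itemsets are found here.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        # Count each noun once per sentence (sentence-level support)
        nouns = {word.lower() for word, tag in sentence if tag.startswith("NN")}
        counts.update(nouns)
    threshold = min_support * len(tagged_sentences)
    return sorted(word for word, count in counts.items() if count >= threshold)


sents = [
    [("The", "DT"), ("battery", "NN"), ("lasts", "VBZ"), ("long", "RB")],
    [("Great", "JJ"), ("screen", "NN")],
    [("The", "DT"), ("battery", "NN"), ("is", "VBZ"), ("weak", "JJ")],
    [("Nice", "JJ"), ("keyboard", "NN")],
]
print(frequent_noun_aspects(sents, min_support=0.5))
```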
RESEARCH AIMS
Research Objectives
• Discover a novel approach to conduct Aspect Based Sentiment Analysis for reviews.
• Apply a supervised learning based approach to extract aspect categories and to determine sentiment polarity.
• The following objectives are devised to achieve the main targets of the project:
– An approach to extract the aspect categories towards which an opinion is expressed in the given text or review.
– An approach to estimate the sentiment, and the average sentiment of the texts, per aspect.
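For the second objective, one simple reading of "average sentiment per aspect" is to map Positive to +1 and Negative to -1 and average the values for each aspect category. The sketch below assumes that mapping, which the proposal does not specify; the function name is illustrative.

```python
from collections import defaultdict

# Assumed numeric mapping; the proposal does not fix one
POLARITY_VALUE = {"Positive": 1.0, "Negative": -1.0}


def average_sentiment_per_aspect(predictions):
    """Average polarity per aspect category.

    `predictions` is an iterable of (aspect_category, polarity) pairs, e.g.
    the output of the sentiment analyzer over a set of reviews.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, polarity in predictions:
        totals[category] += POLARITY_VALUE[polarity]
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


predictions = [
    ("Food#Quality", "Positive"),
    ("Food#Prices", "Negative"),
    ("Food#Quality", "Positive"),
    ("Food#Quality", "Negative"),
]
print(average_sentiment_per_aspect(predictions))
```

A value near +1 means opinions on that aspect are mostly positive; a value near 0 means they are mixed.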
ASSUMPTIONS
Design Assumptions
Input sentences are assumed to be grammatically correct and in English.
Subjectivity detection is not addressed; hence, all sentences are assumed to be opinionated, either positive or negative.
Input sentences are assumed to belong to only one of the pre-identified set of domains.
Design Assumptions (cont.)
The standpoints of the author and the reader are not addressed, so all input sentences are assumed to be independent observations.
Sarcasm is not addressed; hence, the dataset is assumed not to contain sarcastic sentences.
DESIGN AND METHODOLOGY
Design Overview
Input → Preprocessing → Aspect Category Extraction → Sentiment Analyzer
Output: Aspect Category {Entity#Attribute} with Polarity (Positive / Negative)
Preprocessing Module
Example review sentences:
The staff is unbelievably friendly, and I dream
about their fajitas...so good.
(Great for a romantic evening, but over-priced.
The backlit keys are wonderful :-)
The atmosphere isn't the greatest, I won’t so
to this place again for sure.
Yes, Great display "Mac .
Issues to handle:
• White space and punctuation
• Unexpected symbols/tokens
• Emoticons
• Informal, playful words
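A possible preprocessing pass for the issues listed above, sketched in Python with the standard `re` module. The emoticon and informal-word tables are tiny hypothetical placeholders, and the cleaning rules and their order are assumptions rather than the module as designed.

```python
import re

# Hypothetical emoticon map; a real system would need a much larger list
EMOTICONS = {":-)": " EMO_POS ", ":)": " EMO_POS ", ":-(": " EMO_NEG ", ":(": " EMO_NEG "}
# Hypothetical informal-word map (applied after lowercasing)
INFORMAL = {"sooo": "so", "gr8": "great"}


def preprocess(text):
    # 1. Replace emoticons before symbols are stripped (longest first, so ":-)" wins over ":)")
    for emo in sorted(EMOTICONS, key=len, reverse=True):
        text = text.replace(emo, EMOTICONS[emo])
    # 2. Normalize curly quotes and apostrophes to ASCII
    for curly, plain in {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'}.items():
        text = text.replace(curly, plain)
    # 3. Collapse ellipses such as "..." into a single period token
    text = re.sub(r"\.{2,}", " . ", text)
    # 4. Drop unexpected symbols: keep word characters, spaces, basic punctuation, ' and -
    text = re.sub(r"[^\w\s.,!?'\-]", " ", text)
    # 5. Separate the remaining punctuation from words
    text = re.sub(r"([.,!?])", r" \1 ", text)
    # 6. Lowercase, split on any whitespace (collapses extra spaces), map informal words
    return [INFORMAL.get(token, token) for token in text.lower().split()]


print(preprocess("The backlit keys are wonderful :-)"))
print(preprocess("(Great for a romantic evening, but over-priced."))
```

Emoticons are turned into placeholder tokens rather than deleted, since they often carry the sentiment of the sentence.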
Aspect Category Extractor
[Figure: Lexical features and features from the In-Domain Sentiment Lexicon are fed to a Classifier, which outputs the Aspect Category {Entity#Attribute}]
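Lexicon Generation
Unlabeled Corpora → In-Domain Sentiment Lexicon
A sentiment score for each term w in the corpus:
$$\mathrm{score}(w) = \mathrm{PMI}(w, pos) - \mathrm{PMI}(w, neg)$$
PMI stands for pointwise mutual information:
$$\mathrm{PMI}(w, pos) = \log_2 \frac{freq(w, pos) \cdot N}{freq(w) \cdot freq(pos)}$$
PMI(w, neg) is computed analogously from the negative reviews.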
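The lexicon formulas translate directly into token counts. This Python sketch assumes the unlabeled reviews have already been split into positive and negative sets (for example by star rating, which is an assumption here, not something the slides state) and adds 0.5 to each class count so that a word seen in only one class does not produce log(0); that smoothing is not part of the slide's formula. The function name and `min_count` parameter are hypothetical.

```python
import math
from collections import Counter


def pmi_sentiment_lexicon(pos_docs, neg_docs, min_count=1):
    """Build score(w) = PMI(w, pos) - PMI(w, neg) for every term w.

    pos_docs / neg_docs: lists of token lists from positive / negative reviews.
    """
    freq_pos = Counter(t for doc in pos_docs for t in doc)  # freq(w, pos)
    freq_neg = Counter(t for doc in neg_docs for t in doc)  # freq(w, neg)
    total_pos = sum(freq_pos.values())                       # freq(pos): tokens in positive reviews
    total_neg = sum(freq_neg.values())                       # freq(neg): tokens in negative reviews
    n = total_pos + total_neg                                # N: tokens in the corpus
    lexicon = {}
    for w in set(freq_pos) | set(freq_neg):
        fw = freq_pos[w] + freq_neg[w]                       # freq(w)
        if fw < min_count:
            continue
        # +0.5 smoothing (assumption) keeps the logarithm finite
        pmi_pos = math.log2((freq_pos[w] + 0.5) * n / (fw * total_pos))
        pmi_neg = math.log2((freq_neg[w] + 0.5) * n / (fw * total_neg))
        lexicon[w] = pmi_pos - pmi_neg
    return lexicon


pos_docs = [["great", "food"], ["great", "service"]]
neg_docs = [["bad", "food"], ["bad", "service"]]
lex = pmi_sentiment_lexicon(pos_docs, neg_docs)
print(sorted(lex.items()))
```

Words used equally often in both classes (here "food") score 0, while words concentrated in one class get a large positive or negative score.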
Aspect Category Extractor (cont.)
• Class labels are already known and limited
• Supervised learning
• One classifier for each aspect category
• One-vs-all binary classifier
• Classification models available
• SVM, Maximum Entropy (according to the literature)
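To make the one-vs-all design concrete, the sketch trains one independent binary classifier per aspect category, so a sentence can receive several categories (as in the Food#Quality, Food#Prices example). It uses a from-scratch binary logistic regression (the two-class form of MaxEnt) over bag-of-words features to stay dependency-free; an SVM could be swapped in. All class names, the learning rate, the epoch count and the 0.5 threshold are illustrative assumptions, and the training data is a toy set.

```python
import math
from collections import defaultdict


def featurize(tokens):
    # Binary bag-of-words: one indicator feature per distinct token
    return {token: 1.0 for token in tokens}


class BinaryMaxEnt:
    """Two-class maximum entropy model (logistic regression) trained with SGD."""

    def __init__(self, lr=0.5, epochs=100):
        self.lr = lr
        self.epochs = epochs
        self.w = defaultdict(float)
        self.b = 0.0

    def prob(self, feats):
        z = self.b + sum(self.w.get(f, 0.0) * v for f, v in feats.items())
        # Numerically stable sigmoid
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def fit(self, X, y):
        for _ in range(self.epochs):
            for feats, label in zip(X, y):
                err = label - self.prob(feats)  # gradient of the log-likelihood w.r.t. z
                self.b += self.lr * err
                for f, v in feats.items():
                    self.w[f] += self.lr * err * v
        return self


class OneVsAllAspectClassifier:
    """One binary classifier per aspect category; predictions are multi-label."""

    def fit(self, token_lists, label_sets, categories):
        X = [featurize(tokens) for tokens in token_lists]
        self.models = {}
        for category in categories:
            # Sentences tagged with this category are positives, all others negatives
            y = [1 if category in labels else 0 for labels in label_sets]
            self.models[category] = BinaryMaxEnt().fit(X, y)
        return self

    def predict(self, tokens, threshold=0.5):
        feats = featurize(tokens)
        return {c for c, model in self.models.items() if model.prob(feats) >= threshold}


# Toy training data: tokenized sentences with their gold aspect categories
train_tokens = [
    ["shrimp", "awesome"],
    ["over-priced", "menu"],
    ["shrimp", "awesome", "over-priced"],
    ["friendly", "staff"],
]
train_labels = [
    {"Food#Quality"},
    {"Food#Prices"},
    {"Food#Quality", "Food#Prices"},
    {"Service#General"},
]
categories = ["Food#Quality", "Food#Prices", "Service#General"]
clf = OneVsAllAspectClassifier().fit(train_tokens, train_labels, categories)
print(clf.predict(["friendly", "staff"]))
```

Because each category has its own classifier and threshold, adding a new aspect category only requires training one more binary model.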
Sentiment Analyzer
This is a binary classification problem.
Classification models available
• SVM, MaxEnt, Naïve Bayes (according to the literature)
Classification features
• Domain-specific features
• Features from the in-domain sentiment lexicon
• Part-of-speech features
• Number of adjectives, adverbs, and nouns in the sentence
• Negation features
• A single binary feature determined by whether there is any negation in the sentence
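The part-of-speech and negation features could be extracted as below. This sketch assumes the sentence is already tokenized and tagged with Penn Treebank tags (so "isn't" may arrive as "is" + "n't"), uses a small hypothetical negation word list, and adds a summed in-domain lexicon score as one possible lexicon feature. The feature names are illustrative.

```python
# Hypothetical negation list; a real system would use a curated one
NEGATION_WORDS = {"not", "no", "never", "n't", "nothing", "nobody", "none", "cannot"}


def sentiment_features(tagged_tokens, lexicon):
    """Build sentence-level features for the polarity classifier.

    tagged_tokens: list of (word, Penn Treebank tag) pairs.
    lexicon: in-domain sentiment lexicon mapping word -> score.
    """
    words = [word.lower() for word, _ in tagged_tokens]
    tags = [tag for _, tag in tagged_tokens]
    return {
        # Part-of-speech features: counts of JJ*, RB* and NN* tags
        "num_adjectives": sum(tag.startswith("JJ") for tag in tags),
        "num_adverbs": sum(tag.startswith("RB") for tag in tags),
        "num_nouns": sum(tag.startswith("NN") for tag in tags),
        # Negation feature: single binary flag for any negation in the sentence
        "has_negation": int(any(w in NEGATION_WORDS or w.endswith("n't") for w in words)),
        # Lexicon feature (one possible choice): summed in-domain lexicon score
        "lexicon_score": sum(lexicon.get(w, 0.0) for w in words),
    }


tagged = [("The", "DT"), ("atmosphere", "NN"), ("is", "VBZ"),
          ("n't", "RB"), ("the", "DT"), ("greatest", "JJS")]
print(sentiment_features(tagged, {"greatest": 1.5}))
```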
CURRENT PROGRESS
Current Progress
Datasets
• Laptop Reviews Dataset: from Amazon.com
• Restaurants Dataset: from the Ganu et al. study
Annotation Process
• Three annotators involved
Initial Data Analysis
[Figure: Restaurants Data Set (Train), RapidMiner]
[Figure: Aspect Category Frequency Distribution, Restaurants Domain]
Initial Data Analysis
[Figure: Laptop Data Set (Train), RapidMiner]
[Figure: Aspect Category Frequency Distribution, Laptops Domain]
Evaluation
• Aspect Category Extraction
• Precision and Recall
• F-Score
• Sentiment Polarity
• Cross Validation (k-fold validation)
• Precision and Recall (Compare with two
algorithms)
• F-Score
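For aspect category extraction, precision, recall and F-score can be computed by comparing the predicted and gold category sets of each sentence. This sketch micro-averages over all sentences, which is an assumption about the evaluation setup; the function name is illustrative.

```python
def precision_recall_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 for aspect category extraction.

    gold, predicted: lists with one set of aspect categories per sentence.
    """
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # correctly predicted categories
    n_pred = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{"Food#Quality", "Food#Prices"}, {"Service#General"}]
pred = [{"Food#Quality"}, {"Service#General", "Ambience#General"}]
print(precision_recall_f1(gold, pred))
```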
Progress Overview
Completed
• Literature survey
• Design
• Dataset Understanding
• Existing System
• Preprocessing Module
To-do
• Implementation of modules
• Test and Evaluation
• Completing the Thesis
Questions?
THANK YOU

Editor's Notes

  • #5 What other people think, or what their opinions are, has always been an important piece of information for most of us whenever we have to make a decision.
  • #6 With the proliferation of user-generated content on the Internet, interest in opinion mining or sentiment analysis has grown rapidly, both in academia and in business. The ability to extract sentiments from such sources can provide invaluable information about people’s views on various topics.
  • #8 The majority of current approaches, however, attempt to detect the overall polarity of a sentence, paragraph, or text span, irrespective of the entities mentioned (e.g., laptops, battery, screen) and their attributes (e.g. price, design, quality). The ultimate goal is to be able to generate summaries listing all the aspects and their overall polarity such as the example shown in Fig. 1.
  • #9 The Aspect Category specifies the category of the domain to which the review refers; it contains the Entity#Attribute pair of the review. The task is to identify every entity E and attribute A pair E#A towards which an opinion is expressed in the given text. The Entity is the aspect of the domain for which an opinion is expressed in the given review. The Attribute is the quality or feature the review refers to, and it depends on the Entity. Every Entity#Attribute pair obtained from a sentence should be assigned a polarity of positive, negative, or neutral, depending on the sentiment expressed by the user.
  • #13 Topic modeling methods have been attempted as an unsupervised and knowledge-lean approach. They exploit word occurrence information to capture latent topics in corpora.
  • #16 (1) Employed four lexicons: MPQA (Wilson 2005), SentiWordNet, General Inquirer and Bing Liu’s lexicon. All scores were normalized to the range [-1, 1]; for a word, the four scores are summed to arrive at a score in the range [-4, 4]. Domain-specific words were added manually, e.g. mouthwatering, watery, better-configured. One of the earliest works to use a supervised method for the sentiment classification problem is that of B. Pang, in which the authors used three machine learning techniques to classify the sentiment of movie review documents. To apply these techniques to movie reviews, they used the standard bag-of-features framework. Harb et al. [8] performed blog classification starting with two sets of seed words with positive and negative semantic orientations, respectively.
  • #28 This category is an entity and attribute pair, each chosen from an inventory with possible values, in each domain, for entity types and attributes.
  • #29 Apart from the training data provided, we compiled large corpora of reviews for restaurants and laptops that were not labeled for aspect terms, aspect categories, or sentiment. We generated lexicons from these corpora and used them as a source of additional features in our machine learning systems. We calculated a sentiment score for each term w in the corpus using Equation (1), where freq(w, pos) is the number of times a term w occurs in positive reviews, freq(w) is the total frequency of term w in the corpus, freq(pos) is the total number of tokens in positive reviews, and N is the total number of tokens in the corpus.
  • #32 Every Entity#Attribute pair obtained from a sentence should be assigned a polarity of positive, negative, or neutral, depending on the sentiment expressed by the user. The sentiment analyzer module finds the overall polarity (Positive or Negative) of an input review. Here we deploy a series of machine learning classification algorithms, such as Naïve Bayes, Maximum Entropy and SVM, to ascertain their suitability for sentiment classification, where the parameters of these algorithms will be fine-tuned to suit our training models.
  • #34 Each dataset was annotated by a linguist (annotator A) using BRAT, a web-based annotation tool. Then, one of the organizers (annotator B) validated/inspected the resulting annotations. When B was not confident or disagreed with A, a decision was made collaboratively between them and a third annotator.
  • #41 Randomly partition the data into k mutually exclusive subsets, each of approximately equal size (k-fold).
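The k-fold partitioning in note #41 can be sketched with Python's `random` module: shuffle the item indices once, then deal them into k folds so that fold sizes differ by at most one. Each fold then serves once as the test set while the others form the training set. The function names and the fixed seed are illustrative.

```python
import random


def k_fold_indices(n_items, k, seed=0):
    """Randomly partition range(n_items) into k mutually exclusive folds
    of approximately equal size (sizes differ by at most one)."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_items, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    folds = k_fold_indices(n_items, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in cross_validation_splits(10, 3):
    print(len(train), len(test))
```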