Hardik Dalal
Faculty of Computer Science,
Dalhousie University
Aspect-level sentiment
analysis of customer reviews
using Double Propagation
31st March, 2016
Contents
 Introduction to application scenario
 Problem statement
 Objective
 Approach
 Data Gathering and Preparation
 Dataset description
 Data cleaning
 Implementation
 Double propagation algorithm
 Filter frequent unnecessary words
 Classify sentiment words
 Evaluation
 Results
 Conclusion
Introduction to application scenario
 At the Web 2.0 Summit in 2004, Tim O’Reilly emphasized user-generated
content and its usage
 Allows people to connect globally in the world of Web
 Explosion of digital content
 Blogs and Instant Messaging (IM) for starters
 Followed by MySpace, Facebook, Twitter, Wikipedia, YouTube and so
on
 Alongside came the rise of customer-generated review websites such as Yelp and
Epinions
 And of course the e-commerce giant, Amazon
Introduction to application scenario (cont.)
 Customer reviews are a rich source of information for other customers
and sellers
 “78% of Americans Read Online Reviews Before Making a Purchase
Decision” by YouGov
 “97% of consumers found online reviews accurate” by ComScore
 “92% of users have more confidence in online reviews than in sales clerks
and other sources” by WSJ
Problem statement
 Which information can make purchasing a product based on reviews
easier?
 Read all the reviews?
 A preferable choice is to evaluate a product by the ratings of its individual
features, not just by an overall rating such as the traditional 5-star rating
Objective
 Extract all product feature and sentiment word pairs from reviews
 Summarize them
[Figure: example bar chart summarizing sentiment scores per feature: Cost, Quality, Speed, Customer service (data labels shown: 3, -1, 1, 2)]
Approach
 Based on grammatical relations and Part-Of-Speech (POS) tagging
 Exploit relations between aspect terms and opinion words
 How?
 Aspect terms are mostly nouns (NN or NNS)
 Opinion words are mostly adjectives (JJ, JJR, JJS)
 Relations between them:
 Adjectival modifier (amod)
 Direct object (dobj)
 Nominal subject (nsubj)
 Conjunction (conj)
 Has_a
An example
 “These/DT speakers/NNS are/VBP incredibly/RB amazing/JJ”
 nsubj(amazing-5, speakers-2)
 Aspect => speakers (noun)
 Opinion => amazing (adjective, positive)
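The example above can be sketched in a few lines of Python. This is only an illustration under assumptions: the tagged tokens and dependency triples are typed in by hand rather than produced by a parser, the function name `extract_pairs` is hypothetical, and only the `nsubj` and `amod` relations are handled.

```python
# POS tags named on the Approach slide
NOUN_TAGS = {"NN", "NNS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}

def extract_pairs(tagged, deps):
    """Return (aspect, opinion) pairs.

    tagged: list of (word, pos_tag); token i in a dependency is tagged[i - 1]
    deps:   list of (relation, head_index, dependent_index), 1-based indices
    """
    pairs = []
    for rel, head, dep in deps:
        head_word, head_tag = tagged[head - 1]
        dep_word, dep_tag = tagged[dep - 1]
        # nsubj(adjective, noun): "speakers are amazing"
        if rel == "nsubj" and head_tag in ADJ_TAGS and dep_tag in NOUN_TAGS:
            pairs.append((dep_word, head_word))
        # amod(noun, adjective): "amazing speakers"
        elif rel == "amod" and head_tag in NOUN_TAGS and dep_tag in ADJ_TAGS:
            pairs.append((head_word, dep_word))
    return pairs

tagged = [("These", "DT"), ("speakers", "NNS"), ("are", "VBP"),
          ("incredibly", "RB"), ("amazing", "JJ")]
deps = [("nsubj", 5, 2)]  # nsubj(amazing-5, speakers-2)
pairs = extract_pairs(tagged, deps)
print(pairs)
```

In a full pipeline the `tagged` and `deps` inputs would come from a POS tagger and dependency parser; the slides do not say which one was used.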
Data Gathering
 Dataset (Liu & Liu, 2015): freely available for opinion mining/sentiment
analysis and used at the 24th International Joint Conference on Artificial
Intelligence (IJCAI), 2015
 Approximately 2,000 reviews of 3 products, collected from Amazon
 XML format, manually annotated
Data Cleaning
 Removed URLs from some of the reviews
 Fixed missing start or end tags
 Symbols such as '&' had to be replaced by '&amp;' due to Java XML
parser restrictions
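Two of the cleaning steps above (dropping URLs and escaping bare ampersands) might look like the following sketch. It is written in Python for illustration, whereas the slides mention a Java XML parser; the `<review>` tag, the sample text and the function name `clean_review_xml` are assumptions, and repairing missing start or end tags is not covered.

```python
import re
import xml.etree.ElementTree as ET

# A URL runs until whitespace, a quote, or the start of a tag
URL_RE = re.compile(r"https?://[^\s<>\"]+")
# An ampersand that does not already begin an entity such as &amp; or &#38;
BARE_AMP_RE = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

def clean_review_xml(raw):
    """Remove URLs, then escape any remaining bare '&' as '&amp;'."""
    without_urls = URL_RE.sub("", raw)
    return BARE_AMP_RE.sub("&amp;", without_urls)

raw = "<review>Great sound & bass, see http://example.com/item?a=1&b=2 for specs</review>"
cleaned = clean_review_xml(raw)
root = ET.fromstring(cleaned)  # parsing the uncleaned string would raise ParseError
print(root.text)
```

URLs are removed before escaping so that ampersands inside query strings disappear with the URL instead of being escaped.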
Double Propagation Algorithm
 Extract opinions using aspects
 Extract aspects using existing aspects
 Extract opinions using existing opinions
 Extract aspects using opinions
 Aspect list: …
 Opinion list: good, bad, …
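The four extraction steps can be read as a loop that keeps growing both lists until nothing new is found. The sketch below is a simplified illustration, not the implementation behind these slides: relations are hand-written tuples instead of parser output, only `amod`, `nsubj` and `conj` are used, and the names `double_propagation` and `_pick` are hypothetical.

```python
NOUN_TAGS = {"NN", "NNS"}
ADJ_TAGS = {"JJ", "JJR", "JJS"}

def _pick(tags, a, b):
    """Return the word of the first token whose POS tag is in `tags`, else None."""
    for word, tag in (a, b):
        if tag in tags:
            return word
    return None

def double_propagation(relations, seed_opinions):
    """relations: list of (relation, (word, tag), (word, tag)) tuples."""
    opinions, aspects = set(seed_opinions), set()
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for rel, a, b in relations:
            if rel in ("amod", "nsubj"):
                noun, adj = _pick(NOUN_TAGS, a, b), _pick(ADJ_TAGS, a, b)
                if noun is None or adj is None:
                    continue
                if adj in opinions and noun not in aspects:
                    aspects.add(noun)   # extract aspects using opinions
                    changed = True
                if noun in aspects and adj not in opinions:
                    opinions.add(adj)   # extract opinions using aspects
                    changed = True
            elif rel == "conj":
                (w1, t1), (w2, t2) = a, b
                # aspects via existing aspects, opinions via existing opinions
                for known, tags in ((aspects, NOUN_TAGS), (opinions, ADJ_TAGS)):
                    if t1 in tags and t2 in tags and (w1 in known) != (w2 in known):
                        known.update((w1, w2))
                        changed = True
    return aspects, opinions

relations = [
    ("amod", ("sound", "NN"), ("good", "JJ")),
    ("nsubj", ("crisp", "JJ"), ("sound", "NN")),
    ("conj", ("sound", "NN"), ("bass", "NN")),
    ("conj", ("crisp", "JJ"), ("clear", "JJ")),
]
aspects, opinions = double_propagation(relations, {"good", "bad"})
print(sorted(aspects), sorted(opinions))
```

Starting from the seed opinions alone, "good" yields the aspect "sound", which yields the opinion "crisp", and the two conjunctions add "bass" and "clear".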
Filter frequent unnecessary words
 Some words appear too frequently and carry no value in later
stages of the mining process, for example: nothing, someone, anybody
 A common stop word list was built using TF-IDF
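The slides do not say how TF-IDF was applied, so the following is only one plausible reading: words that occur in most reviews have a low inverse document frequency (IDF) and are treated as stop words. The threshold value, the toy reviews and both function names are assumptions made for the illustration.

```python
import math
from collections import Counter

def low_idf_words(documents, idf_threshold):
    """Return words whose IDF, log(N / document frequency), is below the threshold."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    return {w for w, df in doc_freq.items()
            if math.log(n_docs / df) < idf_threshold}

def filter_aspects(candidates, stop_words):
    """Drop candidate aspect terms that are on the stop word list."""
    return [c for c in candidates if c not in stop_words]

reviews = [
    ["nothing", "beats", "the", "sound"],
    ["nothing", "wrong", "with", "the", "battery"],
    ["nothing", "special", "about", "the", "screen"],
    ["the", "speakers", "are", "loud"],
]
stop_words = low_idf_words(reviews, idf_threshold=0.5)  # threshold is arbitrary
kept = filter_aspects(["nothing", "sound", "battery", "screen"], stop_words)
print(sorted(stop_words), kept)
```

Here "the" (in 4 of 4 reviews) and "nothing" (in 3 of 4) fall below the threshold, so "nothing" is removed from the candidate aspects.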
Classify Opinion words
 Each word in SentiWordNet has a positive and a negative score
 These scores were used to determine the polarity of opinion words
 The sum of all sentiment scores associated with an aspect is its final
score
[Figure: positive score and negative score of a SentiWordNet entry]
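A minimal sketch of the scoring step follows. The lexicon values are made up for the illustration and are not real SentiWordNet scores, and taking the polarity as positive minus negative is an assumption; the slides only state that the scores determine polarity and that an aspect's final score is a sum.

```python
from collections import defaultdict

# Illustrative (positive, negative) scores; NOT actual SentiWordNet values
TOY_LEXICON = {
    "amazing": (0.75, 0.0),
    "good": (0.625, 0.0),
    "poor": (0.0, 0.75),
    "slow": (0.0, 0.5),
}

def polarity(word, lexicon):
    """Positive minus negative score; unknown words count as neutral (0.0)."""
    pos, neg = lexicon.get(word, (0.0, 0.0))
    return pos - neg

def score_aspects(pairs, lexicon):
    """Sum the polarity of every opinion word attached to each aspect."""
    totals = defaultdict(float)
    for aspect, opinion in pairs:
        totals[aspect] += polarity(opinion, lexicon)
    return dict(totals)

pairs = [("speakers", "amazing"), ("speakers", "good"),
         ("delivery", "slow"), ("battery", "poor"), ("battery", "good")]
scores = score_aspects(pairs, TOY_LEXICON)
print(scores)
```

A positive total marks an aspect as liked overall and a negative total as disliked, which is the kind of per-feature summary described in the Objective slide.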
Evaluation
 Prepare a gold standard from the annotated data
 Calculate precision and recall by comparing the results against the gold standard
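Precision and recall against a gold standard can be computed as below. The aspect sets are invented for the illustration and are not results from this work; `precision_recall` is a hypothetical helper name.

```python
def precision_recall(predicted, gold):
    """Precision = correct / extracted, recall = correct / gold-standard items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

gold = {"sound", "battery", "screen", "price"}   # annotated aspects (made up)
predicted = {"sound", "battery", "nothing"}      # extracted aspects (made up)
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Two of the three extracted terms are correct and two of the four gold terms were found, so precision is 2/3 and recall is 1/2.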
Conclusion
 Grammatical relations are vital in understanding NLP problems
 Double propagation can be used for several other text mining
problems
References
 Liu, Q., & Liu, B. (2015). Annotated: More Customer Review Datasets (3
products) [Dataset]. Retrieved from
www.cs.uic.edu/~liub/FBS/CustomerReviews-3-domains.rar
 García-Pablos, A., Cuadros, M., Gaines, S., & Rigau, G. (2014). V3:
Unsupervised Generation of Domain Aspect Terms for Aspect Based
Sentiment Analysis. In SemEval 2014, 833.
 SentiWordNet. Retrieved from http://sentiwordnet.isti.cnr.it/
Thank you


Editor's Notes

  • #4 Who is Tim O’Reilly?
  • #5 YouGov - an international internet-based market research firm
  • #9 nsubj(amazing-5, speakers-2)
  • #18 Other NLP problems; NER Unsupervised technique