Online feedback correlation using clustering

My presentation for the Internet search class. I theorized that you could automatically determine how good a product is based on the different types of negative reviews.

  1. Online Feedback Correlation using Clustering Research Work Done for CS 651: Internet Algorithms
  2. Dedicated to Tibor Horvath <ul><li>Whose endless pursuit of getting a PhD (imagine that) kept him from researching this topic. </li></ul>
  3. Problem Statement <ul><li>Millions of reviews are available </li></ul><ul><li>Consumers read only a small number of reviews </li></ul><ul><li>Reviewer content is not always trustworthy </li></ul>
  4. Problem Statement (continued) <ul><li>What information from reviews is important? </li></ul><ul><li>What can we extract from the overall set of reviews efficiently to provide more utility to consumers than is already provided? </li></ul>
  5. Motivation <ul><li>People are increasingly relying on online feedback mechanisms in making choices [Guernsey 2000] </li></ul><ul><li>Online feedback mechanisms draw consumers </li></ul><ul><li>Competitive edge </li></ul><ul><li>Current quality is poor </li></ul>
  6. Current Solutions <ul><li>“Good” review placement </li></ul><ul><li>Show a small number of reviews </li></ul><ul><li>. . . more trustworthy? </li></ul>
  7. Amazon Example
  8. Observations <ul><li>Consumers look at a product based on its overall rating </li></ul><ul><li>Consumers read the “editorial review” for content </li></ul><ul><li>Reviews can indicate common issues </li></ul><ul><li>… Can we correlate these reviews in some meaningful way? </li></ul>
  9. Observations Lead to Hypotheses! <ul><li>Hypothesis: Products with numerous similar negative reviews will often not be purchased regardless of their positive reviews. Furthermore, the number of negative reviews is a strong indicator of the likelihood of certain flaws in a product. </li></ul>
  10. Definitions <ul><li>Semantic Orientation: polar classification of whether something is positive or negative </li></ul><ul><li>Natural Language Processing: deciphering parts of speech from free text </li></ul><ul><li>Feature: quality of a product that customers care about </li></ul><ul><li>Feature Vector: vector representing a review in a d-dimensional space where each dimension represents a feature. </li></ul>
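To make these definitions concrete, here is a minimal sketch of turning a review into a feature vector. The feature list and the seed sets of oriented adjectives are hypothetical stand-ins; the project built its feature list a priori and its adjective seed set separately, and gave all features in a sentence the same orientation, as this sketch does.

```python
# Hypothetical seed sets and feature list -- illustrative only.
POSITIVE = {"great", "good", "excellent", "clear"}
NEGATIVE = {"bad", "poor", "terrible", "fuzzy"}
FEATURES = ["sound", "battery", "screen"]  # d = 3 dimensions

def orient_sentence(words):
    """Return +1, -1, or 0 based on seed adjectives in the sentence."""
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return (score > 0) - (score < 0)

def feature_vector(review):
    """Map a review to a d-dimensional vector of {-1, 0, +1} entries."""
    vec = [0] * len(FEATURES)
    for sentence in review.lower().split("."):
        words = sentence.split()
        orientation = orient_sentence(words)
        for i, feat in enumerate(FEATURES):
            if feat in words:
                # all features in the same sentence get the same orientation
                vec[i] = orientation
    return vec

print(feature_vector("The sound is great. Battery life is terrible."))
# → [1, -1, 0]
```

Sentences that mention no feature leave the vector untouched, matching the "sentences without features neglected" simplification described later.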
  11. Overview of Project <ul><li>Obtain a large repository of customer reviews </li></ul><ul><li>Extract features from the customer reviews and orient them </li></ul><ul><li>Create feature vectors, e.g. [1, 0, -1, 1, 1, -1, …], from the reviews and features </li></ul><ul><li>Cluster the feature vectors to find large negative clusters </li></ul><ul><li>Analyze the clusters and compare against the hypothesis </li></ul>
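The clustering step can be sketched in pure Python. The project actually used the KMLocal library, so this tiny k-means (with a deterministic farthest-point initialization for reproducibility) is only a stand-in; the toy vectors and the -0.1 threshold for flagging negative clusters mirror the analysis described later.

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    # deterministic farthest-point initialization, for reproducibility
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(dist2(v, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:  # assign each vector to its nearest center
            j = min(range(k), key=lambda c: dist2(v, centers[c]))
            clusters[j].append(v)
        for j, members in enumerate(clusters):  # recompute centers
            if members:
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers, clusters

# toy feature vectors: two broadly positive reviews, two broadly negative
vectors = [[1, 1, 0], [1, 0, 1], [-1, -1, 0], [-1, 0, -1]]
centers, clusters = kmeans(vectors, k=2)

# a cluster whose center is below the threshold (-0.1 in the project)
# on average is treated as a "negative cluster"
negative = [c for c in centers if sum(c) / len(c) < -0.1]
print(len(negative))  # → 1
```

On this toy data the two negative reviews land in one cluster whose center averages well below the threshold, so exactly one negative cluster is flagged.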
  12. Related Work <ul><li>Related work has fallen into one of three disparate camps </li></ul><ul><ul><li>Classification: classifying reviews as negative or positive </li></ul></ul><ul><ul><li>Domain Specificity: overall effect of reviews in a domain </li></ul></ul><ul><ul><li>Summarization: feature extraction to summarize reviews </li></ul></ul>
  13. Limitations of Related Work <ul><li>Classification </li></ul><ul><ul><li>Overly summarizing </li></ul></ul><ul><li>Domain Specificity </li></ul><ul><ul><li>Hard to generalize given domain-specific information </li></ul></ul><ul><li>Summarization </li></ul><ul><ul><li>No overall knowledge of the collection </li></ul></ul>
  14. Close to Summarization? <ul><li>Most closely related to the Summarization work of Hu and Liu </li></ul><ul><ul><li>Summarization with dynamic feature extraction and orientation per review </li></ul></ul>
  15. Data for Project <ul><li>Data from Amazon.com customer reviews </li></ul><ul><ul><li>Available through the use of Amazon E-Commerce Service (ECS) </li></ul></ul><ul><ul><li>Four thousand products related to mp3 players </li></ul></ul><ul><ul><li>Over twenty thousand customer reviews </li></ul></ul>
  16. Technologies Used <ul><li>Java to program modules </li></ul><ul><li>Amazon ECS </li></ul><ul><li>NLProcessor (trial version) from Infogistics </li></ul><ul><li>Princeton’s WordNet as a thesaurus </li></ul><ul><li>KMLocal from David Mount’s group at University of Maryland for clustering </li></ul>
  17. Project Structure
  18. Simplifications Made <ul><li>Limited data set </li></ul><ul><li>Feature list created a priori </li></ul><ul><li>Features from same sentence given same orientation </li></ul><ul><li>Sentences without features neglected </li></ul><ul><li>Number of clusters chosen only to see correlations in biggest cluster </li></ul><ul><li>Small adjective seed set </li></ul>
  19. Analysis <ul><li>Associated clusters with products </li></ul><ul><li>Found negative clusters using a threshold (-0.1) </li></ul><ul><li>Eliminated non-negative clusters </li></ul><ul><li>Sorted the product list twice </li></ul><ul><ul><li>Products by sales rank (given by Amazon) </li></ul></ul><ul><ul><li>Products by hypothesis, with a tweak </li></ul></ul><ul><ul><ul><li>Tweak: Relative Size * Distortion </li></ul></ul></ul><ul><li>Computed Spearman’s distance between the two orderings </li></ul>
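Comparing the two product orderings can be sketched with Spearman's footrule distance (the sum of absolute rank differences). The slides do not specify which Spearman variant was used, so this is one common choice, and the product names are made up for illustration.

```python
def spearman_footrule(order_a, order_b):
    """Sum of |rank in A - rank in B| over all items (0 = identical orders)."""
    rank_a = {item: i for i, item in enumerate(order_a)}
    rank_b = {item: i for i, item in enumerate(order_b)}
    return sum(abs(rank_a[item] - rank_b[item]) for item in rank_a)

# hypothetical orderings of the same four products
by_sales_rank = ["player_a", "player_b", "player_c", "player_d"]
by_hypothesis = ["player_a", "player_c", "player_b", "player_d"]  # one adjacent swap

print(spearman_footrule(by_sales_rank, by_hypothesis))  # → 2
```

A small distance means the hypothesis-based ordering closely tracks Amazon's sales rank, which is how agreement between the two rankings is judged.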
  20. Results <ul><li>The hypothesis-based ranking achieved 82% accuracy! </li></ul><ul><li>But most of the four thousand products were pruned due to poor orientation </li></ul>
  21. Conclusion <ul><li>Consumers are affected by negative reviews that correlate to show similar flaws </li></ul><ul><li>They are affected regardless of the positive reviews </li></ul>
  22. Future Work <ul><li>Larger seed set for adjectives </li></ul><ul><li>Use more sophisticated NLP techniques </li></ul><ul><li>Experiment with the number of clusters </li></ul><ul><li>Dynamically determine features using summarization techniques </li></ul><ul><li>Use different data sets </li></ul><ul><li>Use a different distance measure in clustering </li></ul>
  23. Questions
