
Mining Product Reputations On the Web



  1. Mining Product Reputations on the Web. SIGKDD '02, Edmonton, Alberta, Canada. Copyright 2002 ACM. Satoshi Morinaga, Kenji Yamanishi (NEC Corporation); Kenji Tateishi, Toshikazu Fukushima (NEC Corporation)
  2. Agenda <ul><li>Introduction </li></ul><ul><li>Reputation mining system </li></ul><ul><li>Opinion extraction </li></ul><ul><li>Reputation analysis </li></ul><ul><li>Experiments </li></ul><ul><li>Concluding remarks </li></ul>
  3. Introduction <ul><li>Knowing the reputation of your own and/or competitors’ products is important. </li></ul><ul><li>Problems: </li></ul><ul><ul><li>Handling a large volume of open-ended answers manually </li></ul></ul><ul><ul><li>Gathering a large volume of high-quality survey data </li></ul></ul><ul><li>Solution: </li></ul><ul><ul><li>A new framework for automatically collecting and analyzing opinions on the internet. </li></ul></ul><ul><ul><li>Combines the opinion extraction technique with text mining methodologies. </li></ul></ul><ul><ul><ul><li>Previously employed in Survey Analyzer (SA, a trademark of NEC Corporation in Japan) </li></ul></ul></ul><ul><ul><ul><li>Text mining focuses on open-ended answers </li></ul></ul></ul><ul><ul><ul><li>Text classification through closed-ended answers or manual labeling </li></ul></ul></ul>
  4. Introduction (cont.) <ul><li>Opinion extraction </li></ul><ul><ul><li>Collects people’s opinions about products from the internet and attaches three labels: </li></ul></ul><ul><ul><ul><li>The name of the product referred to </li></ul></ul></ul><ul><ul><ul><li>The positive/negative nature of the opinion </li></ul></ul></ul><ul><ul><ul><li>Opinion-likeliness (a numerical value expressing the system's degree of confidence that the extracted statement is an opinion) </li></ul></ul></ul><ul><li>Labeled opinions are put into an opinion database. </li></ul><ul><li>Reputation analysis </li></ul><ul><ul><li>Rule analysis (extracting characteristic words) </li></ul></ul><ul><ul><ul><li>“monochrome” and “inexpensive”, “lightweight” and “convenient” </li></ul></ul></ul><ul><ul><ul><li>Stochastic complexity </li></ul></ul></ul><ul><ul><li>Co-occurrence analysis </li></ul></ul><ul><ul><li>Typical sentence analysis </li></ul></ul><ul><ul><li>Correspondence analysis </li></ul></ul><ul><ul><ul><li>Two-dimensional positioning map </li></ul></ul></ul><ul><ul><ul><li>Displays the corresponding relationships among the target categories. </li></ul></ul></ul>
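As a rough illustration of the three labels and the opinion database, here is a minimal Python sketch. The names `LabeledOpinion` and `OpinionDatabase` are hypothetical and do not come from the slides; the in-memory dictionary only stands in for whatever storage the actual system uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledOpinion:
    """One extracted statement with the three labels named on the slide."""
    text: str          # the extracted statement
    product: str       # name of the product referred to
    polarity: str      # "positive" or "negative"
    likeliness: float  # system confidence that the statement is an opinion


class OpinionDatabase:
    """Hypothetical in-memory stand-in for the opinion database."""

    def __init__(self):
        self._by_label = {}

    def add(self, opinion):
        if opinion.polarity not in ("positive", "negative"):
            raise ValueError(f"unknown polarity: {opinion.polarity!r}")
        key = (opinion.product, opinion.polarity)
        self._by_label.setdefault(key, []).append(opinion)

    def select(self, product, polarity):
        """Return the opinions in one (product, polarity) category."""
        return list(self._by_label.get((product, polarity), []))


db = OpinionDatabase()
db.add(LabeledOpinion("It is light and convenient.", "Phone A", "positive", 4.2))
db.add(LabeledOpinion("The hinge is easily broken.", "Phone A", "negative", 3.5))
```

Keying the store by (product, polarity) reflects how the reputation analysis step treats each such pair as a target category.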
  5. Reputation mining system
  6. Opinion extraction <ul><li>Web page collection module </li></ul><ul><ul><li>Uses a crawler to collect web pages relevant to the input product names. </li></ul></ul><ul><li>Positive/negative determining module </li></ul><ul><ul><li>Statements are checked against a previously prepared “evaluation-expression dictionary” </li></ul></ul><ul><ul><li>“fast”, “good”, “light” are positive expressions </li></ul></ul><ul><ul><li>“heavy”, “easily broken”, “noisy” are negative expressions </li></ul></ul>
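The dictionary check can be sketched as a simple lookup. This is a minimal Python sketch, assuming whole-word matching against a tiny dictionary built only from the six example expressions on the slide; the names `EVALUATION_EXPRESSIONS` and `find_polarity` are hypothetical, and the real dictionary and matching rules are not described here.

```python
import re

# Hypothetical miniature dictionary using only the examples from the slide.
EVALUATION_EXPRESSIONS = {
    "fast": "positive",
    "good": "positive",
    "light": "positive",
    "heavy": "negative",
    "easily broken": "negative",
    "noisy": "negative",
}


def find_polarity(sentence):
    """Return (expression, polarity) pairs whose expression occurs in the sentence."""
    text = sentence.lower()
    hits = []
    for expression, polarity in EVALUATION_EXPRESSIONS.items():
        # \b keeps "light" from matching inside "lightweight".
        if re.search(r"\b" + re.escape(expression) + r"\b", text):
            hits.append((expression, polarity))
    return hits


hits = find_polarity("This phone is fast but noisy.")
```

The sketch does not handle negation ("not good") or mixed sentences; a production system would need additional rules for both.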
  7. Opinion extraction (cont.) <ul><li>Opinion-likeliness calculation module </li></ul><ul><ul><li>Calculates an opinion-likeliness score for each extracted statement </li></ul></ul><ul><ul><li>A real value ranging from 1 to 5 </li></ul></ul><ul><ul><li>The higher the score, the more likely the statement is an opinion </li></ul></ul><ul><ul><li>Uses syntactic property rules </li></ul></ul><ul><ul><ul><li>Built manually from training examples, or </li></ul></ul></ul><ul><ul><ul><li>Learned with standard machine learning methods </li></ul></ul></ul>
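To make the idea of rule-based scoring concrete, here is a minimal Python sketch. Every rule, weight, and the base value of 3.0 below is an invented assumption for illustration; the slides do not list the actual syntactic property rules. Only the 1-to-5 range comes from the source.

```python
import re

# Hypothetical rules (pattern, score adjustment). None of these come from the slides.
RULES = [
    (re.compile(r"\b(i|we|my)\b", re.I), +1.0),            # first-person statement
    (re.compile(r"\b(is|was|feels|seems)\b", re.I), +0.5),  # judgment-style verb
    (re.compile(r"\?\s*$"), -1.5),                          # questions are less opinion-like
    (re.compile(r"\b(price|weight|size)\s*:", re.I), -1.0), # spec-sheet style line
]


def opinion_likeliness(sentence, base=3.0):
    """Return a score clamped to the range [1, 5]; higher means more opinion-like."""
    score = base
    for pattern, delta in RULES:
        if pattern.search(sentence):
            score += delta
    return max(1.0, min(5.0, score))


example_score = opinion_likeliness("I think this phone is good.")
```

In a learned variant, the adjustments would be fitted from labeled training examples instead of being written by hand.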
  8. Reputation analysis <ul><li>Rule Analysis (Characteristic-Word Extraction) </li></ul><ul><ul><li>Training </li></ul></ul><ul><ul><ul><li>Resembles decision tree generation </li></ul></ul></ul><ul><ul><ul><li>Uses stochastic complexity as a criterion </li></ul></ul></ul><ul><ul><li>Text classification rules and association rules </li></ul></ul><ul><ul><ul><li>Ordered sequences of IF-THEN-ELSE rules </li></ul></ul></ul><ul><ul><li>Extracts keywords indicative of a specified category </li></ul></ul><ul><ul><li>Stochastic complexity formula </li></ul></ul><ul><ul><ul><li>Score(w) represents the information gain of word w </li></ul></ul></ul>
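The formula itself is not reproduced in this transcript. The Python sketch below assumes one common approximation of stochastic complexity for m binary labels with m⁺ positives, SC = m·H(m⁺/m) + ½·log₂(mπ/2), and takes Score(w) to be the reduction in total complexity when the data are split by the presence of w. Both the approximation and the function names are assumptions and may differ from what the authors used.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) source."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def stochastic_complexity(m, m_pos):
    """Approximate code length in bits for m binary labels, m_pos of them positive."""
    if m == 0:
        return 0.0
    return m * binary_entropy(m_pos / m) + 0.5 * math.log2(m * math.pi / 2)


def score(docs, word, category):
    """Information gain of `word` for `category`.

    docs is a list of (set_of_words, label) pairs.
    """
    with_w = [label for words, label in docs if word in words]
    without_w = [label for words, label in docs if word not in words]

    def sc(labels):
        positives = sum(1 for label in labels if label == category)
        return stochastic_complexity(len(labels), positives)

    return sc(with_w + without_w) - (sc(with_w) + sc(without_w))


docs = [({"light", "phone"}, "A")] * 4 + [({"heavy", "phone"}, "B")] * 4
gain_light = score(docs, "light", "A")  # splits the categories perfectly
gain_phone = score(docs, "phone", "A")  # appears everywhere, so no gain
```

Words are then ranked by this score, and the top-ranked ones serve as characteristic words for the category.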
  9. Rule Analysis (cont.)
  10. Co-occurrence analysis <ul><li>Extract a list of words that co-occur with characteristic words </li></ul>
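Co-occurrence extraction can be sketched as counting, per sentence, the words that appear together with a given characteristic word. This is a minimal Python sketch assuming sentence-level co-occurrence and simple lowercase tokenization; the function name `cooccurring_words` and the stopword handling are hypothetical.

```python
import re
from collections import Counter


def cooccurring_words(sentences, characteristic_word, top_n=5, stopwords=frozenset()):
    """Return the top_n (word, count) pairs co-occurring with characteristic_word."""
    counts = Counter()
    for sentence in sentences:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if characteristic_word in words:
            # Count each co-occurring word once per sentence.
            counts.update(words - {characteristic_word} - stopwords)
    return counts.most_common(top_n)


sentences = [
    "Light body and long battery.",
    "Light battery!",
    "Heavy battery.",
]
top = cooccurring_words(sentences, "light", top_n=1)
```

Raw counts favor frequent words, so in practice a stopword list or a significance measure is usually applied before showing the list to a user.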
  11. Typical sentence analysis & Correspondence analysis <ul><li>Typical sentence analysis </li></ul><ul><ul><li>Gives the user a simple overview of tendencies </li></ul></ul><ul><ul><li>Scores are calculated on the basis of the naive Bayes model (posterior probability) </li></ul></ul><ul><li>Correspondence analysis </li></ul><ul><ul><li>Creates a two-dimensional positioning map. </li></ul></ul><ul><ul><li>An extension of principal component analysis (PCA) </li></ul></ul>
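Typical sentence scoring can be sketched as ranking the sentences of a category by their posterior probability under a naive Bayes model. This is a minimal Python sketch assuming a multinomial model with add-one smoothing; the smoothing choice and the function names are assumptions, not details taken from the slides.

```python
import math
from collections import Counter


def train(labeled_sentences):
    """labeled_sentences is a list of (list_of_words, category) pairs."""
    word_counts, cat_counts, vocab = {}, Counter(), set()
    for words, cat in labeled_sentences:
        cat_counts[cat] += 1
        word_counts.setdefault(cat, Counter()).update(words)
        vocab.update(words)
    return word_counts, cat_counts, vocab


def posterior(words, category, model):
    """P(category | words) under naive Bayes with add-one smoothing."""
    word_counts, cat_counts, vocab = model
    total = sum(cat_counts.values())
    log_joint = {}
    for cat in cat_counts:
        n_words = sum(word_counts[cat].values())
        lp = math.log(cat_counts[cat] / total)
        for w in words:
            lp += math.log((word_counts[cat][w] + 1) / (n_words + len(vocab)))
        log_joint[cat] = lp
    # Normalize in log space to avoid underflow on long sentences.
    peak = max(log_joint.values())
    z = sum(math.exp(v - peak) for v in log_joint.values())
    return math.exp(log_joint[category] - peak) / z


def typical_sentences(labeled_sentences, category, top_n=1):
    """Return the top_n sentences of `category` with the highest posterior."""
    model = train(labeled_sentences)
    candidates = [words for words, cat in labeled_sentences if cat == category]
    ranked = sorted(candidates, key=lambda ws: posterior(ws, category, model), reverse=True)
    return ranked[:top_n]


data = [
    (["light", "and", "convenient"], "A"),
    (["light", "screen"], "A"),
    (["heavy", "screen"], "B"),
    (["heavy", "and", "noisy"], "B"),
]
best = typical_sentences(data, "A")
```

Because each additional word multiplies the likelihood ratio, longer sentences tend to receive more extreme posteriors, which is worth keeping in mind when comparing sentences of very different lengths.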
  12. Experiments – Cellular Phone
  13. Experiments – PDAs
  14. Experiments – Internet Service Providers
  15. Concluding remarks <ul><li>Proposes a framework for mining product reputations on the web. </li></ul><ul><li>Four fundamental tasks: </li></ul><ul><ul><li>Characteristic word extraction </li></ul></ul><ul><ul><li>Co-occurring word extraction </li></ul></ul><ul><ul><li>Typical sentence extraction </li></ul></ul><ul><ul><li>Correspondence analysis </li></ul></ul><ul><li>The key to combining the two parts (opinion extraction and reputation analysis) is opinion labeling </li></ul><ul><li>This framework could be applied to mining reputations well beyond industrial products, e.g., events, services, companies, and governments. </li></ul>