Chapter 11: Data Mining Concepts and Techniques, 2nd Ed. slides (Han & Kamber)

1. Data Mining: Concepts and Techniques — Chapter 11
   Additional Theme: Collaborative Filtering & Data Mining
   Jiawei Han and Micheline Kamber
   Department of Computer Science, University of Illinois at Urbana-Champaign
   www.cs.uiuc.edu/~hanj
   ©2006 Jiawei Han and Micheline Kamber. All rights reserved.
3. Outline
   • Motivation
   • Systems in Action
   • A Conceptual Framework
   • User-User Methods
   • Item-Item Methods
   • Recent Advances and Open Problems
4. Motivation
   • User perspective
     - Lots of online products, books, movies, etc.
     - "Reduce my choices... please..."
   • Manager perspective
     - "If I have 3 million customers on the web, I should have 3 million stores on the web." (CEO of Amazon.com [SCH01])
5. Example: Recommendation
   Customers who bought this book also bought:
   • Data Preparation for Data Mining, by Dorian Pyle
   • The Elements of Statistical Learning, by T. Hastie et al.
   • Data Mining: Introductory and Advanced Topics, by Margaret H. Dunham
   • Mining the Web: Analysis of Hypertext and Semi-Structured Data
6. Example: Personalization
   [Figure: a personalized page; no text extracted]
7. Other Examples
   • MovieLens: movies
   • MovieCritic: movies again
   • MyLaunch: music
   • Gustos StarRater: web pages
   • Jester: jokes
   • TV Recommender: TV shows
   • Suggest 1.0: different products
   • And much more...
8. How Does It Work?
   • Each user has a profile
   • Users rate items
     - Explicitly: a score from 1 to 5
     - Implicitly: via web usage mining
       · time spent viewing the item
       · navigation path
       · etc.
   • The system does the rest. How? That is what we show today.
9. Basic Approaches
   • Collaborative filtering (CF)
     - Look at the users' collective behavior
     - Look at the active user's history
     - Combine the two
   • Content-based filtering
     - Recommend items based on keywords
     - More appropriate for information retrieval
10. Collaborative Filtering: A Framework
    • Users U = {u1, ..., um} and items I = {i1, ..., in}
    • Ratings form an m × n matrix R = [rij], mostly unknown; e.g., u2 rated a few items 3, 1.5, ..., 2, while most entries are rij = ?
    • The task: learn the unknown function f: U × I → R
      - Q1: Find the unknown ratings
      - Q2: Which items should we recommend to this user?
11. Collaborative Filtering Road Map
    • User-User methods
      - Identify like-minded users
      - Memory-based: k-NN
      - Model-based: clustering
    • Item-Item methods
      - Identify buying patterns
      - Correlation analysis
      - Linear regression
      - Belief networks
      - Association rule mining
12. User-User Similarity: Intuition
    [Figure: a target customer and candidate neighbors]
    • Q1: How do we measure similarity?
    • Q2: How do we select neighbors?
    • Q3: How do we combine neighbors' ratings?
13. How to Measure Similarity?
    • Pearson correlation coefficient, summing over the items j that users a and i have commonly rated (a code sketch follows below):
      w_p(a,i) = \frac{\sum_j (r_{aj} - \bar{r}_a)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_j (r_{aj} - \bar{r}_a)^2}\,\sqrt{\sum_j (r_{ij} - \bar{r}_i)^2}}
    • Cosine measure: users are vectors in product-dimension space
      w_c(a,i) = \frac{\vec{r}_a \cdot \vec{r}_i}{\|\vec{r}_a\|_2 \, \|\vec{r}_i\|_2}
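
A minimal sketch of the Pearson measure above, assuming ratings are stored as a dict of dicts {user: {item: rating}}; the function name and data layout are illustrative, not from the slides:

```python
import math

def pearson(ratings, a, i):
    """w_p(a, i) over the items both users rated; returns 0.0 when the
    users share no items or one of them has no rating variance."""
    common = set(ratings[a]) & set(ratings[i])
    if not common:
        return 0.0
    # User means are taken over all items each user rated.
    mean_a = sum(ratings[a].values()) / len(ratings[a])
    mean_i = sum(ratings[i].values()) / len(ratings[i])
    num = sum((ratings[a][j] - mean_a) * (ratings[i][j] - mean_i) for j in common)
    den_a = math.sqrt(sum((ratings[a][j] - mean_a) ** 2 for j in common))
    den_i = math.sqrt(sum((ratings[i][j] - mean_i) ** 2 for j in common))
    return num / (den_a * den_i) if den_a and den_i else 0.0
```
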
14. Nearest-Neighbor Approaches [SAR00a]
    • Offline phase: do nothing, just store the transactions
    • Online phase: identify users highly similar to the active one
      - the best k users, or
      - all users with a similarity above a threshold
    • Prediction: user a's neutral rating plus the weighted average of each neighbor i's deviation on item j, giving user a's estimated deviation (see the sketch below):
      r_{aj} = \bar{r}_a + \frac{\sum_i w(a,i)\,(r_{ij} - \bar{r}_i)}{\sum_i w(a,i)}
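
Continuing the same illustrative setup, a sketch of the online prediction step that reuses the pearson helper above; the choice k=20 and the positive-similarity filter are assumptions, not prescribed by the slide:

```python
def predict(ratings, a, j, k=20):
    """Predict r_aj as user a's mean rating plus the weighted average of
    the k most similar neighbors' deviations on item j."""
    def mean(u):
        return sum(ratings[u].values()) / len(ratings[u])

    candidates = [(pearson(ratings, a, i), i)
                  for i in ratings if i != a and j in ratings[i]]
    # Keep only positively similar users, then take the best k of them.
    neighbors = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbors:
        return mean(a)                      # no usable neighbors: fall back
    num = sum(w * (ratings[i][j] - mean(i)) for w, i in neighbors)
    den = sum(w for w, _ in neighbors)
    return mean(a) + num / den
```
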
15. The Horting Method [AGG99]
    • k-NN similarity is not transitive; horting takes advantage of transitivity
    • Uses a new similarity measure: predictability
    • User i predicts user a if
      - they have rated sufficiently many common items, and
      - there is an error-bounded linear transformation from user i's ratings to user a's ratings
16. How Horting Works
    • Offline phase: build the neighborhood graph
    • Online phase: to compute r_{aj},
      1. identify the users who predict user a
      2. identify the users who rated item j
      3. find shortest paths from group 1 to group 2
      4. propagate backward along the paths and average
    • Better for sparse environments, but not well evaluated
17. Clustering [BRE98]
    • Offline phase: build clusters (k-means, k-medoids, etc.); a sketch follows below
    • Online phase: identify the cluster nearest to the active user, then predict using either
      - the center of that cluster (faster), or
      - a weighted average over the cluster's members, with weights depending on the active user (slower but a little more accurate)
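
A rough sketch of the clustering variant, assuming a dense NumPy ratings matrix with np.nan for unrated cells and scikit-learn's KMeans; the mean-imputation step and all parameter choices are illustrative only:

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_clusters(R, n_clusters=10, seed=0):
    """Offline: cluster the users of a (users x items) ratings matrix R.
    Missing entries are filled with each user's mean rating first."""
    filled = np.where(np.isnan(R), np.nanmean(R, axis=1, keepdims=True), R)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(filled)

def predict_from_center(km, user_row):
    """Online: assign the active user to the nearest cluster and use the
    cluster center's ratings as predictions (the faster variant)."""
    filled = np.where(np.isnan(user_row), np.nanmean(user_row), user_row)
    c = km.predict(filled.reshape(1, -1))[0]    # nearest cluster
    return km.cluster_centers_[c]
```
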
18. Clustering vs. k-NN Approaches
    • k-NN with the Pearson measure is slower but more accurate
    • Clustering is more scalable
    • [Figure: an active user near a cluster boundary gets bad recommendations]
    • Soft clustering would help such boundary users, but we would lose the computational edge
19. Did We Answer the Questions?
    • Q1: How do we measure similarity?
    • Q2: How do we select neighbors?
    • Q3: How do we combine neighbors' ratings?
20. Are We Done?
    • Q1: How do we measure similarity? Done... really?
    • What about sparsity? Too few common items implies spurious neighbors, and hence bad recommendations.
    • Sparsity partly results from poor representation. Example from [SAR00P]:
      - U1 rates recycled letter pads high; U2 rates recycled memo pads high
      - Both of them like recycled office products
      - They are similar, but the sum over commonly rated items in w_p(a, i) cannot capture that
    • By working at the right level of abstraction, we can eliminate sparsity
21. The Power of Representation [UNG98]
    [Figure: movies grouped into Action, Foreign, and Classic categories]
    • Q1-B: How can we formalize this intuition?
22. How to Abstract?
    • Semi-manual methods
      - Use product features
      - Cluster products first, then cluster users
      - Works only if we have descriptive features
    • Automatic methods
      - Adjusted product taxonomy
      - Latent semantic indexing
23. Adjusted Product Taxonomy [CHO04]
    • Input: a product taxonomy
    • Output: a modified taxonomy with an even distribution of transactions across categories
24. Adjusted Product Taxonomy (2)
    [Figure: number of transactions per category, heavily skewed under the original taxonomy and roughly even under the adjusted taxonomy]
25. Latent Semantic Indexing [SAR00b]
    • Compute the SVD of the m × n ratings matrix, R = U S I' with U (m × r), S (r × r), and I' (r × n), and keep only the k largest singular values
    • The reconstructed matrix R_k = U_k S_k I_k' (U_k: m × k, S_k: k × k, I_k': k × n) is the closest rank-k matrix to the original matrix R
    • Captures latent associations; the reduced space is less noisy
    (A truncated-SVD sketch follows below.)
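
A small sketch of the rank-k reconstruction with NumPy's SVD; it assumes R has already been filled (e.g., with user means), a preprocessing step the slide does not spell out:

```python
import numpy as np

def rank_k_reconstruction(R, k):
    """Best rank-k approximation of the ratings matrix R via truncated
    SVD, as on the LSI slide: R_k = U_k S_k I_k'."""
    U, s, It = np.linalg.svd(R, full_matrices=False)   # R = U diag(s) I'
    return U[:, :k] @ np.diag(s[:k]) @ It[:k, :]       # closest rank-k matrix

# Example: a tiny 4-user x 3-item matrix, reconstructed at rank 2.
R = np.array([[5., 3., 1.], [4., 3., 1.], [1., 1., 5.], [1., 2., 4.]])
R2 = rank_k_reconstruction(R, 2)
```
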
26. Are We Done? (2)
    • Q2: How do we select neighbors? Not adequately answered.
    • We don't expect to use the same neighbors for all products; neighbors should be product-category specific.
    • Q2-B: How can we determine whether or not a user is relevant to a given product?
27. Selecting Relevant Instances [YU01]
    • Example: predicting a user's rating of "Batman"
      - "Superman" and "Batman" are correlated
      - "Titanic" and "Batman" are negatively correlated
      - "Dances with Wolves" has nothing to do with Batman's rating
      - So Karen, who rated only the irrelevant items, is not a good instance to consider
    • How can we formalize this? Mutual information (estimated in the sketch below):
      MI(X; Y) = H(X) - H(X|Y)
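
A plug-in estimate of MI(X; Y) from two aligned, discretized rating vectors; this is a generic estimator, not the exact procedure of [YU01]:

```python
import numpy as np

def mutual_information(x, y):
    """MI(X; Y) estimated from empirical joint frequencies of two
    co-rated, discretized rating vectors (in bits)."""
    x, y = np.asarray(x), np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))   # joint probability
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi
```
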
28. Selecting Relevant Instances (2)
    • Offline phase:
      - Estimate the mutual information between items
      - For each item: find the users who rated it, compute each user's strength (how many relevant items they also rated), and retain a subset of them (keeping 10% works fine)
    • Online phase: to predict the target item's rating, run k-NN on its reduced instance space
    • Better results with less data: quality, not quantity, is what matters
29. Are We Done? (3)
    • Q3: How do we combine? A weighted average, or...
    • Discover association rules in the neighbors' transactions [LEE01, WAN04]
      - For every x in this group: like(x, Item1) ∧ like(x, Item2) → like(x, Item3)
      - Use confidence and support to judge the quality of the prediction
      - Prediction is done at the binary level (like, dislike)
      - Costly to run online
30. User-User Methods: Evaluation
    • Achieve good quality in practice
    • The more processing we push offline, the better the method scales
    • However:
      - User preference is dynamic, so offline-calculated information must be updated frequently
      - There are no recommendations for new users, since we don't know much about them yet
31. Collaborative Filtering Road Map
    • User-User methods
      - Identify like-minded users
      - Memory-based: k-NN
      - Model-based: clustering
    • Item-Item methods
      - Identify buying patterns
      - Correlation analysis
      - Linear regression
      - Belief networks
      - Association rule mining
32. Item-Item Similarity: The Intuition
    • Search for similarities among items instead of users
    • All computations can be done offline
    • Item-item similarity is more stable than user-user similarity, so there is no need for frequent updates
    • First-order models: correlation analysis, linear regression
    • Higher-order models: belief networks, association rule mining
33. Correlation-Based Methods [SAR01]
    • Same as user-user similarity, but on item vectors: look for the users who rated both items
    • Pearson correlation coefficient over the users u who rated both items i and j:
      s_{ij} = \frac{\sum_u (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_u (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_u (r_{uj} - \bar{r}_j)^2}}
34. Correlation-Based Methods (2)
    • Offline phase:
      - Calculate the n(n-1) similarity measures
      - For each item, determine its k most similar items
    • Online phase: predict the rating for a given user-item pair (a, j) as a weighted sum over the similar items i that user a has rated (sketched below):
      r_{aj} = \frac{\sum_{i \in \text{similar items}} s_{ij}\, r_{ai}}{\sum_{i \in \text{similar items}} s_{ij}}
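
A sketch of the online weighted-sum step, assuming the item-similarity matrix sims was precomputed offline with the Pearson formula of slide 33; the data layout (users x items with np.nan for missing entries) is an assumption:

```python
import numpy as np

def item_item_predict(R, sims, a, j, k=10):
    """Predict r_aj as a similarity-weighted sum over the k items most
    similar to item j among those user a has rated."""
    rated = np.where(~np.isnan(R[a]))[0]            # items user a rated
    rated = rated[rated != j]                       # exclude the target item
    top = rated[np.argsort(-sims[j, rated])][:k]    # k most similar of them
    w = sims[j, top]
    if w.sum() <= 0:
        return float(np.nanmean(R[a]))              # fall back to a's mean
    return float(np.dot(w, R[a, top]) / w.sum())
```
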
35. Regression-Based Methods [VUC00]
    • Offline phase: fit n(n-1) linear regressions; f_{ij}(x) is a linear transformation of a user's rating on item i into his rating on item j (see the fitting sketch below)
    • Online phase: same as the previous method, but the weights are inversely proportional to the regression error rates:
      r_{aj} = \frac{\sum_{i \text{ rated by } a} w_{ij}\, f_{ij}(r_{ai})}{\sum_{i \text{ rated by } a} w_{ij}}
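
A sketch of the offline fitting step for a single item pair, using NumPy's least-squares polyfit; the inverse-MSE weight is an illustrative choice, since [VUC00]'s exact weighting is not given on the slide:

```python
import numpy as np

def fit_pair_regression(R, i, j):
    """Fit f_ij(x) = slope * x + intercept from the users who rated both
    items i and j; return (slope, intercept, weight) with the weight
    inversely proportional to the squared regression error."""
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    if both.sum() < 2:
        return None                                  # too few common raters
    x, y = R[both, i], R[both, j]
    slope, intercept = np.polyfit(x, y, 1)           # least-squares line
    mse = float(np.mean((slope * x + intercept - y) ** 2))
    return slope, intercept, 1.0 / (mse + 1e-6)      # epsilon avoids div by 0
```
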
36. Higher-Order Models
    • The previous approaches relied on the naïve Bayes assumption: the effects of other items on a given item are independent
    • That is not always true, and higher-order models can do better:
      - belief networks
      - association rule mining
37. Bayesian Belief Networks: Introduction
    • A Bayesian belief network allows a subset of the variables to be conditionally independent
    • It is a graphical model of causal relationships:
      - represents dependencies among the variables
      - gives a specification of the joint probability distribution
    • Nodes are random variables; links are dependencies
      - Example: X and Y are the parents of Z, and Y is the parent of P; there is no dependency between Z and P
    • The graph has no loops or cycles
38. Bayesian Belief Networks: An Example
    • Variables: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea
    • The conditional probability table (CPT) for LungCancer shows the conditional probability for each combination of its parents (FH, S):

                (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
        LC        0.8       0.5        0.7        0.1
        ~LC       0.2       0.5        0.3        0.9

    • The network encodes the joint distribution (evaluated in the sketch below):
      P(z_1, ..., z_n) = \prod_{i=1}^{n} P(z_i \mid Parents(Z_i))
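
A worked chain-rule evaluation using the CPT above; the priors for FH and S are not on the slide, so the values below are placeholder assumptions:

```python
# Chain-rule evaluation P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S),
# using the slide's CPT for LungCancer.
cpt_lc = {  # P(LC = yes | FH, S), copied from the slide
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}
p_fh, p_s = 0.15, 0.30  # hypothetical priors, NOT given on the slide

def joint(fh, s, lc):
    p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
    p_lc = cpt_lc[(fh, s)]
    return p * (p_lc if lc else 1 - p_lc)

# e.g., P(FH=yes, S=no, LC=yes) = 0.15 * 0.70 * 0.5
print(joint(True, False, True))   # 0.0525
```
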
39. Belief Networks for CF [BRE98]
    • Every item is a node; ratings are binary (like, dislike)
    • Learn a belief network offline over the training data
    • The CPT at each node is represented as a decision tree
    • Use greedy algorithms to determine the best network structure
    • Use probabilistic inference for online prediction
40. Belief Networks for CF: An Example
    [Figure: a probability decision tree serving as the CPT for the random variable "Melrose Place" in the movie domain, splitting on the ratings of "Friends" and "B.H."]
41. Association Rule Mining
    • Offline processing:
      - Work at the binary level (like, dislike)
      - View each user as a market basket containing the items the user liked
      - Discover association rules between items
    • Online processing (sketched below):
      - Match the items that the active user likes against the rules' left-hand sides
      - Recommend the rules' consequents, ranked by support and confidence
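
A sketch of the online matching step; the rule-tuple format and the ranking by (confidence, support) are assumptions about what the offline miner produces, not details from the slide:

```python
def recommend(rules, liked, top_n=5):
    """`rules` is an iterable of (antecedent_frozenset, consequent_item,
    support, confidence) tuples from an offline miner. Fire every rule
    whose antecedent is a subset of the user's liked set, then rank the
    consequents by (confidence, support)."""
    fired = [(conf, sup, item) for ante, item, sup, conf in rules
             if ante <= liked and item not in liked]
    fired.sort(reverse=True)
    seen, recs = set(), []
    for conf, sup, item in fired:
        if item not in seen:          # keep each item once, best rule first
            seen.add(item)
            recs.append(item)
    return recs[:top_n]

# Example: one rule {i1, i2} -> i3 with support 0.10 and confidence 0.80.
rules = [(frozenset({"i1", "i2"}), "i3", 0.10, 0.80)]
print(recommend(rules, {"i1", "i2"}))   # ['i3']
```
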
42. Association Rule Mining: Problems
    • A high support threshold leads to low coverage and may eliminate important but infrequent items from consideration
    • Low support thresholds result in very large model sizes, a computationally expensive offline pattern-discovery phase, and a slower online matching phase
    • Solution: adaptive association rule mining
43. Adaptive Association Rule Mining [LIN01]
    • Given:
      - a transaction dataset
      - a target item
      - a desired range for the number of rules
      - a specified minimum confidence (minConfidence)
    • Find: a set S of association rules for the target item such that
      - the number of rules in S is in the given range
      - the rules in S satisfy the minimum-confidence constraint
      - the rules in S have higher support than rules not in S that satisfy the above constraints
44. Adaptive Association Rule Mining (2)
    • Discover rules with a single item in the head: like(x, item1) ∧ like(x, item2) → like(x, target)
    • The miner discovers association rules iteratively for each target item, adjusting the support threshold per item, until the desired number of rules is extracted (see the sketch below)
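
A sketch of the adaptive-support loop; mine_rules stands for a hypothetical miner callback, and the halving/growth factors are illustrative rather than [LIN01]'s exact adjustment scheme:

```python
def adaptive_mine(transactions, target, lo, hi, minconf,
                  mine_rules, minsup=0.1, max_iter=20):
    """Adjust minsup per target item until the number of mined rules
    falls in [lo, hi]. `mine_rules` is a hypothetical callback with
    signature (transactions, target, minsup, minconf) -> list of rules."""
    rules = []
    for _ in range(max_iter):
        rules = mine_rules(transactions, target, minsup, minconf)
        if lo <= len(rules) <= hi:
            return rules, minsup
        # Too few rules: lower the support threshold; too many: raise it.
        minsup *= 0.5 if len(rules) < lo else 1.5
    return rules, minsup          # best effort if the loop never converged
```
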
45. Item-Item Methods: Why Do They Work?
    • A rule like(x, Book1) ∧ like(x, Book2) → like(x, Book3) draws its support from the "book gang", while like(x, Movie1) → like(x, Movie2) draws its support from the "movie gang"
    • We thus use the right neighbors for each item without discovering the groups themselves, eliminating costly online matching
    • In general, better quality than user-user methods, and better response time [LIN03]
46. Recent Work and Open Problems
    • Order-based methods
      - Ordering items is more informative than rating them
      - [KAM03] developed k-o'means to work on orders
    • Preference-based methods
      - A total ordering of items is not feasible
      - Work on partial orders (preferences) [COH99]
    • Integrating background knowledge
      - User demographic information, item features, etc.
    • Modeling time
      - Sequential patterns
47. References (1)
    • [AGG99] Charu C. Aggarwal, Joel L. Wolf, Kun-Lung Wu, Philip S. Yu: Horting Hatches an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. KDD 1999: 201-212.
    • [BRE98] J. Breese, D. Heckerman, C. Kadie: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proc. 14th Conf. on Uncertainty in Artificial Intelligence, Madison, July 1998.
    • [CHO04] Yoon Ho Cho, Jae Kyeong Kim: Application of Web Usage Mining and Product Taxonomy to Collaborative Recommendations in E-Commerce. Expert Systems with Applications, 26(2), 2004.
    • [COH99] William W. Cohen, Robert E. Schapire, Yoram Singer: Learning to Order Things. Advances in Neural Information Processing Systems 10, Denver, CO, 1997.
    • [HAN03] Jiawei Han, Fall 2003 online course notes: http://www-courses.cs.uiuc.edu/~cs397han/slides/05.ppt
    • [KAM03] Toshihiro Kamishima: Nantonac Collaborative Filtering: Recommendation Based on Order Responses. KDD 2003: 583-588.
    • [LEE01] C.-H. Lee, Y.-H. Kim, P.-K. Rhee: Web Personalization Expert with Combining Collaborative Filtering and Association Rule Mining Technique. Expert Systems with Applications, 21(3), October 2001: 131-137.
48. References (2)
    • [LIN01P] W. Lin, online presentation: http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/LinAlvarezRuiz_WebKDD2000.ppt
    • [LIN01] Weiyang Lin, Sergio A. Alvarez, Carolina Ruiz: Efficient Adaptive-Support Association Rule Mining for Recommender Systems. Data Mining and Knowledge Discovery, 6:83-105, 2002.
    • [LIN03] G. Linden, B. Smith, J. York: Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Computing, 7(1):76-80, January 2003.
    • [SAR00a] Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: Analysis of Recommendation Algorithms for E-Commerce. ACM Conf. on Electronic Commerce 2000: 158-167.
    • [SAR00b] B. Sarwar, G. Karypis, J. Konstan, J. Riedl: Application of Dimensionality Reduction in Recommender Systems: A Case Study. ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
    • [SAR01] B. M. Sarwar, G. Karypis, J. A. Konstan, J. Riedl: Item-Based Collaborative Filtering Recommendation Algorithms. WWW 2001.
49. References (3)
    • [SAR00P] B. Sarwar, online presentation: http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/badrul.ppt
    • [SCH01] J. Ben Schafer, Joseph A. Konstan, John Riedl: E-Commerce Recommendation Applications. Data Mining and Knowledge Discovery, 5(1/2):115-153, 2001.
    • [UNG98] L. H. Ungar, D. P. Foster: Clustering Methods for Collaborative Filtering. AAAI Workshop on Recommendation Systems, 1998.
    • [WAN04] Yi-Fan Wang, Yu-Liang Chuang, Mei-Hua Hsu, Huan-Chao Keh: A Personalized Recommender System for the Cosmetic Business. Expert Systems with Applications, 26(3):427-434, April 2004.
    • [VUC00] S. Vucetic, Z. Obradovic: A Regression-Based Approach for Scaling-Up Personalized Recommender Systems in E-Commerce. ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000.
    • [YU01] Kai Yu, Xiaowei Xu, Martin Ester, Hans-Peter Kriegel: Selecting Relevant Instances for Efficient and Accurate Collaborative Filtering. Proc. 10th CIKM: 239-246, ACM Press, 2001.
    • [ZHA03] ChengXiang Zhai, Spring 2003 online course notes: http://sifaka.cs.uiuc.edu/course/2003-497CXZ/loc/cf.ppt
