Comparing State-of-the-Art Collaborative Filtering Systems


Published in: Technology, Business

  1. Comparing State-of-the-Art Collaborative Filtering Systems
     Laurent Candillier, Frank Meyer, Marc Boullé
     France Telecom R&D Lannion
     MLDM 2007
     Outline: 1 Introduction, 2 Collaborative approaches, 3 Experiments, 4 Conclusions
  2. Recommender systems
     Help users find items they should appreciate from huge catalogues [Adomavicius and Tuzhilin, 2005].
     ⇒ Collaborative filtering: based on the user-to-item rating matrix.
     [Example: a sparse rating matrix over users u1 to u7 and items i1 to i5, with ratings between 1 and 5 and one unknown rating "?" to predict.]
  3. User-based approaches
     Recommend items appreciated by users whose tastes are similar to those of the given user [Resnick et al., 1994].
     ⇒ need a similarity measure between users, e.g. Pearson similarity (the cosine of deviations from the mean):
     $$w(a,u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}}$$
     where $v_{ui}$ is the rating of user $u$ on item $i$, $S_u$ is the set of items rated by user $u$, and $\bar{v}_u$ is the mean rating of user $u$:
     $$\bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|}$$
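The Pearson similarity defined on this slide can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the toy `ratings` dictionary, the function names, and the choice to return 0 when the similarity is undefined (no co-rated items or zero variance) are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not the slide's matrix.
ratings = {
    "a": {"i1": 4, "i2": 4, "i3": 1},
    "u": {"i1": 5, "i2": 3, "i3": 2, "i4": 4},
}


def mean_rating(user_ratings):
    """Mean rating of a user over all the items they rated (the set S_u)."""
    return sum(user_ratings.values()) / len(user_ratings)


def pearson(ratings_a, ratings_u):
    """w(a, u): cosine of the deviations from each user's mean over co-rated items."""
    common = ratings_a.keys() & ratings_u.keys()  # S_a ∩ S_u
    if not common:
        return 0.0  # assumption: no co-rated items means no similarity
    mean_a = mean_rating(ratings_a)
    mean_u = mean_rating(ratings_u)
    num = sum((ratings_a[i] - mean_a) * (ratings_u[i] - mean_u) for i in common)
    den = sqrt(
        sum((ratings_a[i] - mean_a) ** 2 for i in common)
        * sum((ratings_u[i] - mean_u) ** 2 for i in common)
    )
    return num / den if den else 0.0  # assumption: zero variance treated as 0


w = pearson(ratings["a"], ratings["u"])
print(f"w(a, u) = {w:.4f}")
```

Note that the means are taken over each user's full rating set, as in the slide's definition of the mean rating, while the sums run only over co-rated items.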
  4. User-based approaches
     Which rating for the active user $a$ on item $i$?
     Prediction using a weighted sum:
     $$p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|}$$
     Prediction using a weighted sum of deviations from the mean:
     $$p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|}$$
     How many neighbors should be considered?
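The two user-based prediction schemes can be compared on a small example. The sketch below assumes the similarity weights w(a, u) have already been computed; the `weights`, `ratings`, and `mean_a` values are invented for illustration, and returning `None` when no neighbor rated the item is a choice made here, not something the slides specify.

```python
# Hypothetical inputs: similarity of each neighbour to the active user a,
# and the neighbours' ratings (user -> {item: rating}).
weights = {"u1": 0.9, "u2": -0.4, "u3": 0.5}
ratings = {
    "u1": {"i1": 5, "i2": 4},
    "u2": {"i1": 2, "i2": 5},
    "u3": {"i2": 3},
}
mean_a = 3.5  # assumed mean rating of the active user a


def mean_rating(user_ratings):
    """Mean rating of a user over the items they rated."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict_weighted(item):
    """Weighted sum of the neighbours' ratings on the item."""
    raters = [u for u in weights if item in ratings[u]]  # {u | i in S_u}
    norm = sum(abs(weights[u]) for u in raters)
    if norm == 0:
        return None  # assumption: no usable neighbour, no prediction
    return sum(weights[u] * ratings[u][item] for u in raters) / norm


def predict_deviation(item):
    """Active user's mean plus the weighted sum of neighbours' deviations."""
    raters = [u for u in weights if item in ratings[u]]
    norm = sum(abs(weights[u]) for u in raters)
    if norm == 0:
        return None
    deviation = sum(
        weights[u] * (ratings[u][item] - mean_rating(ratings[u])) for u in raters
    )
    return mean_a + deviation / norm


p_weighted = predict_weighted("i1")
p_deviation = predict_deviation("i1")
print(p_weighted, p_deviation)
```

In this toy case the negatively correlated neighbor u2 rated i1 below its own mean, so the deviation scheme pushes the prediction up, whereas the plain weighted sum pulls it down.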
  5. Cluster-based approaches
     Recommend items appreciated by users that belong to the same group as the given user [Breese et al., 1998].
     ⇒ need
     - a clustering method, e.g. K-means
     - a distance measure, e.g. Euclidean distance
     The rating of a user on an item is then the mean rating given by the users that belong to the same cluster.
     How many clusters should be considered?
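A cluster-based predictor of this kind might look like the sketch below: a small K-means over sparse rating dictionaries, followed by prediction from the cluster's per-item mean. Several details are assumptions not given in the slides: the toy data, the fixed seed users used as initial centroids, and computing the Euclidean distance only over items present in both the user's ratings and the centroid.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not the slide's matrix.
ratings = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 4, "i2": 5},
    "u3": {"i1": 1, "i3": 5},
    "u4": {"i2": 1, "i3": 4},
}


def distance(user_ratings, centroid):
    """Euclidean distance over the items present in both (assumption)."""
    common = user_ratings.keys() & centroid.keys()
    if not common:
        return float("inf")
    return sqrt(sum((user_ratings[i] - centroid[i]) ** 2 for i in common))


def centroid_of(members):
    """Per-item mean rating of the cluster members."""
    sums, counts = {}, {}
    for u in members:
        for item, r in ratings[u].items():
            sums[item] = sums.get(item, 0) + r
            counts[item] = counts.get(item, 0) + 1
    return {item: sums[item] / counts[item] for item in sums}


def kmeans(seeds, iterations=10):
    """Plain K-means; the seed users serve as initial centroids (assumption)."""
    centroids = [dict(ratings[s]) for s in seeds]
    assignment = {}
    for _ in range(iterations):
        for u, r in ratings.items():
            dists = [distance(r, c) for c in centroids]
            assignment[u] = dists.index(min(dists))
        for c in range(len(centroids)):
            members = [u for u, k in assignment.items() if k == c]
            if members:
                centroids[c] = centroid_of(members)
    return assignment, centroids


assignment, centroids = kmeans(seeds=["u1", "u3"])


def predict(user, item):
    """Mean rating given to the item within the user's cluster, else None."""
    return centroids[assignment[user]].get(item)


print(assignment, predict("u2", "i3"))
```

Because each centroid stores the per-item mean of its members, the prediction rule from the slide reduces to a lookup in the user's centroid.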
  6. Item-based approaches
     Recommend items similar to those appreciated by the given user [Karypis, 2001].
     ⇒ dual of the user-based approach
     $$p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} sim(i,j) \times (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |sim(i,j)|}$$
     where $sim(i,j)$ is a similarity measure between items $i$ and $j$, $S_a$ is the set of items rated by user $a$, and $\bar{v}_i$ is the mean rating on item $i$.
     How many neighbors should be considered?
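The item-based prediction with deviations from the item means can be sketched as follows. The item similarities are assumed to be precomputed; `ratings_a`, `item_means`, and `sim` are invented values, and selecting the `k` neighbors with the largest absolute similarity is one plausible reading of the neighborhood size, not something the slides spell out.

```python
# Hypothetical inputs for the active user a.
ratings_a = {"i2": 5, "i3": 2}  # S_a: items rated by a, with their ratings
item_means = {"i1": 3.0, "i2": 4.0, "i3": 3.0}  # mean rating on each item
sim = {("i1", "i2"): 0.8, ("i1", "i3"): -0.2}  # precomputed sim(i, j)


def predict_item_based(item, k=None):
    """Item mean plus the similarity-weighted deviations of a's other ratings."""
    neighbours = [j for j in ratings_a if j != item and (item, j) in sim]
    # Assumption: keep the k neighbours with the largest |sim(i, j)|.
    neighbours.sort(key=lambda j: abs(sim[(item, j)]), reverse=True)
    if k is not None:
        neighbours = neighbours[:k]
    norm = sum(abs(sim[(item, j)]) for j in neighbours)
    if norm == 0:
        return item_means[item]  # assumption: fall back to the item mean
    deviation = sum(
        sim[(item, j)] * (ratings_a[j] - item_means[j]) for j in neighbours
    )
    return item_means[item] + deviation / norm


p = predict_item_based("i1")
print(p)
```

Here user a rated i2 above its mean and i3 below its mean; since i3 is negatively similar to i1, both neighbors push the prediction for i1 above the item mean.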
  7. Experiments
     For user- and item-based approaches, choose
     - similarity measure
     - prediction scheme
     - neighborhood size K
     For cluster-based approaches, choose
     - distance measure
     - prediction scheme
     - number of clusters
     Evaluation protocol [Herlocker et al., 2004]
     - movie rating dataset: MovieLens (6040 users × 3706 movies)
     - 10-fold cross-validation (10 runs, each learning on 9/10 of the data)
     - Mean Absolute Error (MAE) on the test set $T = \{(u, i, r)\}$:
     $$MAE = \frac{1}{|T|} \sum_{(u,i,r) \in T} |p_{ui} - r|$$
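The evaluation protocol can be illustrated with a short sketch: split the rating triples into 10 folds, learn on nine, and measure MAE on the tenth. The toy data, the shuffling seed, and the global-mean baseline predictor are assumptions made for the example; the baseline is only a stand-in and is not the BestDefault model from the experiments.

```python
import random

# Hypothetical toy data: (user, item, rating) triples.
data = [("u%d" % (n % 5), "i%d" % (n % 4), 1 + n % 5) for n in range(20)]


def mean_absolute_error(test_set, predict):
    """MAE = (1 / |T|) * sum of |p_ui - r| over (u, i, r) in T."""
    return sum(abs(predict(u, i) - r) for u, i, r in test_set) / len(test_set)


def k_fold_splits(triples, k=10, seed=0):
    """Yield (train, test) pairs; each fold is used once as the test set."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        train = [t for g in range(k) if g != f for t in folds[g]]
        yield train, folds[f]


fold_maes = []
for train, test in k_fold_splits(data):
    # Stand-in model: predict the global mean rating of the training set.
    global_mean = sum(r for _, _, r in train) / len(train)
    fold_maes.append(mean_absolute_error(test, lambda u, i: global_mean))

print("mean MAE over folds:", sum(fold_maes) / len(fold_maes))
```

Any of the user-, item-, or cluster-based predictors could be plugged in through the `predict` callable, so that all of them are scored on identical folds.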
  8. User-based approaches, similarity measures
     [Plot: MAE (y-axis, ticks from 0.68 to 0.8) against neighborhood size K (x-axis, 0 to 2500) for the Pearson, Constraint, Cosine, Adjusted and Proba similarity measures.]
  9. User-based approaches, prediction schemes
     [Plot: MAE (y-axis, ticks from 0.68 to 0.8) against neighborhood size K (x-axis, 0 to 2500) for PearsonWeighted, PearsonDeviation, ProbaWeighted and ProbaDeviation.]
  10. Item-based approaches, similarity measures
      [Plot: MAE (y-axis, ticks from 0.64 to 0.76) against neighborhood size K (x-axis, 0 to 1400) for the Pearson, Constraint, Cosine, Adjusted and Proba similarity measures.]
  11. Summary of experiments

                                          BestDefault  BestUser  BestItem  BestCluster
      model construction time (in sec.)             1       730       170          254
      prediction time (in sec.)                     1        31         3            1
      MAE                                      0.6829    0.6688    0.6382       0.6736

      BestDefault: Bayes minimizing MAE
      BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
      BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
      BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
  12. Conclusions
      - All approaches, and all their possible options, are tested under exactly the same conditions
      - Bayes is a good compromise: low error rate, low execution time, incremental
      - Deviation from the mean: better results, new for item-based approaches
      - Similarity measures: Pearson for user-based, probabilistic for item-based
  13. Conclusions
      The item-based approach
      - achieves the best performance in the experiments
      - seems to need fewer neighbors than the user-based approach
      - is also appropriate for navigating item catalogues, even with no user information
      - may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data)
      Do the results depend on the number of items compared to the number of users?
  14. Next
      Need to scale well even when faced with huge datasets, e.g. the Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
      - select the most relevant users [Yu et al., 2002]
      - reduce dimensionality with PCA or SVD [Goldberg et al., 2001, Vozalis and Margaritis, 2005]
      - create a set of super-users [Rashid et al., 2006]
      - sampling? stochastic? bagging?
      Combine approaches ⇒ ensemble methods [Polikar, 2006]
  15. P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175–186. ACM.
      J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann.
      G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247–254.
  16. K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151.
      K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.
      J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53.
  17. G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749.
      M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469.
      A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.
      R. Polikar (2006). Ensemble systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21–45.