Collaborative filtering intro - Full

A new version of the collaborative filtering talk:
- Presenting the Netflix Prize story
- Discussing User-based and Item-based collaborative filtering, and various similarity metrics
- Discussing how to Map-Reduce the calculation


  1. WE KNOW YOU WILL LIKE THIS: Introduction to Recommendation Engines (Monday, January 14, 2013)
  2. ML (taxonomy diagram)
      - Unsupervised (X only): Clustering, Hierarchical Clustering
      - Supervised (X + Y): Regression, Classification
      - Regression: Y is numeric (example column “Turnout”: 30, 12, 25)
      - Classification: Y is categorical (example column “Class”: Spam, Not Spam, Spam)
  3. Recommendation (diagram labels): Content/Model-Based; (Agnostic, Behavioural); (predicting the rating)

                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         5       ?        3        ?
        Shahar       4       3        ?        2
        Gadi         ?       1        ?        5
  12. Preference Problem (Ads); Rating Problem (Movies)
  14. Related problem: Ranking
  15. Preference (binary) matrix and rating matrix side by side:

                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         1       ?        1        ?
        Shahar       1       1        ?        1
        Gadi         ?       1        ?        1

                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         5       ?        3        ?
        Shahar       4       3        ?        2
        Gadi         ?       1        ?        5
  16. Preference (binary) matrix:

                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         1       ?        1        ?
        Shahar       1       1        ?        1
        Gadi         ?       1        ?        1
  17. Rating matrix:

                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         5       ?        3        ?
        Shahar       4       3        ?        2
        Gadi         ?       1        ?        5
  18. User-based Collaborative Filtering
  20. Similarity and distance measures
      - Jaccard Distance: “We share 5 preferences out of 7!”
      - Euclidean Distance
      - Cosine Similarity, Pearson’s Correlation: “Our preferences go in the same direction!” (but only 2 such preferences do...)
      - 1 - Distance
      - Log-Likelihood Ratio: Measure of “Surprise” at correlation
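The measures named on slide 20 can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not code from the talk: the function names are hypothetical, the inputs are Shahar's and Gadi's rows from the deck's example tables, and the rating-based measures are computed only over the two items both users rated (Karnaf and Liv), which is an assumption about how "?" entries are handled.

```python
import math

def jaccard_similarity(a, b):
    """Shared preferences divided by all preferences either user expressed.
    The Jaccard distance is 1 minus this value."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length rating vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return dot / norm if norm else 0.0

def pearson_correlation(x, y):
    """Cosine similarity after subtracting each vector's own mean."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([p - mean_x for p in x], [q - mean_y for q in y])

# Binary preferences of Shahar and Gadi from the example table.
shahar_likes = {"Maraboo", "Karnaf", "Liv"}
gadi_likes = {"Karnaf", "Liv"}

# Ratings on the items both users rated: (Karnaf, Liv).
shahar_ratings = [3, 2]
gadi_ratings = [1, 5]

print("Jaccard similarity:", jaccard_similarity(shahar_likes, gadi_likes))
print("Euclidean distance:", euclidean_distance(shahar_ratings, gadi_ratings))
print("Cosine similarity: ", cosine_similarity(shahar_ratings, gadi_ratings))
print("Pearson correlation:", pearson_correlation(shahar_ratings, gadi_ratings))
```

With only two co-rated items, Pearson's correlation can only come out as +1 or -1 (when it is defined at all), which is one way to read the slide's caveat that a correlation resting on 2 shared preferences says very little.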
  21. Item-Based Collaborative Filtering. Usually bounded
  22. Case study: Amazon
      - 100,000,000 users
      - 2,000,000 items
      - Each user expresses preference for 10 items
      - Each item has 500 reviews

                             User-Based CF               Item-Based CF
        Similarity matrix    100,000,000 x 100,000,000   2,000,000 x 2,000,000
        Sum terms            2,000,000 x 500             2,000,000 x 10
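The figures on slide 22 can be checked with plain arithmetic. The sketch below only multiplies out the numbers quoted on the slide; the variable names are hypothetical, and the "sum terms" products are reproduced exactly as written on the slide without further interpretation.

```python
users = 100_000_000
items = 2_000_000
prefs_per_user = 10
reviews_per_item = 500

# Counting preferences by user or by item gives the same total.
prefs_counted_by_user = users * prefs_per_user
prefs_counted_by_item = items * reviews_per_item

# Entries in a dense, all-pairs similarity matrix.
user_matrix_entries = users * users
item_matrix_entries = items * items

# Sum-term counts, multiplied out exactly as written on the slide.
user_based_sum_terms = items * reviews_per_item
item_based_sum_terms = items * prefs_per_user

print(f"preferences (by user):  {prefs_counted_by_user:,}")
print(f"preferences (by item):  {prefs_counted_by_item:,}")
print(f"user-user entries:      {user_matrix_entries:,}")
print(f"item-item entries:      {item_matrix_entries:,}")
print(f"user-based sum terms:   {user_based_sum_terms:,}")
print(f"item-based sum terms:   {item_based_sum_terms:,}")
```

On these numbers the user-user matrix has 2,500 times as many entries as the item-item matrix, and the user-based sum has 50 times as many terms.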
  23. Interpretability
      - “People who go to La Colombe Torrefaction & FourSquare HQ tend to come here”
      - “Coffee Shop connoisseurs tend to go here”
  24. Evaluation
      - Rating Problem: predictive accuracy (regression) metrics: RMSE, MAE, etc.
      - Preference (Binary) Problem: classification accuracy (IR) metrics: Accuracy, Precision, Recall, F-1, ROC, etc.
      - Benchmark vs. ‘random’ and ‘popular’
      - Ranking accuracy metrics: similarity of permutations: Pearson’s correlation, Spearman’s rho, Kendall’s tau
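A minimal sketch of the first two metric families on slide 24, using only the standard library. The function names and the sample data (three ratings, item labels `a` to `e`) are made up for illustration and do not come from the talk.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between true and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def mae(actual, predicted):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def precision_recall_f1(recommended, relevant):
    """Classification-style metrics for a binary preference problem."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

# Hypothetical held-out ratings and the model's predictions for them.
actual_ratings = [5, 3, 4]
predicted_ratings = [4, 3, 2]

# Hypothetical recommendation list and the items the user actually liked.
recommended_items = ["a", "b", "c"]
relevant_items = ["b", "c", "d", "e"]

print("RMSE:", rmse(actual_ratings, predicted_ratings))
print("MAE: ", mae(actual_ratings, predicted_ratings))
print("P/R/F1:", precision_recall_f1(recommended_items, relevant_items))
```

The same functions can score the ‘random’ and ‘popular’ baselines the slide mentions, so a model's numbers are read against those rather than in isolation.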
  26. Challenges
      - Cold-start problems (new item, new user)
      - “Black” and “Grey” sheep
      - Exploration-exploitation and reinforcement learning
      - Scale
  27. Advanced Topics
      - Dimensionality Reduction
      - Map-Reducible calculations
      - Content-based (feature-based)
      - Multiple models
  28. MapReduce Similarity Calculation: “User-based”

      A (users x items):
                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         1       ?        1        ?
        Shahar       1       1        ?        1
        Gadi         ?       1        ?        1

      u_i (Gadi): Maraboo ?, Karnaf 1, Ima Adama ?, Liv 1
      A u_i (user similarity vector): Idan 0, Shahar 2, Gadi 2

      A^T (items x users):
                    Idan  Shahar  Gadi
        Maraboo      1      1      ?
        Karnaf       ?      1      1
        Ima Adama    1      ?      ?
        Liv          ?      1      1

      A^T (A u_i): Maraboo 2, Karnaf 4, Ima Adama 0, Liv 4
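The two products on slide 28 can be reproduced with plain Python lists. This is a sketch under one assumption that the slide's results imply: every "?" is treated as 0. The function and variable names are hypothetical.

```python
ITEMS = ["Maraboo", "Karnaf", "Ima Adama", "Liv"]

# Binary preference matrix A (users x items); "?" entries are stored as 0.
A = {
    "Idan":   [1, 0, 1, 0],
    "Shahar": [1, 1, 0, 1],
    "Gadi":   [0, 1, 0, 1],
}

def user_based_scores(matrix, target_user):
    """Return (A u_i, A^T (A u_i)) for the target user's row u_i."""
    u = matrix[target_user]
    # A u_i: dot product of every user's row with the target row.
    similarity = {
        user: sum(a * b for a, b in zip(row, u))
        for user, row in matrix.items()
    }
    # A^T (A u_i): each item's score is the similarity-weighted sum of its column.
    scores = [
        sum(similarity[user] * row[j] for user, row in matrix.items())
        for j in range(len(u))
    ]
    return similarity, scores

similarity, scores = user_based_scores(A, "Gadi")
print(similarity)
print(dict(zip(ITEMS, scores)))
```

The resulting scores include items Gadi already has (Karnaf, Liv), so a real recommender would filter those out before ranking; that filtering step is not shown on the slide.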
  29. MapReduce Similarity Calculation: “Item-Based”

      A^T (items x users):
                    Idan  Shahar  Gadi
        Maraboo      1      1      ?
        Karnaf       ?      1      1
        Ima Adama    1      ?      ?
        Liv          ?      1      1

      A (users x items):
                  Maraboo  Karnaf  Ima Adama  Liv
        Idan         1       ?        1        ?
        Shahar       1       1        ?        1
        Gadi         ?       1        ?        1

      A^T A (item similarity matrix):
                    Maraboo  Karnaf  Ima Adama  Liv
        Maraboo        2       1        1        1
        Karnaf         1       2        0        2
        Ima Adama      1       0        1        0
        Liv            1       2        0        2

      u_i (Gadi): Maraboo ?, Karnaf 1, Ima Adama ?, Liv 1
      (A^T A) u_i: Maraboo 2, Karnaf 4, Ima Adama 0, Liv 4

      Similarity of item x to item y is <i_x, i_y>
  30. MapReduce Similarity Calculation
      Recall row outer-product matrix multiplication:
      A^T A = u_Idan u_Idan^T + u_Shahar u_Shahar^T + u_Gadi u_Gadi^T

      A^T A:
                    Maraboo  Karnaf  Ima Adama  Liv
        Maraboo        2       1        1        1
        Karnaf         1       2        0        2
        Ima Adama      1       0        1        0
        Liv            1       2        0        2

      u_Idan u_Idan^T:
                    Maraboo  Karnaf  Ima Adama  Liv
        Maraboo        1       0        1        0
        Karnaf         0       0        0        0
        Ima Adama      1       0        1        0
        Liv            0       0        0        0

      u_Shahar u_Shahar^T:
                    Maraboo  Karnaf  Ima Adama  Liv
        Maraboo        1       1        0        1
        Karnaf         1       1        0        1
        Ima Adama      0       0        0        0
        Liv            1       1        0        1

      u_Gadi u_Gadi^T:
                    Maraboo  Karnaf  Ima Adama  Liv
        Maraboo        0       0        0        0
        Karnaf         0       1        0        1
        Ima Adama      0       0        0        0
        Liv            0       1        0        1

      Only one user’s list of items is used every time!
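The outer-product decomposition on slide 30 maps naturally onto a map step and a reduce step. The sketch below simulates both in a single Python process; it is not tied to Hadoop or any MapReduce framework, and the names `map_user` and `reduce_counts` are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Each user's list of preferred items, taken from the example matrix.
prefs = {
    "Idan":   ["Maraboo", "Ima Adama"],
    "Shahar": ["Maraboo", "Karnaf", "Liv"],
    "Gadi":   ["Karnaf", "Liv"],
}

def map_user(items):
    """Map step: emit the non-zero entries of one user's outer product u u^T.
    Only this user's own item list is needed."""
    for x, y in product(items, repeat=2):
        yield (x, y), 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted values per (item_x, item_y) key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

emitted = (pair for items in prefs.values() for pair in map_user(items))
cooccurrence = reduce_counts(emitted)

# Pairs that never co-occur are simply absent, i.e. a zero entry of A^T A.
print(cooccurrence.get(("Karnaf", "Liv"), 0))
print(cooccurrence.get(("Karnaf", "Ima Adama"), 0))
```

Because each mapper call sees a single user's list, the work can be split across users with no shared state, which is the point the slide makes with "Only one user's list of items is used every time!"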
  31. MapReduce Similarity Calculation
      All of the classic similarity functions are made up of 3 stages:
      - Preprocess (uses only one ELEMENT)
      - Norm (can be done in reduce on one VECTOR)
      - Similarity utilizes the A^T A matrix joined with norm entries
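As one instance of the three stages on slide 31, cosine similarity between items can be sketched as follows. Choosing cosine, using the identity function as the preprocessing step, and all function names are assumptions made for illustration; the slide does not spell out any particular measure.

```python
import math
from collections import defaultdict

# Binary preferences from the example matrix; a missing item means "?".
prefs = {
    "Idan":   {"Maraboo": 1, "Ima Adama": 1},
    "Shahar": {"Maraboo": 1, "Karnaf": 1, "Liv": 1},
    "Gadi":   {"Karnaf": 1, "Liv": 1},
}

def preprocess(value):
    """Stage 1: transform a single ELEMENT (identity for plain cosine)."""
    return float(value)

def item_norms(prefs):
    """Stage 2: Euclidean norm of each item VECTOR, grouped by item."""
    squares = defaultdict(float)
    for row in prefs.values():
        for item, value in row.items():
            squares[item] += preprocess(value) ** 2
    return {item: math.sqrt(total) for item, total in squares.items()}

def item_dots(prefs):
    """Non-zero entries of A^T A, accumulated one user row at a time."""
    dots = defaultdict(float)
    for row in prefs.values():
        for x, vx in row.items():
            for y, vy in row.items():
                dots[(x, y)] += preprocess(vx) * preprocess(vy)
    return dots

def cosine_similarities(prefs):
    """Stage 3: join each A^T A entry with the norms of its two items."""
    norms = item_norms(prefs)
    return {
        (x, y): dot / (norms[x] * norms[y])
        for (x, y), dot in item_dots(prefs).items()
    }

similarities = cosine_similarities(prefs)
for pair in [("Karnaf", "Liv"), ("Maraboo", "Karnaf"), ("Maraboo", "Ima Adama")]:
    print(pair, round(similarities[pair], 4))
```

Other measures would swap in a different `preprocess` (for example mean-centering for Pearson's correlation) and a different final join, while the A^T A accumulation stays the same.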
  32. Bibliography
      - Google News Personalization: Scalable Online Collaborative Filtering - Das, Datar, Garg, Rajaram, WWW 2007
      - Logistic Regression and Collaborative Filtering for Sponsored Search Term Recommendation - Bartz, Murthi, Sebastian, EC 2006
      - Evaluating Collaborative Filtering Recommender Systems - Herlocker, Konstan, Terveen, Riedl, ACM TOIS 2004
      - A Survey of Collaborative Filtering Techniques - Su, Khoshgoftaar, AAI 2009
      - An Introduction to Information Retrieval - Manning, Raghavan, Schutze, Cambridge Press
      - Mahout in Action - Friedman, Dunning, Anil, Owen, Manning Publications
      - Lessons from the Netflix Prize Challenge - Bell, Koren, KDD 2009
      - Factorization meets the Neighbourhood: a Multifaceted Collaborative Filtering Model - Koren, KDD 2008
      - Accurate Methods for the Statistics of Surprise and Coincidence - Dunning, ACL 1993
      - Item-Based Collaborative Filtering Recommendation Algorithms - Sarwar, Konstan, Karypis, Riedl, WWW 2001
      - Matrix Factorization Techniques for Recommender Systems - Koren, Bell, Volinsky, IEEE 2009
      - recommenderlab: A Framework for Developing and Testing Recommendation Algorithms - Hahsler, 2001
      - Scalable Similarity-Based Neighbourhood Methods with MapReduce - Schelter, Boden, Markl, RecSys 2012
  33. Thanks! Nimrod Priell, nimrod.priell@gmail.com, @nimrodpriell, http://www.educated-guess.com