Data-driven modeling: Lecture 09


  1. Data-driven modeling
     APAM E4990
     Jake Hofman
     Columbia University
     April 2, 2012
  2. Personalized recommendations
  3. Personalized recommendations
  4. http://netflixprize.com
  5. http://netflixprize.com/rules
  6. http://netflixprize.com/faq
  7. Netflix prize: results
     http://en.wikipedia.org/wiki/Netflix_Prize
  8. Netflix prize: results
     See [TJB09] and [Kor09] for more gory details.
  9. Recommendation systems
     High-level approaches:
     • Content-based methods (e.g., w_{genre: thrillers} = +2.3, w_{director: Coen brothers} = +1.7)
     • Collaborative methods (e.g., "Users who liked this also liked")
  10. Netflix prize: data
      (userid, movieid, rating, date)
  11. Netflix prize: data
      (movieid, year, title)
  12. Recommendation systems
      High-level approaches:
      • Content-based methods (e.g., w_{genre: thrillers} = +2.3, w_{director: Coen brothers} = +1.7)
      • Collaborative methods (e.g., "Users who liked this also liked")
  13. Collaborative filtering
      Two broad families: memory-based methods (e.g., k-nearest neighbors) and model-based methods (e.g., matrix factorization).
      http://research.yahoo.com/pub/2859
  14. Problem statement
      • Given a set of past ratings R_{ui} that user u gave item i
      • Users may explicitly assign ratings, e.g., R_{ui} ∈ [1, 5] is the number of stars for a movie rating
      • Or we may infer implicit ratings from user actions, e.g., R_{ui} = 1 if u purchased i; otherwise R_{ui} = ?
  15. Problem statement (continued)
      • Make recommendations of several forms:
        • Predict unseen item ratings for a particular user
        • Suggest items for a particular user
        • Suggest items similar to a particular item
        • ...
  16. Problem statement (continued)
      • Compare to natural baselines:
        • Guess the global average for item ratings
        • Suggest globally popular items
  17. k-nearest neighbors
      Key intuition: take a local popularity vote amongst "similar" users.
  18. k-nearest neighbors: user similarity
      Quantify similarity as a function of users' past ratings, e.g., as the fraction of items u and v have in common:
      S_{uv} = \frac{|r_u \cap r_v|}{|r_u \cup r_v|} = \frac{\sum_i R_{ui} R_{vi}}{\sum_i (R_{ui} + R_{vi} - R_{ui} R_{vi})}   (1)
  19. k-nearest neighbors: user similarity (continued)
      ... or as the angle between rating vectors:
      S_{uv} = \frac{r_u \cdot r_v}{|r_u| |r_v|} = \frac{\sum_i R_{ui} R_{vi}}{\sqrt{\sum_i R_{ui}^2} \sqrt{\sum_j R_{vj}^2}}   (1)
      Retain the top-k most similar neighbors v for each user u.
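To make the two similarity measures concrete, here is a minimal Python sketch (not from the slides; the dict-of-dicts `ratings` layout and all names are illustrative assumptions), treating each user's ratings as a sparse vector keyed by item:

```python
import math

# Toy explicit ratings: user -> {item: rating}
ratings = {
    "alice": {"matrix": 5, "fargo": 4, "up": 1},
    "bob":   {"matrix": 4, "fargo": 5},
    "carol": {"up": 5, "fargo": 2},
}

def jaccard_similarity(ru, rv):
    """Fraction of items in common, |r_u ∩ r_v| / |r_u ∪ r_v| (slide 18's eq. 1,
    which reduces to this set form for binary ratings)."""
    union = set(ru) | set(rv)
    return len(set(ru) & set(rv)) / len(union) if union else 0.0

def cosine_similarity(ru, rv):
    """Angle between sparse rating vectors, r_u · r_v / (|r_u| |r_v|) (slide 19's eq. 1)."""
    dot = sum(ru[i] * rv[i] for i in set(ru) & set(rv))
    norm_u = math.sqrt(sum(x * x for x in ru.values()))
    norm_v = math.sqrt(sum(x * x for x in rv.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

print(jaccard_similarity(ratings["alice"], ratings["bob"]))  # 2/3: two shared of three total items
print(cosine_similarity(ratings["alice"], ratings["bob"]))   # ~0.96: near-parallel on shared items
```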
  20. k-nearest neighbors: predicted ratings
      Predict unseen ratings \hat{R}_{ui} as a weighted vote over u's neighbors' ratings for item i:
      \hat{R}_{ui} = \frac{\sum_v S_{uv} R_{vi}}{\sum_v S_{uv}}   (2)
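Continuing the toy sketch above, the weighted vote in equation (2) might look like this (again illustrative, not the lecture's code); in practice one falls back to a global baseline when no neighbor has rated i:

```python
def predict_rating(ratings, u, i, k=10, sim=cosine_similarity):
    """Weighted vote over u's top-k most similar neighbors who have rated i (eq. 2)."""
    neighbors = sorted(
        (v for v in ratings if v != u and i in ratings[v]),
        key=lambda v: sim(ratings[u], ratings[v]),
        reverse=True,
    )[:k]
    num = sum(sim(ratings[u], ratings[v]) * ratings[v][i] for v in neighbors)
    den = sum(sim(ratings[u], ratings[v]) for v in neighbors)
    return num / den if den else None  # no co-rating neighbors: back off to a baseline

print(predict_rating(ratings, "bob", "up"))  # pulled between carol's 5 and alice's 1
```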
  21. k-nearest neighbors: practical notes
      We expect most pairs of users to have nothing in common, so calculate similarities by looping over items rather than over all user pairs:
      for each item i:
          for all pairs of users u, v that have rated i:
              calculate S_{uv} (if not already calculated)
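One way to turn that pseudocode into runnable Python (data layout and names assumed, as above) is to enumerate co-rating pairs item by item, so users with no items in common are never compared:

```python
from collections import defaultdict
from itertools import combinations

def pairwise_similarities(ratings, sim):
    """Compute S_uv only for user pairs that have co-rated at least one item."""
    raters = defaultdict(list)  # item -> users who rated it
    for u, items in ratings.items():
        for i in items:
            raters[i].append(u)
    S = {}
    for i, users in raters.items():                  # for each item i ...
        for u, v in combinations(sorted(users), 2):  # ... all pairs who rated i
            if (u, v) not in S:                      # if not already calculated
                S[(u, v)] = sim(ratings[u], ratings[v])
    return S

S = pairwise_similarities(ratings, cosine_similarity)
```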
  22. k-nearest neighbors: practical notes
      Alternatively, we can make recommendations using an item-based approach [LSY03]:
      • Compute similarities S_{ij} between all pairs of items
      • Predict ratings with a weighted vote: \hat{R}_{ui} = \sum_j R_{uj} S_{ij} / \sum_j S_{ij}
  23. k-nearest neighbors: practical notes
      Several (relatively) simple ways to scale:
      • Sample a subset of ratings for each user (e.g., by recency)
      • Use MinHash to cluster users [DDGR07]
      • Distribute calculations with MapReduce
  24. Matrix factorization
      Key intuition: model item attributes as belonging to a set of unobserved "topics", and user preferences across these "topics".
  25. Matrix factorization: linear model
      Start with a simple linear model:
      \hat{R}_{ui} = b_0 + b_u + b_i   (3)
      where b_0 is the global average, b_u is the user bias, and b_i is the item bias.
  26. Matrix factorization: linear model
      For example, we might predict that a harsh critic would score a popular movie as
      \hat{R}_{ui} = 3.6 + (-0.5) + 0.8 = 3.9   (4)
      (global average + user bias + item bias)
  27. Matrix factorization: low-rank approximation
      Add a user-item interaction term:
      \hat{R}_{ui} = b_0 + b_u + b_i + W_{ui}   (5)
      where W_{ui} = p_u \cdot q_i = \sum_k P_{uk} Q_{ik}:
      • P_{uk} is user u's preference for topic k
      • Q_{ik} is item i's association with topic k
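As a sketch of how the bias terms and the interaction combine, assuming small NumPy arrays whose shapes and names are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_topics = 4, 6, 2

b0 = 3.6                                      # global average
b_user = rng.normal(0, 0.1, n_users)          # user biases b_u
b_item = rng.normal(0, 0.1, n_items)          # item biases b_i
P = rng.normal(0, 0.1, (n_users, n_topics))   # P_uk: user-topic preferences
Q = rng.normal(0, 0.1, (n_items, n_topics))   # Q_ik: item-topic associations

def predict(u, i):
    """R_hat_ui = b0 + b_u + b_i + p_u · q_i (eq. 5)."""
    return b0 + b_user[u] + b_item[i] + P[u] @ Q[i]

print(predict(0, 3))
```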
  28. Matrix factorization: loss function
      Measure the quality of the model fit with the squared loss:
      L = \sum_{(u,i)} (\hat{R}_{ui} - R_{ui})^2   (6)
        = \sum_{(u,i)} ([P Q^\top]_{ui} - R_{ui})^2   (7)
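Given the `predict` sketch above, the squared loss over a list of observed (u, i, r) triples is a one-liner (eq. 6); the triples below are made-up toy data:

```python
def squared_loss(observed):
    """L = sum over observed (u, i, r) of (R_hat_ui - r)^2."""
    return sum((predict(u, i) - r) ** 2 for u, i, r in observed)

observed = [(0, 1, 4.0), (0, 2, 3.5), (1, 1, 2.0)]  # toy (user, item, rating) triples
print(squared_loss(observed))
```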
  29. Matrix factorization: optimization
      The loss is not jointly convex in (P, Q), so we cannot solve for a global minimum directly. Instead we can optimize L iteratively, e.g.:
      • Alternating least squares: update each row of P holding Q fixed, and vice versa
      • Stochastic gradient descent: update the individual rows p_u and q_i for each observed R_{ui}
  30. Matrix factorization: alternating least squares
      L is convex in the rows of P with Q fixed (and in the rows of Q with P fixed), so alternate between solving the normal equations:
      p_u = (Q^{(u)\top} Q^{(u)})^{-1} Q^{(u)\top} r^{(u)}   (8)
      q_i = (P^{(i)\top} P^{(i)})^{-1} P^{(i)\top} r^{(i)}   (9)
      where:
      • Q^{(u)} is the item association matrix restricted to items rated by user u
      • P^{(i)} is the user preference matrix restricted to users that have rated item i
      • r^{(u)} are the ratings by user u, and r^{(i)} are the ratings on item i
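A minimal NumPy sketch of one round of the alternating updates, for the interaction-only model \hat{R} = PQ^\top, continuing the assumed arrays above (the slides omit regularization; a small ridge term lam * I is added here only to keep the normal equations well posed, e.g., when a user has rated fewer items than there are topics):

```python
from collections import defaultdict
import numpy as np

def als_round(observed, P, Q, lam=0.1):
    """One ALS pass: solve the normal equations per user row, then per item row."""
    k = P.shape[1]
    by_user, by_item = defaultdict(list), defaultdict(list)
    for u, i, r in observed:
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for u, pairs in by_user.items():             # eq. (8): p_u given Q
        Qu = np.array([Q[i] for i, _ in pairs])  # Q restricted to items u rated
        ru = np.array([r for _, r in pairs])     # u's ratings
        P[u] = np.linalg.solve(Qu.T @ Qu + lam * np.eye(k), Qu.T @ ru)
    for i, pairs in by_item.items():             # eq. (9): q_i given P
        Pi = np.array([P[u] for u, _ in pairs])  # P restricted to raters of i
        ri = np.array([r for _, r in pairs])     # i's ratings
        Q[i] = np.linalg.solve(Pi.T @ Pi + lam * np.eye(k), Pi.T @ ri)
    return P, Q

P, Q = als_round(observed, P, Q)  # repeat until the loss stops improving
```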
  31. Matrix factorization: stochastic gradient descent
      Alternatively, we can avoid inverting matrices by taking steps in the direction of the negative gradient for each observed rating:
      p_u \leftarrow p_u - \eta \frac{\partial L}{\partial p_u} = p_u + \eta (R_{ui} - \hat{R}_{ui}) q_i   (10)
      q_i \leftarrow q_i - \eta \frac{\partial L}{\partial q_i} = q_i + \eta (R_{ui} - \hat{R}_{ui}) p_u   (11)
      for some step size \eta (constant factors absorbed into \eta)
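And a matching sketch of the gradient steps (10) and (11), again for the interaction-only model under the same assumed setup, with the constant factor of 2 absorbed into the step size eta:

```python
def sgd_epoch(observed, P, Q, eta=0.01):
    """One SGD pass over the observed (u, i, r) ratings (eqs. 10-11)."""
    for u, i, r in observed:
        err = r - P[u] @ Q[i]      # R_ui - R_hat_ui
        pu = P[u].copy()           # update both rows from the same pre-update state
        P[u] += eta * err * Q[i]   # eq. (10)
        Q[i] += eta * err * pu     # eq. (11)
    return P, Q

P, Q = sgd_epoch(observed, P, Q)  # repeat for several epochs, typically decaying eta
```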
  32. Matrix factorization: practical notes
      Several ways to scale:
      • Distribute matrix operations with MapReduce [GHNS11]
      • Parallelize stochastic gradient descent [ZWSL10]
      • Expectation-maximization for pLSI with MapReduce [DDGR07]
  33. Datasets
      • MovieLens: http://www.grouplens.org/node/12
      • Reddit: http://bit.ly/redditdata
      • CU "million songs": http://labrosa.ee.columbia.edu/millionsong/
      • Yahoo Music KDD Cup: http://kddcup.yahoo.com/
      • AudioScrobbler: http://bit.ly/audioscrobblerdata
      • Delicious: http://bit.ly/deliciousdata
      • ...
  34. Photo recommendations
      http://koala.sandbox.yahoo.com
  35. References I
      [DDGR07] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google News personalization: scalable online collaborative filtering. 2007.
      [GHNS11] R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. 2011.
      [Kor09] Y. Koren. The BellKor solution to the Netflix Grand Prize. 2009.
      [LSY03] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.
  36. References II
      [TJB09] A. Töscher, M. Jahrer, and R. M. Bell. The BigChaos solution to the Netflix Grand Prize. 2009.
      [ZWSL10] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In Neural Information Processing Systems (NIPS), 2010.
