- 1. Data-driven modeling, APAM E4990. Jake Hofman, Columbia University. April 2, 2012.
- 2. Personalized recommendations
- 3. Personalized recommendations
- 4. http://netflixprize.com
- 5. http://netflixprize.com/rules
- 6. http://netflixprize.com/faq
- 7. Netflix prize: results. http://en.wikipedia.org/wiki/Netflix_Prize
- 8. Netflix prize: results. See [TJB09] and [Kor09] for more gory details.
- 9. Recommendation systems. High-level approaches:
  • Content-based methods (e.g., w(genre: thrillers) = +2.3, w(director: coen brothers) = +1.7)
  • Collaborative methods (e.g., "Users who liked this also liked")
- 10. Netflix prize: data. (userid, movieid, rating, date)
- 11. Netflix prize: data. (movieid, year, title)
- 12. Recommendation systems. High-level approaches:
  • Content-based methods (e.g., w(genre: thrillers) = +2.3, w(director: coen brothers) = +1.7)
  • Collaborative methods (e.g., "Users who liked this also liked")
- 13. Collaborative filtering. Memory-based (e.g., k-nearest neighbors) vs. model-based (e.g., matrix factorization). http://research.yahoo.com/pub/2859
- 14. Problem statement.
  • Given a set of past ratings R_ui that user u gave item i
  • Users may explicitly assign ratings, e.g., R_ui ∈ [1, 5] is the number of stars for a movie rating
  • Or we may infer implicit ratings from user actions, e.g., R_ui = 1 if u purchased i; otherwise R_ui = ?
- 15. Problem statement.
  • Given a set of past ratings R_ui that user u gave item i
  • Make recommendations of several forms:
    • Predict unseen item ratings for a particular user
    • Suggest items for a particular user
    • Suggest items similar to a particular item
    • ...
- 16. Problem statement.
  • Given a set of past ratings R_ui that user u gave item i
  • Make recommendations of several forms
  • Compare to natural baselines:
    • Guess the global average for item ratings
    • Suggest globally popular items
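The two baselines in the last bullet can be sketched in a few lines. This is a minimal illustration, not code from the lecture; the toy ratings dictionary and function names are assumptions.

```python
# Baseline recommenders: predict the global average rating for every unseen
# (user, item) pair, and suggest the most-rated items overall.
# Toy ratings held as {(user, item): rating} -- hypothetical data.
ratings = {
    ("u1", "m1"): 5, ("u1", "m2"): 3,
    ("u2", "m1"): 4, ("u2", "m3"): 2,
    ("u3", "m2"): 4,
}

def global_average(R):
    """Mean of all observed ratings."""
    return sum(R.values()) / len(R)

def popular_items(R, n=2):
    """Items ranked by how many ratings they have received."""
    counts = {}
    for (_, item) in R:
        counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)[:n]

print(global_average(ratings))  # 18 / 5 = 3.6
print(popular_items(ratings))
```

Any real recommender should beat both of these before its extra complexity is worth anything.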
- 17. k-nearest neighbors. Key intuition: take a local popularity vote amongst "similar" users.
- 18. k-nearest neighbors: user similarity. Quantify similarity as a function of users' past ratings, e.g.:
  • Fraction of items u and v have in common:
    S_uv = |r_u ∩ r_v| / |r_u ∪ r_v| = Σ_i R_ui R_vi / Σ_i (R_ui + R_vi − R_ui R_vi)   (1)
  Retain the top-k most similar neighbors v for each user u.
- 19. k-nearest neighbors: user similarity. Quantify similarity as a function of users' past ratings, e.g.:
  • Angle between rating vectors:
    S_uv = (r_u · r_v) / (|r_u| |r_v|) = Σ_i R_ui R_vi / (√(Σ_i R_ui²) √(Σ_j R_vj²))   (1)
  Retain the top-k most similar neighbors v for each user u.
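For binary (implicit) ratings, both similarity measures reduce to set operations on the items each user has rated. A minimal sketch, assuming toy rating sets that are not from the slides:

```python
from math import sqrt

# Two user-similarity measures on binary rating vectors, represented as
# sets of item ids (toy data).
u = {"m1", "m2", "m3"}
v = {"m2", "m3", "m4"}

def jaccard(ru, rv):
    """Fraction of items in common: |ru ∩ rv| / |ru ∪ rv|, eq. (1)."""
    return len(ru & rv) / len(ru | rv)

def cosine(ru, rv):
    """Angle between binary vectors: overlap / sqrt(|ru| * |rv|)."""
    return len(ru & rv) / sqrt(len(ru) * len(rv))

print(jaccard(u, v))  # 2 / 4 = 0.5
print(cosine(u, v))   # 2 / 3
```

With star ratings instead of binary ones, the sums over R_ui R_vi in the equations above replace the set intersections.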
- 20. k-nearest neighbors: predicted ratings. Predict unseen ratings R̂_ui as a weighted vote over u's neighbors' ratings for item i:
    R̂_ui = Σ_v S_uv R_vi / Σ_v S_uv   (2)
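The weighted vote in eq. (2) can be sketched as follows; the neighbor similarities and ratings are hypothetical toy values.

```python
# Predict user u's rating of item i as a similarity-weighted average of
# the ratings given by u's top-k neighbors (toy values).
neighbors = {"v1": 0.9, "v2": 0.5, "v3": 0.1}   # S_uv for u's neighbors
neighbor_ratings = {"v1": 4, "v2": 5, "v3": 2}  # R_vi for item i

def predict(sim, item_ratings):
    """R̂_ui = sum_v S_uv R_vi / sum_v S_uv over neighbors that rated i."""
    num = sum(sim[v] * item_ratings[v] for v in sim if v in item_ratings)
    den = sum(sim[v] for v in sim if v in item_ratings)
    return num / den if den else None

print(predict(neighbors, neighbor_ratings))  # (3.6 + 2.5 + 0.2) / 1.5 = 4.2
```

Neighbors who never rated item i simply drop out of both sums.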
- 21. k-nearest neighbors: practical notes. We expect most pairs of users to have nothing in common, so calculate similarities as:
    for each item i:
      for all pairs of users u, v that have rated i:
        calculate S_uv (if not already calculated)
- 22. k-nearest neighbors: practical notes. Alternatively, we can make recommendations using an item-based approach [LSY03]:
  • Compute similarities S_ij between all pairs of items
  • Predict ratings with a weighted vote: R̂_ui = Σ_j R_uj S_ij / Σ_j S_ij
- 23. k-nearest neighbors: practical notes. Several (relatively) simple ways to scale:
  • Sample a subset of ratings for each user (by, e.g., recency)
  • Use MinHash to cluster users [DDGR07]
  • Distribute calculations with MapReduce
- 24. Matrix factorization. Key intuition: model item attributes as belonging to a set of unobserved "topics", and user preferences across these "topics".
- 25. Matrix factorization: linear model. Start with a simple linear model:
    R̂_ui = b_0 + b_u + b_i   (3)
  where b_0 is the global average, b_u the user bias, and b_i the item bias.
- 26. Matrix factorization: linear model. For example, we might predict that a harsh critic would score a popular movie as:
    R̂_ui = 3.6 + (−0.5) + 0.8   (3)
          = 3.9   (4)
  (global average, plus user bias, plus item bias)
- 27. Matrix factorization: low-rank approximation. Add a user-item interaction term:
    R̂_ui = b_0 + b_u + b_i + W_ui   (5)
  where W_ui = p_u · q_i = Σ_k P_uk Q_ik:
  • P_uk is user u's preference for topic k
  • Q_ik is item i's association with topic k
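The full model in eq. (5) is just the bias terms plus a dot product over topics. A minimal sketch with k = 2 topics; all numbers are assumed for illustration.

```python
# Predict a rating from bias terms plus a low-rank interaction
# W_ui = p_u . q_i, as in eq. (5). Toy values, k = 2 topics.
b0, bu, bi = 3.6, -0.5, 0.8      # global average, user bias, item bias
p_u = [0.3, -0.2]                # user u's topic preferences P_uk
q_i = [1.0, 0.5]                 # item i's topic associations Q_ik

def predict(b0, bu, bi, pu, qi):
    """R̂_ui = b0 + bu + bi + sum_k P_uk Q_ik."""
    return b0 + bu + bi + sum(p * q for p, q in zip(pu, qi))

print(predict(b0, bu, bi, p_u, q_i))  # 3.9 + (0.3 - 0.1) = 4.1
```

The interaction term lets the model move a prediction above or below what the biases alone would give, per user-item pair.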
- 28. Matrix factorization: loss function. Measure the quality of the model fit with squared loss:
    L = Σ_(u,i) (R̂_ui − R_ui)²   (6)
      = Σ_(u,i) ([PQ^T]_ui − R_ui)²   (7)
- 29. Matrix factorization: optimization. The loss is non-convex in (P, Q), so there is no guarantee of finding a global minimum. Instead we can optimize L iteratively, e.g.:
  • Alternating least squares: update each row of P holding Q fixed, and vice versa
  • Stochastic gradient descent: update individual rows p_u and q_i for each observed R_ui
- 30. Matrix factorization: alternating least squares. L is convex in the rows of P with Q fixed, and in the rows of Q with P fixed, so alternate solutions to the normal equations:
    p_u = (Q(u)^T Q(u))^{-1} Q(u)^T r(u)   (8)
    q_i = (P(i)^T P(i))^{-1} P(i)^T r(i)   (9)
  where:
  • Q(u) is the item association matrix restricted to items rated by user u
  • P(i) is the user preference matrix restricted to users that have rated item i
  • r(u) are the ratings by user u, and r(i) are the ratings on item i
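With rank k = 1, the normal equations (8)-(9) reduce to scalar least-squares updates, which makes the alternation easy to see. A sketch under that assumption; the toy ratings and initial factor values are made up (the 2x2 rating matrix is exactly rank one, so the fit should become exact).

```python
# Rank-1 ALS: alternately solve for user factors p_u with item factors q_i
# fixed, then for q_i with p_u fixed. Toy ratings {(user, item): rating}.
ratings = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 2.0}
p = {0: 1.0, 1: 1.0}  # user factors, initialized arbitrarily
q = {0: 1.0, 1: 1.0}  # item factors

for _ in range(20):
    # fix q, solve for each p_u:  p_u = sum_i q_i R_ui / sum_i q_i^2
    for u in p:
        rated = [i for (uu, i) in ratings if uu == u]
        p[u] = (sum(q[i] * ratings[(u, i)] for i in rated)
                / sum(q[i] ** 2 for i in rated))
    # fix p, solve for each q_i:  q_i = sum_u p_u R_ui / sum_u p_u^2
    for i in q:
        raters = [u for (u, ii) in ratings if ii == i]
        q[i] = (sum(p[u] * ratings[(u, i)] for u in raters)
                / sum(p[u] ** 2 for u in raters))

print(p[0] * q[0], p[0] * q[1])  # should approach 4.0 and 2.0
```

For k > 1 each update becomes the k-by-k linear solve written in eqs. (8)-(9), but the alternation is identical.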
- 31. Matrix factorization: stochastic gradient descent. Alternatively, we can avoid inverting matrices by taking steps in the direction of the negative gradient for each observed rating:
    p_u ← p_u − η ∂L/∂p_u = p_u + η (R_ui − R̂_ui) q_i   (10)
    q_i ← q_i − η ∂L/∂q_i = q_i + η (R_ui − R̂_ui) p_u   (11)
  for some step size η.
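The per-rating updates (10)-(11) can be sketched as below. The toy ratings, k = 2, step size, and epoch count are all illustrative assumptions, not values from the lecture; note both factor vectors are updated from their pre-update values.

```python
import random

# SGD for matrix factorization: for each observed rating, nudge p_u and q_i
# along the error gradient, per updates (10)-(11). Toy data, k = 2.
random.seed(0)
ratings = {(0, 0): 4.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 2.0}
k, eta = 2, 0.05
p = {u: [random.uniform(0.1, 1.0) for _ in range(k)] for u in (0, 1)}
q = {i: [random.uniform(0.1, 1.0) for _ in range(k)] for i in (0, 1)}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

for epoch in range(500):
    for (u, i), r in ratings.items():
        err = r - dot(p[u], q[i])          # R_ui - R̂_ui
        for f in range(k):
            pu, qi = p[u][f], q[i][f]      # use pre-update values for both
            p[u][f] = pu + eta * err * qi  # eq. (10)
            q[i][f] = qi + eta * err * pu  # eq. (11)

print(dot(p[0], q[0]), dot(p[0], q[1]))  # should be close to 4.0 and 2.0
```

Each update touches only one row of P and one row of Q, which is what makes this cheap per rating and amenable to the parallelization schemes on the next slide.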
- 32. Matrix factorization: practical notes. Several ways to scale:
  • Distribute matrix operations with MapReduce [GHNS11]
  • Parallelize stochastic gradient descent [ZWSL10]
  • Expectation-maximization for pLSI with MapReduce [DDGR07]
- 33. Datasets
  • MovieLens: http://www.grouplens.org/node/12
  • Reddit: http://bit.ly/redditdata
  • CU "million songs": http://labrosa.ee.columbia.edu/millionsong/
  • Yahoo Music KDD Cup: http://kddcup.yahoo.com/
  • AudioScrobbler: http://bit.ly/audioscrobblerdata
  • Delicious: http://bit.ly/deliciousdata
  • ...
- 34. Photo recommendations. http://koala.sandbox.yahoo.com
- 35. References I
  [DDGR07] A. S. Das, M. Datar, A. Garg, and S. Rajaram. Google news personalization: scalable online collaborative filtering. 2007.
  [GHNS11] R. Gemulla, P. J. Haas, E. Nijkamp, and Y. Sismanis. Large-scale matrix factorization with distributed stochastic gradient descent. 2011.
  [Kor09] Yehuda Koren. The BellKor solution to the Netflix grand prize. August 2009.
  [LSY03] G. Linden, B. Smith, and J. York. Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76-80, 2003.
- 36. References II
  [TJB09] A. Töscher, M. Jahrer, and R. M. Bell. The BigChaos solution to the Netflix grand prize. 2009.
  [ZWSL10] M. Zinkevich, M. Weimer, A. Smola, and L. Li. Parallelized stochastic gradient descent. In Neural Information Processing Systems (NIPS), 2010.
