

- 1. Big, Practical Recommendations with Alternating Least Squares
  - Sean Owen • Apache Mahout / Myrrix.com
- 2. WHERE’S BIG LEARNING?
  - Stack (diagram): Applications, Processing, Database, Storage
  - Next: Application Layer
    - Analytics
    - Machine Learning
  - Like Apache Mahout
    - Common Big Data app today
    - Clustering, recommenders, classifiers on Hadoop
    - Free, open source; not mature
  - Where’s commercialized Big Learning?
- 3. A RECOMMENDER SHOULD …
  - Answer in Real-time
    - Ingest new data, now
    - Modify recommendations based on newest data
    - No “cold start” for new data
  - Accept Diverse Input
    - Not just people and products
    - Not just explicit ratings
    - Clicks, views, buys
    - Side information
  - Scale Horizontally
    - For queries per second
    - For size of data set
  - Be “Pretty Accurate”
- 4. NEED: 2-TIER ARCHITECTURE
  - Real-time Serving Layer
    - Quick results based on precomputed model
    - Incremental update
    - Partitionable for scale
  - Batch Computation Layer
    - Builds model
    - Scales out (on Hadoop?)
    - Asynchronous, occasional, long-lived runs
- 5. A PRACTICAL ALGORITHM
  - MATRIX FACTORIZATION
    - Factor user-item matrix to user-feature + feature-item matrix
    - Well understood in ML, as:
      - Principal Component Analysis
      - Latent Semantic Indexing
    - Several algorithms, like:
      - Singular Value Decomposition
      - Alternating Least Squares
  - BENEFITS
    - Models intuition
    - Factorization is batch parallelizable
    - Reconstruction (recs) in low-dimension is fast
    - Allows projection of new data
      - Cold start solution
      - Approximate update solution
- 6. A PRACTICAL IMPLEMENTATION
  - ALTERNATING LEAST SQUARES
    - Simple factorization $P \approx X Y^T$
    - Approximate: $X$, $Y$ are very “skinny” (low-rank)
    - Faster than the SVD
      - Trivially parallel, iterative
    - Dumber than the SVD
      - No singular values, orthonormal basis
  - BENEFITS
    - Parallelizable by row: Hadoop-friendly
    - Iterative: OK answer fast, refine as long as desired
    - Yields to “binary” input model
      - Ratings as regularization instead
      - Sparseness / 0s no longer a problem
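- 7. ALS ALGORITHM 1
  - Input: (user, item, strength) tuples
    - Anything you can quantify is input
    - Strength is positive
    - Many tuples per user-item
  - $R$ is sparse user-item interaction matrix
  - $r_{ij}$ = total strength of interaction between user $i$ and item $j$
  - $R = \begin{pmatrix} 1 & 4 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 4 & 0 & 3 & 2 \\ 5 & 0 & 2 & 0 & 3 \\ 0 & 0 & 0 & 5 & 0 \\ 2 & 4 & 0 & 0 & 0 \end{pmatrix}$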
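The accumulation of tuples into $R$ described on this slide can be sketched as follows. This is a minimal illustration, not the Mahout or Myrrix implementation: the function name `build_interactions`, the dict-of-dicts sparse layout, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict


def build_interactions(tuples):
    """Sum strengths per (user, item) pair into a sparse dict-of-dicts R."""
    R = defaultdict(lambda: defaultdict(float))
    for user, item, strength in tuples:
        if strength <= 0:
            # The slide states that strength is positive.
            raise ValueError("strength must be positive")
        # Many tuples may refer to the same user-item pair; r_ij is their total.
        R[user][item] += strength
    return {user: dict(items) for user, items in R.items()}


# Hypothetical input events, for illustration only.
events = [
    ("alice", "book", 1.0),
    ("alice", "book", 2.0),
    ("alice", "film", 4.0),
    ("bob", "film", 3.0),
]
R = build_interactions(events)
```

Pairs that never occur are simply absent from the dictionary, which is what keeps $R$ sparse.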
- 8. ALS ALGORITHM 2
  - Follow “Collaborative Filtering for Implicit Feedback Datasets”: www2.research.att.com/~yifanhu/PUB/cf.pdf
  - Construct “binary” matrix $P$
    - 1 where $R > 0$
    - 0 where $R = 0$
  - Factor $P$, not $R$
    - $R$ returns in regularization
    - Still sparse; implicit 0s fine
  - $P = \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix}$
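As a small sketch of the binarization step, the snippet below derives $P$ from a dense NumPy array. The strengths in `R` are illustrative values chosen so that the non-zero pattern matches the slide's $P$; a real system would keep both matrices sparse.

```python
import numpy as np

# Illustrative interaction strengths; zeros mean "no interaction observed".
R = np.array([
    [1, 4, 3, 0, 0],
    [0, 0, 3, 0, 0],
    [0, 4, 0, 3, 2],
    [5, 0, 2, 0, 3],
    [0, 0, 0, 5, 0],
    [2, 4, 0, 0, 0],
], dtype=float)

# P is 1 where R > 0 and 0 where R = 0.
P = (R > 0).astype(float)
```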
- 9. ALS ALGORITHM 3
  - $P$ is $m \times n$
  - Choose $k \ll m, n$
  - Factor $P$ as $Q = X Y^T$, $Q \approx P$
    - $X$ is $m \times k$; $Y^T$ is $k \times n$
  - Find best approximation $Q$
    - Minimize L2 norm of diff: $\lVert P - Q \rVert_2$
    - Minimal squared error: “Least Squares”
  - Recommendations are largest values in $Q$
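To make "recommendations are largest values in $Q$" concrete, here is a hedged sketch that ranks one user's row. The row values are taken from the example factorization slide; the helper name `recommend` and the choice to skip items the user has already interacted with are assumptions, not something the slides specify.

```python
import numpy as np

# One user's row of Q and the matching row of P (example factorization slide).
q_row = np.array([0.11, 0.51, -0.13, 1.00, 0.57])
p_row = np.array([0.0, 0.0, 0.0, 1.0, 0.0])


def recommend(q_row, p_row, n=2):
    """Return indices of the n largest scores in q_row, skipping seen items."""
    # Assumption: items with p = 1 are already known to the user, so mask them.
    scores = np.where(p_row > 0, -np.inf, q_row)
    ranked = np.argsort(-scores)
    return [int(i) for i in ranked[:n] if np.isfinite(scores[i])]


top_items = recommend(q_row, p_row)
```

Here `top_items` holds item indices 4 and 1, the two highest-scoring items this user has not yet touched.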
- 10. ALS ALGORITHM 4
  - Optimizing $X$, $Y$ simultaneously is non-convex, hard
  - If $X$ or $Y$ are fixed, system of linear equations: convex, easy
  - Initialize $Y$ with random values
  - Solve for $X$
  - Fix $X$, solve for $Y$
  - Repeat (“Alternating”)
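The alternation itself can be sketched in a few lines of NumPy. This is a simplified, unweighted variant with a plain ridge term, not the weighted implicit-feedback objective the deck goes on to define; the function name `plain_als` and the defaults `lam=0.1` and `seed=0` are assumptions for the example.

```python
import numpy as np

P = np.array([
    [1, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0],
], dtype=float)


def plain_als(P, k=3, lam=0.1, iterations=10, seed=0):
    """Unweighted ALS: alternate ridge-regression solves for X and Y."""
    rng = np.random.default_rng(seed)
    m, n = P.shape
    Y = rng.normal(scale=0.1, size=(n, k))  # initialize Y with random values
    I = np.eye(k)
    losses = []
    for _ in range(iterations):
        # Fix Y: every row of X solves (Y^T Y + lam I) x = Y^T p.
        X = np.linalg.solve(Y.T @ Y + lam * I, Y.T @ P.T).T
        # Fix X: every row of Y solves (X^T X + lam I) y = X^T p.
        Y = np.linalg.solve(X.T @ X + lam * I, X.T @ P).T
        loss = np.sum((P - X @ Y.T) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2))
        losses.append(float(loss))
    return X, Y, losses


X, Y, losses = plain_als(P)
```

Because each half-step solves its sub-problem exactly, the regularized loss recorded in `losses` cannot increase from one iteration to the next, which is why stopping early still yields a usable answer.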
- 11. ALS ALGORITHM 5
  - Define regularization weights $c_{ui} = 1 + \alpha r_{ui}$
  - Minimize: $\sum_{u,i} c_{ui} (p_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)$
  - Simple least-squares regression objective, plus:
    - Least-squared error terms weighted by strength: the penalty for not reconstructing 1 at a “strong” association is higher
    - Standard L2 regularization term
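The objective on this slide can be evaluated directly, which is handy for checking that an implementation is making progress. The sketch below uses dense arrays for clarity; the function name `implicit_als_loss` is hypothetical, and the defaults for `alpha` and `lam` are borrowed from the example factorization slide.

```python
import numpy as np


def implicit_als_loss(R, X, Y, alpha=40.0, lam=2.0):
    """Weighted squared error plus L2 penalty, as defined on the slide."""
    P = (R > 0).astype(float)          # binary preference matrix
    C = 1.0 + alpha * R                # c_ui = 1 + alpha * r_ui
    weighted_error = np.sum(C * (P - X @ Y.T) ** 2)
    penalty = lam * (np.sum(X ** 2) + np.sum(Y ** 2))
    return float(weighted_error + penalty)


# Tiny hand-checkable case: one user, two items, one latent feature.
R = np.array([[2.0, 0.0]])
Y = np.array([[1.0], [0.0]])
loss_perfect = implicit_als_loss(R, np.array([[1.0]]), Y)   # reconstructs P exactly
loss_zero = implicit_als_loss(R, np.array([[0.0]]), Y)      # misses the strong 1
```

With these inputs, missing the strong association costs $c_{ui} = 1 + 40 \cdot 2 = 81$, compared with 1 for an unobserved pair, which is the asymmetry the weights are meant to create.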
- 12. ALS ALGORITHM 6
  - With fixed $Y$, compute optimal $X$
  - Each row $x_u$ is independent
  - Define $C_u$ as diagonal matrix of $c_u$ (user strength weights)
  - $x_u = (Y^T C_u Y + \lambda I)^{-1} Y^T C_u p_u$
  - Compare to simple least-squares regression solution $(Y^T Y)^{-1} Y^T p_u$
    - Adds Tikhonov / ridge regression regularization term $\lambda I$
    - Attaches $c_u$ weights to $Y^T$
  - See paper for how $Y^T C_u Y$ is computed efficiently; skipping the engineering!
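The closed-form row update can be sketched as below. This is the straightforward dense version and deliberately omits the efficiency trick from the paper; the function name `solve_user` and the sample data are assumptions for illustration.

```python
import numpy as np


def solve_user(Y, p_u, c_u, lam):
    """Solve (Y^T C_u Y + lam I) x_u = Y^T C_u p_u for one user's row."""
    k = Y.shape[1]
    # Broadcasting scales column i of Y^T by c_ui, which equals Y^T @ diag(c_u)
    # without materializing the n x n diagonal matrix.
    YtC = Y.T * c_u
    A = YtC @ Y + lam * np.eye(k)
    b = YtC @ p_u
    # Solving the linear system is preferred to forming the inverse explicitly.
    return np.linalg.solve(A, b)


# Illustrative data: 5 items, k = 3, one user's strengths.
rng = np.random.default_rng(0)
Y = rng.normal(size=(5, 3))
r_u = np.array([1.0, 4.0, 3.0, 0.0, 0.0])
p_u = (r_u > 0).astype(float)
c_u = 1.0 + 40.0 * r_u
x_u = solve_user(Y, p_u, c_u, lam=2.0)
```

Since each row is independent, a full pass over $X$ is a loop (or parallel map) of `solve_user` over users; the pass over $Y$ is the same computation with the roles of users and items swapped.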
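- 13. EXAMPLE FACTORIZATION
  - $k = 3$, $\lambda = 2$, $\alpha = 40$, 10 iterations
  - $Q = X Y^T = \begin{pmatrix} 0.96 & 0.99 & 0.99 & 0.38 & 0.93 \\ 0.44 & 0.39 & 0.98 & -0.11 & 0.39 \\ 0.70 & 0.99 & 0.42 & 0.98 & 0.98 \\ 1.00 & 1.04 & 0.99 & 0.44 & 0.98 \\ 0.11 & 0.51 & -0.13 & 1.00 & 0.57 \\ 0.97 & 1.00 & 0.68 & 0.47 & 0.91 \end{pmatrix} \approx \begin{pmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 \end{pmatrix} = P$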
- 14. FOLD-IN
  - Need immediate, if approximate, updates for new data
  - New user $u$ needs new row $Q_u = X_u Y^T$
  - We have $P_u \approx Q_u$
  - Compute $X_u$ via right inverse: $X Y^T (Y^T)^{-1} = Q (Y^T)^{-1}$, so $X = Q (Y^T)^{-1}$
  - What is $(Y^T)^{-1}$?
    - Note $(Y^T Y)(Y^T Y)^{-1} = I$
    - Gives $Y^T$’s right inverse: $Y^T \left( Y (Y^T Y)^{-1} \right) = I$
    - $X_u = Q_u Y (Y^T Y)^{-1}$
    - $X_u \approx P_u Y (Y^T Y)^{-1}$
  - Recommend as usual: $Q_u = X_u Y^T$
  - For existing user, instead add to existing row $X_u$
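A minimal sketch of the fold-in step, assuming $Y$ has full column rank so that $Y^T Y$ is invertible. The function name `fold_in` and the random test data are illustrative and not part of the deck.

```python
import numpy as np


def fold_in(Y, p_u):
    """Approximate a new user's feature row: x_u ≈ p_u Y (Y^T Y)^-1."""
    # Y^T Y is symmetric, so the row-vector formula is equivalent to solving
    # (Y^T Y) x = Y^T p_u; this avoids forming the inverse explicitly.
    return np.linalg.solve(Y.T @ Y, Y.T @ p_u)


# Illustrative item-feature matrix: 5 items, k = 3.
rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 3))

# Check the right-inverse identity from the slide: Y^T (Y (Y^T Y)^-1) = I.
right_inverse = Y @ np.linalg.inv(Y.T @ Y)
identity_check = Y.T @ right_inverse

# A new user's binary preferences, folded in and then scored as usual.
p_new = np.array([1.0, 0.0, 1.0, 0.0, 0.0])
x_new = fold_in(Y, p_new)
q_new = x_new @ Y.T   # Q_u = X_u Y^T
```

This only needs the small $k \times k$ system, so it is cheap enough to run in the serving layer between batch model builds.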
- 15. THIS IS MYRRIX
  - Soft-launched
  - Serving Layer available as open source download
  - Computation Layer available as beta
  - Ready on Amazon EC2 / EMR
  - Full launch Q4 2012
  - srowen@myrrix.com
  - myrrix.com
- 16. APPENDIX
- 17. EXAMPLES
  - STACKOVERFLOW TAGS
    - Recommend tags to questions
    - Tag questions automatically, improve tag coverage
    - 3.5M questions x 30K tags
    - 4.3 hours x 5 machines on Amazon EMR
    - $3.03 ≈ $0.08 per 100,000 recs
  - WIKIPEDIA LINKS
    - Recommend new linked articles from existing links
    - Propose missing, related links
    - 2.5M articles x 1.8M articles
    - 28 hours x 2 PCs on Apache Hadoop 1.0.3
