SVD and the Netflix Dataset

A short summary and explanation of LSI (SVD) and how it can be applied to recommendation systems, and to the Netflix dataset in particular.

  1. SVD Applied to Collaborative Filtering ~ URUG 7-12-07 ~
  2. Recommendation System
  3. Recommendation System. Answers the question: What do I want next?!?
  4. Recommendation System. Answers the question: What do I want next?!? Very consumer-driven. Must provide good results or a user may not trust the system in the future.
  5. Collaborative Filtering. Base user recommendations on the user’s past history and the history of like-minded users. View the data as a product × user matrix. Find a “neighborhood” of similar users for that user. Return the top-N recommendations. (A small Python sketch of this appears after the slide list.)
  6. Early Approaches. Goldberg et al. (1992), Using collaborative filtering to weave an information tapestry. Konstan et al. (1997), Applying Collaborative Filtering to Usenet news. Use Pearson correlation or cosine similarity as the measure of similarity to form neighborhoods.
  7. Early CF Challenges
  8. Early CF Challenges. Sparsity - no correlation between users can be found; reduced coverage occurs.
  9. Early CF Challenges. Sparsity - no correlation between users can be found; reduced coverage occurs. Scalability - the computation time of nearest-neighbor algorithms grows with the number of products and users.
  10. Early CF Challenges. Sparsity - no correlation between users can be found; reduced coverage occurs. Scalability - the computation time of nearest-neighbor algorithms grows with the number of products and users. Synonymy - similar items with different names are treated as unrelated.
  11. Dimensionality Reduction
  12. Dimensionality Reduction. Latent Semantic Indexing (LSI).
  13. Dimensionality Reduction. Latent Semantic Indexing (LSI). Algorithm from the IR community (late 80s-early 90s).
  14. Dimensionality Reduction. Latent Semantic Indexing (LSI). Algorithm from the IR community (late 80s-early 90s). Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets.
  15. Dimensionality Reduction. Latent Semantic Indexing (LSI). Algorithm from the IR community (late 80s-early 90s). Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships.
  16. Dimensionality Reduction. Latent Semantic Indexing (LSI). Algorithm from the IR community (late 80s-early 90s). Addresses the problems of synonymy, polysemy, sparsity, and scalability for large datasets. Reduces dimensionality of a dataset and captures the latent relationships. Easily maps to CF!
  18. Framing LSI for CF. Products × Users matrix instead of Terms × Documents. Netflix dataset: 480,189 users, 17,770 movies, only ~100 million ratings. A 17,770 × 480,189 matrix that is 99% sparse! About 8.5 billion potential ratings. (A sparse-matrix sketch in Python follows the slide list.)
  19. SVD - the math behind LSI. Singular Value Decomposition: any M × N matrix A of rank r can be decomposed as A = U Σ V^T, where U is an M × M orthogonal matrix, V is an N × N orthogonal matrix, and Σ is an M × N diagonal matrix whose first r diagonal entries are the nonzero singular values of A: σ1 ≥ σ2 ≥ ... ≥ σr > σr+1 = ... = σn = 0.
  20. Related to eigenvalue decomposition (PCA). U is the orthonormal eigenspace of AA^T; it spans the “column space” and its columns are the left singular vectors. V is the orthonormal eigenspace of A^TA; it spans the “row space” and gives the right singular vectors. The singular values are the square roots of the eigenvalues.
  21. Reducing Dimensionality. A_k = U_k Σ_k V_k^T. A_k is the closest rank-k approximation to A: it minimizes the Frobenius norm ||A − A_k||_F over all rank-k matrices. (A numpy sketch follows the slide list.)
  22. Making Recommendations. Cosine similarity is a common way to find the neighborhood: cos(i, j) = (i · j) / (||i||2 ∗ ||j||2). Base recommendations on that neighborhood and its users. Predictions for a customer c and product p can also be made with a simple dot product if the singular values are combined with the singular vectors: CP_prod = C_avg + U_k S_k^{1/2}(c) · S_k^{1/2} V_k^T(p). (A prediction sketch follows the slide list.)
  23. Challenges with SVD. Scalability - once again, compute time grows with the number of users and products: O(m^3). There is an offline stage and an online stage, but even doing the SVD computation offline is not possible for large datasets. Other methods are needed.
  24. Incremental SVD. Project a new user vector u into the existing k-dimensional space: u_k = u V_k Σ_k^{-1}. (A folding-in sketch follows the slide list.)
  25. Incremental SVD Results
  26. GHA for SVD. Gorrell (2006), GHA for Incremental SVD in NLP. Based on Sanger’s (1989) GHA for eigen decomposition. Update rules: Δc_i^a = (c_i^b · b)(a − Σ_{j<i} (a · c_j^a) c_j^a) and Δc_i^b = (c_i^a · a)(b − Σ_{j<i} (b · c_j^b) c_j^b).
  27. GHA extended by Funk (a multi-feature Python sketch follows the slide list):

    /* Types and globals assumed by the snippet. */
    typedef double real;
    extern real lrate, userValue[], movieValue[];
    real predictRating(int movie, int user);

    /* One SGD step for a single latent feature, applied per observed rating. */
    void train(int user, int movie, real rating)
    {
        real err = lrate * (rating - predictRating(movie, user));
        userValue[user]  += err * movieValue[movie];
        movieValue[movie] += err * userValue[user];
    }
  28. Netflix Results. Best RMSEs: 0.9283 and 0.9212, blended to get 0.9189, 3.42% better than Netflix’s own system.
  29. Summary. SVD provides an elegant and automatic recommendation system that has the potential to scale. There are many different algorithms to calculate, or at least approximate, the SVD, which can be used in offline stages for websites that need CF. Every dataset is different and requires experimentation to get the best results.
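
The sketches below expand on a few of the slides; they are illustrative Python, not code from the talk. First, a minimal version of the neighborhood idea from slides 5, 6, and 22, assuming a small dense products × users matrix; the function name top_n_for_user is made up for the example.

    import numpy as np

    def top_n_for_user(A, user, k_neighbors=2, n=3):
        """Top-N recommendations for `user` from a products-x-users ratings matrix A."""
        u = A[:, user]
        # Cosine similarity between this user and every other user (slide 22's formula).
        sims = (A.T @ u) / (np.linalg.norm(A, axis=0) * np.linalg.norm(u) + 1e-12)
        sims[user] = -np.inf                        # exclude the user themselves
        neighbors = np.argsort(sims)[-k_neighbors:]
        # Score the products this user has not rated by the neighborhood's mean rating.
        scores = A[:, neighbors].mean(axis=1)
        unseen = np.where(u == 0)[0]
        return unseen[np.argsort(scores[unseen])[::-1]][:n]

    A = np.array([[5, 4, 0, 0],
                  [3, 0, 4, 0],
                  [0, 2, 5, 4],
                  [1, 0, 3, 5]], dtype=float)       # 4 movies x 4 users
    print(top_n_for_user(A, user=0))                # [2] -- the only movie user 0 has not rated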
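
A minimal sketch of slide 18's product × user matrix, assuming the ratings arrive as (movie, user, rating) triples and that scipy is available; the triples shown are placeholders, not Netflix data.

    import numpy as np
    from scipy.sparse import csr_matrix

    n_movies, n_users = 17_770, 480_189
    # Placeholder triples; the real dataset supplies roughly 100 million of them.
    movies = np.array([0, 0, 2])          # movie indices
    users = np.array([5, 9, 5])           # user indices
    stars = np.array([4.0, 3.0, 5.0])     # ratings on the 1-5 scale

    # Only the observed ratings are stored; the other ~8.5 billion cells stay implicit.
    A = csr_matrix((stars, (movies, users)), shape=(n_movies, n_users))
    print(A.nnz, "stored ratings out of", n_movies * n_users, "possible")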
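
A numpy sketch of slides 19-21: the full decomposition A = U Σ V^T, the rank-k truncation A_k, and a check that its Frobenius error equals the energy in the discarded singular values (run on a small random matrix, not the ratings matrix).

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.random((6, 4))        # small stand-in for the ratings matrix
    k = 2

    # A = U diag(s) V^T, singular values s in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # Rank-k truncation A_k = U_k Sigma_k V_k^T (slide 21).
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # ||A - A_k||_F equals sqrt(sum of the squared discarded singular values),
    # the smallest error any rank-k matrix can achieve.
    print(np.linalg.norm(A - A_k, "fro"), np.sqrt(np.sum(s[k:] ** 2)))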
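
A sketch of slide 22's prediction CP_prod = C_avg + U_k S_k^{1/2}(c) · S_k^{1/2} V_k^T(p), assuming, as in Sarwar et al.'s formulation, that missing ratings are first filled with product averages and each customer's average is subtracted before the SVD; the matrix orientation (customers × products) and all variable names are illustrative.

    import numpy as np

    # Tiny customers-x-products ratings matrix; 0 marks a missing rating.
    R = np.array([[5, 4, 0, 1],
                  [4, 0, 5, 1],
                  [1, 2, 4, 5],
                  [0, 1, 5, 4]], dtype=float)
    k = 2

    # Fill missing cells with the product (column) average, then normalize by
    # subtracting each customer's (row) average before taking the SVD.
    prod_avg = np.nanmean(np.where(R > 0, R, np.nan), axis=0)
    filled = np.where(R > 0, R, prod_avg[None, :])
    cust_avg = filled.mean(axis=1)
    U, s, Vt = np.linalg.svd(filled - cust_avg[:, None], full_matrices=False)

    # CP = C_avg + U_k sqrt(S_k)(c) . sqrt(S_k) V_k^T(p)
    left = U[:, :k] * np.sqrt(s[:k])             # customer factors
    right = np.sqrt(s[:k])[:, None] * Vt[:k]     # product factors
    c, p = 0, 2
    print(cust_avg[c] + left[c] @ right[:, p])   # predicted rating for customer c, product p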
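
A sketch of slide 24's projection u_k = u V_k Σ_k^{-1}, read here as the standard folding-in step: a new user's ratings row is projected into the existing k-dimensional space without recomputing the SVD. The matrices and the new user are random placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.random((50, 30))      # users x products matrix the SVD was computed on
    k = 5
    U, s, Vt = np.linalg.svd(A, full_matrices=False)

    # A new user expressed as a ratings row over the same 30 products.
    u = rng.random(30)

    # Fold the new user in: u_k = u V_k Sigma_k^{-1}.
    u_k = u @ Vt[:k].T @ np.diag(1.0 / s[:k])
    print(u_k.shape)              # (5,) -- comparable to the rows of U_k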
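
A Python rendering of slide 27's update, extended to k features trained one after another; it is a sketch over an in-memory list of (user, movie, rating) triples, not Funk's actual Netflix code, and the initial value 0.1 and learning rate are illustrative.

    import numpy as np

    def funk_sgd(ratings, n_users, n_movies, k=10, lrate=0.001, epochs=20):
        """Train k latent features with Funk-style SGD, one feature at a time."""
        user_f = np.full((n_users, k), 0.1)
        movie_f = np.full((n_movies, k), 0.1)
        for f in range(k):                           # features are learned sequentially
            for _ in range(epochs):
                for u, m, r in ratings:
                    err = lrate * (r - user_f[u] @ movie_f[m])
                    user_f[u, f] += err * movie_f[m, f]
                    # Reuses the freshly updated user value, mirroring the C snippet.
                    movie_f[m, f] += err * user_f[u, f]
        return user_f, movie_f

    ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
    user_f, movie_f = funk_sgd(ratings, n_users=2, n_movies=3)
    print(user_f @ movie_f.T)                        # predicted ratings matrix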
