- 1. AIM3 – Scalable Data Analysis and Data Mining, 11 – Latent factor models for Collaborative Filtering. Sebastian Schelter, Christoph Boden, Volker Markl. Fachgebiet Datenbanksysteme und Informationsmanagement (DIMA), Technische Universität Berlin. 20.06.2012, http://www.dima.tu-berlin.de/
- 2. Recap: Item-Based Collaborative Filtering
  • compute pairwise similarities of the columns of the rating matrix using some similarity measure
  • store the top 20 to 50 most similar items per item in the item-similarity matrix
  • prediction: use a weighted sum over all items similar to the unknown item that have been rated by the current user: $p_{ui} = \frac{\sum_{j \in S(i,u)} s_{ij} \, r_{uj}}{\sum_{j \in S(i,u)} s_{ij}}$
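
The weighted-sum prediction can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the function name `predict_item_based`, the dictionary layout and the toy data are assumptions made for the example.

```python
def predict_item_based(user_ratings, similar_items):
    """Weighted-sum prediction p_ui for one user and one unknown item i.

    user_ratings:  dict item -> rating r_uj given by the user
    similar_items: dict item -> similarity s_ij to the unknown item i
                   (the stored top-N neighbours of i)
    """
    # S(i, u): neighbours of i that the user has actually rated
    rated_neighbours = [j for j in similar_items if j in user_ratings]
    numerator = sum(similar_items[j] * user_ratings[j] for j in rated_neighbours)
    denominator = sum(similar_items[j] for j in rated_neighbours)
    if denominator == 0:
        return None  # no usable neighbours, so no prediction is possible
    return numerator / denominator


# Hypothetical toy data: the user rated A, B and D; item C has neighbours A, B, E.
ratings_of_user = {"A": 5.0, "B": 3.0, "D": 1.0}
neighbours_of_c = {"A": 0.5, "B": 0.5, "E": 0.9}
prediction = predict_item_based(ratings_of_user, neighbours_of_c)
```

Item E is ignored because the user has not rated it, so only A and B contribute to the prediction for C.
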
- 3. Drawbacks of similarity-based neighborhood methods
  • the assumption that a rating is defined by all of the user's ratings for commonly co-rated items is hard to justify in general
  • lack of bias correction
  • every co-rated item is looked at in isolation: if a movie is similar to "Lord of the Rings", do we want each part of the trilogy to contribute as a single similar item?
  • the best choice of similarity measure is based on experimentation, not on mathematical reasoning
- 4. Latent factor models
  ■ Idea
  • ratings are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action in movies, complexity of characters)
  • these factors are in general not obvious: we might be able to think of some of them, but it is hard to estimate their impact on the ratings
  • the goal is to infer these so-called latent factors from the rating data by using mathematical techniques
- 5. Latent factor models
  ■ Approach
  • users and items are characterized by $n_f$ latent factors: each user and item is mapped onto a latent feature space, $u_i, m_j \in \mathbb{R}^{n_f}$
  • each rating is approximated by the dot product of the user feature vector and the item feature vector: $r_{ij} \approx m_j^T u_i$
  • prediction of unknown ratings also uses this dot product
  • squared error as a measure of loss: $(r_{ij} - m_j^T u_i)^2$
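
As a small worked example of the dot-product prediction and the squared-error loss, the sketch below uses hand-picked vectors with two latent factors. The function names and the numbers are illustrative assumptions, not values from the lecture.

```python
def dot(a, b):
    """Dot product of two equally long feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def predict(u_i, m_j):
    """Predicted rating: dot product of item vector m_j and user vector u_i."""
    return dot(m_j, u_i)


def squared_error(r_ij, u_i, m_j):
    """Squared difference between the known rating and its prediction."""
    return (r_ij - predict(u_i, m_j)) ** 2


# Hypothetical vectors in a latent space with n_f = 2
u = [1.0, 0.5]   # user feature vector
m = [2.0, 4.0]   # item feature vector
r = 5.0          # known rating

predicted = predict(u, m)
loss = squared_error(r, u, m)
```
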
- 6. Latent factor models
  ■ Approach
  • decomposition of the rating matrix into the product of a user feature matrix and an item feature matrix: $R \approx U M^T$
  • row in U: vector of a user's affinity to the features
  • row in M: vector of an item's relation to the features
  • closely related to Singular Value Decomposition, which produces an optimal low-rank approximation of a matrix
- 7. Latent factor models
  ■ Properties of the decomposition
  • automatically ranks features by their "impact" on the ratings
  • features might not necessarily be intuitively understandable
- 8. Latent factor models
  ■ Problematic situation with explicit feedback data
  • the rating matrix is not only sparse but only partially defined: missing entries cannot be interpreted as 0, they are just unknown
  • standard decomposition algorithms like the Lanczos method for SVD are not applicable
  ■ Solution
  • decomposition has to be done using the known ratings only
  • find the set of user and item feature vectors that minimizes the squared error to the known ratings: $\min_{U, M} \sum_{(i,j)\,\text{known}} (r_{ij} - m_j^T u_i)^2$
- 9. Latent factor models
  ■ quality of the decomposition is not measured with respect to the reconstruction error on the original data, but with respect to the generalization to unseen data
  ■ regularization is necessary to avoid overfitting
  ■ the model has hyperparameters (regularization, learning rate) that need to be chosen
  ■ process: split the data into training, validation and test sets
    □ train the model using the training set
    □ choose hyperparameters according to performance on the validation set
    □ evaluate generalization on the test set
    □ ensure that each data point is used in each set once (cross-validation)
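
The cross-validation step can be sketched as a k-fold split in which every rating is held out exactly once while the remaining ratings are used for training. The helper name `k_fold_splits`, the `(user, item, rating)` triple layout and the toy data are assumptions made for this illustration.

```python
import random


def k_fold_splits(ratings, k=3, seed=0):
    """Yield (training, held_out) pairs; each rating is held out exactly once."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled ratings round-robin into k folds
    folds = [shuffled[f::k] for f in range(k)]
    for f in range(k):
        held_out = folds[f]
        training = [x for g in range(k) if g != f for x in folds[g]]
        yield training, held_out


# Hypothetical (user, item, rating) triples
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
           (2, 1, 2.0), (2, 2, 5.0), (2, 0, 3.0)]
splits = list(k_fold_splits(ratings, k=3))
```

A model would be trained on `training` and scored on `held_out` for each split, and the scores averaged per hyperparameter setting.
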
- 10. Stochastic Gradient Descent
  • add a regularization term: $\min_{U, M} \sum_{(i,j)\,\text{known}} \left[ (r_{ij} - m_j^T u_i)^2 + \lambda \left( \|u_i\|^2 + \|m_j\|^2 \right) \right]$
  • loop through all ratings in the training set and compute the associated prediction error: $e_{ij} = r_{ij} - m_j^T u_i$
  • modify the parameters in the opposite direction of the gradient: $u_i \leftarrow u_i + \gamma \, (e_{ij} \, m_j - \lambda \, u_i)$ and $m_j \leftarrow m_j + \gamma \, (e_{ij} \, u_i - \lambda \, m_j)$
  • problem: the approach is inherently sequential (although recent research might have unveiled a parallelization technique)
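
The update loop can be sketched as follows. This is a minimal illustration under stated assumptions: `gamma` and `lam` stand for the learning rate γ and the regularization weight λ, while the function names, the initialization range, the number of epochs and the toy ratings are hypothetical choices for the example.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def training_error(ratings, U, M):
    """Sum of squared errors over the known ratings only."""
    return sum((r - dot(M[j], U[i])) ** 2 for i, j, r in ratings)


def sgd_factorize(ratings, n_users, n_items, n_factors=2,
                  gamma=0.01, lam=0.05, epochs=500, seed=0):
    """Factorize known (user, item, rating) triples with regularized SGD."""
    rng = random.Random(seed)
    # Small random start values (an arbitrary choice for this sketch)
    U = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_users)]
    M = [[rng.uniform(0.0, 0.5) for _ in range(n_factors)] for _ in range(n_items)]
    order = list(ratings)
    for _ in range(epochs):
        rng.shuffle(order)
        for i, j, r in order:
            e = r - dot(M[j], U[i])          # prediction error e_ij
            u_old = U[i][:]                  # keep old u_i for the item update
            U[i] = [u + gamma * (e * m - lam * u) for u, m in zip(U[i], M[j])]
            M[j] = [m + gamma * (e * u - lam * m) for u, m in zip(u_old, M[j])]
    return U, M


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U0, M0 = sgd_factorize(ratings, 3, 3, epochs=0)   # start values only
U, M = sgd_factorize(ratings, 3, 3)               # trained factors
error_before = training_error(ratings, U0, M0)
error_after = training_error(ratings, U, M)
```

Because both calls use the same seed, `U0` and `M0` are exactly the start values of the trained run, so the two errors are directly comparable.
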
- 11. Alternating Least Squares with Weighted λ-Regularization
  ■ Model
  • feature matrices are modeled directly by using only the observed ratings
  • add a regularization term to avoid overfitting
  • minimize the regularized error: $f(U, M) = \sum_{(i,j)\,\text{known}} (r_{ij} - m_j^T u_i)^2 + \lambda \left( \sum_i n_{u_i} \|u_i\|^2 + \sum_j n_{m_j} \|m_j\|^2 \right)$, where $n_{u_i}$ and $n_{m_j}$ are the numbers of ratings of user $i$ and item $j$
  ■ Solving technique
  • fix one of the unknowns (U or M), which turns the problem into a simple quadratic one
  • alternate between fixing U and M until convergence ("Alternating Least Squares")
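
With one side fixed, each feature vector is the solution of a small regularized least-squares problem, $(\sum_j m_j m_j^T + \lambda n_{u_i} I) \, u_i = \sum_j r_{ij} m_j$ for a user, and symmetrically for an item. The sketch below works this out in plain Python. The function names, the tiny Gaussian-elimination solver, the parameter defaults and the toy ratings are assumptions made for the illustration, not the lecture's implementation.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def update_vector(observed, fixed, lam):
    """Least-squares update for one user (or item) with the other side fixed.

    observed: list of (index into fixed, rating) for the known ratings
    fixed:    feature vectors of the side currently held fixed
    """
    k = len(fixed[0])
    n = len(observed)                      # n_{u_i} or n_{m_j}
    if n == 0:
        return [0.0] * k                   # nothing observed: keep a zero vector
    A = [[lam * n if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for idx, r in observed:
        v = fixed[idx]
        for a in range(k):
            rhs[a] += r * v[a]
            for b in range(k):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)


def objective(ratings, U, M, lam):
    """Regularized error f(U, M) with weighted lambda-regularization."""
    err = sum((r - dot(M[j], U[i])) ** 2 for i, j, r in ratings)
    reg = sum(dot(U[i], U[i]) + dot(M[j], M[j]) for i, j, _ in ratings)
    return err + lam * reg


def als_wr(ratings, n_users, n_items, n_factors=2, lam=0.1, iterations=10, seed=0):
    rng = random.Random(seed)
    M = [[rng.uniform(0.0, 1.0) for _ in range(n_factors)] for _ in range(n_items)]
    by_user = [[(j, r) for i2, j, r in ratings if i2 == i] for i in range(n_users)]
    by_item = [[(i, r) for i, j2, r in ratings if j2 == j] for j in range(n_items)]
    U = [[0.0] * n_factors for _ in range(n_users)]
    for _ in range(iterations):
        U = [update_vector(by_user[i], M, lam) for i in range(n_users)]  # fix M
        M = [update_vector(by_item[j], U, lam) for j in range(n_items)]  # fix U
    return U, M


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
U1, M1 = als_wr(ratings, 3, 3, iterations=1)
U10, M10 = als_wr(ratings, 3, 3, iterations=10)
```

Summing the squared norms once per known rating, as `objective` does, is equivalent to weighting each vector by its number of ratings. Since every update solves its subproblem exactly, the objective cannot increase from one iteration to the next.
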
- 12. ALS-WR is scalable
  ■ Which properties make this approach scalable?
  • all the feature vectors in one iteration can be computed independently of each other
  • only a small portion of the data is necessary to compute a feature vector
  ■ Parallelization with Map/Reduce
  • Computing user feature vectors: the mappers need to send each user's rating vector and the feature vectors of his/her rated items to the same reducer
  • Computing item feature vectors: the mappers need to send each item's rating vector and the feature vectors of the users who rated it to the same reducer
- 13. Incorporating biases
  ■ Problem: explicit feedback data is highly biased
    □ some users tend to rate more extremely than others
    □ some items tend to get higher ratings than others
  ■ Solution: explicitly model biases
    □ the bias of a rating is modeled as a combination of the average rating $\mu$, the user bias $b_i$ and the item bias $b_j$: $b_{ij} = \mu + b_i + b_j$
    □ the rating bias can be incorporated into the prediction: $\hat{r}_{ij} = \mu + b_i + b_j + m_j^T u_i$
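
A bias-aware prediction can be sketched as below. The slides do not say how the biases are estimated, so the simple averaging in `estimate_biases` is an assumption made for illustration (the biases could equally be learned together with the factors). All function names and the toy ratings are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def estimate_biases(ratings, n_users, n_items):
    """Estimate the average rating, user biases and item biases by averaging.

    Assumption: this averaging heuristic is for illustration only.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)
    b_user = [0.0] * n_users
    for i in range(n_users):
        devs = [r - mu for i2, _, r in ratings if i2 == i]
        if devs:
            b_user[i] = sum(devs) / len(devs)
    b_item = [0.0] * n_items
    for j in range(n_items):
        # Deviation left over after removing the average and the user bias
        devs = [r - mu - b_user[i] for i, j2, r in ratings if j2 == j]
        if devs:
            b_item[j] = sum(devs) / len(devs)
    return mu, b_user, b_item


def predict(mu, b_user_i, b_item_j, u_i, m_j):
    """Average rating plus both biases plus the latent-factor dot product."""
    return mu + b_user_i + b_item_j + dot(m_j, u_i)


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 1, 2.0)]
mu, b_user, b_item = estimate_biases(ratings, n_users=2, n_items=2)
# With zero latent vectors only the bias part of the prediction remains
baseline = predict(mu, b_user[0], b_item[0], [0.0, 0.0], [0.0, 0.0])
```

In this toy data user 0 rates higher than user 1 and item 0 is rated higher than item 1, so the biases alone already explain the ratings and the latent factors only have to model what is left.
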
- 14. Latent factor models
  ■ implicit feedback data is very different from explicit data!
    □ e.g. use the number of clicks on a product page of an online shop
    □ the whole matrix is defined!
    □ no negative feedback
    □ interactions that did not happen produce zero values
    □ however we should have only little confidence in these (maybe the user never had the chance to interact with these items)
    □ using standard decomposition techniques like SVD would give us a decomposition that is biased towards the zero entries, again not applicable
- 15. Latent factor models
  ■ Solution for working with implicit data: weighted matrix factorization
  ■ create a binary preference matrix P: $p_{ij} = 1$ if $r_{ij} > 0$, and $p_{ij} = 0$ if $r_{ij} = 0$
  ■ each entry in this matrix can be weighted by a confidence function, e.g. $c(i, j) = 1 + \alpha \, r_{ij}$
    □ zero values should get low confidence
    □ values that are based on a lot of interactions should get high confidence
  ■ confidence is incorporated into the model
    □ the factorization will "prefer" more confident values: $f(U, M) = \sum_{i,j} c(i, j) \, (p_{ij} - m_j^T u_i)^2 + \lambda \left( \sum_i \|u_i\|^2 + \sum_j \|m_j\|^2 \right)$
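
The preference matrix, the confidence weights and the weighted objective can be sketched as follows. The confidence form `1 + alpha * r_ij` follows Hu et al.; the value of `alpha`, the function names and the toy interaction counts are assumptions made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def preference_and_confidence(counts, alpha=40.0):
    """Build the binary preference matrix P and the confidence matrix C.

    counts: interaction counts r_ij (e.g. clicks), one row per user
    alpha:  hypothetical scaling constant for the confidence function
    """
    P = [[1.0 if r > 0 else 0.0 for r in row] for row in counts]
    C = [[1.0 + alpha * r for r in row] for row in counts]
    return P, C


def weighted_objective(P, C, U, M, lam):
    """Confidence-weighted squared error over ALL cells plus regularization."""
    err = sum(C[i][j] * (P[i][j] - dot(M[j], U[i])) ** 2
              for i in range(len(U)) for j in range(len(M)))
    reg = sum(dot(u, u) for u in U) + sum(dot(m, m) for m in M)
    return err + lam * reg


# Hypothetical click counts for 2 users and 2 items
counts = [[0, 3], [1, 0]]
P, C = preference_and_confidence(counts, alpha=2.0)
zero_U = [[0.0, 0.0], [0.0, 0.0]]
zero_M = [[0.0, 0.0], [0.0, 0.0]]
loss_at_zero = weighted_objective(P, C, zero_U, zero_M, lam=0.1)
```

Unlike the explicit-feedback objective, the sum runs over every cell of the matrix; the zero entries still count, but only with the minimum confidence of 1.
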
- 16. Sources
  • Sarwar et al.: "Item-Based Collaborative Filtering Recommendation Algorithms", 2001
  • Koren et al.: "Matrix Factorization Techniques for Recommender Systems", 2009
  • Funk: "Netflix Update: Try This at Home", http://sifter.org/~simon/journal/20061211.html, 2006
  • Zhou et al.: "Large-scale Parallel Collaborative Filtering for the Netflix Prize", 2008
  • Hu et al.: "Collaborative Filtering for Implicit Feedback Datasets", 2008
