1.
AIM3 – Scalable Data Analysis and Data Mining
11 – Latent factor models for Collaborative Filtering
Sebastian Schelter, Christoph Boden, Volker Markl
Fachgebiet Datenbanksysteme und Informationsmanagement, Technische Universität Berlin
20.06.2012
http://www.dima.tu-berlin.de/
DIMA – TU Berlin
2.
Recap: Item-Based Collaborative Filtering
■ Item-based Collaborative Filtering
  • compute pairwise similarities of the columns of the rating matrix using some similarity measure
  • store the top 20 to 50 most similar items per item in the item-similarity matrix
  • prediction: use a weighted sum over all items that are similar to the unknown item and have been rated by the current user
  $$p_{ui} = \frac{\sum_{j \in S(i,u)} s_{ij}\, r_{uj}}{\sum_{j \in S(i,u)} s_{ij}}$$
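The three steps of the recap can be sketched in a few lines. This is a minimal sketch under assumptions: cosine similarity stands in for "some similarity measure", item columns are plain `{user: rating}` dicts, and the function names `cosine`, `top_k_similar` and `predict_item_based` are hypothetical. `S(i, u)` is computed as the stored neighbours of `i` that user `u` has rated.

```python
import math

def cosine(col_a, col_b):
    """Cosine similarity of two item columns given as {user: rating} dicts."""
    common = set(col_a) & set(col_b)
    dot = sum(col_a[u] * col_b[u] for u in common)
    na = math.sqrt(sum(v * v for v in col_a.values()))
    nb = math.sqrt(sum(v * v for v in col_b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_similar(columns, k):
    """Keep only the k most similar other items per item (the item-similarity matrix)."""
    sims = {}
    for i in columns:
        scored = [(cosine(columns[i], columns[j]), j) for j in columns if j != i]
        scored.sort(reverse=True)
        sims[i] = {j: s for s, j in scored[:k]}
    return sims

def predict_item_based(user_ratings, similar_items, i):
    """p_ui = sum_{j in S(i,u)} s_ij * r_uj / sum_{j in S(i,u)} s_ij, or None if S(i,u) is empty."""
    neighbours = similar_items.get(i, {})
    S = [j for j in neighbours if j in user_ratings]  # similar items the user has rated
    num = sum(neighbours[j] * user_ratings[j] for j in S)
    den = sum(neighbours[j] for j in S)
    return num / den if den != 0 else None
```

For example, a user who rated `"a"` with 4.0 and `"b"` with 2.0, where item `"c"` has neighbours `{"a": 0.75, "b": 0.25, "d": 0.5}`, gets a prediction of 3.5 for `"c"`; the unrated neighbour `"d"` is ignored.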
3.
Drawbacks of similarity-based neighborhood methods
• the assumption that a rating is defined by all of the user's ratings for commonly co-rated items is hard to justify in general
• lack of bias correction
• every co-rated item is looked at in isolation: say a movie was similar to "Lord of the Rings", do we want each part of the trilogy to contribute as a separate similar item?
• the best choice of similarity measure is based on experimentation, not on mathematical reasons
4.
Latent factor models
■ Idea
  • ratings are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action in movies, complexity of characters)
  • these factors are in general not obvious; we might be able to think of some of them, but it's hard to estimate their impact on the ratings
  • the goal is to infer these so-called latent factors from the rating data using mathematical techniques
5.
Latent factor models
■ Approach
  • users and items are characterized by $n_f$ latent factors: each user and item is mapped onto a latent feature space, $u_i, m_j \in \mathbb{R}^{n_f}$
  • each rating is approximated by the dot product of the user feature vector and the item feature vector: $r_{ij} \approx m_j^T u_i$
  • prediction of unknown ratings also uses this dot product
  • squared error as a measure of loss: $(r_{ij} - m_j^T u_i)^2$
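As a minimal sketch of the prediction rule and the loss above, assume $n_f = 2$ and hand-picked toy vectors (the values and the names `predict` and `squared_error` are illustrative only):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(u_i, m_j):
    """Approximate r_ij by the dot product m_j^T u_i."""
    return dot(m_j, u_i)

def squared_error(r_ij, u_i, m_j):
    """Loss of a single rating: (r_ij - m_j^T u_i)^2."""
    return (r_ij - predict(u_i, m_j)) ** 2

u_1 = [1.0, 0.5]  # hypothetical user feature vector
m_3 = [2.0, 2.0]  # hypothetical item feature vector
```

Here `predict(u_1, m_3)` is 1.0·2.0 + 0.5·2.0 = 3.0, so a known rating of 4.0 yields a squared error of 1.0.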
6.
Latent factor models
■ Approach
  • decomposition of the rating matrix into the product of a user feature matrix and an item feature matrix: $R \approx U M^T$
  • row in U: vector of a user's affinity to the features
  • row in M: vector of an item's relation to the features
  • closely related to the Singular Value Decomposition (SVD), which produces an optimal low-rank approximation of a matrix
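To make $R \approx U M^T$ concrete: entry $(i, j)$ of the product is exactly the dot product $m_j^T u_i$ from the previous slide. A small sketch with made-up factor matrices (3 users, 2 items, $n_f = 2$; `reconstruct` is a hypothetical helper):

```python
def reconstruct(U, M):
    """Return U M^T: entry (i, j) is the dot product of row i of U and row j of M."""
    return [[sum(u * m for u, m in zip(u_row, m_row)) for m_row in M] for u_row in U]

U = [[1.0, 0.0],
     [0.5, 1.0],
     [0.0, 2.0]]  # one row of feature affinities per user
M = [[4.0, 1.0],
     [1.0, 2.0]]  # one row of feature relations per item

R_hat = reconstruct(U, M)  # 3 x 2 matrix of rank at most n_f = 2
```

`R_hat` evaluates to `[[4.0, 1.0], [3.0, 2.5], [2.0, 4.0]]`; the rank of the reconstruction can never exceed the number of latent factors, which is what makes this a low-rank model.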
7.
Latent factor models
■ Properties of the decomposition
  • automatically ranks features by their "impact" on the ratings
  • features might not necessarily be intuitively understandable
8.
Latent factor models
■ Problematic situation with explicit feedback data
  • the rating matrix is not only sparse, but only partially defined: missing entries cannot be interpreted as 0, they are simply unknown
  • standard decomposition algorithms like the Lanczos method for SVD are not applicable
■ Solution
  • the decomposition has to be computed using the known ratings only
  • find the set of user and item feature vectors that minimizes the squared error to the known ratings:
  $$\min_{U,M} \sum_{(i,j)\ \text{known}} \left(r_{ij} - m_j^T u_i\right)^2$$
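The key difference from a plain SVD objective is that the sum runs over known ratings only. A minimal sketch, assuming the known ratings are stored sparsely as a `{(i, j): r_ij}` dict (this storage format and the name `known_ratings_loss` are assumptions):

```python
def known_ratings_loss(ratings, U, M):
    """Sum of squared errors over observed (i, j) pairs; missing entries are skipped, not treated as 0."""
    total = 0.0
    for (i, j), r_ij in ratings.items():
        pred = sum(u * m for u, m in zip(U[i], M[j]))  # m_j^T u_i
        total += (r_ij - pred) ** 2
    return total

ratings = {(0, 0): 5.0, (0, 2): 1.0, (1, 1): 4.0}  # only three of six cells are known
U = [[1.0, 1.0], [2.0, 0.0]]
M = [[2.0, 2.0], [2.0, 1.0], [0.5, 0.5]]
```

With these toy factors only the rating $r_{00} = 5$ is mispredicted (as 4), so the loss is 1.0; the three unknown cells contribute nothing, however far their predictions are from 0.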
9.
Latent factor models
■ the quality of the decomposition is not measured by the reconstruction error on the original data, but by how well it generalizes to unseen data
■ regularization is necessary to avoid overfitting
■ the model has hyperparameters (regularization, learning rate) that need to be chosen
■ process: split the data into a training, a test and a validation set
  □ train the model using the training set
  □ choose hyperparameters according to the performance on the test set
  □ evaluate generalization on the validation set
  □ rotate the splits so that every data point ends up in each held-out set once (cross-validation)
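The rotation in the last step is usually done with k-fold splitting. The sketch below only shows the splitting itself, under the assumption that the ratings are a flat list; `k_fold_splits` is a hypothetical name, and training plus hyperparameter search would run inside the loop over the yielded pairs.

```python
import random

def k_fold_splits(data, k, seed=0):
    """Yield (train, held_out) pairs; every element lands in exactly one held-out fold."""
    items = list(data)
    random.Random(seed).shuffle(items)  # shuffle once so folds are random
    folds = [items[f::k] for f in range(k)]
    for f in range(k):
        train = [x for g, fold in enumerate(folds) if g != f for x in fold]
        yield train, folds[f]
```

Each element of `data` is held out exactly once across the `k` rounds and is used for training in the other `k - 1` rounds.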
10.
Stochastic Gradient Descent
• add a regularization term:
  $$\min_{U,M} \sum_{(i,j)\ \text{known}} \left[\left(r_{ij} - m_j^T u_i\right)^2 + \lambda\left(\lVert u_i \rVert^2 + \lVert m_j \rVert^2\right)\right]$$
• loop through all ratings in the training set and compute the associated prediction error $e_{ij} = r_{ij} - m_j^T u_i$
• modify the parameters in the opposite direction of the gradient:
  $$u_i \leftarrow u_i + \gamma\left(e_{ij}\, m_j - \lambda u_i\right), \qquad m_j \leftarrow m_j + \gamma\left(e_{ij}\, u_i - \lambda m_j\right)$$
• problem: the approach is inherently sequential (although recent research might have unveiled a parallelization technique)
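The update rules above translate directly into a training loop. This is a minimal sketch, not a tuned implementation: the Gaussian initialization, the default values of `gamma`, `lam` and `epochs`, and the name `sgd_mf` are all assumptions. The inner loop over ratings is the sequential part the slide refers to, since each step reads vectors that the previous step may have just changed.

```python
import random

def sgd_mf(ratings, n_users, n_items, n_f=2, gamma=0.05, lam=0.02, epochs=300, seed=42):
    """Factorize the known ratings {(i, j): r_ij} with stochastic gradient descent."""
    rng = random.Random(seed)
    # small random initial feature vectors (assumption: the slides do not specify initialization)
    U = [[rng.gauss(0.0, 0.1) for _ in range(n_f)] for _ in range(n_users)]
    M = [[rng.gauss(0.0, 0.1) for _ in range(n_f)] for _ in range(n_items)]
    entries = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(entries)  # visit the training ratings in random order
        for (i, j), r_ij in entries:
            u_i, m_j = U[i], M[j]
            e_ij = r_ij - sum(u * m for u, m in zip(u_i, m_j))  # prediction error
            # step against the gradient; both updates use the old u_i and m_j
            U[i] = [u + gamma * (e_ij * m - lam * u) for u, m in zip(u_i, m_j)]
            M[j] = [m + gamma * (e_ij * u - lam * m) for u, m in zip(u_i, m_j)]
    return U, M
```

Computing both new vectors from the old `u_i` and `m_j` keeps the step a true gradient step for that single rating; updating `u_i` first and then using it for `m_j` is a common, slightly different variant.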
11.
Alternating Least Squares with Weighted λ-Regularization
■ Model
  • feature matrices are modeled directly using only the observed ratings
  • add a regularization term to avoid overfitting
  • minimize the regularized error:
  $$f(U,M) = \sum_{(i,j)\ \text{known}} \left(r_{ij} - m_j^T u_i\right)^2 + \lambda\left(\sum_i n_{u_i} \lVert u_i \rVert^2 + \sum_j n_{m_j} \lVert m_j \rVert^2\right)$$
  where $n_{u_i}$ is the number of ratings of user $i$ and $n_{m_j}$ the number of ratings of item $j$
■ Solving technique
  • fixing one of the unknown matrices turns this into a simple quadratic problem
  • alternate between fixing U and fixing M until convergence ("Alternating Least Squares")
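With $M$ fixed, setting the gradient of $f$ with respect to $u_i$ to zero gives a small linear system per user, $\left(\sum_j m_j m_j^T + \lambda n_{u_i} I\right) u_i = \sum_j r_{ij}\, m_j$, summed over the items $j$ rated by user $i$ (and symmetrically for each $m_j$). The sketch below solves these systems in pure Python; the helper names, the random initialization of $M$ and the default parameters are assumptions made for brevity, and every user and item is assumed to have at least one rating.

```python
import random

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def ls_update(rated, F, lam, n_f):
    """Solve (sum_k f_k f_k^T + lam * n * I) x = sum_k r_k f_k over the n pairs (k, r_k) in `rated`."""
    n = len(rated)  # weighted-lambda: the penalty grows with the number of ratings
    A = [[lam * n if a == b else 0.0 for b in range(n_f)] for a in range(n_f)]
    rhs = [0.0] * n_f
    for k, r in rated:
        f = F[k]
        for a in range(n_f):
            rhs[a] += r * f[a]
            for b in range(n_f):
                A[a][b] += f[a] * f[b]
    return solve(A, rhs)

def als_wr(ratings, n_users, n_items, n_f=2, lam=0.01, iterations=20, seed=0):
    """Alternate between solving all u_i with M fixed and all m_j with U fixed."""
    rng = random.Random(seed)
    M = [[rng.random() for _ in range(n_f)] for _ in range(n_items)]  # random init (assumption)
    U = [[0.0] * n_f for _ in range(n_users)]
    by_user = {i: [] for i in range(n_users)}
    by_item = {j: [] for j in range(n_items)}
    for (i, j), r in ratings.items():
        by_user[i].append((j, r))
        by_item[j].append((i, r))
    for _ in range(iterations):
        U = [ls_update(by_user[i], M, lam, n_f) for i in range(n_users)]  # fix M, solve each u_i
        M = [ls_update(by_item[j], U, lam, n_f) for j in range(n_items)]  # fix U, solve each m_j
    return U, M
```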
12.
ALS-WR is scalable
■ Which properties make this approach scalable?
  • all feature vectors in one iteration can be computed independently of each other
  • only a small portion of the data is necessary to compute a feature vector
■ Parallelization with Map/Reduce
  • computing user feature vectors: the mappers need to send each user's rating vector and the feature vectors of his/her rated items to the same reducer
  • computing item feature vectors: the mappers need to send each item's rating vector and the feature vectors of the users who rated it to the same reducer
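The data flow for the user half-step can be simulated in memory. This is a hedged sketch, not the Hadoop or Mahout API: `map_phase`, `shuffle`, `reduce_user` and `compute_user_features` are hypothetical names, $M$ is simply passed to the mapper (sidestepping how item vectors are shipped to mappers on a real cluster), and $n_f$ is fixed to 2 so each reducer's regularized least-squares solve fits in a closed form (Cramer's rule).

```python
from collections import defaultdict

def map_phase(ratings, M):
    """Mapper: for each rating r_ij, emit key i with the rating and the item vector m_j."""
    for (i, j), r_ij in ratings.items():
        yield i, (r_ij, M[j])

def shuffle(pairs):
    """Group emitted values by key, so one reducer sees everything about one user."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_user(values, lam):
    """Reducer: solve (sum m m^T + lam * n * I) u = sum r m for n_f = 2 via Cramer's rule."""
    n = len(values)
    a00 = lam * n + sum(m[0] * m[0] for _, m in values)
    a01 = sum(m[0] * m[1] for _, m in values)
    a11 = lam * n + sum(m[1] * m[1] for _, m in values)
    b0 = sum(r * m[0] for r, m in values)
    b1 = sum(r * m[1] for r, m in values)
    det = a00 * a11 - a01 * a01
    return [(b0 * a11 - a01 * b1) / det, (a00 * b1 - a01 * b0) / det]

def compute_user_features(ratings, M, lam=0.1):
    grouped = shuffle(map_phase(ratings, M))
    return {i: reduce_user(values, lam) for i, values in grouped.items()}
```

Each reducer call touches only one user's ratings and the vectors of the items that user rated, which is why the reducers can run independently; the item half-step swaps the roles of users and items.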
13.
Incorporating biases
■ Problem: explicit feedback data is highly biased
  □ some users tend to rate more extremely than others
  □ some items tend to get higher ratings than others
■ Solution: explicitly model biases
  □ the bias of a rating is modeled as a combination of the global average rating $\mu$, the user bias $b_i$ and the item bias $b_j$: $b_{ij} = \mu + b_i + b_j$
  □ the rating bias can be incorporated into the prediction: $\hat{r}_{ij} = \mu + b_i + b_j + m_j^T u_i$
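Koren et al. learn the biases jointly with the feature vectors. As a simpler stand-in, the sketch below estimates them heuristically by averaging residuals: first the global mean, then each item's mean deviation from it, then each user's mean deviation from both. This estimator and the names `estimate_biases` and `predict_with_biases` are assumptions for illustration.

```python
from collections import defaultdict

def estimate_biases(ratings):
    """Heuristic baseline: mu = global mean, b_j = mean item residual, b_i = mean user residual."""
    mu = sum(ratings.values()) / len(ratings)
    dev_item = defaultdict(list)
    for (i, j), r in ratings.items():
        dev_item[j].append(r - mu)
    b_item = {j: sum(d) / len(d) for j, d in dev_item.items()}
    dev_user = defaultdict(list)
    for (i, j), r in ratings.items():
        dev_user[i].append(r - mu - b_item[j])  # what is left after global mean and item bias
    b_user = {i: sum(d) / len(d) for i, d in dev_user.items()}
    return mu, b_user, b_item

def predict_with_biases(mu, b_user, b_item, U, M, i, j):
    """r_hat_ij = mu + b_i + b_j + m_j^T u_i."""
    return mu + b_user[i] + b_item[j] + sum(u * m for u, m in zip(U[i], M[j]))
```

On the toy ratings `{(0, 0): 5.0, (0, 1): 3.0, (1, 0): 3.0, (1, 1): 1.0}` this gives $\mu = 3$, item biases $+1$ and $-1$, and user biases $+1$ and $-1$, so the biases alone already reproduce every rating and the factors only have to explain what remains.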
14.
Latent factor models
■ implicit feedback data is very different from explicit data!
  □ e.g. use the number of clicks on a product page of an online shop
  □ the whole matrix is defined!
  □ no negative feedback
  □ interactions that did not happen produce zero values
  □ however, we should have only little confidence in these (maybe the user never had the chance to interact with these items)
  □ standard decomposition techniques like SVD would yield a decomposition that is biased towards the zero entries, so again they are not applicable
15.
Latent factor models
■ Solution for working with implicit data: weighted matrix factorization
■ create a binary preference matrix P:
  $$p_{ij} = \begin{cases} 1 & r_{ij} > 0 \\ 0 & r_{ij} = 0 \end{cases}$$
■ each entry in this matrix can be weighted by a confidence function
  □ zero values should get low confidence
  □ values that are based on a lot of interactions should get high confidence
  $$c(i,j) = 1 + \alpha\, r_{ij}$$
■ confidence is incorporated into the model
  □ the factorization will 'prefer' more confident values
  $$f(U,M) = \sum_{i,j} c(i,j)\left(p_{ij} - m_j^T u_i\right)^2 + \lambda\left(\sum_i \lVert u_i \rVert^2 + \sum_j \lVert m_j \rVert^2\right)$$
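A minimal sketch of the weighted objective, assuming a small dense interaction-count matrix `R` and the linear confidence $c(i,j) = 1 + \alpha\, r_{ij}$; the names `preference_and_confidence` and `weighted_loss` are hypothetical. Unlike the explicit-feedback loss, every cell is summed, including the zeros, which enter with low confidence.

```python
def preference_and_confidence(R, alpha):
    """Binary preferences p_ij and confidences c(i, j) = 1 + alpha * r_ij."""
    P = [[1.0 if r > 0 else 0.0 for r in row] for row in R]
    C = [[1.0 + alpha * r for r in row] for row in R]
    return P, C

def weighted_loss(R, U, M, alpha, lam):
    """f(U, M) summed over all cells of R, plus the L2 penalty on all feature vectors."""
    P, C = preference_and_confidence(R, alpha)
    loss = 0.0
    for i, row in enumerate(R):
        for j in range(len(row)):  # every cell counts, including zero entries
            pred = sum(u * m for u, m in zip(U[i], M[j]))
            loss += C[i][j] * (P[i][j] - pred) ** 2
    loss += lam * (sum(x * x for u in U for x in u) + sum(x * x for m in M for x in m))
    return loss
```

For example, with `R = [[0, 3], [1, 0]]` and `alpha = 2`, the cell with three clicks gets confidence 7 while the empty cells keep confidence 1, so mispredicting the three-click cell costs seven times as much as mispredicting an empty one.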
16.
Sources
• Sarwar et al.: "Item-Based Collaborative Filtering Recommendation Algorithms", 2001
• Koren et al.: "Matrix Factorization Techniques for Recommender Systems", 2009
• Funk: "Netflix Update: Try This at Home", http://sifter.org/~simon/journal/20061211.html, 2006
• Zhou et al.: "Large-scale Parallel Collaborative Filtering for the Netflix Prize", 2008
• Hu et al.: "Collaborative Filtering for Implicit Feedback Datasets", 2008