Your SlideShare is downloading.
×

- 1. Multiverse Recommendation: N-dimensional Tensor Factorization for Context-aware Collaborative Filtering Alexandros Karatzoglou1 Xavier Amatriain 1 Linas Baltrunas 2 Nuria Oliver 2 1Telefonica Research Barcelona, Spain 2Free University of Bolzano Bolzano, Italy November 4, 2010 1
- 2. Context in Recommender Systems 2
- 3. Context in Recommender Systems Context is an important factor to consider in personalized Recommendation 2
- 4. Context in Recommender Systems Context is an important factor to consider in personalized Recommendation 2
- 5. Context in Recommender Systems Context is an important factor to consider in personalized Recommendation 2
- 6. Context in Recommender Systems Context is an important factor to consider in personalized Recommendation 2
- 7. Current State of the Art in Context- aware Recommendation 3
- 8. Current State of the Art in Context- aware Recommendation Pre-Filtering Techniques 3
- 9. Current State of the Art in Context- aware Recommendation Pre-Filtering Techniques Post-Filtering Techniques 3
- 10. Current State of the Art in Context- aware Recommendation Pre-Filtering Techniques Post-Filtering Techniques Contextual modeling 3
- 11. Current State of the Art in Context- aware Recommendation Pre-Filtering Techniques Post-Filtering Techniques Contextual modeling The approach presented here ﬁts in the Contextual Modeling category 3
- 12. Collaborative Filtering problem setting Typically data sizes e.g. Netlix data n = 5 × 105, m = 17 × 103 4
- 13. Standard Matrix Factorization Find U ∈ Rn×d and M ∈ Rd×m so that F = UM minimizeU,ML(F, Y) + λΩ(U, M) Movies! Users! U! M! 5
- 14. Multiverse Recommendation: Tensors for Context Aware Collaborative Filtering Movies! Users! 6
- 15. Tensors for Context Aware Collaborative Filtering Movies! Users! U! C! M! S! 7
- 16. Tensors for Context Aware Collaborative Filtering Movies! Users! U! C! M! S! Fijk = S ×U Ui∗ ×M Mj∗ ×C Ck∗ R[U, M, C, S] := L(F, Y) + Ω[U, M, C] + Ω[S] 7
- 17. Regularization Ω[F] = λM M 2 F + λU U 2 F + λC C 2 F Ω[S] := λS S 2 F (1) 8
- 18. Squared Error Loss Function Many implementations of MF used a simple squared error regression loss function l(f, y) = 1 2 (f − y)2 thus the loss over all users and items is: L(F, Y) = n i m j l(fij, yij) Note that this loss provides an estimate of the conditional mean 9
- 19. Absolute Error Loss Function Alternatively one can use the absolute error loss function l(f, y) = |f − y| thus the loss over all users and items is: L(F, Y) = n i m j l(fij, yij) which provides an estimate of the conditional median 10
- 20. Optimization - Stochastic Gradient Descent for TF The partial gradients with respect to U, M, C and S can then be written as: ∂Ui∗ l(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )S ×M Mj∗ ×C Ck∗ ∂Mj∗ l(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )S ×U Ui∗ ×C Ck∗ ∂Ck∗ l(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )S ×U Ui∗ ×M Mj∗ ∂Sl(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )Ui∗ ⊗ Mj∗ ⊗ Ck∗ 11
- 21. Optimization - Stochastic Gradient Descent for TF The partial gradients with respect to U, M, C and S can then be written as: ∂Ui∗ l(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )S ×M Mj∗ ×C Ck∗ ∂Mj∗ l(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )S ×U Ui∗ ×C Ck∗ ∂Ck∗ l(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )S ×U Ui∗ ×M Mj∗ ∂Sl(Fijk , Yijk ) = ∂Fijk l(Fijk , Yijk )Ui∗ ⊗ Mj∗ ⊗ Ck∗ We then iteratively update the parameter matrices and tensors using the following update rules: Ut+1 i∗ = Ut i∗ − η∂UL − ηλUUi∗ Mt+1 j∗ = Mt j∗ − η∂ML − ηλMMj∗ Ct+1 k∗ = Ct k∗ − η∂CL − ηλCCk∗ St+1 = St − η∂Sl(Fijk , Yijk ) − ηλSS where η is the learning rate. 11
- 22. Optimization - Stochastic Gradient Descent for TF Movies! Users! U! C! M! S! 12
- 23. Optimization - Stochastic Gradient Descent for TF Movies! Users! U! C! M! S! 13
- 24. Optimization - Stochastic Gradient Descent for TF Movies! Users! U! C! M! S! 14
- 25. Optimization - Stochastic Gradient Descent for TF Movies! Users! U! C! M! S! 15
- 26. Optimization - Stochastic Gradient Descent for TF Movies! Users! U! C! M! S! 16
- 27. Experimental evaluation We evaluate our model on contextual rating data and computing the Mean Absolute Error (MAE),using 5-fold cross validation deﬁned as follows: MAE = 1 K n,m,c ijk Dijk |Yijk − Fijk | 17
- 28. Data Data set Users Movies Context Dim. Ratings Scale Yahoo! 7642 11915 2 221K 1-5 Adom. 84 192 5 1464 1-13 Food 212 20 2 6360 1-5 Table: Data set statistics 18
- 29. Context Aware Methods Pre-ﬁltering based approach, (G. Adomavicius et.al), computes recommendations using only the ratings made in the same context as the target one Item splitting method (L. Baltrunas, F. Ricci) which identiﬁes items which have signiﬁcant differences in their rating under different context situations. 19
- 30. Results: Context vs. No Context No context Tensor Factorization 1.9 2.0 2.1 2.2 2.3 2.4 2.5 MAE (a) No context Tensor Factorization 0.80 0.85 0.90 0.95 MAE (b) Figure: Comparison of matrix (no context) and tensor (context) factorization on the Adom and Food data. 20
- 31. Yahoo Artiﬁcial Data 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 MAE α=0.1 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 α=0.5 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95 1.00 α=0.9 No Context Reduction Item-Split Tensor Factorization Figure: Comparison of context-aware methods on the Yahoo! artiﬁcial data 21
- 32. Yahoo Artiﬁcial Data 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Probability of contextual influence 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95MAE No Context Reduction Item-Split Tensor Factorization 22
- 33. Tensor Factorization Reduction Item-Split Tensor Factorization 1.9 2.0 2.1 2.2 2.3 2.4 2.5 MAE Figure: Comparison of context-aware methods on the Adom data. 23
- 34. Tensor Factorization No context Reduction Tensor Factorization 0.80 0.85 0.90 0.95 MAE Figure: Comparison of context-aware methods on the Food data. 24
- 35. Conclusions Tensor Factorization methods seem to be promising for CARS Many different TF methods exist Future work: extend to implicit taste data Tensor representation of context data seems promising 25
- 36. Thank You ! 26