PhD Consortium ADBIS presentation.


My presentation for the PhD consortium of ADBIS conference.


  1. 1. Mathematical methods of Tensor Factorization applied to Recommender Systems
     Giuseppe Ricci, PhD Student in Computer Science, University of Bari “A. Moro”
     Advances in DataBases and Information Systems PhD Consortium, Genoa, 01 September 2013
     Semantic Web Access and Personalization research group, Dipartimento di Informatica
  2. 2. Information Overload & Recommender Systems
     On the Internet today, an overabundance of information can be accessed, making it difficult for users to process and evaluate options and make appropriate choices.
     Recommender Systems (RS) are information filtering techniques that play an important role in e-commerce, advertising, e-mail filtering, etc.
  3. 3. What do RS do exactly?
     ① Predict how much you may like a certain product/service
     ② Compose a list of the N best items for you
     ③ Compose a list of the N best users for a certain product/service
     ④ Explain why these items are recommended to you
     ⑤ Adjust the prediction and recommendation based on your feedback (ratings) and that of other people
     [Figure: example user-item matrix with users U1 to U4 as rows and items I1 to I9 as columns, sparsely filled with ratings]
  4. 4. Matrix Factorization
     Matrix Factorization (MF) techniques fall in the class of collaborative filtering (CF) methods → latent factor models: similarity between users and items is induced by factors hidden in the data.
     Latent factor models build a matrix of users and items, and each element is associated with a vector of characteristics.
     MF techniques represent users and items by vectors of features derived from the ratings that users give to the items they have seen or tried.
     Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8):30-37, 2009.
  5. 5. Matrix Factorization
     U: set of users, D: set of items, R: rating matrix.
     MF aims to factorize R into two matrices P and Q such that their product approximates R:
     $R \approx P Q^T$
     Each row of P: strength of the association between a user and the k latent features.
     Each column of $Q^T$: strength of the association between an item and the latent features.
     Once these vectors are discovered, recommendations are calculated using the expression of $\hat{r}_{ij}$:
     $\hat{r}_{ij} = p_i^T q_j$
     An MF technique used in the literature: Singular Value Decomposition (SVD):
     • popularized for RS by Simon Funk during the Netflix Prize
     • aims to reduce the dimensionality, i.e. the rank, of the user-item matrix
     • captures latent relationships between users and items
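The factorization of R into P and Q described on this slide can be sketched as follows. This is a minimal illustration, assuming plain stochastic gradient descent with L2 regularization over the observed ratings only; the function name `factorize`, the hyperparameters, the convention that 0 marks a missing rating, and the toy matrix are all assumptions made for the example and do not come from the presentation.

```python
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Approximate R by P @ Q.T using SGD over the observed entries.

    Assumption: a 0 in R marks a missing rating and is skipped in training.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.normal(scale=0.1, size=(n_users, k))   # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item latent factors
    observed = np.argwhere(R > 0)
    for _ in range(steps):
        for i, j in observed:
            err = R[i, j] - P[i] @ Q[j]            # error on one known rating
            p_old = P[i].copy()
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * p_old - reg * Q[j])
    return P, Q

# Toy 5 users x 4 items rating matrix (hypothetical data).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
P, Q = factorize(R)
R_hat = P @ Q.T                                    # predicted rating for every user-item pair
mae_observed = np.mean(np.abs(R[R > 0] - R_hat[R > 0]))
```

The entries of `R_hat` at positions where `R` is 0 are the predictions that a recommender would rank to build a top-N list.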
  6. 6. SVD
     Different SVD algorithms have been used in the RS literature:
     • in [15], the authors use a small SVD obtained by retaining only k << r singular values and discarding the other entries;
     • in [11], the authors propose an algorithm to perform SVD on large matrices, focusing the study on the parameters that affect the convergence speed;
     • in [9], Koren presents an approach based on factor models, which projects users and items into the same latent space, where some measures for comparison are defined. He proposes several versions of SVD with the objective of obtaining better recommendations as well as good scalability.
     [15] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental singular value decomposition algorithms for highly scalable recommender systems.
     [11] Miklos Kurucz, Andras A. Benczur, and Balazs Torma. Methods for large scale SVD with missing values.
     [9] Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model.
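Retaining only the k largest singular values, as in the first approach listed on this slide, can be sketched with NumPy. This is an illustrative sketch only: the helper name `truncated_svd` and the toy matrix are assumptions, and storing missing ratings as 0 is a simplification chosen for the example.

```python
import numpy as np

def truncated_svd(A, k):
    """Return the rank-k approximation of A built from its k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Toy rating matrix; missing ratings are stored as 0 (assumption).
A = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 0, 4],
              [0, 1, 5, 4]], dtype=float)
A_k = truncated_svd(A, k=2)

# The approximation error equals the energy of the discarded singular values.
discarded = np.linalg.svd(A, compute_uv=False)[2:]
error = np.linalg.norm(A - A_k)
```

Comparing `error` with `np.sqrt(np.sum(discarded ** 2))` shows that the Frobenius-norm error is determined entirely by the singular values that were dropped.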
  7. 7. Limitation of MF Techniques
     They take into account only the standard profile of users and items.
     This does not allow further information, such as context, to be integrated.
     Contextual information (the place where the user sees the movie, the device, the company...) cannot be managed with simple user-item matrices.
     [Images: family with children; at the cinema with friends or colleagues]
  8. 8. Tensors & Tensor Factorization
     Tensors are higher-dimensional arrays of numbers that can be exploited to include additional contextual information in the recommendation process.
     The techniques that generalize MF can also be applied to tensors.
     Two particular Tensor Factorizations (TF) can be considered higher-order extensions of the matrix singular value decomposition:
     • PARallel FACtor analysis [6], or CANonical DECOMPosition (PARAFAC/CANDECOMP), which decomposes a tensor as a sum of rank-one tensors;
     • Higher-Order Singular Value Decomposition [12] (HOSVD), which is a higher-order form of Principal Component Analysis (PCA).
     [6] R.A. Harshman. Foundations of the PARAFAC Procedure: Models and Conditions for an "explanatory" Multi-modal Factor Analysis, volume 1 (16) of Working papers in phonetics. University of California at Los Angeles, 1970.
     [12] Lieven De Lathauwer, Bart De Moor, and Joos Vandewalle. A multilinear singular value decomposition. SIAM J. Matrix Anal. Appl, 21:1253-1278, 2000.
  9. 9. HOSVD & RS 1/2
     HOSVD is the most widely adopted TF technique.
     HOSVD is a generalization of the SVD for matrices: it decomposes the initial tensor into N matrices (N is the order of the tensor) and a “small” core tensor.
     Examples of HOSVD in RS:
     • Multiverse recommendation [7]: TF is applied to manage data for users, movies, user ratings and contextual information such as age, day of the week, companion;
     • Tensor factorization for tag recommendation [13]: for a social tagging system, data on users, items and tags are stored in a 3rd-order tensor, which is factorized to discover the latent factors that bind the user-item, user-tag and tag-item associations.
     [7] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering.
     [13] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727-736, 2009.
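The decomposition of a tensor into one matrix per dimension plus a small tensor can be sketched in a few lines of NumPy. This is a minimal sketch of a truncated HOSVD under stated assumptions: the helper names (`unfold`, `mode_product`, `hosvd`, `reconstruct`), the chosen ranks and the random toy tensor are all illustrative, and the code is not taken from any of the cited systems.

```python
import numpy as np

def unfold(T, mode):
    """Matricize T along `mode`: rows index that mode, columns index all others."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along `mode`."""
    out = np.tensordot(M, T, axes=(1, mode))   # the contracted mode becomes axis 0
    return np.moveaxis(out, 0, mode)

def hosvd(T, ranks):
    """Truncated HOSVD: one factor matrix per mode plus a small core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        factors.append(U[:, :r])               # keep the r leading left singular vectors
    core = T
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)   # project every mode onto its factors
    return core, factors

def reconstruct(core, factors):
    """Rebuild the (approximate) tensor from the core and the factor matrices."""
    T = core
    for mode, U in enumerate(factors):
        T = mode_product(T, U, mode)
    return T

rng = np.random.default_rng(0)
T = rng.random((4, 5, 3))                          # toy users x items x contexts tensor
core, factors = hosvd(T, ranks=(2, 2, 2))          # truncated: lossy approximation
T_approx = reconstruct(core, factors)
full_core, full_factors = hosvd(T, ranks=T.shape)  # no truncation: exact reconstruction
```

Each mode gets its own rank, which is what allows dimensionality reduction to be tuned separately per dimension.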
  10. 10. HOSVD & RS 2/2
      • CubeSVD [17]: a personalized web search system that aims to discover the hidden relationships between users, queries and web pages. Data are collected in a 3rd-order tensor, which is then decomposed.
      [17] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. Cubesvd: a novel approach to personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW '05, pages 382-390, New York, NY, USA, 2005. ACM.
  11. 11. HOSVD: advantages & disadvantages
      Advantages:
      • the ability to take more dimensions into account simultaneously
      • better data modeling than standard SVD: dimensionality reduction can be performed not only in one dimension but also separately for each dimension
      Disadvantages:
      • it is not an optimal tensor decomposition in the sense of least squares data fitting: in matrix SVD, keeping only the first n singular values yields the best rank-n approximation of a given matrix, whereas truncated HOSVD offers no such guarantee
      • high computational cost
      • cannot deal with missing values → they are treated as 0
  12. 12. PARAFAC
      The PARAFAC model of a 3-dimensional array is given by 3 loading matrices A, B and C with typical elements $a_{if}$, $b_{jf}$ and $c_{kf}$. The PARAFAC model is defined by:
      $\hat{x}_{ijk} = \sum_{f=1}^{F} a_{if} b_{jf} c_{kf}$
      F: number of rank-one components.
      PARAFAC advantages:
      • alternative to HOSVD
      • simpler
      • linear computation time compared to HOSVD
      • does not collapse data, but retains its natural 3-dimensional structure
      • components are unique, up to permutation and scaling, under mild conditions
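As an illustration of the PARAFAC model, each reconstructed entry is the sum over the F components of the product of one element from each loading matrix. The sketch below evaluates that sum for a whole tensor at once; the function name `parafac_reconstruct` and the random loading matrices are assumptions made for the example.

```python
import numpy as np

def parafac_reconstruct(A, B, C):
    """x_hat[i, j, k] = sum over f of A[i, f] * B[j, f] * C[k, f]."""
    return np.einsum("if,jf,kf->ijk", A, B, C)

rng = np.random.default_rng(0)
F = 2                                  # number of rank-one components
A = rng.random((4, F))                 # loadings for mode 1 (e.g. users)
B = rng.random((5, F))                 # loadings for mode 2 (e.g. items)
C = rng.random((3, F))                 # loadings for mode 3 (e.g. contexts)
X_hat = parafac_reconstruct(A, B, C)

# Equivalent view: a sum of F rank-one tensors (outer products of columns).
X_sum = sum(np.multiply.outer(np.multiply.outer(A[:, f], B[:, f]), C[:, f])
            for f in range(F))
```

Both `X_hat` and `X_sum` describe the same tensor, which makes the "sum of rank-one tensors" reading of PARAFAC concrete.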
  13. 13. PARAFAC, RS and not only 1/2
      In TFMAP: optimizing MAP for top-n context-aware recommendation [16], a 3-dimensional tensor (users, items and context types) is factorized with PARAFAC. The dimensions are associated with the 3 factor matrices, which are used to calculate the user preference for item i under context type k.
      Problem: PARAFAC & Missing Data
      Solution: CP-WOPT algorithm
      [16] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. Tfmap: optimizing map for top-n context-aware recommendation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR '12, pages 155-164, New York, NY, USA, 2012. ACM.
  14. 14. PARAFAC, RS and not only 2/2
      PARAFAC & Missing Data: in Scalable tensor factorizations with missing data [1], the CP-WOPT (CP Weighted OPTimization) algorithm uses first-order optimization to solve the weighted least squares objective function.
      Extensive numerical experiments on simulated data sets show that CP-WOPT can successfully factor tensors with noise and up to 70% missing data.
      CP-WOPT is reported to be significantly faster and more accurate than the best published method in the literature.
      [1] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor factorizations with missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on Data Mining, pages 701-712, Philadelphia, April 2010. SIAM.
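The idea of a weighted least squares objective that only models known entries can be sketched as follows. This is a simplified illustration, not the CP-WOPT algorithm itself: it swaps in plain fixed-step gradient descent as the first-order method, and the function names, step size, rank and synthetic data are all assumptions made for the example. The binary tensor `W` holds 1 where an entry is known and 0 where it is missing.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    """Rebuild a 3rd-order tensor from its three CP factor matrices."""
    return np.einsum("if,jf,kf->ijk", A, B, C)

def weighted_cp(X, W, rank, steps=2000, lr=0.01, seed=0):
    """Fit a rank-`rank` CP model to the entries of X where W == 1.

    Minimizes 0.5 * ||W * (X - X_hat)||^2 with fixed-step gradient descent
    (a stand-in for the optimizer used by CP-WOPT).
    """
    rng = np.random.default_rng(seed)
    A, B, C = (rng.random((n, rank)) for n in X.shape)
    losses = []
    for _ in range(steps):
        E = W * (cp_reconstruct(A, B, C) - X)   # residual, zeroed where data is missing
        losses.append(0.5 * np.sum(E ** 2))
        grad_A = np.einsum("ijk,jf,kf->if", E, B, C)
        grad_B = np.einsum("ijk,if,kf->jf", E, A, C)
        grad_C = np.einsum("ijk,if,jf->kf", E, A, B)
        A, B, C = A - lr * grad_A, B - lr * grad_B, C - lr * grad_C
    return (A, B, C), losses

# Synthetic rank-2 tensor with roughly 30% of its entries hidden (toy data).
rng = np.random.default_rng(1)
true_factors = [rng.random((n, 2)) for n in (6, 5, 4)]
X = cp_reconstruct(*true_factors)
W = (rng.random(X.shape) > 0.3).astype(float)
(A, B, C), losses = weighted_cp(X * W, W, rank=2)
X_hat = cp_reconstruct(A, B, C)                 # also fills in the hidden entries
```

Because the residual is multiplied by `W`, hidden entries contribute nothing to the loss or the gradients, so they are never treated as observed zeros.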
  15. 15. CP-WOPT adaptation: Preliminary Experiments 1/3
      CP-WOPT algorithm adapted to RS:
      • takes missing values into account → the algorithm is suitable for very sparse user-item matrices
      • computes a weighted factorization that models only the known values, rather than simply employing 0 values for missing data
      • main goals:
        • good reconstruction of missing values
        • considering contextual information → more precise recommendations
      Preliminary user study: users rated some movies (not all) under contextual factors
      • 7 real users
      • 11 movies from the MovieLens 100k dataset
      • contextual factors: whether they would like to see the movie
        • at home or at the cinema;
        • with friends or with their partner;
        • with or without family.
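A weighted factorization of this kind needs the ratings arranged as a tensor together with a mask that records which entries are known. The sketch below shows one way to build both from rating records; the record layout, the zero-based indices and the tiny sizes are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical (user, movie, context, rating) records with zero-based indices.
records = [(0, 0, 0, 5), (0, 2, 1, 3), (1, 1, 0, 4), (2, 0, 1, 1)]
n_users, n_movies, n_contexts = 3, 3, 2

X = np.zeros((n_users, n_movies, n_contexts))   # rating tensor
W = np.zeros_like(X)                            # 1 = known rating, 0 = missing
for user, movie, context, rating in records:
    X[user, movie, context] = rating
    W[user, movie, context] = 1.0

sparsity = 1.0 - W.mean()                       # fraction of missing entries
```

Keeping `W` separate from `X` is what lets a factorization distinguish a missing rating from an actual value of 0.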
  16. 16. CP-WOPT adaptation: Preliminary Experiments 2/3
      Main goal: good reconstruction of missing values with the adapted CP-WOPT
      Ratings range: 1 to 5
      Rating coding:
      • 1-2: strong-modest preference for the 1st option;
      • 3: neutrality;
      • 4-5: modest-strong preference for the 2nd option
      Metrics:
      • accuracy (acc): % of known values correctly reconstructed, $acc = 100 - \frac{\text{errors}}{\text{known values}} \cdot 100$
      • coverage (cov): % of non-zero values returned, $cov = 100 - \frac{\text{errors}}{\text{unknown values}} \cdot 100$
      Results: 105 maximum iterations
      acc = 94.4%
      cov = 91.7%
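The two metrics can be sketched in code, with some interpretation. The slide does not say when a value counts as correctly reconstructed or what a non-zero returned value is, so the sketch assumes that a known rating is correct when the prediction rounded to the nearest integer equals it, and that coverage is the share of unknown entries whose rounded prediction is not 0. The function names and the toy arrays are hypothetical.

```python
import numpy as np

def accuracy(X, X_hat, W):
    """Percentage of known ratings whose rounded prediction matches the rating."""
    known = W == 1
    correct = np.rint(X_hat[known]) == X[known]
    return 100.0 * correct.mean()

def coverage(X_hat, W):
    """Percentage of unknown entries that receive a non-zero rounded prediction."""
    unknown = W == 0
    nonzero = np.rint(X_hat[unknown]) != 0
    return 100.0 * nonzero.mean()

# Toy data: three known ratings and one unknown entry (hypothetical values).
X = np.array([[5.0, 0.0], [3.0, 4.0]])
W = np.array([[1.0, 0.0], [1.0, 1.0]])
X_hat = np.array([[4.8, 2.1], [3.4, 2.9]])

acc = accuracy(X, X_hat, W)   # two of the three known ratings are matched
cov = coverage(X_hat, W)      # the single unknown entry gets a non-zero prediction
```

The same functions work unchanged on 3rd-order tensors, since boolean masks in NumPy apply to arrays of any shape.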
  17. 17. CP-WOPT adaptation: Preliminary Experiments 3/3
      Other qualitative results: the experiment showed that, through the n-dimensional factorization, it is possible to express not only recommendations for the single user, but also more specific suggestions about the consumption of an item.
  18. 18. In Vitro: Preliminary Experiment
      Main goal: test the adapted CP-WOPT on RS for more precise recommendations
      Adapted version of CP-WOPT → subset (with a significant number of ratings) of the MovieLens 100k dataset.
      Ratings given by users who have an occupation are stored in a 3rd-order tensor.
      Input: tensor of dimensions 100 users, 150 movies, 21 occupations (the contextual factor)
      Results: acc = 92.09%, cov = 99.96%, MAE = 0.60, RMSE = 0.93, in line with results reported in the literature
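For reference, MAE and RMSE are the standard error measures computed over pairs of true and predicted ratings. The sketch below uses the usual definitions; the function names and the four sample ratings are made up for illustration and are unrelated to the experiment reported on this slide.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE does."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Hypothetical held-out ratings and their predictions.
y_true = np.array([4.0, 3.0, 5.0, 1.0])
y_pred = np.array([3.5, 3.0, 4.0, 2.0])

print(mae(y_true, y_pred))    # 0.625
print(rmse(y_true, y_pred))   # 0.75
```

RMSE is never smaller than MAE on the same data, which is consistent with the pair of values reported on the slide.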
  19. 19. Ongoing and Future Work
      • Extend the evaluation of our version of CP-WOPT to tensors with higher dimensionality (MovieLens dataset)
      • Investigate methods to assess whether, and which, contextual factors (occupation, company) influence the users' preferences
      • User segmentation
      • Test our approach in other domains such as news recommendation or Electronic Program Guides
  20. 20. Thanks for your attention!!
      Dott. Giuseppe Ricci, PhD Student in Computer Science
      Department of Computer Science, 4th floor, LACAM Lab., SWAP Room
      Phone: +39-080-5442298
      E-mail: