PhD defense


My PhD defense presentation in Computer Science.


  1. 1. Mathematical Methods of Tensor Factorization Applied to Recommender Systems Dott. Giuseppe Ricci Scuola di Dottorato in Informatica XXVI Ciclo PhD Defense – 26 May 2014 Semantic Web Access and Personalization research group Dipartimento di Informatica 1
  2. 2. Outline  Motivations and Contributions  Information Overload & Recommender Systems  Matrix and Tensor Factorization in RS literature  Proposed solutions  Experimental Evaluation  Summary and Future Work 2
  3. 3. Motivations and Contributions 1/2  Matrix Factorization (MF) techniques have proved to be a quite promising solution to the problem of designing efficient filtering algorithms in the Big Data Era.  Several challenges in Recommender Systems (RS) research area:  missing values: data sparsity  incorporating contextual information: CARS  context relevance (weighting) in CARS. This work focuses on CARS Objective: to propose new methods to understand which contextual information is relevant, and use this information to improve the quality of the recommendations. 3
  4. 4. Motivations and Contributions 2/2 • Matrix and Tensor Factorization literature review. • CP-WOPT algorithm → solution for the sparsity of RS data. • CARS and context weighting: 2 proposed solutions that introduce only relevant contextual information into the recommendation process, and an empirical evaluation of the 2 solutions.
  5. 5. Information Overload & Recommender Systems 5
  6. 6. Information Overload: a surplus of content compared to the user’s ability to find relevant information → the result is that decisions are either made late or made wrongly. The term “Information Overload” was used by the futurologist Alvin Toffler in 1970, when he predicted that the rapidly increasing amount of information being produced would eventually cause people problems.
  7. 7. Recommender Systems 1/2 • Recommender Systems (RS) represent a response to the problem of Information Overload and are now a widely recognized field of research [Ricci]. • RS fall in the area of information filtering. With the growing amount of information available on the web, a very sensitive issue is to develop methods that can effectively and efficiently handle large amounts of data. • Mathematical methods have recently proved useful in dealing with this problem in the context of RS. • The search for methods more effective and efficient than those known in the literature is also driven by industrial interest in this field, as evidenced by the Netflix Prize competition. [Ricci] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors. Recommender Systems Handbook. Springer, 2011.
  8. 8. Recommender Systems 2/2 • Ratings are usually stored in a matrix called the user-item matrix or rating matrix. • An RS calculates a rating estimate for each item/product not yet purchased/tried → a suggestion list of the items with the highest estimated ratings.
  9. 9. Examples of RS Applications: • e-commerce • advertising • e-mail filtering • social network …… 9
  10. 10. Basics of Recommender Systems 10
  11. 11. Recommender Systems: definitions • The area of RS is relatively new → mid-1990s. • Concept: tools and techniques able to provide personalized information access to large collections of structured and unstructured data and to provide users with advice about items they might be interested in. Some definitions: • [Olsson]: “RS is a system that helps a user to select a suitable item among a set of selectable items using a knowledge-base that can be hand-coded by experts or learned from recommendations generated by the users”. • [Burke]: “RS have the effect of guiding the user in a personalized way to interesting or useful objects in a large space of possible options”. [Olsson] Tomas Olsson. Bootstrapping and Decentralizing Recommender Systems. PhD thesis, Department of Information Technology, Uppsala University and SICS, 2003. [Burke] R. Burke. Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
  12. 12. RS Classification [Burke] [Burke] Robin Burke. Hybrid recommender systems: Survey and experiments. User Modeling and User-Adapted Interaction , 12(4):331–370, 2002. Context Aware Recommender Systems (CARS) 12
  13. 13. Content-Based RS (CBRS)  Assumption: user preferences remain stable over time.  They suggest items similar to those previously labeled as relevant by the target user.  Based on the analysis and exploitation of textual contents since  each item to be recommended has to be described by means of textual features.  Needs 2 pieces of information: a textual description of the item and a user profile describing user interests in terms of textual features. 13
  14. 14. Collaborative Filtering RS • Assumption: users that shared similar tastes in the past will have similar tastes in the future as well → nearest neighbors. • They rely on a matrix where each user is mapped to a row and each item is represented by a column → user/item or rating matrix. • A recent trend is to exploit matrix factorization methods. • A common technique applied in CFRS is Singular Value Decomposition (SVD).
  15. 15. Hybrid Recommender Systems • They combine 2 or more classes of algorithms in order to emphasize their strengths and to level out their corresponding weaknesses. • For example, a collaborative system and a content-based system might be combined to compensate for the new user problem, providing recommendations to users whose profiles are too poor to trigger the collaborative recommendation process. • Burke proposed an analytical classification of hybrid systems, listing a number of hybridization methods to combine pairs of recommender algorithms. In [Burke] 7 different hybridization techniques are introduced. [Burke] Robin Burke. Hybrid Web Recommender Systems. In The Adaptive Web, pages 377–408. Springer-Verlag, Berlin, Heidelberg, 2007.
  16. 16. Context: what is context? • One of the most cited definitions of context is that of Dey [Dey], who defines context as: “Any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the applications themselves”. Bazire and Brézillon [Bazire] examined and compared some 150 different definitions of context from a number of different fields and concluded that the multifaceted nature of the concept makes it difficult to find a unifying definition. Li et al. [Li] define 5 context dimensions: who (user), what (object), how (activities), where (location) and when (time). [Dey] Anind K. Dey. Understanding and using context. Personal Ubiquitous Comput., 5(1):4–7, 2001. [Bazire] Mary Bazire and Patrick Brézillon. Understanding context before using it. In Proceedings of the 5th International Conference on Modeling and Using Context, CONTEXT’05, pages 29–40, Berlin, Heidelberg, 2005. Springer-Verlag. [Li] Luyi Li, Yanlin Zheng, Hiroaki Ogata, and Yoneo Yano. A framework of ubiquitous learning environment. In CIT, pages 345–350. IEEE Computer Society, 2004.
  17. 17. Context Aware RS (CARS) • Context-Aware Recommender Systems (CARS) take account of contextual factors, such as available time, location, people nearby, etc., that identify the context in which the product is tried. • We suppose these factors may have a structure: for example, "location" may be defined in terms of home, public place, theatre, cinema, etc.
  18. 18. Context Aware RS. Challenges of a CARS are: • relevance of contextual factors: it is important to decide which contextual variables are relevant in the recommendation process; • availability of contextual information: relevant contextual factors can be considered as a part of the data collection, but such historical contextual information is often not available when designing the system; • extraction of contextual information from the user’s activities: these data need to be recorded; • evaluation and lack of publicly available datasets.
  19. 19. Context Aware RS • A CARS incorporates user and item information as well as other types of data such as context, using these to infer unknown ratings: f: Users × Items × Contexts → Rating • A CARS deals with a quadruple input <user, item, context, rating>, where the recommender records the preference of the user for the selected item together with the context information, which describes the situation in which the user consumed the product.
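The rating function f and the quadruple input can be illustrated with a small sketch. This is only an illustration under assumptions: the function names, the dictionary-based sparse tensor and the sample records are hypothetical and do not come from the thesis.

```python
from typing import Dict, Iterable, Tuple

Key = Tuple[str, str, str]  # (user, item, context)

def build_rating_tensor(quadruples: Iterable[tuple]) -> Dict[Key, float]:
    """Store <user, item, context, rating> records as a sparse 3-way tensor."""
    tensor: Dict[Key, float] = {}
    for user, item, context, rating in quadruples:
        tensor[(user, item, context)] = float(rating)
    return tensor

def f(tensor: Dict[Key, float], user: str, item: str, context: str) -> float:
    """Return the known rating, or 0.0 for an unknown cell (to be inferred)."""
    return tensor.get((user, item, context), 0.0)

# Hypothetical sample data: the same user rates the same movie in two contexts.
records = [("u1", "m1", "home", 4), ("u1", "m1", "cinema", 5), ("u2", "m1", "home", 2)]
tensor = build_rating_tensor(records)
```

The cells that return 0.0 are exactly the ones a CARS has to estimate.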
  20. 20. Paradigms to incorporate context: Pre-filtering, Post-filtering, Contextual Modeling. Pre-filtering example: in a movie RS, if a user wants to see a film on a day during the holidays, only the ratings assigned during holidays are used. Contextual modeling: contextual information is used, in addition to the user and item data, directly in the estimation of the ratings, by a multidimensional function or by heuristic calculations.
  21. 21. Context Weighting • It is not always simple to determine which contextual information is important for a specific purpose. • Many parameters can be acquired, in different ways, but not all acquired contextual information is important for the recommendation process: some contextual variables can introduce noise → degrade the quality of suggestions. • Goal: for each user, identify which contextual information helps to give more precise and reliable recommendations. • PROBLEM: users may rate items in different contexts, but it is not guaranteed that we can find dense contextual ratings under the same context, i.e. there may be very few users who have rated the items in the same contexts. • Solutions: 2 branches: Context Selection (survey) and Context Relaxation (binary selection).
  22. 22. Matrix Factorization in RS literature 22
  23. 23. Background • With the ever-increasing information available, the challenge of implementing personalized filters has become the challenge of designing algorithms able to manage huge amounts of data for the elicitation of user needs and preferences. • Matrix Factorization techniques have proved to be a quite promising solution. • MF techniques fall into the class of CF methods and, in particular, into the class of latent factor models → similarity between users and items is induced by some factors hidden in the data. • We will focus our attention on Singular Value Decomposition (SVD).
  24. 24. Basics of MF • U: set of users • D: set of items • R: the matrix of ratings. • MF aims to factorize R into two matrices P and Q such that their product approximates R. A factorization widely used in the RS literature is Singular Value Decomposition (SVD), in the form introduced by Simon Funk during the Netflix Prize. SVD objective: reducing the dimensionality, i.e. the rank, of the user-item matrix, in order to capture latent relationships between users and items.
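A minimal sketch of this idea, assuming the common formulation in which a rating is predicted as the dot product of a user-factor row of P and an item-factor row of Q, learned by stochastic gradient descent on the known ratings only. The function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`) and the toy data are illustrative assumptions, not values from the thesis.

```python
import random

def factorize(ratings, n_users, n_items, k=3, lr=0.01, reg=0.02, epochs=5000, seed=0):
    """Learn P (users x k) and Q (items x k) so that P[u] . Q[i] approximates r_ui."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:              # only known ratings are visited
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)   # regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Estimated rating of user u for item i."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Toy data: (user index, item index, rating); the other cells are unknown.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P, Q = factorize(ratings, n_users=3, n_items=3)
```

Calling `predict(P, Q, 0, 2)` then gives an estimate for a cell that was never rated, which is what a recommender ranks on.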
  25. 25. SVD in RS Literature 1/2 • Sarwar: SVD-based algorithm; low-rank approximation: retaining only the k << r largest singular values and discarding the other entries. • Koren: SVD-based algorithms (Asymmetric-SVD, SVD++); explicit and implicit feedback; baseline estimates. • Julià: Alternation Algorithm; an alternative to SVD with the same aim as SVD; Alternation makes it possible to deal with missing values. Notation: user-factors vector $p_u$, item-factors vector $q_i$.
  26. 26. SVD in RS Literature 2/2 Advantages: • limited computational cost and good quality recommendations (Sarwar) • good algorithms and high accuracy (Koren) • the Alternation Algorithm deals with missing values and has acceptable computational requirements (Julià). Problems: • technique not applicable to a frequently updated database (Sarwar) • models are not justified by a formal model (previous ratings are not explained) (Koren) • requires r known values in each row/column (Julià). [Sarwar] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems. 5th International Conference on Computer and Information Technology (ICCIT), 2002. [Koren] Yehuda Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. ACM Int. Conference on Knowledge Discovery and Data Mining (KDD'08), 2008. [Julià] Carme Julià, Angel D. Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López. Predicting Missing Ratings in Recommender Systems: Adapted Factorization Approach. International Journal of Electronic Commerce, 2009.
  27. 27. Summary • We analyzed MF techniques. • We focused our attention on SVD techniques. • The main limitations of MF techniques: • they take into account only the standard profile of the users; • they do not allow further information, such as the context, to be integrated.
  28. 28. Matrix 2 Tensor • Matrices and MF cannot be used in a CARS based on the contextual modeling paradigm: context information is used in the recommendation process, and matrices are not adequate for this purpose. • We need to introduce tensors. [Figure: 3-way tensor with dimensions users, items, contexts, holding <user, item, context, rating>]
  29. 29. Tensor Factorization: HOSVD and PARAFAC in RS literature 29
  30. 30. Tensors • Tensors → higher-dimensional arrays of numbers, which might be exploited in order to include additional contextual information in the recommendation process. • In standard multivariate data analysis, data are arranged in a 2D structure, but for a wide variety of domains more appropriate structures are required to take more dimensions into account: $x_{ijk}$, $i = 1, \dots, I$, $j = 1, \dots, J$, $k = 1, \dots, K$. Two tensor factorizations (TF) can be considered higher-order extensions of the matrix Singular Value Decomposition: 1. Higher-Order Singular Value Decomposition (HOSVD), which is a generalization of SVD for matrices; 2. PARallel FACtor analysis or CANonical DECOMPosition (PARAFAC/CANDECOMP), a higher-order form of Principal Component Analysis.
  31. 31. Tensor Factorization: HOSVD decomposes the initial tensor into N matrices (where N is the order of the tensor, i.e. its number of dimensions) and a tensor whose size is smaller than the original one (core tensor). In the RS literature, the most frequently used technique for tensor factorization is HOSVD.
  32. 32. HOSVD in RS Literature 1/2  Baltrunas:  Multiverse Recommendations algorithm  HOSVD TF based algorithm  data: users, movies, contextual information and user ratings  3-order tensor.  Rendle:  RTF algorithm  social tagging system  Reconstructed tensor: measure the strength of association between users, items and tags.  Chen:  CubeSVD  Personalized web search  Hidden relationships <user, query, web pages>  Output: < u, q, p, w>: w measures the popularity of page p as a result of query q made by the user u. 32
  33. 33. HOSVD in RS Literature 2/2 Advantages: • good algorithm with improvement of results (Baltrunas) • good algorithm with improvement of results (Rendle) • CubeSVD tested on MSN clickthrough data gives good results (Chen). Problems: • high computational cost (all) • time-consuming algorithm (Chen). [Baltrunas] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, RecSys ’10, pages 79–86, New York, NY, USA, 2010. ACM. [Rendle] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727–736, 2009. [Chen] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. CubeSVD: a novel approach to personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW’05, pages 382–390, New York, NY, USA, 2005. ACM.
  34. 34. PARAFAC (PARallel FACtor analysis) is a decomposition method. The PARAFAC model was independently proposed by Harshman and by Carroll and Chang. A PARAFAC model of a 3D array is given by 3 loading matrices A, B, and C with typical elements $a_{if}$, $b_{jf}$, and $c_{kf}$.
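For reference, the trilinear PARAFAC model is usually written as below. The formula itself did not survive in this transcript, so the symbols F (number of components) and $e_{ijk}$ (residual) are assumptions taken from the standard formulation rather than from the slide.

```latex
x_{ijk} = \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} + e_{ijk},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J,\quad k = 1,\dots,K
```

Each component f contributes a rank-one tensor built from column f of A, B and C, so the whole model is a sum of F rank-one terms.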
  35. 35. HOSVD Vs PARAFAC. HOSVD advantages: • HOSVD is an extension of the SVD to higher-order dimensions; • it can take more dimensions into account simultaneously; • it gives better data modeling than standard SVD; • dimension reduction can be performed not only in one dimension but also separately for each dimension. HOSVD limitations: • it is not an optimal tensor decomposition, although it does not require an iterative algorithm and needs only standard SVD computations; • it lacks the truncation property of the SVD, where keeping the first n singular values yields the best rank-n approximation of a given matrix; • HOSVD cannot deal with missing values, which are treated as 0; • to prevent overfitting, HOSVD should use regularization.
  36. 36. PARAFAC Vs HOSVD. PARAFAC: • is faster than HOSVD: linear computation time in comparison to HOSVD; • does not collapse data, but retains its natural three-dimensional structure; • despite the PARAFAC model’s lack of orthogonality, Kruskal showed that components are unique, up to permutation and scaling, under mild conditions.
  37. 37. PARAFAC in [Baltrunas12] TFMAP → PARAFAC → top-N context-aware recommendations of mobile applications. A tensor of 3 dimensions is factorized: • users • items • context types. Dimensions → 3 factor matrices → calculate user m’s preference for item i under context type k. The authors introduced an optimization process based on gradient ascent, designed to avoid overfitting. [Baltrunas12] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic, and Nuria Oliver. TFMAP: optimizing MAP for top-n context-aware recommendation. In Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’12, pages 155–164, New York, NY, USA, 2012. ACM.
  38. 38. PARAFAC in [Baltrunas12] Advantages: • TFMAP, tested on the Appazaar project dataset, improves MAP and Precision compared to other algorithms • good scalability: the training time of TFMAP increases almost linearly. Problems: • TFMAP is tested on only 1 dataset • Significance of results?
  39. 39. PARAFAC in [Acar] [Acar] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor factorizations with missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on Data Mining , pages 701–712, Philadelphia, April 2010. SIAM. PARAFAC  goal: to capture the latent structure of the data via a higher-order factorization, even in the presence of missing data. The authors develop a scalable algorithm called CP-WOPT (CP Weighted OPTimization). Numerical experiments on simulated data sets  CP-WOPT can successfully factor tensors with noise and up to 70% missing data. 39
  40. 40. CP-WOPT is tested on EEG dataset: • it is not uncommon in EEG analysis that the signals from some channels are ignored due to the malfunctioning of the electrodes • the factors extracted by the CP-WOPT algorithm can capture brain dynamics in EEG analysis even if signals are missing from some channels. PARAFAC in [Acar] 40
  41. 41. PARAFAC in [Acar] Advantages: • CP-WOPT deals with missing values • CP-WOPT uses a weighted factorization based on PARAFAC • good results on the tested datasets. IDEA: CP-WOPT → RS. Problems: • Computational cost?
  42. 42. Proposed solutions for missing values and context weighting 42
  43. 43. Scenario • CARS represent an evolution of the traditional CF paradigm. • The state of the art is based on TF as a generalization of the classical user-item MF that accommodates the contextual information. • We are interested in the PARAFAC technique for its ability to deal with missing values. • We propose the use of the CP-WOPT algorithm: our target is to identify the most promising factorization method (PARAFAC) and the best algorithm implementing this factorization. • We propose 2 solutions to the problem of context weighting.
  44. 44. CP-WOPT Algorithm [Algorithm listing, with annotations for: the weight tensor W, the rank of the tensor X, the gradient matrices]
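The core of CP-WOPT, as described in [Acar], is a weighted least-squares objective in which the weight tensor W is 1 for known entries and 0 for missing ones, together with its gradients with respect to the factor matrices. The following is a small sketch of that objective and its gradients for a 3-way tensor, not the thesis implementation: the function names and the dictionary representation of the known entries are assumptions, and the optimizer that would consume these gradients is omitted.

```python
def cp_value(A, B, C, i, j, k):
    """Entry (i, j, k) of the tensor reconstructed from factor matrices A, B, C."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))

def weighted_loss_and_grads(known, A, B, C):
    """known maps (i, j, k) -> observed value; absent cells have weight 0.

    Returns 0.5 * sum of squared residuals over known cells, and the gradient
    matrices with respect to A, B and C.
    """
    R = len(A[0])                       # number of components (rank)
    gA = [[0.0] * R for _ in A]
    gB = [[0.0] * R for _ in B]
    gC = [[0.0] * R for _ in C]
    loss = 0.0
    for (i, j, k), x in known.items():  # missing cells contribute nothing
        resid = cp_value(A, B, C, i, j, k) - x
        loss += 0.5 * resid * resid
        for r in range(R):
            gA[i][r] += resid * B[j][r] * C[k][r]
            gB[j][r] += resid * A[i][r] * C[k][r]
            gC[k][r] += resid * A[i][r] * B[j][r]
    return loss, gA, gB, gC
```

Because the loop only visits known cells, the cost of one evaluation grows with the number of observed ratings rather than with the full size of the tensor.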
  45. 45. Implementation Details • The CP-WOPT algorithm is implemented in Java. • The input tensor is read from a CSV file. • Values range from 1 to 5. • Missing values are conventionally represented by 0. • The output, an approximation of the input tensor with the missing data reconstructed, is stored in a CSV file. • Reconstructed values less than 0 are set to 0.
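The input and output conventions above can be sketched as follows. The sketch is in Python rather than the Java of the actual implementation, and the CSV layout (one `i,j,k,value` row per tensor cell) is an assumption, since the slides do not specify it.

```python
import csv
import io

def read_tensor(text):
    """Parse rows of the assumed form i,j,k,value; a value of 0 marks a missing cell."""
    known = {}
    for row in csv.reader(io.StringIO(text)):
        i, j, k, value = int(row[0]), int(row[1]), int(row[2]), float(row[3])
        if value != 0:                  # 0 is the conventional missing-value marker
            known[(i, j, k)] = value
    return known

def clip_negative(value):
    """Reconstructed values below 0 are set to 0."""
    return max(0.0, value)

known = read_tensor("0,0,0,5\n0,1,0,0\n1,1,2,3\n")
```

Since valid ratings range from 1 to 5, using 0 as the missing-value marker never collides with a real rating.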
  46. 46. CWBPA (Context Weighting with Bayesian Probabilistic Approach) 1/4 Idea: Conditional Probability + Bayes’ Theorem. 1) Conditional Probability for each user and each context. 2) Compare this distribution with an equiprobable distribution  divergence measure. • If the 2 distributions are similar  context does not influence the user’s rating; • If they are very different  rating is influenced by the context where the divergence measure is the highest. 46
  47. 47. CWBPA 2/4 Assumption: liking (= rating) is influenced by context. L: liking variable. For each user, one contingency table is built for each context $c_i$ (n tables per user, where n is the number of contexts). E.g.: $c_i$ = “weather”, with values $c_{ij}$ = “clear”, “sunny”, “cloudy”, “rainy”.
  48. 48. CWBPA 3/4 $P(c_i = c_{ij} \mid L = 1)$, $j = 1, \dots, m_i$? → Bayes’ Theorem
  49. 49. CWBPA 4/4 • Comparing 2 distributions → are they divergent? • Degree of divergence: divergence index. DEF.: given 2 distributions A and B, which both refer to the same qualitative character X, and calling $f^A_k$ and $f^B_k$ the relative frequencies of modality k, k = 1,..,K, in the A and B distributions, a possible family of divergence indices is:
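The CWBPA steps (contingency counts for one user and one context, Bayes' theorem, comparison with the equiprobable distribution) can be sketched as below. Several points are assumptions made only for illustration: "liking" is taken to mean a rating of at least `like_threshold`, the divergence index is the total variation distance (half the sum of absolute differences), which is one member of the usual families of divergence indices and not necessarily the one used in the thesis, and all names are hypothetical.

```python
from collections import Counter

def posterior_given_liked(observations, values, like_threshold=4):
    """observations: (context_value, rating) pairs for one user and one context.

    Returns P(c = v | L = 1) for every v in values, via Bayes' theorem.
    """
    n = len(observations)
    liked = [c for c, r in observations if r >= like_threshold]
    if not liked:                        # no liked items: fall back to equiprobable
        return {v: 1.0 / len(values) for v in values}
    p_like = len(liked) / n
    count_all = Counter(c for c, _ in observations)
    count_liked = Counter(liked)
    posterior = {}
    for v in values:
        p_v = count_all[v] / n
        p_like_given_v = count_liked[v] / count_all[v] if count_all[v] else 0.0
        posterior[v] = p_like_given_v * p_v / p_like
    return posterior

def divergence_from_uniform(dist):
    """Assumed index: total variation distance from the equiprobable distribution."""
    uniform = 1.0 / len(dist)
    return 0.5 * sum(abs(p - uniform) for p in dist.values())

obs = [("sunny", 5), ("sunny", 4), ("rainy", 2), ("rainy", 1)]
post = posterior_given_liked(obs, ["sunny", "rainy"])
```

In this toy case the user only likes items consumed in sunny weather, so the posterior is far from equiprobable and "weather" would count as influential for this user.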
  50. 50. CWAIC (Context Weighting Association Index Calculation) 1/2 • Idea: for each user and each context we calculate Cramér’s Association Index between liking and context. • Objective: to determine whether context influences the rating. • We establish a threshold below which there is no rating-context dependency and above which there is influence or dependency. • Association measures are based on the value of $\chi^2$, obtained from an r × c contingency table. • The $\chi^2$ test is used to verify the hypothesis of independence (corresponding to zero association) between: • the modalities of the row variable • the modalities of the column variable.
  51. 51. CWAIC 2/2 Cramér’s index $\Phi_c$ is defined for a contingency table of dimensions r × c. It is based on $\chi^2$, the most widely applied index for association measures, and is calculated as $\Phi_c = \sqrt{\chi^2 / (N (k - 1))}$, where N is the total number of observations and k = min(r, c). $\Phi_c = 0$ → no association; $\Phi_c = 1$ → perfect association (attainable only if the table is square).
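A small sketch of the computation, using the standard definitions of the chi-square statistic and Cramér's index for an r × c contingency table. The function names and the example tables are illustrative, and the threshold that decides whether a context is influential is not shown because its value is not given in the slides.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and total count N for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def cramers_index(table):
    """Cramér's index: sqrt(chi2 / (N * (k - 1))), with k = min(r, c)."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))

# Rows: liking (L = 1, L = 0); columns: two values of one context.
perfect = cramers_index([[10, 0], [0, 10]])
independent = cramers_index([[5, 5], [5, 5]])
```

The two example tables sit at the extremes of the index: liking fully determined by the context value, and liking unrelated to it.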
  52. 52. Using CWBPA and CWAIC [Diagram: tensor with all contexts → CWBPA / CWAIC → each contextual variable classified as influential or not influential → output: reduced tensor → factorization with CP-WOPT]
  53. 53. Experimental Evaluation 53
  54. 54. Evaluation of RS 1/3 • Standard metrics have been defined by judging how much the predictions deviate from the actual ratings. • Predictive accuracy metrics: • Mean Absolute Error (MAE): this metric measures the average absolute deviation between the prediction and the actual rating provided by the user. • Root Mean Squared Error (RMSE): follows the same principle as MAE but squares the error before summing. Consequently, it penalizes large errors, since they become much more pronounced than small ones.
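Both metrics follow directly from their standard definitions; a minimal sketch, with illustrative function names and toy values:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error: average of |prediction - rating|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: square root of the average squared error."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

predicted = [3, 4, 5]
actual = [4, 4, 3]
```

On these toy values the single error of 2 weighs more in RMSE than in MAE, which is the penalizing effect described above.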
  55. 55. Evaluation of RS 2/3 • Classification metrics: these metrics evaluate how well an RS can split the item space into relevant and non-relevant items. • Precision: this metric measures how many of the recommended items are actually relevant for the target user. • Recall: this metric measures how many of the items that are relevant for the target user are actually recommended. Confusion matrix: relevant content that is recommended = True Positive (TP); relevant content that is not recommended = False Negative (FN); irrelevant content that is recommended = False Positive (FP); irrelevant content that is not recommended = True Negative (TN).
  56. 56. Evaluation of RS 3/3 • F-Measure: a metric defined as the harmonic mean of the precision and recall metrics. A parameter β determines the relative influence of precision and recall; for β = 1 the F-Measure is calculated as $F = \frac{2 \cdot PR \cdot RE}{PR + RE}$.
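The classification metrics can be sketched as follows. The general F-Measure with parameter β uses the standard form (1 + β²)·PR·RE / (β²·PR + RE), which reduces to the harmonic mean for β = 1; the function names and the sample lists are illustrative.

```python
def precision_recall(recommended, relevant):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    true_positives = len(set(recommended) & set(relevant))
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

def f_measure(precision, recall, beta=1.0):
    """General F-Measure; beta = 1 gives the harmonic mean of precision and recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator

recommended = ["a", "b", "c", "d"]
relevant = ["a", "c", "e"]
pr, re = precision_recall(recommended, relevant)
f1 = f_measure(pr, re)
```

For metrics such as P@5 or R@10, `recommended` would simply be the top 5 or top 10 items of the ranked list.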
  57. 57. Introduction 1/2 • 3 preliminary tests of CP-WOPT → to verify the effectiveness of the algorithm and to evaluate standard metrics; • 1 evaluation without context; • 2 evaluations to test our solutions CWBPA and CWAIC for context weighting.
  58. 58. Introduction 2/2 Why 2 baselines? • 1 without contextual information on 1 dataset • 1 with all contextual information available on 1 dataset. Do the proposed solutions work as a “filter” for contextual information?
  59. 59. CP-WOPT: preliminary evaluations 1/5 Preliminary user study: • 7 real users • rated a fixed number of movies (11) • 3 contextual factors. 3 contextual factors: i) if they like to watch the movie at home or at the cinema; ii) with friends or with a partner; iii) with or without family. Ratings range: 1-5 with “encoding” of context into rating: • rating 1 and 2 express a strong and a modest preference, respectively, for the first context term; • rating 3 expresses neutrality; • rating 4 and 5 express a modest and a strong preference, respectively, for the second context term. 59
  60. 60. CP-WOPT: preliminary evaluations 2/5 Metrics used: accuracy – coverage. Accuracy: the percentage of known values correctly reconstructed: Coverage: the percentage of non-zero values returned: 60
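The two formulas did not survive in this transcript, so the following sketch only illustrates one possible reading of the verbal definitions. It assumes that a known value counts as "correctly reconstructed" when the reconstructed value rounds to the original rating, and that coverage is computed over all cells of the reconstructed tensor; both choices, and the function names, are assumptions.

```python
def accuracy(known, reconstructed):
    """Percentage of known values whose reconstruction rounds to the original rating."""
    hits = sum(1 for key, value in known.items() if round(reconstructed[key]) == value)
    return 100.0 * hits / len(known)

def coverage(reconstructed):
    """Percentage of cells in the reconstructed tensor that are non-zero."""
    nonzero = sum(1 for value in reconstructed.values() if value != 0)
    return 100.0 * nonzero / len(reconstructed)

known = {(0, 0, 0): 4, (0, 1, 0): 2}
reconstructed = {(0, 0, 0): 4.2, (0, 1, 0): 3.1, (1, 0, 0): 0.0, (1, 1, 0): 1.7}
```

Under this reading, coverage drops whenever the factorization leaves a cell at 0, i.e. fails to produce any estimate for it.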
  61. 61. The experiment shows that it is possible to express, through the n - dimensional factorization, not only recommendations to the single user, but also more general considerations such as the mode of using an item, i.e. its trend of use. CP-WOPT: preliminary evaluations 3/5 61
  62. 62. CP-WOPT: preliminary evaluations 4/5 • Dataset used: subset of Movielens 100K • Input: tensor of dimensions 100 users × 150 movies × 21 occupations. • Contextual information: occupation (the only information in the dataset usable as contextual information) • Results: • acc = 92.09% (acceptable accuracy) • cov = 99.96% (very good coverage) • MAE = 0.60 • RMSE = 0.93.
  63. 63. CP-WOPT: preliminary evaluations 5/5 Baseline: MyMediaLite* RS • UserItem-Baseline: CF algorithm • SVDPlusPlus: MF algorithm based on Singular Value Decomposition * 63
  64. 64. Evaluation of an explicit context dataset Dataset: LDOS-CoMoDa** LDOS-CoMoDa contains: • ratings for the movies • the 12 pieces of contextual information describing the situation in which the movies were watched. Properties: • ratings and the contextual information are explicitly acquired from the users immediately after they consumed the item; • the ratings and the contextual information are from real user-item Interaction; • users are able to rate the same item more than once if they consumed the item multiple times. ** 64
  65. 65.  LDOS-CoMoDa dataset has been in development since 15 September 2010. It contains 3 main groups of information:  general user information: provided by the user upon registering in the system  user’s age, sex, country and city;  item metadata: inserted into the dataset for each movie rated by at least one user  director’s name and surname, country, language, year;  contextual information. LDOS-CoMoDa 65
  66. 66. Evaluation on explicit context dataset 1/2 We evaluated CP-WOPT on the LDOS-CoMoDa dataset with ALL CONTEXT selected (19 contextual features). Accuracy metrics: we use 70% of the ratings, replacing a randomly chosen 30% of the known ratings with zero values.
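The hold-out protocol described here (hide a random 30% of the known ratings by overwriting them with zeros, then check how well they are reconstructed) can be sketched as follows. The function name, the fixed seed and the dictionary representation of the tensor are assumptions for illustration.

```python
import random

def hold_out(known, test_fraction=0.3, seed=42):
    """Replace a random fraction of known ratings with 0 and keep them as test data."""
    rng = random.Random(seed)
    keys = sorted(known)                # sort first so the split is reproducible
    rng.shuffle(keys)
    n_test = round(len(keys) * test_fraction)
    test_keys = set(keys[:n_test])
    train = {key: (0.0 if key in test_keys else value) for key, value in known.items()}
    test = {key: known[key] for key in test_keys}
    return train, test

# Ten known ratings (all non-zero) in a toy tensor indexed by (user, item, context).
known = {(u, i, 0): float(1 + (u + i) % 5) for u in range(2) for i in range(5)}
train, test = hold_out(known)
```

The factorization is then run on `train`, and the values it reconstructs at the keys of `test` are compared with the hidden ratings.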
  67. 67. Evaluation of explicit context dataset 2/2 [Bar chart: RMSE of CAMF (CAMF_C), DCW, Splitting Approaches (UI Splitting) and CP-WOPT; axis range 0 to 1.2; the only surviving value label is 1.017]
  68. 68. Baseline without context • This experiment creates a baseline from standard recommendation algorithms that do not exploit contextual information, so we use a 2D recommender. • For this purpose we run Mahout algorithms on the LDOS-CoMoDa dataset. • The Mahout recommender requires an input file or data. We use a CSV file in which the users’ ratings, assigned under some contextual situations, are stored. • We neglect contextual information. • When the same item has been rated several times under different contexts, we keep the first rating in temporal order and ignore the others. • We rearrange the data as triplets: <id user, id item, rating>.
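The preprocessing steps above can be sketched as follows. The record layout, in particular the presence of a sortable timestamp field, is an assumption, and the names are hypothetical.

```python
def to_triplets(records):
    """records: (user, item, context, rating, timestamp) tuples.

    Drops the context and keeps, for each (user, item) pair, the first rating
    in temporal order.
    """
    first_rating = {}
    for user, item, _context, rating, _timestamp in sorted(records, key=lambda r: r[4]):
        first_rating.setdefault((user, item), rating)   # later ratings are ignored
    return [(user, item, rating) for (user, item), rating in first_rating.items()]

records = [
    ("u1", "m1", "home", 3, 20),
    ("u1", "m1", "cinema", 5, 10),   # earlier rating of the same item: this one is kept
    ("u2", "m1", "home", 4, 5),
]
triplets = to_triplets(records)
```

The resulting triplets are what a 2D collaborative filtering library expects as input.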
  69. 69. Mahout algorithms compared • Some standard collaborative filtering algorithms are compared: • Singular Value Decomposition • different algorithms based on several user similarity measures (Spearman Correlation, Pearson Correlation, Euclidean Distance, Tanimoto Coefficient) • algorithms based on item similarity (Log Likelihood, Euclidean Distance, Pearson Correlation) • Slope One Recommender. • For the user-similarity algorithms we use a neighborhood of 10 users. • We use 60% of the data as training set and 40% as test set.
  70. 70. Experimental Evaluation 1/6 [Bar chart: MAE and RMSE for SVD, Pearson User Similarity, Euclidean User Similarity, Tanimoto User Similarity, Spearman User Similarity, Euclidean Item Similarity, Pearson Item Similarity, Tanimoto Item Similarity, LogLikelihood Item Similarity and SlopeOne; axis range 0.00 to 1.80]
  71. 71. Experimental Evaluation 2/6 [Bar chart: P@5, R@5 and F-score@5 for the same ten algorithms; axis range 0.00 to 0.10]
  72. 72. Experimental Evaluation 3/6 [Bar chart: P@10, R@10 and F-score@10 for the same ten algorithms; axis range 0.00 to 0.14]
  73. 73. Experimental Evaluation 4/6 [Bar chart: P@20, R@20 and F-score@20 for the same ten algorithms; axis range 0.00 to 0.20]
  74. 74. Experimental Evaluation 5/6 [Bar chart: P@50, R@50 and F-score@50 for SVD, Euclidean User Similarity, Spearman User Similarity, Pearson Item Similarity and LogLikelihood Item Similarity; axis range 0.00 to 0.25]
  75. 75. Experimental Evaluation 6/6 • In general, the low values are due to the fact that the methodology used for evaluating the ranked item lists includes unrated items in the test set. • These items are tagged as not relevant, which likely leads to underestimated performance compared to a situation where all ratings are available. • This is not a problem in our evaluation, since the goal is just to compare algorithms, and performance is equally underestimated for all of them. • We select the Spearman User Similarity algorithm, which gave the lowest error, and the Euclidean User Similarity algorithm, which gave the best accuracy, as baselines.
  76. 76. CW Evaluation: Preliminary Phase. LDOS-CoMoDa dataset: d = 19 contextual features. User ratings with context information are stored in a CSV file. We use 70% of the ratings, replacing a randomly chosen 30% of the known ratings with zero values. [Diagram: CW proposed solutions → reduced tensor]
  77. 77. CWBPA Evaluation 1/2 This experiment is performed to test the 2 proposed solutions for context weighting, CWBPA and CWAIC. We apply the 2 methods to the LDOS-CoMoDa dataset and evaluate the standard metrics MAE, RMSE, accuracy, coverage, P and R. [Table: contingency table for L = 1] We compare the probability distribution obtained from the previous calculations with the probability distribution 1/K, K = number of context variables. Divergence measure:
  78. 78. CWBPA Evaluation 2/2 78
  79. 79. CWAIC Evaluation: a contingency table (L = 1) is built for each context and each user. For each table we calculate the $\chi^2$ statistic and Cramér’s index, which is then compared with the threshold.
  80. 80. CWBPA Vs CWAIC 7 runs of the 2 algorithms: • 4 for CWBPA • 3 for CWAIC. We select the most significant contextual configurations.
  81. 81. CWBPA Vs CWAIC 1/2 [Bar chart: MAE and RMSE (axis range 0 to 0.9) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA and CP-WOPT.]
  82. 82. CWBPA Vs CWAIC 2/2 [Bar chart: P and R (axis range 0 to 1) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA and CP-WOPT.]
  83. 83. CWBPA Vs CWAIC – All users 1/2 [Bar chart: MAE and RMSE (axis range 0 to 0.9) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA and CP-WOPT.]
  84. 84. CWBPA Vs CWAIC – All users 2/2 [Bar chart: P and R (axis range 0 to 1) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA and CP-WOPT.]
  85. 85. Result Analysis 1/2 • Evaluated the CP-WOPT algorithm as a possible solution to the missing-values problem: • with a small dataset • on a MovieLens 100K subset we had good results, with a low error and a good coverage value → CP-WOPT is able to reconstruct the tensor leaving only a few values as missing data; • On MovieLens: the results reached are in line with those known in the literature; • CP-WOPT on the LDOS-CoMoDa dataset is better than other state-of-the-art recommendation algorithms; • When the contextual information is neglected by using a regular 2D RS, the CF algorithms Spearman User Similarity and Euclidean User Similarity provided better performance.
  86. 86. Result Analysis 2/2 • CWBPA and CWAIC give different answers to the problem of context weighting; • CWBPA and CWAIC are evaluated on the LDOS-CoMoDa dataset, showing their effectiveness; • Using only some contextual variables leads to more precise recommendations; • CWAIC performs better than CWBPA.
  87. 87. Summary and Future Work
  88. 88. Recap: Information Overload
  89. 89. Recap: Recommender Systems
  90. 90. Recap: CF → MF → Tensors → TF - Context. Proposals: CP-WOPT, CWBPA, CWAIC
  91. 91. Recap – Experimental Evaluation 5 evaluations to test: • Effectiveness of CP-WOPT in RS; • 2 proposed solutions for context weighting: • both approaches seem effective; • using only relevant contexts leads to better recommendations than a traditional 2D RS or than using all the available contextual information.
  92. 92. Future Work 1/3 LDOS-CoMoDa dataset: experiment on all available contexts. • 12 contextual variables in the LDOS-CoMoDa dataset; • We used only 5 of them to reduce the computational effort; • New extended evaluation of the Bayesian Probabilistic Approach and of the Association Index to minimize the dimensions of the tensor.
  93. 93. Future Work 2/3 Test on another contextual dataset. To improve the significance of the results, we want to test CP-WOPT, CWBPA and CWAIC on other datasets with explicit contextual information, such as: • AIST Food dataset • TripAdvisor dataset.
  94. 94. Future Work 3/3 A Real Application. We want to implement a web-based system to acquire data and test our proposed solutions in a concrete scenario, such as Personalized Context-Aware Electronic Program Guides.
  95. 95. Publications Most of the work presented is collected in the following publications: • Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro. Matrix and Tensor Factorization Techniques applied to Recommender Systems: a Survey. International Journal of Computer and Information Technology (2277 – 0764), Volume 01, Issue 01, September 2012. • Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro. Mathematical Methods of Tensor Factorization Applied to Recommender Systems. New Trends in Databases and Information Systems, 17th East European Conference on Advances in Databases and Information Systems, Volume 241, ISBN 978-3-319-01862-1, 2013, pp. 383-388. The results of the experimental evaluation are being prepared for submission.
  96. 96. Questions? “In things which are absolutely indifferent there can be no choice and consequently no option or will.” Gottfried Wilhelm von Leibniz