A scalable collaborative filtering framework based on co clustering

  1. A SCALABLE COLLABORATIVE FILTERING FRAMEWORK BASED ON CO-CLUSTERING
     Authors: Thomas George and Srujana Merugu
     Source: ICDM'05, pp. 625-628
     Presenter: Allen
  2. OUTLINE
     • Introduction
     • Related Work
     • Problem Definition
     • Collaborative Filtering via Co-clustering
     • Scalable Collaborative Filtering System
     • Experimental Results
     • Conclusion
  3. INTRODUCTION
     • Due to the overwhelming increase in web-based activities, users are often forced to choose from a large number of products or content items.
     • To aid users in the decision-making process, it has become increasingly important to design recommender systems.
     • Collaborative filtering identifies the likely preferences of a user based on the known preferences of other users.
  4. INTRODUCTION (CONT.)
     • Existing collaborative filtering methods are based on correlation criteria, singular value decomposition (SVD), or non-negative matrix factorization (NNMF).
     • Drawback: the training component is computationally expensive.
     • Practical scenarios such as real-time news personalization require dynamic collaborative filtering.
     • The key idea:
       • Simultaneously obtain user and item neighborhoods via co-clustering.
       • Generate predictions based on average ratings.
  5. INTRODUCTION (CONT.)
     • Two new contributions:
       • A dynamic collaborative filtering approach that supports the entry of new users, items and ratings via a hybrid of incremental and batch versions of the co-clustering algorithm.
       • A scalable, real-time collaborative filtering system, built by developing parallel versions of the co-clustering, prediction and incremental training routines.
     • Notation:
       • A: a matrix, with $A_{ij}$ denoting the corresponding matrix elements.
       • χ: a set, enumerated as $\{x_i\}_{i=1}^{n}$, where the $x_i$ are the elements of the set.
  6. RELATED WORK
     • Recommender systems:
       • Content-based filtering systems
       • Collaborative filtering systems
     • Co-clustering
     • SVD and NNMF-based filtering techniques predict the unknown ratings based on a low rank approximation of the original ratings matrix. The missing values are filled with the average ratings.
     • Incremental versions of SVD have been proposed to address the computational expense. (SDM 2003)
  7. PROBLEM DEFINITION
     • Let $U = \{u_i\}_{i=1}^{m}$ be the set of users such that $|U| = m$, and $P = \{p_j\}_{j=1}^{n}$ be the set of items such that $|P| = n$.
     • Let A be the m×n ratings matrix such that $A_{ij}$ is the rating of user $u_i$ for item $p_j$.
     • Let W be the m×n matrix corresponding to the confidence of the ratings in A: $W_{ij} = 1$ if the rating is known and 0 otherwise.
     • Let the user clustering be $\rho: \{1, \ldots, m\} \to \{1, \ldots, k\}$ and the item clustering be $\gamma: \{1, \ldots, n\} \to \{1, \ldots, l\}$, where k is the number of user clusters and l is the number of item clusters.
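  8. PROBLEM DEFINITION (CONT.)
     • The approximate matrix $\hat{A}$ is given by $\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h})$, where $g = \rho(i)$ and $h = \gamma(j)$.
     • $A^{R}_{i}$ and $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.
     • $A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding co-cluster, user cluster and item cluster.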
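The approximation on this slide can be illustrated with a minimal Python sketch. It assumes the approximation takes the form co-cluster average + (user average - user-cluster average) + (item average - item-cluster average), and it stores only the known ratings in a dictionary, so that $W_{ij} = 1$ exactly for the keys present. The names `mean_by`, `cocluster_averages` and `approximate`, and the `fallback` value for groups without any known rating, are assumptions for illustration and do not come from the slides.

```python
from collections import defaultdict

def mean_by(ratings, key):
    """Average the known ratings grouped by key(i, j)."""
    total, count = defaultdict(float), defaultdict(int)
    for (i, j), r in ratings.items():
        total[key(i, j)] += r
        count[key(i, j)] += 1
    return {g: total[g] / count[g] for g in total}

def cocluster_averages(ratings, rho, gamma):
    """Return the five averages used by the approximation."""
    return {
        "R": mean_by(ratings, lambda i, j: i),                     # user averages
        "C": mean_by(ratings, lambda i, j: j),                     # item averages
        "RC": mean_by(ratings, lambda i, j: rho[i]),               # user-cluster averages
        "CC": mean_by(ratings, lambda i, j: gamma[j]),             # item-cluster averages
        "COC": mean_by(ratings, lambda i, j: (rho[i], gamma[j])),  # co-cluster averages
    }

def approximate(avg, rho, gamma, i, j, fallback):
    """Approximate rating of user i for item j (fallback is an assumption)."""
    g, h = rho[i], gamma[j]
    return (avg["COC"].get((g, h), fallback)
            + avg["R"].get(i, fallback) - avg["RC"].get(g, fallback)
            + avg["C"].get(j, fallback) - avg["CC"].get(h, fallback))

# Toy data: 2 users, 2 items, every rating known.
ratings = {(0, 0): 5, (0, 1): 4, (1, 0): 1, (1, 1): 2}
rho, gamma = [0, 1], [0, 0]   # each user in its own cluster, one item cluster
avg = cocluster_averages(ratings, rho, gamma)
print(approximate(avg, rho, gamma, 0, 0, fallback=3.0))  # prints 4.5
```

In this toy case each user cluster holds a single user, so the user average and the user-cluster average cancel and the prediction reduces to the co-cluster average.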
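  9. COLLABORATIVE FILTERING VIA CO-CLUSTERING
     • Static training (co-clustering): the goal is to minimize $\sum_{i,j} W_{ij} (A_{ij} - \hat{A}_{ij})^2$, where $\hat{A}$ is the approximate matrix.
     • The row and column assignment steps can be implemented efficiently by pre-computing the invariant parts of the update cost functions (the required info).
     • Row updating: each user is reassigned to the user cluster that minimizes its share of this error, with the item clustering held fixed.
     • Column updating: each item $p_j$ is reassigned to the item cluster $h$ that minimizes the error written with the pre-computed terms, $A^{tmp3}_{\rho(i)j} - A^{COC}_{\rho(i)h} + A^{CC}_{h}$.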
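The alternating row and column updates on this slide can be sketched as follows. This is a naive version that assumes the squared error over known ratings as the objective and recomputes every residual directly; it does not implement the pre-computation of invariant parts mentioned on the slide. The names `averages`, `residual`, `objective` and `cocluster`, the random initialization, the global-mean fallback for empty clusters and the iteration cap are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def averages(ratings, rho, gamma):
    """Group means of known ratings; empty groups fall back to the global mean."""
    glob = sum(ratings.values()) / len(ratings)
    keys = {
        "R": lambda i, j: i, "C": lambda i, j: j,
        "RC": lambda i, j: rho[i], "CC": lambda i, j: gamma[j],
        "COC": lambda i, j: (rho[i], gamma[j]),
    }
    out = {}
    for name, key in keys.items():
        tot, cnt = defaultdict(float), defaultdict(int)
        for (i, j), r in ratings.items():
            tot[key(i, j)] += r
            cnt[key(i, j)] += 1
        means = {g: tot[g] / cnt[g] for g in tot}
        out[name] = defaultdict(lambda glob=glob: glob, means)
    return out

def residual(avg, r, i, j, g, h):
    """Known rating minus its approximation with user i in g and item j in h."""
    return r - (avg["COC"][(g, h)] + avg["R"][i] - avg["RC"][g]
                + avg["C"][j] - avg["CC"][h])

def objective(ratings, rho, gamma):
    """Sum of squared residuals over the known ratings."""
    avg = averages(ratings, rho, gamma)
    return sum(residual(avg, r, i, j, rho[i], gamma[j]) ** 2
               for (i, j), r in ratings.items())

def cocluster(ratings, m, n, k, l, iters=20, seed=0):
    """Alternate row and column reassignments until nothing changes."""
    rng = random.Random(seed)
    rho = [rng.randrange(k) for _ in range(m)]
    gamma = [rng.randrange(l) for _ in range(n)]
    by_row, by_col = defaultdict(list), defaultdict(list)
    for (i, j), r in ratings.items():
        by_row[i].append((j, r))
        by_col[j].append((i, r))
    for _ in range(iters):
        # Row step: move each user to the cluster with the smallest row error.
        avg = averages(ratings, rho, gamma)
        new_rho = [min(range(k), key=lambda g: sum(
            residual(avg, r, i, j, g, gamma[j]) ** 2 for j, r in by_row[i]))
            for i in range(m)]
        # Column step: same idea for items, using the refreshed averages.
        avg = averages(ratings, new_rho, gamma)
        new_gamma = [min(range(l), key=lambda h: sum(
            residual(avg, r, i, j, new_rho[i], h) ** 2 for i, r in by_col[j]))
            for j in range(n)]
        if new_rho == rho and new_gamma == gamma:
            break
        rho, gamma = new_rho, new_gamma
    return rho, gamma

# Toy block-structured data: users 0-1 like items 0-1, users 2-3 like items 2-3.
block = {(i, j): (5 if (i < 2) == (j < 2) else 1)
         for i in range(4) for j in range(4)}
rho, gamma = cocluster(block, m=4, n=4, k=2, l=2)
print(rho, gamma, objective(block, rho, gamma))
```

Because the initialization is random, a run may stop in a local minimum; the sketch is meant to show the shape of the alternating updates rather than to reproduce the reported results.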
  11. PREDICTION
  13. SCALABLE COLLABORATIVE FILTERING SYSTEM
      • The data objects use a distributed memory representation, so each of the processors P1 and P2 is in fact a cluster of processors.
      • P1 handles the prediction and incremental training.
      • P2 is responsible for the static training.
  15. EXPERIMENTAL RESULTS
      • Datasets and algorithm:
        • MovieLens (100K): 100,000 ratings (1-5) from 943 users on 1682 movies.
        • BookCrossing: 269392 ratings (1-10) from 470034 users on 133438 books.
        • Movie1-Movie10: subsets containing 10% to 100% of the ratings of MovieLens 100K.
      • All datasets are split into 80% training and 20% testing.
      • Evaluation metric: Mean Absolute Error (MAE).
      • The experiments evaluate effectiveness and efficiency in terms of MAE and execution time.
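The evaluation protocol described here, an 80/20 split scored by MAE, can be sketched in a few lines of Python. The function names, the fixed seed and the dictionary layout of the ratings are assumptions for illustration; the slides do not say how the split was drawn.

```python
import random

def train_test_split(ratings, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the known ratings for testing."""
    items = sorted(ratings.items())
    random.Random(seed).shuffle(items)
    n_test = round(len(items) * test_fraction)
    cut = len(items) - n_test
    return dict(items[:cut]), dict(items[cut:])

def mean_absolute_error(predict, test):
    """Average of |predicted rating - true rating| over the test pairs."""
    return sum(abs(predict(i, j) - r) for (i, j), r in test.items()) / len(test)

# Toy usage: a constant predictor that always answers 3.
test = {(0, 0): 5, (0, 1): 2}
print(mean_absolute_error(lambda i, j: 3, test))  # prints 1.5
```

Any predictor with the signature `predict(i, j)` can be plugged in, which makes it easy to compare methods on the same held-out pairs.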
  16. MAE COMPARISON
      • Mov1: MovieLens
      • Mov2: BookCrossing
      • Mov3: 10 subsets of MovieLens
      • K=3
  17. VARIATION OF MAE WITH # PARAMETERS
      • Number of prediction parameters:
        • COCLUST: (m+n+kl-k-l) values
        • SVD, NNMF: (m+n)(k+l) values
      • Movie3 dataset
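To make the difference in parameter counts concrete, the two formulas on this slide can be evaluated for the MovieLens dimensions (m = 943, n = 1682). The choice k = l = 3 is an assumption for illustration, and the function names are hypothetical.

```python
def coclust_params(m, n, k, l):
    """Number of prediction parameters for COCLUST, per the slide's formula."""
    return m + n + k * l - k - l

def factorization_params(m, n, k, l):
    """Number of prediction parameters for SVD and NNMF, per the slide's formula."""
    return (m + n) * (k + l)

m, n, k, l = 943, 1682, 3, 3   # k = l = 3 is an assumed setting
print(coclust_params(m, n, k, l))        # prints 2628
print(factorization_params(m, n, k, l))  # prints 15750
```

Under these assumed settings the co-clustering model stores roughly one sixth as many values as the factorization methods.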
  18. EFFICIENCY
      • Prediction time is reported for each test pair of the MovieLens dataset.
      • Training time (co-clustering) vs. data size on the MovieLens dataset.
      • Experimental setup: AMD 1.4 GHz processors on 128 compute nodes with 384 MB RAM.
  19. TRAINING TIME VS. # OF PROCESSORS
      • MovieLens dataset
      • Experimental setup: AMD 1.4 GHz processors, varying the number of processors, with 384 MB RAM.
  20. CONCLUSION
      • Recommender systems are proving to be extremely useful for a number of online activities such as e-commerce.
      • In the dynamic scenario, both efficiency and effectiveness must be addressed, because new users, items and ratings enter the system at a rapid rate.
      • This paper proposed a new dynamic CF approach based on co-clustering.
      • Empirical results indicate high quality predictions at a much lower computational cost.