A scalable collaborative filtering framework based on co-clustering
1. A SCALABLE COLLABORATIVE FILTERING FRAMEWORK BASED ON CO-CLUSTERING
Authors/ Thomas George and Srujana Merugu
Source/ ICDM’05, pp. 625-628
Presenter/ Allen
2. OUTLINE
Introduction
Related Work
Problem Definition
Collaborative Filtering via Co-clustering
Scalable Collaborative Filtering System
Experimental Results
Conclusion
3. INTRODUCTION
Due to the overwhelming increase in web-based
activities, users are often forced to choose from a large
number of products or content items.
To aid users in the decision making process, it has
become increasingly important to design recommender
systems.
Collaborative filtering identifies the likely preferences of a
user based on the known preferences of other users.
4. INTRODUCTION (CONT.)
Existing collaborative filtering methods based on correlation criteria
Singular value decomposition (SVD)
Non-negative matrix factorization (NNMF)
Drawbacks:
Computationally expensive training component
Practical scenarios such as real-time news personalization
require dynamic collaborative filtering.
The key idea
Simultaneously obtaining user and item neighborhoods via co-clustering.
Generating predictions based on average ratings.
5. INTRODUCTION (CONT.)
Two new contributions:
Dynamic collaborative filtering approach
Supporting the entry of new users, items and ratings via a hybrid of
incremental and batch versions of the co-clustering algorithm.
A scalable, real-time collaborative filtering system
Developing parallel versions of co-clustering, prediction and
incremental training routines.
Notation:
$A$: a matrix, with $A_{ij}$ denoting the corresponding matrix elements.
$\chi$: a set, enumerated as $\{x_i\}_{i=1}^{n}$, where the $x_i$ are the elements of the set.
6. RELATED WORK
Recommender System
Content-based filtering system
Collaborative filtering system
Co-clustering
SVD- and NNMF-based filtering techniques predict the
unknown ratings from a low-rank approximation of the
original ratings matrix.
The missing values are filled with the average ratings.
Incremental versions of SVD have been proposed to reduce the
computational cost. (SDM 2003)
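The low-rank baseline described above can be sketched as follows. This is a minimal illustration, not the method of any cited work: the function name `svd_predict` is hypothetical, and filling unknown entries with the item (column) average is an assumption, since the slide does not say which average is used.

```python
import numpy as np

def svd_predict(A, W, rank):
    """Rank-`rank` SVD approximation of A after mean-filling unknown entries.

    A: m x n ratings matrix, W: m x n 0/1 matrix marking known ratings.
    Assumption: unknown entries are filled with the item average, falling
    back to the global average for items with no known rating.
    """
    known_sum = (A * W).sum(axis=0)
    known_cnt = W.sum(axis=0)
    global_avg = (A * W).sum() / max(W.sum(), 1)
    item_avg = np.where(known_cnt > 0, known_sum / np.maximum(known_cnt, 1), global_avg)
    filled = np.where(W > 0, A, item_avg[None, :])
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    # Keep only the leading `rank` singular triplets.
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

# A fully known rank-1 matrix is reproduced by its rank-1 approximation.
A = np.array([[1.0, 2.0], [2.0, 4.0]])
W = np.ones_like(A)
A_hat = svd_predict(A, W, rank=1)
```

Recomputing the decomposition whenever ratings change is what makes the training component expensive in a dynamic setting.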
7. PROBLEM DEFINITION
Let $U=\{u_i\}_{i=1}^{m}$ be the set of users such that $|U|=m$ and
$P=\{p_j\}_{j=1}^{n}$ be the set of items such that $|P|=n$.
Let $A$ be the $m \times n$ ratings matrix such that $A_{ij}$ is the rating
of the user $u_i$ for the item $p_j$.
Let $W$ be the $m \times n$ matrix corresponding to the confidence of
the ratings in $A$.
$W_{ij}=1$ if the rating is known and 0 otherwise.
Let the user clustering be $\rho: \{1, \dots, m\} \to \{1, \dots, k\}$ and the item
clustering be $\gamma: \{1, \dots, n\} \to \{1, \dots, l\}$,
where $k$ is the number of user clusters and $l$ is the number of item clusters.
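As a concrete reading of these definitions, the sketch below builds the ratings matrix and the 0/1 confidence matrix from a list of (user, item, rating) triples and represents the two clusterings as integer arrays. The function name, the 0-based indexing, and the toy data are illustrative assumptions.

```python
import numpy as np

def build_rating_matrices(triples, m, n):
    """Return (A, W): ratings matrix and 0/1 confidence matrix.

    triples: iterable of (user_index, item_index, rating), 0-based indices.
    Unknown entries of A are left at 0 and flagged by W[i, j] == 0.
    """
    A = np.zeros((m, n))
    W = np.zeros((m, n))
    for i, j, r in triples:
        A[i, j] = r
        W[i, j] = 1.0
    return A, W

# Toy data: 3 users, 3 items, 4 known ratings.
triples = [(0, 0, 5), (0, 2, 3), (1, 1, 4), (2, 0, 1)]
A, W = build_rating_matrices(triples, m=3, n=3)

# Clusterings as arrays: rho[i] is the user cluster of user i (k = 2),
# gamma[j] is the item cluster of item j (l = 2).
rho = np.array([0, 0, 1])
gamma = np.array([0, 1, 0])
```

A dense array keeps the example short; a real ratings matrix is mostly unknown, so a sparse representation would be the more natural choice at scale.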
8. PROBLEM DEFINITION (CONT.)
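The approximate matrix $\hat{A}$ is given by
$$\hat{A}_{ij} = A^{COC}_{gh} + (A^{R}_{i} - A^{RC}_{g}) + (A^{C}_{j} - A^{CC}_{h}),$$
where $g=\rho(i)$, $h=\gamma(j)$.
$A^{R}_{i}$, $A^{C}_{j}$ are the average ratings of user $u_i$ and item $p_j$.
$A^{COC}_{gh}$, $A^{RC}_{g}$ and $A^{CC}_{h}$ are the average ratings of the corresponding co-cluster,
user-cluster and item-cluster.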
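A minimal sketch of how a prediction could be computed from these five averages, assuming the approximation is the co-cluster average plus the user's offset from its user-cluster average and the item's offset from its item-cluster average. The helper names and the fallback to the global average for empty groups are assumptions made for illustration.

```python
import numpy as np

def masked_mean(A, W, fallback):
    """Mean over known entries (W == 1); `fallback` if none are known."""
    count = W.sum()
    return (A * W).sum() / count if count > 0 else fallback

def summary_stats(A, W, rho, gamma, k, l):
    """User, item, user-cluster, item-cluster and co-cluster averages."""
    m, n = A.shape
    glob = masked_mean(A, W, 0.0)  # assumed fallback for empty groups
    a_r = np.array([masked_mean(A[i], W[i], glob) for i in range(m)])
    a_c = np.array([masked_mean(A[:, j], W[:, j], glob) for j in range(n)])
    a_rc = np.array([masked_mean(A[rho == g], W[rho == g], glob) for g in range(k)])
    a_cc = np.array([masked_mean(A[:, gamma == h], W[:, gamma == h], glob)
                     for h in range(l)])
    a_coc = np.array([[masked_mean(A[rho == g][:, gamma == h],
                                   W[rho == g][:, gamma == h], glob)
                       for h in range(l)] for g in range(k)])
    return a_r, a_c, a_rc, a_cc, a_coc

def predict(i, j, rho, gamma, stats):
    """Co-cluster average plus user and item offsets."""
    a_r, a_c, a_rc, a_cc, a_coc = stats
    g, h = rho[i], gamma[j]
    return a_coc[g, h] + (a_r[i] - a_rc[g]) + (a_c[j] - a_cc[h])

# Toy example: one user cluster and one item cluster (k = l = 1).
A = np.array([[5.0, 3.0], [4.0, 2.0]])
W = np.ones_like(A)
rho = np.array([0, 0])
gamma = np.array([0, 0])
stats = summary_stats(A, W, rho, gamma, k=1, l=1)
p00 = predict(0, 0, rho, gamma, stats)
p11 = predict(1, 1, rho, gamma, stats)
```

Prediction only looks up five stored averages, so its cost per (user, item) pair is constant once the averages are available.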
9. COLLABORATIVE FILTERING VIA CO-CLUSTERING
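Static training (co-clustering): the goal is to minimize
$$\sum_{i=1}^{m}\sum_{j=1}^{n} W_{ij}\,\bigl(A_{ij}-\hat{A}_{ij}\bigr)^{2}$$
over the clusterings $(\rho, \gamma)$.
The row and column assignment steps can be
implemented efficiently by pre-computing the invariant
parts of the update cost functions.
Required info.
Row updating: minimizing
$$\rho(i) = \arg\min_{g} \sum_{j=1}^{n} W_{ij}\,\bigl(A_{ij} - A^{COC}_{g\gamma(j)} - A^{R}_{i} + A^{RC}_{g} - A^{C}_{j} + A^{CC}_{\gamma(j)}\bigr)^{2}$$
Column updating: minimizing
$$\gamma(j) = \arg\min_{h} \sum_{i=1}^{m} W_{ij}\,\bigl(A^{tmp}_{ij} - A^{COC}_{\rho(i)h} + A^{CC}_{h}\bigr)^{2}, \qquad A^{tmp}_{ij} = A_{ij} - A^{R}_{i} + A^{RC}_{\rho(i)} - A^{C}_{j}$$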
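The alternating row and column updates can be sketched as below. This is a simplified batch illustration under stated assumptions, not the authors' implementation: random initialization, a fixed iteration cap `n_iter`, and falling back to the global average for groups with no known ratings are all choices made here. The `tmp` arrays hold the parts of the residual that do not depend on the candidate cluster, so they are computed once per update step.

```python
import numpy as np

def safe_div(num, den, fallback):
    """Elementwise num / den, using `fallback` where den == 0."""
    return np.where(den > 0, num / np.maximum(den, 1), fallback)

def averages(A, W, rho, gamma, k, l):
    AW = A * W
    glob = AW.sum() / max(W.sum(), 1)
    R = np.eye(k)[rho]      # m x k one-hot user-cluster membership
    C = np.eye(l)[gamma]    # n x l one-hot item-cluster membership
    a_r = safe_div(AW.sum(1), W.sum(1), glob)               # user averages
    a_c = safe_div(AW.sum(0), W.sum(0), glob)               # item averages
    a_rc = safe_div(R.T @ AW.sum(1), R.T @ W.sum(1), glob)  # user-cluster
    a_cc = safe_div(C.T @ AW.sum(0), C.T @ W.sum(0), glob)  # item-cluster
    a_coc = safe_div(R.T @ AW @ C, R.T @ W @ C, glob)       # co-cluster
    return a_r, a_c, a_rc, a_cc, a_coc

def cocluster(A, W, k, l, n_iter=20, seed=0):
    """Alternate row and column reassignments until labels stop changing."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    rho = rng.integers(k, size=m)
    gamma = rng.integers(l, size=n)
    for _ in range(n_iter):
        a_r, a_c, a_rc, a_cc, a_coc = averages(A, W, rho, gamma, k, l)
        # Row update: tmp is independent of the candidate user cluster g.
        tmp = A - a_r[:, None] - a_c[None, :] + a_cc[gamma][None, :]
        cost = np.stack(
            [(W * (tmp - a_coc[g][gamma][None, :] + a_rc[g]) ** 2).sum(1)
             for g in range(k)], axis=1)
        new_rho = cost.argmin(1)
        a_r, a_c, a_rc, a_cc, a_coc = averages(A, W, new_rho, gamma, k, l)
        # Column update: tmp is independent of the candidate item cluster h.
        tmp = A - a_r[:, None] + a_rc[new_rho][:, None] - a_c[None, :]
        cost = np.stack(
            [(W * (tmp - a_coc[new_rho, h][:, None] + a_cc[h]) ** 2).sum(0)
             for h in range(l)], axis=1)
        new_gamma = cost.argmin(1)
        if np.array_equal(new_rho, rho) and np.array_equal(new_gamma, gamma):
            break
        rho, gamma = new_rho, new_gamma
    return rho, gamma

# Toy block-structured ratings with every entry known.
A = np.array([[5, 5, 1, 1], [5, 5, 1, 1], [1, 1, 5, 5], [1, 1, 5, 5]], dtype=float)
W = np.ones_like(A)
rho, gamma = cocluster(A, W, k=2, l=2)
```

Like other alternating schemes, this can stop at a local minimum, so the result depends on the initial assignment.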
13. SCALABLE COLLABORATIVE FILTERING SYSTEM
The system uses a distributed-memory representation for the data
objects, so each of the processors P1 and P2 is in
fact a cluster of processors.
P1 handles the prediction and incremental training.
P2 is responsible for the static training.
15. EXPERIMENTAL RESULTS
Datasets and algorithm
Movie-lens (100K): 100,000 ratings (1-5) by 943 users on 1682 movies.
BookCrossing: 269392 ratings (1-10) by 470034 users on 133438 books.
Movie1-Movie10: subsets with 10-100% of the ratings of Movie-lens 100K.
80% training and 20% testing for all the datasets.
Evaluation metrics: Mean Absolute Error (MAE)
The experiments evaluated the effectiveness and efficiency in
terms of MAE and execution time.
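For reference, MAE is the average of the absolute differences between actual and predicted ratings over the test pairs. The sketch below also shows one way to produce an 80/20 split of rating triples; the function names, the random shuffling, and the seed are assumptions, since the slides do not say how the split was drawn.

```python
import numpy as np

def mean_absolute_error(actual, predicted):
    """MAE = mean of |actual - predicted| over all test ratings."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.abs(actual - predicted).mean()

def train_test_split(triples, test_fraction=0.2, seed=0):
    """Randomly split (user, item, rating) triples into train and test lists."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(triples))
    n_test = int(round(test_fraction * len(triples)))
    test = [triples[t] for t in order[:n_test]]
    train = [triples[t] for t in order[n_test:]]
    return train, test

# Absolute errors are 1, 0 and 2, so the MAE is 1.0.
mae = mean_absolute_error([5, 3, 4], [4, 3, 2])

triples = [(u, u % 3, 1 + u % 5) for u in range(10)]
train, test = train_test_split(triples)
```

Lower MAE means predictions are closer to the held-out ratings on average.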
16. MAE COMPARISON
Mov1: movie-lens
Mov2: BookCrossing
Mov3: 10 subsets of movie-lens
K=3
17. VARIATION OF MAE WITH # PARAMETERS
# prediction parameters:
COCLUST:(m+n+kl-k-l) values
SVD, NNMF: (m+n)(k+l) values
Movie3 dataset
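To make the two parameter counts concrete, the sketch below evaluates both expressions exactly as stated on this slide. The sizes used (943 users, 1682 movies, k = l = 3) are illustrative assumptions taken from the dataset description and the K=3 setting, not a reported experiment.

```python
def coclust_params(m, n, k, l):
    """Number of prediction parameters for COCLUST, as given on the slide."""
    return m + n + k * l - k - l

def factorization_params(m, n, k, l):
    """Number of prediction parameters for SVD/NNMF, as given on the slide."""
    return (m + n) * (k + l)

# Assumed sizes: 943 users, 1682 items, three user and three item clusters.
m, n, k, l = 943, 1682, 3, 3
n_coclust = coclust_params(m, n, k, l)       # 2628
n_factor = factorization_params(m, n, k, l)  # 15750
```

With these assumed sizes the co-clustering model stores about six times fewer values than the factorization count.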
18. EFFICIENCY
Time needed for prediction on each given test pair
of Movie-lens.
Training time (co-clustering) vs. data size
Movie-lens dataset
Experimental devices
AMD 1.4 GHz processors on 128 compute
nodes with 384 MB RAM
19. TRAINING TIME VS. # OF PROCESSORS
Movie-lens dataset
Experimental devices
AMD 1.4 GHz processors with 384 MB RAM, using different numbers of processors
20. CONCLUSION
Recommender systems are proving to be extremely useful
for a number of online activities such as e-commerce.
In the dynamic scenario, both efficiency and
effectiveness need to be addressed.
New users, items and ratings enter the system at a rapid rate.
This paper proposed a new dynamic CF approach based
on co-clustering.
Empirical results indicate high-quality predictions at
a much lower computational cost.