This document discusses building a recommender system using collaborative filtering. It begins with an introduction to collaborative filtering and its two common families, memory-based and model-based. It then explains the process for memory-based collaborative filtering, including similarity calculation, determining peer groups, and making recommendations. Model-based collaborative filtering is introduced using matrix factorization to predict ratings. The document concludes with the steps to build a recommender system, which include understanding the data, pre-processing, building the collaborative filtering model, training and testing the model, and evaluating accuracy.
8. Memory-based CF
Types
- User-based collaborative filtering: recommendations based on the ratings given by users similar to the target user.
- Item-based collaborative filtering: recommendations based on the target user's own ratings of similar items.
9. Memory-based CF
Process explained
Prediction:
1- Similarity calculation between items (users).
2- "Peer Group": top-k most similar items (users).
    i1  i2  i3  i4  i5
u1   3   5   1   1   ?
u2   1   0   0   2   1
u3   2   5   1   ?   ?
u4   0   1   1   ?   ?
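The two steps above can be sketched on the slide's rating matrix. This is a minimal user-based sketch, not a definitive implementation: it assumes "?" means a missing rating (stored as NaN), that 0 is a real rating, and that cosine similarity is computed only over items both users have rated. The helper names `cosine_on_common` and `top_k_peers` are hypothetical.

```python
import numpy as np

# Ratings from the slide; "?" becomes NaN (assumption: 0 is a real rating).
R = np.array([
    [3, 5, 1, 1, np.nan],            # u1
    [1, 0, 0, 2, 1],                 # u2
    [2, 5, 1, np.nan, np.nan],       # u3
    [0, 1, 1, np.nan, np.nan],       # u4
])

def cosine_on_common(a, b):
    """Cosine similarity restricted to entries rated in both vectors."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def top_k_peers(R, target, k):
    """Return the k users most similar to `target` as (user_index, similarity)."""
    sims = [
        (other, cosine_on_common(R[target], R[other]))
        for other in range(R.shape[0])
        if other != target
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]

# Peer group of u1 (index 0) with k = 2.
peers = top_k_peers(R, target=0, k=2)
print(peers)
```

Swapping rows for columns (`R.T`) gives the item-based variant, where the peer group is made of similar items rather than similar users.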
10. Memory-based CF
Implementation & challenges
Step 1: Choice of Similarity Measure
- Pearson (mean-centered ratings)
- Cosine
- Adjusted Cosine
Step 2: Determination of Peer Group
- Top-k most similar items (users)
What if one user has a general tendency to rate generously while the other is harsh in their ratings?
Step 3: Recommendation
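One way to see why mean-centering helps with the generous-versus-harsh question is the sketch below. It assumes Pearson correlation over co-rated items for Step 1, positively correlated top-k peers for Step 2, and a mean-centered, similarity-weighted average for Step 3. The data and the function names `pearson` and `predict` are hypothetical and chosen only for illustration.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation over co-rated entries (NaN = missing)."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if mask.sum() < 2:
        return 0.0
    x = a[mask] - a[mask].mean()
    y = b[mask] - b[mask].mean()
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0

def predict(R, user, item, k=2):
    """Predict R[user, item] from the top-k positively correlated peers."""
    candidates = [
        (pearson(R[user], R[v]), v)
        for v in range(R.shape[0])
        if v != user and not np.isnan(R[v, item])
    ]
    peers = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    base = np.nanmean(R[user])          # the target user's own rating level
    if not peers:
        return float(base)
    # Each peer contributes its deviation from its own mean, not its raw rating.
    num = sum(s * (R[v, item] - np.nanmean(R[v])) for s, v in peers)
    den = sum(abs(s) for s, _ in peers)
    return float(base + num / den)

# Hypothetical data: same taste, but the second user rates 2 points higher.
R = np.array([
    [3.0, 2.0, 3.0, 1.0, np.nan],   # harsh user (target)
    [5.0, 4.0, 5.0, 3.0, 5.0],      # generous user
])
sim = pearson(R[0], R[1])
pred = predict(R, user=0, item=4, k=1)
print(sim, pred)
```

The generous user's raw 5 is translated into "slightly above this user's average", so the prediction for the harsh user lands near the top of the harsh user's own scale rather than at 5.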
13. Model-based CF
Matrix factorization

Ratings matrix (users x movies):
          Matrix  Inception  Frozen  King-kong  Zootopia
Alice         -1         -1       1          1         ?
Patrick        1          0       0         -1         1
John           1         -1      -1          ?         ?
Sara           0          1       1          ?         ?

Item factors (factors x movies):
          Matrix  Inception  Frozen  King-kong  Zootopia
Children      -1         -1       1          1         1
Action         1          1      -1         -1         0

User factors (users x factors):
          Children  Action
Alice            1      -1
Patrick         -1       1
John            -1       1
Sara             1       0
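As a rough illustration of how the two factor matrices fill in the "?" cells, the sketch below multiplies the user factors by the item factors from the slide. The factor values are illustrative, so the product should not be expected to reproduce every observed rating exactly; the variable names are assumptions.

```python
import numpy as np

users = ["Alice", "Patrick", "John", "Sara"]
movies = ["Matrix", "Inception", "Frozen", "King-kong", "Zootopia"]

# User factors from the slide (columns: Children, Action).
U = np.array([
    [1, -1],    # Alice
    [-1, 1],    # Patrick
    [-1, 1],    # John
    [1, 0],     # Sara
])

# Item factors from the slide (rows: Children, Action; columns: movies).
V = np.array([
    [-1, -1, 1, 1, 1],
    [1, 1, -1, -1, 0],
])

# Every user-movie score is the dot product of a user row and a movie column.
scores = U @ V

# Score for the cell marked "?" in Alice's row.
alice_zootopia = scores[users.index("Alice"), movies.index("Zootopia")]
print(scores)
print(alice_zootopia)
```

Alice's positive score for Zootopia comes entirely from the Children factor, which is the direction her observed ratings already point in.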
14. Model-based CF
Matrix factorization
Latent factors: Children, Action
Ratings (users x movies) ≈ (users x latent factors) × (latent factors x movies)
15. Model-based CF
Ratings prediction
How can one determine the factor matrices U and V?
Goal: Minimize the difference between predicted ratings & observed ratings.
Stochastic Gradient Descent, SGD (~SVD), or Alternating Least Squares, ALS
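A minimal SGD sketch of this goal follows, assuming a squared-error loss on the observed entries only, a small L2 penalty, and hyperparameters (`rank`, `lr`, `reg`, `epochs`) picked arbitrarily for illustration. The function names `factorize_sgd` and `rmse` are hypothetical, and this is not meant to mirror any particular library.

```python
import numpy as np

def factorize_sgd(R, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit R ~ U @ V on observed (non-NaN) entries with plain SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, rank))
    V = 0.1 * rng.standard_normal((rank, n_items))
    observed = [
        (u, i)
        for u in range(n_users)
        for i in range(n_items)
        if not np.isnan(R[u, i])
    ]
    for _ in range(epochs):
        # Visit the observed ratings in a fresh random order each epoch.
        for idx in rng.permutation(len(observed)):
            u, i = observed[idx]
            err = R[u, i] - U[u] @ V[:, i]
            u_old = U[u].copy()
            # Gradient step on the regularized squared error for this rating.
            U[u] += lr * (err * V[:, i] - reg * U[u])
            V[:, i] += lr * (err * u_old - reg * V[:, i])
    return U, V

def rmse(R, U, V):
    """Root-mean-square error over observed entries only."""
    mask = ~np.isnan(R)
    return float(np.sqrt(np.mean((R - U @ V)[mask] ** 2)))

# Ratings matrix from the slides ("?" stored as NaN).
R = np.array([
    [-1, -1, 1, 1, np.nan],          # Alice
    [1, 0, 0, -1, 1],                # Patrick
    [1, -1, -1, np.nan, np.nan],     # John
    [0, 1, 1, np.nan, np.nan],       # Sara
])

baseline = rmse(R, np.zeros((4, 2)), np.zeros((2, 5)))  # predicting 0 everywhere
U, V = factorize_sgd(R)
fitted = rmse(R, U, V)
print(baseline, fitted)
```

ALS targets the same objective but alternates between solving for U with V fixed and for V with U fixed, each of which is a least-squares problem.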
Advantages of item-based: it leverages the user's own ratings => more consistent recommendations.
It is also more stable under changes to the ratings: new items are added less often than new users.
Similarity challenges: user bias + the number of ratings two users have in common.
Peer group challenges: weakly or negatively correlated users + long-tail impact.
Compute all possible rating predictions for the relevant user-item pairs (e.g., all items for a particular user) and then rank them. => Computationally expensive, with a possible MemoryError.
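One way to limit the cost is to score a single user at a time rather than materializing the full user-item prediction matrix. The sketch below assumes factor matrices like those on the matrix factorization slides; the helper `top_n_for_user` and the `seen` mask are hypothetical names.

```python
import numpy as np

def top_n_for_user(user_vec, V, seen_mask, n):
    """Rank unseen items for one user without building the full score matrix."""
    scores = user_vec @ V                       # one row of predictions only
    scores = np.where(seen_mask, -np.inf, scores)
    order = np.argsort(-scores)                 # best score first
    return [int(i) for i in order if not seen_mask[i]][:n]

movies = ["Matrix", "Inception", "Frozen", "King-kong", "Zootopia"]

# Item factors from the slides (rows: Children, Action).
V = np.array([
    [-1, -1, 1, 1, 1],
    [1, 1, -1, -1, 0],
])

john = np.array([-1, 1])                             # John's factors
seen = np.array([True, True, True, False, False])    # John already rated these

recommended = top_n_for_user(john, V, seen, n=1)
print([movies[i] for i in recommended])
```

Memory use here grows with the number of items rather than with users times items, at the price of recomputing scores per request.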
Exploit the fact that significant portions of the rows and columns of data matrices are highly correlated.
Data redundancies + high correlation => Low-rank Matrix.
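The low-rank claim can be checked numerically on a small case. The sketch below builds a full score matrix from the two-factor example on the matrix factorization slides and inspects its singular values; the tolerance `1e-10` is an arbitrary assumption.

```python
import numpy as np

# Factor matrices from the slides (2 latent factors).
U = np.array([[1, -1], [-1, 1], [-1, 1], [1, 0]], dtype=float)
V = np.array([[-1, -1, 1, 1, 1], [1, 1, -1, -1, 0]], dtype=float)

M = U @ V                                   # full 4 x 5 score matrix

# Rows for Patrick and John are identical, and Alice's row is their negation:
# this kind of redundancy is what keeps the rank low.
singular_values = np.linalg.svd(M, compute_uv=False)
effective_rank = int((singular_values > 1e-10).sum())

dense_size = M.size                         # numbers stored without factorization
factored_size = U.size + V.size             # numbers stored with factorization
print(effective_rank, dense_size, factored_size)
```

At this toy size the storage saving is negligible, but the factored size grows with (users + items) times rank while the dense size grows with users times items.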