Building a recommender system using Collaborative Filtering (CF)
Sarah Mestiri
Machine Learning Engineer
Introduction
“Customers expect to be treated as humans, not numbers.”
“State of the Connected Customer” report, Salesforce, 2016
Outline
Experiencing personalization through collaborative filtering
Common Methods in CF
Building a recommender system using CF
Experiencing personalization through collaborative filtering
Amazon.com
What’s Collaborative Filtering?
Common Methods in CF
Memory-based CF
Types
User-based collaborative filtering: recommendations based on the ratings given by users similar to the target user.
Item-based collaborative filtering: recommendations based on the target user’s own ratings of similar items.
Item-based
Memory-based CF
Process explained
Prediction:
1- Similarity calculation between items (users).
2- “Peer Group”: top-k most similar items (users).
     i1  i2  i3  i4  i5
u1    3   5   1   1   ?
u2    1   0   0   2   1
u3    2   5   1   ?   ?
u4    0   1   1   ?   ?
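The two prediction steps above can be sketched on this ratings table. This is a minimal illustration, not the talk's implementation: it assumes plain cosine similarity computed over users who rated both items, treats `?` as missing and `0` as a real rating, and uses the hypothetical helper names `item_similarity` and `predict`.

```python
import math

# Ratings from the slide; None marks the "?" cells (unknown ratings).
# Assumption: a 0 is treated as a real (low) rating, not as a missing value.
ITEMS = ["i1", "i2", "i3", "i4", "i5"]
RATINGS = {
    "u1": [3, 5, 1, 1, None],
    "u2": [1, 0, 0, 2, 1],
    "u3": [2, 5, 1, None, None],
    "u4": [0, 1, 1, None, None],
}

def item_similarity(a, b):
    """Cosine similarity between item columns a and b, over users who rated both."""
    pairs = [(row[a], row[b]) for row in RATINGS.values()
             if row[a] is not None and row[b] is not None]
    dot = sum(x * y for x, y in pairs)
    norm_a = math.sqrt(sum(x * x for x, _ in pairs))
    norm_b = math.sqrt(sum(y * y for _, y in pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no overlap or an all-zero column: no usable signal
    return dot / (norm_a * norm_b)

def predict(user, item, k=2):
    """Similarity-weighted average of the user's own ratings on the k nearest items."""
    target = ITEMS.index(item)
    row = RATINGS[user]
    # Step 1: similarity between the target item and every item the user rated.
    sims = [(item_similarity(target, j), row[j])
            for j in range(len(ITEMS)) if j != target and row[j] is not None]
    # Step 2: peer group = top-k most similar items with positive similarity.
    peers = sorted((s for s in sims if s[0] > 0),
                   key=lambda s: s[0], reverse=True)[:k]
    total = sum(sim for sim, _ in peers)
    if total == 0:
        return None  # no similar item available for this user
    return sum(sim * rating for sim, rating in peers) / total

print(predict("u1", "i5", k=2))
print(predict("u3", "i4", k=3))
```

With so few co-ratings the similarities are fragile (only u2 has rated i5), which hints at the sparsity problem that neighborhood methods run into.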
Memory-based CF
Implementation & challenges
Step 1: Choice of Similarity Measure
- Pearson (mean-centered ratings)
- Cosine
- Adjusted Cosine
What if one user tends to rate generously while another rates harshly?
Step 2: Determination of Peer Group
- Top-k most similar items (users)
Step 3: Recommendation
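The effect of rating bias on the similarity measure can be checked on a tiny example. The sketch below assumes two hypothetical users, `generous` and `harsh`, who rank four items identically but on shifted scales; the function names are illustrative, and Pearson is computed here as the cosine of mean-centered vectors over the full rating lists.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_center(ratings):
    """Subtract the user's mean rating to remove a generous or harsh bias."""
    mean = sum(ratings) / len(ratings)
    return [r - mean for r in ratings]

def pearson(a, b):
    """Pearson correlation, written as the cosine of mean-centered ratings."""
    return cosine(mean_center(a), mean_center(b))

# Hypothetical users: same preferences, but one rates two points higher.
generous = [5, 4, 5, 3]
harsh = [3, 2, 3, 1]

print(round(cosine(generous, harsh), 4))   # raw cosine is below 1
print(round(pearson(generous, harsh), 4))  # mean-centering recovers a perfect match
```

Adjusted cosine applies the same idea to item-item similarity: each rating is centered by its user's mean before item columns are compared.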
Model-based CF
Introduction
Where do neighborhood methods fail?
Computation
Scalability
Sparsity
Accuracy
Challenges
What to recommend to Alice?
Model-based CF
          Matrix  Inception  Frozen  King-kong  Zootopia
Alice       -1       -1         1        1         ?
Patrick      1        0         0       -1         1
John         1       -1        -1        ?         ?
Sara         0        1         1        ?         ?

          Matrix  Inception  Frozen  King-kong  Zootopia
Children    -1       -1         1        1         1
Action       1        1        -1       -1         0

          Children  Action
Alice         1       -1
Patrick      -1        1
John         -1        1
Sara          1        0
Model-based CF
Matrix factorization
Ratings table ≈ (users × latent factors) X (latent factors × movies)
The ratings table above is approximated by the product of the two smaller tables.
Latent factors: Children, Action
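To see how the two factor tables produce a score for an unknown cell, the product can be computed directly. This sketch copies the Children/Action values from the slide; the `score` helper is a hypothetical name. The toy factors approximate the observed table rather than reproduce it exactly (Alice/Matrix comes out as -2 against an observed -1), so the sign of the score is the part to read.

```python
MOVIES = ["Matrix", "Inception", "Frozen", "King-kong", "Zootopia"]
FACTORS = ["Children", "Action"]

# User factors (users x latent factors), values copied from the slide.
U = {
    "Alice": [1, -1],
    "Patrick": [-1, 1],
    "John": [-1, 1],
    "Sara": [1, 0],
}

# Movie factors (latent factors x movies), values copied from the slide.
V = {
    "Children": [-1, -1, 1, 1, 1],
    "Action": [1, 1, -1, -1, 0],
}

def score(user, movie):
    """Dot product of the user's factor row with the movie's factor column."""
    j = MOVIES.index(movie)
    return sum(U[user][f] * V[factor][j] for f, factor in enumerate(FACTORS))

# The "?" cell for Alice: a positive score suggests recommending Zootopia.
print(score("Alice", "Zootopia"))
print(score("Alice", "Matrix"))
```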
Model-based CF
Ratings prediction
How can one determine the
factor matrices U and V ?
Goal: Minimize the
difference between
predicted ratings &
observed ratings.
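One common way to write this goal as an objective is shown below. The notation is an assumption, not taken from the slides: $r_{ij}$ is an observed rating, $\mathcal{O}$ the set of observed (user, item) pairs, $u_i$ and $v_j$ the factor vectors of user $i$ and item $j$, and $\lambda$ an optional regularization weight.

```latex
\min_{U,V} \; \sum_{(i,j) \in \mathcal{O}} \left( r_{ij} - u_i^{\top} v_j \right)^2
  + \lambda \left( \lVert U \rVert_F^2 + \lVert V \rVert_F^2 \right)
```

Setting $\lambda = 0$ gives the plain squared difference between predicted and observed ratings.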
Stochastic Gradient Descent (SGD, ~SVD) or Alternating Least Squares (ALS)
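A minimal SGD sketch for learning the factors from the observed movie ratings follows. It is an illustration under stated assumptions, not the talk's code: two latent factors, a small L2 penalty, and hyperparameters (`lr`, `reg`, `epochs`, `seed`) chosen arbitrarily. Movie factors are stored per movie, i.e. as the transpose of V.

```python
import math
import random

# Observed cells of the slide's movie table as (user, movie, rating); "?" cells are omitted.
OBSERVED = [
    ("Alice", "Matrix", -1), ("Alice", "Inception", -1),
    ("Alice", "Frozen", 1), ("Alice", "King-kong", 1),
    ("Patrick", "Matrix", 1), ("Patrick", "Inception", 0), ("Patrick", "Frozen", 0),
    ("Patrick", "King-kong", -1), ("Patrick", "Zootopia", 1),
    ("John", "Matrix", 1), ("John", "Inception", -1), ("John", "Frozen", -1),
    ("Sara", "Matrix", 0), ("Sara", "Inception", 1), ("Sara", "Frozen", 1),
]

def predict(U, V, user, movie):
    """Predicted rating = dot product of user and movie factor vectors."""
    return sum(a * b for a, b in zip(U[user], V[movie]))

def rmse(U, V, data):
    """Root mean squared error between predicted and observed ratings."""
    return math.sqrt(sum((r - predict(U, V, u, m)) ** 2 for u, m, r in data) / len(data))

def train_sgd(data, n_factors=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Learn factor vectors by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in data})
    movies = sorted({m for _, m, _ in data})
    # Small random initialization for every user and movie vector.
    U = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    V = {m: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for m in movies}
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, m, r in samples:
            err = r - predict(U, V, u, m)
            for f in range(n_factors):
                uf, vf = U[u][f], V[m][f]
                # Gradient step on the error, with L2 shrinkage on each factor.
                U[u][f] += lr * (err * vf - reg * uf)
                V[m][f] += lr * (err * uf - reg * vf)
    return U, V

U, V = train_sgd(OBSERVED)
print("train RMSE:", rmse(U, V, OBSERVED))
print("Alice / Zootopia:", predict(U, V, "Alice", "Zootopia"))
```

ALS targets the same objective but alternates between solving for U with V fixed and for V with U fixed, instead of updating one rating at a time.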
Building a Recommender System
Step 1: Understand your data
Visualizations (Pandas, Matplotlib, Seaborn)
Step 2: Pre-process your data
Cleanup, merge, etc. (Pandas, NumPy)
Step 3: Build your ML model
Implement the CF method (KNN, SGD, ALS) or use an implementation from an available library (Scikit-learn, Surprise, Spark MLlib)
Step 4: Train your model
Fit the model, then predict ratings on the train set.
Step 5: Test your model
Predict ratings on the test set.
Step 6: Evaluate your model
Measure accuracy using RMSE
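Steps 4 to 6 can be sketched end to end with a placeholder model. Everything here is illustrative: the ratings are made-up (user, item, rating) tuples, `fit_global_mean` is a hypothetical stand-in for a real CF model (KNN, SGD or ALS), and the split ratio and seed are arbitrary.

```python
import math
import random

def train_test_split(ratings, test_ratio=0.25, seed=42):
    """Shuffle the ratings and hold out a fraction of them for testing."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def rmse(predictions, actuals):
    """Root mean squared error between predicted and true ratings."""
    squared = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(squared / len(actuals))

def fit_global_mean(train):
    """Placeholder model: always predicts the mean rating of the train set."""
    mean = sum(r for _, _, r in train) / len(train)
    return lambda user, item: mean

# Made-up ratings, for illustration only.
ratings = [
    ("u1", "i1", 3), ("u1", "i2", 5), ("u1", "i3", 1), ("u2", "i1", 1),
    ("u2", "i4", 2), ("u3", "i1", 2), ("u3", "i2", 5), ("u4", "i3", 1),
]

train, test = train_test_split(ratings)
model = fit_global_mean(train)

train_error = rmse([model(u, i) for u, i, _ in train], [r for _, _, r in train])
test_error = rmse([model(u, i) for u, i, _ in test], [r for _, _, r in test])
print("train RMSE:", train_error)
print("test RMSE:", test_error)
```

Comparing the two numbers is the point of the split: a test RMSE far above the train RMSE suggests the model is overfitting the ratings it has seen.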
GitHub Repo: https://github.com/SarahMestiri/RecommenderSystems
Thank you!
mestiri.sa@gmail.com
Sarahmestiri.com
@mestirisarah

DN 2017 | Building a recommender system using collaborative filtering (CF) | Sarah Mestiri

Editor's Notes

  • #9 Advantages of item-based CF: it leverages the user’s own ratings, so recommendations are more consistent. It is also more stable under changes to the ratings, because new items are added less often than new users.
  • #11 Similarity challenges: user bias, and the number of ratings two users have in common. Peer-group challenges: weakly or negatively correlated users, and the impact of the long tail.
  • #12 Compute all possible rating predictions for the relevant user-item pairs (e.g. all items for a particular user), then rank them. This is computationally expensive and can raise a MemoryError.
  • #15 Exploit the fact that significant portions of the rows and columns of data matrices are highly correlated. Data redundancy plus high correlation means the matrix is well approximated by a low-rank matrix.
  • #16 State the advantages of ALS