This document provides an overview of recommender systems. It defines recommender systems as tools that suggest items such as books, movies, music, or products to users based on their preferences and behavior. It then describes several common types of recommender system algorithms, including content-based methods, collaborative filtering using similarity measures, and latent factor models. The document also briefly discusses open problems in recommender systems, such as how to weight different user behaviors, how to improve different metrics, and how to evaluate a system and evolve it over time.
2. Who am I?
Chu-Yu Hsu
Data Scientist @ IBM Taiwan
Dedicated to Recommender Systems
rio512hsu@gmail.com
https://github.com/ChuyuHsu
3. Outline
• What Is a Recommender System
• Related Algorithms
• Content-Based Algorithms
• Collaborative Filtering (CF)
• Latent Factor Model
• Going Further
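12. Content-Based: Pros
1. No need for data on other users
2. Able to recommend to users with unique tastes
3. Able to recommend new & unpopular items
4. Able to explain recommendations (the content features that caused an item to be recommended)
Cons
1. Finding appropriate features is hard
2. Overspecialisation: never recommends items outside the user's content profile
3. Cold start for new users: there is no history yet to build a profile from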
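A minimal content-based sketch, assuming items are described by hand-picked binary features (the item names and genre columns below are hypothetical) and that a user profile is simply the average feature vector of the items the user liked; unseen items are ranked by cosine similarity to that profile. It uses only the target user's own history, which is pro 1, and it can only surface items that resemble the profile, which is the overspecialisation con.

```python
import math

# Hypothetical item profiles: item -> binary features over (action, comedy, drama).
ITEM_FEATURES = {
    "movie_a": [1, 0, 0],
    "movie_b": [1, 1, 0],
    "movie_c": [0, 0, 1],
    "movie_d": [0, 1, 1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def user_profile(liked_items):
    """Average the feature vectors of the items this user liked."""
    vectors = [ITEM_FEATURES[i] for i in liked_items]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def recommend(liked_items, top_n=2):
    """Rank unseen items by cosine similarity to the user's own profile."""
    profile = user_profile(liked_items)
    candidates = [i for i in ITEM_FEATURES if i not in liked_items]
    # sort() is stable, so ties keep their original order.
    candidates.sort(key=lambda i: cosine(profile, ITEM_FEATURES[i]), reverse=True)
    return candidates[:top_n]


print(recommend(["movie_a"]))  # ['movie_b', 'movie_c']
```

Note that `user_profile([])` would divide by zero: with no liked items there is nothing to build a profile from, which is exactly the cold-start problem listed above.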
14. Similarity
• Jaccard Similarity
• Cosine Similarity
• Centered Cosine Similarity
Normalize ratings by subtracting each user's mean rating (the row mean)
Also known as Pearson correlation
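The three measures can be sketched as follows, assuming each user's ratings are a sparse dict from item id to rating (the users A, B, C and items m1 to m6 are hypothetical). Jaccard looks only at which items were rated; plain cosine treats an unrated item as a rating of 0, which behaves like a strongly negative opinion; centering first makes a missing rating equivalent to the user's average, so the sign of the similarity becomes meaningful.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of the *sets* of items two users rated (values ignored)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of sparse rating dicts; unrated items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def center(ratings):
    """Subtract the user's (row's) mean rating from each of their ratings."""
    mean = sum(ratings.values()) / len(ratings)
    return {item: r - mean for item, r in ratings.items()}


def centered_cosine(a, b):
    """Cosine similarity after mean-centering each user's ratings."""
    return cosine(center(a), center(b))


# Hypothetical users on a 1-5 scale.
A = {"m1": 4, "m2": 5, "m3": 1}
B = {"m1": 5, "m4": 5, "m5": 4}
C = {"m2": 2, "m3": 4, "m6": 5}
print(jaccard(A, B), jaccard(A, C))                                       # 0.2 0.5
print(round(cosine(A, B), 2), round(cosine(A, C), 2))                     # 0.38 0.32
print(round(centered_cosine(A, B), 2), round(centered_cosine(A, C), 2))   # 0.09 -0.56
```

On this toy data Jaccard ranks C as closer to A, while centered cosine shows that A and C actually disagree on the items they share.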
16. Item-Based vs. User-Based
• In theory, user-based CF and item-based CF are dual: item-based CF is user-based CF applied to the transposed rating matrix
• In practice, item-based CF often outperforms user-based CF
• Items are "simpler" than users
• Items belong to a small set of "genres", while users have varied tastes
• Item similarity is therefore more meaningful than user similarity
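A minimal item-based CF sketch in NumPy, assuming a small dense matrix with `np.nan` for missing ratings (the data are hypothetical), centered-cosine similarity between columns, and a prediction that averages the user's own ratings on the k most similar, positively correlated items. Because `predict` is written over rows and columns, user-based CF is the same function applied to the transposed matrix, which is the duality noted above.

```python
import numpy as np

# Hypothetical ratings: rows = users, columns = items, np.nan = not rated.
R = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 4.0, 1.0],
    [1.0, 1.0, 2.0, 5.0],
    [np.nan, 5.0, 4.0, 2.0],
])


def column_similarity(M):
    """Centered-cosine similarity between every pair of columns of M (nan = missing)."""
    # Subtract each column's mean; missing entries become 0, i.e. the column mean.
    centered = np.nan_to_num(M - np.nanmean(M, axis=0))
    norms = np.linalg.norm(centered, axis=0)
    norms[norms == 0] = 1.0                      # avoid division by zero
    unit = centered / norms
    return unit.T @ unit


def predict(M, row, col, k=2):
    """Predict M[row, col] as a similarity-weighted average of the row's ratings
    on the k columns most similar to `col` (positive similarities only)."""
    sims = column_similarity(M)[col]
    rated = [j for j in range(M.shape[1]) if j != col and not np.isnan(M[row, j])]
    ranked = sorted(rated, key=lambda j: sims[j], reverse=True)
    neighbours = [j for j in ranked if sims[j] > 0][:k]
    if not neighbours:
        return float(np.nanmean(M[row]))         # fall back to the row mean
    w = sims[neighbours]
    return float(w @ M[row, neighbours] / w.sum())


# Item-based CF: rows are users, so similarities are computed between items.
print(predict(R, row=0, col=2))      # user 0's predicted rating of item 2
# User-based CF is the dual: the same code on the transpose compares users instead.
print(predict(R.T, row=2, col=0))    # the same cell, predicted from similar users
```

Recomputing the full similarity matrix on every call keeps the sketch short; a real system would precompute item-item similarities offline, which is practical precisely because item similarity tends to be more stable than user similarity.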
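18. • SVD seems like an intuitive choice for factorizing the rating matrix R
• But R has missing entries
• SVD is only defined for a complete matrix, so it effectively treats every missing entry as a zero rating
• Instead: ignore the missing entries and fit only the observed ratings
• And forget SVD's requirement that the factor vectors be orthogonal/unit length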
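To see why zero-filling is a problem, here is a small NumPy sketch on a hypothetical 3 × 3 matrix in which 0 stands for a missing rating. `np.linalg.svd` knows nothing about which cells are missing, so a full-rank reconstruction simply reproduces the zeros, i.e. it "predicts" a rating of 0 for every unrated cell. A truncated (low-rank) SVD minimizes squared error over every cell, zeros included, so its predictions for missing cells are pulled toward 0 as well.

```python
import numpy as np

# Hypothetical ratings (rows = items, columns = users); 0.0 means "not rated".
R = np.array([
    [5.0, 0.0, 4.0],
    [4.0, 5.0, 0.0],
    [0.0, 1.0, 2.0],
])
missing = R == 0.0


def svd_reconstruct(M, k):
    """Rank-k reconstruction U_k diag(s_k) V_k^T from NumPy's (unmasked) SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


full = svd_reconstruct(R, k=3)
low = svd_reconstruct(R, k=1)
print("full-rank predictions for missing cells:", np.round(full[missing], 3))
print("rank-1 predictions for missing cells:   ", np.round(low[missing], 3))
```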
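19. • Our goal is to find P and Q (with R ≈ Q · Pᵀ) that minimize the Sum of Squared Errors (SSE) over the observed ratings:
$\min_{P,Q} \sum_{(x,i) \in R} (r_{xi} - q_i \cdot p_x)^2$
where $r_{xi}$ is user x's rating of item i, $p_x$ is user x's factor vector (a row of P), and $q_i$ is item i's factor vector (a row of Q)
• Root Mean Square Error (RMSE): $\sqrt{\mathrm{SSE} / |R|}$, where $|R|$ is the number of observed ratings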
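A minimal sketch of the two error measures, assuming ratings are stored as (user, item, rating) triples and the factor vectors as plain Python lists (all data hypothetical). The key point is that the sum runs only over the observed ratings, never over the missing cells.

```python
import math


def dot(u, v):
    """Inner product q_i . p_x of two factor vectors."""
    return sum(a * b for a, b in zip(u, v))


def sse(ratings, P, Q):
    """Sum of squared errors over the observed (user x, item i, rating r) triples only."""
    return sum((r - dot(Q[i], P[x])) ** 2 for x, i, r in ratings)


def rmse(ratings, P, Q):
    """Root mean square error: sqrt(SSE / number of observed ratings)."""
    return math.sqrt(sse(ratings, P, Q) / len(ratings))


# Hypothetical 2-factor model: P holds user vectors p_x, Q holds item vectors q_i.
P = [[1.0, 0.0], [0.0, 1.0]]
Q = [[4.0, 1.0], [2.0, 3.0]]
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 1, 3.0)]   # only the observed cells
print(sse(ratings, P, Q), rmse(ratings, P, Q))      # SSE = 1.0, RMSE = sqrt(1/3) ≈ 0.577
```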
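20. Alternating Least Squares
• Because P and Q are both unknown, the objective function is not convex
• If we fix one of the unknowns, solving for the other is an ordinary least squares problem
• ALS alternates between the two: fix Q and solve for P, then fix P and solve for Q, until convergence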
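A minimal ALS sketch in NumPy, assuming a dense items × users matrix R ≈ Q · Pᵀ, a boolean mask of observed ratings, random initialization, and a small ridge weight `lam`. The ridge term is an assumption added here so that each least-squares solve stays well posed even for a user or item with few ratings. Each half-step solves its normal equations exactly, so the (regularized) objective never increases from one sweep to the next, although ALS can still stop at a local minimum because the joint problem is non-convex.

```python
import numpy as np


def objective(R, mask, Q, P, lam):
    """Regularized SSE over observed entries: sum (r - q_i . p_x)^2 + lam * (|P|^2 + |Q|^2)."""
    err = np.where(mask, R - Q @ P.T, 0.0)
    return float((err ** 2).sum() + lam * ((P ** 2).sum() + (Q ** 2).sum()))


def als(R, mask, k=2, lam=0.1, iters=20, seed=0):
    """Alternating least squares for R ≈ Q @ P.T, fitting only entries where mask is True.

    R: items x users matrix; mask: boolean array of the same shape (True = observed).
    lam: ridge weight (an assumption here) that keeps every solve well posed.
    """
    rng = np.random.default_rng(seed)
    n_items, n_users = R.shape
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors q_i
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factors p_x
    eye = np.eye(k)
    for _ in range(iters):
        # Fix Q: each p_x is the solution of a small least-squares problem.
        for x in range(n_users):
            Qx = Q[mask[:, x]]
            P[x] = np.linalg.solve(Qx.T @ Qx + lam * eye, Qx.T @ R[mask[:, x], x])
        # Fix P: each q_i is the solution of the symmetric problem.
        for i in range(n_items):
            Pi = P[mask[i, :]]
            Q[i] = np.linalg.solve(Pi.T @ Pi + lam * eye, Pi.T @ R[i, mask[i, :]])
    return Q, P


rng = np.random.default_rng(1)
truth = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))   # hypothetical rank-2 "ratings"
mask = rng.random(truth.shape) < 0.8                        # roughly 80% of cells observed
Q, P = als(truth, mask)
print(objective(truth, mask, Q, P, lam=0.1))
```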
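21. Overfitting
• To combat overfitting, we introduce regularization:
$\min_{P,Q} \sum_{(x,i) \in R} (r_{xi} - q_i \cdot p_x)^2 + \lambda_1 \sum_x \|p_x\|^2 + \lambda_2 \sum_i \|q_i\|^2$
where $\lambda_1$ and $\lambda_2$ are user-chosen regularization weights
• Allow a rich model where there is sufficient data
• Shrink aggressively where data are scarce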
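To make "shrink aggressively where data are scarce" concrete, the sketch below solves the regularized least-squares step for a single user's factor vector, p_x = (QᵀQ + λI)⁻¹ Qᵀ r, with the item factors held fixed, as in one half of an ALS sweep. The item factors and ratings are hypothetical and noiseless, and a single weight `lam` plays the role of the user-side regularization weight. The penalty is identical for both users, but the user with 2 ratings is pulled much further toward 0 than the user with 8 ratings, because λ competes with the QᵀQ term, which grows with the number of ratings.

```python
import numpy as np


def ridge_user_factors(Q_rated, r, lam):
    """Regularized least-squares solve for one user's factors p_x, given the factor
    vectors of the items they rated (rows of Q_rated) and their ratings r:
        p_x = (Q_rated^T Q_rated + lam * I)^(-1) Q_rated^T r
    """
    k = Q_rated.shape[1]
    return np.linalg.solve(Q_rated.T @ Q_rated + lam * np.eye(k), Q_rated.T @ r)


# Hypothetical item factors: two item "types"; the true user factors are p = [1, 1].
base = np.array([[1.0, 0.0], [0.0, 1.0]])
p_true = np.array([1.0, 1.0])

Q_few = base                       # a user who rated 2 items
Q_many = np.vstack([base] * 4)     # a user who rated 8 items

for name, Qr in [("few", Q_few), ("many", Q_many)]:
    r = Qr @ p_true                # noiseless ratings, so lam = 0 recovers p_true
    print(name, ridge_user_factors(Qr, r, lam=0.0), ridge_user_factors(Qr, r, lam=1.0))
# few:  p goes from [1, 1] to [0.5, 0.5], a factor of 1 / (1 + lam)
# many: p goes from [1, 1] to [0.8, 0.8], a factor of 4 / (4 + lam)
```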
22. What’s More
• Prediction accuracy is not always what matters most; other qualities count too:
• Recentness
• Novelty
• Explanation-based diversity
• Temporal diversity
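None of these qualities has a single standard formula. The sketch below shows one simple, hypothetical way to quantify three of them for a recommendation list: novelty as the mean log(1 + popularity) of the recommended items, recentness as their mean age in days, and temporal diversity as the share of the current list that was not recommended last time. The function names, inputs, and data are assumptions for illustration only.

```python
import math


def average_popularity(rec_list, popularity):
    """Novelty proxy: mean log(1 + popularity) of recommended items (lower = more novel)."""
    return sum(math.log1p(popularity[i]) for i in rec_list) / len(rec_list)


def average_age(rec_list, published_day, today):
    """Recentness proxy: mean age in days of recommended items (lower = more recent)."""
    return sum(today - published_day[i] for i in rec_list) / len(rec_list)


def temporal_diversity(previous, current):
    """Share of the current list that was not in the previous list (higher = more change)."""
    return len(set(current) - set(previous)) / len(current)


# Hypothetical data: how many users interacted with each item, and when it was published.
popularity = {"a": 1000, "b": 10, "c": 0, "d": 5}
published_day = {"a": 1, "b": 20, "c": 28, "d": 25}

list_day1 = ["a", "b", "c"]
list_day2 = ["a", "c", "d"]
print(average_popularity(list_day2, popularity))
print(average_age(list_day2, published_day, today=30))   # (29 + 2 + 5) / 3 = 12.0
print(temporal_diversity(list_day1, list_day2))          # 1 new item out of 3
```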
24. Open Problems
• How to weight different types of user behavior
• How to improve different metrics
• How to evaluate a recommender and evolve it over time
26. “The Age of Search has come to an end. Long live the recommendation!”
– Jeffrey M. O’Brien, CNN Money