This document discusses recommender systems and provides an overview of key concepts. It begins by discussing the Netflix Prize challenge to improve Netflix's recommendation system. It then covers major challenges in recommender systems like data sparsity and cold starts. Different evaluation metrics and classifications of recommender systems are defined. Similarity-based collaborative filtering recommender algorithms like user-based and item-based are described. The document concludes by discussing Mahout's recommender system implementations and an example CNTV recommendation system.
2. We will talk about
◦ Netflix Prize
◦ Major challenges
◦ Definitions of subjects and problems
◦ Recommendation methods
◦ Mahout
◦ CNTV 5+ VIP Recommendation
3. We will not talk about
◦ Architecture of a recommender system
◦ How to make it robust and scalable
4. Netflix Prize
◦ Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail rental.
◦ 60% of DVDs rented by Netflix are selected based on personalized recommendations.
5. Netflix Prize
◦ In October 2006, Netflix released a dataset containing approximately 100 million anonymous movie ratings and challenged researchers and practitioners to develop recommender systems that could beat the accuracy of the company's recommendation system, Cinematch.
◦ On 21 September 2009, the grand prize of $1,000,000 was awarded to a team that outperformed Cinematch's accuracy by 10%.
6. Major challenges
◦ Data sparsity – the data is huge, and ratings are unevenly distributed.
◦ Scalability – the data is huge; incremental updates are needed.
◦ Cold start – newly arrived users have no history to learn from.
◦ Diversity vs. accuracy – don't recommend things everyone already knows about.
◦ Vulnerability to attacks – wherever there is a ranking, someone will try to game it.
◦ The value of time – people like different things at different times.
◦ Evaluation of recommendations – how to tell which recommendation method is better.
◦ User interface – present recommendations in a way that makes users glad to accept them.
7. Evaluation Metrics for Recommendation
◦ The training set E^T – treated as known information.
◦ The probe set E^P – no information from the probe set is allowed to be used for recommendation.
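As a concrete illustration of this split, here is a minimal self-contained sketch (plain Java, made-up toy ratings, and an assumed 80/20 ratio that the slides do not specify) that holds out part of the rating data as a probe set:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class TrainProbeSplit {
    public static void main(String[] args) {
        // Hypothetical ratings, each encoded as "user,item,rating"
        List<String> ratings = new ArrayList<>(List.of(
            "1,101,5", "1,102,3", "2,101,2", "2,103,5", "3,102,4"));

        Collections.shuffle(ratings, new Random(42)); // fixed seed for repeatability
        int cut = (int) (ratings.size() * 0.8);       // assumed 80% train / 20% probe
        List<String> trainSet = ratings.subList(0, cut);              // E^T: known to the recommender
        List<String> probeSet = ratings.subList(cut, ratings.size()); // E^P: held out for evaluation

        System.out.println("train=" + trainSet.size() + " probe=" + probeSet.size());
    }
}
```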
8. Evaluation Metrics for Recommendation
◦ Accuracy Metrics
◦ Mean Absolute Error (MAE)
◦ Root Mean Squared Error (RMSE)
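Written out over the probe set E^P, with r_ui the true rating of item i by user u and r̂_ui the predicted rating, the two metrics are:

```latex
\mathrm{MAE} = \frac{1}{|E^{P}|} \sum_{(u,i)\in E^{P}} \left| \hat{r}_{ui} - r_{ui} \right|
\qquad
\mathrm{RMSE} = \sqrt{ \frac{1}{|E^{P}|} \sum_{(u,i)\in E^{P}} \left( \hat{r}_{ui} - r_{ui} \right)^{2} }
```

RMSE squares the errors before averaging, so it penalizes large individual mistakes more heavily than MAE does.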
10. Evaluation Metrics for Recommendation
◦ Precision is the proportion of the top-N recommendations that are actually good.
◦ Recall is the proportion of all good items that appear among the top-N recommendations.
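With toy data (hypothetical item IDs, not taken from the slides), the two proportions can be computed as:

```java
import java.util.List;
import java.util.Set;

public class PrecisionRecall {
    public static void main(String[] args) {
        // Hypothetical ground truth: items the user actually likes
        Set<String> good = Set.of("A", "B", "C", "D");
        // Hypothetical top-4 recommendations produced by some recommender
        List<String> top = List.of("A", "C", "E", "F");

        long hits = top.stream().filter(good::contains).count(); // A and C -> 2 hits
        double precision = (double) hits / top.size();   // 2/4 = 0.5 of recommendations are good
        double recall    = (double) hits / good.size();  // 2/4 = 0.5 of good items were surfaced

        System.out.println("precision=" + precision + " recall=" + recall);
    }
}
```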
12. Classifications of recommender systems
◦ Content-based recommendations
◦ Collaborative recommendations
  ◦ Memory-based collaborative filtering
    ◦ standard similarity-based methods
    ◦ methods employing social filtering
  ◦ Model-based collaborative filtering
    ◦ dimensionality reduction methods
    ◦ diffusion-based methods
◦ Hybrid approaches
13. Similarity-based methods
◦ User-based recommender
  for every other user w
    compute a similarity s between u and w
  retain the top users, ranked by similarity, as a neighborhood n
  for every item i that some user in n has a preference for,
      but that u has no preference for yet
    for every other user v in n that has a preference for i
      compute a similarity s between u and v
      incorporate v's preference for i, weighted by s, into a running average
14. Similarity-based methods
◦ User-based recommender
DataModel model = new FileDataModel(new File("intro.csv"));
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood =
new NearestNUserNeighborhood(100, similarity, model);
Recommender recommender =
new GenericUserBasedRecommender(model, neighborhood,
similarity);
15. Similarity-based methods
◦ User-based recommender
• Data model, implemented via DataModel
• User–user similarity metric, implemented via UserSimilarity
• User neighborhood definition, implemented via UserNeighborhood
• Recommender engine, implemented via a Recommender (here, GenericUserBasedRecommender)
16. Similarity-based methods
◦ Item-based recommender
  for every item i that u has no preference for yet
    for every item j that u has a preference for
      compute a similarity s between i and j
      add u's preference for j, weighted by s, to a running average
  return the top items, ranked by weighted average
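The loop above can be sketched in self-contained Java. The preference values and the precomputed item–item similarities below are made-up toy numbers, standing in for what an ItemSimilarity implementation would supply:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ItemBasedSketch {
    public static void main(String[] args) {
        // Hypothetical data: user u's known preferences (item -> rating)
        Map<String, Double> prefs = Map.of("A", 5.0, "B", 3.0);
        // Hypothetical precomputed item-item similarities, keyed "i|j"
        Map<String, Double> sim = Map.of(
            "C|A", 0.9, "C|B", 0.1,
            "D|A", 0.2, "D|B", 0.8);
        List<String> candidates = List.of("C", "D"); // items u has not rated yet

        Map<String, Double> score = new HashMap<>();
        for (String i : candidates) {               // every item i u has no preference for yet
            double num = 0, den = 0;
            for (String j : prefs.keySet()) {       // every item j u has a preference for
                double s = sim.get(i + "|" + j);    // similarity s between i and j
                num += s * prefs.get(j);            // u's preference for j, weighted by s
                den += s;
            }
            score.put(i, num / den);                // running weighted average
        }
        // C scores (0.9*5 + 0.1*3) / 1.0 ≈ 4.8 ; D scores (0.2*5 + 0.8*3) / 1.0 ≈ 3.4
        score.entrySet().stream()                   // return the top items, ranked by weighted average
             .sorted((a, b) -> Double.compare(b.getValue(), a.getValue()))
             .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
    }
}
```

Item C ranks first because u rates item A highly and C is most similar to A; this is the intuition the weighted average captures.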
17. Similarity-based methods
◦ Item-based recommender
DataModel model = new FileDataModel(new File("intro.csv"));
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
Recommender recommender =
new GenericItemBasedRecommender(model, similarity);
24. Architecture of NeuRecommendation
[Diagram: requests for recommendation from clients (IMS etc.) arrive at a Dispatcher, which dispatches each request to one of several Recommender instances using round robin; a Data Feeder supplies the Recommenders by fetching users' preferences.]