Recommender System

hellojinjie
2013-06-19
  1. Recommender System (hellojinjie, 2013-06-19)
  2. We will talk about
     ◦ Netflix Prize
     ◦ Major challenges
     ◦ Definitions of subjects and problems
     ◦ Recommendation methods
     ◦ Mahout
     ◦ CNTV 5+ VIP Recommendation
  3. We will not talk about
     ◦ Architecture of a recommender system
     ◦ How to make it robust and scalable
  4. Netflix Prize
     ◦ Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail.
     ◦ 60% of DVDs rented by Netflix are selected based on personalized recommendations.
  5. Netflix Prize
     ◦ In October 2006, Netflix released a dataset containing approximately 100 million anonymous movie ratings and challenged researchers and practitioners to develop recommender systems that could beat the accuracy of the company's recommendation system, Cinematch.
     ◦ On 21 September 2009, the grand prize of $1,000,000 was awarded to a team that outperformed Cinematch's accuracy by 10%.
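  6. Major challenges
     ◦ Data sparsity: the data set is huge, and ratings are unevenly distributed.
     ◦ Scalability: the data set is huge, and it must be updated incrementally.
     ◦ Cold start: newly arrived users have no history.
     ◦ Diversity vs. accuracy: don't recommend to me what everybody already knows.
     ◦ Vulnerability to attacks: wherever there is a ranking, someone will try to game it.
     ◦ The value of time: people like different things at different times.
     ◦ Evaluation of recommendations: which recommendation methods are better and which are worse.
     ◦ User interface: an optimized presentation makes users happy to accept our recommendations.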
  7. Evaluation Metrics for Recommendation
     ◦ The training set E^T: treated as known information.
     ◦ The probe set E^P: no information from the probe set may be used for recommendation.
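The training/probe division above can be sketched in plain Java as follows. This is only an illustration, not code from the deck: the class name `TrainProbeSplit`, the `probeFraction` parameter, and the random-shuffle strategy are all assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: splits known ratings into a training set and a probe set. */
final class TrainProbeSplit<T> {
    final List<T> training; // treated as known information
    final List<T> probe;    // held out; never shown to the recommender

    private TrainProbeSplit(List<T> training, List<T> probe) {
        this.training = training;
        this.probe = probe;
    }

    /** Shuffles a copy of the ratings with a fixed seed and holds out probeFraction of them. */
    static <T> TrainProbeSplit<T> split(List<T> ratings, double probeFraction, long seed) {
        if (probeFraction < 0.0 || probeFraction > 1.0) {
            throw new IllegalArgumentException("probeFraction must be in [0, 1]");
        }
        List<T> shuffled = new ArrayList<>(ratings);
        Collections.shuffle(shuffled, new Random(seed));
        int probeSize = (int) Math.round(shuffled.size() * probeFraction);
        List<T> probe = new ArrayList<>(shuffled.subList(0, probeSize));
        List<T> training = new ArrayList<>(shuffled.subList(probeSize, shuffled.size()));
        return new TrainProbeSplit<>(training, probe);
    }
}
```

A recommender would then be built from `training` only and scored against `probe`.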
  8. Evaluation Metrics for Recommendation
     ◦ Accuracy Metrics
        ◦ Mean Absolute Error (MAE)
        ◦ Root Mean Squared Error (RMSE)
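As a minimal sketch of the two accuracy metrics named above, assuming the probe-set ratings and the corresponding predictions are available as two parallel arrays (the class and method names here are illustrative, not taken from the deck or from Mahout):

```java
/** Illustrative accuracy metrics over paired (actual, predicted) ratings. */
final class AccuracyMetrics {

    /** Mean Absolute Error: the average of |predicted - actual|. */
    static double mae(double[] actual, double[] predicted) {
        checkInputs(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / actual.length;
    }

    /** Root Mean Squared Error: the square root of the average squared error. */
    static double rmse(double[] actual, double[] predicted) {
        checkInputs(actual, predicted);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = predicted[i] - actual[i];
            sum += error * error;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void checkInputs(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

Because RMSE squares each error before averaging, it penalizes large errors more heavily than MAE does.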
  9. Evaluation Metrics for Recommendation
  10. Evaluation Metrics for Recommendation
     ◦ Precision is the proportion of top recommendations that are good.
     ◦ Recall is the proportion of good recommendations that appear in top recommendations.
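The two definitions above can be written down directly. The sketch below assumes item IDs are `Long` values, that the top-N list contains no duplicates, and that the set of "good" items for the user is known; `TopNMetrics` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative precision and recall for one user's top-N recommendation list. */
final class TopNMetrics {

    /** Proportion of the top recommendations that are good. */
    static double precision(List<Long> topRecommended, Set<Long> good) {
        if (topRecommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topRecommended, good) / topRecommended.size();
    }

    /** Proportion of the good items that appear in the top recommendations. */
    static double recall(List<Long> topRecommended, Set<Long> good) {
        if (good.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topRecommended, good) / good.size();
    }

    /** Number of recommended items that are also good items. */
    private static int hits(List<Long> topRecommended, Set<Long> good) {
        Set<Long> common = new HashSet<>(topRecommended);
        common.retainAll(good);
        return common.size();
    }
}
```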
  11. Evaluation Metrics for Recommendation
  12. Classifications of recommender systems
     ◦ Content-based recommendations
     ◦ Collaborative recommendations
        ◦ Memory-based collaborative filtering
           ◦ Standard similarity-based methods
           ◦ Methods employing social filtering
        ◦ Model-based collaborative filtering
           ◦ Dimensionality reduction methods
           ◦ Diffusion-based methods
     ◦ Hybrid approaches
  13. Similarity-based methods
     ◦ User-based recommender
         for every other user w
           compute a similarity s between u and w
           retain the top users, ranked by similarity, as a neighborhood n
         for every item i that some user in n has a preference for,
             but that u has no preference for yet
           for every other user v in n that has a preference for i
             compute a similarity s between u and v
             incorporate v's preference for i, weighted by s, into a running average
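The user-based pseudocode above can be sketched without Mahout as follows. This is an illustration under stated assumptions, not Mahout's implementation: preferences are held in a nested map (user ID to item ID to value), the similarity is a simple cosine over co-rated items rather than Pearson correlation, and `UserBasedSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UserBasedSketch {

    /** Cosine similarity over the items both users have rated (assumed metric). */
    static double similarity(Map<Long, Double> a, Map<Long, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Estimated preferences of user u for items u has no preference for yet. */
    static Map<Long, Double> estimate(Map<Long, Map<Long, Double>> prefs, long u, int neighborhoodSize) {
        Map<Long, Double> own = prefs.get(u);
        // Compute a similarity between u and every other user w.
        Map<Long, Double> sims = new HashMap<>();
        for (Long w : prefs.keySet()) {
            if (w != u) {
                sims.put(w, similarity(own, prefs.get(w)));
            }
        }
        // Retain the top users, ranked by similarity, as the neighborhood.
        List<Long> neighbors = new ArrayList<>(sims.keySet());
        neighbors.sort(Comparator.comparingDouble((Long w) -> sims.get(w)).reversed());
        if (neighbors.size() > neighborhoodSize) {
            neighbors = neighbors.subList(0, neighborhoodSize);
        }
        // Fold each neighbor's preferences, weighted by similarity, into a running average.
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> weightTotal = new HashMap<>();
        for (Long v : neighbors) {
            double s = sims.get(v);
            if (s <= 0) {
                continue; // ignore dissimilar neighbors in this sketch
            }
            for (Map.Entry<Long, Double> e : prefs.get(v).entrySet()) {
                if (!own.containsKey(e.getKey())) {
                    weightedSum.merge(e.getKey(), s * e.getValue(), Double::sum);
                    weightTotal.merge(e.getKey(), s, Double::sum);
                }
            }
        }
        Map<Long, Double> estimates = new HashMap<>();
        for (Map.Entry<Long, Double> e : weightedSum.entrySet()) {
            estimates.put(e.getKey(), e.getValue() / weightTotal.get(e.getKey()));
        }
        return estimates;
    }
}
```

Sorting the returned estimates in descending order and keeping the first few entries would give the final recommendation list.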
  14. Similarity-based methods
     ◦ User-based recommender
         // Load userID,itemID,preference records from a CSV file
         DataModel model = new FileDataModel(new File("intro.csv"));
         // User-user similarity based on Pearson correlation
         UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
         // Neighborhood of the 100 most similar users
         UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
         Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
  15. Similarity-based methods
     ◦ User-based recommender
        • Data model, implemented via DataModel
        • User-user similarity metric, implemented via UserSimilarity
        • User neighborhood definition, implemented via UserNeighborhood
        • Recommender engine, implemented via a Recommender (here, GenericUserBasedRecommender)
  16. Similarity-based methods
     ◦ Item-based recommender
         for every item i that u has no preference for yet
           for every item j that u has a preference for
             compute a similarity s between i and j
             add u's preference for j, weighted by s, to a running average
         return the top items, ranked by weighted average
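A plain-Java sketch of the item-based pseudocode above, again without Mahout. It assumes the same nested-map layout (user ID to item ID to preference value) and uses cosine similarity over the users who rated both items, which is an assumption rather than the metric the deck uses; `ItemBasedSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ItemBasedSketch {

    /** Cosine similarity between items i and j over users who rated both (assumed metric). */
    static double itemSimilarity(Map<Long, Map<Long, Double>> prefs, long i, long j) {
        double dot = 0, normI = 0, normJ = 0;
        for (Map<Long, Double> userPrefs : prefs.values()) {
            Double pi = userPrefs.get(i);
            Double pj = userPrefs.get(j);
            if (pi != null && pj != null) {
                dot += pi * pj;
                normI += pi * pi;
                normJ += pj * pj;
            }
        }
        return (normI == 0 || normJ == 0) ? 0.0 : dot / Math.sqrt(normI * normJ);
    }

    /** Returns up to howMany item IDs for user u, ranked by weighted-average estimate. */
    static List<Long> recommend(Map<Long, Map<Long, Double>> prefs, long u, int howMany) {
        Map<Long, Double> own = prefs.get(u);
        Set<Long> allItems = new HashSet<>();
        for (Map<Long, Double> userPrefs : prefs.values()) {
            allItems.addAll(userPrefs.keySet());
        }
        Map<Long, Double> estimates = new HashMap<>();
        for (Long i : allItems) {
            if (own.containsKey(i)) {
                continue; // only items u has no preference for yet
            }
            double weighted = 0, total = 0;
            for (Map.Entry<Long, Double> e : own.entrySet()) {
                double s = itemSimilarity(prefs, i, e.getKey());
                if (s > 0) {
                    weighted += s * e.getValue(); // u's preference for j, weighted by s
                    total += s;
                }
            }
            if (total > 0) {
                estimates.put(i, weighted / total);
            }
        }
        List<Long> ranked = new ArrayList<>(estimates.keySet());
        ranked.sort(Comparator.comparingDouble((Long i) -> estimates.get(i)).reversed());
        return ranked.size() > howMany ? new ArrayList<>(ranked.subList(0, howMany)) : ranked;
    }
}
```

Unlike the user-based variant, no neighborhood is needed here: every item the user has rated contributes, weighted by its similarity to the candidate item.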
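  17. Similarity-based methods
     ◦ Item-based recommender
         // Load userID,itemID,preference records from a CSV file
         DataModel model = new FileDataModel(new File("intro.csv"));
         // Item-item similarity based on Pearson correlation
         ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
         // An item-based recommender needs no user neighborhood
         Recommender recommender = new GenericItemBasedRecommender(model, similarity);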
  18. Summary of available recommender implementations in Mahout
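  19. CNTV 5+ VIP Recommendation
     ◦ passport_260676's preference
         (First half 11:00) 9-Atlético Madrid-Radamel Falcao scores a goal | lfp | 3.0
         (1st quarter 08:59) 6-EAST-LeBron James scores on a dunk | nba | 5.0
         MV: 即刻出发 (sung by 吉克隽逸) | nba | 3.0
         (2nd quarter 11:00) 24-EAST-Paul George scores on a dunk | nba | 5.0
     ◦ userBasedBooleanPref
         (4th quarter 00:47) 32-WEST-Blake Griffin scores on a dunk | nba | 20.860504
         (2nd quarter 02:33) 32-WEST-Blake Griffin takes a pass from 24-WEST-Kobe Bryant and dunks | nba | 17.332127
         wings | nba | 9.839406
         (First half 22:00) 7-Real Madrid-Cristiano Ronaldo scores an own goal | lfp | 8.962188
         Tony Parker shows off his Chinese on the spot | nba | 7.3381634
         Evans gets creative again: mid-air hand switch + leap over a poster | nba | 7.2042103
         (Second half 58:00) 10-Barcelona-Messi scores a goal | lfp | 7.201148
         Singer NE-YO performs a song-and-dance number and leads the Eastern All-Stars onto the court | nba | 7.151176
         Ross recreates the dunk of the century + pays tribute to Vince Carter | nba | 7.0464416
         (3rd quarter 01:25) 34-Nuggets-JaVale McGee scores on a dunk | nba | 6.4302483
     ◦ http://172.16.0.237:10008/recommend/userID/260676/howMany/10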
  23. References
     1. Sean Owen, Mahout in Action
     2. Linyuan Lv, Recommender Systems
  24. Architecture of NeuRecommendation (diagram; labels only)
     ◦ IMS etc.: request for recommendation
     ◦ Dispatcher: dispatch request using round robin
     ◦ Recommender, Recommender
     ◦ Data Feeder: fetching users' preferences
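The diagram only says that the Dispatcher hands requests to the Recommender instances using round robin. A minimal sketch of that selection policy might look like the following; the class `RoundRobinDispatcher`, its generic backend type, and the `pick` method are assumptions for illustration, not the actual NeuRecommendation code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin selector over a fixed list of backends. */
final class RoundRobinDispatcher<T> {
    private final List<T> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinDispatcher(List<T> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = new ArrayList<>(backends);
    }

    /** Returns the backends in turn; safe to call from several threads. */
    T pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }
}
```

Each incoming recommendation request would call `pick()` and be forwarded to the returned Recommender.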
  25. Architecture of NeuRecommendation (diagram; labels only)
     ◦ Recommender: RPC, Data Store, Mahout
     ◦ 1. Serve recommendation request
     ◦ 2. Fetch users' preferences
