Recommender System

hellojinjie
2013-06-19
We will talk about
◦ Netflix Prize
◦ Major challenges
◦ Definitions of subjects and problems
◦ Recommendation methods
◦ Mahout
◦ CNTV 5+ VIP Recommendation
We will not talk about
◦ Architecture of a recommender system
◦ How to make it robust and scalable
Netflix Prize
◦ Netflix, Inc. is an American provider of on-demand Internet streaming media and flat-rate DVD-by-mail.
◦ 60% of DVDs rented by Netflix are selected based on personalized recommendations.
Netflix Prize
◦ In October 2006, Netflix released a dataset containing approximately 100 million anonymous movie ratings and challenged researchers and practitioners to develop recommender systems that could beat the accuracy of the company's recommendation system, Cinematch.
◦ On 21 September 2009, the grand prize of $1,000,000 was awarded to a team that improved on Cinematch's accuracy by 10%.
Major challenges
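◦ Data sparsity – the data set is huge and ratings are unevenly distributed.
◦ Scalability – huge data volumes; incremental updates.
◦ Cold start – new users with no history.
◦ Diversity vs. accuracy – don't recommend to me what everybody already knows.
◦ Vulnerability to attacks – wherever there is a ranking, someone will try to game it.
◦ The value of time – people like different things at different times.
◦ Evaluation of recommendations – deciding which recommendation method is better and which is worse.
◦ User interface – an optimized presentation makes users more willing to accept our recommendations.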
Evaluation Metrics for Recommendation
◦ The training set ET
-- The training set is treated as known information.
◦ The probe set EP
-- No information from the probe set is allowed to be used for recommendation.
Evaluation Metrics for Recommendation
◦ Accuracy metrics
  ◦ Mean Absolute Error (MAE)
  ◦ Root Mean Squared Error (RMSE)
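As a rough illustration of the two accuracy metrics, here is a minimal sketch in plain Java. It assumes the predicted and true ratings for the probe set are already available as two parallel arrays; the class name AccuracyMetrics and the sample numbers are made up for this example. MAE averages the absolute errors, while RMSE averages the squared errors and takes the square root, so it penalizes large errors more.

```java
final class AccuracyMetrics {

    /** Mean Absolute Error: the average of |predicted - actual| over the probe set. */
    static double mae(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - actual[i]);
        }
        return sum / predicted.length;
    }

    /** Root Mean Squared Error: the square root of the average squared error. */
    static double rmse(double[] predicted, double[] actual) {
        checkInput(predicted, actual);
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double error = predicted[i] - actual[i];
            sum += error * error;
        }
        return Math.sqrt(sum / predicted.length);
    }

    private static void checkInput(double[] predicted, double[] actual) {
        if (predicted.length == 0 || predicted.length != actual.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }

    public static void main(String[] args) {
        // Made-up ratings, for illustration only.
        double[] predicted = {4.0, 3.0, 5.0, 2.0};
        double[] actual = {5.0, 3.0, 4.0, 4.0};
        System.out.println("MAE  = " + mae(predicted, actual));
        System.out.println("RMSE = " + rmse(predicted, actual));
    }
}
```

On the sample data the errors are -1, 0, 1 and -2, so RMSE comes out larger than MAE because the single error of 2 is squared.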
Evaluation Metrics for Recommendation
◦ Precision is the proportion of top recommendations
that are good.
◦ Recall is the proportion of good recommendations that
appear in top recommendations.
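The two definitions above can be sketched in a few lines of plain Java. This is only an illustration: it assumes the top recommendations are given as a list without duplicates and the "good" items (for example, items the user rated highly in the probe set) as a set. The class name PrecisionRecall and the item IDs are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PrecisionRecall {

    /** Number of recommended items that are also in the set of good items. */
    private static long hits(List<String> topRecommendations, Set<String> goodItems) {
        return topRecommendations.stream().filter(goodItems::contains).count();
    }

    /** Proportion of top recommendations that are good. */
    static double precision(List<String> topRecommendations, Set<String> goodItems) {
        if (topRecommendations.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topRecommendations, goodItems) / topRecommendations.size();
    }

    /** Proportion of good items that appear in the top recommendations. */
    static double recall(List<String> topRecommendations, Set<String> goodItems) {
        if (goodItems.isEmpty()) {
            return 0.0;
        }
        return (double) hits(topRecommendations, goodItems) / goodItems.size();
    }

    public static void main(String[] args) {
        // Made-up item IDs, for illustration only.
        List<String> top = Arrays.asList("a", "b", "c", "d");
        Set<String> good = new HashSet<>(Arrays.asList("a", "c", "e"));
        System.out.println("precision = " + precision(top, good));
        System.out.println("recall    = " + recall(top, good));
    }
}
```

In the sample, two of the four recommendations are good, and two of the three good items were recommended.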
Classifications of recommender systems
◦ Content-based recommendations
◦ Collaborative recommendations
  ◦ Memory-based collaborative filtering
    ◦ Standard similarity-based methods
    ◦ Methods employing social filtering
  ◦ Model-based collaborative filtering
    ◦ Dimensionality reduction methods
    ◦ Diffusion-based methods
◦ Hybrid approaches
Similarity-based methods
◦ User-based recommender
for every other user w
  compute a similarity s between u and w
retain the top users, ranked by similarity, as a neighborhood n
for every item i that some user in n has a preference for,
    but that u has no preference for yet
  for every other user v in n that has a preference for i
    compute a similarity s between u and v
    incorporate v's preference for i, weighted by s, into a running average
return the top items, ranked by weighted average
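The user-based procedure above can be sketched in plain Java without Mahout. This is a simplified illustration, not Mahout's implementation: it uses cosine similarity over co-rated items as a stand-in similarity, skips neighbors with non-positive similarity, and keeps all preferences in an in-memory map. The class name UserBasedSketch, the method names, and the sample data are made up for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UserBasedSketch {

    /** Made-up sample data: user -> (item -> preference). */
    static Map<String, Map<String, Double>> sampleData() {
        Map<String, Map<String, Double>> prefs = new HashMap<>();
        add(prefs, "alice", "A", 5.0); add(prefs, "alice", "B", 1.0);
        add(prefs, "bob", "A", 5.0);   add(prefs, "bob", "B", 1.0);
        add(prefs, "bob", "C", 5.0);   add(prefs, "bob", "D", 1.0);
        add(prefs, "carol", "A", 1.0); add(prefs, "carol", "B", 5.0);
        add(prefs, "carol", "C", 1.0); add(prefs, "carol", "D", 5.0);
        return prefs;
    }

    private static void add(Map<String, Map<String, Double>> prefs,
                            String user, String item, double value) {
        prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    /** Cosine similarity over the items both users have rated (an illustrative choice). */
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
                normA += e.getValue() * e.getValue();
                normB += other * other;
            }
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Estimates u's preference for each item u has not rated, using the n most similar users. */
    static Map<String, Double> estimate(String u, Map<String, Map<String, Double>> prefs, int n) {
        Map<String, Double> mine = prefs.getOrDefault(u, new HashMap<>());

        // Similarity between u and every other user w.
        Map<String, Double> sims = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> w : prefs.entrySet()) {
            if (!w.getKey().equals(u)) {
                sims.put(w.getKey(), similarity(mine, w.getValue()));
            }
        }

        // Keep the top n users, ranked by similarity, as the neighborhood.
        List<String> neighborhood = new ArrayList<>(sims.keySet());
        neighborhood.sort(Comparator.comparingDouble((String w) -> sims.get(w)).reversed());
        if (neighborhood.size() > n) {
            neighborhood = neighborhood.subList(0, n);
        }

        // Similarity-weighted average of the neighbors' preferences for items u has not rated.
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (String v : neighborhood) {
            double s = sims.get(v);
            if (s <= 0.0) {
                continue; // ignore dissimilar neighbors in this sketch
            }
            for (Map.Entry<String, Double> p : prefs.get(v).entrySet()) {
                if (!mine.containsKey(p.getKey())) {
                    weightedSum.merge(p.getKey(), s * p.getValue(), Double::sum);
                    weightTotal.merge(p.getKey(), s, Double::sum);
                }
            }
        }
        Map<String, Double> estimates = new HashMap<>();
        for (String item : weightedSum.keySet()) {
            estimates.put(item, weightedSum.get(item) / weightTotal.get(item));
        }
        return estimates;
    }

    public static void main(String[] args) {
        System.out.println(estimate("alice", sampleData(), 2));
    }
}
```

Sorting the returned estimates by value gives the final "top items" list. With a neighborhood of one, alice simply inherits bob's preferences, because bob rated the shared items exactly as she did.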
Similarity-based methods
◦ User-based recommender
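// Classes come from the org.apache.mahout.cf.taste packages (plus java.io.File)
// Load userID,itemID,preference records from a CSV file
DataModel model = new FileDataModel(new File("intro.csv"));
// Pearson correlation between users' preferences
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
// Neighborhood: the 100 users most similar to the target user
UserNeighborhood neighborhood =
    new NearestNUserNeighborhood(100, similarity, model);
Recommender recommender =
    new GenericUserBasedRecommender(model, neighborhood, similarity);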
Similarity-based methods
◦ User-based recommender
• Data model, implemented via DataModel
• User-user similarity metric, implemented via UserSimilarity
• User neighborhood definition, implemented via UserNeighborhood
• Recommender engine, implemented via a Recommender (here, GenericUserBasedRecommender)
Similarity-based methods
◦ Item-based recommender
for every item i that u has no preference for yet
  for every item j that u has a preference for
    compute a similarity s between i and j
    add u's preference for j, weighted by s, to a running average
return the top items, ranked by weighted average
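The item-based loop above can be sketched the same way in plain Java. Again this is an assumption-laden illustration, not Mahout's code: item-item similarity is cosine similarity over the users who rated both items, non-positive similarities are skipped, and the class name ItemBasedSketch and the sample data are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ItemBasedSketch {

    /** Made-up sample data: user -> (item -> preference). */
    static Map<String, Map<String, Double>> sampleData() {
        Map<String, Map<String, Double>> prefs = new HashMap<>();
        add(prefs, "alice", "A", 5.0); add(prefs, "alice", "B", 1.0);
        add(prefs, "bob", "A", 5.0);   add(prefs, "bob", "B", 1.0);
        add(prefs, "bob", "C", 5.0);   add(prefs, "bob", "D", 1.0);
        add(prefs, "carol", "A", 1.0); add(prefs, "carol", "B", 5.0);
        add(prefs, "carol", "C", 1.0); add(prefs, "carol", "D", 5.0);
        return prefs;
    }

    private static void add(Map<String, Map<String, Double>> prefs,
                            String user, String item, double value) {
        prefs.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    /** Cosine similarity between items i and j over the users who rated both. */
    static double similarity(String i, String j, Map<String, Map<String, Double>> prefs) {
        double dot = 0.0, normI = 0.0, normJ = 0.0;
        for (Map<String, Double> userPrefs : prefs.values()) {
            Double pi = userPrefs.get(i);
            Double pj = userPrefs.get(j);
            if (pi != null && pj != null) {
                dot += pi * pj;
                normI += pi * pi;
                normJ += pj * pj;
            }
        }
        return (normI == 0.0 || normJ == 0.0) ? 0.0 : dot / Math.sqrt(normI * normJ);
    }

    /** Returns up to howMany items that u has not rated, ranked by estimated preference. */
    static List<String> recommend(String u, Map<String, Map<String, Double>> prefs, int howMany) {
        Map<String, Double> mine = prefs.getOrDefault(u, new HashMap<>());
        Set<String> allItems = new TreeSet<>();
        for (Map<String, Double> userPrefs : prefs.values()) {
            allItems.addAll(userPrefs.keySet());
        }

        Map<String, Double> estimates = new HashMap<>();
        for (String i : allItems) {
            if (mine.containsKey(i)) {
                continue; // u already has a preference for i
            }
            double weightedSum = 0.0, weightTotal = 0.0;
            for (Map.Entry<String, Double> j : mine.entrySet()) {
                double s = similarity(i, j.getKey(), prefs);
                if (s > 0.0) {
                    weightedSum += s * j.getValue();
                    weightTotal += s;
                }
            }
            if (weightTotal > 0.0) {
                estimates.put(i, weightedSum / weightTotal);
            }
        }

        // Rank candidate items by their weighted-average estimate, highest first.
        List<String> ranked = new ArrayList<>(estimates.keySet());
        ranked.sort(Comparator.comparingDouble((String item) -> estimates.get(item)).reversed());
        return ranked.size() > howMany ? ranked.subList(0, howMany) : ranked;
    }

    public static void main(String[] args) {
        System.out.println(recommend("alice", sampleData(), 2));
    }
}
```

In the sample data item C is rated like item A, which alice likes, and item D is rated like item B, which she does not, so C ranks ahead of D for her.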
Similarity-based methods
◦ Item-based recommender
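DataModel model = new FileDataModel(new File("intro.csv"));
// PearsonCorrelationSimilarity also implements ItemSimilarity
ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
// An item-based recommender needs no user neighborhood
Recommender recommender =
    new GenericItemBasedRecommender(model, similarity);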
Summary of available recommender implementations in Mahout
CNTV 5+ VIP Recommendation
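passport_260676's preference
(First half 11:00) 9-Atlético Madrid-Radamel Falcao scores a goal lfp 3.0
(1st quarter 08:59) 6-EAST-LeBron James scores on a dunk nba 5.0
MV - "即刻出发" (performed by Jike Junyi) nba 3.0
(2nd quarter 11:00) 24-EAST-Paul George scores on a dunk nba 5.0
userBasedBooleanPref
(4th quarter 00:47) 32-WEST-Blake Griffin scores on a dunk nba 20.860504
(2nd quarter 02:33) 32-WEST-Blake Griffin dunks off a pass from 24-WEST-Kobe Bryant nba 17.332127
wings nba 9.839406
(First half 22:00) 7-Real Madrid-Cristiano Ronaldo scores an own goal lfp 8.962188
Tony Parker shows off his Chinese live nba 7.3381634
Evans gets creative again: mid-air hand switch + leap over a poster nba 7.2042103
(Second half 58:00) 10-Barcelona-Messi scores a goal lfp 7.201148
Singer NE-YO's song-and-dance number leads the Eastern All-Stars onto the court nba 7.151176
Ross recreates the dunk of the century + pays tribute to Vince Carter nba 7.0464416
(3rd quarter 01:25) 34-Nuggets-JaVale McGee scores on a dunk nba 6.4302483

http://172.16.0.237:10008/recommend/userID/260676/howMany/10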
References
1. Sean Owen, Mahout in Action
2. Linyuan Lv, Recommender Systems
Architecture of NeuRecommendation
◦ Clients (IMS etc.): request for recommendation
◦ Dispatcher: dispatches each request to a Recommender using round robin
◦ Recommender (multiple instances)
◦ Data Feeder: fetching users' preferences
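The round-robin dispatch step could look roughly like the following plain-Java sketch. It is only an illustration of the technique, not the actual NeuRecommendation dispatcher: the class name RoundRobinDispatcher, the pick method, and the backend names are hypothetical, and recommenders are represented simply as strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinDispatcher {
    private final List<String> recommenders;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinDispatcher(List<String> recommenders) {
        if (recommenders.isEmpty()) {
            throw new IllegalArgumentException("at least one recommender is required");
        }
        this.recommenders = new ArrayList<>(recommenders);
    }

    /** Returns the recommender that should serve the next request, cycling through the list. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), recommenders.size());
        return recommenders.get(index);
    }

    public static void main(String[] args) {
        // Hypothetical backend names, for illustration only.
        RoundRobinDispatcher dispatcher =
                new RoundRobinDispatcher(Arrays.asList("recommender-1", "recommender-2"));
        for (int request = 0; request < 4; request++) {
            System.out.println("request " + request + " -> " + dispatcher.pick());
        }
    }
}
```

Using an AtomicInteger lets several request threads share one dispatcher without extra locking, at the cost of ignoring how busy each recommender currently is.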
Architecture of NeuRecommendation
Recommender (RPC, Data Store, Mahout)
1. Serve recommendation request
2. Fetch users' preferences