Coping with the Persistent Coldstart Problem
Siarhei Bykau, Georgia Koutrika, Yannis Velegrakis
PersDB, 30.08.2013

Recommender systems use historical data about users and items to predict which items a user is likely to like. A key challenge, known as the cold-start problem, is providing recommendations when historical data is sparse or missing. Current solutions to this problem assume that, given an item and a user, the recommendation process lacks historical data about only one of them, not both. In this paper, we are interested in the more severe and challenging new-user/new-item form of the cold-start problem. In particular, we are interested in cases where a system collects historical data about users and items but produces recommendations mostly for new or anonymous users and about new or evolved items. We present methods that can be used to deal with this problem, study them, and present our experimental findings.



  1. Coping with the Persistent Coldstart Problem
     Siarhei Bykau, Georgia Koutrika, Yannis Velegrakis
     PersDB, 30.08.2013
  2. Recommendation Systems
     ● Amazon (products)
     ● Netflix (movies)
     ● Facebook (friends)
     ● Google (news)
     ● Twitter (who to follow)
     (Siarhei Bykau, U of Trento)
  3. Recommendation Approaches
     ● Content-based filtering (CB)
       – build a user profile and look for similar items
     ● Collaborative filtering (CF)
       – find users with similar tastes
     ● Hybrid
       – combine the previous two
  4. Course Evaluations
     cid    year  area  instructor  trimester  exam     student  rating
     cs343  2011  DB    Fox         1          written  s5       avg
     cs343  2010  DB    Fox         1          written  s6       low
     cs343  2011  DB    Fox         1          written  s7       avg
     cs241  2010  PL    Smith       2          oral     s9       avg
     cs241  2011  PL    Smith       2          oral     s5       low
     cs241  2011  PL    Smith       2          oral     s1       high
     cs241  2010  PL    Smith       2          oral     s2       low
     cs120  2008  OS    Fox         1          oral     s4       low
     cs120  2009  OS    Fox         1          oral     s4       high
     cs400  2010  DB    Newton      3          oral     s20      high
     cs400  2011  DB    Newton      3          oral     s18      high
  5. Course Evaluations
     The table of slide 4 with one more evaluation, whose rating is to be predicted:
     cid    year  area  instructor  trimester  exam     student  rating
     cs241  2012  PL    Smith       2          oral     s19      ?
  6. Course Evaluations
     Same content as slide 5: the table of slide 4 plus the evaluation (cs241, 2012, PL, Smith, 2, oral, s19, ?).
  7. Course Evaluations
     The table of slide 4 with a different evaluation to predict:
     cid    year  area  instructor  trimester  exam     student  rating
     cs421  2012  DB    Fox         3          oral     s19      ?
  8. Cold-Start Problem
     ● Existing users, existing items: collaborative filtering, content-based filtering, hybrid approaches, SVD, ...
     ● New users, existing items: recommend highly-rated items to new users
     ● Existing users, new items: recommend new items to existing users based on the users' historical ratings and the features of the items
     ● New users, new items: "We are here" (the setting addressed in this talk)
  9. Cold-Start: Existing Approaches
     ● Random recommendations
     ● External knowledge
       – social network [Guy et al. 2009]
       – trust network [Jamali et al. 2010]
       – ontologies [Middleton et al. 2002]
     ● Interviews [Rashid et al. 2002]
     ● Pairwise regression [Park et al. 2009]
  10. Similarity-Based Predictions
      ● Similar items have similar ratings
      ● Similarity between two items
      ● Pick only the top-K most similar items
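The slide's formulas did not survive extraction, so the following is only a sketch of the idea under our own assumptions, not the authors' definitions: ratings are mapped to low=1, avg=2, high=3; each course is described by its area, instructor, trimester and exam type; similarity is the Jaccard overlap of those feature values; and the prediction is the similarity-weighted average rating of the K most similar courses (K=3 here). `COURSES`, `similarity` and `predict` are illustrative names.

```python
# Courses from the Course Evaluations table: (area, instructor, trimester, exam)
# plus the ratings each course received. The numeric scale is an assumption.
SCORE = {"low": 1, "avg": 2, "high": 3}
COURSES = {
    "cs343": (("DB", "Fox", 1, "written"), ["avg", "low", "avg"]),
    "cs241": (("PL", "Smith", 2, "oral"), ["avg", "low", "high", "low"]),
    "cs120": (("OS", "Fox", 1, "oral"), ["low", "high"]),
    "cs400": (("DB", "Newton", 3, "oral"), ["high", "high"]),
}


def similarity(a, b):
    """Jaccard similarity of two feature tuples, comparing features position by position."""
    sa, sb = set(enumerate(a)), set(enumerate(b))
    return len(sa & sb) / len(sa | sb)


def predict(new_features, courses, k=3):
    """Similarity-weighted mean rating of the k courses most similar to the new one."""
    scored = []
    for features, ratings in courses.values():
        mean = sum(SCORE[r] for r in ratings) / len(ratings)
        scored.append((similarity(new_features, features), mean))
    top = sorted(scored, key=lambda t: t[0], reverse=True)[:k]
    weight = sum(s for s, _ in top)
    if weight == 0:
        return None  # no course shares any feature: decline to predict
    return sum(s * m for s, m in top) / weight


# cs421 from slide 7: a new DB course taught by Fox, trimester 3, oral exam.
print(predict(("DB", "Fox", 3, "oral"), COURSES))  # about 2.39
```

Returning None when nothing is similar, rather than guessing, means a method like this does not necessarily cover every request.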
  11. Feature-Based Prediction
      ● An item's rating transfers equally to the ratings of its features
      ● Rating of a feature
      ● The prediction is the average of the feature ratings
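A minimal sketch of this scheme on the course table, assuming that "transfers equally" means every feature value of a rated course receives the full rating, that a feature's rating is the mean of the ratings it received, and the same low=1, avg=2, high=3 mapping. Feature values with no history are skipped. `feature_ratings` and `predict` are hypothetical names, not taken from the slides.

```python
from collections import defaultdict

SCORE = {"low": 1, "avg": 2, "high": 3}  # assumed numeric scale for the ordinal ratings
FEATURES = ("area", "instructor", "trimester", "exam")
# (area, instructor, trimester, exam, rating) for each evaluation in the table.
ROWS = [
    ("DB", "Fox", 1, "written", "avg"), ("DB", "Fox", 1, "written", "low"),
    ("DB", "Fox", 1, "written", "avg"), ("PL", "Smith", 2, "oral", "avg"),
    ("PL", "Smith", 2, "oral", "low"), ("PL", "Smith", 2, "oral", "high"),
    ("PL", "Smith", 2, "oral", "low"), ("OS", "Fox", 1, "oral", "low"),
    ("OS", "Fox", 1, "oral", "high"), ("DB", "Newton", 3, "oral", "high"),
    ("DB", "Newton", 3, "oral", "high"),
]


def feature_ratings(rows):
    """Give each rating to every (feature, value) of the rated course, then average per feature."""
    received = defaultdict(list)
    for *values, rating in rows:
        for name, value in zip(FEATURES, values):
            received[(name, value)].append(SCORE[rating])
    return {key: sum(v) / len(v) for key, v in received.items()}


def predict(values, fr):
    """Average of the known feature ratings of a new item; None if no feature has history."""
    known = [fr[(n, v)] for n, v in zip(FEATURES, values) if (n, v) in fr]
    return sum(known) / len(known) if known else None


# cs421: area DB (2.2), instructor Fox (1.8), trimester 3 (3.0), oral exam (2.125).
print(predict(("DB", "Fox", 3, "oral"), feature_ratings(ROWS)))  # about 2.28
```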
  12. Course Evaluations (same table as slide 4)
  13. Course Evaluations (same table as slide 4)
  14. Course Evaluations (same table as slide 4)
  15. Preference Pattern (same table as slide 4)
  16. Preference Pattern (same table as slide 4)
      <<DB,Fox>,avg> pattern frequency is 2/11: 2 of the 11 evaluations (cs343 rated by s5 and by s7, both in 2011) have area DB, instructor Fox and rating avg.
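The frequency on this slide can be reproduced by counting rows. The sketch below enumerates <features, rating> patterns over the area and instructor columns; the threshold of two occurrences for calling a pattern frequent is a hypothetical choice, since the slides do not state one, and `frequent_patterns` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

COLUMNS = ("cid", "year", "area", "instructor", "trimester", "exam", "student", "rating")
IDX = {name: i for i, name in enumerate(COLUMNS)}
# The 11 evaluations of the Course Evaluations table.
ROWS = [
    ("cs343", 2011, "DB", "Fox", 1, "written", "s5", "avg"),
    ("cs343", 2010, "DB", "Fox", 1, "written", "s6", "low"),
    ("cs343", 2011, "DB", "Fox", 1, "written", "s7", "avg"),
    ("cs241", 2010, "PL", "Smith", 2, "oral", "s9", "avg"),
    ("cs241", 2011, "PL", "Smith", 2, "oral", "s5", "low"),
    ("cs241", 2011, "PL", "Smith", 2, "oral", "s1", "high"),
    ("cs241", 2010, "PL", "Smith", 2, "oral", "s2", "low"),
    ("cs120", 2008, "OS", "Fox", 1, "oral", "s4", "low"),
    ("cs120", 2009, "OS", "Fox", 1, "oral", "s4", "high"),
    ("cs400", 2010, "DB", "Newton", 3, "oral", "s20", "high"),
    ("cs400", 2011, "DB", "Newton", 3, "oral", "s18", "high"),
]


def frequent_patterns(rows, feature_cols, min_count=2, max_size=2):
    """Count <features, rating> patterns and keep those seen in at least min_count rows."""
    counts = Counter()
    for row in rows:
        for size in range(1, max_size + 1):
            for cols in combinations(feature_cols, size):
                values = tuple(row[IDX[c]] for c in cols)
                counts[(cols, values, row[IDX["rating"]])] += 1
    return {p: Fraction(c, len(rows)) for p, c in counts.items() if c >= min_count}


patterns = frequent_patterns(ROWS, ("area", "instructor"))
print(patterns[(("area", "instructor"), ("DB", "Fox"), "avg")])  # 2/11
```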
  17. Entropy-Based Prediction
      1. Model features and ratings as variables.
      2. Introduce a joint distribution of features and ratings to model the observations.
      3. Use Generalized Iterative Scaling (GIS) to find the distribution that satisfies the frequent preference patterns.
      4. Use that distribution to predict missing ratings.
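A compact sketch of these four steps, under simplifying assumptions that are ours rather than the authors': only area and instructor are used as features; preference patterns are limited to single-feature patterns <feature, rating> seen at least twice (a hypothetical support threshold); and the prediction is the most probable rating for the new item's features. In this tiny example, mixing nested patterns such as <<DB,Fox>,avg> with <DB,avg> would force some cells to zero probability, which GIS only approaches asymptotically, hence the restriction. The slack feature is the standard device that gives every point the same feature total, as GIS requires. `mine_patterns`, `fit_gis` and `predict` are illustrative names.

```python
import itertools
import math
from collections import Counter

# (area, instructor, rating) of the 11 evaluations in the Course Evaluations table.
ROWS = [
    ("DB", "Fox", "avg"), ("DB", "Fox", "low"), ("DB", "Fox", "avg"),
    ("PL", "Smith", "avg"), ("PL", "Smith", "low"), ("PL", "Smith", "high"),
    ("PL", "Smith", "low"), ("OS", "Fox", "low"), ("OS", "Fox", "high"),
    ("DB", "Newton", "high"), ("DB", "Newton", "high"),
]
RATINGS = ("low", "avg", "high")


def mine_patterns(rows, min_count=2):
    """Single-feature patterns (feature index, value, rating) seen in >= min_count rows."""
    counts = Counter((j, row[j], row[2]) for row in rows for j in (0, 1))
    return sorted(p for p, c in counts.items() if c >= min_count)


def fit_gis(rows, patterns, iters=2000):
    """Max-entropy distribution over area x instructor x rating whose pattern
    expectations match the empirical frequencies, fitted with GIS."""
    areas = sorted({r[0] for r in rows})
    instructors = sorted({r[1] for r in rows})
    space = list(itertools.product(areas, instructors, RATINGS))
    # Indicator features: point x matches pattern (j, v, r) when x[j] == v and x[2] == r.
    feats = {x: [float(x[j] == v and x[2] == r) for j, v, r in patterns] for x in space}
    c = max(sum(f) for f in feats.values())
    for f in feats.values():
        f.append(c - sum(f))  # slack feature: same feature total c at every point
    k = len(patterns) + 1
    empirical = [sum(feats[row][i] for row in rows) / len(rows) for i in range(k)]

    def distribution(lam):
        w = {x: math.exp(sum(li * v for li, v in zip(lam, f))) for x, f in feats.items()}
        z = sum(w.values())
        return {x: wx / z for x, wx in w.items()}

    lam = [0.0] * k
    for _ in range(iters):
        p = distribution(lam)
        model = [sum(p[x] * feats[x][i] for x in space) for i in range(k)]
        # GIS update: shift each weight by (1/c) * log(empirical / model expectation).
        lam = [li + math.log(e / m) / c for li, e, m in zip(lam, empirical, model)]
    return distribution(lam), feats, empirical


def predict(p, area, instructor):
    """Most probable rating for a (possibly unseen) area/instructor combination."""
    return max(RATINGS, key=lambda r: p[(area, instructor, r)])


P, FEATS, EMPIRICAL = fit_gis(ROWS, mine_patterns(ROWS))
print(predict(P, "DB", "Fox"))    # new DB course taught by Fox, as for cs421
print(predict(P, "PL", "Smith"))  # new offering of a PL course by Smith, as for cs241 in 2012
```

Because the fitted distribution covers every area/instructor/rating combination, it can score combinations that never occur together in the training rows, which is what the new-user/new-item setting needs.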
  18. Max Entropy Intuition
  19. Metrics
      ● Predictability
        – Root Mean Square Error (accuracy of individual ratings)
        – Normalized Discounted Cumulative Gain (accuracy of the ordering)
      ● Coverage
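The slides name the metrics but not their exact formulas, so the sketch below uses common textbook forms as an assumption: RMSE over the requests that received a prediction, NDCG with linear gains and a log2(rank + 1) discount, and coverage as the fraction of requested ratings for which the method produced any prediction. The function names and the sample data are hypothetical.

```python
import math


def rmse(pairs):
    """Root mean square error over (predicted, actual) pairs that received a prediction."""
    errors = [(p - a) ** 2 for p, a in pairs if p is not None]
    return math.sqrt(sum(errors) / len(errors)) if errors else None


def coverage(pairs):
    """Fraction of requested (predicted, actual) pairs for which a prediction exists."""
    return sum(p is not None for p, _ in pairs) / len(pairs)


def ndcg(true_relevance_in_ranked_order):
    """NDCG with linear gains: DCG of the method's ranking divided by the ideal DCG."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

    ideal = dcg(sorted(true_relevance_in_ranked_order, reverse=True))
    return dcg(true_relevance_in_ranked_order) / ideal if ideal > 0 else 0.0


# Hypothetical predictions for three requested ratings (None = method could not predict).
pairs = [(2.0, 1), (None, 3), (3.0, 3)]
print(rmse(pairs), coverage(pairs), ndcg([3, 1, 2]))
```

Keeping coverage separate from RMSE matters here: a method that declines to predict hard cases can look accurate while covering few requests.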
  20. Datasets
      ● Stanford Courses
        – from 1997 to 2008
        – 9799 ratings
        – 675 courses
        – 193 instructors
        – features: title, description, department
      ● MovieLens
        – 100K ratings
        – 1000 users
        – 1700 movies
        – 42000 unique features (39 features per movie on average)
  21. Algorithms
      ● Similarity-based
      ● Feature-based
      ● Max entropy
      ● Linear regression [Park et al. 2009]
  22. Accuracy/Coverage for Varying Training Data Size (Stanford)
  23. Accuracy/Coverage for Varying Density of Features (MovieLens)
  24. Conclusions
      ● Addressed the new-user/new-item cold-start problem
      ● Proposed a number of algorithms:
        – Similarity-based
        – Feature-based
        – Max entropy
      ● The experimental evaluation showed that the algorithms are highly effective, with max entropy performing best
