Recommender System:Algorithms & Architecture     xiangliang@hulu.com
Outline    Problem    Data•    Algorithms•    Cold start•    Architecture••                         Recommender           ...
ProblemRecommend items to users to make user, contentpartner, websites happy!
Data• User behaviors data  Page view         All user        Very Large  Behavior          User            Size  Watch vid...
Data• Which data is most  important                     Page view         All user        Very Large                      ...
Data• Data Structure  – User ID  – Item ID  – Behavior Type  – Behavior Content  – Context     • Timestamp     • Location ...
Algorithms                              Recommender                              System Method   Collaborative        Cont...
Neighborhood-based• User-based  – Digg• Item-based  – Amazon, Netflix, YouTube, Hulu, …
User-based• Algorithm  – For user u, find a set of users S(u) have similar    preference as u.  – Recommend popular items ...
User-based CFpui =           ∑        v∈S ( u , K ) ∩ N ( i )                                  wuv rvi            N (u ) ∩...
Item-based• Algorithm  – For user u, get items set N(u) this user like    before.  – Recommend items which are similar to ...
Item-based CFpui =           ∑        j∈S ( i , K ) ∩ N ( u )                                  w ji ruj           N (i ) ∩...
Item-based CFWhy not use w ij =                      ?                     N (i ) ∩ N ( j )                          N (i )
Neighborhood-based• User-based vs. Item-based                    User-based              Item-based  Scalability       Bad...
References• Amazon.com Recommendations item-to-  item Collaborative Filtering.• Empirical Analysis of Predictive Algorithm...
Graph-based• Users’ behaviors on items can be  represented by bi-part graph. A       1   A       1   A       1   A   1 B  ...
Graph-based• Two nodes will have high relevance if  – There are many paths in graph between two    nodes.  – Most of paths...
Graph-based• Advantage  – Heterogeneous data            A   1     • Multiple user behaviors     • Social Network          ...
References• A Graph-based Recommender System for  Digital Library.• Random-walk computation of similarities  between nodes...
Latent Factor Model• Users and items are connect by latent  features.       A                        1                   a...
Latent Factor Model       rui = ∑ puk qik       ˆ                        kScience Fiction   0.5       Science Fiction   0....
Latent Factor Model• How to get p, q? min ∑ (rui − ∑ puk qik ) + λ ( pu       + qi )                         2           2...
Latent Factor Model• How to define rui  – Rating prediction  – Top-N recommendation     • Implicit feedback data: only hav...
Latent Factor Model    1 (Sci-fi)         2 (Crime)         3 (Family)        4 (Horror)                                  ...
Latent Factor Model• Advantage  – High accuracy in rating prediction  – Auto group items  – Scalability is good  – Learnin...
References• http://www.informatik.uni-  trier.de/~ley/db/indices/a-  tree/k/Koren:Yehuda.html
Cold Start• Problems  – User cold start : new users  – Item cold start : new items  – System cold start : new systems
User Cold Start• How to recommend items to new users?  – Non-personalization recommendation     • Most popular items     •...
User Cold Start• Example: Gender and TV shows Data comes from IMDB : http://www.imdb.com/title/tt0412142/ratings
User Cold StartMaleAge : 20-30Theoretical physicistDoctorAmericanIrreligious
How to get user interest quickly• When new user comes, his feedback on  what items can help us better understand  his inte...
Item Cold Start• How to recommend new items to user?  – Do not recommend                       How to recommend news??
Item Cold Start• How to recommend new items to user?  – Using content information                 Machine                 ...
System Cold Start• How to design recommender system when  there is no user?  – Pandora : Music Genome Project  – Jinni : M...
Architecture• Feature-based recommendation framework:      A                      1                  a      B             ...
Architecture    Male   Scientist   Physics
Architecture• Advantage:  – Heterogeneous data  – Reasonable Explanation• Disadvantage:  – Do not support user-based methods
Open Questions• How to weight multiple behaviors?• How to improve diversity, novelty?• How to build feedback loop?
Thanks!
Upcoming SlideShare
Loading in...5
×

Recommender system algorithm and architecture

13,669

Published on

Published in: Technology, Education
3 Comments
47 Likes
Statistics
Notes
No Downloads
Views
Total Views
13,669
On Slideshare
0
From Embeds
0
Number of Embeds
11
Actions
Shares
0
Downloads
742
Comments
3
Likes
47
Embeds 0
No embeds

No notes for slide

Recommender system algorithm and architecture

  1. 1. Recommender System:Algorithms & Architecture xiangliang@hulu.com
  2. 2. Outline Problem Data• Algorithms• Cold start• Architecture•• Recommender System
  3. 3. ProblemRecommend items to users to make user, contentpartner, websites happy!
  4. 4. Data• User behaviors data Page view All user Very Large Behavior User Size Watch video All user Large Favorite Register user Middle Vote Register user Middle Add to playlist Register user Small Facebook like Register user Small Share Register user Small Review Register user Small
  5. 5. Data• Which data is most important Page view All user Very Large Behavior User Size – Main behavior in the Watch video All user Large website Favorite Register user Middle Vote Register user Middle – All user can have such Add to playlist Register user Small behavior Facebook like Register user Small – Cost Share Register user Small Review Register user Small – Reflect user interests on items
  6. 6. Data• Data Structure – User ID – Item ID – Behavior Type – Behavior Content – Context • Timestamp • Location • Mood Sheldon watch Star Trek with his friends at home
  7. 7. Algorithms Recommender System Method Collaborative Content Social …… Filtering Filtering Filtering Latent FactorGraph-based …… ModelNeighborhood …… -based User-based Item-based ……
  8. 8. Neighborhood-based• User-based – Digg• Item-based – Amazon, Netflix, YouTube, Hulu, …
  9. 9. User-based• Algorithm – For user u, find a set of users S(u) have similar preference as u. – Recommend popular items among users in S(u) to user u.
  10. 10. User-based CFpui = ∑ v∈S ( u , K ) ∩ N ( i ) wuv rvi N (u ) ∩ N (v)w uv = N (u ) ∪ N (v)
  11. 11. Item-based• Algorithm – For user u, get items set N(u) this user like before. – Recommend items which are similar to many items in N(u) to user u.
  12. 12. Item-based CFpui = ∑ j∈S ( i , K ) ∩ N ( u ) w ji ruj N (i ) ∩ N ( j )w ij = N (i ) ∪ N ( j )
  13. 13. Item-based CFWhy not use w ij = ? N (i ) ∩ N ( j ) N (i )
  14. 14. Neighborhood-based• User-based vs. Item-based User-based Item-based Scalability Bad when user size is Bad when item size is large large Explanation Bad Good Novelty Bad Good Coverage Bad Good Cold start Bad for new users Bad for new items Performance Need to get many Only need to get users history current user’s history
  15. 15. References• Amazon.com Recommendations item-to- item Collaborative Filtering.• Empirical Analysis of Predictive Algorithms for Collaborative Filtering.
  16. 16. Graph-based• Users’ behaviors on items can be represented by bi-part graph. A 1 A 1 A 1 A 1 B 2 B 2 B 2 B 2 C 3 C 3 C 3 C 3 D 4 D 4 D 4 D 4
  17. 17. Graph-based• Two nodes will have high relevance if – There are many paths in graph between two nodes. – Most of paths between two nodes is short. – Most paths do not go through nodes with high out-degree.
  18. 18. Graph-based• Advantage – Heterogeneous data A 1 • Multiple user behaviors • Social Network B 2 • Context (Time, Location) C 3• Disadvantage D 4 – Statistical-based – High cost for long path
  19. 19. References• A Graph-based Recommender System for Digital Library.• Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation.
  20. 20. Latent Factor Model• Users and items are connect by latent features. A 1 a B 2 b C 3 c D 4
  21. 21. Latent Factor Model rui = ∑ puk qik ˆ kScience Fiction 0.5 Science Fiction 0.9 Universe 0.9 Universe 0.9 Physical 0.8 Physical 0.5 Space Travel 0.8 Space Travel 0.7 Animation 0.3 Animation 0.1 Romance 0.0 Romance 0.0
  22. 22. Latent Factor Model• How to get p, q? min ∑ (rui − ∑ puk qik ) + λ ( pu + qi ) 2 2 2 ( u ,i ) k = α (eui qik − λ puk ) puk += α (eui puk − λ qik )qik +
  23. 23. Latent Factor Model• How to define rui – Rating prediction – Top-N recommendation • Implicit feedback data: only have positive samples and missing values, how to select negative samples?
  24. 24. Latent Factor Model 1 (Sci-fi) 2 (Crime) 3 (Family) 4 (Horror) The Blair WitchThe invisible Man Jaws 101 Dalmatians Project Frankenstein Back to the Meets the Wolf Lethal Weapon Pacific Heights Future Man Godzilla Total Recall Groundhog Day Stir of Echoes Star Wars VI Reservoir Dogs Tarzan Dead CalmThe Terminator Donnie Brasco The Aristocats Phantasm The Jungle Book Alien The Fugitive Sleepy Hollow 2 Alien 2 La shou Shen tan Antz The Faculty
  25. 25. Latent Factor Model• Advantage – High accuracy in rating prediction – Auto group items – Scalability is good – Learning-based• Disadvantage – Incremental updating – Real-time – Explanation
  26. 26. References• http://www.informatik.uni- trier.de/~ley/db/indices/a- tree/k/Koren:Yehuda.html
  27. 27. Cold Start• Problems – User cold start : new users – Item cold start : new items – System cold start : new systems
  28. 28. User Cold Start• How to recommend items to new users? – Non-personalization recommendation • Most popular items • Highly Rated items – Using user register profile (Age, Gender, …)
  29. 29. User Cold Start• Example: Gender and TV shows Data comes from IMDB : http://www.imdb.com/title/tt0412142/ratings
  30. 30. User Cold StartMaleAge : 20-30Theoretical physicistDoctorAmericanIrreligious
  31. 31. How to get user interest quickly• When new user comes, his feedback on what items can help us better understand his interest? – Not very popular – Can represent a group of items – Users who like this item have different preference with users who dislike this item
  32. 32. Item Cold Start• How to recommend new items to user? – Do not recommend How to recommend news??
  33. 33. Item Cold Start• How to recommend new items to user? – Using content information Machine Data Mining Recommendation Learning
  34. 34. System Cold Start• How to design recommender system when there is no user? – Pandora : Music Genome Project – Jinni : Movie Genome Project
  35. 35. Architecture• Feature-based recommendation framework: A 1 a B 2 b C 3 c D 4 User Feature Item
  36. 36. Architecture Male Scientist Physics
  37. 37. Architecture• Advantage: – Heterogeneous data – Reasonable Explanation• Disadvantage: – Do not support user-based methods
  38. 38. Open Questions• How to weight multiple behaviors?• How to improve diversity, novelty?• How to build feedback loop?
  39. 39. Thanks!
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×