5.
Big Data @Rdio
Tracks metadata
Signal processing
Millions of hours of music streamed every month
Clicks
User demography
Social info
Every single interaction
12.
Baseline - Popularity
Recommend tracks based on their popularity
Pros:
• Again, a very simple model
• Easy to implement
• More efficient on Apache Giraph (by exploiting one of its properties)
• Always a good baseline
Cons:
• Not really recommending anything
• No element of discovery
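A popularity baseline amounts to counting plays and returning the most-played tracks the user has not heard yet. The sketch below is a single-machine illustration, not the Giraph implementation from the slide; the `(user_id, track_id)` play-event format and the `popularity_recommend` name are hypothetical.

```python
from collections import Counter


def popularity_recommend(plays, user, n=3):
    """Recommend the n most-played tracks that `user` has not played yet.

    plays: iterable of (user_id, track_id) play events (hypothetical format).
    """
    plays = list(plays)
    # Global popularity: total play count per track across all users.
    counts = Counter(track for _, track in plays)
    heard = {track for u, track in plays if u == user}
    # One ranking for everyone (ties broken by track id); only the
    # tracks this user already heard are filtered out.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return [t for t in ranked if t not in heard][:n]
```

Every user sees the same global ranking minus what they have already played, which is exactly why the model is "not really recommending anything" and offers no discovery.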
18.
Pros
• Models are easy to reason about
• Easily scaled via MapReduce
• Gives decent performance on the test set
Cons
• If the user and item spaces are not stable, things can and will go wrong
• Lacks serendipity
• No guarantee on the number of predictions per user
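The title of this slide did not survive extraction. Assuming it describes an item-based nearest-neighbor model (the later comparison with "simple nearest-neighbor models" suggests so, but that is an assumption), the sketch below shows why such a model maps naturally onto MapReduce: a map step emits item and co-occurring item-pair counts per user history, a reduce step sums them, and cosine similarities follow from the totals. All function names are hypothetical, and everything runs in memory rather than on Hadoop.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def map_history(items):
    # Map step: for one user's history, emit a count for every item
    # and for every pair of items that co-occur in that history.
    items = sorted(set(items))
    for i in items:
        yield ("item", i), 1
    for a, b in combinations(items, 2):
        yield ("pair", (a, b)), 1


def reduce_counts(pairs):
    # Reduce step: sum the emitted counts per key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return totals


def item_similarities(histories):
    # Cosine similarity of binary item vectors: co(a, b) / sqrt(n_a * n_b).
    emitted = (kv for items in histories.values() for kv in map_history(items))
    totals = reduce_counts(emitted)
    sims = defaultdict(dict)
    for (kind, key), co in totals.items():
        if kind == "pair":
            a, b = key
            s = co / sqrt(totals[("item", a)] * totals[("item", b)])
            sims[a][b] = s
            sims[b][a] = s
    return sims


def recommend(histories, user, n=3):
    sims = item_similarities(histories)
    heard = set(histories[user])
    scores = defaultdict(float)
    for i in heard:
        for j, s in sims.get(i, {}).items():
            if j not in heard:
                scores[j] += s
    # Only items that co-occur with the user's history get a score, so the
    # list can be shorter than n, or even empty.
    return sorted(scores, key=lambda j: (-scores[j], j))[:n]
```

With a history like `{"dan": ["t5"]}`, where `t5` never co-occurs with anything, `recommend` returns an empty list. This illustrates the "no guarantee on the number of predictions per user" drawback.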
19.
Latent Factor Models
Approach pioneered during the Netflix Prize competition.
The key idea is to approximate the rating matrix as the product of lower-rank user and item factor matrices.
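One common way to learn such a factorization, R ≈ P Qᵀ, is stochastic gradient descent on the squared error of the observed ratings plus an L2 penalty on the factors. The slides do not say which solver Rdio used, so treat this as an illustrative sketch. The names `factorize` and `predict` and the hyperparameter defaults are hypothetical.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn P (n_users x k) and Q (n_items x k) so that P[u] . Q[i] ~ r_ui.

    ratings: list of (user, item, rating) triples for the observed entries only.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error plus an L2 penalty on both factors.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    # Predicted rating is the dot product of user and item latent vectors.
    return sum(p * q for p, q in zip(P[u], Q[i]))
```

Because the model only ever sees the observed entries, a missing rating is filled in by the low-rank structure. For example, a rank-1 model trained on five entries of a 2 × 3 rank-1 matrix should predict the sixth entry close to its true value.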
23.
Pros
• Tries to learn the underlying concepts
• User/item side information can be incorporated into the learning algorithm (factorization machines)
Cons:
• Doesn't perform as well as simple nearest-neighbor models
• Interpretation of the latent space is hard
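Factorization machines incorporate side information by treating users, items and any extra attributes as entries of one sparse feature vector x. The model is ŷ(x) = w0 + Σ w_j x_j + Σ_{j<l} ⟨v_j, v_l⟩ x_j x_l, and the pairwise term can be computed in linear time as ½ Σ_f [(Σ_j v_jf x_j)² − Σ_j v_jf² x_j²]. The sketch below implements only this prediction rule, not training. The feature indices in the usage (user one-hot, item one-hot, a hypothetical demographic flag) are made up for illustration.

```python
def fm_predict(x, w0, w, V):
    """Second-order factorization machine score for a sparse feature dict.

    x: {feature_index: value}; w0: global bias; w: linear weight per feature;
    V: one k-dimensional latent vector per feature.
    """
    linear = w0 + sum(w[j] * xj for j, xj in x.items())
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        # Linear-time identity for the sum over all feature pairs j < l.
        s = sum(V[j][f] * xj for j, xj in x.items())
        s_sq = sum((V[j][f] * xj) ** 2 for j, xj in x.items())
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise
```

For instance, with hypothetical features 0–1 as user one-hots, 2–3 as item one-hots and 4 as a demographic flag, `fm_predict({0: 1.0, 3: 1.0, 4: 1.0}, w0, w, V)` scores user 0 on item 3 while letting the flag interact with both.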
24.
Bayesian Personalized Ranking
• Constructs a preference order for each user
• Directly optimizes the ranking function
• Takes pairwise preference order into account
• Implemented in a scalable fashion on top of Apache Giraph
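BPR learns from triples (u, i, j) meaning "user u prefers item i, which they interacted with, over item j, which they did not". It maximizes Σ ln σ(x̂_ui − x̂_uj) minus a regularization term. Below is a single-machine SGD sketch with a matrix-factorization scorer. It is not the Giraph implementation from the slide; the names, the uniform sampling of j and the hyperparameters are assumptions.

```python
import math
import random


def score(P, Q, u, i):
    # Matrix-factorization scorer: x_ui = P[u] . Q[i].
    return sum(p * q for p, q in zip(P[u], Q[i]))


def train_bpr(positives, n_items, k=2, lr=0.05, reg=0.01, steps=20000, seed=0):
    """BPR-MF trained by SGD on uniformly sampled triples (u, i, j).

    positives: {user: set of items the user interacted with}; every other
    item is treated as less preferred than each positive item.
    """
    rng = random.Random(seed)
    # A user needs at least one positive and one negative item to form a triple.
    users = [u for u, items in positives.items() if 0 < len(items) < n_items]
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(k)] for u in positives}
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        u = rng.choice(users)
        i = rng.choice(sorted(positives[u]))
        j = rng.randrange(n_items)
        while j in positives[u]:
            j = rng.randrange(n_items)
        x_uij = score(P, Q, u, i) - score(P, Q, u, j)
        # d/dx ln sigmoid(x) = sigmoid(-x); clamp to avoid math.exp overflow.
        g = 1.0 / (1.0 + math.exp(min(x_uij, 50.0)))
        pu, qi, qj = P[u], Q[i], Q[j]
        for f in range(k):
            puf, qif, qjf = pu[f], qi[f], qj[f]
            pu[f] += lr * (g * (qif - qjf) - reg * puf)
            qi[f] += lr * (g * puf - reg * qif)
            qj[f] += lr * (-g * puf - reg * qjf)
    return P, Q


def recommend(P, Q, positives, u, n=3):
    # Rank the items the user has not interacted with by their learned score.
    candidates = [i for i in range(len(Q)) if i not in positives[u]]
    return sorted(candidates, key=lambda i: -score(P, Q, u, i))[:n]
```

Because the loss only compares pairs of items for the same user, the model learns an ordering rather than rating values. On two clusters of users with disjoint tastes, one would expect a user who skipped one item of their own cluster to get that item near the top of `recommend`.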
31.
Current/Future work
• Build an ensemble model to incorporate other models.
• Simplify the A/B testing framework.
• Integrate content-based recommendations.
• Experiment with deep-learning techniques.
• Incorporate information from the web.