Collaborative Filtering at Spotify

From the NYC Machine Learning meetup on Jan 17, 2013: http://www.meetup.com/NYC-Machine-Learning/events/97871782/

Video is available here: http://vimeo.com/57900625


  1. Music recommendations at Spotify Erik Bernhardsson erikbern@spotify.com
  2. Recommendation stuff at Spotify
  3. Collaborative filtering
  4. Collaborative filtering. Idea:
     - If two movies x, y get similar ratings, then they are probably similar
     - If a lot of users all listen to tracks x, y, z, then those tracks are probably similar
  5. Get data
  6. … lots of data
  7. Aggregate data: throw away temporal information and just look at the number of times each user played each track
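The aggregation step above can be sketched in a few lines. This is an illustrative toy, not Spotify's pipeline; the field names and log format are assumptions.

```python
# Hypothetical sketch: collapse a raw play log into per-(user, track) counts,
# discarding the timestamp column entirely.
from collections import Counter

# Assumed log format: (user, track, timestamp) tuples.
play_log = [
    ("alice", "track_1", "2013-01-17T10:00"),
    ("alice", "track_1", "2013-01-17T11:00"),
    ("alice", "track_2", "2013-01-17T12:00"),
    ("bob",   "track_1", "2013-01-17T13:00"),
]

# Keep only how many times each user played each track.
play_counts = Counter((user, track) for user, track, _ in play_log)
```

These counts are exactly the entries of the big user-by-track matrix the next slides talk about.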
  8. OK, so now we have a big matrix
  9. … very big matrix
  10. Supervised collaborative filtering is pretty much matrix completion
  11. Supervised learning: Matrix completion
  12. Supervised: evaluating rec quality
  13. Unsupervised learning:
     - Trying to estimate the density
     - i.e. predict the probability of future events
  14. Try to predict the future given the past
  15. How can we find similar items?
  16. We can calculate a correlation coefficient as an item similarity
     - Use something like Pearson, Jaccard, …
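As a concrete instance of the slide above, here is a minimal Jaccard similarity over binary play data (who listened to what). A sketch only; the track and user names are made up.

```python
# Jaccard similarity between two tracks, treating each track as the set of
# users who played it: |A ∩ B| / |A ∪ B|.
def jaccard(users_a, users_b):
    union = len(users_a | users_b)
    return len(users_a & users_b) / union if union else 0.0

# Toy listener sets (assumption: binary played / not-played data).
listeners = {
    "track_x": {"u1", "u2", "u3"},
    "track_y": {"u2", "u3", "u4"},
    "track_z": {"u5"},
}

sim_xy = jaccard(listeners["track_x"], listeners["track_y"])  # 2 / 4 = 0.5
```

Pearson correlation would instead use the raw play counts rather than set membership.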
  17. Amazon did this for “customers who bought this also bought”
     - US patent 7113917
  18. Parallelization is hard though
  19. Parallelization is hard though
  20. Can speed this up using various LSH tricks
     - Twitter: Dimension Independent Similarity Computation (DISCO)
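To give a flavor of the LSH idea: MinHash is one classic LSH family for Jaccard similarity, where comparing short signatures approximates comparing the full listener sets. Note this is a stand-in illustration; the DISCO paper uses different sampling techniques.

```python
# Illustrative MinHash sketch: estimate Jaccard similarity from fixed-size
# signatures instead of full listener sets.
import random

def make_hashers(n, seed=0):
    # Simulate n random hash functions by salting Python's built-in hash.
    rng = random.Random(seed)
    return [lambda u, s=s: hash((s, u)) for s in (rng.getrandbits(32) for _ in range(n))]

def minhash_signature(user_set, hashers):
    # For each hash function, keep the minimum value over the set's members.
    return tuple(min(h(u) for u in user_set) for h in hashers)

hashers = make_hashers(64)
sig_x = minhash_signature({"u1", "u2", "u3"}, hashers)
sig_y = minhash_signature({"u2", "u3", "u4"}, hashers)

# The fraction of matching signature slots is an unbiased estimate of the
# Jaccard similarity of the underlying sets.
est = sum(a == b for a, b in zip(sig_x, sig_y)) / len(sig_x)
```

The parallelization win: signatures are small and can be bucketed, so you avoid comparing all item pairs.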
  21. Are there other approaches?
  22. Natural Language Processing has a lot of similar problems… matrix factorization is one idea
  23. Matrix factorization
  24. Matrix factorization:
     - Want to get user vectors and item vectors
     - Assume f latent factors (dimensions) for each user/item
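The setup on this slide can be sketched as follows: one length-f vector per user and per item, with predicted affinity as their dot product. The dimensions and initialization range here are arbitrary illustrative choices.

```python
# Minimal sketch of the matrix factorization setup: approximate the big
# user-by-item play matrix as the product of two thin factor matrices.
import random

n_users, n_items, f = 3, 4, 2  # toy sizes; f = number of latent factors
rng = random.Random(42)

# One length-f latent vector per user and per item, small random init.
user_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
item_vecs = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]

def predict(u, i):
    # Predicted user-item affinity is the dot product of the latent vectors.
    return sum(a * b for a, b in zip(user_vecs[u], item_vecs[i]))
```

Training (covered in later slides) then adjusts these vectors so the predictions match the observed data.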
  25. Probabilistic Latent Semantic Analysis (PLSA)
     - Hofmann, 1999
     - Also called PLSI
  26. PLSA, cont. + a bunch of constraints:
  27. PLSA, cont. Optimization problem: maximize the log-likelihood
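The slide images with the actual formulas are not captured in this transcript; for reference, the standard PLSA formulation from Hofmann (1999), which matches the constraints and log-likelihood objective the slides mention, is:

```latex
% Model: each (user, item) co-occurrence is explained by a latent class z.
P(i \mid u) = \sum_{z} P(i \mid z)\, P(z \mid u)

% Constraints: the factors are probability distributions.
\sum_{i} P(i \mid z) = 1, \qquad \sum_{z} P(z \mid u) = 1

% Objective: maximize the log-likelihood of the observed play counts c(u, i).
\mathcal{L} = \sum_{u,\, i} c(u, i) \log P(i \mid u)
```

This is the textbook version, not necessarily the exact notation used in the deck.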
  28. PLSA, cont.
  29. “Collaborative Filtering for Implicit Feedback Datasets”
     - Hu, Koren, Volinsky (2008)
  30. “Collaborative Filtering for Implicit Feedback Datasets”, cont.
  31. “Collaborative Filtering for Implicit Feedback Datasets”, cont.
  32. “Collaborative Filtering for Implicit Feedback Datasets”, cont.
  33. “Collaborative Filtering for Implicit Feedback Datasets”, cont.
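The content of these slides is not captured in the transcript, but the paper's key transformation is easy to state: raw play counts r_ui are split into a binary preference p_ui and a confidence weight c_ui = 1 + α·r_ui, and a weighted least-squares objective is solved by alternating least squares. A sketch of the transformation (α = 40 is the value the paper reports working well, not a Spotify-specific setting):

```python
# Key idea from Hu, Koren, Volinsky (2008): implicit counts are not ratings.
# A play count r becomes a binary preference plus a confidence weight.
alpha = 40.0  # the paper reports alpha = 40 performed well in their experiments

def preference(r):
    # p_ui: did the user interact with the item at all?
    return 1.0 if r > 0 else 0.0

def confidence(r, alpha=alpha):
    # c_ui = 1 + alpha * r_ui: every pair gets some weight, observed
    # pairs get much more, proportional to how often they were played.
    return 1.0 + alpha * r
```

Unobserved pairs thus contribute as weak negatives (preference 0, confidence 1) rather than being ignored.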
  34. Here is another method we use
  35. What happens each iteration:
     - Assign all latent vectors small random values
     - Perform gradient ascent to optimize the log-likelihood
  36. What happens each iteration:
     - Assign all latent vectors small random values
     - Perform gradient ascent to optimize the log-likelihood
  37. What happens each iteration:
     - Assign all latent vectors small random values
     - Perform gradient ascent to optimize the log-likelihood
  38. Calculate the derivative and do gradient ascent:
     - Assign all latent vectors small random values
     - Perform gradient ascent to optimize the log-likelihood
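The loop described on slides 35-38 (random init, then gradient ascent on a log-likelihood) can be sketched for a single observed (user, item) pair. The actual objective used at Spotify is not given in this transcript; here we assume a simple logistic model P(played) = sigmoid(u·v) purely for illustration.

```python
# Hedged sketch of one training pass: small random init, then repeated
# gradient ascent steps on the log-likelihood of a Bernoulli/logistic model.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

rng = random.Random(0)
f = 2  # number of latent factors (toy size)
u = [rng.uniform(-0.01, 0.01) for _ in range(f)]  # one user vector
v = [rng.uniform(-0.01, 0.01) for _ in range(f)]  # one item vector

lr = 0.5     # learning rate (arbitrary illustrative choice)
label = 1.0  # this (user, item) pair was observed as a play

for _ in range(100):
    pred = sigmoid(sum(a * b for a, b in zip(u, v)))
    grad = label - pred  # d(log-likelihood)/d(score) for the Bernoulli model
    # Ascend on both vectors, computing both updates from the old values.
    u_new = [a + lr * grad * b for a, b in zip(u, v)]
    v = [b + lr * grad * a for a, b in zip(u, v)]
    u = u_new

pred = sigmoid(sum(a * b for a, b in zip(u, v)))  # close to 1 after training
```

In a real system this runs over all observed pairs, typically distributed across machines.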
  39. 2D iteration example
  40. Vectors are pretty nice because things are now super fast:
     - User-item score is a dot product
     - Item-item similarity score is a cosine similarity
     - Both cases have trivial complexity in the number of factors f: O(f)
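Both scoring primitives from the slide above fit in a few lines; each costs O(f) per pair. The vectors here are made-up toy values.

```python
# The two scoring primitives once latent vectors exist.
import math

def dot(a, b):
    # User-item score: plain dot product, O(f).
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Item-item similarity: dot product normalized by vector lengths, O(f).
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

user = [0.5, 1.0]
item_a = [1.0, 2.0]
item_b = [2.0, 4.0]   # same direction as item_a -> cosine 1.0
item_c = [-2.0, 1.0]  # orthogonal to item_a -> cosine 0.0

score = dot(user, item_a)  # user-item affinity, here 2.5
```

Note that cosine ignores vector magnitude, which is why it is the natural choice for item-item similarity.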
  41. Example: item similarity as a cosine of vectors
  42. Two dimensional example for tracks
  43. We can rank all tracks by the user’s vector
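Ranking, as on the slide above, is then just sorting the catalogue by dot product with the user's vector. The track names and vectors below are illustrative.

```python
# Rank all tracks for one user by dot product with the user's latent vector.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

user_vec = [0.5, 1.0]
tracks = {
    "track_a": [1.0, 2.0],   # score 2.5
    "track_b": [0.0, -1.0],  # score -1.0
    "track_c": [2.0, 0.0],   # score 1.0
}

ranked = sorted(tracks, key=lambda t: dot(user_vec, tracks[t]), reverse=True)
```

At catalogue scale this brute-force sort is replaced by approximate nearest-neighbor search, but the scoring rule is the same.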
  44. So how do we implement this?
  45. One iteration of a matrix factorization algorithm
     - “Google News personalization: scalable online collaborative filtering”
  46. So now we solved the problem of recommendations right?
  47. Actually what we really want is to apply it to other domains
  48. Radio:
     - Artist radio: find related tracks
     - Optimize ensemble model based on skip/thumbs data
  49. Learning from feedback is actually pretty hard
  50. A/B testing
  51. A/B testing
  52. A/B testing
  53. A/B testing
  54. More applications!!!
  55. Last but not least: we’re hiring!
  56. Thank you
