Collaborative Filtering at Spotify

Music recommendations at Spotify

Erik Bernhardsson

erikbern@spotify.com

Recommendation stuff at Spotify

Collaborative filtering

Idea:
- If two movies x, y get similar ratings then they are probably similar
- If a lot of users all listen to tracks x, y, z, then those tracks are probably similar

Aggregate data

Throw away temporal information and just look at the number of times each user played each track

OK, so now we have a big matrix
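A minimal sketch of this aggregation step, assuming a hypothetical play log of (user, track, timestamp) tuples. The log format, the names and the data are made up for illustration; a real matrix would be sparse rather than dense.

```python
from collections import Counter

import numpy as np

# Hypothetical play log: (user, track, timestamp). Illustrative data only.
play_log = [
    ("alice", "track_x", 1),
    ("alice", "track_x", 2),
    ("alice", "track_y", 3),
    ("bob", "track_y", 4),
    ("bob", "track_z", 5),
    ("bob", "track_z", 6),
    ("bob", "track_z", 7),
]

# Drop the timestamp and count plays per (user, track) pair.
counts = Counter((user, track) for user, track, _ in play_log)

users = sorted({u for u, _, _ in play_log})
tracks = sorted({t for _, t, _ in play_log})
user_index = {u: i for i, u in enumerate(users)}
track_index = {t: j for j, t in enumerate(tracks)}

# Dense users x tracks matrix of play counts.
play_counts = np.zeros((len(users), len(tracks)), dtype=np.int64)
for (user, track), n in counts.items():
    play_counts[user_index[user], track_index[track]] = n
```

Rows are users and columns are tracks, both in sorted order, so `play_counts[0, 0]` is the number of times alice played track_x.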

Supervised collaborative filtering is pretty much matrix completion

Supervised learning: Matrix completion

Supervised: evaluating rec quality

Unsupervised learning

- Trying to estimate the density
- i.e. predict probability of future events

Try to predict the future given the past

We can calculate correlation coefficient as an item similarity

- Use something like Pearson, Jaccard, …
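A small sketch of both measures on a toy play-count matrix. The matrix and the function names are assumptions for illustration: Jaccard is computed on the sets of users who played each track, Pearson on the raw play-count columns.

```python
import numpy as np

# Hypothetical users x tracks play-count matrix (rows: users, columns: tracks).
play_counts = np.array([
    [5, 3, 0],
    [4, 2, 0],
    [0, 1, 6],
    [1, 0, 4],
], dtype=float)


def jaccard(col_a, col_b):
    """Jaccard similarity of the sets of users who played each track."""
    a = col_a > 0
    b = col_b > 0
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def pearson(col_a, col_b):
    """Pearson correlation of two tracks' play-count columns."""
    return float(np.corrcoef(col_a, col_b)[0, 1])


sim_jaccard_xy = jaccard(play_counts[:, 0], play_counts[:, 1])
sim_pearson_xy = pearson(play_counts[:, 0], play_counts[:, 1])
sim_pearson_xz = pearson(play_counts[:, 0], play_counts[:, 2])
```

In this toy data the first two tracks share most of their listeners and correlate positively, while the first and third are played by mostly different users and correlate negatively.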

Amazon did this for “customers who bought this also bought”

- US patent 7113917

Parallelization is hard though

Can speed this up using various LSH tricks

- Twitter: Dimension Independent Similarity Computation (DISCO)
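As one example of such a trick, here is a sketch of MinHash, which estimates the Jaccard similarity of two listener sets from short signatures. This is a swapped-in illustration, not DISCO and not necessarily what any of the systems above use. The hash construction, the number of hashes and the listener ids are all assumptions.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item)."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, n_hashes=512):
    """MinHash signature: the minimum hash of the set under each seed."""
    return [min(_hash(seed, item) for item in items) for seed in range(n_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching positions estimates the Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


# Hypothetical listener id sets for two tracks; true Jaccard is 50 / 150.
listeners_x = set(range(0, 100))
listeners_y = set(range(50, 150))

sig_x = minhash_signature(listeners_x)
sig_y = minhash_signature(listeners_y)
estimate = estimate_jaccard(sig_x, sig_y)
```

A full LSH scheme would go one step further and group signature positions into bands, so that only tracks landing in the same bucket are ever compared.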

Natural Language Processing has a lot of similar problems

…matrix factorization is one idea

Matrix factorization

- Want to get user vectors and item vectors
- Assume f latent factors (dimensions) for each user/item
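To make the shapes concrete, a tiny sketch with random vectors. The sizes and the random initialization are illustrative assumptions; no fitting happens here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_users, n_items, f = 4, 6, 2  # f latent factors; sizes are illustrative

# One f-dimensional vector per user and per item.
user_vectors = rng.normal(scale=0.1, size=(n_users, f))
item_vectors = rng.normal(scale=0.1, size=(n_items, f))

# The product is a users x items matrix of rank at most f.
approx = user_vectors @ item_vectors.T
```

The full matrix has n_users * n_items entries, but the factorization only stores (n_users + n_items) * f numbers.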

Probabilistic Latent Semantic Analysis (PLSA)

- Hofmann, 1999
- Also called PLSI

PLSA, cont.

+ a bunch of constraints:

PLSA, cont.

Optimization problem: maximize log-likelihood
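For reference, one standard way to write the PLSA model, its constraints and the objective. The notation is an assumption, not taken from the slides: z indexes the f latent factors and n_{ui} is the number of times user u played item i. The user vector is then (P(z | u))_z and the item vector is (P(i | z))_z.

```latex
% Asymmetric PLSA, standard formulation (notation assumed)
P(i \mid u) = \sum_{z=1}^{f} P(i \mid z)\, P(z \mid u)

% Constraints: every factor is a probability distribution
P(z \mid u) \ge 0, \quad \sum_{z=1}^{f} P(z \mid u) = 1, \qquad
P(i \mid z) \ge 0, \quad \sum_{i} P(i \mid z) = 1

% Objective: log-likelihood of the observed play counts
\max \; \mathcal{L} = \sum_{u, i} n_{ui} \log P(i \mid u)
```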

“Collaborative Filtering for Implicit Feedback Datasets”

- Hu, Koren, Volinsky (2008)

“Collaborative Filtering for Implicit Feedback Datasets”, cont.
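A dense toy sketch of the alternating least squares idea from that paper: binary preference p_ui = 1 if the user played the item, confidence c_ui = 1 + alpha * r_ui, and each side solved in closed form with the other held fixed. The function name, hyperparameter values and data are assumptions, and a real implementation would exploit sparsity instead of looping over dense rows.

```python
import numpy as np


def implicit_als(play_counts, f=2, alpha=40.0, reg=0.1, n_iters=10, seed=0):
    """Toy implicit-feedback ALS; returns (user_vectors, item_vectors)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = play_counts.shape
    preference = (play_counts > 0).astype(float)  # p_ui
    confidence = 1.0 + alpha * play_counts        # c_ui
    X = rng.normal(scale=0.01, size=(n_users, f))
    Y = rng.normal(scale=0.01, size=(n_items, f))
    ridge = reg * np.eye(f)

    def solve_side(fixed, conf, pref):
        # For each row u solve (F^T C_u F + reg * I) x_u = F^T C_u p_u.
        out = np.empty((conf.shape[0], f))
        for u in range(conf.shape[0]):
            weighted = fixed * conf[u][:, None]   # C_u F
            A = fixed.T @ weighted + ridge
            b = weighted.T @ pref[u]
            out[u] = np.linalg.solve(A, b)
        return out

    for _ in range(n_iters):
        X = solve_side(Y, confidence, preference)      # update users
        Y = solve_side(X, confidence.T, preference.T)  # update items
    return X, Y


# Hypothetical play counts with two groups of users and tracks.
toy_counts = np.array([
    [3, 2, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 5, 2],
    [0, 0, 1, 3],
], dtype=float)

user_vecs, item_vecs = implicit_als(toy_counts, f=2, n_iters=20)
predicted = user_vecs @ item_vecs.T
```

On this toy data the predicted scores for played tracks should come out higher on average than those for unplayed tracks.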

What happens each iteration

- Assign all latent vectors small random values
- Perform gradient ascent to optimize log-likelihood

Calculate derivative and do gradient ascent

- Assign all latent vectors small random values
- Perform gradient ascent to optimize log-likelihood
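The two steps above can be sketched as follows. To keep the example short it assumes a softmax model, P(i | u) proportional to exp(a_u . b_i), rather than constrained PLSA. The model choice, learning rate, step count and data are all assumptions for illustration.

```python
import numpy as np


def softmax_rows(A, B):
    """P(i | u) for every user u (rows) and item i (columns)."""
    logits = A @ B.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def log_likelihood(A, B, counts):
    """Sum over (u, i) of n_ui * log P(i | u)."""
    return float((counts * np.log(softmax_rows(A, B))).sum())


def gradient_step(A, B, counts, lr):
    """One gradient ascent step on the log-likelihood."""
    p = softmax_rows(A, B)
    # dL/ds_ui = n_ui - N_u * P(i | u), where s_ui = a_u . b_i
    residual = counts - counts.sum(axis=1, keepdims=True) * p
    grad_A = residual @ B
    grad_B = residual.T @ A
    return A + lr * grad_A, B + lr * grad_B


# Hypothetical play counts (users x items).
counts = np.array([
    [3, 2, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 5, 2],
    [0, 0, 1, 3],
], dtype=float)

rng = np.random.default_rng(0)
f = 2
# Assign all latent vectors small random values.
A = rng.normal(scale=0.1, size=(counts.shape[0], f))
B = rng.normal(scale=0.1, size=(counts.shape[1], f))

ll_start = log_likelihood(A, B, counts)
for _ in range(200):
    A, B = gradient_step(A, B, counts, lr=0.01)
ll_end = log_likelihood(A, B, counts)
```

Comparing `ll_start` and `ll_end` is a quick sanity check that the updates move in the right direction.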

Vectors are pretty nice because things are now super fast

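- User-item score is a dot product: score(u, i) = a_u · b_i, with a_u the user vector and b_i the item vector

- Item-item similarity score is a cosine similarity: cos(i, j) = (b_i · b_j) / (|b_i| |b_j|)

- Both cases have trivial complexity in the number of factors f: O(f)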
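A short sketch of both scores with made-up two-dimensional vectors. The vectors and function names are illustrative; each call touches every factor once.

```python
import numpy as np

# Hypothetical latent vectors with f = 2 factors.
user_vec = np.array([1.0, 2.0])
item_a = np.array([2.0, 1.0])
item_b = np.array([4.0, 2.0])


def user_item_score(user, item):
    """Dot product of a user vector and an item vector."""
    return float(np.dot(user, item))


def item_item_similarity(a, b):
    """Cosine of the angle between two item vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


score = user_item_score(user_vec, item_a)
similarity = item_item_similarity(item_a, item_b)
```

Here `item_b` is just `item_a` scaled by two, so the cosine similarity is 1 even though the dot products with the user differ.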

Example: item similarity as a cosine of vectors

Two dimensional example for tracks

We can rank all tracks by the user’s vector
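A sketch of that ranking step, assuming the track vectors are stacked into one matrix. The vectors are made up for illustration.

```python
import numpy as np

# Hypothetical track vectors (one row per track) and one user vector.
track_vectors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [0.7, 0.7],
    [-1.0, 0.0],
])
user_vector = np.array([0.9, 0.1])

# One dot product per track, computed as a single matrix-vector product.
scores = track_vectors @ user_vector

# Track indices sorted from highest to lowest score.
ranking = np.argsort(-scores)
```

With many tracks, `np.argpartition` can pick the top k without sorting everything.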

One iteration of a matrix factorization algorithm

“Google News personalization: scalable online collaborative filtering”

So now we solved the problem of recommendations, right?

Actually what we really want is to apply it to other domains

Radio

- Artist radio: find related tracks
- Optimize ensemble model based on skip/thumbs data

Learning from feedback is actually pretty hard

Last but not least: we’re hiring!
