September 27, 2015
Music Recommendations @ Spotify
Vidhya Murali
@vid052
Who Am I?
Vidhya Murali
Areas of Interest: Machine Learning & Big Data
Data Science Engineer @ Spotify
Grad student at the University of Wisconsin-Madison
aka Happy Badger for life!
Spotify’s Big Data
Started in 2006, now available in 58 countries
70+ million active users, 20+ million paid subscribers
30+ million songs in our catalog, ~20K added every day
1.5 billion playlists so far and counting
1 TB of user data logged every day
Hadoop cluster with 1500 nodes
~20,000 Hadoop jobs per day
Music Recommendations at Spotify
Features:
Discover
Discover Weekly
Right Now
Radio
Related Artists
30 million tracks…
What to recommend?
Approaches
• Manual curation by experts
• Editorial tagging
• Metadata (e.g. label-provided data, NLP over news and blogs)
• Audio signals
• Collaborative filtering models
Collaborative Filtering Model
• Finds patterns in users’ past behavior to generate recommendations
• Domain independent
• Scalable
• Accuracy(collaborative model) >= Accuracy(content-based model)
Definition of CF
“Hey, I like tracks P, Q, R, S!”
“Well, I like tracks Q, R, S, T!”
“Then you should check out track P!”
“Nice! Btw try track T!”
Legacy slide from Erik Bernhardsson
The YoLo Problem
• YoLo problem: “You Only Listen Once” to judge recommendations
• Goal: predict whether users will listen to new music (new to the user)
• Challenges:
  • Scale of the catalog (30M songs, with ~20K added every day)
  • Repeated consumption of music is common
  • Music is niche
  • Strong correlation between music consumption and the user’s context
• Input: feedback is implicit, through streaming behavior, collection adds, browse history, search history, etc.
Big Matrix!
Figure: the user-track matrix, Users (m) x Tracks (n), with one row per user (e.g. Vidhya) and one column per track (e.g. “Burn” by Ellie Goulding).
On the order of 70M x 30M!
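A matrix of this size is almost entirely zeros, so in practice it lives in a sparse format. A minimal sketch with SciPy, assuming aggregated (user, track, play count) log tuples; the values below are made up for illustration:

```python
from scipy.sparse import coo_matrix

# Hypothetical aggregated log: (user_index, track_index, total_plays)
streams = [
    (0, 4, 22),   # e.g. Vidhya streamed “Burn” 22 times
    (0, 7, 54),
    (3, 1, 212),
]
users, tracks, plays = zip(*streams)

# Toy shape; the real matrix is on the order of 70M x 30M,
# which is only feasible because just the non-zeros are stored.
m, n = 6, 8
matrix = coo_matrix((plays, (users, tracks)), shape=(m, n)).tocsr()
```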
Latent Factor Models
• Use a “small” representation for each user and item (track): f-dimensional vectors (here, f = 2)
Figure: the user-track matrix (m x n), with a row for user Vidhya and a column for the track “Burn,” is approximated by the product of two thin matrices:
• User vector matrix X: (m x f)
• Track vector matrix Y: (n x f)
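A toy sketch of the low-rank idea with random vectors, using f = 2 as on the slide: each user and track gets an f-dimensional vector, and inner products between them stand in for entries of the full m x n matrix.

```python
import numpy as np

m, n, f = 6, 8, 2             # users, tracks, latent dimensions (f = 2 as above)
rng = np.random.default_rng(42)
X = rng.normal(size=(m, f))   # user vector matrix, m x f
Y = rng.normal(size=(n, f))   # track vector matrix, n x f

approx = X @ Y.T              # m x n approximation of the user-track matrix
score = X[0] @ Y[3]           # predicted affinity of user 0 for track 3
```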
Equation(s) Alert!
Implicit Matrix Factorization
Raw play-count matrix (users x tracks):
  8   0   0   0  22   0   0  54
  0   0  22   0   0  47   0   0
  3   0  76   0   0   0   4  55
  0 212   0   0   0   1   0   0
  0   0  29   0   0  43   0   0
 18   0   0   0   2   0   0  36
• Aggregate all (user, track) streams into a large matrix
• Goal: approximate the binary preference matrix by the inner product of 2 smaller matrices (X for users, Y for tracks), minimizing the weighted RMSE (root mean squared error) with a function of total plays as the weight
• Why?: once learned, the top recommendations for a user are the top inner products between their latent factor vector in X and the track latent factor vectors in Y.
Objective:
\min_{X,Y} \sum_{u,i} c_{ui} \, (p_{ui} - x_u^\top y_i - \beta_u - \beta_i)^2 + \lambda \Big( \sum_u \|x_u\|^2 + \sum_i \|y_i\|^2 \Big)
• \beta_u = bias for user u
• \beta_i = bias for item i
• \lambda = regularization parameter
• p_{ui} = 1 if user u streamed track i, else 0
• c_{ui} = confidence weight, a function of total plays (e.g. c_{ui} = 1 + \alpha \cdot \text{plays}_{ui})
• x_u = user latent factor vector
• y_i = item latent factor vector
Implicit Matrix Factorization
Binarized preference matrix P (p_{ui} = 1 if user u streamed track i, else 0):
1 0 0 0 1 0 0 1
0 0 1 0 0 1 0 0
1 0 1 0 0 0 1 1
0 1 0 0 0 1 0 0
0 0 1 0 0 1 0 0
1 0 0 0 1 0 0 1
Same objective and symbol definitions as above.
Alternating Least Squares
The objective is not convex in X and Y jointly, but with one of them held fixed it becomes an ordinary least-squares problem. So alternate:
• Fix the track vectors Y, solve for the user vectors X
• Fix the user vectors X, solve for the track vectors Y
• Repeat until convergence…
Alternating Least Squares
With Y fixed, each user vector has a closed-form solution (bias terms omitted here; track vectors are symmetric with X fixed):
x_u = (Y^\top Y + Y^\top (C^u - I) Y + \lambda I)^{-1} \, Y^\top C^u p(u)
• Y^\top Y (f x f): the same for all users, so compute it only once per iteration
• Y^\top (C^u - I) Y (f x f): weighted sum of outer products of the item vectors the user streamed
• Y^\top C^u p(u) (f x 1): weighted sum of the item vectors the user streamed
• Key takeaways: requires O(f^2) memory; time complexity is linear in the number of unique items the user streamed
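A minimal single-machine sketch of this update in NumPy. It assumes the common confidence function c_ui = 1 + alpha * plays and omits the bias terms; the parameter values are illustrative, not Spotify's.

```python
import numpy as np
from scipy.sparse import csr_matrix

def solve_users(plays, Y, lam=0.1, alpha=40.0):
    """One ALS half-step: solve every user vector with item factors Y fixed.

    plays: (m x n) scipy.sparse CSR matrix of raw play counts
    Y:     (n x f) item factor matrix
    """
    m = plays.shape[0]
    f = Y.shape[1]
    YtY = Y.T @ Y                              # f x f, shared by all users
    X = np.zeros((m, f))
    for u in range(m):
        row = plays.getrow(u)
        items, counts = row.indices, row.data  # only the tracks this user streamed
        Yu = Y[items]                          # (k x f) streamed-track vectors
        Cu = 1.0 + alpha * counts              # assumed confidence function
        A = YtY + (Yu.T * (Cu - 1.0)) @ Yu + lam * np.eye(f)  # f x f
        b = Yu.T @ Cu                          # f x 1: weighted sum of item vectors
        X[u] = np.linalg.solve(A, b)           # the closed-form solve
    return X

# Alternate until convergence (toy data):
rng = np.random.default_rng(0)
plays = csr_matrix(rng.poisson(0.3, size=(6, 8)).astype(float))
Y = rng.normal(size=(8, 2))
for _ in range(10):
    X = solve_users(plays, Y)                  # fix tracks, solve for users
    Y = solve_users(plays.T.tocsr(), X)        # fix users, solve for tracks
```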
Vectors
• “Compact” representation for users and items (tracks)
Why Vectors?
• Compact musical representation of a user’s taste and a track’s genome
• Vectors encode higher-order dependencies: even if users who listen to Rihanna don’t necessarily listen to Beyoncé, the vectors can still capture the relationship between the two
• Item-item and user-item scores are computed using cosine distance
• Scoring is linear in the number of latent factors, so it is easy to scale up
Recommendations via Dot Product!
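A short sketch of both scoring modes, assuming a NumPy track factor matrix Y and a user vector from X; names and shapes are illustrative.

```python
import numpy as np

def top_n_tracks(user_vec, Y, n=10):
    """User-item: the top-N recommendations are the largest inner products."""
    scores = Y @ user_vec            # one dot product per track
    return np.argsort(-scores)[:n]   # indices of the n highest-scoring tracks

def similar_tracks(track_idx, Y, n=10):
    """Item-item: rank tracks by cosine similarity to a seed track."""
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    sims = Yn @ Yn[track_idx]
    sims[track_idx] = -np.inf        # exclude the seed track itself
    return np.argsort(-sims)[:n]
```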
70 million users x 30 million tracks. How to scale?
Matrix Factorization with MapReduce
Figure: in the map step, all log entries are sharded into a K x L grid of blocks keyed by (u % K, i % L); in the reduce step, user vectors are partitioned by u % K and item vectors by i % L.
• Split the matrix up into K x L blocks.
• Each mapper gets a different block, sums up intermediate terms, then keys by user (or item) to reduce the final user (or item) vector.
Matrix Factorization with MapReduce
Figure: one map task reads tuples (u, i, count) where u % K = x and i % L = y, joins them against two distributed caches (all user vectors where u % K = x, all item vectors where i % L = y), and emits contributions; a reducer produces the new vector.
• Input to each mapper is a list of (user, item, count) tuples
  – user modulo K is the same for all users in the block
  – item modulo L is the same for all items in the block
  – the mapper aggregates intermediate contributions for each user (or item)
  – e.g. with K = 4, mapper #1 gets users 1, 5, 9, 13, …
  – the reducer keys by user (or item), aggregates the mappers’ intermediate sums, and solves the closed form for the final user (or item) vector
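A single-process sketch of the same blocking scheme, with plain Python dicts standing in for Hadoop's shuffle; K, L, and the confidence function are assumptions carried over from the ALS sketch above.

```python
import numpy as np
from collections import defaultdict

K, L, f, lam = 4, 4, 2, 0.1

def map_block(block_tuples, Y, alpha=40.0):
    """One mapper: (u, i, count) tuples from a single (u % K, i % L) block.
    Emits per-user partial sums of the normal equations."""
    partials = defaultdict(lambda: (np.zeros((f, f)), np.zeros(f)))
    for u, i, count in block_tuples:
        c = 1.0 + alpha * count
        A, b = partials[u]
        partials[u] = (A + (c - 1.0) * np.outer(Y[i], Y[i]), b + c * Y[i])
    return partials.items()          # keyed by user for the shuffle

def reduce_user(emitted, YtY):
    """One reducer: sum the mappers' partial sums, solve the closed form."""
    A = YtY + lam * np.eye(f)
    b = np.zeros(f)
    for A_part, b_part in emitted:
        A, b = A + A_part, b + b_part
    return np.linalg.solve(A, b)     # the new user vector
```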
Annoy
70 million users, at least 4 million tracks for recommendations.
Given the user vectors and track vectors, it is still tricky to find recommendations.
Brute-force approach: O(70M x 4M x 40) = O(12 peta-operations)!
Approximate Nearest Neighbors, oh yeah!: Locality-Sensitive Hashing
https://github.com/spotify/annoy
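A usage sketch with Annoy's Python bindings; the index size and f = 40 are toy values matching the operation count on the slide.

```python
from annoy import AnnoyIndex
import random

f = 40                                   # latent factor dimensionality
index = AnnoyIndex(f, 'angular')         # angular distance ~ cosine

for i in range(10_000):                  # toy stand-in for millions of tracks
    index.add_item(i, [random.gauss(0, 1) for _ in range(f)])
index.build(10)                          # 10 trees; more trees, better recall

user_vec = [random.gauss(0, 1) for _ in range(f)]
recs = index.get_nns_by_vector(user_vec, 10)  # top-10 approximate neighbors
```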
Thank You!
You can reach me @
Email: vidhya@spotify.com
Twitter: @vid052
