Music Personalization @ Spotify
Vidhya Murali
@vid052
RecSys 2016
Spotify’s Big Data
‣ Started in 2006, now available in 58 countries
‣ 100+ million active users, 35+ million paid subscribers
‣ 30+ million songs in our catalog, ~20K added every day
‣ 2+ billion playlists
‣ 1 TB of log data every day
‣ Hadoop cluster with ~2500 nodes
30 Million Tracks…
What to recommend?
Personalization @ Spotify
Features:
Discover
Discover Weekly
Fresh Finds
Home
Radio
Release Radar
Approaches
‣ Manual Curation by Experts
‣ Metadata (e.g., Label Provided Data, News, Blogs)
‣ Audio Signals
‣ Collaborative Filtering
‣ Hybrid
Latent Factor Models
“Compact” representation for each user and item (song): f-dimensional vectors (here, f = 2)
[Figure: Users x Songs matrix with m user rows (e.g., Vidhya) and song columns (e.g., Rise), shown alongside the two matrices below]
User Vector Matrix: X: (m x f)
Song Vector Matrix: Y: (n x f)
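A minimal sketch of how an (m x n) Users x Songs matrix could be factored into X (m x f) and Y (n x f). The slides do not say how the vectors are learned, so plain regularized alternating least squares is assumed here as one common choice. The function name `factorize`, the `reg` and `iters` parameters, and the toy matrix are all illustrative.

```python
import numpy as np

def factorize(R, f=2, reg=0.1, iters=20, seed=0):
    """Factor an (m x n) play matrix R into X (m x f) and Y (n x f).

    Assumption: plain regularized ALS, not necessarily what Spotify runs.
    """
    rng = np.random.default_rng(seed)
    m, n = R.shape
    X = rng.normal(scale=0.1, size=(m, f))  # user vectors
    Y = rng.normal(scale=0.1, size=(n, f))  # song vectors
    ridge = reg * np.eye(f)
    for _ in range(iters):
        # Fix Y and solve the ridge regression for every user vector at once.
        X = np.linalg.solve(Y.T @ Y + ridge, Y.T @ R.T).T
        # Fix X and solve for every song vector.
        Y = np.linalg.solve(X.T @ X + ridge, X.T @ R).T
    return X, Y

# Toy data: 3 users, 4 songs, 1 = the user streamed the song.
R = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 1, 1]], dtype=float)
X, Y = factorize(R, f=2)
predicted = X @ Y.T  # approximate affinity of every user for every song
```

Each ALS step is a small f x f linear solve, which is part of why this family of models scales to large catalogs.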
NLP Models on News and Blogs
NLP Models work great on Playlists!
Document : Playlist
Word : Song
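A rough sketch of the "Document : Playlist, Word : Song" analogy. The slides do not name the NLP model, so this example stands in with a song co-occurrence matrix reduced by truncated SVD. The function `song_vectors` and the toy playlists are hypothetical.

```python
import numpy as np

def song_vectors(playlists, f=2):
    """Treat each playlist as a document and each song as a word.

    Assumption: co-occurrence counts + truncated SVD as a stand-in model.
    """
    songs = sorted({s for p in playlists for s in p})
    index = {s: i for i, s in enumerate(songs)}
    C = np.zeros((len(songs), len(songs)))
    for p in playlists:
        ids = [index[s] for s in set(p)]
        # Count every pair of songs that share a playlist (diagonal included).
        C[np.ix_(ids, ids)] += 1
    U, S, _ = np.linalg.svd(C)
    # Keep the top-f directions as the song embedding.
    return songs, U[:, :f] * np.sqrt(S[:f])

playlists = [["a", "b"], ["a", "b"], ["c", "d"], ["c", "d"]]
songs, V = song_vectors(playlists, f=2)
```

Songs that tend to appear in the same playlists end up with similar vectors, just as words that share contexts do in text models.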
Deep Learning on Audio [1]
[1] http://benanne.github.io/2014/08/05/spotify-cnns.html
BlackBoxing Algorithms
Music in Latent Space
Vectors
“Compact” representation of each user's and item's musical fingerprint.
Normalized Song Vectors
User Vector
Why Vectors?
Encodes higher order dependencies
Users and Items in the same latent space
User - Item recommendations
Item - Item similarities
Easy to scale up
Complexity is linear in the number of latent factors
Recommendations
Normalized Song Vectors
User Vector
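One way to read this picture: score each normalized song vector against the user vector and return the closest songs. The sketch below assumes cosine similarity as the score. The function `recommend` and the toy vectors are illustrative only.

```python
import numpy as np

def recommend(user_vec, song_vecs, k=3):
    """Return indices and scores of the k songs closest to the user vector."""
    # Normalize so the dot product is the cosine similarity.
    Y = song_vecs / np.linalg.norm(song_vecs, axis=1, keepdims=True)
    u = user_vec / np.linalg.norm(user_vec)
    scores = Y @ u
    top = np.argsort(-scores)[:k]  # highest similarity first
    return top, scores[top]

song_vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
user_vec = np.array([1.0, 0.1])
top, scores = recommend(user_vec, song_vecs, k=3)
```

The same function gives item-item similarities if a song vector is passed in place of the user vector, since users and items share one latent space.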
Ranking
Similarity score can be used for ranking
Balance relevance, diversity, popularity, freshness
Heuristic based
Multi-armed bandits (MAB)
Interactions
Impressions
Clicks
Streams
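A hedged sketch of how a multi-armed bandit could rank candidates from interaction counts. The slides do not say which bandit algorithm is used, so Thompson sampling over click-through rate is assumed here. The function `thompson_rank` and the counts are hypothetical.

```python
import numpy as np

def thompson_rank(impressions, clicks, seed=0):
    """Rank items by a sampled click-through rate (Thompson sampling).

    Assumption: Beta(1, 1) prior per item; clicks out of impressions.
    """
    rng = np.random.default_rng(seed)
    impressions = np.asarray(impressions)
    clicks = np.asarray(clicks)
    # One draw per item from its Beta posterior over click-through rate.
    samples = rng.beta(1 + clicks, 1 + impressions - clicks)
    return np.argsort(-samples)

order = thompson_rank(impressions=[1000, 1000, 5], clicks=[900, 10, 2])
```

Items with few impressions have wide posteriors, so they are sometimes ranked high and get explored. Well-measured items settle near their observed rate.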
Music Personalization Data Flow
Challenges Unique to Spotify
Scale of catalog
Music is “niche”
Music consumption correlates heavily with the user’s context
Repeated consumption of music is NOT uncommon.
Challenge Accepted!
Cold start problem for both users and new music/upcoming artists:
Content Based Signals
Real Time Recommendations
Measuring Quality:
Implicit: A/B Test Metrics
Explicit: Feedback from social forums
Scam Attacks:
Rule based model to detect scammers
Human choices are not always predictable:
Faith in humanity
What Next?
‣ Personalization!
‣ Content signals such as lyrics, audio, images
‣ Expanded Catalog: Shows, Podcasts
‣ New Markets
We are hiring!
Thank You!
You can reach me @
Email: vidhya@spotify.com
Twitter: @vid052
