Recommendations 101

Building Recommendation Products 101

Esh Kumar
Machine Learning & Data Products @ Spotify NYC
@eshvk

Who am I?
• UT Austin Machine Learning
• Building Recommendation Systems @ Mozilla,
StumbleUpon & Spotify

Products @ Spotify
• Discover … to find new albums
• Discover Weekly … a weekly playlist
• Editorial Playlist Recommendations
• Radio

Products @ StumbleUpon
• Content extraction and recommendation
pipelines.
• Mobile Recommendations.

Products @ Mozilla
• Grouperfish: Generalized Clustering of
large scale text.

Product Personalization
• Understanding People
➡ User Experience, Cultural Variations
• Understanding Content
➡ Genres, Cultural knowledge
• Models
➡ Collaborative Filtering, Content Based
(Diagram labels: ML, Content, User)

• Machine Learning does not trump a bad idea.
• Idea -> Data-Driven Product Development -> ML
(More like design than coding)

•Models
• News, Blogs, NLP

•Models
(http://musicmachinery.com/2014/02/10/gender-specific-listening/)

•Models
(http://musicmachinery.com/2014/02/13/age-specific-listening/)

•Models
• Manually tag attributes
• Curation

•Models
(latimes.com)

•Models
(https://research.google.com/bigpicture/music/)

•Models
(http://www.theverge.com/2012/3/18/2882372/netflix-recommended-genres-list)

•Models
• Manually tag attributes
• Curation
• CF

30 Million Songs…
What To Play?
75 Million People … 1 Person Every 3 Secs…

Recommendation Systems
• Predict user response to options.
• Rich field: Matrix completion, ranking, text models,
latent factor models.
• Several conferences annually: RecSys, NIPS, ICML, etc.
• Industry researchers include NFLX, GOOG, MS and
more…

Similarity
Our problem is to figure out how similar two
items are.
Mathematically, this means modeling a function
Similarity(x,y) for all users and items, if possible.
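
One common choice for such a function is cosine similarity between vector representations of the two items. The sketch below assumes each item is already described by a numeric vector; the function name `cosine_similarity` and the play-count vectors are illustrative and not taken from the talk.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)

# Hypothetical play counts for three tracks, one entry per user (same four users).
track_a = [5, 0, 3, 1]
track_b = [4, 0, 2, 1]
track_c = [0, 6, 0, 0]

print(cosine_similarity(track_a, track_b))  # close to 1: played by the same users
print(cosine_similarity(track_a, track_c))  # 0.0: no listeners in common
```

Cosine ignores vector length, so a heavily played track and a lightly played one can still count as similar when the same users play both.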

Collaborative Filtering
Hey,
I like tracks P, Q, R, S!
Well,
I like tracks Q, R, S, T!
Then you should check out
track P!
Nice! Btw try track T!
Model you based on songs you played…
Predict your future based on similar users…
Millions of users and billions of streams…
…. so there is someone like you out there
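
The conversation above can be sketched as a tiny user-based recommender. This is a minimal illustration, assuming liked tracks are stored as sets and that user similarity is measured by Jaccard overlap; the user names, the third user, and the choice of Jaccard are assumptions for the example, not details from the talk.

```python
def jaccard(a, b):
    """Overlap between two sets of liked tracks, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, likes, k=1):
    """Suggest tracks liked by the k most similar users that `user` has not liked yet."""
    others = [u for u in likes if u != user]
    neighbours = sorted(
        others, key=lambda u: jaccard(likes[user], likes[u]), reverse=True
    )[:k]
    suggestions = set()
    for n in neighbours:
        suggestions |= likes[n] - likes[user]
    return sorted(suggestions)

likes = {
    "first": {"P", "Q", "R", "S"},
    "second": {"Q", "R", "S", "T"},
    "third": {"X", "Y"},  # hypothetical user with unrelated taste
}

print(recommend("second", likes))  # ['P']
print(recommend("first", likes))   # ['T']
```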

Collaborative Filtering
The Netflix Prize.
A million dollars for beating NFLX’s best algorithms by ~10%.

Neighborhood Models
The Amazon approach…
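
Assuming the slide refers to item-to-item neighborhood methods, the idea can be sketched as follows: for each item, rank the other items by how strongly their audiences overlap. The similarity formula (cosine over listener sets), the function name `item_neighbours`, and the data are assumptions made for illustration, not a description of any production system.

```python
import math
from collections import defaultdict

def item_neighbours(likes):
    """Map each item to the other items it shares listeners with, most similar first."""
    listeners = defaultdict(set)
    for user, items in likes.items():
        for item in items:
            listeners[item].add(user)

    neighbours = {}
    for i in listeners:
        scored = []
        for j in listeners:
            if i == j:
                continue
            shared = len(listeners[i] & listeners[j])
            if shared:
                # Cosine similarity between the two binary listener vectors.
                sim = shared / math.sqrt(len(listeners[i]) * len(listeners[j]))
                scored.append((sim, j))
        # Highest similarity first; ties broken alphabetically for determinism.
        neighbours[i] = [j for sim, j in sorted(scored, key=lambda p: (-p[0], p[1]))]
    return neighbours

# Hypothetical listening data.
likes = {
    "u1": {"P", "Q", "R"},
    "u2": {"Q", "R"},
    "u3": {"R", "T"},
}

print(item_neighbours(likes)["Q"])  # ['R', 'P']
```

Because the neighbour lists depend only on items, they can be computed offline and looked up at request time.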

Matrix Completion
Matrix Completion. A matrix expresses a system. We model the
data in the form of a matrix. For example, play counts for all songs
and all users could be:
\[
\text{Users} \left\{
\overbrace{
\begin{pmatrix}
s_{1,1} & s_{1,2} & 14 & \cdots & s_{1,n} \\
s_{2,1} & s_{2,2} & 2 & \cdots & s_{2,n} \\
\vdots & \vdots & \vdots & & \vdots \\
s_{m,1} & s_{m,2} & 1 & \cdots & s_{m,n}
\end{pmatrix}
}^{\text{Song Plays}}
\right.
\approx
\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix}
\begin{pmatrix} t_1 & t_2 & \cdots & t_n \end{pmatrix}
\]
Row: Esh. Column: Call Me Maybe. Entry 1: Esh listened to Call Me Maybe once…

Matrix Completion is well studied …
Start with random vectors around the origin. Run alternating least
squares or gradient descent or stochastic gradient descent… All this
is Hadoopable™.
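
Of the three options named on this slide, stochastic gradient descent is the shortest to sketch. The example below is a minimal illustration in plain Python, assuming the observed plays arrive as (user index, track index, count) triples; the hyperparameters, the L2 penalty, and the toy data are assumptions chosen so the example converges quickly, not values from the talk.

```python
import random

def predict(users, tracks, u, t):
    """Predicted play count: dot product of a user vector and a track vector."""
    return sum(a * b for a, b in zip(users[u], tracks[t]))

def factorize(plays, n_users, n_tracks, dim=2, lr=0.02, reg=0.01, epochs=3000, seed=0):
    """Fit user and track vectors to the observed (user, track, count) triples with SGD."""
    rng = random.Random(seed)
    # Start with small random vectors around the origin.
    users = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_users)]
    tracks = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n_tracks)]
    order = list(plays)
    for _ in range(epochs):
        rng.shuffle(order)
        for u, t, count in order:
            err = count - predict(users, tracks, u, t)
            for f in range(dim):
                uf, tf = users[u][f], tracks[t][f]
                # Gradient step on squared error with an L2 penalty.
                users[u][f] += lr * (err * tf - reg * uf)
                tracks[t][f] += lr * (err * uf - reg * tf)
    return users, tracks

# Hypothetical observed entries of a 3 x 3 play-count matrix; the rest are unknown.
plays = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
users, tracks = factorize(plays, n_users=3, n_tracks=3)

# Fill in a cell that was never observed.
print(predict(users, tracks, 0, 2))
```

Only observed cells contribute to the updates, which is what makes this completion and not a plain factorization of a matrix padded with zeros.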

Hands On Coding…
Please point your browser to:
https://github.com/eshwaran/recs101workshop

Language Models
• Language models work well too. For example,
a playlist could be considered as a document
and you could learn the latent vectors for tracks
(words).
• Then represent a User as a linear combination
of their Tracks.
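
The second bullet can be sketched in a few lines, assuming track vectors have already been learned. Weighting each track by its share of the user's plays is one possible choice of coefficients and is an assumption here, as are the function name `user_vector` and the toy vectors.

```python
def user_vector(track_vectors, play_counts):
    """Play-count-weighted average of the vectors of the tracks a user played."""
    total = sum(play_counts.values())
    if total == 0:
        raise ValueError("user has no plays")
    dim = len(next(iter(track_vectors.values())))
    vec = [0.0] * dim
    for track, count in play_counts.items():
        weight = count / total
        for i, x in enumerate(track_vectors[track]):
            vec[i] += weight * x
    return vec

# Hypothetical 2-dimensional track vectors and one user's play counts.
track_vectors = {"track_a": [1.0, 0.0], "track_b": [0.0, 1.0]}
play_counts = {"track_a": 3, "track_b": 1}

print(user_vector(track_vectors, play_counts))  # [0.75, 0.25]
```

Users and tracks then live in the same vector space, so the tracks nearest to a user's vector are natural candidates to recommend.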

word2vec
Words with similar contexts have similar
meaning

word2vec
Target Word
Context Word

word2vec
Target Words and Corresponding Contexts
          shining  bright  trees  dark  green
stars          61      50     10    30      1
sun            71      60      5     2      0
cucumber        2       1     15     3     40
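
To see how such a table captures "similar contexts, similar meaning", each row can be treated as a crude word vector and compared with cosine similarity. Note that this uses the raw counts from the table directly; word2vec itself learns dense vectors, so this is only an illustration of the underlying intuition.

```python
import math

# Rows of the table: counts for the contexts shining, bright, trees, dark, green.
contexts = {
    "stars": [61, 50, 10, 30, 1],
    "sun": [71, 60, 5, 2, 0],
    "cucumber": [2, 1, 15, 3, 40],
}

def cosine(x, y):
    """Cosine similarity between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def most_similar(word):
    """Other words in the table, ranked by similarity of their context counts."""
    others = [w for w in contexts if w != word]
    return sorted(others, key=lambda w: cosine(contexts[word], contexts[w]), reverse=True)

print(most_similar("stars"))  # ['sun', 'cucumber']
```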

word2vec
(Diagram: a CPU reads Playlists, then gets and updates Vectors)

The Record Store…
The List Maker …
How do you scale this?

Tools of the trade
• Build models in Python.
• Jobs in Scalding + Luigi ( https://github.com/spotify/luigi )
• Storm for real time.
• In house RPC for serving requests.

General Tips
• Analyze, prototype and then build.
• Simpler algorithms are easier to test than complex ones.
• Data Science is more art than science. Employ the laugh test when
evaluating your results.

Join the band!
• Machine Learning, Data & Backend Gigs.
• Now touring in New York, Boston & Stockholm!
