April 14, 2014
Music discovery at Spotify
Music + ML = ❤
I’m Erik Bernhardsson
Engineering Manager at Spotify in NYC
@fulhack
The “Prism” team
Chris Johnson
Andy Sloane
Sam Rozenberg
Ahmad Qamar
Romain Yon
Gandalf Hernandez
Neville Li
Rodrigo Araya
Edward Newett
Emily Samuels
Vidhya Murali
Rohan Agrawal
40 million tracks
... but where to start?
Discover page
Radio
How do you scale this?
How do we structure music understanding?
How do you teach music to machines?
Editorial tagging
Audio analysis
Metadata
Natural language processing
Collaborative filtering
Collaborative filtering
Find patterns in usage data
With millions of users and billions of streams, lots of patterns
Hey, I like tracks P, Q, R, S!
Well, I like tracks Q, R, S, T!
Then you should check out track P!
Nice! Btw try track T!
Some real data points
36.5% of playlists containing Notorious BIG also contain 2Pac
(6.4% of playlists containing Notorious BIG also contain Justin Bieber)
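Figures like the two above are conditional co-occurrence rates. A minimal sketch of how such a rate could be computed, assuming each playlist is available as a set of artist names. The toy playlists and the function name `cooccurrence_rate` are invented for illustration and are not Spotify data or code:

```python
def cooccurrence_rate(playlists, given, other):
    """Fraction of playlists containing `given` that also contain `other`."""
    with_given = [p for p in playlists if given in p]
    if not with_given:
        return 0.0
    return sum(1 for p in with_given if other in p) / len(with_given)


# Toy data, invented for illustration only.
playlists = [
    {"artist_a", "artist_b"},
    {"artist_a", "artist_b", "artist_c"},
    {"artist_a", "artist_c"},
    {"artist_b"},
]

rate = cooccurrence_rate(playlists, "artist_a", "artist_b")
print(f"{rate:.1%} of playlists containing artist_a also contain artist_b")
```

Note that the rate is not symmetric: swapping `given` and `other` changes the denominator.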
Main problem: how similar are two items?
If you understand that well, you can do most other things.
So our main problem: how do you model a function similarity(x, y)?
For item similarity it’s also much easier to acquire good test set data, unlike personal recommendations. It’s hard to evaluate personal recommendations – most offline metrics like precision are irrelevant.
“Essentially, all models are wrong, but some are useful.”
– George Box
We can’t perfectly model how users choose music. But modeling is a craft, not a science, and we can use common sense when building models.
For play counts, is a Poisson or a Normal distribution better?
Always check your assumptions. E.g. SVD minimizes squared loss, which assumes the underlying data is Gaussian. Is it?
OK so how do we do it?
There are a lot of interesting unsupervised language models that work really well for us. Docs = playlists/users, words = tracks/artists/albums. You could also call it implicit collaborative filtering because we have no ratings whatsoever.
Main approach: matrix factorization (or latent factor methods), historically with bag-of-words on play counts (but today sequence is also important)
Or more generally:
$$P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m1} & p_{m2} & \dots & p_{mn} \end{pmatrix}$$
The idea with matrix factorization is to represent this probability distribution like this:
$$p_{ui} = a_u^T b_i$$
$$M' = A^T B$$
[Figure: the full matrix is approximately equal to the product of two thin matrices whose shared inner dimension is f]
Step 1: Put everything into a big sparse matrix
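[Figure: a mostly empty matrix with a single visible entry, 7]
a very big matrix too:
$$M = \begin{pmatrix} c_{11} & c_{12} & \dots & c_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{m1} & c_{m2} & \dots & c_{mn} \end{pmatrix}$$
with $10^7$ users (rows) and $10^7$ items (columns).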
Matrix example
Roughly 25 billion nonzero entries
Total size is roughly 25 billion * 12 bytes = 300 GB (“medium data”)
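The size estimate above is plain arithmetic. A minimal sketch that reproduces it, assuming (the slide only gives the 12-byte figure) that each nonzero entry is stored as a user id, an item id and a play count of 4 bytes each, and that 1 GB means 10^9 bytes:

```python
import struct

# Assumption: one nonzero entry = (user id, item id, play count),
# each a 4-byte unsigned integer. The slide only states "12 bytes".
ENTRY_FORMAT = "<III"
bytes_per_entry = struct.calcsize(ENTRY_FORMAT)

nonzero_entries = 25 * 10**9
total_bytes = nonzero_entries * bytes_per_entry
total_gb = total_bytes / 10**9  # decimal gigabytes

# One packed entry, e.g. user 42 played item 7 once.
packed = struct.pack(ENTRY_FORMAT, 42, 7, 1)

print(bytes_per_entry, "bytes per entry,", total_gb, "GB in total")
```

Storing only the nonzero triplets is what makes the matrix manageable: a dense 10^7 by 10^7 matrix would have 10^14 cells.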
Erik
Never gonna give you up
Erik listened to Never gonna give you up 1 time
For instance, for PLSA
Probabilistic Latent Semantic Indexing (Hofmann, 1999)
Invented as a method intended for text classification
$$P = \begin{pmatrix} \cdot & \cdot & \cdots & \cdot \\ \vdots & \vdots & \ddots & \vdots \\ \cdot & \cdot & \cdots & \cdot \end{pmatrix} \approx \underbrace{\begin{pmatrix} \cdot & \cdot \\ \vdots & \vdots \\ \cdot & \cdot \end{pmatrix}}_{\text{user vectors}} \underbrace{\begin{pmatrix} \cdot & \cdots & \cdot \\ \cdot & \cdots & \cdot \end{pmatrix}}_{\text{item vectors}}$$
PLSA
$$\underbrace{\begin{pmatrix} \cdot & \cdot & \cdots & \cdot \\ \vdots & \vdots & \ddots & \vdots \\ \cdot & \cdot & \cdots & \cdot \end{pmatrix}}_{P(u,i) = \sum_z P(u|z) P(i,z)} \approx \underbrace{\begin{pmatrix} \cdot & \cdot \\ \vdots & \vdots \\ \cdot & \cdot \end{pmatrix}}_{P(u|z)} \underbrace{\begin{pmatrix} \cdot & \cdots & \cdot \\ \cdot & \cdots & \cdot \end{pmatrix}}_{P(i,z)}$$
Run this for n iterations
Start with random vectors around the origin.
Then run alternating least squares, gradient descent, or something like that.
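A minimal sketch of that recipe, using plain stochastic gradient descent on squared error over the observed entries only. This is a simplification chosen for brevity, not the PLSA updates or the production training code; the toy counts, the function names and the hyperparameters are all assumptions made for illustration:

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_error(counts, A, B):
    """Sum of squared errors over the observed (user, item) entries."""
    return sum((c - dot(A[u], B[i])) ** 2 for (u, i), c in counts.items())


def factorize(counts, num_users, num_items, f=2, n_iterations=500,
              learning_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Start with small random vectors around the origin.
    A = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(num_users)]
    B = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(num_items)]
    for _ in range(n_iterations):
        for (u, i), c in counts.items():
            err = c - dot(A[u], B[i])  # residual for p_ui = a_u^T b_i
            for k in range(f):
                a_uk, b_ik = A[u][k], B[i][k]
                A[u][k] += learning_rate * err * b_ik
                B[i][k] += learning_rate * err * a_uk
    return A, B


# Toy play counts keyed by (user, item), invented for illustration.
counts = {(0, 0): 3, (0, 1): 1, (1, 1): 2, (1, 2): 4, (2, 0): 1, (2, 2): 5}

A0, B0 = factorize(counts, 3, 3, n_iterations=0)  # untrained baseline
A, B = factorize(counts, 3, 3)
error_before = squared_error(counts, A0, B0)
error_after = squared_error(counts, A, B)
print(error_before, "->", error_after)
```

Alternating least squares would instead hold the item vectors fixed while solving for the user vectors in closed form, then swap roles.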
Why are latent factor models nice?
They find vectors which are super small fingerprints of the musical style or the user’s taste
Usually something like 40-1000 elements
Track X: 0.87 1.17 -0.26 0.56 2.21 0.77 -0.03
[Figure: track x's vector drawn against the axes latent factor 1 and latent factor 2]
Why are latent factor models nice? (part 2)
- Fast (linear in input size)
- Do not have a big problem with overfitting
- Have a solid underlying model (i.e. not just a bunch of heuristics)
- Easy to scale (at least compared to other models)
- Give a compact representation of items
Similarity now becomes schoolbook trigonometry
[Figure: track x and track y drawn as vectors against the axes latent factor 1 and latent factor 2; cos(x, y) = HIGH]
IPMF item item:
$$P(i \to j) = \exp(b_j^T b_i)/Z_i = \frac{\exp(b_j^T b_i)}{\sum_k \exp(b_k^T b_i)}$$
VECTORS:
$$p_{ui} = a_u^T b_i$$
$$\mathrm{sim}_{ij} = \cos(b_i, b_j) = \frac{b_i^T b_j}{|b_i||b_j|}$$
O(f)
i                      | j                      | sim_{i,j}
2pac                   | 2pac                   | 1.0
2pac                   | Notorious B.I.G.       | 0.91
2pac                   | Dr. Dre                | 0.87
2pac                   | Florence + the Machine | 0.26
Florence + the Machine | Lana Del Rey           | 0.81
IPMF item item MDS:
$$P(i \to j) = \exp(-|b_j - b_i|^2)/Z_i = \frac{\exp(-|b_j - b_i|^2)}{\sum_k \exp(-|b_k - b_i|^2)}$$
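Both item-to-item models are a softmax over all items, differing only in the score. A minimal sketch, assuming the dot-product score b_j^T b_i for the first model and the negative squared distance -|b_j - b_i|^2 for the MDS variant. The function names and the toy vectors are made up for illustration:

```python
import math


def softmax_from_scores(scores):
    """Turn a dict of raw scores into probabilities that sum to 1."""
    z = sum(math.exp(s) for s in scores.values())  # the normalizer Z_i
    return {j: math.exp(s) / z for j, s in scores.items()}


def p_dot(item_vectors, i):
    """P(i -> j) with score b_j^T b_i."""
    b_i = item_vectors[i]
    return softmax_from_scores({
        j: sum(x * y for x, y in zip(b_j, b_i))
        for j, b_j in item_vectors.items()
    })


def p_mds(item_vectors, i):
    """P(i -> j) with score -|b_j - b_i|^2 (assumed distance-based form)."""
    b_i = item_vectors[i]
    return softmax_from_scores({
        j: -sum((x - y) ** 2 for x, y in zip(b_j, b_i))
        for j, b_j in item_vectors.items()
    })


# Toy item vectors, invented for illustration.
item_vectors = {"x": [1.0, 0.0], "y": [0.9, 0.1], "z": [-1.0, 0.5]}

from_x_dot = p_dot(item_vectors, "x")
from_x_mds = p_mds(item_vectors, "x")
print(from_x_dot)
print(from_x_mds)
```

As in the sum over k, the normalizer here runs over every item including i itself. Computing it costs one pass over all items, which is the expensive part at catalogue scale.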
Why does cosine make sense?
Intuitively it makes sense, because we’re factoring out popularity and introducing a distance metric.
In fact, the best result seems to be: train a latent factor model as usual, but normalize all vectors as a post-processing step.
Even for models without any geometric interpretation (like LDA), cosine works
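A minimal sketch of the normalization step and the resulting cosine similarity, assuming nonzero vectors. The vector names and values are made up; the point is that two vectors with the same direction but very different lengths still get similarity 1:

```python
import math


def normalize(v):
    """Scale a nonzero vector to unit length (the post-processing step)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(x, y):
    """Cosine similarity: dot product of the unit-length vectors."""
    return sum(a * b for a, b in zip(normalize(x), normalize(y)))


# Toy vectors, invented for illustration.
niche = [0.2, 0.1]
popular = [4.0, 2.0]   # same direction as `niche`, much larger norm
other = [-0.1, 0.2]    # orthogonal to `niche`

same_direction = cosine(niche, popular)
orthogonal = cosine(niche, other)
print(same_direction, orthogonal)
```

Once all vectors are normalized ahead of time, each similarity lookup is a single dot product over f elements.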
It’s still tricky to search for similar tracks though
Locality Sensitive Hashing:
Cut the space recursively by random planes.
If two points are close, they are more likely to end up on the same side of each plane.
https://github.com/spotify/annoy
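A minimal sketch of the random-plane idea in pure Python. This is a flat simplification that hashes each vector against a fixed set of planes; it is not how the Annoy library is implemented and does not use its API. All names and the toy vectors are assumptions made for illustration:

```python
import random


def random_planes(f, num_planes, seed=0):
    """Draw `num_planes` random hyperplanes through the origin in f dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(f)] for _ in range(num_planes)]


def signature(vector, planes):
    """One bit per plane: which side of the plane the vector falls on."""
    return tuple(
        sum(p * x for p, x in zip(plane, vector)) >= 0 for plane in planes
    )


def build_buckets(vectors, planes):
    """Group item names by signature; a bucket is a set of candidate neighbours."""
    buckets = {}
    for name, vector in vectors.items():
        buckets.setdefault(signature(vector, planes), []).append(name)
    return buckets


planes = random_planes(f=3, num_planes=4)
vectors = {
    "track_x": [1.0, 0.5, -0.2],
    "track_x_scaled": [2.0, 1.0, -0.4],   # same direction as track_x
    "track_opposite": [-1.0, -0.5, 0.2],  # opposite direction
}
buckets = build_buckets(vectors, planes)
print(buckets)
```

To find candidates for a query, compute its signature and only compare against items in the same bucket. Nearby vectors can still be split by an unlucky plane, which is why practical systems repeat the process with several independent sets of planes.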
…So what models have we experimented with?
Old school models
- Latent Semantic Analysis (LSA)
- Probabilistic Latent Semantic Analysis (PLSA)
- Latent Dirichlet Allocation (LDA)
Bag of words models
Need a lot of topics, and usually not very great for music recs
What about scalability of models?
When we started experimenting with latent factor models, PLSA needed at least 400 factors (topics) to give decent results.
That gives at least 10 billion parameters, or way more than you could conveniently fit in RAM.
So what to do? We turned to Hadoop.
One iteration, one map/reduce job
“Google News Personalization: Scalable Online Collaborative Filtering”
[Figure: map/reduce data flow. Map step: all log entries are split into a K × L grid of blocks keyed by (u % K, i % L), from (0, 0) to (K-1, L-1). Each block is combined with the user vectors of its u % K shard and the item vectors of its i % L shard; for example, the block with u % K = 1 and i % L = 1 uses the user vectors with u % K = 1 and the item vectors with i % L = 1. Reduce step: results are grouped by u % K = 0, 1, ..., K-1.]
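The grid partitioning in the diagram can be sketched as follows. This only illustrates the keying by (u % K, i % L); it is not the actual Hadoop job, and the function names and the toy log entries are assumptions:

```python
from collections import defaultdict


def shard_log_entries(log_entries, K, L):
    """Assign each (user, item, count) entry to the block (u % K, i % L)."""
    blocks = defaultdict(list)
    for u, i, count in log_entries:
        blocks[(u % K, i % L)].append((u, i, count))
    return blocks


def vectors_for_block(block_key, user_ids, item_ids, K, L):
    """Ids of the user and item vectors a given block needs to load."""
    k, l = block_key
    users = [u for u in user_ids if u % K == k]
    items = [i for i in item_ids if i % L == l]
    return users, items


# Toy log entries (user id, item id, play count), invented for illustration.
log_entries = [(0, 0, 1), (1, 1, 2), (2, 3, 1), (3, 1, 5)]
K, L = 2, 2

blocks = shard_log_entries(log_entries, K, L)
users_11, items_11 = vectors_for_block((1, 1), range(4), range(4), K, L)
print(dict(blocks))
print(users_11, items_11)
```

Each block only needs roughly 1/K of the user vectors and 1/L of the item vectors, so no single worker has to hold the full model.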
Other MF models
- Collaborative Filtering for Implicit Feedback Datasets (“Koren”)
- “vector_exp”: our own: every stream is a softmax over all tracks
Need a much more compact representation of items, typically only say 40 elements.
Benefit a lot from handling the zero case separately
New trendy models
- Recurrent neural networks (RNN)
- word2vec
Take into account sequence of events
Future: Take into account the time – maybe hidden Markov models, etc?
Power of combining models
All models have their own objective and their own biases. Combining them (with Gradient Boosting Decision Trees) yields kickass results:
Album cover based models
Just a fun experiment that shows that any signal (weak learner) adds value to the ensemble. Turns out it probably just works as a classifier for minimal techno. We will most likely never put this in production :)
What happened with Hadoop?
Most newer models don’t need a ton of latent factors, so all parameters fit nicely in RAM.
Additionally, you can do more complex things on a single machine. Lately we’ve started focusing on a combination of non-scalable models (more complex, less data) and scalable models (simple, but with more data).
Hadoop makes things “scale”, but at a ridiculous constant I/O overhead. We are in the process of moving our models to Spark instead.
Orders of magnitude numbers
Model                | Data points | Parameters | Time to train
Single-machine model | 1B          | 100M       | 10h
Hadoop model         | 100B        | 10B        | 10h
Spark??              | 100B        | 10B        | 1h
What are we optimizing for?
... a story of surrogate loss functions
We want to optimize Spotify’s “success”
Long term business value or something similar.
Problem: You only get one shot!
Let’s run A/B tests
Typically: DAU (daily active users), Day 2 retention, etc
Super inefficient way of collecting roughly 1 bit of information!
So let’s do offline testing
Editorial judgement
“Look at the results”
The “Daft Punk Test”
… why does collaborative filtering always fail?
LDA                    | RNN            | Koren                  | PLSA            | vector_exp
Daft Punk              | Daft Punk      | Daft Punk              | Daft Punk       | Daft Punk
Daft Punk - Stardust   | Rizzle Kicks   | Coldplay               | The PURSUIT     | Gorillaz
Raccoon                | Daft Funk      | Gotye                  | Junior Senior   | deadmau5
Dave Droid             | La Roux        | Lana Del Rey           | Chuckie & LMFAO | Macklemore & Ryan Lewis
The Local Abilities    | Rudimental     | Of Monsters And Men    | Beatbullyz      | M83
Daft Funk              | Pacjam         | The Lumineers          | Pursuit         | Gotye
M83 VS Big Black Delta | Su Bailey      | Green Day              | La Roux         | The xx
Leandro Dutra          | Capital Cities | John Mayer             | Fatboy Slim     | Calvin Harris
Huw Costin             | YYZ            | Foster The People      | Chase & Drive   | Kavinsky
Jesús Alonso           | Various, WMGA  | Florence + The Machine | Knivez Out      | Coldplay
Wait maybe machines can evaluate things?
Sure! We just need a ground truth data set
Use things like thumbs, skips, editorial data sets
Note that thumbs etc. have observation bias
Doesn’t have to be high volume, a few thousand data points is enough
We can also optimize for this using e.g. GBDT
Again, GBDTs are pretty cool:
Ensemble workflow
[Figure: workflow diagram with the boxes Model 1, Model 2, Model 3, ..., Model n; Thumbs; Editorial data sets; Gradient boosted decision tree; Cross validate ensemble model; Combined model; Offline metrics; Production]
This One Weird Trick Sort of Fixes Observation Bias
Augment the data set with lots of random negative data. Works well in practice.
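A minimal sketch of that augmentation, assuming the ground truth is a list of (query item, recommended item) pairs that received a thumbs up. The function name, the pair format and the toy data are assumptions made for illustration:

```python
import random


def augment_with_random_negatives(positives, all_items, num_negatives, seed=0):
    """Label observed pairs 1 and add randomly drawn pairs labelled 0."""
    rng = random.Random(seed)
    labeled = [(query, item, 1) for query, item in positives]
    for _ in range(num_negatives):
        # Reuse a query from the observed data, pair it with a random item.
        query, _ = rng.choice(positives)
        labeled.append((query, rng.choice(all_items), 0))
    return labeled


# Toy data, invented for illustration.
positives = [("track_a", "track_b"), ("track_a", "track_c"), ("track_d", "track_e")]
all_items = ["track_" + letter for letter in "abcdefghij"]

training_set = augment_with_random_negatives(positives, all_items, num_negatives=9)
print(len(training_set), "examples,",
      sum(1 for _, _, label in training_set if label == 0), "random negatives")
```

A random draw can occasionally coincide with a genuinely good pair. With a large catalogue that should be rare, so this sketch does not filter such cases out.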
[Figure: scatter plot over parameter 1 and parameter 2, with positive (+) and negative (-) data points from earlier batches and a marker for the current best estimate]
What have we learned so far?
- Figuring out what to optimize for is hard
- Combining lots of models really helps
- Large scale algorithms are great, but not everything has to scale
So what are we working on now?
Combine even more signals
- Content-based methods: use audio, lyrics, images
- Read about music and understand it
- Personalize everything
- Just acquired Echo Nest in Boston!

GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
GraphSummit Singapore | Enhancing Changi Airport Group's Passenger Experience...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Data structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdfData structures and Algorithms in Python.pdf
Data structures and Algorithms in Python.pdf
 
How to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptxHow to Get CNIC Information System with Paksim Ga.pptx
How to Get CNIC Information System with Paksim Ga.pptx
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 

Music recommendations @ MLConf 2014

  • 1. April 14, 2014 Music discovery at Spotify Music + ML = ❤
  • 2. I’m Erik Bernhardsson, Engineering Manager at Spotify in NYC (@fulhack)
  • 3. The “Prism” team: Chris Johnson, Andy Sloane, Sam Rozenberg, Ahmad Qamar, Romain Yon, Gandalf Hernandez, Neville Li, Rodrigo Araya, Edward Newett, Emily Samuels, Vidhya Murali, Rohan Agrawal
  • 4. 40 million tracks ... but where to start?
  • 7. How do you scale this?
  • 8. How do we structure music understanding? How do you teach music to machines? Editorial tagging, audio analysis, metadata, natural language processing, collaborative filtering.
  • 9. Collaborative filtering: find patterns in usage data. With millions of users and billions of streams, there are lots of patterns. “Hey, I like tracks P, Q, R, S!” “Well, I like tracks Q, R, S, T!” “Then you should check out track P!” “Nice! Btw try track T!”
  • 10. Some real data points: 36.5% of playlists containing Notorious BIG also contain 2Pac (6.4% of playlists containing Notorious BIG also contain Justin Bieber).
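
Figures like the ones on slide 10 are conditional co-occurrence rates. A minimal sketch of how such a rate could be computed, assuming playlists are available as sets of artist names; the function name `cooccurrence_rate` and the toy playlists are hypothetical and are not Spotify data:

```python
def cooccurrence_rate(playlists, anchor, other):
    """Fraction of the playlists containing `anchor` that also contain `other`."""
    with_anchor = [p for p in playlists if anchor in p]
    if not with_anchor:
        return 0.0
    return sum(1 for p in with_anchor if other in p) / len(with_anchor)


# Toy data, invented for illustration only.
toy_playlists = [
    {"Notorious BIG", "2Pac", "Dr. Dre"},
    {"Notorious BIG", "2Pac"},
    {"Notorious BIG", "Justin Bieber"},
    {"Notorious BIG", "Nas"},
    {"2Pac", "Dr. Dre"},
]

rate_2pac = cooccurrence_rate(toy_playlists, "Notorious BIG", "2Pac")
rate_bieber = cooccurrence_rate(toy_playlists, "Notorious BIG", "Justin Bieber")
```

Note that the rate is asymmetric: the share of Notorious BIG playlists that contain 2Pac is generally not the share of 2Pac playlists that contain Notorious BIG.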
  • 11. Main problem: how similar are two items? If you understand that well, you can do most other things. So our main problem: how do you model a function similarity(x, y)? For item similarity it’s also much easier to acquire good test set data, unlike personal recommendations. It’s hard to evaluate personal recommendations – most offline metrics like precision are irrelevant.
  • 12. “Essentially, all models are wrong, but some are useful.” – George Box. We can’t perfectly model how users choose music. But modeling is a craft, not a science, and we can use common sense when building models. For play count, is a Poisson or a Normal distribution better? Always check your assumptions. E.g. SVD minimizes squared loss, which assumes the underlying data is Gaussian. Is it?
  • 13. OK so how do we do it? There are a lot of interesting unsupervised language models that work really well for us. Docs = playlists/users, words = tracks/artists/albums. You could also call it implicit collaborative filtering because we have no ratings whatsoever. Main approach: matrix factorization (or latent factor methods), historically with bag-of-words on play counts (but today sequence is also important). Or more generally:
      $$P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & & & \vdots \\ p_{m1} & p_{m2} & \dots & p_{mn} \end{pmatrix}$$
    The idea with matrix factorization is to represent this probability distribution like this:
      $$p_{ui} = a_u^T b_i, \qquad M' = A^T B$$
    That is, the big matrix is approximated by the product of two thin matrices whose shared dimension is $f$.
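
The factorization $p_{ui} = a_u^T b_i$ can be sketched in a few lines. This is an illustration only: the user and track names, the vector values, and $f = 2$ are made up, and the products are plain scores rather than normalized probabilities:

```python
def predict(user_vec, item_vec):
    """p_ui = a_u^T b_i: inner product of a user vector and an item vector."""
    return sum(a * b for a, b in zip(user_vec, item_vec))


# Hypothetical user vectors a_u and item vectors b_i with f = 2 latent factors.
A = {"erik": [1.0, 0.5], "chris": [0.0, 2.0]}
B = {"track_x": [2.0, 0.0], "track_y": [1.0, 1.0]}

# Reconstruct every entry of the approximated matrix M' = A^T B.
M_approx = {(u, i): predict(a, b) for u, a in A.items() for i, b in B.items()}
```

Only the two small dictionaries need to be stored; any entry of the full user-by-item matrix can be recomputed on demand in O(f).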
  • 15. Step 1: Put everything into a big sparse matrix, and a very big matrix too:
      $$M = \begin{pmatrix} c_{11} & c_{12} & \dots & c_{1n} \\ c_{21} & c_{22} & \dots & c_{2n} \\ \vdots & & & \vdots \\ c_{m1} & c_{m2} & \dots & c_{mn} \end{pmatrix}$$
    with $10^7$ users (rows) and $10^7$ items (columns).
  • 16. Matrix example: roughly 25 billion nonzero entries. Total size is roughly 25 billion × 12 bytes = 300 GB (“medium data”). Example entry (row “Erik”, column “Never gonna give you up”): Erik listened to Never gonna give you up 1 time.
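
A minimal sketch of building such a sparse count matrix from a stream log and of the size estimate on slide 16. The dictionary-of-pairs layout, the function name `build_sparse_counts`, and the toy log are assumptions for illustration; the slide only gives the 12 bytes per entry, and reading that as three 4-byte fields is a guess:

```python
from collections import defaultdict


def build_sparse_counts(log_entries):
    """Aggregate (user, track) stream events into sparse play counts.

    Only nonzero cells are stored, keyed by (user, track).
    """
    counts = defaultdict(int)
    for user, track in log_entries:
        counts[(user, track)] += 1
    return dict(counts)


# Toy log, invented for illustration.
log = [
    ("erik", "Never gonna give you up"),
    ("erik", "track_b"),
    ("erik", "track_b"),
    ("chris", "track_b"),
]
counts = build_sparse_counts(log)

# Size estimate from the slide: 25 billion nonzero entries at 12 bytes each.
# (12 bytes as user id + item id + count at 4 bytes each is an assumption.)
NONZERO_ENTRIES = 25 * 10**9
BYTES_PER_ENTRY = 12
total_gb = NONZERO_ENTRIES * BYTES_PER_ENTRY / 10**9
```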
  • 17. For instance, for PLSA: Probabilistic Latent Semantic Indexing (Hofmann, 1999), invented as a method intended for text classification. The matrix $P$ is approximated by the product of a tall matrix of user vectors and a wide matrix of item vectors. For PLSA the entries are
      $$P(u, i) = \sum_z P(u \mid z)\, P(i, z)$$
    so the user vectors hold $P(u \mid z)$ and the item vectors hold $P(i, z)$.
  • 18. Run this for n iterations: start with random vectors around the origin. Then run alternating least squares, gradient descent, or something like that.
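
The training loop on slide 18 can be sketched as follows. This is a toy version that uses plain stochastic gradient descent on squared loss (one of the options the slide mentions, not alternating least squares); the function names, learning rate, epoch count, and the 3×3 toy matrix are all assumptions chosen for illustration:

```python
import random


def train_mf(entries, n_users, n_items, f=2, lr=0.05, epochs=500, seed=0):
    """Fit user vectors A and item vectors B so that A[u] . B[i] ~ entries[(u, i)]."""
    rng = random.Random(seed)
    # Start with small random vectors around the origin.
    A = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    B = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), target in entries.items():
            pred = sum(a * b for a, b in zip(A[u], B[i]))
            err = target - pred
            for k in range(f):
                a_k, b_k = A[u][k], B[i][k]
                # Gradient step on the squared error for this cell.
                A[u][k] += lr * err * b_k
                B[i][k] += lr * err * a_k
    return A, B


def squared_loss(entries, A, B):
    """Sum of squared errors over all given cells."""
    return sum(
        (t - sum(a * b for a, b in zip(A[u], B[i]))) ** 2
        for (u, i), t in entries.items()
    )


# Toy matrix: users 0 and 1 play tracks 0 and 1, user 2 plays track 2.
toy = {(u, i): v
       for u, row in enumerate([[1, 1, 0], [1, 1, 0], [0, 0, 1]])
       for i, v in enumerate(row)}

A0, B0 = train_mf(toy, 3, 3, epochs=0)   # untrained starting point
A1, B1 = train_mf(toy, 3, 3)             # after training
loss_before = squared_loss(toy, A0, B0)
loss_after = squared_loss(toy, A1, B1)
```

At the scale described in the talk the loop would run over sparse data and be distributed; the sketch only shows the shape of the iteration.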
  • 19. Why are latent factor models nice? They find vectors which are super small fingerprints of the musical style or the user’s taste, usually something like 40-1000 elements. Track X: 0.87 1.17 -0.26 0.56 2.21 0.77 -0.03 [Figure: track x's vector drawn against latent factor 1 and latent factor 2]
  • 20. Why are latent factor models nice? (part 2) - Fast (linear in input size) - Do not have a big problem with overfitting - Have a solid underlying model (i.e. not just a bunch of heuristics) - Easy to scale (at least compared to other models) - Give a compact representation of items
  • 21. Similarity now becomes schoolbook trigonometry. [Figure: track x and track y drawn as vectors against latent factor 1 and latent factor 2; cos(x, y) = HIGH]
    Vectors:
      $$p_{ui} = a_u^T b_i, \qquad \mathrm{sim}_{ij} = \cos(b_i, b_j) = \frac{b_i^T b_j}{|b_i|\,|b_j|}$$
    which costs $O(f)$ per pair.
    IPMF item-item:
      $$P(i \to j) = \frac{\exp(b_j^T b_i)}{Z_i} = \frac{\exp(b_j^T b_i)}{\sum_k \exp(b_k^T b_i)}$$
    IPMF item-item MDS:
      $$P(i \to j) = \frac{\exp(-|b_j - b_i|^2)}{Z_i} = \frac{\exp(-|b_j - b_i|^2)}{\sum_k \exp(-|b_k - b_i|^2)}$$
    Example similarities:
      i                        j                        sim(i, j)
      2pac                     2pac                     1.0
      2pac                     Notorious B.I.G.         0.91
      2pac                     Dr. Dre                  0.87
      2pac                     Florence + the Machine   0.26
      Florence + the Machine   Lana Del Rey             0.81
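
The cosine formula on slide 21 translates directly into code. A minimal sketch; the vectors are made up and the function name `cosine` is just illustrative:

```python
import math


def cosine(x, y):
    """cos(x, y) = x . y / (|x| |y|), computed in O(f) for vectors of length f."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Toy item vectors, invented for illustration.
same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # same direction, different length
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])       # nothing in common
```

Because only the angle matters, a vector and a scaled copy of it are maximally similar, which is the sense in which cosine ignores vector length.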
  • 22. Why does cosine make sense? Intuitively it makes sense, because we’re factoring out popularity and introducing a distance metric. In fact, the best result seems to be: train a latent factor model as usual, but normalize all vectors as a post-processing step. Even for models without any geometric interpretation (like LDA), cosine works.
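
The post-processing step on slide 22 can be sketched as follows: once every vector is scaled to unit length, a plain dot product equals the cosine. The helper names and the toy vectors are hypothetical:

```python
import math


def normalize(vec):
    """Scale a vector to unit length (all-zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


# Two toy item vectors pointing the same way but with different lengths.
short_vec = normalize([3.0, 4.0])
long_vec = normalize([6.0, 8.0])
similarity = dot(short_vec, long_vec)
```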
  • 23. It’s still tricky to search for similar tracks though. Locality Sensitive Hashing: cut the space recursively by random planes. If two points are close, they are more likely to end up on the same side of each plane. https://github.com/spotify/annoy
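
A toy illustration of the random-plane idea on slide 23. This is flat random-hyperplane hashing written from scratch, not Annoy's API and not its recursive tree structure; all names and vectors are hypothetical:

```python
import random


def make_planes(n_planes, f, seed=0):
    """Draw random hyperplanes through the origin, each given by a normal vector."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(f)] for _ in range(n_planes)]


def signature(vec, planes):
    """For each plane, record which side of it the vector falls on."""
    return tuple(
        sum(p * v for p, v in zip(plane, vec)) >= 0 for plane in planes
    )


def build_buckets(vectors, planes):
    """Group items by signature; items in one bucket are candidate neighbours."""
    buckets = {}
    for name, vec in vectors.items():
        buckets.setdefault(signature(vec, planes), []).append(name)
    return buckets


planes = make_planes(n_planes=8, f=3)
track = [0.5, -1.2, 2.0]
track_scaled = [2.0 * x for x in track]   # same direction, different length
buckets = build_buckets({"track": track, "track_scaled": track_scaled}, planes)
```

A query then only compares against items in its own bucket instead of all tracks. Vectors pointing in the same direction always share a bucket; vectors at a small angle share one with high probability.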
  • 24. So what models have we experimented with?
  • 25. Old school models - Latent Semantic Analysis (LSA) - Probabilistic Latent Semantic Analysis (PLSA) - Latent Dirichlet Allocation (LDA). These are bag-of-words models. They need a lot of topics, and are usually not very great for music recs.
  • 26. What about scalability of models? When we started experimenting with latent factor models, PLSA needed at least 400 factors (topics) to give decent results. That gives at least 10 billion parameters, or way more than you could conveniently fit in RAM. So what to do? We turned to Hadoop.
  • 27. One iteration, one map/reduce job (“Google News Personalization: Scalable Online Collaborative Filtering”). [Figure: in the map step, all log entries are split into a K × L grid of blocks keyed by (u % K, i % L), from (u % K = 0, i % L = 0) to (u % K = K-1, i % L = L-1). In the reduce step, a block such as (u % K = 1, i % L = 1) is shown together with the user vectors for u % K = 1 and the item vectors for i % L = 1; user vectors are sharded by u % K = 0 ... K-1 and item vectors by i % L = 0 ... L-1.]
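
The sharding scheme in the slide 27 figure can be sketched in plain Python. This assumes integer user and item ids and shows only the partitioning of log entries; the function name `shard_log` and the toy entries are hypothetical, and no actual Hadoop API is used:

```python
from collections import defaultdict


def shard_log(entries, K, L):
    """Map step: route each (user, item, count) entry to block (u % K, i % L)."""
    blocks = defaultdict(list)
    for u, i, count in entries:
        blocks[(u % K, i % L)].append((u, i, count))
    return dict(blocks)


# Toy log entries: (user id, item id, play count).
entries = [(0, 0, 3), (1, 4, 1), (2, 5, 2), (3, 1, 7)]
blocks = shard_log(entries, K=2, L=3)

# A reducer handling block (1, 1) only needs the user vectors with u % K == 1
# and the item vectors with i % L == 1, not the full parameter set.
block_11 = blocks[(1, 1)]
```

The point of the grid is that each reducer touches roughly 1/K of the user vectors and 1/L of the item vectors, so no single machine has to hold all parameters.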
  • 28. Other MF models - Collaborative Filtering for Implicit Feedback Datasets (“Koren”) - “vector_exp”: our own: every stream is a softmax over all tracks. These need a much more compact representation of items, typically only say 40 elements. They benefit a lot from handling the zero case separately.
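
The slide gives no formula for “vector_exp”. One plausible reading, by analogy with the item-item softmax on slide 21, is $P(i \mid u) = \exp(a_u^T b_i) / \sum_k \exp(a_u^T b_k)$. The sketch below assumes that form; it is not confirmed by the talk, and the vectors and names are made up:

```python
import math


def softmax_over_tracks(user_vec, item_vecs):
    """Assumed model: P(i | u) proportional to exp(a_u . b_i), over all tracks."""
    scores = {
        item: sum(a * b for a, b in zip(user_vec, vec))
        for item, vec in item_vecs.items()
    }
    top = max(scores.values())
    # Subtracting the max score avoids overflow and leaves the result unchanged.
    exps = {item: math.exp(s - top) for item, s in scores.items()}
    z = sum(exps.values())
    return {item: e / z for item, e in exps.items()}


# Toy vectors with f = 2, invented for illustration.
probs = softmax_over_tracks(
    [1.0, 0.0],
    {"track_x": [2.0, 0.0], "track_y": [0.0, 1.0], "track_z": [-1.0, 0.0]},
)
```

The normalizer sums over every track, which is why a compact item representation matters for a model of this kind.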
  • 29. New trendy models - Recurrent neural networks (RNN) - word2vec. These take into account the sequence of events. Future: take into account the time – maybe hidden Markov models, etc?
  • 30. Power of combining models: all models have their own objective and their own biases. Combining them (with Gradient Boosting Decision Trees) yields kickass results.
  • 31. Album cover based models: just a fun experiment that shows that any signal (weak learner) adds value to the ensemble. Turns out it probably just works as a classifier for minimal techno. We will most likely never put this in production :)
  • 32. What happened with Hadoop? Most newer models don’t need a ton of latent factors, so all parameters fit nicely in RAM. Additionally, you can do more complex things on a single machine. Lately we’ve started focusing on a combination of non-scalable models (more complex, less data) and scalable models (simple, but with more data). Hadoop makes things “scale”, but at a ridiculous constant I/O overhead. We are in the process of moving our models to Spark instead.
  • 33. Orders of magnitude numbers:
      Model                  Data points   Parameters   Time to train
      Single-machine model   1B            100M         10h
      Hadoop model           100B          10B          10h
      Spark??                100B          10B          1h
  • 34. What are we optimizing for? ... a story of surrogate loss functions
  • 35. We want to optimize Spotify’s “success”: long term business value or something similar. Problem: you only get one shot!
  • 36. Let’s run A/B tests. Typically: DAU (daily active users), Day 2 retention, etc. Super inefficient way of collecting roughly 1 bit of information!
  • 37. So let’s do offline testing: editorial judgement, “look at the results”.
  • 38. The “Daft Punk Test” … why does collaborative filtering always fail?
      LDA                      RNN              Koren                    PLSA              vector_exp
      Daft Punk                Daft Punk        Daft Punk                Daft Punk         Daft Punk
      Daft Punk - Stardust     Rizzle Kicks     Coldplay                 The PURSUIT       Gorillaz
      Raccoon                  Daft Funk        Gotye                    Junior Senior     deadmau5
      Dave Droid               La Roux          Lana Del Rey             Chuckie & LMFAO   Macklemore & Ryan Lewis
      The Local Abilities      Rudimental       Of Monsters And Men      Beatbullyz        M83
      Daft Funk                Pacjam           The Lumineers            Pursuit           Gotye
      M83 VS Big Black Delta   Su Bailey        Green Day                La Roux           The xx
      Leandro Dutra            Capital Cities   John Mayer               Fatboy Slim       Calvin Harris
      Huw Costin               YYZ              Foster The People        Chase & Drive     Kavinsky
      Jesús Alonso             Various, WMGA    Florence + The Machine   Knivez Out        Coldplay
  • 39. Wait, maybe machines can evaluate things? Sure! We just need a ground truth data set. Use things like thumbs, skips, editorial data sets. Note that thumbs etc. have observation bias. It doesn’t have to be as high volume; a few thousand data points is enough. We can also optimize for this using e.g. GBDT.
  • 40. Again, GBDTs are pretty cool.
  • 41. Ensemble workflow. [Figure, box labels: Model 1, Model 2, Model 3, ..., Model n; Thumbs; Editorial data sets; Gradient boosted decision tree; Combined model; Cross validate ensemble model; Offline metrics; Production]
  • 42. This One Weird Trick Sort of Fixes Observation Bias: augment the data set with lots of random negative data. Works well in practice. [Figure: + and - data points plotted against parameter 1 and parameter 2, with the current best estimate and the data points from earlier batches marked]
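
One way to read “random negative data” is: sample random (user, track) pairs that were never observed and label them as negatives. The sketch below assumes that reading; the function name, the label encoding (1 = positive, 0 = negative), and the toy data are hypothetical:

```python
import random


def add_random_negatives(positives, users, tracks, n_negatives, seed=0):
    """Return labeled examples: observed pairs as 1, sampled unobserved pairs as 0."""
    rng = random.Random(seed)
    observed = set(positives)
    # Enumerating all pairs is fine for a toy example; at scale you would
    # sample pairs directly and reject the observed ones.
    candidates = [(u, t) for u in users for t in tracks if (u, t) not in observed]
    negatives = rng.sample(candidates, min(n_negatives, len(candidates)))
    return [(u, t, 1) for u, t in positives] + [(u, t, 0) for u, t in negatives]


# Toy ground truth: a single thumbs-up.
labeled = add_random_negatives(
    positives=[("u1", "t1")],
    users=["u1", "u2"],
    tracks=["t1", "t2"],
    n_negatives=2,
)
```

Some sampled negatives will be tracks the user would actually like; the assumption behind the trick is that with a large catalogue such cases are rare enough not to matter.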
  • 43. What have we learned so far? - Figuring out what to optimize for is hard - Combining lots of models really helps - Large scale algorithms are great, but not everything has to scale
  • 44. So what are we working on now? Combine even more signals - Content-based methods: use audio, lyrics, images - Read about music and understand it - Personalize everything - Just acquired Echo Nest in Boston!