Music: Tuned to you
Mohitdeep Singh
Data Scientist
Predictive Analytics Innovation Summit
Feb 12-13, 2015
San Diego
http://www.rdio.com/about/

Big Data @Rdio
•  Track metadata
•  Signal processing
•  Millions of hours of music streamed every month
•  Clicks
•  User demographics
•  Social info
•  Every single interaction

Committed to open source

Scenario

The answer lies in the matrix
[Figure: sparse interaction matrix. Rows as extracted: (2, 7, 44), (22, 17), (9, 12), (21, 18), (77, 44); empty cells are not shown.]

Baseline: Popularity
Recommend based on popularity of tracks
Pros:
•  A very simple model
•  Easy to implement
•  More efficient on Apache Giraph (by exploiting its properties)
•  Always a good baseline
Cons:
•  Not really recommending anything
•  No element of discovery

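The popularity baseline can be sketched in a few lines. This is a minimal single-machine illustration, not the Giraph implementation mentioned on the slide; the function name `recommend_popular`, the `(user, track)` play-log format, and the choice to filter out tracks the user has already played are all assumptions made for the example.

```python
from collections import Counter

def recommend_popular(plays, user, n=3):
    """Return the n most-played tracks overall that `user` has not played yet.

    `plays` is a hypothetical play log: a list of (user, track) pairs.
    """
    counts = Counter(track for _, track in plays)
    seen = {track for u, track in plays if u == user}
    # most_common() sorts by count, descending; ties keep first-seen order.
    ranked = [track for track, _ in counts.most_common() if track not in seen]
    return ranked[:n]

# Toy play log (made up for illustration).
plays = [
    ("ann", "a"), ("ann", "b"),
    ("bob", "a"), ("bob", "c"),
    ("cat", "a"), ("cat", "c"),
    ("dan", "d"),
]
print(recommend_popular(plays, "dan", n=2))
```

Every user who has not heard the top tracks gets the same list, which is the "no element of discovery" drawback noted above.
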
Long Tail Problem

Nearest Neighbors
[Figure: sparse interaction matrix. Rows as extracted: (2, 7, 44), (22, 17), (9, 12), (21, 18), (77, 44); empty cells are not shown.]

Distance matrix
[Figure: matrix D with mostly 0 and 1 entries, plus fractional entries 0.0873, 0.2621, 0.3603, and 0.8967.]

Top-N Recommendations
P = R * D

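The product P = R * D can be sketched with NumPy. This is only an illustration under assumptions: D is taken to be an item-item cosine similarity matrix (the slides do not say how D is computed), the toy matrix R is made up, and the helper names `item_similarity` and `top_n` are hypothetical.

```python
import numpy as np

def item_similarity(R):
    """Cosine similarity between the columns (items) of R."""
    norms = np.linalg.norm(R, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody played
    Rn = R / norms
    return Rn.T @ Rn

def top_n(R, D, user, n=2):
    """Score items for one user via a row of P = R * D; return the best unseen items."""
    scores = R[user] @ D
    scores = np.where(R[user] > 0, -np.inf, scores)  # mask items already played
    order = np.argsort(-scores)
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

# Toy user-by-item matrix (rows are users, columns are items; values made up).
R = np.array([
    [2, 7, 0, 0],
    [0, 22, 17, 0],
    [9, 0, 0, 12],
], dtype=float)

D = item_similarity(R)
print(top_n(R, D, user=0, n=2))
```

A user with very few interactions may have fewer than n unseen items with a nonzero score, which is one way the "no guarantee on the number of predictions" issue shows up.
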
Pros
•  Models are easy to reason about
•  Easily scaled via MapReduce
•  Gives decent performance on the test set
Cons
•  If the user and item spaces are not stable, things can and will go wrong
•  Lacks serendipity
•  No guarantee on the number of predictions per user

Latent Factor Models
Approach pioneered during the Netflix Prize competition.
The key idea is to decompose the rating matrix into a product of lower-rank factor matrices.
[Figure: rating matrix ≈ user factors * item factors = predicted ratings]

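The decomposition can be illustrated with a small stochastic gradient descent loop over the observed entries. This is a sketch under assumptions, not the method used at Rdio: the toy ratings, the rank k, the learning rate, the regularization weight, and the convention that 0 means "unobserved" are all chosen for the example.

```python
import numpy as np

def factorize(R, k=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit R ≈ U @ V.T on the observed (nonzero) entries of R with SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = 0.1 * rng.standard_normal((n_users, k))  # user factors
    V = 0.1 * rng.standard_normal((n_items, k))  # item factors
    observed = np.argwhere(R > 0)
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - U[u] @ V[i]
            u_old = U[u].copy()  # use the pre-update user vector for the item step
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * u_old - reg * V[i])
    return U, V

# Toy rating matrix; 0 marks an unobserved cell (values made up).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

U, V = factorize(R)
print(np.round(U @ V.T, 2))  # the zeros in R are now filled with predictions
```
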
Pros
•  Tries to learn the underlying concepts
•  User/item supplementary information can be baked into the learning algorithm (factorization machines)

Cons
•  Doesn’t perform as well as simple nearest-neighbor models
•  Interpretation of the latent space is hard

Bayesian Personalized Ranking
•  Constructs a preference order for each user
•  Directly optimizes the ranking function
•  Takes into account the order preference
•  Implemented in scalable fashion on top of Apache Giraph

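A single-machine sketch of the pairwise idea behind Bayesian Personalized Ranking follows. It assumes a matrix-factorization scoring model trained by stochastic gradient ascent on sampled (user, liked item, other item) triples; the Giraph implementation mentioned above is not shown, and the toy data, the hyperparameters, and the name `bpr_train` are assumptions for illustration.

```python
import numpy as np

def bpr_train(pos, n_users, n_items, k=4, steps=5000, lr=0.05, reg=0.01, seed=0):
    """Learn factors so that each user's positive items outrank the other items."""
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors
    users = [u for u in range(n_users) if 0 < len(pos[u]) < n_items]
    for _ in range(steps):
        u = users[int(rng.integers(len(users)))]
        i = int(rng.choice(sorted(pos[u])))   # an item the user interacted with
        j = int(rng.integers(n_items))        # an item the user did not interact with
        while j in pos[u]:
            j = int(rng.integers(n_items))
        pu, qi, qj = P[u].copy(), Q[i].copy(), Q[j].copy()
        x = pu @ (qi - qj)                    # score difference for the pair (i, j)
        g = 1.0 / (1.0 + np.exp(x))           # gradient weight of ln sigmoid(x)
        P[u] += lr * (g * (qi - qj) - reg * pu)
        Q[i] += lr * (g * pu - reg * qi)
        Q[j] += lr * (-g * pu - reg * qj)
    return P, Q

# Toy implicit feedback: user -> set of items played (made up).
pos = {0: {0, 1}, 1: {1, 2}, 2: {3, 4}, 3: {4, 5}}
n_items = 6
P, Q = bpr_train(pos, n_users=4, n_items=n_items)
print(np.argsort(-(P[0] @ Q.T)))  # user 0's items, best first
```

The loss only looks at the sign and size of the score difference within a pair, which is what "directly optimizes the ranking function" refers to.
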
Results
[Figure: bar chart comparing Popularity, Nearest Neighbors, Matrix Factorization, Weighted Matrix Factorization, and Bayesian PR; axis ticks at 0, 50%, 100%, and 150%.]
Comparison of algorithms considering popularity as baseline
Note: the offline metric tracked is MAP (mean average precision)

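For reference, mean average precision can be computed as below. This is a sketch of one common convention (dividing by the number of relevant items); the slides do not state which variant or cutoff was used, and the function names and toy rankings are assumptions.

```python
def average_precision(ranked, relevant):
    """Average of precision@rank taken at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(rankings, relevants):
    """Mean of the per-user average precision values."""
    aps = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(aps) / len(aps)

# Two toy users: a recommended list and a set of held-out relevant tracks each.
rankings = [["a", "b", "c"], ["x", "a"]]
relevants = [{"a", "c"}, {"a"}]
print(mean_average_precision(rankings, relevants))
```

Dividing each algorithm's MAP by the popularity model's MAP gives a relative score of the kind plotted in the comparison.
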
Candidate Tracks
Catalogue of around 32M tracks

P(Relevant | [icon], Artist)

Track Id        Artist similarity  Track popularity  Artist popularity  Track duration  ..  Relevant (0/1)
“My December”   1                  0.992             0.433              482             ..  1
“Shake it Off”  0.03               0.04              0.88               329             ..  0
“Sugar”         0.772              0.95              0.77               220             ..  1

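One way to turn such a feature table into P(Relevant | features) is a logistic regression. The slides do not name the model actually used, so this is purely an assumed stand-in; it is fitted on the three rows shown in the table only to make the example self-contained, and the helper names are hypothetical.

```python
import numpy as np

# Feature rows copied from the table: artist similarity, track popularity,
# artist popularity, track duration. The elided ".." columns are left out.
X = np.array([
    [1.0, 0.992, 0.433, 482.0],   # "My December"
    [0.03, 0.04, 0.88, 329.0],    # "Shake it Off"
    [0.772, 0.95, 0.77, 220.0],   # "Sugar"
])
y = np.array([1.0, 0.0, 1.0])     # Relevant (0/1)

def fit_logistic(X, y, steps=500, lr=0.5):
    """Plain gradient ascent on the log-likelihood, with standardized features."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Z = (X - mu) / sd
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w += lr * Z.T @ (y - p) / len(y)
        b += lr * np.mean(y - p)
    return w, b, mu, sd

def predict_relevance(X, w, b, mu, sd):
    """Estimated probability that each candidate track is relevant."""
    return 1.0 / (1.0 + np.exp(-(((X - mu) / sd) @ w + b)))

w, b, mu, sd = fit_logistic(X, y)
print(np.round(predict_relevance(X, w, b, mu, sd), 3))
```

With a catalogue of around 32M tracks, a scorer like this would only be applied to a pre-selected candidate set rather than to every track.
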
Many open problems

It’s a tough problem!

Current/Future work
•  Build an ensemble model to incorporate other models
•  Simplify the A/B testing framework
•  Integrate content-based recommendations
•  Experiment with deep-learning techniques
•  Incorporate information from the web

Questions?
Interested? Check out https://www.rdio.com/careers/
