Playlist Recommendations @ Spotify
Nikhil Tibrewal
@nikhil_tibrewal
Who am I?
Nikhil Tibrewal (Nick-hill)
● Data Engineer on Lambda squad (Spotify’s primary ML team)
● Graduated from Carnegie Mellon University in Dec 2013
● B.Sc. in Computer Science + additional major in Econ
● Been part of Spotify band for ~1.5 years
● Worked on a range of projects, primarily Playlist Recommendations
Spotify in numbers
● Started in 2006, 58 markets
● 75M+ active users, 20M+ paying
● 30M+ songs, 20K new per day
● 1.5+ billion playlists
● 1 TB logs per day
Recommendations so far on Spotify
● Discover tab
● Radio
● Related Artists
● Discover Weekly
● Playlist recs on “Now” Strip
(Screenshot: recommendations for Ellie Goulding)
(Screenshots: the “Now” Strip, with a human-curated playlist and a recommended playlist labeled)
But…
How are playlist recs generated?
Quick Overview!
● Recommend only human-curated playlists (1000+)
○ Well-designed cover images
○ Thorough descriptions
○ Title reflects content
(Screenshots: examples of a good and a bad playlist)
Annoy (Approximate Nearest Neighbors Oh Yeah), created at Spotify: https://github.com/spotify/annoy
Quick Overview!
● Recommendations pipeline: Candidate Generation
○ Generate N-dimensional track vectors from collaborative filtering
○ Vectorize playlists:
■ Playlist vector derived from the track vectors in the playlist
○ Use Annoy to store playlist vectors in N-dimensional space
○ Vectorize user taste as well:
■ User vector derived from the user's listening history
○ User and playlist vectors live in the same space!
○ Query the Annoy index for the playlists nearest to the user
annoyTree.getNearest(seedVector, K)
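A minimal sketch of candidate generation in plain Python. The slides only say that playlist and user vectors are "derived from" track vectors and listening history, so this assumes both are averages (the user vector weighted by play count), and it uses exact cosine-similarity search as a stand-in for Annoy's approximate search. All track IDs, playlist IDs, and vectors are hypothetical.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]


def mean_vector(vectors: List[Vector], weights: Optional[List[float]] = None) -> Vector:
    """Weighted average of equal-length vectors (a plain average if weights is None)."""
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def get_nearest(seed: Vector, index: Dict[str, Vector], k: int) -> List[str]:
    """Exact top-k playlists by cosine similarity (Annoy answers this approximately)."""
    return sorted(index, key=lambda pid: cosine(seed, index[pid]), reverse=True)[:k]


# Hypothetical 2-dimensional track vectors from collaborative filtering.
tracks = {"t1": [1.0, 0.0], "t2": [0.9, 0.1], "t3": [0.0, 1.0], "t4": [0.1, 0.9]}
playlists = {"rock_hits": ["t1", "t2"], "chill_vibes": ["t3", "t4"]}
playlist_index = {pid: mean_vector([tracks[t] for t in ts]) for pid, ts in playlists.items()}

# Hypothetical listening history: track -> play count, used as averaging weights.
history = {"t1": 10, "t3": 2}
user_vector = mean_vector([tracks[t] for t in history], [float(c) for c in history.values()])

print(get_nearest(user_vector, playlist_index, k=1))  # ['rock_hits']
```

Because the user vector lives in the same space as the playlist vectors, a single similarity function ranks playlists for a user. With the real Python bindings you would instead create `AnnoyIndex(N, "angular")`, call `add_item` for each playlist (Annoy uses integer item IDs), `build(n_trees)`, and query with `get_nns_by_vector(user_vector, K)`.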
Quick Overview!
● Recommendations pipeline: Ranking Model
○ Use genre information, demographic data, and playlist popularity data to further rank recommendations
■ John: 21, USA, likes rock
■ Should get rock playlist recs that are popular in the USA and among 21-year-olds
○ Apply post-processing steps that shuffle results and add variety to avoid repetition
90% of DAUs (daily active users) have recs!
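The slides do not describe the ranking model itself, so the sketch below uses a hand-weighted linear score as a stand-in: candidate similarity, plus a bonus when the playlist's genre matches the user's taste, plus the playlist's popularity in the user's (country, age bucket) segment. It then applies one post-processing step for variety, capping how many playlists of one genre are shown. The weights, the 5-year age buckets, the genre cap, and all data are hypothetical; the shuffling step is omitted (it could be a seeded `random.Random(seed).shuffle`).

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class User:
    age: int
    country: str
    liked_genres: Set[str]


@dataclass
class Candidate:
    playlist_id: str
    genre: str
    similarity: float  # score carried over from candidate generation


# Popularity of a playlist within a (country, age bucket) segment, in [0, 1].
Popularity = Dict[Tuple[str, str, str], float]


def age_bucket(age: int) -> str:
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


def score(c: Candidate, user: User, popularity: Popularity,
          w_genre: float = 0.5, w_pop: float = 0.3) -> float:
    genre_match = 1.0 if c.genre in user.liked_genres else 0.0
    pop = popularity.get((c.playlist_id, user.country, age_bucket(user.age)), 0.0)
    return c.similarity + w_genre * genre_match + w_pop * pop


def rank(cands: List[Candidate], user: User, popularity: Popularity,
         max_per_genre: int = 2) -> List[str]:
    ranked = sorted(cands, key=lambda c: score(c, user, popularity), reverse=True)
    # Post-processing for variety: cap how many playlists of one genre are shown.
    shown: Dict[str, int] = {}
    result: List[str] = []
    for c in ranked:
        if shown.get(c.genre, 0) < max_per_genre:
            result.append(c.playlist_id)
            shown[c.genre] = shown.get(c.genre, 0) + 1
    return result


john = User(age=21, country="US", liked_genres={"rock"})
candidates = [
    Candidate("rock_classics", "rock", 0.60),
    Candidate("indie_rock", "rock", 0.50),
    Candidate("garage_rock", "rock", 0.55),
    Candidate("pop_hits", "pop", 0.70),
    Candidate("jazz_night", "jazz", 0.65),
]
popularity = {("indie_rock", "US", "20-24"): 0.9, ("pop_hits", "US", "20-24"): 0.2}
print(rank(candidates, john, popularity))
# ['indie_rock', 'rock_classics', 'pop_hits', 'jazz_night']
```

For John, the rock playlist that is popular among 20 to 24 year olds in the US rises to the top, and the third rock playlist is dropped in favor of variety.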
Quick Overview!
● Infrastructure
○ Luigi to manage workflow (also built at Spotify)
○ Entire pipeline written in Scalding
○ 1200+ node Hadoop cluster to run jobs
○ Cassandra (~a dozen nodes for playlist recs)
○ Java backend micro-services serving recs
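In Luigi, each step is a `luigi.Task` subclass with `requires()`, `output()` and `run()` methods, and a task counts as complete when its output already exists. The sketch below mimics only that scheduling idea in plain Python; it is not Luigi's API, and the task graph is a hypothetical breakdown of the pipeline described above.

```python
from typing import Callable, Dict, List, Set

# Hypothetical task graph: each task maps to the tasks it requires.
PIPELINE: Dict[str, Set[str]] = {
    "track_vectors": set(),
    "playlist_vectors": {"track_vectors"},
    "user_vectors": {"track_vectors"},
    "annoy_index": {"playlist_vectors"},
    "candidates": {"annoy_index", "user_vectors"},
    "ranked_recs": {"candidates"},
    "cassandra_export": {"ranked_recs"},
}


def schedule(target: str, requires: Dict[str, Set[str]],
             is_complete: Callable[[str], bool]) -> List[str]:
    """Return the tasks to run, dependencies first, skipping any already-complete subtree."""
    order: List[str] = []
    seen: Set[str] = set()

    def visit(task: str) -> None:
        if task in seen or is_complete(task):
            return  # a complete task's own dependencies are never checked
        seen.add(task)
        for dep in sorted(requires[task]):
            visit(dep)
        order.append(task)

    visit(target)
    return order


# Suppose the track vectors already exist; everything downstream still has to run.
print(schedule("cassandra_export", PIPELINE, lambda t: t == "track_vectors"))
```

Skipping complete subtrees is what makes a rerun after a failure cheap: only the missing outputs are recomputed.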
Quick Overview!
"Scalding is comprised of a DSL (domain-specific language) that makes MapReduce computations look like Scala’s collection API and is a wrapper for Cascading to make it easy to define jobs, test and data sources on an HDFS" (http://cascading.io/customer/twitter/)
Scalding w.r.t. Playlist Recs
● Used Python back in the day
○ Inputs and outputs were tab-separated
○ Complexity UP => difficulty to maintain UP
○ Hard to write tests
● Scalding provided compile-time error checks
○ Catch errors early
○ Define schemas (e.g. Avro)
● Can use Parquet + Avro for input/output
○ Easy to write and read data
○ Records with a lot of fields!
○ Lesson: Parquet hurts performance with fat columns (nested data structures)
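To illustrate why declaring a schema helps compared with positional tab-separated fields, here is a sketch that parses a TSV line against a declared record type and fails fast on malformed input. Python can only catch these errors at runtime, whereas Scalding with Avro schemas surfaces many of them at compile time; the `PlaylistRecord` fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PlaylistRecord:
    # Hypothetical schema; in the Scalding pipeline an Avro schema plays this role.
    playlist_id: str
    follower_count: int
    country: str


def parse_tsv(line: str) -> PlaylistRecord:
    """Parse one tab-separated line against the schema, failing fast on malformed input."""
    parts = line.rstrip("\n").split("\t")
    schema = fields(PlaylistRecord)
    if len(parts) != len(schema):
        raise ValueError(f"expected {len(schema)} columns, got {len(parts)}: {line!r}")
    # Each field's declared type (str or int here) doubles as its converter.
    return PlaylistRecord(*(f.type(raw) for f, raw in zip(schema, parts)))


print(parse_tsv("pl1\t42\tUS\n"))
```

A missing column or a non-numeric follower count raises `ValueError` at the point of parsing instead of silently shifting every later field.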
Scalding w.r.t. Playlist Recs
● Data quality
○ Wrappers for Hadoop counters in an extended Scalding library
○ Verify that counters fall within reasonable ranges
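A minimal sketch of the range check. In a real job the counts would come from Hadoop counters incremented inside the Scalding job; here a `collections.Counter` stands in for them, and the counter names and expected ranges are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Mapping, Tuple


def check_counters(counters: Mapping[str, int],
                   expected: Dict[str, Tuple[int, int]]) -> List[str]:
    """List every counter outside its [low, high] range; an empty list means the run looks sane."""
    problems = []
    for name, (low, high) in expected.items():
        value = counters.get(name, 0)
        if not low <= value <= high:
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems


# Stand-in for Hadoop counters incremented while the job processes each user.
counters: Counter = Counter()
for recs in [["pl1", "pl2"], [], ["pl3"]]:
    counters["users_with_recs" if recs else "users_without_recs"] += 1

# Hypothetical sanity ranges, e.g. tuned from previous healthy runs.
EXPECTED = {"users_with_recs": (2, 1_000_000), "users_without_recs": (0, 0)}
problems = check_counters(counters, EXPECTED)
print(problems)  # ['users_without_recs=1 outside [0, 0]']
```

A pipeline would typically fail the run when `problems` is non-empty, so bad data is caught before it reaches users.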
Scalding w.r.t. Playlist Recs
● Pipeline tolerance
○ Job failures are normal, and especially annoying in big jobs
○ Scalding checkpoints
○ Lesson: a checkpoint is itself a MapReduce job and has the same caveats
○ Still very helpful!
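The idea behind checkpointing can be sketched with local files: save a stage's output once, and on a retry read it back instead of recomputing. This is not Scalding's checkpoint mechanism, just a minimal analogue assuming each stage's output fits in JSON; the stage name is hypothetical. Writing the checkpoint is itself extra work, which mirrors the lesson above.

```python
import json
import os
import tempfile
from typing import Any, Callable


def checkpoint(path: str, compute: Callable[[], Any]) -> Any:
    """Reuse a stage's saved output if it exists; otherwise compute it and save it."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    # Write to a temp file and rename it into place, so a crash mid-write
    # never leaves a truncated checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(result, f)
    os.replace(tmp, path)
    return result


runs = []


def playlist_vectors():  # hypothetical expensive stage
    runs.append("computed")
    return {"pl1": [0.1, 0.2]}


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "playlist_vectors.json")
    first = checkpoint(path, playlist_vectors)
    # A retry after a downstream failure reads the checkpoint instead of recomputing.
    second = checkpoint(path, playlist_vectors)

print(runs, first == second)  # ['computed'] True
```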
Scalding w.r.t. Playlist Recs
● Job runtimes
○ Common solutions: more reducers and code optimizations
○ Speculative execution for larger jobs
○ Caveat: duplicate speculative attempts can take up unnecessary resources
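Both knobs are plain Hadoop job properties. The sketch below builds them as `-Dkey=value` generic options, which Hadoop's `ToolRunner` parses when a job is launched through it; the property names are the Hadoop 2 (MRv2) ones, while the reducer counts and the choice of which jobs get speculation are hypothetical.

```python
from typing import List


def tuning_flags(reducers: int, speculative: bool) -> List[str]:
    """Hadoop generic options (-Dkey=value) for reducer count and speculative execution."""
    flag = "true" if speculative else "false"
    props = {
        "mapreduce.job.reduces": str(reducers),
        # Speculative execution launches backup attempts of slow tasks and keeps whichever
        # finishes first; the duplicate attempts are the extra resource cost.
        "mapreduce.map.speculative": flag,
        "mapreduce.reduce.speculative": flag,
    }
    return [f"-D{key}={value}" for key, value in props.items()]


# Hypothetical: a large job gets more reducers and speculation; a small one gets neither.
print(tuning_flags(400, speculative=True))
print(tuning_flags(50, speculative=False))
```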
Scalding w.r.t. Playlist Recs
● Memory issues
○ Used Sparkey indices in Python (developed at Spotify, now open source)
■ “Simple constant key/value storage lib for read-heavy systems with
infrequent large bulk inserts”
■ Replicated to all mappers
○ Complex jobs in Scalding => higher memory config for jobs using Sparkey
○ Lesson: replacing Sparkey lookups with joins trades memory for MAYBE a little more runtime
bigPipe.join(exSparkeyPipe)
https://github.com/spotify/sparkey
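The lesson contrasts two join strategies. Replicating a Sparkey index to every mapper behaves like a map-side (replicated) join: fast lookups, but every mapper pays the memory for the whole table. A regular join such as `bigPipe.join(exSparkeyPipe)` behaves like a reduce-side join: both sides are shuffled and sorted by key, so no mapper holds the whole table, at the cost of shuffle time. The sketch simulates both in memory; it is not Sparkey's or Scalding's API, and the user/playlist/country data is hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]         # (key, value), e.g. (user_id, playlist_id)
Joined = Tuple[str, str, str]  # (key, big_value, small_value)


def replicated_lookup_join(big: Iterable[Pair], small: Dict[str, str]) -> List[Joined]:
    """Map-side join: each mapper holds all of `small`, like a Sparkey index copied to every mapper."""
    return [(k, v, small[k]) for k, v in big if k in small]


def reduce_side_join(big: Iterable[Pair], small: Iterable[Pair]) -> List[Joined]:
    """Shuffle-style inner join: tag both sides, sort by key, then match within each key group.

    On a cluster the sort is the distributed shuffle, so a reducer holds one key's rows at a time.
    """
    tagged = [(k, 0, v) for k, v in small] + [(k, 1, v) for k, v in big]
    tagged.sort(key=lambda row: (row[0], row[1]))  # small-side rows first within each key
    out: List[Joined] = []
    for key, group in groupby(tagged, key=lambda row: row[0]):
        rows = list(group)
        small_values = [v for _, tag, v in rows if tag == 0]
        for _, tag, v in rows:
            if tag == 1:
                for s in small_values:
                    out.append((key, v, s))
    return out


big = [("u1", "pl9"), ("u2", "pl3"), ("u1", "pl4"), ("u3", "pl1")]
small = [("u1", "US"), ("u2", "SE")]
print(sorted(replicated_lookup_join(big, dict(small))) == sorted(reduce_side_join(big, small)))  # True
```

Both produce the same rows; only the order differs (the reduce-side output comes out grouped by key). The dict-based version also assumes keys in the small table are unique.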
Scalding w.r.t. Playlist Recs
● Driven
○ “A sophisticated tool that collects telemetry data from running Scalding /
Cascading jobs on a cluster and presenting them in an intriguing User
Interface."
○ http://cascading.io/
Scalding w.r.t. Playlist Recs
● Other awesome benefits
○ Active community + big players
○ Data pipeline flows naturally follow the functional paradigm: you are essentially writing Scala code
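A Scalding flow reads like chained collection operations such as `filter`, `map`, and `groupBy`. The same shape in plain Python, as an analogy rather than Scalding code, counting follows of human-curated playlists; the event format and playlist IDs are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Event = Tuple[str, str, str]  # (user_id, playlist_id, action)


def top_followed(events: Iterable[Event], curated: Set[str], n: int) -> List[Tuple[str, int]]:
    """Most-followed curated playlists, built as a chain of pure transformations."""
    follows = (pid for _, pid, action in events if action == "follow")  # filter + map
    curated_follows = (pid for pid in follows if pid in curated)        # filter
    return Counter(curated_follows).most_common(n)                      # groupBy + count + top-n


events = [
    ("u1", "pl1", "follow"),
    ("u2", "pl1", "follow"),
    ("u2", "pl2", "play"),
    ("u3", "pl2", "follow"),
    ("u3", "user_pl", "follow"),  # not human-curated, filtered out
]
print(top_followed(events, {"pl1", "pl2"}, n=2))  # [('pl1', 2), ('pl2', 1)]
```

Each step only transforms its input, so steps can be tested in isolation and recombined, which is the property the slide credits to Scalding.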
Scalding w.r.t. Playlist Recs
Productivity without sacrificing performance!
Spotify is hiring!
Nikhil Tibrewal
@nikhil_tibrewal

More Related Content

What's hot

Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyChris Johnson
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSophia Ciocca
 
CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyVidhya Murali
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsErik Bernhardsson
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Erik Bernhardsson
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at SpotifyNeville Li
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At SpotifyVidhya Murali
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Esh Vckay
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Mounia Lalmas-Roelleke
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyChris Johnson
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Sudeep Das, Ph.D.
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at SpotifyErik Bernhardsson
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotifyAli Sarrafi
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...Hakka Labs
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experienceMounia Lalmas-Roelleke
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupAndy Sloane
 

What's hot (20)

Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
 
Spotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendationsSpotify Discover Weekly: The machine learning behind your music recommendations
Spotify Discover Weekly: The machine learning behind your music recommendations
 
CF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At SpotifyCF Models for Music Recommendations At Spotify
CF Models for Music Recommendations At Spotify
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
ML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive AnalyticsML+Hadoop at NYC Predictive Analytics
ML+Hadoop at NYC Predictive Analytics
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at Spotify
 
Data at Spotify
Data at SpotifyData at Spotify
Data at Spotify
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At Spotify
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.Music Personalization : Real time Platforms.
Music Personalization : Real time Platforms.
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)
 
Interactive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and SpotifyInteractive Recommender Systems with Netflix and Spotify
Interactive Recommender Systems with Netflix and Spotify
 
Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it! Crafting Recommenders: the Shallow and the Deep of it!
Crafting Recommenders: the Shallow and the Deep of it!
 
Collaborative Filtering at Spotify
Collaborative Filtering at SpotifyCollaborative Filtering at Spotify
Collaborative Filtering at Spotify
 
How data drives spotify
How data drives spotifyHow data drives spotify
How data drives spotify
 
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
DataEngConf: Building a Music Recommender System from Scratch with Spotify Da...
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experience
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 

Similar to Playlist Recommendations @ Spotify

Spotify cassandra london
Spotify cassandra londonSpotify cassandra london
Spotify cassandra londonNoa Resare
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Seattle Apache Flink Meetup
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...Bowen Li
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxChris Mungall
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify StoryNeville Li
 
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Neo4j
 
Terabyte-scale image similarity search: experience and best practice
Terabyte-scale image similarity search: experience and best practiceTerabyte-scale image similarity search: experience and best practice
Terabyte-scale image similarity search: experience and best practiceDenis Shestakov
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015Robbie Strickland
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixC4Media
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
Recommendations 101
Recommendations 101 Recommendations 101
Recommendations 101 Esh Vckay
 
GDSC NYCU | 如何建立自己的開源專案
 GDSC NYCU | 如何建立自己的開源專案 GDSC NYCU | 如何建立自己的開源專案
GDSC NYCU | 如何建立自己的開源專案秀吉(Hsiu-Chi) 蔡(Tsai)
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad DataSteffen Staab
 
Clouds are Not Free: Guide to Observability-Driven Efficiency Optimizations
Clouds are Not Free: Guide to Observability-Driven Efficiency OptimizationsClouds are Not Free: Guide to Observability-Driven Efficiency Optimizations
Clouds are Not Free: Guide to Observability-Driven Efficiency OptimizationsScyllaDB
 
SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue SnappyData
 
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware ac.uk
 
The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019Karthik Murugesan
 

Similar to Playlist Recommendations @ Spotify (20)

Spotify cassandra london
Spotify cassandra londonSpotify cassandra london
Spotify cassandra london
 
Hive at Last.fm
Hive at Last.fmHive at Last.fm
Hive at Last.fm
 
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
Approximate Queries and Graph Streams on Apache Flink - Theodore Vasiloudis -...
 
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...Approximate queries and graph streams on Flink, theodore vasiloudis,  seattle...
Approximate queries and graph streams on Flink, theodore vasiloudis, seattle...
 
Ontology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptxOntology Access Kit_ Workshop Intro Slides.pptx
Ontology Access Kit_ Workshop Intro Slides.pptx
 
Scio - Moving to Google Cloud, A Spotify Story
 Scio - Moving to Google Cloud, A Spotify Story Scio - Moving to Google Cloud, A Spotify Story
Scio - Moving to Google Cloud, A Spotify Story
 
Cassandra nyc
Cassandra nycCassandra nyc
Cassandra nyc
 
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
Visual, scalable, and manageable data loading to and from Neo4j with Apache Hop
 
Terabyte-scale image similarity search: experience and best practice
Terabyte-scale image similarity search: experience and best practiceTerabyte-scale image similarity search: experience and best practice
Terabyte-scale image similarity search: experience and best practice
 
New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015New Analytics Toolbox DevNexus 2015
New Analytics Toolbox DevNexus 2015
 
Data Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFixData Science in the Cloud @StitchFix
Data Science in the Cloud @StitchFix
 
Sound soft hackday-100905
Sound soft hackday-100905Sound soft hackday-100905
Sound soft hackday-100905
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
Recommendations 101
Recommendations 101 Recommendations 101
Recommendations 101
 
GDSC NYCU | 如何建立自己的開源專案
 GDSC NYCU | 如何建立自己的開源專案 GDSC NYCU | 如何建立自己的開源專案
GDSC NYCU | 如何建立自己的開源專案
 
Programming with Semantic Broad Data
Programming with Semantic Broad DataProgramming with Semantic Broad Data
Programming with Semantic Broad Data
 
Clouds are Not Free: Guide to Observability-Driven Efficiency Optimizations
Clouds are Not Free: Guide to Observability-Driven Efficiency OptimizationsClouds are Not Free: Guide to Observability-Driven Efficiency Optimizations
Clouds are Not Free: Guide to Observability-Driven Efficiency Optimizations
 
SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue SnappyData Overview Slidedeck for Big Data Bellevue
SnappyData Overview Slidedeck for Big Data Bellevue
 
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
SoundSoftware.ac.uk: Sustainable software for audio and music research (DMRN 5+)
 
The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019The Evolution of Spotify Home Architecture - Qcon 2019
The Evolution of Spotify Home Architecture - Qcon 2019
 

Recently uploaded

Explosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdfExplosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdf884710SadaqatAli
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringC Sai Kiran
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edgePaco Orozco
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfAyahmorsy
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdfKamal Acharya
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopEmre Günaydın
 
Electrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission lineElectrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission lineJulioCesarSalazarHer1
 
Introduction to Casting Processes in Manufacturing
Introduction to Casting Processes in ManufacturingIntroduction to Casting Processes in Manufacturing
Introduction to Casting Processes in Manufacturingssuser0811ec
 
Scaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageScaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageRCC Institute of Information Technology
 
School management system project report.pdf
School management system project report.pdfSchool management system project report.pdf
School management system project report.pdfKamal Acharya
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxwendy cai
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfKamal Acharya
 
Fruit shop management system project report.pdf
Fruit shop management system project report.pdfFruit shop management system project report.pdf
Fruit shop management system project report.pdfKamal Acharya
 
Hall booking system project report .pdf
Hall booking system project report  .pdfHall booking system project report  .pdf
Hall booking system project report .pdfKamal Acharya
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectRased Khan
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdfKamal Acharya
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationRobbie Edward Sayers
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientistgettygaming1
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Krakówbim.edu.pl
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfAbrahamGadissa
 

Recently uploaded (20)

Explosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdfExplosives Industry manufacturing process.pdf
Explosives Industry manufacturing process.pdf
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdf
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdf
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering Workshop
 
Electrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission lineElectrostatic field in a coaxial transmission line
Electrostatic field in a coaxial transmission line
 
Introduction to Casting Processes in Manufacturing
Introduction to Casting Processes in ManufacturingIntroduction to Casting Processes in Manufacturing
Introduction to Casting Processes in Manufacturing
 
Scaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltageScaling in conventional MOSFET for constant electric field and constant voltage
Scaling in conventional MOSFET for constant electric field and constant voltage
 
School management system project report.pdf
School management system project report.pdfSchool management system project report.pdf
School management system project report.pdf
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptx
 
A case study of cinema management system project report..pdf
A case study of cinema management system project report..pdfA case study of cinema management system project report..pdf
A case study of cinema management system project report..pdf
 
Fruit shop management system project report.pdf
Fruit shop management system project report.pdfFruit shop management system project report.pdf
Fruit shop management system project report.pdf
 
Hall booking system project report .pdf
Hall booking system project report  .pdfHall booking system project report  .pdf
Hall booking system project report .pdf
 
Arduino based vehicle speed tracker project
Arduino based vehicle speed tracker projectArduino based vehicle speed tracker project
Arduino based vehicle speed tracker project
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientist
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdf
 

Playlist Recommendations @ Spotify

  • 2. Who am I? Nikhil Tibrewal (Nick-hill) ● Data Engineer on Lambda squad (Spotify’s primary ML team) ● Graduated from Carnegie Mellon University in Dec 2013 ● B.Sc. in Computer Science + additional major in Econ ● Been part of Spotify band for ~1.5 years ● Worked on a range of projects, primarily Playlist Recommendations
  • 3. Spotify in numbers ● Started in 2006, 58 markets ● 75M+ active users, 20M+ paying ● 30M+ songs, 20K new per day ● 1.5+ billion playlists ● 1 TB logs per day
  • 4. ● Discover tab ● Radio ● Related Artists ● Discover Weekly ● Playlist recs on “Now” Strip Recommendations so far on Spotify For Ellie Goulding
  • 7. But… How are playlist recs generated?
  • 8. Quick Overview! ● Recommend only human curated playlists (1000+) ○ Well-designed cover images ○ Thorough descriptions ○ Title reflects content
  • 9. Quick Overview! ● Recommend only human curated playlists (1000+) ○ Well-designed cover images ○ Thorough descriptions ○ Title reflects content Good
  • 10. Quick Overview! ● Recommend only human curated playlists (1000+) ○ Well-designed cover images ○ Thorough descriptions ○ Title reflects content Good Bad
  • 15. Quick Overview!
    ● Recommendations pipeline: Candidate Generation
      ○ Generate N-dimensional track vectors from collaborative filtering
      ○ Vectorize playlists:
        ■ Playlist vector derived from the track vectors in the playlist
      ○ Use Annoy to store playlist vectors in N-dimensional space
        (ANNOY, Approximate Nearest Neighbors Oh Yeah, created at Spotify: https://github.com/spotify/annoy)
      ○ Vectorize user taste as well:
        ■ User vector derived from user listening history
      ○ User and playlist vectors live in the same space!
      ○ Query the Annoy tree for the nearest playlists to the user:
        annoyTree.getNearest(seedVector, K)
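The candidate-generation steps on slide 15 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Spotify's implementation: it assumes a playlist vector is the mean of its track vectors and a user vector is the mean of the track vectors in the user's listening history (the slide only says "derived from"), and it swaps Annoy's approximate search for an exact brute-force cosine-similarity scan, which answers the same question but does not scale to millions of items. The toy vectors and the helper names `mean_vector` and `nearest_playlists` are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average equal-length vectors (assumed pooling rule for playlists and users)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_playlists(user_vec, playlist_vecs, k):
    """Exact top-k playlists by cosine similarity: a stand-in for an Annoy query."""
    ranked = sorted(
        playlist_vecs,
        key=lambda pid: cosine(user_vec, playlist_vecs[pid]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical 2-dimensional track vectors from collaborative filtering.
tracks = {
    "rock_1": [1.0, 0.1], "rock_2": [0.9, 0.0],
    "jazz_1": [0.0, 1.0], "jazz_2": [0.1, 0.9],
}
playlists = {
    "Rock Classics": mean_vector([tracks["rock_1"], tracks["rock_2"]]),
    "Jazz Evenings": mean_vector([tracks["jazz_1"], tracks["jazz_2"]]),
}
# User who mostly listened to rock: user and playlist vectors share one space.
user = mean_vector([tracks["rock_1"], tracks["rock_2"], tracks["jazz_1"]])
print(nearest_playlists(user, playlists, k=1))  # ['Rock Classics']
```

With Annoy itself, the exact scan would be replaced by an index built over the playlist vectors (its Python binding provides `AnnoyIndex`, `add_item`, `build`, and `get_nns_by_vector`); the slide's `annoyTree.getNearest(seedVector, K)` is pseudocode for that query.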
  • 17. Quick Overview!
    ● Recommendations pipeline: Ranking Model
      ○ Use genre information, demographic data, and playlist popularity data to further rank recommendations
        ■ John: 21, USA, likes rock
        ■ Should get rock playlist recs that are popular in the USA and among 21-year-olds
      ○ Apply post-processing steps to shuffle and add variety, avoiding repetition
    90% of DAUs have recs!
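One way to picture the ranking step on slide 17: score each candidate by its similarity from candidate generation, plus boosts for matching the user's genres and for popularity in the user's country and age group, then post-process so one genre does not crowd out the rest. This is a hypothetical sketch: the feature names, the weights `W_SIM`, `W_GENRE`, `W_POP`, the `"18-24"` age bucket, and the "one playlist per genre before any repeats" rule are all assumptions, since the slides do not describe the actual model or post-processing. Shuffling is left out to keep the output deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    similarity: float   # score from candidate generation
    genre: str
    popularity: dict = field(default_factory=dict)  # segment -> popularity in [0, 1]


@dataclass
class User:
    age_bucket: str
    country: str
    liked_genres: set


# Hypothetical hand-set weights; a real ranking model would learn these.
W_SIM, W_GENRE, W_POP = 1.0, 0.5, 0.3


def score(c, user):
    """Similarity plus a genre-match boost plus average popularity in the user's segments."""
    genre_match = 1.0 if c.genre in user.liked_genres else 0.0
    segments = (user.country, user.age_bucket)
    pop = sum(c.popularity.get(s, 0.0) for s in segments) / len(segments)
    return W_SIM * c.similarity + W_GENRE * genre_match + W_POP * pop


def rank_with_variety(candidates, user, k):
    """Rank by score, then take the best playlist of each genre before any repeats."""
    ranked = sorted(candidates, key=lambda c: score(c, user), reverse=True)
    first_per_genre, repeats, seen = [], [], set()
    for c in ranked:
        (repeats if c.genre in seen else first_per_genre).append(c)
        seen.add(c.genre)
    return [c.name for c in (first_per_genre + repeats)[:k]]


# John from the slide: 21, USA, likes rock (bucket and popularity values hypothetical).
john = User(age_bucket="18-24", country="US", liked_genres={"rock"})
candidates = [
    Candidate("Rock Hits USA", 0.80, "rock", {"US": 0.9, "18-24": 0.8}),
    Candidate("Indie Rock", 0.85, "rock", {"US": 0.2, "18-24": 0.3}),
    Candidate("Jazz Classics", 0.70, "jazz", {"US": 0.4, "18-24": 0.1}),
]
print(rank_with_variety(candidates, john, k=3))
# ['Rock Hits USA', 'Jazz Classics', 'Indie Rock']
```

The popular-in-segment boost lets "Rock Hits USA" overtake the slightly more similar "Indie Rock", and the variety rule then lifts the jazz playlist into second place.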
  • 18. Quick Overview!
    ● Infrastructure
      ○ Luigi to manage workflows (also built at Spotify)
      ○ Entire pipeline written in Scalding
      ○ 1200+ node Hadoop cluster to run jobs
      ○ Cassandra (~a dozen nodes for playlist recs)
      ○ Java backend microservices serving recs
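The core idea of a workflow manager like Luigi is that each task declares its upstream dependencies, tasks run in dependency order, and a task whose output already exists is skipped, so a rerun after a failure resumes where it stopped. The sketch below shows only that idea with the standard library (`graphlib`, Python 3.9+); it is not Luigi's API (Luigi tasks define `requires()`, `output()`, and `run()`), and the stage names in `PIPELINE` are hypothetical labels for the steps described on the earlier slides.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline stages mapped to the stages they depend on.
PIPELINE = {
    "track_vectors": set(),
    "playlist_vectors": {"track_vectors"},
    "user_vectors": {"track_vectors"},
    "annoy_index": {"playlist_vectors"},
    "candidates": {"annoy_index", "user_vectors"},
    "ranked_recs": {"candidates"},
    "export_to_cassandra": {"ranked_recs"},
}


def run_pipeline(graph, completed, run_task):
    """Run tasks in dependency order, skipping any task already in `completed`.

    `completed` plays the role of an "output already exists" check, so a rerun
    after a failure only executes the stages that did not finish.
    """
    executed = []
    for task in TopologicalSorter(graph).static_order():
        if task in completed:
            continue
        run_task(task)
        completed.add(task)
        executed.append(task)
    return executed


# Resume after a failure: the first three stages already produced output.
done = {"track_vectors", "playlist_vectors", "user_vectors"}
print(run_pipeline(PIPELINE, done, run_task=lambda task: None))
# ['annoy_index', 'candidates', 'ranked_recs', 'export_to_cassandra']
```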
  • 19. Quick Overview!
    "Scalding is comprised of a DSL (domain-specific language) that makes MapReduce computations look like Scala’s collection API and is a wrapper for Cascading to make it easy to define jobs, test and data sources on an HDFS"
    (http://cascading.io/customer/twitter/)
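To make the quote on slide 19 concrete, here is the MapReduce shape that Scalding wraps: a map phase emitting key/value pairs, a shuffle grouping values by key, and a reduce phase combining them. It is a plain-Python sketch over an in-memory list, not Scalding code; the play-log records and function names are hypothetical. The last lines compute the same result in collection style, which is the feel the quote says Scalding gives MapReduce jobs.

```python
from collections import Counter
from itertools import groupby
from operator import itemgetter

# Hypothetical play-log records: (user_id, playlist_id, country)
logs = [
    ("u1", "rock_classics", "US"),
    ("u2", "rock_classics", "US"),
    ("u3", "jazz_evenings", "SE"),
    ("u1", "jazz_evenings", "US"),
    ("u4", "rock_classics", "SE"),
]


def map_phase(records):
    """Emit ((playlist, country), 1) for every play."""
    for _user, playlist, country in records:
        yield (playlist, country), 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reduce_phase(grouped):
    """Sum the counts for each key."""
    return {key: sum(values) for key, values in grouped}


plays = reduce_phase(shuffle_phase(map_phase(logs)))
print(plays[("rock_classics", "US")])  # 2

# The same computation written as one collection-style expression.
collection_style = Counter((playlist, country) for _user, playlist, country in logs)
print(dict(collection_style) == plays)  # True
```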
  • 20. Scalding w.r.t. Playlist Recs
    ● Used Python back in the day
      ○ Inputs and outputs were tab-separated
      ○ Complexity UP => difficulty to maintain UP
      ○ Hard to write tests
    ● Scalding provided compile-time error checks
      ○ Catch errors early
      ○ Define schemas (e.g. Avro)
    ● Can use Parquet + Avro for input/output
      ○ Easy to write and read data
      ○ Records with a lot of fields!
      ○ Lesson: Parquet hurts performance with fat columns (nested data structures)
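Python cannot reproduce Scalding's compile-time checks, but the failure mode that schemas guard against on slide 20 is easy to show: with raw tab-separated data, a row whose columns are swapped flows downstream silently, whereas parsing each line into a declared record type rejects it at the boundary. A minimal sketch; the `PlaylistRecord` type and its fields are hypothetical and stand in for the Avro schema the slide mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaylistRecord:
    playlist_id: str
    num_tracks: int
    followers: int


FIELDS = ("playlist_id", "num_tracks", "followers")


def parse_untyped(line):
    """The plain-TSV style: split on tabs and hope the columns are right."""
    return line.rstrip("\n").split("\t")


def parse_typed(line):
    """Parse one TSV line against the schema, failing fast on malformed rows."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")
    playlist_id, num_tracks, followers = parts
    return PlaylistRecord(playlist_id, int(num_tracks), int(followers))


good = "pl_123\t50\t1200"
swapped = "50\tpl_123\t1200"  # columns in the wrong order

print(parse_typed(good))
print(parse_untyped(swapped))  # no error: the bad row flows on
try:
    parse_typed(swapped)
except ValueError as err:
    print("rejected:", err)
```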
  • 23. Scalding w.r.t. Playlist Recs
    ● Data quality
      ○ Wrappers for Hadoop counters in extended Scalding library code
      ○ Verify counters are within reasonable ranges
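The counter-based data-quality check on slide 23 boils down to two parts: increment named counters while processing records, then compare them against expected ranges once the job finishes and fail loudly if they drift. The sketch below mimics that in-process with `collections.Counter` (Hadoop aggregates real counters across tasks); the counter names, the `vectorize` step, and the 5% threshold are hypothetical.

```python
from collections import Counter

counters = Counter()


def vectorize(playlist, track_vectors):
    """Hypothetical map step: average known track vectors, counting data gaps."""
    counters["playlists_seen"] += 1
    known = [track_vectors[t] for t in playlist["tracks"] if t in track_vectors]
    counters["tracks_missing_vector"] += len(playlist["tracks"]) - len(known)
    if not known:
        counters["playlists_without_vector"] += 1
        return None
    return [sum(col) / len(known) for col in zip(*known)]


# Hypothetical acceptable ranges, as fractions of playlists seen.
EXPECTED_RATIOS = {"playlists_without_vector": (0.0, 0.05)}


def check_counters(counters, expected):
    """Return a list of violations; an empty list means the run looks sane."""
    seen = counters["playlists_seen"]
    if seen == 0:
        return ["no playlists processed"]
    problems = []
    for name, (low, high) in expected.items():
        ratio = counters[name] / seen
        if not low <= ratio <= high:
            problems.append(f"{name}: {ratio:.1%} outside [{low:.0%}, {high:.0%}]")
    return problems


track_vectors = {"t1": [1.0, 0.0], "t2": [0.0, 1.0]}
batch = [{"tracks": ["t1", "t2"]}, {"tracks": ["t1"]}, {"tracks": ["t9"]}]
vectors = [vectorize(p, track_vectors) for p in batch]
print(check_counters(counters, EXPECTED_RATIOS))
# ['playlists_without_vector: 33.3% outside [0%, 5%]']
```

In a real pipeline a non-empty result would fail the job before its output is published.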
  • 25. Scalding w.r.t. Playlist Recs
    ● Pipeline tolerance
      ○ Job failures are normal, and annoying with big jobs
      ○ Scalding checkpoints
      ○ Lesson: a checkpoint is itself a MapReduce job and has the same caveats
      ○ Still very helpful!
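The checkpoint idea on slide 25 can be sketched as "persist an expensive intermediate result, and on rerun load it instead of recomputing". This is a local-disk illustration, not Scalding's checkpoint mechanism; the `checkpoint` helper, the JSON format, and the step names are hypothetical. Writing the checkpoint is itself extra work, which mirrors the slide's lesson that a checkpoint is a job with its own caveats.

```python
import json
import os
import tempfile


def checkpoint(directory, name, compute):
    """Return the stored result for `name` if present; otherwise compute and store it.

    The result is written to a temporary file and renamed into place, so a crash
    mid-write never leaves a half-written checkpoint that a rerun would trust.
    """
    path = os.path.join(directory, f"{name}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(result, f)
    os.replace(tmp_path, path)
    return result


calls = []


def expensive_step():
    calls.append("expensive_step")
    return {"pl_1": [0.5, 0.5]}


def flaky_step(vectors, fail):
    if fail:
        raise RuntimeError("task attempt failed")
    return sorted(vectors)


workdir = tempfile.mkdtemp()
try:
    vectors = checkpoint(workdir, "playlist_vectors", expensive_step)
    flaky_step(vectors, fail=True)  # the first run dies downstream
except RuntimeError:
    pass
vectors = checkpoint(workdir, "playlist_vectors", expensive_step)  # reuses the checkpoint
print(flaky_step(vectors, fail=False), calls)  # ['pl_1'] ['expensive_step']
```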
  • 26. Scalding w.r.t. Playlist Recs
    ● Job runtimes
      ○ Common solutions: more reducers and code optimizations
      ○ Speculative execution for larger jobs
      ○ Caveat: can take up unnecessary resources
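Speculative execution, mentioned on slide 26, means launching a duplicate attempt of a slow-running task elsewhere and keeping whichever attempt finishes first. In Hadoop the framework does this itself (controlled by properties such as `mapreduce.map.speculative` and `mapreduce.reduce.speculative`); the thread-based simulation below only illustrates the idea. The `task_attempt` function and its delays are hypothetical, with the sleep standing in for a straggling node.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def task_attempt(attempt_id, delay_seconds, data):
    """Same computation on every attempt; only the simulated node speed differs."""
    time.sleep(delay_seconds)
    return attempt_id, sum(data)


def run_speculatively(data, delays):
    """Launch one attempt per delay and keep the result of whichever finishes first.

    The losing attempts keep occupying a worker until they end, which is the
    resource cost the slide warns about.
    """
    pool = ThreadPoolExecutor(max_workers=len(delays))
    futures = [pool.submit(task_attempt, i, d, data) for i, d in enumerate(delays)]
    done, _not_done = wait(futures, return_when=FIRST_COMPLETED)
    pool.shutdown(wait=False)
    return next(iter(done)).result()


# Attempt 0 lands on a straggler; attempt 1 is the speculative copy and wins.
winner, total = run_speculatively([1, 2, 3], delays=[0.5, 0.01])
print(winner, total)
```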
  • 28. Scalding w.r.t. Playlist Recs
    ● Memory issues
      ○ Used Sparkey indices in Python (developed at Spotify, now open source: https://github.com/spotify/sparkey)
        ■ “Simple constant key/value storage lib for read-heavy systems with infrequent large bulk inserts”
        ■ Replicated to all mappers
      ○ Complex jobs in Scalding => higher memory config for jobs with Sparkey
      ○ Lesson: trade memory resources for MAYBE a little more time with joins
        bigPipe.join(exSparkeyPipe)
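The trade-off on slide 28 is between two join strategies. A replicated (map-side) join ships the whole lookup table, here playing the role of the Sparkey index, to every mapper and looks each record up in memory; a shuffle (reduce-side) join, like the slide's `bigPipe.join(exSparkeyPipe)`, groups both inputs by key instead, using less memory per task but paying for a shuffle. A plain-Python sketch of both, with hypothetical data and function names; both compute the same inner join.

```python
from collections import defaultdict

# Hypothetical inputs: a large stream of (user_id, playlist_id) pairs and a
# small playlist -> title table (the kind of data a Sparkey index might hold).
big = [("u1", "pl_1"), ("u2", "pl_2"), ("u3", "pl_1"), ("u4", "pl_9")]
small = {"pl_1": "Rock Classics", "pl_2": "Jazz Evenings"}


def replicated_join(big_records, lookup):
    """Map-side join: every mapper holds all of `lookup` in memory; no shuffle."""
    return [(user, pl, lookup[pl]) for user, pl in big_records if pl in lookup]


def shuffle_join(big_records, small_records):
    """Reduce-side join: group both sides by key, then pair them up per key."""
    groups = defaultdict(lambda: ([], []))
    for user, pl in big_records:
        groups[pl][0].append(user)
    for pl, title in small_records:
        groups[pl][1].append(title)
    return [
        (user, pl, title)
        for pl, (users, titles) in groups.items()
        for user in users
        for title in titles
    ]


left = sorted(replicated_join(big, small))
right = sorted(shuffle_join(big, small.items()))
print(left == right)  # True
```

Memory for the replicated join grows with the lookup table on every mapper, which matches the higher memory configuration the slide mentions for Sparkey jobs.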
  • 29. Scalding w.r.t. Playlist Recs
    ● Driven
      ○ “A sophisticated tool that collects telemetry data from running Scalding / Cascading jobs on a cluster and presenting them in an intriguing User Interface.”
      ○ http://cascading.io/
  • 33. Scalding w.r.t. Playlist Recs
    ● Other awesome benefits
      ○ Active community + big players
      ○ Data pipeline flows naturally follow the functional paradigm: essentially writing Scala code
  • 35. Scalding w.r.t. Playlist Recs
    Productivity without sacrificing performance!
  • 36. Status: Completed
    Spotify is hiring!
    Nikhil Tibrewal @nikhil_tibrewal