How to make sense of 150 TB of data, every day
Sebastian Widlund
widlund@spotify.com
[Chart: growth from 2009 to May '15 (x-axis: 2009, 2010, 2011, 2012, 2013, May '14, May '15; y-axis ticks at 27M, 53M, 80M)]
Spotify 75 million users
20 million paying users
Founded in 2008
3+ billion dollars paid to rights holders
30+ million tracks
1.5+ billion playlists
1500+ employees
“Big data is a broad term for data sets
so large or complex that traditional
data processing applications are
inadequate.”
What is big data?
14 TB of user/service-related log data per day
Streams/clicks/interactions are being logged
Expands to 150 TB every day
Combining data sources
We utilise a cluster of 1600 nodes
Spotify data architecture
[Diagram: Client, Internet, European Data Center and American Data Center (7 TB/day each), Hadoop Data Center]
Approximate the 60M users x 4M songs matrix with 40 latent factors, using ALS (alternating least squares)
In short, minimise the cost function:
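The cost function itself was an image on the slide and is not preserved here. As a minimal sketch, assume the standard regularised squared-error objective for matrix factorisation, ||R - X Y^T||^2 + reg * (||X||^2 + ||Y||^2), where R is the users x songs matrix and X, Y hold the latent factors. The function names, the `reg` parameter, and the toy data below are illustrative assumptions, not Spotify's implementation, and a dense NumPy version like this only works at toy scale.

```python
import numpy as np


def als(ratings, n_factors=40, reg=0.1, n_iters=10, seed=0):
    """Factorise `ratings` (users x songs) as X @ Y.T by alternating least squares.

    Assumed objective: ||R - X Y^T||_F^2 + reg * (||X||_F^2 + ||Y||_F^2).
    """
    rng = np.random.default_rng(seed)
    n_users, n_songs = ratings.shape
    X = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Y = rng.normal(scale=0.1, size=(n_songs, n_factors))  # song factors
    ridge = reg * np.eye(n_factors)
    for _ in range(n_iters):
        # Fix Y and solve the ridge regression for all users at once:
        # X (Y^T Y + reg I) = R Y
        X = np.linalg.solve(Y.T @ Y + ridge, Y.T @ ratings.T).T
        # Fix X and solve for all songs: Y (X^T X + reg I) = R^T X
        Y = np.linalg.solve(X.T @ X + ridge, X.T @ ratings).T
    return X, Y


def cost(ratings, X, Y, reg):
    """Value of the assumed objective for the given factors."""
    err = ratings - X @ Y.T
    return float(np.sum(err ** 2) + reg * (np.sum(X ** 2) + np.sum(Y ** 2)))


if __name__ == "__main__":
    # Toy data: a 20 x 15 matrix of exact rank 3 (hypothetical, for illustration).
    rng = np.random.default_rng(1)
    R = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 15))
    X, Y = als(R, n_factors=3, reg=0.01, n_iters=20)
    print("cost:", cost(R, X, Y, reg=0.01))
```

Each half-step is an exact least-squares solve with the other factor matrix held fixed, so the objective never increases from one iteration to the next. At the scale on the slide the same alternation would have to be distributed across a cluster.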
[Chart: usage by time of day (hours 0 to 23) and day of week (Sunday to Saturday)]

Spotify Teknikdagarna

Editor's Notes

  1. Hard to define exactly how many tracks there are. Deduplication: we have a dedicated team working on it. 30 million actual tracks once duplicates are removed.
  2. High volume, high velocity, high variety. Catalogue: 30 million rows (not that much). 30^2 (n^2) comparisons.
  3. Billions of lines of data every day. Anonymizing data, making sure that all data is handled according to privacy concerns. One machine reading at 160 MB/s would need about 10 days to read 150 TB of data. So how do we do it?
  4. 60 PB of disk space, 68 TB of RAM (42 GB per server), 30k CPU cores.
  5. Logs are sent from clients to the EU/US data centres.
  6. Very complex. Lots of different services. The app is a small part of everything. “The app does not just work.”
  7. Example of “Discovery”: logs come from the client, pass through Hadoop into a service that recommends music, and surface back to the user.
  8. Example of “Discovery”: logs come from the client, pass through Hadoop into a service that recommends music, and surface back to the user.
  9. Track usage, but important for breakdown. The reason we save a lot of data.
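
The single-machine estimate in note 3 can be checked with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and, for the cluster case, that all 1600 nodes read in parallel at the same 160 MB/s, which ignores replication, scheduling, and network overhead. The function name `days_to_read` is hypothetical.

```python
TB = 10**12  # decimal terabyte in bytes (assumption)
MB = 10**6   # decimal megabyte in bytes (assumption)
SECONDS_PER_DAY = 86_400


def days_to_read(total_bytes, bytes_per_second, machines=1):
    """Days needed to read `total_bytes` with `machines` readers in perfect parallel."""
    seconds = total_bytes / (bytes_per_second * machines)
    return seconds / SECONDS_PER_DAY


single = days_to_read(150 * TB, 160 * MB)
cluster = days_to_read(150 * TB, 160 * MB, machines=1600)

print(f"one machine: {single:.1f} days")
print(f"1600 nodes:  {cluster * 24 * 60:.1f} minutes")
```

Under these assumptions one machine needs roughly 11 days, in line with the note's figure, while an idealised 1600-node cluster would get through the same volume in under 10 minutes.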