SlideShare a Scribd company logo
June 12, 2014
Danielle Jabin
Data Engineer, A/B Testing
Data at Spotify
I’m Danielle Jabin
•  Data Engineer in the Stockholm office
•  A/B testing infrastructure
•  California born & raised
•  If I can survive a Swedish winter, so can you!
•  Studied Computer Science, Statistics, and Real Estate
through the M&T program at the University of
Pennsylvania
3
Over 40 million active
users
As of June 9, 2014	
  
4
Access to more than 20
million songs
As of June 9, 2014	
  
Big Data
•  40 million Monthly Active Users
•  20+ million tracks
•  1.5 TB of compressed data from users per day
•  64 TB of data generated in Hadoop each day (including
replication factor of 3)
As of June 9, 2014	
  
6
So how much data is that?
Let’s compare: 64 TB
•  293, 203, 072 books (200 pages or 240,000
characters)
•  16,777,216 MP3 files (with 4MB average file size)
•  22,369,600 images (with 3MB average file size)
8
That’s a lot of selfies
9
How do we use this data?
Use Cases
•  Reporting
•  Business Analytics
•  Operational Analytics
•  Product Features
Reporting
•  Reporting to labels, licensors, partners, and advertisers
•  We support our partners
Business Analytics
•  Analyzing growth, user behavior, sign-up funnels, etc
•  Company KPIs
•  NPS analysis
Operational Metrics
•  Root cause analysis
•  Latency analysis
•  Better capacity planning (servers, people, bandwidth)
Product Features
•  Discover and Radio
•  Top lists
•  Personalized recommendations
•  A/B Testing
15
How do we collect this
data?
The three pillars of our Data Infrastructure:
Kafka
Collection
Hadoop
Processing
Databases
Analytics/Visualization
This is Dave. Data Engineer at
Spotify by day…
…chiptune DJ Demoscene Time
Machine by night.
Let’s listen to Dave’s song
Kafka
•  High volume pub-sub
system
•  “Producers publish messages to
Kafka topics, and consumers
subscribe to these topics and
consume the messages.”
Kafka
•  Robust and scalable solution for collection of logs
•  Fast data transfer
•  Low CPU overhead
•  Built-in partitioning, replication, and fault-tolerance
•  Consumers can pull data at different rates
•  Able to handle extremely high volumes
Other people listened too!
Hadoop
•  Process and store massive amounts of unstructured data
across a distributed cluster
•  One cluster with 37 nodes to 690 nodes today
•  28 PB of storage
•  The largest Hadoop cluster in Europe
Hadoop
•  Entering the land of optimizations
•  Data retention policy
•  Move to JVM-based languages
•  MapReduce languages
•  Moving to Crunch, JVM-based, for speed and scalability
•  Python with Hadoop Streaming, Java, Hive, PIG, Scala
•  Sprunch: Crunch wrapper for Scala, open sourced by Spotify
•  Spotify open-sourced scheduler, Luigi, written in Python
•  Simple and easy way to chain jobs
What if we want to know more?
vs
Databases
•  Aggregates from Hadoop put into PostgreSQL or
Cassandra
•  Sqoop
•  Core data can be used and manipulated for various needs
•  Ad hoc queries
•  Dashboards
Databases
•  Aggregates from Hadoop put into PostgreSQL or
Cassandra
•  Sqoop
•  Ad hoc queries
•  Dashboards
Databases
•  Aggregates from Hadoop put into PostgreSQL or
Cassandra
•  Sqoop
•  Ad hoc queries
•  Dashboards
Questions?
A/B testing questions? Find me!
Contr
ol
vs
Thank you!

More Related Content

What's hot

Social Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of SpotifySocial Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of SpotifyValeria Aguerri
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)
Mounia Lalmas-Roelleke
 
Building Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyBuilding Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at Spotify
Vidhya Murali
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At Spotify
Vidhya Murali
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at Spotify
Neville Li
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
Chris Johnson
 
Product Owner presentation for Spotify
Product Owner presentation for SpotifyProduct Owner presentation for Spotify
Product Owner presentation for Spotify
pdicorpo
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014
Erik Bernhardsson
 
Spotify
SpotifySpotify
Spotify
Petr Bela
 
Analysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature IdeasAnalysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature Ideas
Sarah L. Miller
 
Spotify Company Presentation
Spotify Company PresentationSpotify Company Presentation
Spotify Company Presentation
Erik Forkin
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experience
Mounia Lalmas-Roelleke
 
Spotify presentation
Spotify presentationSpotify presentation
Spotify presentation
Ashla Lucy
 
Spotify
SpotifySpotify
Spotify
cagankoc
 
Digital Marketing - Spotify
Digital Marketing - SpotifyDigital Marketing - Spotify
Digital Marketing - Spotify
Laura Sorrentino
 
Spotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoverySpotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music Discovery
Karthik Murugesan
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
Andy Sloane
 
Digital strategy for spotify
Digital strategy for spotifyDigital strategy for spotify
Digital strategy for spotify
Rishabh Heer
 
Spotify presentation
Spotify presentationSpotify presentation
Spotify presentation
Uttaravalli Abhinav
 
Spotify Company presentation
Spotify Company presentationSpotify Company presentation
Spotify Company presentation
alifost
 

What's hot (20)

Social Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of SpotifySocial Media Monitoring: The Case of Spotify
Social Media Monitoring: The Case of Spotify
 
Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)Recommending and Searching (Research @ Spotify)
Recommending and Searching (Research @ Spotify)
 
Building Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at SpotifyBuilding Data Pipelines for Music Recommendations at Spotify
Building Data Pipelines for Music Recommendations at Spotify
 
Music Personalization At Spotify
Music Personalization At SpotifyMusic Personalization At Spotify
Music Personalization At Spotify
 
Storm at Spotify
Storm at SpotifyStorm at Spotify
Storm at Spotify
 
From Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover WeeklyFrom Idea to Execution: Spotify's Discover Weekly
From Idea to Execution: Spotify's Discover Weekly
 
Product Owner presentation for Spotify
Product Owner presentation for SpotifyProduct Owner presentation for Spotify
Product Owner presentation for Spotify
 
Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014Music recommendations @ MLConf 2014
Music recommendations @ MLConf 2014
 
Spotify
SpotifySpotify
Spotify
 
Analysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature IdeasAnalysis of Spotify & New Feature Ideas
Analysis of Spotify & New Feature Ideas
 
Spotify Company Presentation
Spotify Company PresentationSpotify Company Presentation
Spotify Company Presentation
 
Personalizing the listening experience
Personalizing the listening experiencePersonalizing the listening experience
Personalizing the listening experience
 
Spotify presentation
Spotify presentationSpotify presentation
Spotify presentation
 
Spotify
SpotifySpotify
Spotify
 
Digital Marketing - Spotify
Digital Marketing - SpotifyDigital Marketing - Spotify
Digital Marketing - Spotify
 
Spotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music DiscoverySpotify Machine Learning Solution for Music Discovery
Spotify Machine Learning Solution for Music Discovery
 
Machine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data MeetupMachine learning @ Spotify - Madison Big Data Meetup
Machine learning @ Spotify - Madison Big Data Meetup
 
Digital strategy for spotify
Digital strategy for spotifyDigital strategy for spotify
Digital strategy for spotify
 
Spotify presentation
Spotify presentationSpotify presentation
Spotify presentation
 
Spotify Company presentation
Spotify Company presentationSpotify Company presentation
Spotify Company presentation
 

Viewers also liked

Making Better Mistakes Tomorrow
Making Better Mistakes TomorrowMaking Better Mistakes Tomorrow
Making Better Mistakes Tomorrow
Danielle Jabin
 
NAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to SuccessNAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to Success
Devon Smith
 
A/B Testing - In data we trust
A/B Testing - In data we trustA/B Testing - In data we trust
A/B Testing - In data we trust
Pedro Marques
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
Justin Basilico
 
4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing
Janessa Lantz
 
Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)
Vlad Mysla
 
Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved
Peter Antman
 

Viewers also liked (7)

Making Better Mistakes Tomorrow
Making Better Mistakes TomorrowMaking Better Mistakes Tomorrow
Making Better Mistakes Tomorrow
 
NAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to SuccessNAMP Conference - A/B Testing Your Way to Success
NAMP Conference - A/B Testing Your Way to Success
 
A/B Testing - In data we trust
A/B Testing - In data we trustA/B Testing - In data we trust
A/B Testing - In data we trust
 
Learning a Personalized Homepage
Learning a Personalized HomepageLearning a Personalized Homepage
Learning a Personalized Homepage
 
4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing4 Steps Toward Scientific A/B Testing
4 Steps Toward Scientific A/B Testing
 
Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)Scaling Agile at Spotify (representation)
Scaling Agile at Spotify (representation)
 
Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved Growing up with agile - how the Spotify 'model' has evolved
Growing up with agile - how the Spotify 'model' has evolved
 

Similar to Data at Spotify

The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
Josh Baer
 
Liferay and Big Data
Liferay and Big DataLiferay and Big Data
Liferay and Big Data
Miguel Pastor
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
Adaryl "Bob" Wakefield, MBA
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
6535ANURAGANURAG
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
Jayant Mukherjee
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
neeraj rathore
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
Miguel Pastor
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
Amir Shaikh
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Simplilearn
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
Adaryl "Bob" Wakefield, MBA
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = Awesome
Adel Rahimi
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
Dr.K.Sreenivas Rao
 
From Big Data to Fast Data
From Big Data to Fast DataFrom Big Data to Fast Data
From Big Data to Fast Data
Sina Sheikholeslami
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
Softweb Solutions
 
Streaming data mining
Streaming data miningStreaming data mining
Streaming data miningAnkit Solanki
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Lucidworks
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
KMS Technology
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
RightScale
 

Similar to Data at Spotify (20)

The Evolution of Big Data at Spotify
The Evolution of Big Data at SpotifyThe Evolution of Big Data at Spotify
The Evolution of Big Data at Spotify
 
Liferay and Big Data
Liferay and Big DataLiferay and Big Data
Liferay and Big Data
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Hadoop HDFS.ppt
Hadoop HDFS.pptHadoop HDFS.ppt
Hadoop HDFS.ppt
 
Big Data & Hadoop Introduction
Big Data & Hadoop IntroductionBig Data & Hadoop Introduction
Big Data & Hadoop Introduction
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Big data
Big dataBig data
Big data
 
Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014Liferay & Big Data Dev Con 2014
Liferay & Big Data Dev Con 2014
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
Introduction To Hadoop | What Is Hadoop And Big Data | Hadoop Tutorial For Be...
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
 
Big Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = AwesomeBig Data + Sentiment Analysis = Awesome
Big Data + Sentiment Analysis = Awesome
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
From Big Data to Fast Data
From Big Data to Fast DataFrom Big Data to Fast Data
From Big Data to Fast Data
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
 
Big Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and moreBig Data in Action : Operations, Analytics and more
Big Data in Action : Operations, Analytics and more
 
Streaming data mining
Streaming data miningStreaming data mining
Streaming data mining
 
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...Searching The Enterprise Data Lake With Solr  - Watch Us Do It!: Presented by...
Searching The Enterprise Data Lake With Solr - Watch Us Do It!: Presented by...
 
An Introduction of Apache Hadoop
An Introduction of Apache HadoopAn Introduction of Apache Hadoop
An Introduction of Apache Hadoop
 
Getting Started with Big Data in the Cloud
Getting Started with Big Data in the CloudGetting Started with Big Data in the Cloud
Getting Started with Big Data in the Cloud
 

Recently uploaded

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 

Data at Spotify

  • 1. June 12, 2014 Danielle Jabin Data Engineer, A/B Testing Data at Spotify
  • 2. I’m Danielle Jabin •  Data Engineer in the Stockholm office •  A/B testing infrastructure •  California born & raised •  If I can survive a Swedish winter, so can you! •  Studied Computer Science, Statistics, and Real Estate through the M&T program at the University of Pennsylvania
  • 3. 3 Over 40 million active users As of June 9, 2014  
  • 4. 4 Access to more than 20 million songs As of June 9, 2014  
  • 5. Big Data •  40 million Monthly Active Users •  20+ million tracks •  1.5 TB of compressed data from users per day •  64 TB of data generated in Hadoop each day (including replication factor of 3) As of June 9, 2014  
  • 6. 6 So how much data is that?
  • 7. Let’s compare: 64 TB •  293, 203, 072 books (200 pages or 240,000 characters) •  16,777,216 MP3 files (with 4MB average file size) •  22,369,600 images (with 3MB average file size)
  • 8. 8 That’s a lot of selfies
  • 9. 9 How do we use this data?
  • 10. Use Cases •  Reporting •  Business Analytics •  Operational Analytics •  Product Features
  • 11. Reporting •  Reporting to labels, licensors, partners, and advertisers •  We support our partners
  • 12. Business Analytics •  Analyzing growth, user behavior, sign-up funnels, etc •  Company KPIs •  NPS analysis
  • 13. Operational Metrics •  Root cause analysis •  Latency analysis •  Better capacity planning (servers, people, bandwidth)
  • 14. Product Features •  Discover and Radio •  Top lists •  Personalized recommendations •  A/B Testing
  • 15. 15 How do we collect this data?
  • 16. The three pillars of our Data Infrastructure: Kafka Collection Hadoop Processing Databases Analytics/Visualization
  • 17. This is Dave. Data Engineer at Spotify by day…
  • 18. …chiptune DJ Demoscene Time Machine by night.
  • 19. Let’s listen to Dave’s song
  • 20. Kafka •  High volume pub-sub system •  “Producers publish messages to Kafka topics, and consumers subscribe to these topics and consume the messages.”
  • 21. Kafka •  Robust and scalable solution for collection of logs •  Fast data transfer •  Low CPU overhead •  Built-in partitioning, replication, and fault-tolerance •  Consumers can pull data at different rates •  Able to handle extremely high volumes
  • 23. Hadoop •  Process and store massive amounts of unstructured data across a distributed cluster •  One cluster with 37 nodes to 690 nodes today •  28 PB of storage •  The largest Hadoop cluster in Europe
  • 24. Hadoop •  Entering the land of optimizations •  Data retention policy •  Move to JVM-based languages •  MapReduce languages •  Moving to Crunch, JVM-based, for speed and scalability •  Python with Hadoop Streaming, Java, Hive, PIG, Scala •  Sprunch: Crunch wrapper for Scala, open sourced by Spotify •  Spotify open-sourced scheduler, Luigi, written in Python •  Simple and easy way to chain jobs
  • 25. What if we want to know more? vs
  • 26. Databases •  Aggregates from Hadoop put into PostgreSQL or Cassandra •  Sqoop •  Core data can be used and manipulated for various needs •  Ad hoc queries •  Dashboards
  • 27. Databases •  Aggregates from Hadoop put into PostgreSQL or Cassandra •  Sqoop •  Ad hoc queries •  Dashboards
  • 28. Databases •  Aggregates from Hadoop put into PostgreSQL or Cassandra •  Sqoop •  Ad hoc queries •  Dashboards
  • 30. A/B testing questions? Find me! Contr ol vs