SlideShare a Scribd company logo
1 of 45
Download to read offline
The Netflix way to deal
with real-time data
How we built a 1t/day stream processing cloud platform in a year
What should I expect
Keystone Season 1 - Who, What, How and Why
Keystone Season 2 - Preview Trailer
1.
The Who?
hello!
I am Peter Bakas
I lead the Real-Time Data Infrastructure team @ Netflix
You can find me at @peter_bakas
2.
The What?
Publish +
Collect +
Process +
Move Data
1,000,000,000,000
Whoa! That’s a big number.
Events Processed Every Day
Daily Averages
700B unique events ingested
1T events processed
1.4 PB
By the numbers
Peak
1T unique events ingested/day
12.5M/sec
35 GB/sec
Trending on Netflix
80B/day
1/2014
300B/day
1/2015
1T/day
1/2016
Growth has
its season
In the beginning - Chukwa
Q4 2014 - Chukwa/Suro
Q4 2015 - Keystone
Internal
Routing
Service
EMR
Fronting
Kafka
Event
Producer
Consumer
Kafka
Control Plane
HTTP
PROXY
Stream
Consumers
Kafka
Q4 2015 - Keystone
Internal
Routing
Service
EMR
Fronting
Kafka
Event
Producer
Consumer
Kafka
Control Plane
HTTP
PROXY
Stream
Consumers
Keystone Kafka Footprint
Fronting
Kafka
Consumer Kafka
Number of Clusters 24 8
Number of Instances 3000+ 900+
Retention Period 8 to 24 hrs 2 to 4 hrs
Q4 2015 - Keystone
Internal
Routing
Service
EMR
Fronting
Kafka
Event
Producer
Consumer
Kafka
Control Plane
HTTP
PROXY
Stream
Consumers
Keystone Internal Routing Service
+
Checkpointing
Cluster
+
Keystone Internal Routing Service
Keystone Internal Routing Service Footprint
S3 ElasticSearch
Consumer
Kafka
Number of
containers
7000 1500 4500
Keystone avg end-to-end metrics
S3 ElasticSearch
Consumer
Kafka
1 sec 13 sec 800 ms
3.
The How?
Netflix Culture
Freedom and Responsibility
“It may well be the most
important document to ever
come out of the Valley
Sheryl Sandberg, COO @ Facebook
What does
culture
have to do
with how?
Sounds easy
Build
Team
Build
Product
A true story
Keystone went live 10/27/15
2 days later...
Place your screenshot here
80% “loss” over
6 hour period
A true story
Lessons learned
There are times when things can go wrong… and no turning back
Reduce complexity
Minimize blast radius
Find a way to start over fresh
Failover
Cold standby Kafka cluster with different instance type
Different ZooKeeper cluster with no state
Fully automated
Place your screenshot here
Time is of the essence
Failover as fast as
5 minutes
Fully
Automated
Failover
Best Practices
Full Automation
Self Healing
Kafka Kong
4.
The Why?
“We didn’t do anything wrong, but
somehow, we lost...
Stephen Elop, Nokia CEO
Global Launch 1/6/16
125,000,000 hrs/day
That’s a lot of hours!
37 %
of North America internet traffic @ peak!
81,000,000 members
and a lot of members
If you don’t change,
you will be eliminated from the
competition
5.
Coming Soon
Our philosophy
Create Duplo R
Blocks :
Let reusability drive new value
Evolution
Keystone
Management
Keystone
Messaging
Keystone
Stream
Processing
Keystone
Unified event publishing, collection,
routing for batch and stream
processing
85% of data volume
Keystone Messaging
Ad-hoc Messaging
15% of data volume
Consumers weary of
Complexity of self-managed
infrastructure
Multiple runtimes across different
platforms
Keystone Stream Processing
Consumers want
Simple unified model/API/UI/system
Simple and intuitive interface to
manage all Keystone services
Keystone Management
thanks!
Any questions?
You can find me at
@peter_bakas
pbakas@netflix.com
Credits
Special thanks to all the people who made and released
these awesome resources for free:
Presentation template by SlidesCarnival
Photographs by Unsplash

More Related Content

What's hot

What's hot (20)

Distributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystemDistributed architecture in a cloud native microservices ecosystem
Distributed architecture in a cloud native microservices ecosystem
 
Netflix viewing data architecture evolution - EBJUG Nov 2014
Netflix viewing data architecture evolution - EBJUG Nov 2014Netflix viewing data architecture evolution - EBJUG Nov 2014
Netflix viewing data architecture evolution - EBJUG Nov 2014
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016Netflix keystone   streaming data pipeline @scale in the cloud-dbtb-2016
Netflix keystone streaming data pipeline @scale in the cloud-dbtb-2016
 
Engineering Leader opportunity @ Netflix - Playback Data Systems
Engineering Leader opportunity @ Netflix - Playback Data SystemsEngineering Leader opportunity @ Netflix - Playback Data Systems
Engineering Leader opportunity @ Netflix - Playback Data Systems
 
#lspe Q1 2013 dynamically scaling netflix in the cloud
#lspe Q1 2013   dynamically scaling netflix in the cloud#lspe Q1 2013   dynamically scaling netflix in the cloud
#lspe Q1 2013 dynamically scaling netflix in the cloud
 
(DVO204) Monitoring Strategies: Finding Signal in the Noise
(DVO204) Monitoring Strategies: Finding Signal in the Noise(DVO204) Monitoring Strategies: Finding Signal in the Noise
(DVO204) Monitoring Strategies: Finding Signal in the Noise
 
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument
(DVO205) Monitoring Evolution: Flying Blind to Flying by Instrument
 
Operational challenges behind Serverless architectures
Operational challenges behind Serverless architecturesOperational challenges behind Serverless architectures
Operational challenges behind Serverless architectures
 
netflix-real-time-data-strata-talk
netflix-real-time-data-strata-talknetflix-real-time-data-strata-talk
netflix-real-time-data-strata-talk
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Using Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session WindowsUsing Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session Windows
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
(ISM309) Efficient Innovation:High-Velocity Cost Management at Netflix
(ISM309) Efficient Innovation:High-Velocity Cost Management at Netflix(ISM309) Efficient Innovation:High-Velocity Cost Management at Netflix
(ISM309) Efficient Innovation:High-Velocity Cost Management at Netflix
 
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
Surge 2013: Maximizing Scalability, Resiliency, and Engineering Velocity in t...
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Unbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxiniUnbounded bounded-data-strangeloop-2016-monal-daxini
Unbounded bounded-data-strangeloop-2016-monal-daxini
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014Netflix at-disney-09-26-2014
Netflix at-disney-09-26-2014
 

Viewers also liked

Como utilizar las redes sociales
Como utilizar las redes socialesComo utilizar las redes sociales
Como utilizar las redes sociales
mberaliz
 
11. Huccet I Imaniye
11. Huccet I  Imaniye11. Huccet I  Imaniye
11. Huccet I Imaniye
Ahmet Türkan
 

Viewers also liked (20)

BDX 2016- Monal daxini @ Netflix
BDX 2016-  Monal daxini  @ NetflixBDX 2016-  Monal daxini  @ Netflix
BDX 2016- Monal daxini @ Netflix
 
Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1Hdfs 2016-hadoop-summit-dublin-v1
Hdfs 2016-hadoop-summit-dublin-v1
 
Como utilizar las redes sociales
Como utilizar las redes socialesComo utilizar las redes sociales
Como utilizar las redes sociales
 
Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)Matt Franklin - Apache Software (Geekfest)
Matt Franklin - Apache Software (Geekfest)
 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
 
Rootconf
RootconfRootconf
Rootconf
 
From Config Management Sucks to #cfgmgmtlove
From Config Management Sucks to #cfgmgmtlove From Config Management Sucks to #cfgmgmtlove
From Config Management Sucks to #cfgmgmtlove
 
Mesoscon 2015
Mesoscon 2015Mesoscon 2015
Mesoscon 2015
 
Apache hadoop technology : Beginners
Apache hadoop technology : BeginnersApache hadoop technology : Beginners
Apache hadoop technology : Beginners
 
How To Make Dev Ops Work @ Netlight Edge X Berlin
How To Make Dev Ops Work @ Netlight Edge X BerlinHow To Make Dev Ops Work @ Netlight Edge X Berlin
How To Make Dev Ops Work @ Netlight Edge X Berlin
 
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suroDevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
 
Path to continuous delivery
Path to continuous deliveryPath to continuous delivery
Path to continuous delivery
 
906702 Enhancing Business Processes Using Enterprise Information Systems
906702 Enhancing Business Processes Using Enterprise Information Systems906702 Enhancing Business Processes Using Enterprise Information Systems
906702 Enhancing Business Processes Using Enterprise Information Systems
 
11. Huccet I Imaniye
11. Huccet I  Imaniye11. Huccet I  Imaniye
11. Huccet I Imaniye
 
Building Product from ground up using Open Source Technologies
Building Product from ground up using Open Source TechnologiesBuilding Product from ground up using Open Source Technologies
Building Product from ground up using Open Source Technologies
 
IT_FOR_BUSINESS_30NOV15
IT_FOR_BUSINESS_30NOV15IT_FOR_BUSINESS_30NOV15
IT_FOR_BUSINESS_30NOV15
 
Data science team, a practice to setup
Data science team, a practice to setupData science team, a practice to setup
Data science team, a practice to setup
 
Send that (damn) elevator down !
Send that (damn) elevator down !Send that (damn) elevator down !
Send that (damn) elevator down !
 
Dev ops
Dev opsDev ops
Dev ops
 
The Rise of the Container: The Dev/Ops Technology That Accelerates Ops/Dev
The Rise of the Container:  The Dev/Ops Technology That Accelerates Ops/DevThe Rise of the Container:  The Dev/Ops Technology That Accelerates Ops/Dev
The Rise of the Container: The Dev/Ops Technology That Accelerates Ops/Dev
 

Similar to Keystone - Leverage Big Data 2016

Similar to Keystone - Leverage Big Data 2016 (20)

Liveperson DLD 2015
Liveperson DLD 2015 Liveperson DLD 2015
Liveperson DLD 2015
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data Pipelines
 
Resilient Event Driven Systems With Kafka
Resilient Event Driven Systems With KafkaResilient Event Driven Systems With Kafka
Resilient Event Driven Systems With Kafka
 
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
Shortening the Feedback Loop: How Spotify’s Big Data Ecosystem has evolved to...
 
Scaling your Kafka streaming pipeline can be a pain - but it doesn’t have to ...
Scaling your Kafka streaming pipeline can be a pain - but it doesn’t have to ...Scaling your Kafka streaming pipeline can be a pain - but it doesn’t have to ...
Scaling your Kafka streaming pipeline can be a pain - but it doesn’t have to ...
 
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?Scylla Summit 2018: OLAP or OLTP? Why Not Both?
Scylla Summit 2018: OLAP or OLTP? Why Not Both?
 
Apache Beam (incubating)
Apache Beam (incubating)Apache Beam (incubating)
Apache Beam (incubating)
 
KubeCon 2019 Recap (Parts 1-3)
KubeCon 2019 Recap (Parts 1-3)KubeCon 2019 Recap (Parts 1-3)
KubeCon 2019 Recap (Parts 1-3)
 
Evolving the Netflix API
Evolving the Netflix APIEvolving the Netflix API
Evolving the Netflix API
 
Apache Kafka® Delivers a Single Source of Truth for The New York Times
Apache Kafka® Delivers a Single Source of Truth for The New York TimesApache Kafka® Delivers a Single Source of Truth for The New York Times
Apache Kafka® Delivers a Single Source of Truth for The New York Times
 
Monal Daxini - Beaming Flink to the Cloud @ Netflix
Monal Daxini - Beaming Flink to the Cloud @ NetflixMonal Daxini - Beaming Flink to the Cloud @ Netflix
Monal Daxini - Beaming Flink to the Cloud @ Netflix
 
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
8 Lessons Learned from Using Kafka in 1000 Scala microservices - Scale by the...
 
James Watters Kafka Summit NYC 2019 Keynote
James Watters Kafka Summit NYC 2019 KeynoteJames Watters Kafka Summit NYC 2019 Keynote
James Watters Kafka Summit NYC 2019 Keynote
 
"Leveraging the Event Loop for Blazing-Fast Applications!", Michael Di Prisco
"Leveraging the Event Loop for Blazing-Fast Applications!",  Michael Di Prisco"Leveraging the Event Loop for Blazing-Fast Applications!",  Michael Di Prisco
"Leveraging the Event Loop for Blazing-Fast Applications!", Michael Di Prisco
 
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
Flink Forward Berlin 2018: Steven Wu - "Failure is not fatal: what is your re...
 
Robust stream processing with Apache Flink
Robust stream processing with Apache FlinkRobust stream processing with Apache Flink
Robust stream processing with Apache Flink
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
How Netflix Directs 1/3rd of Internet Traffic
How Netflix Directs 1/3rd of Internet TrafficHow Netflix Directs 1/3rd of Internet Traffic
How Netflix Directs 1/3rd of Internet Traffic
 
Flink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paasFlink forward-2017-netflix keystones-paas
Flink forward-2017-netflix keystones-paas
 

Recently uploaded

Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 

Recently uploaded (20)

ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
Choosing the Right FDO Deployment Model for Your Application _ Geoffrey at In...
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024What's New in Teams Calling, Meetings and Devices April 2024
What's New in Teams Calling, Meetings and Devices April 2024
 
Using IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & IrelandUsing IESVE for Room Loads Analysis - UK & Ireland
Using IESVE for Room Loads Analysis - UK & Ireland
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
How we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdfHow we scaled to 80K users by doing nothing!.pdf
How we scaled to 80K users by doing nothing!.pdf
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdfSimplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
Simplified FDO Manufacturing Flow with TPMs _ Liam at Infineon.pdf
 

Keystone - Leverage Big Data 2016