SlideShare a Scribd company logo
1 of 9
Download to read offline
Moving from 99.9 to 99.99
availability using Kafka
Tejas Chopra (Netflix, Inc.)
Agenda
- Introduction
- Problems with Cloud Storage, and ways around it
- What is availability?
- Uploads with 99.9% availability
- Uploads with 99.99% availability
- Takeaways & Lessons
Introduction
- Senior Software Engineer, Netflix
- Keynote Speaker: Cloud, Distributed
Systems, Blockchain
- Senior Software Engineer, Box.
- Datrium, Samsung, Tensilica, Apple,
Inc.
Cloud Conundrums
- Cheap to put data into cloud
- Pay to store it, pay even more to read it
- Solution:
- What if we can store a copy of data on
premise?
- Saves on reads
- Hot data can be on-premise, archival on
cloud
- Security, latency,
- Save millions of dollars per year
- Box: petabytes, Netflix: exabytes
Availability
- What is it?
- For on-premise
- For Cloud
- Gartner: Avg cost of downtime:
$5600/min.
- 99.9% : $2.8M
- 99.99%: $291K
99.9% solution
- Upload to on-premise (availability =
99.9%)
- Use kafka events to trigger uploads to
cloud
- Reads served from on-premise if
present, else fetched from cloud
99.99% solution
- Split the incoming stream to both
cloud and on-premise
- Queue failed on-premise requests
using Kafka
- Use cloud to hydrate failed uploads
on-premise
- 99.99% availability
Takeaways and Lessons
- Millions of customers, and billions of files uploaded: Kafka scales without
downtime. Kafka throughput: thousands of messages per second
- Kafka persistence compared to Kinesis - very critical in designing rehydration
strategies
- Batch handling abilities of Kafka - very useful for non-critical data - thumbnails.
- Tracking of offsets in a partition: left to the consumers.
- Kafka cluster management
Thank you
- Email: chopratejas@gmail.com
- LinkedIn: https://www.linkedin.com/in/chopratejas
- Twitter: @chopra_tejas

More Related Content

What's hot

Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
confluent
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
HostedbyConfluent
 
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
confluent
 

What's hot (20)

Solving Hybrid Cloud Data Replication with Apache Cassandra
Solving Hybrid Cloud Data Replication with Apache CassandraSolving Hybrid Cloud Data Replication with Apache Cassandra
Solving Hybrid Cloud Data Replication with Apache Cassandra
 
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
ClickHouse Paris Meetup. Pragma Analytics Software Suite w/ClickHouse, by Mat...
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
Putting Kafka Together with the Best of Google Cloud Platform
Putting Kafka Together with the Best of Google Cloud Platform Putting Kafka Together with the Best of Google Cloud Platform
Putting Kafka Together with the Best of Google Cloud Platform
 
Should You Read Kafka as a Stream or in Batch? Should You Even Care? | Ido Na...
Should You Read Kafka as a Stream or in Batch? Should You Even Care? | Ido Na...Should You Read Kafka as a Stream or in Batch? Should You Even Care? | Ido Na...
Should You Read Kafka as a Stream or in Batch? Should You Even Care? | Ido Na...
 
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, DigitalisCapacity Planning Your Kafka Cluster | Jason Bell, Digitalis
Capacity Planning Your Kafka Cluster | Jason Bell, Digitalis
 
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
Use ksqlDB to migrate core-banking processing from batch to streaming | Mark ...
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
 
Introduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OKIntroduction to Data Engineer and Data Pipeline at Credit OK
Introduction to Data Engineer and Data Pipeline at Credit OK
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
Moving 150 TB of data resiliently on Kafka With Quorum Controller on Kubernet...
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 
Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to Cassandra
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift Creating an Elastic Platform Using Kafka and Microservices in OpenShift
Creating an Elastic Platform Using Kafka and Microservices in OpenShift
 
O'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data PipelinesO'Reilly Media Webcast: Building Real-Time Data Pipelines
O'Reilly Media Webcast: Building Real-Time Data Pipelines
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 

Similar to Going from three nines to four nines using Kafka | Tejas Chopra, Netflix

Sdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distributionSdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distribution
Jerome Lecat
 
Open stack in action enovance - cloudwatt - european ambitions for openstack
Open stack in action   enovance - cloudwatt - european ambitions for openstackOpen stack in action   enovance - cloudwatt - european ambitions for openstack
Open stack in action enovance - cloudwatt - european ambitions for openstack
eNovance
 

Similar to Going from three nines to four nines using Kafka | Tejas Chopra, Netflix (20)

stackArmor - Security MicroSummit - McAfee
stackArmor - Security MicroSummit - McAfeestackArmor - Security MicroSummit - McAfee
stackArmor - Security MicroSummit - McAfee
 
File Server and Storage Consolidation in the Cloud
File Server and Storage Consolidation in the CloudFile Server and Storage Consolidation in the Cloud
File Server and Storage Consolidation in the Cloud
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
quicloud Apr 20 2010 Boulder New Tech Presentation
quicloud Apr 20 2010 Boulder New Tech Presentationquicloud Apr 20 2010 Boulder New Tech Presentation
quicloud Apr 20 2010 Boulder New Tech Presentation
 
Dok Talks #122 - Operationalizing a Data Infrastructure Stack on Kubernetes
Dok Talks #122 - Operationalizing a Data Infrastructure Stack on KubernetesDok Talks #122 - Operationalizing a Data Infrastructure Stack on Kubernetes
Dok Talks #122 - Operationalizing a Data Infrastructure Stack on Kubernetes
 
CHARM(A Cost-Efficient Multi-Cloud Data Hosting Scheme with High Availability)
CHARM(A Cost-Efficient Multi-Cloud Data Hosting Scheme with High Availability)CHARM(A Cost-Efficient Multi-Cloud Data Hosting Scheme with High Availability)
CHARM(A Cost-Efficient Multi-Cloud Data Hosting Scheme with High Availability)
 
Sdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distributionSdc2010 scality cloud storage vs object storage for distribution
Sdc2010 scality cloud storage vs object storage for distribution
 
Y. Tsesmelis, Uni Systems: Quarkus use cases and business value
Y. Tsesmelis, Uni Systems: Quarkus use cases and business valueY. Tsesmelis, Uni Systems: Quarkus use cases and business value
Y. Tsesmelis, Uni Systems: Quarkus use cases and business value
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Architecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big DataArchitecting Virtualized Infrastructure for Big Data
Architecting Virtualized Infrastructure for Big Data
 
SIGGRAPH Presentation 2016 Slides
SIGGRAPH Presentation 2016 SlidesSIGGRAPH Presentation 2016 Slides
SIGGRAPH Presentation 2016 Slides
 
Webinar: Five Reasons Modern Data Centers Need Tape
Webinar: Five Reasons Modern Data Centers Need TapeWebinar: Five Reasons Modern Data Centers Need Tape
Webinar: Five Reasons Modern Data Centers Need Tape
 
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamComputeIn-Ceph-tion: Deploying a Ceph cluster on DreamCompute
In-Ceph-tion: Deploying a Ceph cluster on DreamCompute
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
 
Kafka spark cassandra webinar feb 16 2016
Kafka spark cassandra   webinar feb 16 2016 Kafka spark cassandra   webinar feb 16 2016
Kafka spark cassandra webinar feb 16 2016
 
Cluster computing
Cluster computingCluster computing
Cluster computing
 
OSCON 15 Building Opensource wtih Open Source
OSCON 15 Building Opensource wtih Open SourceOSCON 15 Building Opensource wtih Open Source
OSCON 15 Building Opensource wtih Open Source
 
Widespread Cloud Adoption: What's Taking So Long?
Widespread Cloud Adoption: What's Taking So Long?Widespread Cloud Adoption: What's Taking So Long?
Widespread Cloud Adoption: What's Taking So Long?
 
Open stack in action enovance - cloudwatt - european ambitions for openstack
Open stack in action   enovance - cloudwatt - european ambitions for openstackOpen stack in action   enovance - cloudwatt - european ambitions for openstack
Open stack in action enovance - cloudwatt - european ambitions for openstack
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
 

More from HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

More from HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Going from three nines to four nines using Kafka | Tejas Chopra, Netflix

  • 1. Moving from 99.9 to 99.99 availability using Kafka Tejas Chopra (Netflix, Inc.)
  • 2. Agenda - Introduction - Problems with Cloud Storage, and ways around it - What is availability? - Uploads with 99.9% availability - Uploads with 99.99% availability - Takeaways & Lessons
  • 3. Introduction - Senior Software Engineer, Netflix - Keynote Speaker: Cloud, Distributed Systems, Blockchain - Senior Software Engineer, Box. - Datrium, Samsung, Tensilica, Apple, Inc.
  • 4. Cloud Conundrums - Cheap to put data into cloud - Pay to store it, pay even more to read it - Solution: - What if we can store a copy of data on premise? - Saves on reads - Hot data can be on-premise, archival on cloud - Security, latency, - Save millions of dollars per year - Box: petabytes, Netflix: exabytes
  • 5. Availability - What is it? - For on-premise - For Cloud - Gartner: Avg cost of downtime: $5600/min. - 99.9% : $2.8M - 99.99%: $291K
  • 6. 99.9% solution - Upload to on-premise (availability = 99.9%) - Use kafka events to trigger uploads to cloud - Reads served from on-premise if present, else fetched from cloud
  • 7. 99.99% solution - Split the incoming stream to both cloud and on-premise - Queue failed on-premise requests using Kafka - Use cloud to hydrate failed uploads on-premise - 99.99% availability
  • 8. Takeaways and Lessons - Millions of customers, and billions of files uploaded: Kafka scales without downtime. Kafka throughput: thousands of messages per second - Kafka persistence compared to Kinesis - very critical in designing rehydration strategies - Batch handling abilities of Kafka - very useful for non-critical data - thumbnails. - Tracking of offsets in a partition: left to the consumers. - Kafka cluster management
  • 9. Thank you - Email: chopratejas@gmail.com - LinkedIn: https://www.linkedin.com/in/chopratejas - Twitter: @chopra_tejas