The document discusses using Apache Kafka to raise data upload availability from 99.9% to 99.99% when moving data between on-premise and cloud storage. For 99.9% availability, uploads go to on-premise storage and Kafka events trigger copies to the cloud. For 99.99% availability, Kafka is used to split uploads between cloud and on-premise storage, and failed on-premise uploads are rehydrated from the cloud. The presentation concludes that Kafka provides the high throughput and persistence needed to design effective rehydration strategies across cloud and on-premise storage.
Going from three nines to four nines using Kafka | Tejas Chopra, Netflix
1. Moving from 99.9% to 99.99% availability using Kafka
Tejas Chopra (Netflix, Inc.)
2. Agenda
- Introduction
- Problems with Cloud Storage, and ways around it
- What is availability?
- Uploads with 99.9% availability
- Uploads with 99.99% availability
- Takeaways & Lessons
4. Cloud Conundrums
- Cheap to put data into cloud
- Pay to store it, pay even more to read it
- Solution:
- What if we could store a copy of the data on-premise?
- Saves on reads
- Hot data can live on-premise, archival data in the cloud
- Security and latency benefits
- Save millions of dollars per year
- Box: petabytes, Netflix: exabytes
5. Availability
- What is it?
- For on-premise
- For cloud
- Gartner: average cost of downtime is $5,600/min
- 99.9%: $2.8M per year
- 99.99%: $291K per year
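The slide's cost figures follow from the downtime budget each availability level allows per year. The sketch below assumes a 365-day year (525,600 minutes) and the $5,600/min Gartner figure quoted above; with those assumptions it gives roughly $2.94M for 99.9% and roughly $294K for 99.99%, in the same range as the slide's numbers. The slide's exact figures may rest on slightly different assumptions about year length or rounding.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year
COST_PER_MINUTE = 5600            # Gartner's average cost of downtime, in USD


def annual_downtime_minutes(availability: float) -> float:
    """Minutes per year a service may be down at the given availability (0..1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def annual_downtime_cost(availability: float) -> float:
    """Yearly downtime cost in USD at the assumed $5,600/min rate."""
    return annual_downtime_minutes(availability) * COST_PER_MINUTE


if __name__ == "__main__":
    for label, a in [("99.9%", 0.999), ("99.99%", 0.9999)]:
        minutes = annual_downtime_minutes(a)
        cost = annual_downtime_cost(a)
        print(f"{label}: {minutes:.1f} min/yr of downtime, about ${cost:,.0f}/yr")
```

Each extra nine cuts the allowed downtime, and therefore the expected cost, by a factor of ten.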
6. 99.9% solution
- Upload to on-premise storage (availability = 99.9%)
- Use Kafka events to trigger uploads to the cloud
- Serve reads from on-premise if present, else fetch from the cloud
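A minimal sketch of the 99.9% flow, assuming a hypothetical `Store` class for both storage tiers and an in-memory `deque` standing in for the Kafka topic of upload events (no real Kafka client is used). Every upload must succeed on-premise first, which is why upload availability is capped by the on-premise store; the cloud copy happens asynchronously when the event is consumed, and reads fall back to the cloud.

```python
from collections import deque
from typing import Deque, Dict, Optional


class Store:
    """Hypothetical blob store standing in for on-premise or cloud storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: Dict[str, bytes] = {}
        self.available = True

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise IOError(f"{self.name} unavailable")
        self.blobs[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self.blobs.get(key) if self.available else None


on_prem = Store("on-prem")
cloud = Store("cloud")
# In-memory stand-in for a Kafka topic of "object uploaded" events (assumption).
upload_events: Deque[str] = deque()


def upload(key: str, data: bytes) -> None:
    """Write on-premise first; this raises if on-prem is down (the 99.9% bottleneck)."""
    on_prem.put(key, data)
    upload_events.append(key)  # publish an event so the cloud copy happens later


def drain_cloud_uploader() -> int:
    """Consumer: copy each announced object from on-premise to the cloud."""
    copied = 0
    while upload_events:
        key = upload_events[0]  # peek, like reading without committing the offset
        data = on_prem.get(key)
        if data is None or not cloud.available:
            break  # leave the event queued and retry on a later drain
        cloud.put(key, data)
        upload_events.popleft()  # "commit" only after the cloud copy succeeds
        copied += 1
    return copied


def read(key: str) -> Optional[bytes]:
    """Serve from on-premise if present, otherwise fall back to the cloud."""
    data = on_prem.get(key)
    return data if data is not None else cloud.get(key)
```

Removing an event only after the cloud write succeeds means a failed copy is retried rather than lost; in a real Kafka deployment the equivalent choice is when the consumer commits its offset.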
7. 99.99% solution
- Split the incoming stream to both cloud and on-premise storage
- Queue failed on-premise requests using Kafka
- Use the cloud copy to rehydrate failed uploads on-premise
- 99.99% availability
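The split-and-rehydrate design can be sketched as follows. This is an illustration under stated assumptions, not the talk's implementation: `Store` is a hypothetical blob store, an in-memory `deque` stands in for the Kafka topic of failed on-premise writes, and the sketch assumes an upload counts as successful once the cloud write succeeds, so an on-premise outage no longer fails the request.

```python
from collections import deque
from typing import Deque, Dict, Optional


class Store:
    """Hypothetical blob store standing in for on-premise or cloud storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blobs: Dict[str, bytes] = {}
        self.available = True

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise IOError(f"{self.name} unavailable")
        self.blobs[key] = data

    def get(self, key: str) -> Optional[bytes]:
        return self.blobs.get(key) if self.available else None


on_prem = Store("on-prem")
cloud = Store("cloud")
# In-memory stand-in for a Kafka topic holding failed on-premise writes (assumption).
failed_on_prem: Deque[str] = deque()


def upload(key: str, data: bytes) -> None:
    """Split the write to both stores; an on-prem failure is queued, not surfaced."""
    cloud.put(key, data)  # assumption: the upload fails only if the cloud write fails
    try:
        on_prem.put(key, data)
    except IOError:
        failed_on_prem.append(key)  # record the key for later rehydration


def rehydrate() -> int:
    """Consumer: copy queued objects from the cloud back to on-premise storage."""
    restored = 0
    while failed_on_prem and on_prem.available:
        key = failed_on_prem[0]  # peek; remove only after the copy succeeds
        data = cloud.get(key)
        if data is None:
            break  # cloud copy not readable right now; retry later
        on_prem.put(key, data)
        failed_on_prem.popleft()
        restored += 1
    return restored


def read(key: str) -> Optional[bytes]:
    """Serve from on-premise if present, otherwise fall back to the cloud."""
    data = on_prem.get(key)
    return data if data is not None else cloud.get(key)
```

The queue of failed writes has to survive until on-premise storage recovers, which is one reason the talk stresses Kafka's persistence when designing rehydration strategies.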
8. Takeaways and Lessons
- Millions of customers and billions of files uploaded: Kafka scales without downtime. Kafka throughput: thousands of messages per second
- Kafka's persistence, compared to Kinesis, is critical in designing rehydration strategies
- Kafka's batch handling is very useful for non-critical data such as thumbnails
- Tracking offsets within a partition is left to the consumers
- Kafka cluster management
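The last two lessons, batch handling for non-critical data and consumer-side offset tracking, fit together: a consumer can process a batch of messages (for example, thumbnail jobs) and advance its offset only after the whole batch succeeds. The sketch below uses a hypothetical `PartitionConsumer` and models a partition as a plain Python list; it is not a Kafka client API, and a real consumer would fetch from a broker and commit through its client library.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PartitionConsumer:
    """Hypothetical consumer that tracks its own committed offset in one partition.

    The partition is modeled as a list of messages (an assumption for this sketch).
    """

    partition: List[str]
    committed: int = 0  # offset of the next message to process

    def poll_batch(self, max_records: int) -> List[str]:
        """Return up to max_records messages starting at the committed offset."""
        return self.partition[self.committed:self.committed + max_records]

    def process_batch(self, max_records: int,
                      handler: Callable[[List[str]], None]) -> int:
        """Process one batch; the offset advances only if the handler succeeds."""
        batch = self.poll_batch(max_records)
        if not batch:
            return 0
        handler(batch)                # if this raises, the offset stays put
        self.committed += len(batch)  # at-least-once: commit after processing
        return len(batch)
```

Committing after processing gives at-least-once delivery: a batch that fails is re-read on the next poll, which suits idempotent, non-critical work such as regenerating thumbnails.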