When One Data Center is not Enough
Guozhang Wang Strata San Jose, 2016
Building large-scale stream infrastructure across multiple data centers with Apache Kafka
2
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
3
Why across Data Centers?
4
Why across Data Centers
• Catastrophic / expected failures
• Routine maintenance
• Geo-locality (Example: CDNs)
5
Why NOT across Data Centers
• Low bandwidth (10Mbps - 1Gbps)
• High latency (50ms - 450ms)
• Much More $$$
6
Why NOT across Data Centers
• … is hard and expensive
7
Why NOT across Data Centers
• … is hard and expensive
• … with real-time writes? Harder
8
Why NOT across Data Centers
• … is hard and expensive
• … with real-time writes? Harder
• … consistently? Oh My!
9
Consistency
• Weak
• Eventual
• Strong
[Diagram: the latency vs. guarantee trade-off across these levels]
10
Weak / No Consistency
• Now you see my writes, now you don’t
• Best effort only, data can be stale
• Examples: think of “caches”, VoIP
11
Eventual Consistency
• You will see my writes, … eventually
• May need to resolve conflicts (manually)
• Examples: think of “emails”, SMTP
12
Strong Consistency
• You get what you write, for sure
• External > Sequential > Causal (Session)
• Examples: RDBMS, file systems
13
Latency vs. Consistency
• LAN: consistency over latency
• WAN: latency over consistency
14
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
15
Option I: Don’t do it
• Bunkerize the single data center
• Expect data loss at failures
• Examples: ??
16
Option II: Primary with Hot Standby
• Failover to hot standby (maybe inconsistent)
• Window of data loss at failures
• Examples: MySQL binlog
17
Option III: Active-Active
• Accept writes in multiple DCs
• Resolve conflicts (strong / weak consistency)
• Examples: Amazon DynamoDB (vector clock), Google Spanner (2PC), Mesa (Paxos)
18
Ordering is the Key!
19
Ordering is Key
• Vector clocks: partial ordering (see the sketch below)
• Paxos, 2PC: global ordering
• Log shipping: logical ordering (per-partition)
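Aside on the first bullet: vector clocks give only a partial order, so two writes can be incomparable and must be surfaced as a conflict rather than resolved automatically. A small illustrative Java sketch; the class, helper, and clock values are hypothetical, not from any library used in the talk.

public class VectorClockDemo {
    // a "happened before" b: every component of a is <= the matching
    // component of b, and at least one is strictly smaller.
    static boolean happenedBefore(int[] a, int[] b) {
        boolean strictlyLess = false;
        for (int i = 0; i < a.length; i++) {
            if (a[i] > b[i]) return false;
            if (a[i] < b[i]) strictlyLess = true;
        }
        return strictlyLess;
    }

    public static void main(String[] args) {
        int[] w1 = {2, 0, 1}; // clock attached to a write accepted in DC 1
        int[] w2 = {1, 1, 1}; // clock attached to a concurrent write in DC 2
        // Neither ordering holds, so the writes are concurrent:
        // this is the conflict an active-active store must resolve.
        System.out.println(happenedBefore(w1, w2)); // false
        System.out.println(happenedBefore(w2, w1)); // false
    }
}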
21
Apache Kafka
• A distributed messaging system
…that stores messages as a log!
22
Store Messages as a Log
[Diagram: an append-only log holding messages at offsets 3 through 12; the producer writes at the log end, Consumer1 reads at offset 7, and Consumer2 reads at offset 10]
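To make the offset model concrete, here is a minimal sketch with the Java client (kafka-clients), playing the role of Consumer1 in the diagram; the broker address and the topic name "events" are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class LogReader {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 7); // each consumer owns its position in the log
            for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(1)))
                System.out.printf("offset=%d value=%s%n", r.offset(), r.value());
        }
    }
}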
23
Partition the Log across Machines
[Diagram: Topic 1 and Topic 2 each split into partitions spread across brokers; producers write to and consumers read from the partitions]
24
ACK mode   Latency                On Failures
"no"       no network delay       some data loss
"leader"   1 network roundtrip    little data loss
"all"      ~2 network roundtrips  no data loss
(Configurable ISR commits)
25
Agenda
• Why across Data Centers?
• Design patterns for Multi-DC
• Kafka for Multi-DC
• Conclusion
26
Option I: Active-Passive Replication
[Diagram: producers write to a local Kafka cluster in DC 1; MirrorMaker replicates it to a replica Kafka cluster in DC 2; consumers read in both DCs]
27
Option I: Active-Passive Replication
• Async replication across DCs (sketched below)
• May lose data on failover
• Example: ETL to data warehouse / HDFS
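MirrorMaker is the stock tool here; as a hedged illustration of what it does under the hood, a minimal consume-and-reproduce loop in the Java client. The cluster addresses (kafka-dc1, kafka-dc2) and the topic name are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MiniMirror {
    public static void main(String[] args) {
        Properties src = new Properties(); // consume from the local cluster in DC 1
        src.put("bootstrap.servers", "kafka-dc1:9092");
        src.put("group.id", "mirror");
        src.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        src.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        Properties dst = new Properties(); // produce to the replica cluster in DC 2
        dst.put("bootstrap.servers", "kafka-dc2:9092");
        dst.put("acks", "all");
        dst.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        dst.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(src);
             KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(dst)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                for (ConsumerRecord<byte[], byte[]> r : consumer.poll(Duration.ofSeconds(1)))
                    producer.send(new ProducerRecord<>(r.topic(), r.key(), r.value()));
                producer.flush();      // wait until the replica has the batch...
                consumer.commitSync(); // ...then advance the mirror's position
            }
        }
    }
}

Because this replication is asynchronous, anything committed in DC 1 but not yet forwarded is lost on failover, which is exactly the second bullet above.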
28
Option II: Active-Active Replication
[Diagram: producers write to a local Kafka cluster in each DC; MirrorMaker copies both local clusters into an aggregate Kafka cluster in each DC; consumers read from their DC's aggregate cluster, failing over to the other aggregate on DC 1 failure]
29
Option II: Active-Active Replication
• Global view on agg. cluster
• Requires offsets to resume
• Example: store materialization, index updates
30
Caveats: offsets across DCs
• Offsets not identical between Kafka clusters
• Duplicates during failover
• Partition selection may be different
• Solutions (sketched below):
  • Resume from log end offset (suitable for real-time apps)
  • Resume from a timestamp (ListOffsets, offset index: KIP-33)
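A sketch of the timestamp-based resume. offsetsForTimes() depends on the KIP-33 time index and landed in the Java client with Kafka 0.10.1, i.e. after this talk, so treat it as forward-looking; the address, topic, and checkpoint timestamp are placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class FailoverResume {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-dc2:9092"); // the surviving DC's aggregate cluster
        props.put("group.id", "app");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // The app's own checkpoint; offsets carried over from DC 1 are useless here.
        long lastProcessedTs = System.currentTimeMillis() - 60_000;

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition tp = new TopicPartition("events", 0);
            consumer.assign(Collections.singletonList(tp));
            Map<TopicPartition, OffsetAndTimestamp> found =
                consumer.offsetsForTimes(Collections.singletonMap(tp, lastProcessedTs));
            OffsetAndTimestamp ot = found.get(tp);
            if (ot != null) consumer.seek(tp, ot.offset()); // expect duplicates near the cutover
            else consumer.seekToEnd(Collections.singletonList(tp)); // real-time apps: log end
            consumer.poll(Duration.ofSeconds(1));
        }
    }
}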
31
Option III: Deploy across DCs
[Diagram: one Kafka cluster stretched across DC 1 and DC 2, with producers and consumers in both DCs]
32
Option III: Deploy across DCs
• Multi-tenancy support
• Security (0.9)
• Quota Management (0.9)
• Latency optimization
  • Rack-aware partition assignment (0.10)
  • Read affinity (future? see the note below)
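A note on the last bullet: "read affinity" later shipped as KIP-392 (fetch from the closest replica) in Kafka 2.4, well after this talk. A minimal consumer sketch assuming the brokers set broker.rack and replica.selector.class; the address and rack name are placeholders.

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RackAwareConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("group.id", "app");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("client.rack", "dc1"); // prefer replicas whose broker.rack matches
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe and poll as usual; fetches go to the nearest in-sync replica
        }
    }
}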
33
Example: EC2 multi-AZ Deployment
• Same region: essentially same network
  • asymmetric partitioning is rare, low latency
• Need at least 3 DCs for Zookeeper
  • Reserved instances to reduce churn
• EIP for external clients, private IPs for internal communication
• Reserved instances, local storage
34
Take-aways
• Multi-DC: trade-off between latency and consistency
• Kafka: replicated log streams for multihoming
Thank you
Guozhang | guozhang@confluent.io | @guozhangwang
Meet Confluent in booth #838 

Confluent University ~ Kafka training ~ confluent.io/training
Join the Stream Data Hackathon Apr 25, SF kafka-summit.org/hackathon/
Download Apache Kafka & Confluent Platform confluent.io/download
To manage the ever-increasing volume and velocity of data within your company, you have successfully made the transition from single machines and one-off solutions to large distributed stream infrastructures in your data center, powered by Apache Kafka. But what if one data center is not enough? This talk describes building resilient data pipelines with Apache Kafka that span multiple data centers and points of presence, and provides an overview of best practices and common patterns, covering key areas such as architecture guidelines, data replication, and mirroring, as well as disaster scenarios and failure handling.

