Multi-Datacenter Kafka - Strata San Jose 2017
Strategies and Tips for Running Kafka in Multiple Data-Centers.

Published in: Data & Analytics
  1. When One Data Center Is Not Enough: Building Large-scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka (Gwen Shapira)
  2. There’s a book on that! Actually… a chapter
  3. Outline: Kafka overview • Common multi-data-center patterns • Future stuff
  4. What is Kafka? ▪ It’s like a message queue, right? - Actually, it’s a “distributed commit log” - Or a “streaming data platform” [Diagram: a partition as a log of offsets 0–8, appended to by a data source and read independently by data consumers A and B]
  5. Topics and Partitions ▪ Messages are organized into topics, and each topic is split into partitions. - Each partition is an immutable, time-sequenced log of messages on disk. - Note that time ordering is guaranteed within, but not across, partitions. [Diagram: a topic split into Partition 0, Partition 1, and Partition 2, each an append-only log of offsets 0–8 written by a data source]
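The per-key ordering that partitions provide comes from deterministic key-to-partition assignment: every message with the same key goes to the same partition. A simplified sketch of that idea (the Java client's default partitioner actually uses murmur2 hashing; MD5 is used here purely for illustration):

```python
import hashlib

def assign_partition(key: bytes, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default partitioner.
    Deterministic hashing means the same key always lands in the
    same partition, which is what gives per-key ordering.
    (The real Java client uses murmur2, not MD5.)"""
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Same key -> same partition, so all events for one entity stay ordered.
assert assign_partition(b"user-42", 4) == assign_partition(b"user-42", 4)
```

Note that changing `num_partitions` reshuffles keys, which is one reason offsets and key placement differ between clusters with different partition counts.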
  6. Scalable consumption model [Diagram: Topic T1 with partitions 0–3. With a single consumer in Consumer Group 1, Consumer 1 reads all four partitions; with four consumers in the group, each consumer reads exactly one partition]
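The group-scaling behavior on this slide can be sketched as a small assignment function. Kafka actually uses pluggable assignors (range, round-robin) negotiated by the group coordinator; this minimal round-robin version just illustrates that each partition belongs to exactly one consumer in the group:

```python
def assign_round_robin(partitions, consumers):
    """Sketch of how a consumer group splits a topic's partitions:
    each partition is owned by exactly one consumer, so adding
    consumers (up to the partition count) scales consumption."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# One consumer owns everything; four consumers get one partition each.
assert assign_round_robin([0, 1, 2, 3], ["c1"]) == {"c1": [0, 1, 2, 3]}
assert assign_round_robin([0, 1, 2, 3], ["c1", "c2", "c3", "c4"]) == \
    {"c1": [0], "c2": [1], "c3": [2], "c4": [3]}
```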
  7. Kafka usage
  8. Common use case: large-scale real-time data integration
  9. Other use cases: scaling databases • Messaging • Stream processing • …
  10. Important things to remember: 1. Consumer offset commits 2. Within a cluster, each partition has replicas 3. Inter-cluster replication, producer, and consumer defaults are all tuned for the LAN
  11. Why multiple data centers (DC)? Offload work from the main cluster • Disaster recovery • Geo-localization: saving cross-DC bandwidth, better performance by being closer to users, some activity is just local, security / regulations • Cloud • Special case: producers with network issues
  12. Why is this difficult? 1. It isn’t, really: you consume data from one cluster and produce to another 2. The network between two data centers can get tricky 3. Consumers have state (offsets), and syncing this between clusters gets tough • And leads to some counterintuitive results
  13. Pattern #1: stretched cluster. Typically done on AWS in a single region • Deploy ZooKeeper and brokers across 3 availability zones • Rely on intra-cluster replication to replicate data across DCs [Diagram: one Kafka cluster stretched across DC 1, DC 2, and DC 3, with producers and consumers in each DC]
  14. On DC failure: producers/consumers fail over to the surviving DCs • Existing data is preserved by intra-cluster replication • Consumers resume from last committed offsets and will see the same data [Diagram: the stretched cluster with one DC down; producers and consumers continue in the remaining DCs]
  15. When the DC comes back: intra-cluster replication automatically re-replicates all missing data • When re-replication completes, switch producers/consumers back [Diagram: all three DCs active again on the stretched cluster]
  16. Be careful with replica assignment: you don’t want all replicas in the same AZ • Rack-aware support in 0.10.0: configure brokers in the same AZ with the same broker.rack • Manual assignment pre-0.10.0
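The rack-aware setup above boils down to one broker property. A sketch of the relevant `server.properties` lines (broker IDs and AZ names are illustrative; only `broker.rack` is the point):

```properties
# server.properties on a broker running in availability zone us-east-1a
broker.id=1
broker.rack=us-east-1a
# Brokers in us-east-1b and us-east-1c set broker.rack to their own AZ;
# replicas of newly created topics are then spread across the racks,
# so losing one AZ never loses all copies of a partition.
```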
  17. Stretched cluster NOT recommended across regions: asymmetric network partitioning • Longer network latency => longer produce/consume time • Cross-region bandwidth: no read affinity in Kafka [Diagram: Kafka + ZooKeeper nodes spread across region 1, region 2, and region 3]
  18. Pattern #2: active/passive. Producers in the active DC • Consumers in either the active or the passive DC [Diagram: producers and critical apps on the Kafka cluster in DC 1; replication to a second Kafka cluster in DC 2, where consumers run nice-to-have reports]
  19. Cross-datacenter replication: a consumer & producer pair reads from a source cluster and writes to a target cluster • Per-key ordering preserved • Asynchronous: the target is always slightly behind • Offsets not preserved: the source and target may not have the same number of partitions, and failed writes are retried • Options: Confluent Multi-Datacenter Replication, MirrorMaker
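MirrorMaker, which ships with Kafka, is exactly this consumer-plus-producer pair. A minimal configuration sketch; the cluster addresses and group id below are placeholders, not values from the talk:

```properties
# consumer.properties: points at the source (active) cluster
bootstrap.servers=source-dc-kafka:9092
group.id=mirror-maker-group

# producer.properties: points at the target (passive) cluster
bootstrap.servers=target-dc-kafka:9092

# One possible invocation, mirroring every topic:
#   bin/kafka-mirror-maker.sh --consumer.config consumer.properties \
#       --producer.config producer.properties --whitelist=".*"
```

Because MirrorMaker's producer re-appends messages on the target, offsets are assigned fresh there, which is why the failover slides below have to reason about offsets explicitly.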
  20. On active DC failure: fail over producers/consumers to the passive cluster • Challenge: which offset to resume consumption from, since offsets are not identical across clusters [Diagram: after failover, producers and consumers point at the Kafka cluster in DC 2]
  21. Solutions for switching consumers: Resume from the smallest offset (duplicates) • Resume from the largest offset (may miss some messages; likely acceptable for real-time consumers) • Replicate the offsets topic (may miss some messages, may get duplicates) • Set the offset based on timestamp (the old API is hard to use and not precise; a better and more precise API in Apache Kafka 0.10.1 / Confluent 3.1; nice tool coming up!) • Preserve offsets during replication (harder to do)
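The timestamp-based option above works like this: on failover, take the timestamp of the last message processed in the failed DC and find the earliest offset at or after that timestamp in the surviving cluster (this is what the 0.10.1 `offsetsForTimes` consumer API resolves against the broker's timestamp index). A self-contained sketch over an in-memory partition:

```python
def offset_for_timestamp(records, target_ts):
    """Return the smallest offset whose timestamp is >= target_ts,
    mimicking what KafkaConsumer.offsetsForTimes() (Kafka 0.10.1+)
    returns for one partition. `records` is a list of
    (offset, timestamp) pairs in offset order; returns None if every
    record is older than target_ts."""
    for offset, ts in records:
        if ts >= target_ts:
            return offset
    return None

# Failover: the last event processed in DC 1 had timestamp 1_000_500.
replica = [(0, 1_000_000), (1, 1_000_400), (2, 1_000_600), (3, 1_000_900)]
assert offset_for_timestamp(replica, 1_000_500) == 2
# Resuming at offset 2 may replay a little (duplicates) but skips
# nothing, matching the trade-off described on the slide.
```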
  22. When the DC comes back: need to reverse replication • Same challenge: determining the offsets [Diagram: replication now flows from DC 2 back to DC 1]
  23. Limitations: reconfiguration of replication after failover • Resources in the passive DC are underutilized
  24. Pattern #3: active/active. Local → aggregate replication to avoid cycles • Producers/consumers in both DCs • Producers only write to local clusters [Diagram: each DC has a local Kafka cluster and an aggregate Kafka cluster; both local clusters replicate into both aggregate clusters, producers write locally, consumers read from the aggregate cluster]
  25. On DC failure: the same challenge when moving consumers onto the other aggregate cluster • Offsets in the two aggregate clusters are not identical, unless the consumers are continuously running in both clusters [Diagram: the surviving DC’s aggregate cluster serves all consumers]
  26. [Diagram: an SF Kafka cluster serving west-coast users and a Houston Kafka cluster serving south-central users, with all apps running in both DCs]
  27. When the DC comes back: no need to reconfigure replication [Diagram: the local → aggregate replication topology resumes as before]
  28. Alternative: avoid aggregate clusters. Prefix topic names with a DC tag • Configure replication to replicate remote topics only • Consumers need to subscribe to topics with both DC tags [Diagram: two Kafka clusters, one per DC, each replicating the other’s prefixed topics]
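The cycle-avoidance rule on this slide is simple string filtering: each DC's replicator copies only topics whose prefix is not its own, while consumers subscribe to the topic under every DC prefix. A sketch (the "dc1."/"dc2." naming scheme is illustrative; any unambiguous tag works):

```python
import re

def topics_to_mirror(all_topics, local_dc):
    """Replicate only topics that did NOT originate in this DC,
    which breaks replication cycles without aggregate clusters."""
    return [t for t in all_topics if not t.startswith(local_dc + ".")]

def subscription_pattern(topic, dcs):
    """Regex a consumer can subscribe with to read the topic
    under every DC prefix."""
    return re.compile(r"^(%s)\.%s$" % ("|".join(dcs), re.escape(topic)))

topics = ["dc1.clicks", "dc2.clicks"]
assert topics_to_mirror(topics, "dc1") == ["dc2.clicks"]
assert all(subscription_pattern("clicks", ["dc1", "dc2"]).match(t)
           for t in topics)
```

The Kafka consumer supports exactly this kind of pattern subscription (`subscribe(Pattern, ...)` in the Java client), so the fan-in happens client-side with no extra cluster.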
  29. Beyond 2 DCs: more DCs → better resource utilization • With 2 DCs, each DC needs to provision 100% of traffic • With 3 DCs, each DC only needs to provision 50% of traffic • Setting up replication with many DCs can be daunting: only set up aggregate clusters in 2–3 of them
  30. Comparison
     • Stretched. Pros: better utilization of resources; easy failover for consumers. Cons: still need a cross-region story.
     • Active/passive. Pros: needed for global ordering. Cons: harder failover for consumers; reconfiguration during failover; resource under-utilization.
     • Active/active. Pros: better utilization of resources; can be used to avoid consumer failover. Cons: can be challenging to manage; more replication bandwidth.
  31. Multi-DC beyond Kafka: Kafka is often used together with other data stores • Need to make sure the multi-DC strategy is consistent
  32. Example application: a consumer reads from Kafka and computes a 1-min count • Counts need to be stored in a DB and available in every DC
  33. Independent database per DC: run the same consumer concurrently in both DCs • No consumer failover needed [Diagram: each DC’s consumer reads from its aggregate cluster and writes to its own local DB]
  34. Stretched database across DCs: only run one consumer per DC at any given point in time [Diagram: one consumer writes to the stretched DB; the other DC’s consumer takes over on failover]
  35. Practical tips • Consume remote, produce local, unless you need encrypted data on the wire • Monitor! Burrow for replication lag, Confluent Control Center for end-to-end, JMX metrics for rates and “busy-ness” • Tune! Producer/consumer tuning, number of consumers and producers, TCP tuning for the WAN • Don’t forget to replicate configuration • Separate critical topics from nice-to-have topics
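The replication-lag metric that Burrow-style monitoring tracks is a straightforward per-partition difference: the broker's log-end offset minus the replicator's committed offset. A sketch of the arithmetic (fetching the two offset maps from Kafka is omitted):

```python
def replication_lag(log_end_offsets, committed_offsets):
    """Per-partition lag of a cross-DC replicator: how many messages
    exist on the source broker beyond what the replicator's consumer
    has committed. A steadily growing total means the replicator is
    falling behind the WAN link or the producers."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

lag = replication_lag({0: 120, 1: 80}, {0: 115, 1: 80})
assert lag == {0: 5, 1: 0}
assert sum(lag.values()) == 5  # total messages not yet mirrored
```

Alerting on the trend of this number (not just its instantaneous value) is usually more useful, since cross-DC lag fluctuates with WAN conditions.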
  36. Future work: offset reset tool • Offset preservation • “Remote Replicas” • 2-DC stretch cluster • Other cool Kafka future: Exactly Once, Transactions, Headers
  37. THANK YOU! Gwen Shapira | gwen@confluent.io | @gwenshap • Kafka Training with Confluent University: Kafka Developer and Operations courses, visit www.confluent.io/training • Want more Kafka? Download Confluent Platform Enterprise at http://www.confluent.io/product • Apache Kafka 0.10.2 upgrade documentation at http://docs.confluent.io/3.2.0/upgrade.html • Kafka Summit recordings now available at http://kafka-summit.org/schedule/
  38. Discount code: kafstrata • Special Strata attendee discount code = 25% off • www.kafka-summit.org • Kafka Summit New York: May 8 • Kafka Summit San Francisco: August 28
