
Multi-Datacenter Kafka - Strata San Jose 2017

Strategies and tips for running Kafka in multiple data centers.


  1. When One Data Center Is Not Enough: Building Large-scale Stream Infrastructures Across Multiple Data Centers with Apache Kafka. Gwen Shapira.
  2. There's a book on that! Actually… a chapter.
  3. Outline: Kafka overview, common multi-data-center patterns, future stuff.
  4. What is Kafka? It's like a message queue, right? Actually, it's a "distributed commit log", or a "streaming data platform". [Diagram: a data source appends records at offsets 0 to 8 onto a log; data consumers A and B each read at their own position]
  5. Topics and Partitions: messages are organized into topics, and each topic is split into partitions. Each partition is an immutable, time-sequenced log of messages on disk; time ordering is guaranteed within, but not across, partitions. [Diagram: a data source writing to one topic with partitions 0, 1, and 2, each an append-only log of offsets 0 to 8]
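
     With the default partitioner, records that share a key land in the same partition, which is where per-key ordering comes from. A minimal producer sketch (broker address, topic, and keys are illustrative):

        import java.util.Properties;
        import org.apache.kafka.clients.producer.KafkaProducer;
        import org.apache.kafka.clients.producer.ProducerRecord;

        public class PartitionedProducer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // illustrative address
                props.put("key.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");
                props.put("value.serializer",
                          "org.apache.kafka.common.serialization.StringSerializer");

                try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                    // Both records hash to the same partition because they share a key,
                    // so "page-1" is stored before "page-2" for user-42.
                    producer.send(new ProducerRecord<>("clicks", "user-42", "page-1"));
                    producer.send(new ProducerRecord<>("clicks", "user-42", "page-2"));
                }
            }
        }
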
  6. Scalable consumption model. [Diagram: topic T1 with four partitions; with a single consumer in group 1, that consumer reads all four partitions; with four consumers in the group, each reads one partition]
  7. Kafka usage.
  8. Common use case: large-scale real-time data integration.
  9. Other use cases: scaling databases, messaging, stream processing, …
  10. Important things to remember: (1) consumers commit their offsets; (2) within a cluster, each partition has replicas; (3) inter-cluster replication and the producer and consumer defaults are all tuned for a LAN.
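
     Point 1 is worth seeing in code: committed offsets live inside the cluster being consumed, so they do not travel to a replica cluster. A minimal sketch with manual commits (broker address, group, and topic are illustrative):

        import java.util.Arrays;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.ConsumerRecord;
        import org.apache.kafka.clients.consumer.ConsumerRecords;
        import org.apache.kafka.clients.consumer.KafkaConsumer;

        public class CommittingConsumer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // illustrative
                props.put("group.id", "click-counter");           // illustrative
                props.put("enable.auto.commit", "false");         // commit manually below
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    consumer.subscribe(Arrays.asList("clicks"));
                    while (true) {
                        ConsumerRecords<String, String> records = consumer.poll(1000);
                        for (ConsumerRecord<String, String> record : records) {
                            // process the record here
                        }
                        // Stored in this cluster's __consumer_offsets topic;
                        // a second cluster knows nothing about these positions.
                        consumer.commitSync();
                    }
                }
            }
        }
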
  11. Why multiple data centers (DCs)? Offloading work from the main cluster; disaster recovery; geo-localization (saving cross-DC bandwidth, better performance by being closer to users, some activity is just local, security/regulations); cloud deployments; and a special case: producers with network issues.
  12. Why is this difficult? (1) It isn't, really: you consume data from one cluster and produce to another. (2) The network between two data centers can get tricky. (3) Consumers have state (offsets), and syncing it between clusters gets tough, which leads to some counter-intuitive results.
  13. Pattern #1: stretched cluster. Typically done on AWS in a single region: deploy ZooKeeper and brokers across three availability zones and rely on intra-cluster replication to replicate data across DCs. [Diagram: one Kafka cluster spanning DC 1, DC 2, and DC 3, with producers and consumers in each DC]
  14. On DC failure: producers and consumers fail over to the surviving DCs. Existing data is preserved by intra-cluster replication, and consumers resume from their last committed offsets and see the same data. [Diagram: one DC down; producers and consumers in the remaining two DCs keep using the same cluster]
  15. When the DC comes back: intra-cluster replication automatically re-replicates all missing data; once re-replication completes, switch producers and consumers back. [Diagram: all three DCs serving producers and consumers again]
  16. Be careful with replica assignment: you don't want all replicas in the same AZ. Rack-aware replica placement is supported since 0.10.0 (configure brokers in the same AZ with the same broker.rack, as sketched below); before 0.10.0, assign replicas manually.
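
     A sketch of the rack-aware setting, treating each AWS availability zone as a "rack" (the zone name is illustrative):

        # server.properties on every broker in availability zone us-east-1a
        broker.rack=us-east-1a

     With this set on all brokers, partitions created on 0.10.0+ get their replicas spread across racks where possible.
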
  17. A stretched cluster is NOT recommended across regions: asymmetric network partitioning, longer network latency (and therefore longer produce/consume times), and cross-region bandwidth costs, since Kafka has no read affinity. [Diagram: Kafka brokers and ZooKeeper nodes spread across regions 1, 2, and 3]
  18. Pattern #2: active/passive. Producers write to the active DC; consumers run in either the active or the passive DC. [Diagram: DC 1 (active) replicating to DC 2 (passive); critical apps consume in DC 1, nice-to-have reports in DC 2]
  19. Cross-datacenter replication is a consumer plus a producer: read from a source cluster and write to a target cluster. Per-key ordering is preserved; replication is asynchronous, so the target is always slightly behind; and offsets are not preserved, because source and target may not have the same number of partitions and failed writes are retried. Options: Confluent Multi-Datacenter Replication, or MirrorMaker (sketched below).
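
     A minimal MirrorMaker invocation might look like the following sketch (file names, topic whitelist, and stream count are illustrative); the consumer config points at the source cluster, the producer config at the target:

        bin/kafka-mirror-maker.sh \
          --consumer.config source-cluster.properties \
          --producer.config target-cluster.properties \
          --whitelist "clicks|orders" \
          --num.streams 4
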
  20. On active DC failure: fail producers and consumers over to the passive cluster. The challenge is which offset to resume consumption from, since offsets are not identical across clusters. [Diagram: producers and consumers moving from DC 1 to DC 2]
  21. Solutions for switching consumers: resume from the smallest offset (duplicates); resume from the largest offset (may miss some messages, likely acceptable for real-time consumers); replicate the offsets topic (may miss some messages and may get duplicates); set offsets based on timestamp (the old API is hard to use and imprecise, but a better and more precise API arrived in Apache Kafka 0.10.1 / Confluent 3.1, with a nice tool coming up); or preserve offsets during replication (harder to do).
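
     The 0.10.1 API referenced above is offsetsForTimes, which maps a timestamp to the earliest offset with an equal or later timestamp in this cluster's own numbering. A sketch of resuming on the failover cluster (addresses, topic, group, and the 10-minute rewind are illustrative):

        import java.util.Collections;
        import java.util.HashMap;
        import java.util.Map;
        import java.util.Properties;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
        import org.apache.kafka.common.TopicPartition;

        public class TimestampFailover {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "dc2-kafka:9092"); // failover cluster, illustrative
                props.put("group.id", "click-counter");
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");

                // Rewind to slightly before the failover, e.g. 10 minutes ago.
                long resumeFrom = System.currentTimeMillis() - 10 * 60 * 1000L;

                try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                    TopicPartition tp = new TopicPartition("clicks", 0);
                    consumer.assign(Collections.singletonList(tp));

                    Map<TopicPartition, Long> query = new HashMap<>();
                    query.put(tp, resumeFrom);

                    Map<TopicPartition, OffsetAndTimestamp> offsets = consumer.offsetsForTimes(query);
                    OffsetAndTimestamp found = offsets.get(tp);
                    if (found != null) {
                        consumer.seek(tp, found.offset()); // replays a little, misses nothing
                    }
                    // poll() as usual from here
                }
            }
        }
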
  22. When the DC comes back: replication needs to be reversed, with the same challenge of determining the offsets. [Diagram: replication now flowing from DC 2 back to DC 1]
  23. Limitations: replication must be reconfigured after failover, and resources in the passive DC are under-utilized.
  24. Pattern #3: active/active. Local-to-aggregate replication avoids cycles: producers and consumers run in both DCs, but producers write only to their local cluster. [Diagram: each DC has a local cluster and an aggregate cluster; each local cluster is replicated into both aggregate clusters]
  25. On DC failure: the same challenge arises when moving consumers of the aggregate cluster, since offsets in the two aggregate clusters are not identical, unless the consumers were already running continuously in both clusters. [Diagram: the active/active topology with one DC down]
  26. [Diagram: an SF Kafka cluster serving West Coast users and a Houston Kafka cluster serving South Central users, with all apps running in both DCs]
  27. When the DC comes back: no need to reconfigure replication. [Diagram: the active/active topology restored]
  28. Alternative: avoid aggregate clusters. Prefix topic names with a DC tag, configure replication to copy remote topics only, and have consumers subscribe to topics with both DC tags (see the sketch below). [Diagram: DC 1 and DC 2 clusters, each replicating the other's tagged topics]
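
     The consumer side of this pattern can use a regex subscription to pick up the topic under both DC prefixes. A sketch assuming the dc1/dc2 prefix convention from above (broker address, topic, and group are illustrative):

        import java.util.Collection;
        import java.util.Properties;
        import java.util.regex.Pattern;
        import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
        import org.apache.kafka.clients.consumer.KafkaConsumer;
        import org.apache.kafka.common.TopicPartition;

        public class TwoDcConsumer {
            public static void main(String[] args) {
                Properties props = new Properties();
                props.put("bootstrap.servers", "localhost:9092"); // illustrative
                props.put("group.id", "clicks-app");              // illustrative
                props.put("key.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");
                props.put("value.deserializer",
                          "org.apache.kafka.common.serialization.StringDeserializer");

                KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                // Matches both dc1.clicks and dc2.clicks.
                consumer.subscribe(Pattern.compile("(dc1|dc2)\\.clicks"),
                    new ConsumerRebalanceListener() {
                        public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                        public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
                    });
                // poll() as usual from here
            }
        }
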
  29. Beyond 2 DCs: more DCs mean better resource utilization. With 2 DCs, each DC needs to provision 100% of the traffic; with 3 DCs, each DC only needs to provision 50%, since the survivors split the failed DC's load. Setting up replication with many DCs can be daunting, so only set up aggregate clusters in 2-3 of them.
  30. Comparison:
      Stretched. Pros: better utilization of resources; easy failover for consumers. Cons: still need a cross-region story.
      Active/passive. Pros: needed for global ordering. Cons: harder failover for consumers; reconfiguration during failover; resource under-utilization.
      Active/active. Pros: better utilization of resources; can be used to avoid consumer failover. Cons: can be challenging to manage; more replication bandwidth.
  31. Multi-DC beyond Kafka: Kafka is often used together with other data stores, and the multi-DC strategy needs to be consistent across all of them.
  32. Example application: a consumer reads from Kafka and computes 1-minute counts; the counts need to be stored in a DB and available in every DC.
  33. Independent database per DC: run the same consumer concurrently in both DCs, so no consumer failover is needed. [Diagram: in each DC, a consumer reads the local aggregate cluster and writes to that DC's own DB]
  34. Stretched database across DCs: run the consumer in only one DC at any given point in time; the consumer in the other DC takes over on failover. [Diagram: one active consumer writing to the stretched DB, with the standby consumer activating on failover]
  35. Practical tips:
      • Consume remote, produce local, unless you need encrypted data on the wire.
      • Monitor! Burrow for replication lag, Confluent Control Center for end-to-end, JMX metrics for rates and "busy-ness".
      • Tune! Producer/consumer tuning, the number of consumers and producers, and TCP tuning for the WAN (see the sketch below).
      • Don't forget to replicate configuration.
      • Separate critical topics from nice-to-have topics.
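
     For the WAN tuning bullet, these producer overrides are common starting points; exact values depend on the link, so treat them as illustrative:

        # Producer settings for replicating over a high-latency link
        compression.type=lz4        # fewer bytes across the WAN
        batch.size=131072           # larger batches amortize round trips
        linger.ms=100               # wait briefly to fill batches
        acks=all                    # trade latency for durability
        send.buffer.bytes=1048576   # bigger TCP buffer for a high bandwidth-delay product

        # Matching consumer-side setting when consuming across the WAN
        receive.buffer.bytes=1048576
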
  36. Future work: an offset reset tool, offset preservation, "remote replicas", and a 2-DC stretch cluster. Other cool Kafka futures: exactly-once, transactions, and headers.
  37. THANK YOU! Gwen Shapira | gwen@confluent.io | @gwenshap. Kafka training with Confluent University: Kafka Developer and Operations courses at www.confluent.io/training. Want more Kafka? Download Confluent Platform Enterprise at http://www.confluent.io/product, see the Apache Kafka 0.10.2 upgrade documentation at http://docs.confluent.io/3.2.0/upgrade.html, and find Kafka Summit recordings at http://kafka-summit.org/schedule/
  38. Discount code: kafstrata. Special Strata attendee discount code, 25% off at www.kafka-summit.org. Kafka Summit New York: May 8; Kafka Summit San Francisco: August 28.
