
More Datacenters, More Problems


Presented at Kafka Summit 2016

Operating out of multiple datacenters is a large part of most disaster recovery plans, but it brings extra complications to our data pipelines. Instead of a straight path from front to back, they now have forks, dead ends, and odd little use cases that don't match up with a perfect view of the world. This talk focuses on how to best utilize Apache Kafka in this world, including basic architectures for multi-datacenter and multi-tier clusters. We will also touch on how to ensure messages make it from producer to consumer, and how to monitor the entire ecosystem.


  1. More Datacenters, More Problems
  2. Todd Palino
  3. Who Am I?
     - Kafka, Samza, and Zookeeper SRE at LinkedIn
     - Site Reliability Engineering
       – Administrators
       – Architects
       – Developers
     - Keep the site running, always
  4. Data @ LinkedIn is Hiring!
     - Streams Infrastructure
       – Kafka pub/sub ecosystem
       – Stream Processing Platform built on Apache Samza
       – Next Generation change capture technology (incubating)
     - Join us in working on cutting edge stream processing infrastructure
       – Please contact kparamasivam@linkedin.com
       – Software developers and Site Reliability Engineers at all levels
     - LinkedIn Data Infrastructure Meetup
       – Where: LinkedIn @ 2061 Stierlin Court, Mountain View
       – When: May 11th at 6:30 PM
       – Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup
  5. Kafka At LinkedIn
     - 1,100+ Kafka brokers
     - Over 32,000 topics
     - 350,000+ partitions
     - 875 Billion messages per day
     - 185 Terabytes in
     - 675 Terabytes out
     - Peak load (whole site)
       – 10.5 Million messages/sec
       – 18.5 Gigabits/sec inbound
       – 70.5 Gigabits/sec outbound
     - 1,800+ Kafka brokers
     - Over 95,000 topics
     - 1,340,000+ partitions
     - 1.3 Trillion messages per day
     - 330 Terabytes in
     - 1.2 Petabytes out
     - Peak load (single cluster)
       – 2 Million messages/sec
       – 4.7 Gigabits/sec inbound
       – 15 Gigabits/sec outbound
  6. What Will We Talk About?
     - Tiered Cluster Architecture
     - Multiple Datacenters
     - Performance Concerns
     - Conclusion
  7. Tiered Cluster Architecture
  8. One Kafka Cluster
  9. Single Cluster – Remote Clients
  10. Single Cluster – Spanning Datacenters
  11. Multiple Clusters – Local and Remote Clients
  12. Multiple Clusters – Message Aggregation
  13. Why Not Direct?
     - Network concerns
       – Bandwidth
       – Network partitioning
       – Latency
     - Security concerns
       – Firewalls and ACLs
       – Encrypting data in transit
     - Resource concerns
       – A misbehaving application can swamp production resources
  14. What Do We Lose?
     - You may lose message ordering
       – Mirror maker breaks apart message batches and redistributes them
     - You may lose key-to-partition affinity (see the sketch below)
       – Mirror maker will partition based on the key
       – Differing partition counts in source and target will result in differing distribution
       – Mirror maker does not (without work) honor custom partitioning
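     To make the affinity point concrete, here is a minimal sketch of why the same key can land in
     different partitions once partition counts differ. Kafka's default partitioner hashes the
     serialized key (murmur2 in the real implementation) modulo the partition count; the simple
     hash, the key, and the 8/24 partition counts below are illustrative assumptions only.

        import java.nio.charset.StandardCharsets;

        public class PartitionAffinity {
            // Stand-in for the default partitioner: hash the key bytes, mod the partition count.
            static int partitionFor(String key, int numPartitions) {
                byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
                int hash = 0;
                for (byte b : bytes) {
                    hash = 31 * hash + b;               // simple hash in place of murmur2
                }
                return (hash & 0x7fffffff) % numPartitions;
            }

            public static void main(String[] args) {
                String key = "member-12345";            // hypothetical message key
                int localPartitions = 8;                // local cluster topic
                int aggregatePartitions = 24;           // aggregate topic (8 x 3 sites)
                System.out.println("local:     " + partitionFor(key, localPartitions));
                System.out.println("aggregate: " + partitionFor(key, aggregatePartitions));
                // The two results usually differ, so consumers of the aggregate cluster
                // cannot rely on the key-to-partition mapping they saw locally.
            }
        }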
  15. Multiple Datacenters
  16. Why Multiple Datacenters?
     - Disaster Recovery
     - Geolocalization
     - Legal / Political
  17. Planning For the Worst
     - Keep your datacenters identical
       – Snowflake services are hard to fail over
     - Services that need to run only once in the infrastructure need to coordinate
       – Zookeeper across sites can be used for this (at least 3 sites)
       – Moving from one Kafka cluster to another is hard
       – KIP-33: time-based log indexing
     - What about AWS (or Google Cloud)?
       – Disaster recovery can be accomplished via availability zones
  18. Planning For the Best
     - Not much different than designing for disasters
     - Consider having a limited number of aggregate clusters
       – Every copy of a message costs you money
       – Have 2 downstream (or super) sites that process aggregate messages
       – Push results back out to all sites
       – Good place for stream processing, like Samza
     - What about AWS (or Google Cloud)?
       – Geolocalization can be accomplished via regions
  19. Segregating Data
     - All the data, available everywhere
     - The right data, only where we need it
     - Topics are (usually) the smallest unit we want to deal with
       – Mirroring a topic between clusters is easy
       – Filtering that mirror is much harder
     - Have enough topics so consumers do not throw away messages
  20. Retaining Data
     - Retain data only as long as you have to
     - Move data away from the front lines as quickly as possible
  21. Performance Concerns
  22. Buy The Book! Early Access available now. Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting. Also discusses stream processing and other use cases.
  23. Kafka Cluster Sizing
     - How big for your local cluster? (see the sizing sketch below)
       – How much disk space do you have?
       – How much network bandwidth do you have?
       – CPU, memory, disk I/O
     - How big for your aggregate cluster?
       – In general, multiply the number of brokers by the number of local clusters
       – May have additional concerns with lots of consumers
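     As a rough companion to the sizing questions above, this back-of-envelope sketch works
     through the disk and network arithmetic. Every input (write rate, replication factor,
     retention, per-broker capacity, the 4x traffic multiplier) is an assumed example value,
     not a recommendation.

        public class ClusterSizing {
            public static void main(String[] args) {
                double inboundMBps = 50.0;        // assumed producer traffic, MB/sec
                int replicationFactor = 3;        // assumed replication factor
                int retentionHours = 72;          // assumed retention period
                double diskPerBrokerTB = 2.0;     // assumed usable disk per broker
                double nicPerBrokerMBps = 125.0;  // assumed ~1 Gbps usable per broker

                // Disk: bytes written * replication * retention window.
                double storedTB = inboundMBps * replicationFactor
                        * retentionHours * 3600 / 1_000_000;
                int brokersForDisk = (int) Math.ceil(storedTB / diskPerBrokerTB);

                // Network: inbound plus replication plus consumer fetches,
                // approximated here as 4x the raw inbound rate.
                double clusterMBps = inboundMBps * 4;
                int brokersForNetwork = (int) Math.ceil(clusterMBps / nicPerBrokerMBps);

                System.out.println("Brokers needed for disk:    " + brokersForDisk);
                System.out.println("Brokers needed for network: " + brokersForNetwork);
                System.out.println("Size for the larger figure, plus headroom.");
            }
        }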
  24. Topic Configuration
     - Partition Counts for Local
       – Many theories on how to do this correctly, but the answer is “it depends”
       – How many consumers do you have?
       – Do you have specific partition requirements?
       – Keep partition sizes manageable
     - Partition Counts for Aggregate (worked example below)
       – Multiply the number of partitions in a local cluster by the number of local clusters
       – Periodically review partition counts in all clusters
     - Message Retention
       – If aggregate is where you really need the messages, only retain them locally long enough to cover mirror maker problems
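     A small worked example of the aggregate partition-count rule and the local-versus-aggregate
     retention split described above. The topic names, cluster count, and retention values are
     hypothetical; retention.ms is the per-topic Kafka setting that controls how long messages
     are kept.

        import java.util.Map;

        public class AggregateTopicConfig {
            public static void main(String[] args) {
                int localClusters = 3;                       // assumed number of local sites
                Map<String, Integer> localPartitions = Map.of(
                        "page-views", 16,
                        "ad-clicks", 8);                     // hypothetical topics

                // Aggregate partition count = local partitions x number of local clusters.
                localPartitions.forEach((topic, partitions) -> System.out.printf(
                        "%s: %d local -> %d aggregate partitions%n",
                        topic, partitions, partitions * localClusters));

                // Keep the local copy just long enough to ride out mirror maker problems;
                // keep the aggregate copy for downstream consumers.
                long localRetentionMs = 4L * 60 * 60 * 1000;          // assumed 4 hours
                long aggregateRetentionMs = 7L * 24 * 60 * 60 * 1000; // assumed 7 days
                System.out.println("local retention.ms     = " + localRetentionMs);
                System.out.println("aggregate retention.ms = " + aggregateRetentionMs);
            }
        }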
  25. Where Do I Put Mirror Makers?
     - Best practice was to keep the mirror maker local to the target cluster
       – TLS (can) make this a new game
     - In the datacenter with the produce cluster
       – Fewer issues from networking
       – Significant performance hit when using TLS on consume
     - In the datacenter with the consume cluster
       – Highest performance TLS
       – Potential to consume messages and drop them on produce
  26. Mirror Maker Sizing
     - Number of servers and streams
       – Size the number of servers based on the peak bytes per second
       – Co-locate mirror makers
       – Run more mirror makers in an instance than you need
       – Use multiple consumer and producer streams
     - Other tunables to look at (see the config sketch below)
       – Partition assignment strategy
       – In-flight requests per connection
       – Linger time
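     One way to picture those tunables is as the consumer and producer property files a mirror
     maker instance is started with (the stock kafka-mirror-maker.sh tool takes them via
     --consumer.config and --producer.config). The hostnames, group id, and every value below
     are assumptions for illustration, not recommendations.

        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.util.Properties;

        public class MirrorMakerConfigs {
            public static void main(String[] args) throws IOException {
                Properties consumer = new Properties();
                consumer.put("bootstrap.servers", "kafka-local.dc1.example.com:9092");
                consumer.put("group.id", "mirrormaker-dc1-to-agg");
                // Partition assignment strategy: spread partitions across the streams.
                consumer.put("partition.assignment.strategy",
                        "org.apache.kafka.clients.consumer.RoundRobinAssignor");

                Properties producer = new Properties();
                producer.put("bootstrap.servers", "kafka-aggregate.dc1.example.com:9092");
                // In-flight requests per connection and linger time, per the slide.
                producer.put("max.in.flight.requests.per.connection", "5");
                producer.put("linger.ms", "10");
                producer.put("compression.type", "snappy");

                try (FileOutputStream c = new FileOutputStream("mm-consumer.properties");
                     FileOutputStream p = new FileOutputStream("mm-producer.properties")) {
                    consumer.store(c, "mirror maker consumer");
                    producer.store(p, "mirror maker producer");
                }
            }
        }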
  27. Segregation of Topics
     - Not all topics are created equal
     - High Priority Topics
       – Topics that change search results
       – Topics used for hourly or daily reporting
     - Low Latency Topics
     - Run a separate mirror maker for these topics
       – One bloated topic won’t affect reporting
       – Restarting the mirror maker takes less time
       – Less time to catch up when you fall behind
  28. Conclusion
  29. Broker Improvements
     - Namespaces
       – Namespace topics by datacenter
       – Eliminate local clusters and just have aggregate
       – Significant hardware savings
     - JBOD fixes
       – Intelligent partition assignment
       – Admin tools to move partitions between mount points
       – Broker should not fail completely with a single disk failure
  30. Mirror Maker Improvements
     - Identity mirror maker
       – Messages in source partition 0 get produced directly to partition 0
       – No decompression of message batches
       – Keeps key-to-partition affinity, supporting custom partitioning
       – Requires mirror maker to maintain downstream partition counts
     - Multi-consumer mirror maker
       – A single mirror maker that consumes from more than one cluster
       – Reduces the number of copies of mirror maker that need to be running
       – Forces a produce-local architecture, however
  31. Administrative Improvements
     - Multiple cluster management
       – Topic management across clusters
       – Visualization of mirror maker paths
     - Better client monitoring
       – Burrow for consumer monitoring
       – No open source solution for producer monitoring (audit)
     - End-to-end availability monitoring
  32. Getting Involved With Kafka
     - http://kafka.apache.org
     - Join the mailing lists
       – users@kafka.apache.org
       – dev@kafka.apache.org
     - irc.freenode.net - #apache-kafka
     - Meetups
       – Apache Kafka: http://www.meetup.com/http-kafka-apache-org
       – Bay Area Samza: http://www.meetup.com/Bay-Area-Samza-Meetup/
     - Contribute code
  33. Data @ LinkedIn is Hiring!
     - Streams Infrastructure
       – Kafka pub/sub ecosystem
       – Stream Processing Platform built on Apache Samza
       – Next Generation change capture technology (incubating)
     - Join us in working on cutting edge stream processing infrastructure
       – Please contact kparamasivam@linkedin.com
       – Software developers and Site Reliability Engineers at all levels
     - LinkedIn Data Infrastructure Meetup
       – Where: LinkedIn @ 2061 Stierlin Court, Mountain View
       – When: May 11th at 6:30 PM
       – Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup
