
Kafka at scale facebook israel


Notes on Running Kafka at Scale

Published in: Software

  1. Running Kafka At Scale (Gwen Shapira)
  2. About Me
     • System Architect @ Confluent
     • Committer @ Apache Kafka
     • Previously: Software Engineer @ Cloudera, Senior Consultant @ Pythian
     • Find me: @gwenshap
  3. Kafka: High Throughput, Scalable, Low Latency, Real-time, Centralized, Awesome. So, we are done, right?
  4. When it comes to critical production systems – never trust a vendor.
  5. Strong Foundations: Building a Kafka cluster from the hardware up
  6. What’s Important To You?
     • Message Retention - Disk size
     • Message Throughput - Network capacity
     • Producer Performance - Disk I/O
     • Consumer Performance - Memory
  7. Go Wide
     • RAIS - Redundant Array of Inexpensive Servers
     • Kafka is well-suited to horizontal scaling
     • Also helps with CPU utilization
       • Kafka needs to decompress and recompress every message batch
       • KIP-31 will help with this by eliminating recompression
     • Don’t co-locate Kafka
  8. Disk Layout
     • RAID
       • Can survive a single disk failure (not RAID 0)
       • Provides the broker with a single log directory
       • Eats up disk I/O
     • JBOD
       • Gives Kafka all the disk I/O available
       • Broker is not smart about balancing partitions
       • If one disk fails, the entire broker stops
     • Amazon EBS performance works!
  9. Operating System Tuning
     • Filesystem Options
       • EXT or XFS
       • Using unsafe mount options
     • Virtual Memory
       • Swappiness
       • Dirty Pages
     • Networking
  10. Java
      • Only use JDK 8 now
      • Keep heap size small
        • Even our largest brokers use a 6 GB heap
        • Save the rest for page cache
      • Garbage Collection - G1 all the way
        • Basic tuning only
        • Watch for humongous allocations
  11. Monitoring the Foundation
      • CPU Load
      • Network inbound and outbound
      • Filehandle usage for Kafka
      • Disk
        • Free space - where you write logs, and where Kafka stores messages
        • Free inodes
        • I/O performance - at least average wait and percent utilization
      • Garbage Collection
  12. Broker Ground Rules
      • Tuning
        • Stick (mostly) with the defaults
        • Set default cluster retention as appropriate
        • Default partition count should be at least the number of brokers
      • Monitoring
        • Watch the right things
        • Don’t try to alert on everything
      • Triage and Resolution
        • Solve problems, don’t mask them
  13. Too Much Information!
      • Monitoring teams hate Kafka
        • Per-Topic metrics
        • Per-Partition metrics
        • Per-Client metrics
      • Capture as much as you can
        • Many metrics are useful while triaging an issue
      • Clients want metrics on their own topics
      • Only alert on what is needed to signal a problem
  14. Broker Monitoring
      • Bytes In and Out, Messages In
        • Why not messages out?
      • Partitions
        • Count and Leader Count
        • Under Replicated and Offline
      • Threads
        • Network pool, Request pool
        • Max Dirty Percent
      • Requests
        • Rates and times - total, queue, local, and send
  15. Topic Monitoring
      • Bytes In, Bytes Out
      • Messages In, Produce Rate, Produce Failure Rate
      • Fetch Rate, Fetch Failure Rate
      • Partition Bytes
      • Quota Throttling
      • Log End Offset
        • Why bother?
        • KIP-32 will make this unnecessary
      • Provide this to your customers for them to alert on
  16. Staying out of Trouble
  17. Anticipating Trouble
      • Trend cluster utilization and growth over time
      • Use default configurations for quotas and retention to require customers to talk to you
      • Monitor request times
        • If you are able to develop a consistent baseline, this is early warning
  18. Under Replicated Partitions
      • Count of partitions that are not fully replicated within the cluster
      • Also referred to as “replica lag”
      • Primary indicator of problems within the cluster
  19. Appropriately Sizing Topics
      • Topics are “Logical” – data modeling is based on data and consumers
      • Number of partitions:
        • How many brokers do you have in the cluster?
        • How many consumers do you have?
        • Do you have specific partition requirements?
      • Keeping partition sizes manageable
      • Don’t have too many partitions
  20. Choosing Topics/Partitions
      • More partitions -> higher throughput
        • t: target throughput, p: producer throughput per partition, c: consumer throughput per partition
        • Partitions needed: max(t/p, t/c)
      • Downsides of more partitions
        • Require more open file handles
        • May increase unavailability
        • May increase end-to-end latency
        • May require more memory in the client
      • Rule of thumb
        • 2-4 K partitions per broker
        • Tens of thousands of partitions per cluster
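
The sizing rule on slide 20 can be worked through in a few lines. This is a minimal Java sketch, not the speaker's code: the class and method names (`PartitionSizing`, `partitionsNeeded`) and the sample throughput figures are hypothetical, and rounding up to a whole partition is an added assumption.

```java
final class PartitionSizing {
    private PartitionSizing() {}

    /**
     * Partitions needed to reach a target throughput: max(t/p, t/c), rounded up.
     * All three rates must be expressed in the same unit (for example MB/s).
     */
    static int partitionsNeeded(double target,
                                double producerPerPartition,
                                double consumerPerPartition) {
        if (target <= 0 || producerPerPartition <= 0 || consumerPerPartition <= 0) {
            throw new IllegalArgumentException("all throughputs must be positive");
        }
        double byProducer = target / producerPerPartition; // t/p
        double byConsumer = target / consumerPerPartition; // t/c
        return (int) Math.ceil(Math.max(byProducer, byConsumer));
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1000 MB/s target, 100 MB/s per partition on the
        // producer side, 20 MB/s per partition on the consumer side.
        System.out.println(partitionsNeeded(1000, 100, 20)); // prints 50
    }
}
```

In this example the slower side (the consumer) determines the partition count, which is the usual case when the consumer writes to a slow downstream system.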
  21. Tuning 101
  22. Broker Performance Checks
      • Are all the brokers in the cluster working?
      • Are the network interfaces saturated?
        • Reelect partition leaders
        • Rebalance partitions in the cluster
        • Spread out traffic more (increase partitions or brokers)
      • Is the CPU utilization high? (especially iowait)
        • Is another process competing for resources?
        • Look for a bad disk
      • Are you still running 0.8?
      • Do you have really big messages?
  23. Anatomy of a Produce Request
      • [Diagram: Network, Network Threads, Request Queue, IO Threads, PageCache, Purgatory (map), Response Queue, Other Brokers]
      • Do other replicas need to confirm?
  24. Anatomy of a Fetch Request
      • [Diagram: Network, Network Threads, Request Queue, IO Threads, PageCache, Purgatory (map), Response Queue]
      • Has consumer property or fetch.min.bytes been exceeded?
  25. Monitoring Requests
      • [Diagram: Network, Network Threads, Request Queue, IO Threads, PageCache, Purgatory, Response Queue, Other Brokers]
      • Timings shown: Request Queue Time, Request Local Time, Response Remote Time, Response Queue Time, Response Send Time
  26. Configuring Requests
      • [Diagram: Network, Network Threads, Request Queue, IO Threads, PageCache, Purgatory, Response Queue, Other Brokers]
      • Settings shown: log.flush.interval.messages, queued.max.requests, replica.fetch.min.bytes, socket.request.max.bytes, socket.receive.buffer.bytes, socket.send.buffer.bytes
  27. Client Tuning
  28. The basics
      • [Diagram: App, Client, four Brokers]
  29. How do we know it’s the app?
      • Try the perf tool
        • OK, actually? Probably the app
        • Slow? Try the perf tool on the broker
          • Slow? Either the broker, or max capacity, or configuration
          • OK, actually? Network
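
The flowchart on slide 29 amounts to a two-question decision. Below is a minimal Java sketch of one reading of it: the first perf-tool run is assumed to happen from the client side, the second on the broker itself. The names `ClientTriage`, `Suspect`, and `diagnose` are hypothetical and do not come from the deck or from Kafka.

```java
final class ClientTriage {
    private ClientTriage() {}

    /** Where to look next, following the slide's flowchart. */
    enum Suspect { APPLICATION, BROKER_CAPACITY_OR_CONFIGURATION, NETWORK }

    /**
     * @param perfToolSlowFromClient true if the perf tool is slow when run where the app runs
     * @param perfToolSlowOnBroker   true if the perf tool is also slow when run on the broker
     *                               (only meaningful when the first run was slow)
     */
    static Suspect diagnose(boolean perfToolSlowFromClient, boolean perfToolSlowOnBroker) {
        if (!perfToolSlowFromClient) {
            // The perf tool performs fine from the same place, so the app is the likely culprit.
            return Suspect.APPLICATION;
        }
        if (perfToolSlowOnBroker) {
            // Slow even without the network hop: broker, max capacity, or configuration.
            return Suspect.BROKER_CAPACITY_OR_CONFIGURATION;
        }
        // Fast on the broker but slow from the client: the path in between.
        return Suspect.NETWORK;
    }

    public static void main(String[] args) {
        System.out.println(diagnose(true, false)); // prints NETWORK
    }
}
```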
  30. [Diagram: Application Threads, Send(Record), Producer (Batch 1, Batch 2, Batch 3), Brokers, Fail?, Metadata / Exception]
  31. [Diagram: Application Threads, Send(Record), Producer (Batch 1, Batch 2, Batch 3), Brokers, Fail?, Metadata / Exception]
      • Metrics shown: waiting-threads, Request-latency, Batch-size, Compression-rate, Record-queue-time, Record-send-rate, Records-per-request, Record-retry-rate, Record-error-rate
  32. [Diagram: Application Threads, Send(Record), Producer (Batch 1, Batch 2, Batch 3), Brokers, Fail?, Metadata / Exception]
      • Tuning options shown: Add threads, Async, Add producers, Batch.size, Send.buffer.bytes, Receive.buffer.bytes, compression, acks
  33. Send() API
      • Sync = Slow: producer.send(record).get();
      • Async: producer.send(record);
      • Or: producer.send(record, new Callback() { ... });
  34. Batch.size vs. Linger.ms
      • A batch will be sent as soon as it is full
      • Therefore a small batch size can decrease throughput
      • Increase batch size if the producer is running near saturation
      • If consistently sending near-empty batches, increasing linger.ms will add a bit of latency but improve throughput
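
A sketch of how the batching trade-off on slide 34 might be expressed as producer settings. `batch.size` and `linger.ms` are real Kafka producer configuration keys, and `linger.ms` (how long the producer waits for a batch to fill before sending) is presumably the counterpart the slide has in mind. The class name `ProducerBatchingConfig` and the values are hypothetical, chosen only for illustration. The snippet uses only `java.util.Properties`, so it runs without the Kafka client on the classpath.

```java
import java.util.Properties;

final class ProducerBatchingConfig {
    private ProducerBatchingConfig() {}

    /**
     * Builds the batching-related producer settings.
     *
     * @param batchSizeBytes upper bound on one batch, in bytes (batch.size)
     * @param lingerMs       how long to wait for a batch to fill, in ms (linger.ms)
     */
    static Properties batching(int batchSizeBytes, int lingerMs) {
        if (batchSizeBytes < 0 || lingerMs < 0) {
            throw new IllegalArgumentException("values must be non-negative");
        }
        Properties props = new Properties();
        props.setProperty("batch.size", Integer.toString(batchSizeBytes));
        props.setProperty("linger.ms", Integer.toString(lingerMs));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values only: larger batches, and a short wait so they can fill.
        Properties props = batching(64 * 1024, 10);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The resulting `Properties` would be merged with the rest of the producer configuration (bootstrap servers, serializers) before constructing a producer. Whether a given combination helps is best confirmed against the batch-size and record-queue-time metrics listed on slide 31.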
  35. Reminder!
      • Consumers typically live in “consumer groups”
      • Partitions in topics are balanced between consumers in groups
      • [Diagram: Topic T1 with Partition 0, Partition 1, Partition 2, Partition 3; Consumer Group 1 with Consumer 1 and Consumer 2]
  36. My Consumer is not just slow – it is hanging!
      • There are no messages available (try perf consumer)
      • Next message is too large
      • Perpetual rebalance
      • Not polling enough
      • Multiple consumers in same group in same thread
  37. Rebalances are the consumer performance killer
      • Consumers must keep polling, or they die.
      • When consumers die, the group rebalances.
      • When the group rebalances, it does not consume.
  38. fetch.min.bytes vs. fetch.max.wait.ms
      • What if the topic doesn’t have much data?
      • “Are we there yet?” “and now?”
      • Reduce load on broker by letting fetch requests wait a bit for data
      • Add latency to increase throughput
      • Careful! Don’t fetch more than you can process!
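
The fetch trade-off on slide 38 maps onto a few consumer settings. The sketch below uses the real Kafka consumer keys `fetch.min.bytes`, `fetch.max.wait.ms`, and `max.partition.fetch.bytes`. Using the last one to cap how much is fetched is an interpretation of the slide's final warning, and the class name `ConsumerFetchConfig` and all values are hypothetical. It depends only on `java.util.Properties`.

```java
import java.util.Properties;

final class ConsumerFetchConfig {
    private ConsumerFetchConfig() {}

    /**
     * Builds fetch-related consumer settings.
     *
     * @param minBytes               broker holds the fetch until this much data is ready (fetch.min.bytes)
     * @param maxWaitMs              ...or until this much time has passed (fetch.max.wait.ms)
     * @param maxPartitionFetchBytes cap on data returned per partition (max.partition.fetch.bytes)
     */
    static Properties fetchSettings(int minBytes, int maxWaitMs, int maxPartitionFetchBytes) {
        if (minBytes < 0 || maxWaitMs < 0 || maxPartitionFetchBytes <= 0) {
            throw new IllegalArgumentException("invalid fetch settings");
        }
        Properties props = new Properties();
        props.setProperty("fetch.min.bytes", Integer.toString(minBytes));
        props.setProperty("fetch.max.wait.ms", Integer.toString(maxWaitMs));
        props.setProperty("max.partition.fetch.bytes", Integer.toString(maxPartitionFetchBytes));
        return props;
    }

    public static void main(String[] args) {
        // Illustrative values: wait for 64 KB or 500 ms, and never pull more
        // than 1 MB per partition in one fetch.
        Properties props = fetchSettings(64 * 1024, 500, 1024 * 1024);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Raising `fetch.min.bytes` trades latency for fewer, fuller fetch requests. The per-partition cap keeps a single poll from returning more than the application can process before it has to poll again.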
  39. Commits take time
      • Commit less often
      • Commit async
  40. Add partitions
      • Consumer throughput is often limited by target
        • i.e. you can only write to HDFS so fast (and it ain’t fast)
      • My SLA is 1 GB/s but single-client HDFS writes are 20 MB/s
      • If each consumer writes to HDFS – you need 50 consumers
      • Which means you need 50 partitions
      • Except sometimes adding partitions is a bitch
      • So do the math first
  41. I need to get data from Dallas to AWS
      • Put the consumer far from Kafka
        • Because failure to pull data is safer than failure to push
      • Tune network parameters in the client, Kafka, and both operating systems
        • Send buffer -> bandwidth × delay
        • Receive buffer
        • Fetch.min.bytes
      • This will maximize use of bandwidth. Note that cheap AWS nodes have low bandwidth.
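
Slide 41 sizes the send buffer as bandwidth × delay (the bandwidth-delay product). A minimal Java sketch of that arithmetic follows. It assumes "delay" means round-trip time, and the class name `BufferSizing` and the link figures are hypothetical.

```java
final class BufferSizing {
    private BufferSizing() {}

    /**
     * Bandwidth-delay product in bytes (rounded down): the amount of data that
     * can be in flight on the link at once.
     *
     * @param bandwidthBitsPerSecond link bandwidth in bits per second
     * @param roundTripMillis        round-trip time in milliseconds
     */
    static long bandwidthDelayBytes(long bandwidthBitsPerSecond, long roundTripMillis) {
        if (bandwidthBitsPerSecond < 0 || roundTripMillis < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        long bytesPerSecond = bandwidthBitsPerSecond / 8;
        // multiplyExact throws instead of silently overflowing on extreme inputs.
        return Math.multiplyExact(bytesPerSecond, roundTripMillis) / 1000;
    }

    public static void main(String[] args) {
        // Hypothetical link: 1 Gbit/s with a 40 ms round trip.
        System.out.println(bandwidthDelayBytes(1_000_000_000L, 40)); // prints 5000000
    }
}
```

A value computed this way is a starting point for the client's `send.buffer.bytes` / `receive.buffer.bytes` and the broker's `socket.send.buffer.bytes` / `socket.receive.buffer.bytes`. The operating system's own socket buffer limits may need raising as well, which is presumably why the slide mentions tuning both OSes.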
  42. Monitor
      • records-lag-max
        • Burrow is useful here
      • fetch-rate
      • fetch-latency
      • records-per-request / bytes-per-request
      • Apologies on behalf of the Kafka community: we forgot to document metrics for the new consumer.