KAFKA TO THE MAXKA By Matt Andruff
Kafka Performance Tuning
Kafka to the Maxka - (Kafka Performance Tuning)

Kafka is becoming an ever more popular choice for users to help enable fast data and streaming. Kafka provides a wide landscape of configuration to allow you to tweak its performance profile. Understanding the internals of Kafka is critical for picking your ideal configuration. Depending on your use case and data needs, different settings will perform very differently. Let's walk through the performance essentials of Kafka. Let's talk about how your Consumer configuration can speed up or slow down the flow of messages to Brokers. Let's talk about message keys, their implications and their impact on partition performance. Let's talk about how to figure out how many partitions and how many Brokers you should have. Let's discuss consumers and what affects their performance. How do you combine all of these choices and develop the best strategy moving forward? How do you test the performance of Kafka? I will attempt a live demo with the help of Zeppelin to show in real time how to tune for performance.

Published in: Technology

  1. 1. KAFKA TO THE MAXKA By Matt Andruff
  2. 2. Kafka Performance Tuning
  3. 3. Welcome! Matt Andruff - Hortonworks Practice lead @ Yoppworks @MattAndruff
  4. 4. Because I get asked a lot...Yoppworks
  7. 7. Performance Tuning...
  8. 8. Agenda • Performance tuning - Just some quick points • What you can change • Simple changes • Kafka Configuration Changes • Brief Canned Demo • Beware Kafka settings are not exciting for everyone • Architectural changes
  9. 9. Performance Tuning What do you need to make changes?
  10. 10. Performance tuning There is no magic bullet. Guesses are just guesses. Empirical fact requires testing. Requires hardware, SMEs, time, effort. It’s non-trivial to do performance testing.
  11. 11. Performance tuning The better your load tests are the better your tuning will be. Garbage in, Garbage out.
  13. 13. Performance tuning The better your load tests are the better your tuning will be. Garbage in, Garbage out. Every client is different. Each has a unique signature of data/hardware/topics.
  14. 14. Performance tuning The better your load tests are the better your tuning will be. Garbage in, Garbage out. Every client is different. Each has a unique signature of data/hardware/topics. Tune for bottlenecks found through testing. Yes, there is always some low-hanging fruit.
  15. 15. Beyond Tuning What your boss understands:
  16. 16. Beyond Tuning What you understand:
  17. 17. First a minor detour to the OS I promise to move fast but it can’t be ignored. To be complete we need to cover some of the basics.
  18. 18. Which OS to use?
  19. 19. The basics ● noatime ○ Stops the filesystem from updating the last access time of files ○ Saves a write on every read.
  20. 20. The basics ● ext4 is widely in use ● XFS has shown better performance metrics https://kafka.apache.org/documentation.html#filesystems
  21. 21. JVM settings export KAFKA_JVM_PERFORMANCE_OPTS='...'
      Java 1.8: -Xmx6g -Xms6g -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80
      Java 1.7 (beware of older versions): -Xms4g -Xmx4g -XX:PermSize=48m -XX:MaxPermSize=48m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
  22. 22. The basics ● File descriptor limits ○ Per broker Partitions * segments + Overhead ■ Watch this when you upgrade to 0.10 ● set vm.swappiness = 0
  23. 23. The basics ● Kafka Data should be on its own disks ● If you encounter read/write issues add more disks ● Each data folder you add to config will be written to in round robin
  24. 24. Latest is the Greatest ● Have you upgraded to 0.10? ● Adds an 8-byte timestamp to each message ○ Not great for small messages. ● No longer does broker-side decompression ○ Better performance when you use compression. ● File descriptor limits ○ Segment indexing changed
  25. 25. Defaults are your friends
  26. 26. Defaults are your friends The default when you drive is to put on your seatbelt. If you are going to change the default to not wearing a seatbelt, I hope you have thought through your choice. Kafka’s defaults are set up to help keep you safe. If you are going to change a default to something else, I hope you have thought through your choice.
  27. 27. The Producer
  28. 28. Default Example: Acks
      Setting  | Description                                                       | Risk of data loss | Performance
      acks=0   | No acknowledgment from the server at all. (Set it and forget it.) | Highest           | Highest
      acks=1   | Leader completes write of data.                                   | Medium            | Medium
      acks=all | The leader and all in-sync followers have written the data.       | Lowest            | Lowest
  30. 30. Definitions: Latency: The length of time for one message to be processed. Throughput: The number of messages processed per unit of time. Batch: • “Message 1” - Time 1 ← Worst Latency (waits longest for the batch to be sent) • “Message 2” - Time 2 • “Message 3” - Time 3 ← Best Latency
  31. 31. Batch Management Producer Batch -Partition 1- TopicA Broker Partition “data” “data” “data” Batch -Partition 1- TopicA Batch -Partition 1- TopicB “data” “data” “data” “data” Segment
  32. 32. Batch Management batch.size - What is the maximum size of a batch, in bytes? linger.ms - What is the maximum amount of time to wait before sending a batch? Other: - Same Broker Sending (Piggy Back) - flush() or close() is called
  33. 33. Batch Management Producer Broker Partition 1 - TopicA Batch -Partition 1- TopicA Batch -Partition 1- TopicB “data” “data” “data” “data” Segment Partition 1 - TopicB Segment
  34. 34. Batch Management Default Message size is 2048 (If linger.ms is large) buffer.memory / batch.size > Message size 33554432 / 16384 > 2048
  35. 35. Batch Management Producer Batch -Partition 1- TopicA Broker Partition “data” “data” “data” Segment
  36. 36. Batch Management Default Message size is 2048 (If linger.ms is small) buffer.memory / batch.size > Message size 33554432 / (< 16384) > (>2048)
  37. 37. Batch Management Producer Batch -Partition 1- TopicA Broker Partition “data” “data” Segment Batch -Partition 1- TopicB “data” Partition 1 - TopicB Segment “data” ← Linger is triggering Before batch is full. ← Using bigger messages to fill the batch
  38. 38. Batch Management Tune your batch.size/linger.ms: batch.size + linger.ms = latency + throughput. Once tuned, do not forget to size your buffer.memory.
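As a rough sanity check on the buffer sizing mentioned above, the sketch below works out how many full batches fit into the producer's buffer. The two default values are the ones quoted on the earlier slide (33554432 and 16384 bytes); the function name is hypothetical and not part of any Kafka API.

```python
def batches_in_buffer(buffer_memory: int, batch_size: int) -> int:
    """Return how many full batches of batch_size bytes fit in buffer_memory bytes."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return buffer_memory // batch_size


# Producer defaults as quoted in the slides.
DEFAULT_BUFFER_MEMORY = 33554432  # buffer.memory, in bytes
DEFAULT_BATCH_SIZE = 16384        # batch.size, in bytes

if __name__ == "__main__":
    # With the defaults, the buffer holds 2048 full batches.
    print(batches_in_buffer(DEFAULT_BUFFER_MEMORY, DEFAULT_BATCH_SIZE))
    # Doubling batch.size without raising buffer.memory halves that headroom.
    print(batches_in_buffer(DEFAULT_BUFFER_MEMORY, 2 * DEFAULT_BATCH_SIZE))
```

If you raise batch.size after testing, re-run the calculation and consider raising buffer.memory so the headroom stays where your tests say it should be.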
  39. 39. Compression compression.type = none (the default) Compression can improve performance because less data is transferred over the network, at the cost of additional CPU. Generalization: Use snappy *** *** You should do real performance tests.
  40. 40. Batch Management Producer Batch -Partition 1- TopicA “data” “data” Batch -Partition 1- TopicB “data” “data” Serializer Partitioner
  41. 41. Did we stick with the Defaults? Custom Class written for performance? ● Partitioner ○ Create a custom key based on data to help prevent skew ● Serializer ○ Pluggable ● Interceptors ○ Allow manipulation of records on their way into Kafka ○ Are they being used? Should they be? How are they written?
  42. 42. Tuning To tune performance you need to experiment with different settings. Data and throughput are different with every project. There is no one size fits all. Luckily there is a tool to help test configurations.
  43. 43. kafka-run-class.sh
      bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092 buffer.memory=67108864 batch.size=8196
      Or use the shortcut:
      bin/kafka-producer-perf-test.sh test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.yoppworks.rules.com:9092 buffer.memory=67108864 batch.size=8196
      There is also one for the consumer:
      bin/kafka-consumer-perf-test.sh
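Since tuning means trying several settings, it can help to generate the test commands rather than type them by hand. The sketch below builds one argument list per combination of batch.size and acks, using the positional form of the command shown above. The function name and the localhost:9092 address are assumptions for illustration; newer Kafka releases take named flags instead of positional arguments, so check the tool's usage output for your version.

```python
import itertools


def perf_test_commands(topic, num_records, record_size, bootstrap,
                       batch_sizes, acks_values):
    """Build one kafka-producer-perf-test.sh argument list per setting combination."""
    commands = []
    for batch_size, acks in itertools.product(batch_sizes, acks_values):
        commands.append([
            "bin/kafka-producer-perf-test.sh",
            topic,
            str(num_records),
            str(record_size),
            "-1",  # target throughput, -1 as in the command above
            f"acks={acks}",
            f"bootstrap.servers={bootstrap}",
            f"batch.size={batch_size}",
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical broker address; replace with your own.
    for cmd in perf_test_commands("test", 50000000, 100, "localhost:9092",
                                  [8196, 16384], [1, "all"]):
        print(" ".join(cmd))
```

Each list could be handed to subprocess.run on a machine with Kafka installed; the sketch only prints the commands so that it runs anywhere.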
  44. 44. Time for a quick walkthrough
  45. 45. Monitoring
      OpsClarity - Now owned by Lightbend - Cadillac of monitoring.
      Burrow - A little resource heavy (Kafka client per partition) - Health monitor has some false positives
      Yahoo Kafka-manager
      Confluent Control Center - Confluent distro
      Roll your own: Kafka JMX & MBeans
  46. 46. Where did they get the name Kafka? My Guess Putting Apache Kafka to Use for Event Streams, https://www.youtube.com/watch?v=el-SqcZLZlI ~ Jay Kreps
  47. 47. Where did they get the name Kafka? My Guess
  49. 49. Where did they get the name Kafka?
  50. 50. Where did they get the name Kafka? “I thought that since Kafka was a system optimized for writing using a writer's name would make sense. I had taken a lot of lit classes in college and liked Franz Kafka. Plus the name sounded cool for an open source project.” ~ Jay Kreps https://www.quora.com/What-is-the-relation-between-Kafka-the-writer-and-Apache-Kafka-the-distributed-messaging-system
  52. 52. The Broker
  53. 53. Broker Disk Usage ● What is your rate of growth, and when will you need to expand? ● Try to make sure the number of partitions you select covers that growth
  54. 54. Broker Disk Usage ● log.retention.bytes ■ Default is unlimited (-1) ● log.retention.[time interval] ■ Default is 7 days (168 hours)
  55. 55. Broker ● num.io.threads ■ Default is 8 - should be at least the number of physical disks
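To connect rate of growth with retention, here is a back-of-the-envelope estimate of the data each broker holds once time-based retention reaches steady state. It is a sketch under loud assumptions that are not from the slides: a constant ingest rate, partitions spread evenly across brokers, and no allowance for index files, compression or log.retention.bytes limits. The function name is hypothetical.

```python
def retained_bytes_per_broker(bytes_per_sec, retention_hours,
                              replication_factor, brokers):
    """Estimate steady-state bytes stored per broker under time-based retention.

    Assumes constant ingest and an even spread of replicas across brokers.
    """
    total = bytes_per_sec * retention_hours * 3600 * replication_factor
    return total / brokers


if __name__ == "__main__":
    # 1 MB/s ingest, default 168-hour retention, replication 3, 3 brokers.
    estimate = retained_bytes_per_broker(1_000_000, 168, 3, 3)
    print(f"{estimate / 1e9:.1f} GB per broker")
```

Treat the result as a starting point for capacity planning, then compare it with what your disks actually show under a realistic load test.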
  56. 56. Beyond Tuning How do we optimize writing:
  57. 57. Beyond Tuning Measure the throughput:
  58. 58. Beyond Tuning
  59. 59. The Consumer
  60. 60. replica.high.watermark.checkpoint.interval.ms - You might think that the high water mark only ensures reliability. It also has implications for performance. - Watch out for consumer lag
  61. 61. Beyond Tuning
  62. 62. Beyond Tuning A future consumer's ability to scale is constrained by the number of partitions.
  63. 63. Beyond Tuning A greater # of partitions means:
      > A greater level of parallelism
      > A greater # of files open: (Partitions * Segment count * Replication) / Brokers ~= # of open files per machine. Tens of thousands of files is manageable on appropriate hardware.
      > Greater memory usage (Broker and Zookeeper)
      > Greater leader failover time (can be mitigated by an increased # of brokers)
  64. 64. Beyond Tuning How do I calculate the number of partitions to have on a broker? What’s the rule of thumb to start testing at?
      [# partitions per broker] = c x [# brokers] x [replication factor]
      c ~ Your machine's awesomeness
      c ~ Your appetite for risk
      c ~ 100 is a good, safe starting point
  65. 65. Beyond Tuning Can I move an existing partition around? I just added a new broker, and it’s not sharing the load. Use: bin/kafka-reassign-partitions.sh
      1) Create a JSON file of the topics you want to redistribute (topics.json).
      2) Use kafka-reassign-partitions.sh … --generate to suggest a partition reassignment.
      3) Copy the proposed assignment to a JSON file.
      4) Use kafka-reassign-partitions.sh … --execute to start the redistribution process.
         a) Can take several hours, depending on data.
      5) Use kafka-reassign-partitions.sh … --verify to check progress of the redistribution process.
      Link to documentation from conference sponsor.
      topics.json: {"topics": [{"topic": "weather"}, {"topic": "sensors"}], "version":1 }
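The open-files estimate on the slide above is simple arithmetic, sketched here so it can be re-run as partition counts change. The function name and the example numbers are assumptions for illustration. Each segment also has index files on disk, so treat the result as a lower bound.

```python
def open_files_per_broker(partitions, segments_per_partition,
                          replication, brokers):
    """Approximate open files per machine using the formula from the slide:
    (partitions * segment count * replication) / brokers.
    """
    return (partitions * segments_per_partition * replication) / brokers


if __name__ == "__main__":
    # Hypothetical cluster: 1000 partitions, 10 segments each,
    # replication factor 3, spread over 5 brokers.
    print(open_files_per_broker(1000, 10, 3, 5))
```

Compare the figure with the file descriptor limit configured for the broker process before adding partitions.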
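The rule of thumb above can be written as a one-line helper. This sketch simply evaluates the right-hand side of the slide's formula, with c defaulting to the suggested 100; the function name is hypothetical, and the result is only a starting point for load testing, not a limit enforced by Kafka.

```python
def starting_partition_count(brokers, replication_factor, c=100):
    """Evaluate c x [# brokers] x [replication factor] from the slide."""
    return c * brokers * replication_factor


if __name__ == "__main__":
    # Hypothetical cluster: 3 brokers, replication factor 2, c = 100.
    print(starting_partition_count(3, 2))
```

Lower c if your hardware is modest or your appetite for risk is small, and let test results move it from there.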
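Step 1 above asks for a topics.json file. For more than a handful of topics it may be easier to generate it; the sketch below builds the same structure as the example on the slide. The function name is hypothetical, and the topic names are the ones from the slide.

```python
import json


def topics_to_move(topic_names):
    """Build the topics-to-move document used by the --generate step."""
    return {
        "topics": [{"topic": name} for name in topic_names],
        "version": 1,
    }


if __name__ == "__main__":
    # Prints the JSON; redirect the output to topics.json to use it.
    print(json.dumps(topics_to_move(["weather", "sensors"])))
```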
  66. 66. Thanks! Matt Andruff - Hortonworks Practice lead @ Yoppworks @MattAndruff I’m not an expert, I just sound like one.