Speedtest: Benchmark Your Apache Kafka
01. Introduction
Understand and tune
• Producers
• Consumers
• Brokers
Producer tuning is key
• Efficient batching is essential for overall performance
Focus on fundamentals
• Large impact & gains
• Advanced topics are covered elsewhere, e.g. in "Tail Latency at Scale with Apache Kafka"
Where to begin?
Service goals and tradeoffs
Non-performance objectives
• Business requirements take priority
• Durability, availability and ordering?
Performance objectives
• Trade-off between throughput and latency
Example approach
• Set configuration to ensure data durability
• Optimize for throughput
[Figure: service goals (throughput, latency, availability, durability) with example use cases: payments, logging, Next Best Offer, centralized Kafka]
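One way to read the example approach is as a producer configuration that fixes durability first and then tunes batching for throughput. The sketch below is only an illustration under assumptions: the throughput values are the ones used in this deck's kafka-producer-perf-test run, `enable.idempotence=true` is an assumption rather than a recommendation from the slides, and `to_properties` is a hypothetical helper.

```python
# Hypothetical starting point: durability first, then throughput.
DURABILITY = {
    "acks": "all",                 # wait for all in-sync replicas
    "enable.idempotence": "true",  # assumption: avoid duplicates on retries
}
THROUGHPUT = {
    "batch.size": "300000",        # larger batches
    "linger.ms": "100",            # give batches time to fill
    "compression.type": "lz4",     # compress each batch
}


def to_properties(*groups):
    """Render config dicts as the lines of a .properties file."""
    merged = {}
    for group in groups:
        merged.update(group)
    return "\n".join(f"{key}={value}" for key, value in sorted(merged.items()))


print(to_properties(DURABILITY, THROUGHPUT))
```

Keeping the two groups separate makes it explicit which settings are fixed by the business requirement and which are free to change during benchmarking.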
Agenda
01. Introduction
Setting the scene & review of relevant terminology
02. Producers
Deep dive into producer internals.
Why is producer behavior key for cluster performance?
03. Consumers
Understand fetching and consumer group behavior.
04. Brokers, Zookeepers and Topics
How are requests handled? Why does Zookeeper matter?
05. Optimising and Tuning Client Applications
Key parameters to consider for different service goals.
06. Summary
Summary and outlook.
It is a journey...
• Identify your service goal: throughput, latency, durability, or availability
• Understand Kafka internals: producer, consumer, and broker behavior
• Configure cluster and clients: ensure service goals are met
• Benchmark, monitor, and tune: an iterative procedure to drive performance
02. Producers
Producer
acks=1
enable.idempotence=false
max.request.size=1MB
retries=MAX_INT
delivery.timeout.ms=2min
max.in.flight.requests.per.connection=5
Serializer
● Retrieves and caches schemas from Schema Registry
Partitioner
● Java client uses murmur2 for hashing
● If no key is provided, it performs round robin
● If keys are unbalanced, it will overload one leader
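The effect of unbalanced keys can be sketched with a few lines of Python. This is not the Java client's partitioner: `zlib.crc32` is used as a stand-in for murmur2, and the key names and partition count are hypothetical. It only illustrates that hashing a hot key always selects the same partition, and therefore the same leader.

```python
import zlib
from collections import Counter


def partition_for(key: bytes, num_partitions: int) -> int:
    # Stand-in hash: the Java client uses murmur2, not CRC32.
    return (zlib.crc32(key) & 0x7FFFFFFF) % num_partitions


def partition_load(keys, num_partitions):
    """Count how many records land on each partition."""
    return Counter(partition_for(key, num_partitions) for key in keys)


# Balanced keys: 10,000 distinct (hypothetical) customer ids.
balanced = [f"customer-{i}".encode() for i in range(10_000)]
# Unbalanced keys: 90% of the records share one hot key.
skewed = [b"customer-0"] * 9_000 + balanced[:1_000]

print("balanced:", sorted(partition_load(balanced, 6).items()))
print("skewed:  ", sorted(partition_load(skewed, 6).items()))
```

With the skewed key set, one partition receives at least 9,000 of the 10,000 records regardless of how many partitions the topic has.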
Sender thread
● Batches are grouped by destination broker into requests
● Multiple batches to different partitions can potentially be in the same producer request
Record accumulator
● Buffer per partition; seldom-used partitions may not achieve high batching
● If many producers are in the same JVM, memory and GC could become important
● Sticky partitioner could be used to increase batches in the case of round robin (KIP-480/KIP-794)
Compression
● At batch level
● Allows faster transfer to the broker
● Reduces the inter-broker replication load
● Reduces page cache & disk space utilization on brokers
● Gzip is more CPU intensive, Snappy is lighter, LZ4/ZStd are a good balance*
Data flow: record → batch → request
batch.size=16KB
linger.ms=0
buffer.memory=32MB
max.block.ms=60s
compression.type=none
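Why compression at batch level pays off can be shown with the standard library alone. The sketch below compares compressing records one by one with compressing them together. It rests on assumptions: `zlib` (DEFLATE, the algorithm behind gzip) stands in for the producer's codecs, the JSON records are invented, and a real producer compresses Kafka's record batch format rather than concatenated JSON.

```python
import json
import zlib

# Invented sample records with the repetitive structure typical of JSON events.
records = [
    json.dumps({"user_id": i, "event": "page_view", "url": f"/products/{i % 50}"}).encode()
    for i in range(200)
]

raw_bytes = sum(len(r) for r in records)
per_record = sum(len(zlib.compress(r)) for r in records)  # each record alone
per_batch = len(zlib.compress(b"".join(records)))         # whole batch at once

# Same definition as the compression-rate metric: compressed / uncompressed.
compression_rate = per_batch / raw_bytes

print(f"uncompressed:          {raw_bytes} bytes")
print(f"compressed per record: {per_record} bytes")
print(f"compressed per batch:  {per_batch} bytes")
print(f"batch compression rate: {compression_rate:.2f}")
```

The batch compresses better because field names and common values repeat across records, which is also why larger batches tend to improve the compression rate.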
Batching is key to overall performance
Benefits to batching
● Reduced network bandwidth
○ producer to broker
○ broker to broker (replication)
○ broker to consumer
● Lower storage requirements on broker disks
● Reduced CPU requirement due to fewer requests
From Tail Latency at Scale with Apache Kafka:
“Batching reduces the cost of each record by amortizing costs on both the clients and brokers. Generally, bigger batches reduce processing overhead and reduce network and disk IO, which improves network and disk utilization.”
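How `batch.size` and `linger.ms` interact can be sketched with a deliberately simplified model. It is an assumption-laden illustration, not the producer's real algorithm: a single partition, fixed-size records, no compression or batch overhead, and a batch that closes only when it is full or when `linger.ms` has elapsed. The real client can still batch with `linger.ms=0` when records arrive faster than the sender thread drains them.

```python
def simulate_batches(arrivals_ms, record_bytes, batch_size, linger_ms):
    """Return the number of records in each batch under the simplified model."""
    sizes = []
    count = 0       # records in the currently open batch
    opened_at = 0   # arrival time of the first record in the open batch
    for t in arrivals_ms:
        if count > 0:
            expired = (t - opened_at) >= linger_ms
            full = (count + 1) * record_bytes > batch_size
            if expired or full:
                sizes.append(count)
                count = 0
        if count == 0:
            opened_at = t
        count += 1
    if count:
        sizes.append(count)
    return sizes


arrivals = range(1000)  # one 1000-byte record per millisecond

for batch_size, linger_ms in [(16_384, 0), (16_384, 100), (300_000, 100)]:
    sizes = simulate_batches(arrivals, 1000, batch_size, linger_ms)
    print(f"batch.size={batch_size} linger.ms={linger_ms}: "
          f"{len(sizes)} batches, largest holds {max(sizes)} records")
```

In this model the default-like settings produce one batch per record, a 100 ms linger alone is capped by the 16 KB batch size, and raising both lets each batch collect 100 records.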
Start the demo
environment
in docker-compose (on my mac)
1 * zookeeper
5 * brokers
1 * Squid proxy (sends JMX metrics to Health+)
Not starting:
schema registry
connect
ksqlDB
REST Proxy
Confluent Control Center
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. 11
Kafka performance
test tools
kafka-producer-perf-test \
  --num-records 1000000 \
  --record-size 1000 \
  --topic demo-perf-topic \
  --throughput 10000 \
  --print-metrics \
  --producer-props bootstrap.servers=kafka:9092 \
    acks=all batch.size=300000 linger.ms=100 \
    compression.type=lz4
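Before running the command it helps to know what the parameters imply. The arithmetic below is a back-of-the-envelope sketch using the values from the command; it assumes `--throughput` acts as an upper bound in records per second and ignores protocol overhead and compression.

```python
num_records = 1_000_000    # --num-records
record_size = 1_000        # --record-size, in bytes
throughput = 10_000        # --throughput, records per second (upper bound)
batch_size = 300_000       # batch.size, in bytes

total_bytes = num_records * record_size            # payload written to the topic
min_duration_s = num_records / throughput          # the run cannot finish faster
target_bytes_per_s = throughput * record_size      # payload rate at the cap
records_per_full_batch = batch_size // record_size # before compression and overhead

print(f"total payload:      {total_bytes / 1e9:.1f} GB")
print(f"minimum duration:   {min_duration_s:.0f} s")
print(f"target rate:        {target_bytes_per_s / 1e6:.0f} MB/s")
print(f"records per batch:  up to {records_per_full_batch}")
```

If the measured rate stays well below the target rate, the throttle is not the limiting factor and the producer or cluster is.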
Overview
● CLI tools to write & read sample data to/from topics
● Helpful to enhance understanding of parameters & impact
Disclaimer
● Performance numbers are not representative for specific customer use cases!
○ Random test data is reused
● Use case specific performance testing is required
kafka-consumer-perf-test
kafka-producer-perf-test
Most significant producer performance metrics
Metric | Meaning | MBean
record-size-avg | Avg record size | kafka.producer:type=producer-metrics,client-id=([-.\w]+)
batch-size-avg | Avg number of bytes sent per partition per request | kafka.producer:type=producer-metrics,client-id=([-.\w]+)
bufferpool-wait-ratio | Fraction of time an appender waits for space allocation | kafka.producer:type=producer-metrics,client-id=([-.\w]+)
compression-rate-avg | Avg compression rate for a topic (compressed / uncompressed batch size) | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+)
record-queue-time-avg | Avg time (ms) record batches spent in the send buffer | kafka.producer:type=producer-metrics,client-id=([-.\w]+)
request-latency-avg | Avg request latency (ms) | kafka.producer:type=producer-metrics,client-id=([-.\w]+)
produce-throttle-time-avg | Avg time (ms) a request was throttled by a broker | kafka.producer:type=producer-metrics,client-id=([-.\w]+)
record-retry-rate | Avg per-second number of retried record sends for a topic | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+)
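The `client-id` and `topic` parts of these MBean names are regular-expression placeholders (`[-.\w]+`: word characters, dots, and hyphens). A minimal sketch of using the patterns to pull the client id and topic out of a concrete MBean name follows; `parse_mbean` is a hypothetical helper and the client id and topic in the example are made up.

```python
import re

PRODUCER_METRICS = re.compile(
    r"kafka\.producer:type=producer-metrics,client-id=([-.\w]+)"
)
PRODUCER_TOPIC_METRICS = re.compile(
    r"kafka\.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+)"
)


def parse_mbean(name):
    """Return (client_id, topic) for a producer MBean name, or None if it does not match."""
    match = PRODUCER_TOPIC_METRICS.fullmatch(name)
    if match:
        return match.group(1), match.group(2)
    match = PRODUCER_METRICS.fullmatch(name)
    if match:
        return match.group(1), None
    return None


# Made-up names for illustration.
print(parse_mbean("kafka.producer:type=producer-metrics,client-id=perf-producer-1"))
print(parse_mbean(
    "kafka.producer:type=producer-topic-metrics,client-id=perf-producer-1,topic=demo-perf-topic"
))
```

Client-level metrics such as batch-size-avg live under the first pattern, while per-topic metrics such as record-retry-rate need the second.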
Overview Java metrics & librdkafka statistics
03. Consumers
Consumer application
Kafka consumers fetch batches of events!
Embrace at-least-once semantics!
Consumers
Partitions
● Basis for scalability
● No partition will be assigned to more than one consumer in the same group
Key parameters
# of partitions
fetch.min.bytes=1
fetch.max.wait.ms=500ms
max.partition.fetch.bytes=10MB
fetch.max.bytes=50MB
max.poll.records=500
max.poll.interval.ms=5min
auto.commit.interval.ms=5s (if being used)
Key positions in each partition
Log end offset
• Latest data added to the partition
• Position of the producer
• Not accessible to consumers
High watermark
• Offsets up to the watermark can be consumed
• Data has been replicated to all in-sync replicas
Current position
• Specific to consumer instances
• Current message being processed in the poll loop
Last committed offset
• Last position persisted in the __consumer_offsets topic
[Figure: a partition with offsets 0 to 12, marked with the last committed offset, the current position of the consumer, the high watermark, and the log end offset]
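The relationship between the four positions can be written down as a small data structure. This is a sketch under assumptions: the offset values are hypothetical (the figure does not give them), every value is treated as a "next offset" for simplicity, and `PartitionView` is an invented name rather than a client API.

```python
from dataclasses import dataclass


@dataclass
class PartitionView:
    last_committed: int   # persisted in __consumer_offsets
    position: int         # where this consumer instance currently is
    high_watermark: int   # replicated to all in-sync replicas
    log_end_offset: int   # latest data written by the producer

    def fetchable(self) -> int:
        """Records the consumer can still read right now."""
        return self.high_watermark - self.position

    def reprocessed_after_crash(self) -> int:
        """Records processed but not committed; a restart reads them again."""
        return self.position - self.last_committed

    def not_yet_replicated(self) -> int:
        """Records written by the producer but not visible to consumers."""
        return self.log_end_offset - self.high_watermark


# Hypothetical positions on a partition with offsets 0 to 12.
view = PartitionView(last_committed=2, position=5, high_watermark=9, log_end_offset=12)
print(view.fetchable(), view.reprocessed_after_crash(), view.not_yet_replicated())
```

The gap between the last committed offset and the current position is where at-least-once semantics come from: after a failure, those records are delivered a second time.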
Consumer groups
[Figure: group join sequence between the consumer, any broker (bootstrap), and the coordinator broker]
● Find coordinator → Coordinator details
● Join consumer group → Leader details
● Sync group → Partition assignment
Rebalances
● Every time a new consumer joins or leaves (fails) the group
● Until Kafka 2.4 “stop the world” event (solved in KIP-429)
● Consider setting group.instance.id to minimize rebalances (KIP-345)
Partition assignment
● Based on partition.assignment.strategy
● Options: Range (default), round robin, sticky, cooperative sticky
● Is customizable
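The difference between the range and round robin options is easiest to see on a small example. The functions below are simplified sketches of the two ideas, not the client's assignor classes: they assume every consumer subscribes to every topic, and the topic and consumer names are hypothetical.

```python
def range_assign(consumers, topics):
    """Range-style: split each topic's partitions into contiguous chunks."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    for topic, num_partitions in sorted(topics.items()):
        per_member, extra = divmod(num_partitions, len(members))
        start = 0
        for i, member in enumerate(members):
            length = per_member + (1 if i < extra else 0)
            assignment[member] += [(topic, p) for p in range(start, start + length)]
            start += length
    return assignment


def round_robin_assign(consumers, topics):
    """Round-robin-style: deal all partitions of all topics out one by one."""
    members = sorted(consumers)
    assignment = {c: [] for c in members}
    all_partitions = [(t, p) for t, n in sorted(topics.items()) for p in range(n)]
    for i, topic_partition in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(topic_partition)
    return assignment


topics = {"orders": 3, "payments": 3}   # hypothetical topics, 3 partitions each
consumers = ["c1", "c2"]

print("range:      ", range_assign(consumers, topics))
print("round robin:", round_robin_assign(consumers, topics))
```

Because the range approach works topic by topic, the first consumer receives the extra partition of every topic (4 versus 2 here), while the round robin approach ends up with 3 each. In both cases each partition has exactly one owner within the group.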
Heartbeat
heartbeat.interval.ms=3s
session.timeout.ms=10s
group.initial.rebalance.delay.ms=3s
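The two client-side timeouts are related: the heartbeat interval has to be lower than the session timeout, and Kafka's configuration documentation suggests keeping it at no more than one third of it. A minimal sketch of that check, with `check_group_timeouts` as a hypothetical helper:

```python
def check_group_timeouts(heartbeat_interval_ms, session_timeout_ms):
    """Return a list of findings; an empty list means the pair looks reasonable."""
    findings = []
    if heartbeat_interval_ms >= session_timeout_ms:
        findings.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat_interval_ms * 3 > session_timeout_ms:
        findings.append(
            "heartbeat.interval.ms is usually kept at or below 1/3 of session.timeout.ms"
        )
    return findings


print(check_group_timeouts(3_000, 10_000))   # the values shown on the slide
print(check_group_timeouts(5_000, 10_000))
```

The slide's values (3s and 10s) pass the check, since several heartbeats can be missed before the session expires and a rebalance is triggered.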
Selected consumer performance metrics
Metric | Meaning | MBean
fetch-latency-avg | Avg time taken for a fetch request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+)
fetch-size-avg | Avg number of bytes fetched per request | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+)
commit-latency-avg | Avg time taken for a commit request | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
rebalance-latency-total | Total time taken for group rebalances | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)
fetch-throttle-time-avg | Avg throttle time (ms) | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+)
Overview Java metrics and librdkafka statistics
Consumer Benchmarking
(1) Start with the simplest test: without any tuning, we get extremely good results
Highlights:
● 10M messages in less than 30 seconds
● 1Gb data retrieved
● 325 Mb/s
Conclusion:
● Tuning the producer is key; if it is tuned correctly, almost no tuning may be required on the consumer side
04. Brokers, Zookeepers and Topics
Overview
Brokers and Zookeeper
Request lifecycle in broker
● How are produce & fetch requests handled?
● How can inefficient batching impact performance?
● How to identify where time is spent during request handling?
Controller, leaders, and Zookeeper
● How is the Controller elected?
● How are broker failures detected?
● Why does the partition count matter for the recovery time after a controller failure?
(Next 8 slides skipped)
05. Optimizing and Tuning Client Applications
https://docs.confluent.io/cloud/current/client-apps/optimizing/index.html#optimizing-and-tuning
06. Recommendations & Conclusions
Recommendations
Benchmarking
● Benchmark all applications with a significant & representative load
● Consider a test cluster with
○ the application's requirements configured (durability, availability, or any other)
○ real data (size, schema, serialization format, ...)
● Test the different parameters to see the impact in the test data (throughput, latency, ...) considering different configurations (batch size, compression, linger, ...)
● Evaluate the traffic and leave space for growth when determining the number of partitions
● Low volume applications may need care too
● Re-evaluate after major changes in application or message content (JSON size, ...) and volume
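For the partition-count recommendation, a common rule of thumb is to divide the target throughput by the measured per-partition throughput on the producer and on the consumer side, and take the larger result. The sketch below applies that rule with a growth factor; all numbers are hypothetical and would come from your own benchmark, and `partitions_needed` is an invented helper.

```python
import math


def partitions_needed(target_mb_s, producer_mb_s_per_partition,
                      consumer_mb_s_per_partition, growth_factor=1.0):
    """Estimate a partition count from measured per-partition throughput."""
    required = target_mb_s * growth_factor
    by_producer = required / producer_mb_s_per_partition
    by_consumer = required / consumer_mb_s_per_partition
    return math.ceil(max(by_producer, by_consumer))


# Hypothetical benchmark results: 10 MB/s per partition when producing,
# 20 MB/s per partition when consuming, 100 MB/s needed today.
print(partitions_needed(100, 10, 20))                     # today's traffic
print(partitions_needed(100, 10, 20, growth_factor=2.0))  # leaving space for growth
```

The estimate is only a starting point: per-partition throughput changes with batch size, compression, and message content, which is why the benchmark should be repeated after major changes.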
Monitoring
● Should be used to identify bottlenecks in running clusters
● Client monitoring is as important as broker monitoring
Conclusion
Resources
● Optimizing Your Apache Kafka® Deployment
● Optimizing and Tuning
● White paper
Optimization approach
● Determine service goals
● Understand Kafka’s internals
● Configure clients & cluster
● Benchmark, monitor & tune
Continue the conversation
● How to monitor the cluster & clients?
● Integration with external systems
● Tuning of Kafka Streams & ksqlDB applications?
https://www.confluent.io/get-started/
  • 3. Where to begin? Understand and tune: producers, consumers, brokers. Producer tuning is key: efficient batching is essential for overall performance. Focus on fundamentals: large impact & gains. Advanced topics are covered e.g. in "Tail Latency at Scale with Apache Kafka".
  • 4. Service goals and tradeoffs. Non-performance objectives ● Business requirements take priority ● Durability, availability and ordering? Performance objectives ● Trade off between throughput and latency. Example approach ● Set configuration to ensure data durability ● Optimize for throughput. (Diagram: example use cases payments, logging, Next Best Offer and centralized Kafka placed among the goals throughput, latency, availability and durability.)
  • 5. Agenda. 01. Introduction: setting the scene & review of relevant terminology. 02. Producers: deep dive into producer internals. Why is producer behavior key for cluster performance? 03. Consumers: understand fetching and consumer group behavior. 04. Brokers, Zookeepers and Topics: how are requests handled? Why does Zookeeper matter? 05. Optimising and Tuning Client Applications: key parameters to consider for different service goals. 06. Summary: summary and outlook.
  • 6. It is a journey... Identify your service goal (throughput, latency, durability, or availability). Understand Kafka internals (producer, consumer and broker behavior). Configure cluster and clients (ensure service goals are met). Benchmark, monitor, and tune (iterative procedure to drive performance).
  • 8. Producer. Settings shown: acks=1, enable.idempotence=false, max.request.size=1MB, retries=MAX_INT, delivery.timeout.ms=2min, max.in.flight.requests.per.connection=5, compression.type=none, batch.size=16KB, linger.ms=0, buffer.memory=32MB, max.block.ms=60s. Serializer ● Retrieves and caches schemas from Schema Registry. Partitioner ● Java client uses murmur2 for hashing ● If no key is provided, records are distributed round robin ● If keys are unbalanced, one partition leader will be overloaded. Sender thread ● Batches are grouped by destination broker into requests ● Multiple batches to different partitions can travel in the same producer request. Record accumulator ● One buffer per partition; seldom-used partitions may not achieve high batching ● If many producers run in the same JVM, memory and GC could become important ● The sticky partitioner can be used to increase batch sizes in the round-robin case (KIP-480/KIP-794). Compression ● Applied at batch level ● Allows faster transfer to the broker ● Reduces the inter-broker replication load ● Reduces page cache & disk space utilization on brokers ● Gzip is more CPU intensive, Snappy is lighter, LZ4/ZStd are a good balance. (Diagram: records are collected into batches, and batches into requests.)
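The partitioner bullets can be illustrated with a small Python sketch. The `murmur2` function is transcribed from memory of the Kafka Java client's hash and has not been verified against it. `partition_for_key`, `partition_load`, and the key names are hypothetical helpers. The sketch only shows how a single dominant key concentrates load on one partition, and therefore on one leader.

```python
from collections import Counter


def murmur2(data: bytes) -> int:
    """32-bit MurmurHash2 (unverified transcription of the Java client's variant)."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    length = len(data)
    h = (0x9747B28C ^ length) & mask
    # Mix the input four bytes at a time, read as little-endian integers.
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> 24
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    # Mix the remaining one to three bytes, if any.
    tail = data[length - length % 4:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition: positive hash modulo partition count."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


def partition_load(keys, num_partitions: int) -> Counter:
    """Count how many records each partition would receive."""
    return Counter(partition_for_key(key, num_partitions) for key in keys)


# Assumption: 90% of the records share one hypothetical key.
keys = [b"hot-key"] * 900 + [f"key-{i}".encode() for i in range(100)]
load = partition_load(keys, num_partitions=6)
hot_partition = partition_for_key(b"hot-key", 6)
print(load[hot_partition], "of", len(keys), "records land on partition", hot_partition)
```

Because every record with the same key hashes to the same partition, adding partitions does not spread a hot key; only changing the key design does.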
  • 9. Batching is key to overall performance. Benefits of batching ● Reduced network bandwidth ○ producer to broker ○ broker to broker (replication) ○ broker to consumer ● Lower storage requirements on broker disks ● Reduced CPU requirement due to fewer requests. From "Tail Latency at Scale with Apache Kafka": “Batching reduces the cost of each record by amortizing costs on both the clients and brokers. Generally, bigger batches reduce processing overhead and reduce network and disk IO, which improves network and disk utilization.”
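A rough way to see the amortization is to count how many batches a fixed workload needs at different batch sizes. The Python sketch below is a simplified model, not Kafka's actual accounting: it assumes every batch fills completely, ignores compression, batch headers and linger time, and uses illustrative record counts and sizes. `batches_needed` is a hypothetical helper.

```python
import math


def batches_needed(num_records: int, record_size: int, batch_size: int) -> int:
    """Number of full-or-partial batches for a workload (simplified model)."""
    # At least one record always fits, even if it is larger than batch_size.
    records_per_batch = max(1, batch_size // record_size)
    return math.ceil(num_records / records_per_batch)


# Illustrative workload: one million records of 1,000 bytes each.
small = batches_needed(1_000_000, 1_000, 16_384)    # batch.size of 16 KB
large = batches_needed(1_000_000, 1_000, 300_000)   # a much larger batch.size
print(small, large)  # 62500 3334
```

In this model the larger batch size cuts the batch count by a factor of almost 19, which is the per-batch work that producers, brokers and consumers no longer have to repeat.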
  • 10. Start the demo environment in docker-compose (on my mac): 1 * zookeeper, 5 * brokers, 1 * Squid proxy (sends JMX metrics to Health+). Not starting: schema registry, connect, ksqlDB, REST Proxy, Confluent Control Center.
  • 11. Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc. 11
  • 12. Kafka performance test tools: kafka-producer-perf-test and kafka-consumer-perf-test. Overview ● CLI tools to write & read sample data to/from topics ● Helpful to enhance understanding of parameters & their impact. Disclaimer ● Performance numbers are not representative of specific customer use cases! ○ Random test data is reused ● Use-case-specific performance testing is required. Example command: kafka-producer-perf-test --num-records 1000000 --record-size 1000 --topic demo-perf-topic --throughput 10000 --print-metrics --producer-props bootstrap.servers=kafka:9092 acks=all batch.size=300000 linger.ms=100 compression.type=lz4
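When comparing parameters it helps to generate the command lines rather than edit them by hand. The following Python sketch builds `kafka-producer-perf-test` invocations using only the flags that appear in the slide's command. The helper name `producer_perf_command` and the choice of values to sweep are assumptions for illustration. The sketch only prints the commands and does not run them.

```python
import shlex
from itertools import product


def producer_perf_command(topic, num_records, record_size, throughput, producer_props):
    """Return the argument list for one kafka-producer-perf-test run."""
    args = [
        "kafka-producer-perf-test",
        "--num-records", str(num_records),
        "--record-size", str(record_size),
        "--topic", topic,
        "--throughput", str(throughput),
        "--print-metrics",
        "--producer-props",
    ]
    # Producer properties are passed as key=value pairs after --producer-props.
    args += [f"{key}={value}" for key, value in producer_props.items()]
    return args


# Assumed sweep: two batch sizes crossed with two linger settings.
commands = []
for batch_size, linger_ms in product((16384, 300000), (0, 100)):
    props = {
        "bootstrap.servers": "kafka:9092",
        "acks": "all",
        "batch.size": batch_size,
        "linger.ms": linger_ms,
        "compression.type": "lz4",
    }
    commands.append(producer_perf_command("demo-perf-topic", 1_000_000, 1000, 10000, props))

for command in commands:
    print(shlex.join(command))
```

Each printed line can be pasted into a shell on a host where the Kafka CLI tools are installed. Keeping the sweep in code makes it easy to rerun the same matrix after a configuration change.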
  • 13. Most significant producer performance metrics, listed as metric: meaning (MBean)
      ● record-size-avg: average record size (kafka.producer:type=producer-metrics,client-id=([-.\w]+))
      ● batch-size-avg: average number of bytes sent per partition per request (kafka.producer:type=producer-metrics,client-id=([-.\w]+))
      ● bufferpool-wait-ratio: fraction of time an appender waits for space allocation (kafka.producer:type=producer-metrics,client-id=([-.\w]+))
      ● compression-rate-avg: average compression rate for a topic, i.e. compressed / uncompressed batch size (kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+))
      ● record-queue-time-avg: average time (ms) record batches spent in the send buffer (kafka.producer:type=producer-metrics,client-id=([-.\w]+))
      ● request-latency-avg: average request latency (ms) (kafka.producer:type=producer-metrics,client-id=([-.\w]+))
      ● produce-throttle-time-avg: average time (ms) a request was throttled by a broker (kafka.producer:type=producer-metrics,client-id=([-.\w]+))
      ● record-retry-rate: average per-second number of retried record sends for a topic (kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+))
    Overview: Java metrics & librdkafka statistics
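The MBean names in the table are regular-expression patterns. The Python sketch below assumes the intended character class is `[-.\w]` (hyphen, dot, and word characters) and extracts the client id and topic from a topic-level producer MBean name. `parse_topic_mbean` and the sample names are hypothetical.

```python
import re

# Pattern for topic-level producer metrics; the groups capture client id and topic.
PRODUCER_TOPIC_MBEAN = re.compile(
    r"kafka\.producer:type=producer-topic-metrics,"
    r"client-id=([-.\w]+),topic=([-.\w]+)"
)


def parse_topic_mbean(name: str):
    """Return (client_id, topic) if the MBean name matches, else None."""
    match = PRODUCER_TOPIC_MBEAN.fullmatch(name)
    return match.groups() if match else None


sample = (
    "kafka.producer:type=producer-topic-metrics,"
    "client-id=perf-producer-1,topic=demo-perf-topic"
)
print(parse_topic_mbean(sample))


def compression_rate(compressed_bytes: int, uncompressed_bytes: int) -> float:
    """compression-rate-avg as defined in the table: compressed / uncompressed."""
    return compressed_bytes / uncompressed_bytes
```

With this definition a lower compression rate means better compression: a batch that shrinks from 1,000 to 250 bytes has a rate of 0.25.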
  • 15. Consumer application Kafka consumers fetch batches of events! Embrace at-least-once semantics!
  • 16. Consumers. Partitions ● Basis for scalability ● No partition will be assigned to more than one consumer in the same group. Key parameters: # of partitions, fetch.min.bytes=1, fetch.max.wait.ms=500ms, max.partition.fetch.bytes=1MB, fetch.max.bytes=50MB, max.poll.records=500, max.poll.interval.ms=5min, auto.commit.interval.ms=5s (if being used)
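The interplay of `fetch.min.bytes` and `fetch.max.wait.ms` can be sketched with a toy model in Python: the broker answers a fetch as soon as enough bytes have accumulated, or when the maximum wait expires, whichever comes first. The model assumes an initially empty partition and a constant arrival rate. `fetch_wait` is a hypothetical helper, not a client API.

```python
import math


def fetch_wait(arrival_rate_bytes_per_ms, fetch_min_bytes=1, fetch_max_wait_ms=500):
    """Return (wait_ms, bytes_returned) for one fetch in the toy model."""
    if arrival_rate_bytes_per_ms <= 0:
        # No data arrives: the broker answers empty-handed after the max wait.
        return fetch_max_wait_ms, 0
    time_to_fill = math.ceil(fetch_min_bytes / arrival_rate_bytes_per_ms)
    wait = min(time_to_fill, fetch_max_wait_ms)
    return wait, int(wait * arrival_rate_bytes_per_ms)


print(fetch_wait(10))                           # (1, 10): answers almost immediately
print(fetch_wait(10, fetch_min_bytes=2_000))    # (200, 2000): waits for a fuller response
print(fetch_wait(10, fetch_min_bytes=100_000))  # (500, 5000): capped by fetch.max.wait.ms
```

Raising `fetch.min.bytes` trades latency for fewer, larger fetch responses, and `fetch.max.wait.ms` bounds how much latency that trade can cost.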
  • 17. Key positions in each partition. Log end offset ● Latest data added to the partition ● Position of the producer ● Not accessible to consumers. High watermark ● Offsets up to the watermark can be consumed ● Data has been replicated to all in-sync replicas. Current position ● Specific to consumer instances ● Current message being processed in the poll loop. Last committed offset ● Last position persisted in the __consumer_offsets topic. (Diagram: offsets 0 to 12 of one partition, marked from left to right with last committed offset, current position of consumer, high watermark, and log end offset.)
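The four positions and their usual ordering can be captured in a small Python data structure. This is a sketch with illustrative offsets: the class name, property names, and sample values are assumptions, and the ordering check reflects the typical case shown in the diagram rather than a guarantee enforced by Kafka.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartitionPositions:
    last_committed: int
    current_position: int
    high_watermark: int
    log_end_offset: int

    def __post_init__(self):
        # Typical ordering: committed <= position <= high watermark <= log end.
        if not (0 <= self.last_committed <= self.current_position
                <= self.high_watermark <= self.log_end_offset):
            raise ValueError("offsets are not in the expected order")

    @property
    def consumable_backlog(self) -> int:
        """Records the consumer could still read right now."""
        return self.high_watermark - self.current_position

    @property
    def not_yet_consumable(self) -> int:
        """Records written but not yet replicated to all in-sync replicas."""
        return self.log_end_offset - self.high_watermark

    @property
    def reprocessed_after_restart(self) -> int:
        """Records read but not committed; re-read if the consumer restarts."""
        return self.current_position - self.last_committed


positions = PartitionPositions(last_committed=3, current_position=6,
                               high_watermark=9, log_end_offset=12)
print(positions.consumable_backlog,
      positions.not_yet_consumable,
      positions.reprocessed_after_restart)
```

The gap between current position and last committed offset is why at-least-once semantics matter: after a restart the consumer resumes from the committed offset and sees those records again.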
  • 18. Consumer groups. (Diagram: the consumer sends "find coordinator" to any broker (bootstrap) and receives the coordinator details; it then sends "join consumer group" to the coordinator broker and receives the leader details, followed by "sync group", which returns the partition assignment.) Rebalances ● Happen every time a consumer joins or leaves (fails) the group ● Were a “stop the world” event until Kafka 2.4 (solved in KIP-429) ● Consider setting group.instance.id to minimize rebalances (KIP-345). Partition assignment ● Based on partition.assignment.strategy ● Options: range (default), round robin, sticky, cooperative sticky ● Is customizable. Heartbeat settings: heartbeat.interval.ms=3s, session.timeout.ms=10s, group.initial.rebalance.delay.ms=3s
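To make the difference between the range and round-robin strategies concrete, here is a simplified Python sketch for a single topic. It mirrors the general idea of the two strategies (contiguous blocks versus dealing partitions out in turn) and is not the client's actual assignor code. Function names and consumer ids are hypothetical.

```python
def range_assign(consumers, num_partitions):
    """Give each consumer a contiguous block; the first ones get one extra."""
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for index, member in enumerate(members):
        count = per_consumer + (1 if index < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


def round_robin_assign(consumers, num_partitions):
    """Deal partitions out to the sorted consumers one at a time."""
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for partition in range(num_partitions):
        assignment[members[partition % len(members)]].append(partition)
    return assignment


group = ["c0", "c1", "c2"]
print(range_assign(group, 7))        # {'c0': [0, 1, 2], 'c1': [3, 4], 'c2': [5, 6]}
print(round_robin_assign(group, 7))  # {'c0': [0, 3, 6], 'c1': [1, 4], 'c2': [2, 5]}
print(range_assign(group, 2))        # {'c0': [0], 'c1': [1], 'c2': []}
```

The last call shows why the partition count is the basis for scalability: with fewer partitions than consumers, the surplus consumers receive nothing.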
  • 19. Selected consumer performance metrics, listed as metric: meaning (MBean)
      ● fetch-latency-avg: average time taken for a fetch request (kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+))
      ● fetch-size-avg: average number of bytes fetched per request (kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+))
      ● commit-latency-avg: average time taken for a commit request (kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+))
      ● rebalance-latency-total: total time taken for group rebalances (kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+))
      ● fetch-throttle-time-avg: average throttle time (ms) (kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+))
    Overview: Java metrics and librdkafka statistics
  • 20. Consumer benchmarking. (1) Start with the simplest test: without any tuning, we get extremely good results. Highlights: ● 10M messages in less than 30 seconds ● 1Gb data retrieved ● 325 Mb/s. Conclusion: ● Tuning the producer is key; if it is tuned correctly, the consumer side may need almost no tuning.
  • 24. Overview: Brokers and Zookeeper. Request lifecycle in broker ● How are produce & fetch requests handled? ● How can inefficient batching impact performance? ● How to identify where time is spent during request handling? Controller, leaders, and Zookeeper ● How is the Controller elected? ● How are broker failures detected? ● Why does the partition count matter for the recovery time after a controller failure? (Next 8 slides skipped)
  • 25. 05. Optimizing and Tuning Client Applications https://docs.confluent.io/cloud/current/client-apps/optimizing/index.html#optimizing-and-tuning
  • 27. Recommendations. Benchmarking ● Benchmark all applications with a significant & representative load ● Consider a test cluster configured for the application's requirements (durability, availability, or any other) and loaded with real data (size, schema, serialization format, ...) ● Test different configurations (batch size, compression, linger, ...) to see their impact on the test results (throughput, latency, ...) ● Evaluate the traffic and leave space for growth when determining the number of partitions ● Low-volume applications may need care too ● Re-evaluate after major changes in the application, message content (JSON size, ...) or volume. Monitoring ● Should be used to identify bottlenecks in running clusters ● Client monitoring is as important as broker monitoring
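For the partition-count recommendation, a commonly cited rule of thumb (not stated on the slide) is to divide the target throughput by the measured per-partition producer and consumer throughput and take the larger result. The Python sketch below adds a growth factor as headroom. The function name, the growth factor, and all numbers are illustrative assumptions. The per-partition rates should come from your own benchmarks.

```python
import math


def estimate_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition, growth_factor=1.5):
    """Rule-of-thumb partition count with headroom for growth."""
    needed_for_producers = target_mb_s / producer_mb_s_per_partition
    needed_for_consumers = target_mb_s / consumer_mb_s_per_partition
    # The slower side determines how many partitions are needed.
    baseline = max(needed_for_producers, needed_for_consumers)
    return math.ceil(baseline * growth_factor)


# Illustrative numbers only: 300 MB/s target, 30 and 50 MB/s measured per partition.
print(estimate_partitions(300, 30, 50))  # 15
```

The estimate is only a starting point for the benchmark, monitor and tune loop; re-evaluate it whenever message size or volume changes.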
  • 28. Conclusion 28 Resources ● Optimizing Your Apache Kafka® Deployment ● Optimizing and Tuning ● White paper Optimization approach ● Determine service goals ● Understand Kafka’s internals ● Configure clients & cluster ● Benchmark, monitor & tune Continue the conversation ● How to monitor the cluster & clients? ● Integration with external systems ● Tuning of Kafka Streams & ksqlDB applications?