SignalFx
Why and how we wrote a Kafka consumer
Rajiv Kurian, Software Engineer
rajiv@signalfx.com
@rzidane360
Agenda
1. Why we wrote a Kafka consumer
2. Properties and limitations of modern hardware
3. Optimizations
4. Results
Why we wrote a Kafka consumer
SignalFx is built for monitoring modern infrastructure
• High resolution:
  • Any mix of resolutions up to 1 sec
• Streaming analytics:
  • Custom analytics pipelines at any scale that output in seconds
  • Streaming dashboards update in seconds
• Multidimensional metrics:
  • Dimensions allow arbitrary modeling, pivoting, filtering, and grouping of both raw and derived (from analytics) metrics interactively on streaming data
  • E.g. 99th-percentile-of-latency-by-service-by-customer
Why write a new Kafka consumer
• Designed to replace SimpleConsumer, not the 0.9 consumer
• Needed a non-blocking, single-threaded consumer
• Wanted it to be low overhead
• 100s of thousands of messages/second
• Sensitive to GC
• The Kafka 0.9 consumer wasn’t ready yet
Kafka consumer - a brief introduction
Brokers, topics and partitions
[Diagram: partitions 0 to 11 of a topic distributed across BROKER 1, BROKER 2 and BROKER 3]
Metadata request and response
[Diagram: the client sends a Metadata Request to the brokers and receives a Metadata Response, from which it builds a partition-to-broker table]
Partition   Broker ID
0           1
1           2
….          ….
n           3
Offset request and response
[Diagram: the client obtains the offset for each partition (from a consumer group or an external source) and keeps it next to the partition-to-broker table]
Partition   Offset
0           9024
1           1245
….          ….
n           11645
Fetch request and response
[Diagram: the client sends a Fetch request carrying the current offset of each partition]
Partition   Offset
0           9024
1           1245
….          ….
n           11645
[Diagram: after the Fetch response arrives, the offset stored for each partition has moved forward]
Partition   Offset
0           9026
1           1247
….          ….
n           11649
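The state a client carries through the metadata, offset and fetch steps above can be sketched in a few lines of Java. This is a minimal illustration, not the SignalFx implementation: the class `ConsumerStateSketch` and its method names are hypothetical, and it assumes a single topic so that both tables can be plain arrays indexed by partition.

```java
// Hypothetical sketch of per-partition client state for a single topic.
final class ConsumerStateSketch {
    // Filled from the metadata response: partition -> broker id.
    private final int[] partitionToBroker;
    // Filled from the offset lookup: partition -> next offset to fetch.
    private final long[] partitionToOffset;

    ConsumerStateSketch(int[] partitionToBroker, long[] startOffsets) {
        if (partitionToBroker.length != startOffsets.length) {
            throw new IllegalArgumentException("need one broker and one offset per partition");
        }
        this.partitionToBroker = partitionToBroker.clone();
        this.partitionToOffset = startOffsets.clone();
    }

    // Which broker the next fetch request for this partition goes to.
    int brokerFor(int partition) {
        return partitionToBroker[partition];
    }

    // The offset to put into the next fetch request for this partition.
    long nextOffset(int partition) {
        return partitionToOffset[partition];
    }

    // Called for every message found in a fetch response: the next fetch
    // starts right after the last message that was consumed.
    void onMessage(int partition, long messageOffset) {
        partitionToOffset[partition] = messageOffset + 1;
    }
}
```

With the numbers from the slides, consuming the messages at offsets 9024 and 9025 of partition 0 leaves `nextOffset(0)` at 9026, which is the value the next fetch request would carry.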
Properties and limitations
of modern hardware
Cache Lines
[Diagram: Core 1 and Core 2, each with its own L1 D, L1 I and L2 caches, share an L3 cache in front of main memory; cache line 1 sits in main memory]
• Data is transferred between memory and cache in
blocks of fixed size, called cache lines (typically 64
bytes)
• The memory subsystem makes a few bets to help us:
• Temporal locality
• Spatial locality
• Prefetching
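To make the spatial-locality bet concrete, the sketch below counts how many distinct cache lines an access pattern touches. It is an arithmetic model only, not a measurement: the class `CacheLineSketch` is hypothetical, it uses the 64-byte line size quoted above, and it assumes the array starts on a cache-line boundary, which the JVM does not guarantee.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model: how many 64-byte cache lines does a read pattern touch?
final class CacheLineSketch {
    static final int CACHE_LINE_BYTES = 64; // typical size, per the slide
    static final int LONG_BYTES = 8;        // eight longs fit in one line

    // Reads `count` longs starting at index 0, stepping `stride` elements
    // each time, and returns the number of distinct cache lines touched.
    static int cacheLinesTouched(int count, int stride) {
        Set<Long> lines = new HashSet<>();
        for (int i = 0; i < count; i++) {
            long byteAddress = (long) i * stride * LONG_BYTES;
            lines.add(byteAddress / CACHE_LINE_BYTES);
        }
        return lines.size();
    }
}
```

Reading 64 consecutive longs (`stride` 1) touches 8 lines, while reading 64 longs that are each 8 elements apart touches 64 lines: the same number of reads, eight times the memory traffic.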
[Diagrams: cache lines 1 and 2, and then a cache line of elements 1 to 8, moving from main memory through L3, L2 and L1 D to Core 1 and Core 2]
Reference latency numbers for comparison
By Jeff Dean: http://research.google.com/people/jeff/
Operation                            Latency (ns)   Latency (ms)   Comparison
L1 Cache                             0.5
Branch mispredict                    5
L2 Cache                             7                             14x L1 Cache
Mutex lock/unlock                    25
Main memory                          100                           20x L2 Cache, 200x L1 Cache
Compress 1K bytes (Zippy)            3,000
Send 1K bytes over 1Gbps             10,000         0.01
Read 4K randomly from SSD            150,000        0.15
Read 1MB sequentially from memory    250,000        0.25
Round trip within same DC            500,000        0.5
Read 1MB sequentially from SSD       1,000,000      1              4x memory
Disk seek                            10,000,000     10             20x DC roundtrip
Read 1MB sequentially from disk      20,000,000     20             80x memory, 20x SSD
Send packet CA->Netherlands->CA      150,000,000    150
[Diagrams: CORE shown next to L1, then L2, then main memory]
Optimizations
Optimization aims
• We are NOT aiming for more data/second
• Even a very inefficient implementation
will be bottlenecked by the network
• We are aiming to make the client get out of
the way
• The client is not the only thing running on
the system
• Leave all resources for the actual
application
Efficiency VS raw speed
• We value efficiency more than raw speed
for the client
• Fewer cycles
• Less cache usage and fewer cache
misses
• Less memory?
• Efficiency for the client == raw speed for
the application
Efficiency from constraints
• No consumer group functionality needed
• A single topic
• Finite number of integer partitions
• Partition reassignment is rare and happens
during startup and shutdown
• We are in control of the code that consumes
the messages
Use cache conscious data structures
Use arrays and open addressing hash maps
• Single topic, fewer than 1024 partitions
• Instead of maps we can use arrays
• Or use primitive specialized open
addressing hash maps
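As an illustration of the second option, here is a minimal sketch of a primitive-specialized open addressing map from partition (`int`) to offset (`long`). It is not the map used by the SignalFx client: the class `IntLongOpenAddressingMap` is hypothetical, it uses linear probing with a fixed power-of-two capacity, supports no removal or resizing, and reserves -1 as the empty marker on the assumption that partition numbers are never negative.

```java
import java.util.Arrays;

// Hypothetical int -> long map: keys and values live in two flat primitive
// arrays, so a lookup touches no Entry, Integer or Long objects.
final class IntLongOpenAddressingMap {
    private static final int EMPTY = -1; // assumes keys are non-negative
    private final int[] keys;
    private final long[] values;
    private final int mask;

    IntLongOpenAddressingMap(int capacityPowerOfTwo) {
        if (capacityPowerOfTwo <= 0 || Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new int[capacityPowerOfTwo];
        values = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, EMPTY);
    }

    void put(int key, long value) {
        if (key < 0) throw new IllegalArgumentException("negative key");
        int slot = key & mask; // identity hash, good enough for small ints
        for (int probes = 0; probes <= mask; probes++) {
            if (keys[slot] == EMPTY || keys[slot] == key) {
                keys[slot] = key;
                values[slot] = value;
                return;
            }
            slot = (slot + 1) & mask; // linear probing: try the next slot
        }
        throw new IllegalStateException("map is full");
    }

    long get(int key, long missingValue) {
        if (key < 0) return missingValue;
        int slot = key & mask;
        for (int probes = 0; probes <= mask; probes++) {
            if (keys[slot] == key) return values[slot];
            if (keys[slot] == EMPTY) return missingValue;
            slot = (slot + 1) & mask;
        }
        return missingValue;
    }
}
```

Colliding keys land in neighbouring slots of the same array, so a probe sequence usually stays within one or two cache lines. When partition numbers are dense and bounded, a plain `long[]` indexed by partition is simpler still.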
Topic:Partition -> Offset
Topic:Partition   Offset
Foo:0             9026
Foo:1             1247
….                ….
Foo:n             11649
[Diagram: with the single topic Foo factored out, the table is keyed by partition alone and becomes an Offsets array (9026, 1247, …, 11649) indexed by partition]
Hash map implemented as an array of lists of key* | value*
[Diagram: an array of lists of Entry*, where each entry holds a partition* and an offset*; a lookup takes four numbered steps (1 to 4)]
Dependable cache miss generator
Sparse array
[Diagram: a single array holding offset 0 to offset 13, indexed by partition; a lookup takes one step]
In memory vs. in cache
            In memory                                                        In cache
Hash map    (4 * 2 + 4 + 8) + (4 + 4 + 8) + (4 + 8) + (8 + 8) = 64 bytes     4 * 64 = 256 bytes
Array       1024 * 8 + 4 + 8 = 8204 bytes                                    1 * 64 = 64 bytes
Low memory and cache friendly data structures
• Queues built from integer arrays. Negative ->
partition lost
• Zero allocation hashed-wheel timer to close
stuck connections
• Open addressing hash maps
• BitSets coded on top of long arrays whenever a
set of partitions is required
• Can be traversed in O(num set bits)
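A set of partitions coded on top of a `long` array can be sketched as follows. This is an illustration under assumptions, not the client's code: the class `PartitionBitSet` is hypothetical. The inner loop runs once per set bit; the outer loop adds one check per 64-bit word, which is a small constant for fewer than 1024 partitions.

```java
import java.util.function.IntConsumer;

// Hypothetical fixed-size set of partition ids stored as bits in a long[].
final class PartitionBitSet {
    private final long[] words;

    PartitionBitSet(int maxPartitions) {
        words = new long[(maxPartitions + 63) >>> 6]; // 64 partitions per word
    }

    void add(int partition) {
        words[partition >>> 6] |= 1L << partition; // shift uses the low 6 bits
    }

    void remove(int partition) {
        words[partition >>> 6] &= ~(1L << partition);
    }

    boolean contains(int partition) {
        return (words[partition >>> 6] & (1L << partition)) != 0;
    }

    // Visits the set partitions in ascending order without allocating.
    void forEach(IntConsumer action) {
        for (int i = 0; i < words.length; i++) {
            long word = words[i];
            while (word != 0) {
                int bit = Long.numberOfTrailingZeros(word);
                action.accept(i * 64 + bit);
                word &= word - 1; // clear the lowest set bit
            }
        }
    }
}
```

For 1024 partitions the whole set occupies sixteen longs, which is two 64-byte cache lines of payload.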
Applicability and benefit to Kafka consumer 0.9
• Benefits - medium
• Lots of hash map look ups
• Applicability - low
• Multiple topics - sparse arrays not a great
match
• Open addressing hash maps - preserve
most of the benefits
Create buffers once, reuse
Eliminate redundant work
• A single topic. Finite number of partitions:
• Topic and client string immutable
• The metadata request buffer can be created just once and kept around forever
• Other requests can have their fixed part written out once, so that only the variable part is written on each request
• Offset request = fixed_part + per_partition_part
• Fetch request = fixed_part + per_partition_part
FETCH REQUEST BUFFER
Fixed:
SIZE API_KEY
API_VERSION CORRELATION_ID
CLIENT_ID_STRING REPLICA_ID
MAX_WAIT_TIME MIN_BYTES
NUM_TOPICS TOPIC_STRING
NUM_PARTITIONS
Variable (one row per partition; Index holds the buffer position of each row's offset):
Index   Partition   Offset   Max bytes
1200    0           1266     1024
1216    1           1164     1024
1232    2           1900     1024
[Diagram: the new values in the Offsets array (1289, 1172, 1990) are written over the offset column in place, using Index to find each position]
Index   Partition   Offset   Max bytes
1200    0           1289     1024
1216    1           1172     1024
1232    2           1990     1024
Code
private void setNewOffsetsForFetchRequest() {
    final ByteBuffer buffer = this.fetchRequestBuffer;
    // Iterate through the partitions assigned to this broker
    // and write the offset directly on the buffer.
    for (int i = 0; i < partitionAssignment.length; i++) {
        // This loop runs in O(partitions assigned).
        long bitSet = partitionAssignment[i];
        while (bitSet != 0) {
            final long t = bitSet & -bitSet;
            final int partitionId = i * 64 + Long.bitCount(t - 1);
            // The position in the buffer that points to the
            // beginning of the offset for this partition.
            final int bufferPositionForOffset = fetchRequestIndex[partitionId];
            final long offset = partitionToOffset[partitionId];
            // Write the offset directly.
            buffer.putLong(bufferPositionForOffset, offset);
            bitSet ^= t;
        }
    }
}
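The method above relies on a buffer and an index that were prepared earlier. A minimal sketch of that one-time setup follows, assuming the version 0 fetch request layout of the Kafka wire protocol (16-bit length-prefixed strings, and 16 bytes per partition: a 4-byte partition id, an 8-byte offset and a 4-byte max bytes). The class `FetchRequestTemplate` and its parameters are hypothetical and are not taken from the SignalFx client.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical one-time construction of a reusable fetch request buffer.
final class FetchRequestTemplate {
    final ByteBuffer buffer;
    // partition id -> absolute buffer position of that partition's offset
    final int[] fetchRequestIndex;

    FetchRequestTemplate(String clientId, String topic, int[] partitions,
                         int maxWaitMs, int minBytes, int maxBytesPerPartition) {
        byte[] client = clientId.getBytes(StandardCharsets.UTF_8);
        byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
        int size = 4 + 2 + 2 + 4 + (2 + client.length) + 4 + 4 + 4
                + 4 + (2 + topicBytes.length) + 4 + partitions.length * 16;
        int maxPartition = -1;
        for (int p : partitions) maxPartition = Math.max(maxPartition, p);
        fetchRequestIndex = new int[maxPartition + 1];
        buffer = ByteBuffer.allocate(size);

        // Fixed part: written once.
        buffer.putInt(size - 4);                          // SIZE (excludes itself)
        buffer.putShort((short) 1);                       // API_KEY: fetch
        buffer.putShort((short) 0);                       // API_VERSION
        buffer.putInt(0);                                 // CORRELATION_ID
        buffer.putShort((short) client.length).put(client);         // CLIENT_ID_STRING
        buffer.putInt(-1);                                // REPLICA_ID: -1 for consumers
        buffer.putInt(maxWaitMs);                         // MAX_WAIT_TIME
        buffer.putInt(minBytes);                          // MIN_BYTES
        buffer.putInt(1);                                 // NUM_TOPICS
        buffer.putShort((short) topicBytes.length).put(topicBytes); // TOPIC_STRING
        buffer.putInt(partitions.length);                 // NUM_PARTITIONS

        // Variable part: remember where each offset lives so it can be
        // overwritten in place before every request.
        for (int p : partitions) {
            buffer.putInt(p);
            fetchRequestIndex[p] = buffer.position();
            buffer.putLong(0L);                           // offset placeholder
            buffer.putInt(maxBytesPerPartition);
        }
        buffer.flip();
    }

    // Absolute write: leaves the buffer's position and limit untouched.
    void setOffset(int partition, long offset) {
        buffer.putLong(fetchRequestIndex[partition], offset);
    }
}
```

Consecutive entries of `fetchRequestIndex` end up 16 bytes apart, which matches the 1200, 1216, 1232 spacing in the slides. A real client would also patch CORRELATION_ID, which sits at a fixed position, before each send.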
METADATA REQUEST BUFFER
Fixed:
SIZE API_KEY
API_VERSION CORRELATION_ID
CLIENT_ID_STRING NUM_TOPICS
TOPIC_STRING
OFFSET REQUEST BUFFER
Fixed:
SIZE API_KEY
API_VERSION CORRELATION_ID
CLIENT_ID_STRING REPLICA_ID
NUM_TOPICS TOPIC_STRING
NUM_PARTITIONS (at NUM_PARTITIONS_POSITION)
0 1 2 3 4 5
Applicability and benefit to Kafka consumer 0.9
• Benefits - high
• Reuse instead of allocating - temporal locality
• Streaming through 3 arrays - prefetching
• One fetch request per fetch response - common
• Metadata or offset requests - rare
• Applicability - high
• Internal detail so API doesn’t change
• Even for consumer groups, partition reassignment
and partition migration events are rare
Zero allocation response processing
Stream responses to application
• Pass each message to the application
when it is ready
• Consume messages synchronously
without a copy or allocation
• No deserialization required
• Benefits add up when processing 100s
of thousands of messages per second
Low level interface
public interface KafkaMessageHandler {
    void handleMessage(ByteBuffer buffer, int position, int length);
}
public interface KafkaConsumer {
    void poll(KafkaMessageHandler handler, long timeoutMs);
    . . .
    . . .
}
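A handler written against this interface might look like the sketch below. It is only an illustration: the interface is repeated so the block is self-contained, the class `SummingHandler` is hypothetical, and the assumption that each message starts with an 8-byte big-endian value is made up for the example. The point is that the handler reads with absolute `get` calls at `position`, so it neither copies the message nor allocates.

```java
import java.nio.ByteBuffer;

// Repeated from the slide so that this block compiles on its own.
interface KafkaMessageHandler {
    void handleMessage(ByteBuffer buffer, int position, int length);
}

// Hypothetical handler that consumes messages in place.
final class SummingHandler implements KafkaMessageHandler {
    long messages;
    long payloadBytes;
    long sum;

    @Override
    public void handleMessage(ByteBuffer buffer, int position, int length) {
        messages++;
        payloadBytes += length;
        if (length >= Long.BYTES) {
            // Absolute read: does not move the buffer's position, so the
            // consumer's own parsing state is left alone.
            sum += buffer.getLong(position);
        }
    }
}
```

Because the buffer belongs to the consumer and is reused, a handler like this must finish with the bytes before it returns and must not keep a reference to the buffer.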
FETCH RESPONSE PARSING
[Diagram: the fetch response holds the topic string, client string etc., followed by one row of messages per partition; the messages are passed one after another to KafkaMessageHandler.handleMessage(ByteBuffer buffer, int position, int length)]
Partition   Message 1   Message 2   Message ..   Message n
1           …           …           …            …
2           …           …           …            …
3           …           …           …            …
4           …           …           …            …
Applicability and benefit to Kafka consumer 0.9
• Benefits - very high
• Reuse response buffer, no allocations - temporal locality
• Data is processed right after being read from the socket -
temporal locality
• Streaming through a buffer - spatial locality + prefetching
• Combine with DirectByteBuffers for zero copy
• Applicability - low
• API too low level
• Integrity of internal buffers compromised by bugs in
application
• Maybe a low level “with great power comes great
responsibility” API
Some numbers
Caveats
• These are from running a very specific
workload similar to our application
• There are many Pareto-optimal choices for a client. Ours is not better in any way - it’s just tuned for our workload
• It can and will prove bad for other
workloads
Benchmark
• Single topic-partition
• Settings of fetch_max_wait, fetch_min_bytes,
max_bytes_per_partition were identical
• Only 5000 messages per second produced by
a single producer
• Each message is 23 bytes
• Warm up -> profile for 5 mins
• 5000/sec * 5 mins = 1.5 million
• Profiler = Java Mission Control
[Screenshot: 0.9 consumer allocation profile (TLAB)]
[Screenshot: SignalFx consumer allocation profile (TLAB)]
[Screenshot: 0.9 consumer code profile]
[Screenshot: SignalFx consumer code profile]
With 5,000 messages/second
Implementation       CPU       Allocation (TLAB)
0.9 consumer         6%        422.8 MB
SignalFx consumer    1.3%      217 KB
Ratio                4.6x      1944x
With 10,000 messages/second
Implementation       CPU       Allocation (TLAB)
0.9 consumer         6.122%    858 MB
SignalFx consumer    1.456%    400 KB
Ratio                4.2x      2145x
Thank You!
Rajiv Kurian
rajiv@signalfx.com
@rzidane360
WE’RE HIRING
jobs@signalfx.com
@SignalFx - signalfx.com/careers
Q&A

Fast datastacks - fast and flexible nfv solution stacks leveraging fd.ioFast datastacks - fast and flexible nfv solution stacks leveraging fd.io
Fast datastacks - fast and flexible nfv solution stacks leveraging fd.io
 
running stable diffusion on android
running stable diffusion on androidrunning stable diffusion on android
running stable diffusion on android
 
Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance BarriersCeph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers
Ceph Day Beijing - Ceph on All-Flash Storage - Breaking Performance Barriers
 
10 Lessons Learned from using Kafka in 1000 microservices - ScalaUA
10 Lessons Learned from using Kafka in 1000 microservices - ScalaUA10 Lessons Learned from using Kafka in 1000 microservices - ScalaUA
10 Lessons Learned from using Kafka in 1000 microservices - ScalaUA
 
Aerospike Go Language Client
Aerospike Go Language ClientAerospike Go Language Client
Aerospike Go Language Client
 
Resolving Firebird performance problems
Resolving Firebird performance problemsResolving Firebird performance problems
Resolving Firebird performance problems
 
HBase at Xiaomi
HBase at XiaomiHBase at Xiaomi
HBase at Xiaomi
 
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
(WEB401) Optimizing Your Web Server on AWS | AWS re:Invent 2014
 
NFS and Oracle
NFS and OracleNFS and Oracle
NFS and Oracle
 
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
When it Absolutely, Positively, Has to be There: Reliability Guarantees in Ka...
 
Debugging linux issues with eBPF
Debugging linux issues with eBPFDebugging linux issues with eBPF
Debugging linux issues with eBPF
 
Tuning the Kernel for Varnish Cache
Tuning the Kernel for Varnish CacheTuning the Kernel for Varnish Cache
Tuning the Kernel for Varnish Cache
 
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart NeedhamThe post release technologies of Crysis 3 (Slides Only) - Stewart Needham
The post release technologies of Crysis 3 (Slides Only) - Stewart Needham
 
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverterKernel Recipes 2014 - NDIV: a low overhead network traffic diverter
Kernel Recipes 2014 - NDIV: a low overhead network traffic diverter
 
JDD2015: Make your world event driven - Krzysztof Dębski
JDD2015: Make your world event driven - Krzysztof DębskiJDD2015: Make your world event driven - Krzysztof Dębski
JDD2015: Make your world event driven - Krzysztof Dębski
 
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
ZFConf 2011: Что такое Sphinx, зачем он вообще нужен и как его использовать с...
 

Recently uploaded

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 

SignalFx Kafka Consumer Optimization

  • 2. SignalFx Why and how we wrote a Kafka consumer Rajiv Kurian, Software Engineer rajiv@signalfx.com @rzidane360
  • 3. Agenda 1. Why we wrote a Kafka consumer 2. Properties and limitations of modern hardware 3. Optimizations 4. Results
  • 4. SignalFx Why we wrote a Kafka consumer
  • 5. • High resolution: • Any mix of resolutions up to 1 sec • Streaming analytics: • Custom analytics pipelines at any scale that output in seconds • Streaming dashboards update in seconds • Multidimensional metrics: • Dimensions allow arbitrary modeling, pivoting, filtering, and grouping of both raw and derived (from analytics) metrics interactively on streaming data • E.g. 99th-percentile-of-latency-by-service-by-customer SignalFx is built for monitoring modern infrastructure
  • 6. • Designed to replace SimpleConsumer, not the 0.9 consumer • Needed a non-blocking, single-threaded consumer • Wanted it to be low overhead • 100s of thousands of messages/second • Sensitive to GC • The Kafka 0.9 consumer wasn’t ready yet Why write a new Kafka consumer
  • 7. SignalFx Kafka consumer - a brief introduction
  • 8. SignalFx Brokers, topics and partitions [Diagram: partitions 0 to 11 of a topic distributed across BROKER 1, BROKER 2 and BROKER 3, four partitions per broker]
  • 9. SignalFx Metadata request and response [Same diagram; the Client sends a Metadata Request]
  • 10. SignalFx Metadata request and response [Same diagram; the Client sends a Metadata Request and receives a Metadata Response]
  • 11. SignalFx Metadata request and response [Same diagram; from the Metadata Response the Client holds a Partition -> Broker ID table: 0 -> 1, 1 -> 2, …, n -> 3]
  • 12. SignalFx Offset request and response [Same diagram; next to the Partition -> Broker ID table the Client holds a Partition -> offset table: 0 -> 9024, 1 -> 1245, …, n -> 11645] Offsets (Consumer group/ external source)
  • 13. SignalFx Fetch request and response [Same diagram; the Client sends a Fetch request using the Partition -> offset table: 0 -> 9024, 1 -> 1245, …, n -> 11645]
  • 14. SignalFx Fetch request and response [Same diagram; after the Fetch response the Partition -> offset table reads: 0 -> 9026, 1 -> 1247, …, n -> 11649]
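  • 16. SignalFx [Diagram: memory hierarchy with Core 1 and Core 2, each with its own L1 D, L1 I and L2 caches, one L3 cache, and main memory; a single item labelled 1 is shown]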
  • 17. Cache Lines • Data is transferred between memory and cache in blocks of fixed size, called cache lines (typically 64 bytes) • The memory subsystem makes a few bets to help us: • Temporal locality • Spatial locality • Prefetching
  • 18. SignalFx [Same memory hierarchy diagram, with items 1 and 2 shown in main memory and in the caches of the cores]
  • 19. SignalFx [Same memory hierarchy diagram, with items 1 to 8 shown in main memory and in the caches of the cores]
  • 20. Reference latency numbers for comparison. By Jeff Dean: http://research.google.com/people/jeff/
      Operation | Latency | Notes
      L1 Cache | 0.5 ns |
      Branch mispredict | 5 ns |
      L2 Cache | 7 ns | 14x L1 Cache
      Mutex lock/unlock | 25 ns |
      Main memory | 100 ns | 20x L2 Cache, 200x L1 Cache
      Compress 1K bytes (Zippy) | 3,000 ns |
      Send 1K bytes over 1Gbps | 10,000 ns (0.01 ms) |
      Read 4K randomly from SSD | 150,000 ns (0.15 ms) |
      Read 1MB sequentially from memory | 250,000 ns (0.25 ms) |
      Round trip within same DC | 500,000 ns (0.5 ms) |
      Read 1MB sequentially from SSD | 1,000,000 ns (1 ms) | 4x memory
      Disk seek | 10,000,000 ns (10 ms) | 20x DC roundtrip
      Read 1MB sequentially from disk | 20,000,000 ns (20 ms) | 80x memory, 20x SSD
      Send packet CA->Netherlands->CA | 150,000,000 ns (150 ms) |
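The locality argument of slides 17 to 19 can be sketched in Java. This is a minimal illustration rather than code from the talk: the class and method names are hypothetical, and it makes no timing claims. It only contrasts a traversal that streams through contiguous memory with one that chases a pointer per element.

```java
import java.util.LinkedList;
import java.util.List;

final class LocalitySketch {
    // Contiguous primitives: a 64-byte cache line holds eight longs, so
    // spatial locality and prefetching work in our favour.
    static long sumArray(long[] values) {
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Linked nodes holding boxed values: every step follows a reference to a
    // separately allocated object, which may sit on a different cache line.
    static long sumList(List<Long> values) {
        long sum = 0;
        for (Long value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long[] array = new long[n];
        List<Long> list = new LinkedList<>();
        for (int i = 0; i < n; i++) {
            array[i] = i;
            list.add((long) i);
        }
        // Both traversals compute the same sum; only the memory access
        // pattern differs.
        System.out.println(sumArray(array) == sumList(list));
    }
}
```

Measuring the difference properly would need a benchmark harness and a profiler; the sketch only shows the two access patterns side by side.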
  • 25. Optimization aims • We are NOT aiming for more data/second • Even a very inefficient implementation will be bottlenecked by the network • We are aiming to make the client get out of the way • The client is not the only thing running on the system • Leave all resources for the actual application
  • 26. Efficiency VS raw speed • We value efficiency more than raw speed for the client • Fewer cycles • Less cache usage and fewer cache misses • Less memory? • Efficiency for the client == raw speed for the application
  • 27. Efficiency from constraints • No consumer group functionality needed • A single topic • Finite number of integer partitions • Partition reassignment is rare and happens during startup and shutdown • We are in control of the code that consumes the messages
  • 28. SignalFx Use cache conscious data structures
  • 29. Use arrays and open addressing hash maps • Single topic. Less than 1024 partitions • Instead of maps we can use arrays • Or use primitive specialized open addressing hash maps
  • 30. Topic:Partition -> Offset [Diagram: a Topic:Partition -> offset table (Foo:0 -> 9026, Foo:1 -> 1247, …, Foo:n -> 11649) becomes, for the single topic Foo, a Partition -> offset table (0 -> 9026, 1 -> 1247, …, n -> 11649) and finally a plain Offsets array for Foo: 9026, 1247, 11649]
  • 31. Hash map implemented as an array of lists of key* | value* [Diagram: an array of Entry* slots numbered 1 to 4; each entry holds partition* and offset* references that point to separately allocated partition and offset objects]
  • 32. Dependable cache miss generator [Diagram: the same hash map drawn as an array of List slots whose entries hold partition* and offset* references to separately allocated partition and offset objects]
  • 33. Sparse array [Diagram: one contiguous array holding offset 0, offset 1, …, offset 13]
  • 34. SignalFx [Diagram comparing a hash map (Entry* array, entry with partition* and offset*, boxed partition 0 and boxed offset 116) with a sparse Offsets array holding 116]
      In memory: hash map (4 * 2 + 4 + 8) + (4 + 4 + 8) + (4 + 8) + (8 + 8) = 64 bytes; sparse array 1024 * 8 + 4 + 8 = 8204 bytes
      In cache: hash map 4 * 64 = 256 bytes; sparse array 1 * 64 = 64 bytes
  • 35. Low memory and cache friendly data structures • Queues built from integer arrays. Negative -> partition lost • Zero allocation hashed-wheel timer to close stuck connections • Open addressing hash maps • BitSets coded on top of long arrays whenever a set of partitions is required • Can be traversed in O(num set bits)
  • 36. Applicability and benefit to Kafka consumer 0.9 • Benefits - medium • Lots of hash map look ups • Applicability - low • Multiple topics - sparse arrays not a great match • Open addressing hash maps - preserve most of the benefits
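The open addressing hash map mentioned in slides 29, 35 and 36 can be sketched as follows. This is an illustrative assumption of what a primitive-specialized int -> long map might look like, not the SignalFx implementation: the class name, the linear probing scheme and the hash function are all hypothetical. Keys and values live in two flat primitive arrays, so a lookup touches no boxed objects and no entry nodes.

```java
import java.util.Arrays;

final class IntToLongOpenMap {
    private static final int EMPTY = -1; // partition ids are never negative
    private final int[] keys;
    private final long[] values;
    private final int mask;

    IntToLongOpenMap(int capacityPowerOfTwo) {
        if (Integer.bitCount(capacityPowerOfTwo) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new int[capacityPowerOfTwo];
        values = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        Arrays.fill(keys, EMPTY);
    }

    void put(int key, long value) {
        checkKey(key);
        int slot = hash(key) & mask;
        // Linear probing: on a collision, try the next slot in the same array.
        for (int probes = 0; probes <= mask; probes++) {
            if (keys[slot] == EMPTY || keys[slot] == key) {
                keys[slot] = key;
                values[slot] = value;
                return;
            }
            slot = (slot + 1) & mask;
        }
        throw new IllegalStateException("map is full");
    }

    long get(int key, long missingValue) {
        checkKey(key);
        int slot = hash(key) & mask;
        for (int probes = 0; probes <= mask; probes++) {
            if (keys[slot] == key) {
                return values[slot];
            }
            if (keys[slot] == EMPTY) {
                return missingValue;
            }
            slot = (slot + 1) & mask;
        }
        return missingValue;
    }

    private static void checkKey(int key) {
        if (key < 0) {
            throw new IllegalArgumentException("key must be non-negative");
        }
    }

    // Simple multiplicative mixing; any reasonable int hash would do here.
    private static int hash(int key) {
        int h = key * 0x9E3779B9;
        return h ^ (h >>> 16);
    }
}
```

The sketch has a fixed capacity and no removal, which keeps it short. For a single topic with a small, dense partition range, the plain array indexed by partition number from slide 30 is simpler still; a map like this is the fallback when keys are too sparse for that.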
  • 38. Eliminate redundant work • A single topic. Finite number of partitions: • Topic and client string immutable • The metadata request buffer can be created just once and kept around forever • Other requests can have their fixed part written out once and only write the variable part on each request • Offset request = fixed_part + per_partition_part • Fetch request = fixed_part + per_partition_part
  • 39. SignalFx FETCH REQUEST BUFFER
      Fixed: SIZE, API_KEY, API_VERSION, CORRELATION_ID, CLIENT_ID_STRING, REPLICA_ID, MAX_WAIT_TIME, MIN_BYTES, NUM_TOPICS, TOPIC_STRING, NUM_PARTITIONS
      Variable, one (partition, offset, max bytes) triple per partition: (0, 1266, 1024), (1, 1164, 1024), (2, 1900, 1024)
  • 40. SignalFx FETCH REQUEST BUFFER [Same buffer, plus an Index array holding the buffer position of each partition's offset: 1200, 1216, 1232]
  • 41. SignalFx FETCH REQUEST BUFFER [Same buffer and Index, plus the Offsets array with the next offsets to fetch: 1289, 1172, 1990]
  • 42. SignalFx FETCH REQUEST BUFFER [Same buffer, Index and Offsets after the offsets have been written in place; the variable part now reads (0, 1289, 1024), (1, 1172, 1024), (2, 1990, 1024)]
  • 43. Code
      private void setNewOffsetsForFetchRequest() {
          final ByteBuffer buffer = this.fetchRequestBuffer;
          // Iterate through the partitions assigned to this broker
          // and write the offset directly on the buffer.
          for (int i = 0; i < partitionAssignment.length; i++) {
              // This loop runs in O(partitions assigned).
              long bitSet = partitionAssignment[i];
              while (bitSet != 0) {
                  // Isolate the lowest set bit of this word.
                  final long t = bitSet & -bitSet;
                  final int partitionId = i * 64 + Long.bitCount(t - 1);
                  // The position in the buffer that points to the
                  // beginning of the offset for this partition.
                  final int bufferPositionForOffset = fetchRequestIndex[partitionId];
                  final long offset = partitionToOffset[partitionId];
                  // Write the offset directly.
                  buffer.putLong(bufferPositionForOffset, offset);
                  // Clear the bit that was just processed.
                  bitSet ^= t;
              }
          }
      }
  • 44. SignalFx METADATA REQUEST BUFFER. Fixed: SIZE, API_KEY, API_VERSION, CORRELATION_ID, CLIENT_ID_STRING, NUM_TOPICS, TOPIC_STRING
  • 45. SignalFx OFFSET REQUEST BUFFER. Fixed: SIZE, API_KEY, API_VERSION, CORRELATION_ID, CLIENT_ID_STRING, REPLICA_ID, NUM_TOPICS, TOPIC_STRING, NUM_PARTITIONS (at NUM_PARTITIONS_POSITION), followed by partitions 0 1 2 3 4 5
  • 46. Applicability and benefit to Kafka consumer 0.9 • Benefits - high • Reuse instead of allocating - temporal locality • Streaming through 3 arrays - prefetching • One fetch request per fetch response - common • Metadata or offset requests - rare • Applicability - high • Internal detail so API doesn’t change • Even for consumer groups, partition reassignment and partition migration events are rare
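The "fixed part once, variable part per request" idea of slides 38 to 43 can be sketched as a reusable buffer plus an index of offset positions. This is a hedged illustration, not the SignalFx code: the class name `FetchRequestTemplate` and its methods are hypothetical, and the field order is assumed to follow the fields listed on slide 39, with the field widths of the version 0 Kafka fetch request as I understand it (check the Kafka protocol guide before relying on it).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class FetchRequestTemplate {
    final ByteBuffer buffer;     // built once, reused for every fetch request
    final int[] offsetPosition;  // partition id -> buffer position of its offset

    FetchRequestTemplate(String clientId, String topic, int[] partitions,
                         int maxWaitMs, int minBytes, int maxBytesPerPartition) {
        byte[] client = clientId.getBytes(StandardCharsets.UTF_8);
        byte[] topicBytes = topic.getBytes(StandardCharsets.UTF_8);
        int size = 4 + 2 + 2 + 4 + (2 + client.length)   // size, key, version, correlation, client id
                + 4 + 4 + 4 + 4 + (2 + topicBytes.length) // replica, wait, min bytes, topic count, topic
                + 4 + partitions.length * 16;             // partition count, 16 bytes per partition
        int maxPartition = -1;
        for (int p : partitions) {
            maxPartition = Math.max(maxPartition, p);
        }
        offsetPosition = new int[maxPartition + 1];
        buffer = ByteBuffer.allocate(size);               // big-endian by default
        buffer.putInt(size - 4);                          // SIZE (excludes itself)
        buffer.putShort((short) 1);                       // API_KEY (assumed: fetch)
        buffer.putShort((short) 0);                       // API_VERSION
        buffer.putInt(0);                                 // CORRELATION_ID, patched per request
        buffer.putShort((short) client.length).put(client);
        buffer.putInt(-1);                                // REPLICA_ID (assumed: -1 for consumers)
        buffer.putInt(maxWaitMs);
        buffer.putInt(minBytes);
        buffer.putInt(1);                                 // NUM_TOPICS
        buffer.putShort((short) topicBytes.length).put(topicBytes);
        buffer.putInt(partitions.length);                 // NUM_PARTITIONS
        for (int p : partitions) {
            buffer.putInt(p);
            offsetPosition[p] = buffer.position();        // remember where the offset lives
            buffer.putLong(0L);                           // placeholder offset
            buffer.putInt(maxBytesPerPartition);
        }
        buffer.flip();
    }

    // Per-request work: overwrite eight bytes in place, no allocation.
    void setOffset(int partition, long offset) {
        buffer.putLong(offsetPosition[partition], offset);
    }

    void setCorrelationId(int correlationId) {
        buffer.putInt(8, correlationId);                  // after size (4), key (2), version (2)
    }
}
```

With this layout consecutive partitions sit 16 bytes apart (4 byte partition, 8 byte offset, 4 byte max bytes), which matches the spacing of the Index values 1200, 1216, 1232 on slide 40.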
  • 48. Stream responses to application • Pass each message to the application when it is ready • Consume messages synchronously without a copy or allocation • No deserialization required • Benefits add up when processing 100s of thousands of messages per second
  • 49. Low level interface
      public interface KafkaMessageHandler {
          void handleMessage(ByteBuffer buffer, int position, int length);
      }

      public interface KafkaConsumer {
          void poll(KafkaMessageHandler handler, long timeoutMs);
          . . . . . .
      }
  • 50. SignalFx FETCH RESPONSE PARSING [Diagram: a fetch response laid out as a header (topic string, client string etc.) followed by one row per partition (1 to 4), each row holding Message 1, Message 2, …, Message n]
  • 51 to 53. SignalFx FETCH RESPONSE PARSING [Same diagram, shown with the handler interface] public interface KafkaMessageHandler { void handleMessage(ByteBuffer buffer, int position, int length); }
  • 54. Applicability and benefit to Kafka consumer 0.9 • Benefits - very high • Reuse response buffer, no allocations - temporal locality • Data is processed right after being read from the socket - temporal locality • Streaming through a buffer - spatial locality + prefetching • Combine with DirectByteBuffers for zero copy • Applicability - low • API too low level • Integrity of internal buffers compromised by bugs in application • Maybe a low level “with great power comes great responsibility” API
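The streaming hand-off of slides 48 to 53 can be sketched as a loop that walks the response buffer and passes each message to the handler in place. This is a simplified illustration: the handler interface is the one from slide 49, but the `StreamingParseSketch` class and the framing (a 4 byte length followed by the payload) are assumptions made for the example and not the actual Kafka message set format.

```java
import java.nio.ByteBuffer;

// Same shape as the handler interface shown on slide 49.
interface KafkaMessageHandler {
    void handleMessage(ByteBuffer buffer, int position, int length);
}

final class StreamingParseSketch {
    // Walks [int32 length][payload] records and hands each payload to the
    // handler as (buffer, position, length). Nothing is copied, allocated or
    // deserialized here. Returns the number of messages delivered.
    static int deliver(ByteBuffer response, KafkaMessageHandler handler) {
        int position = response.position();
        final int limit = response.limit();
        int delivered = 0;
        while (limit - position >= 4) {
            final int length = response.getInt(position);
            if (length < 0 || length > limit - position - 4) {
                break; // partial trailing message: wait for more bytes
            }
            handler.handleMessage(response, position + 4, length);
            position += 4 + length;
            delivered++;
        }
        return delivered;
    }
}
```

The handler sees the consumer's internal buffer directly, which is the risk slide 54 points out: it must finish with the bytes before returning and must not modify the buffer, because the consumer reuses it for the next response.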
  • 56. Caveats • These are from running a very specific workload similar to our application • There are many Pareto-optimal choices for a client. Ours is not better in any way - it’s just tuned for our workload • It can and will prove bad for other workloads
  • 57. Benchmark • Single topic-partition • Settings of fetch_max_wait, fetch_min_bytes, max_bytes_per_partition were identical • Only 5000 messages per second produced by a single producer • Each message is 23 bytes • Warm up -> profile for 5 mins • 5000/sec * 5 mins = 1.5 million • Profiler = Java Mission Control
  • 62. SignalFx With 5,000 messages/second
      Implementation | CPU | Allocation (TLAB)
      0.9 consumer | 6% | 422.8 MB
      SignalFx consumer | 1.3% | 217 KB
      Ratio | 4.6x | 1944x
  • 63. SignalFx With 10,000 messages/second
      Implementation | CPU | Allocation (TLAB)
      0.9 consumer | 6.122% | 858 MB
      SignalFx consumer | 1.456% | 400 KB
      Ratio | 4.2x | 2145x
  • 64. SignalFx Thank You! Rajiv Kurian rajiv@signalfx.com @rzidane360 WE’RE HIRING jobs@signalfx.com @SignalFx - signalfx.com/careers

Editor's Notes

  1. A Kafka cluster has multiple brokers. Each broker is a process of its own with a unique ID. The unit of serializability in Kafka is a partition. Each partition has all its messages ordered. I like to think of a topic as a group of partitions. A partition has a statically assigned leader. From the POV of regular clients, all read/write operations must go through the leader.
  2. So a client needs to know the mapping of topic-partitions to brokers. This mapping can change dynamically. A client begins by sending a metadata request to know this mapping. A metadata request can be sent to any broker in the cluster.
  3. The broker then replies with a metadata response.
  4. So the client can now form a map of partitions to brokers.
  5. Next the client needs to build a table of partition -> next offset to consume. It can get it from the consumer group functionality or some other external source.
  6. Once this is built it can send fetch requests for actual data.
  7. As long as there is actual data to consume and no errors it gets back a fetch response.
  8. Data is transferred between memory and cache in blocks of fixed size, called cache lines (typically 64 bytes). If you need a single byte, 63 others are coming along for the ride and paying the full tax. So you might as well use these bytes. When the processor needs to read or write a location in main memory, it first checks for a corresponding entry in the cache. On a cache hit, the processor immediately reads or writes the data in the cache line. On a cache miss, the cache allocates a new entry and copies in data from main memory, then the request (read or write) is fulfilled from the contents of the cache.
  9. Data is transferred between memory and cache in blocks of fixed size, called cache lines (typically 64 bytes). If you need a single byte, 63 others are coming in for the ride and paying the full tax. So you might as well use these bytes.
  10. An application summing numbers in nodes of a linked list might take one cache miss per node.
  11. When summing an array, on the other hand, spatial locality and prefetching help a lot. The compiler is also able to write better vectorized code if your layout looks like this.
  12. We really really care about cache usage and cache misses. We don’t care about memory as much. So efficiency for the client means more resources for the application which means a faster application.
  13. Almost all our optimizations are based on constraints that come from our use of the consumer. So, many of them are not directly applicable to generic Kafka clients which need to work well under various scenarios. We need no consumer group functionality. We manage partitions and offsets outside of Kafka. This makes our client super simple. A single topic. Our applications mostly consume a topic. We have a finite small number of partitions. Usually <= 1024. Partition reassignment is rare. I would imagine that this is true for most applications. Control of the entire pipeline means we can make some assumptions that a generic client cannot. End to end principle.
  14. Now the interesting part.
  15. Since we have a single topic, all partitions implicitly belong to that topic. So we don’t need a concept of topic-partition. We only have partitions. Since we don’t need topic-partition objects we can store all per partition data in arrays with the array index = partition number.
  16. It is important to acknowledge that this is a tradeoff. Like we said before we really care about cache space and cache misses. We are ready to trade off using extra memory to reduce our cache usage. Here is an example: Let’s imagine that we have a java util hash map of partition to offset. We’ve already shown that we can have multiple cache misses to do an offset get or put. Now let’s imagine that we have a single partition 0 with an offset 116, stored in this map. How much memory does this use? We’ll be generous and assume that headers are only 8 bytes and references are only 4 bytes. So let’s assume that the entry array was preallocated for 2 entries. There is an 8 byte header, a 4 byte length and two 4 byte references. That’s 20 bytes. Similarly the actual entry is itself 16 bytes and the boxed long is 16 bytes and the boxed integer is 12 bytes. So in spite of all the references and indirection it only uses 20 + 16 + 16 + 12 = 64 bytes of memory. On the other hand let’s assume that our sparse array has been preallocated for 1024 partitions. So it has a 4 byte length, an 8 byte header and 1024 8 byte entries so a total of 8204 bytes which is around 8 KB. This is a lot more than 64 bytes and kind of wasteful. Now let’s look at how much cache is used by each solution. Each cache line is 64 bytes. So even if you want a single byte 63 unrelated bytes might come along for the ride. Now let’s look at the java hash map again. We first need to fetch the right entry reference from the entry array - that’s one cache line. The other entry reference comes along for the ride and possibly the length and header. So that’s 64 bytes already. Now the actual entry is on another cache line. That is another cache line used up. Now we need to look at the contents of the boxed partition. That’s another random memory location so a new cache line. Finally we fetch the offset itself and that’s another cache line. So it’s 4 cache lines and hence 256 bytes of cache used up through a simple get request. Now let’s look at the sparse offset array.
We know where to fetch it from so with a single cache fetch we get the offset. It comes with potentially 7 other offsets none of which might be useful, but it’s still a single cache line. So we use only 64 bytes! This example is a bit counter-intuitive. It goes to show that a data structure using only 64 bytes of memory can actually use many more times that memory in cache and a data structure using 8 KB of memory might only use a single cache line. This is a bit like virtual memory vs physical memory. You can use a lot of virtual memory but use little physical memory and come out ahead. In our example physical memory is abundant (we have gigabytes of it). Cache memory is very limited. We only have around 32 KB of L1 cache for example so it’s much more precious than physical memory. This also shows how we are ready to make trade offs. Sparse arrays can take more memory but have a pretty much guaranteed worst case cache usage and cache miss number.
  17. We talked about the main data structures. Our other data structures especially for our state machine implementation are all designed to be zero allocation in the steady state and very cache friendly. Even our hashed-wheel timer is made of primitive arrays with very few indirections.
  18. Since we service a single topic per client, we can stamp out the client id and topic id bits and never change them.
  19. Since this variable sized portion rarely changes, we can afford to create an index to it. So we have an index of a partition to its position within the fetch request ByteBuffer.
  20. So let’s imagine that we sent this particular fetch request with partitions 0, 1 and 2. We have a response and the offsets have been advanced as shown by the offsets table. Now to create the next fetch request, we just read the new offsets and use the index to write them directly on the old buffer.
  21. And we are ready to send this buffer. We avoided all the work required to create a buffer, write out the fixed size fields etc. It’s just writing a few integers to locations in memory.
  22. This is how the code looks. There is a bit of noise in the code because we are iterating a bit set representing the partition assignment. But otherwise the code is simple - fetch the position within our request buffer for this partition. Get the next offset to fetch. Write this offset at the right position.
  23. The metadata request is just frozen after consumer creation.
  24. For the offset request for example we can store a pointer to the num partitions part of the request. So when we need to send a new offset request we can directly seek there and write out the partition bits.
  25. We don’t use JSON, XML, Thrift, ProtocolBuffers etc for our messages. Our messages do not need to be deserialized before consumption. They can be consumed directly just like Kafka’s internal messages can be. There is no POJO created from a serialized message. Instead we can wrap the buffer in a flyweight and consume the fields of our messages by doing reads from the underlying buffer. So we don’t need any copies or any allocation for steady state processing.
  26. The interface however is low level. Any handler of a message is fed with a buffer and a position and length within that buffer that represents the message. We could also alternatively set a position and limit on the buffer and send it to the application. The poll call takes such a handler and feeds it with messages.
  27. So let’s imagine that this is a response from Kafka. There are a bunch of fixed size bits on the top that we can skip. The real payload is a message set per partition.
  28. We begin by going to the first message, ensuring there are no errors and then just passing the pointer and length to the handler. It synchronously consumes it, making copies if necessary, and then returns to the parsing code.
  29. We then consume the second message.
  30. And the third and so on.
  31. Benefits are huge. Zero copy and zero allocation in the steady path. Since we are not creating a new ByteBuffer every time - DirectByteBuffers become viable. So we elide the copy involved in reading from the socket into HeapByteBuffers. Sadly the applicability of this optimization is low. We are in control of our buffers and their lifetime so it is easy for us to avoid a copy. It is perhaps possible to create a very low level api that is not the default. I’ve not had much luck pushing this agenda in the past :)
  32. The Kafka client allocates about 423 MB for 5000 * 300 = 1.5 million messages. That’s 86.56% of all allocations. A sizeable portion of that is in fetch response parsing. A lot of that is ByteBuffer slicing which our client does not do at all. We talked about a possible but dangerous way to get rid of this entirely. 1.76% is in the selector. About 9.27% is in cluster init. I am not sure why that’s so much.
  33. We allocate 218 KB overall to process 5000 * 300 = 1.5 million messages. The consumer does no allocations of its own. There are allocations done by the java NIO stack but they don’t show up in the profile. Selectors allocate and we plan to use an allocation-free Selector like the one the Netty project uses.
  34. CPU used was 6.6%. 91% of that was the 0.9 consumer, so about 6%. 12.% spent on checksum math. 67% on handling fetch responses - we talked about a way to make this very fast. Some 6% in metadata - not sure why.
  35. CPU was 2.63%. The client uses about 50% of that, so 1.31%. 16.67% of the total is spent in the select call. So the client code itself accounts for 33.33% of 2.63, which is 0.88% CPU.
  36. Similar story for 10000 messages/second: 4x odd for CPU and a lot more for allocations.
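
The sparse offset array and the reusable fetch request buffer from the notes above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual SignalFx client code: the class name, method names, the 16 byte header and the per-partition entry layout are all hypothetical, and the layout is deliberately simplified rather than being the real Kafka wire format. The partition assignment is a plain int array here, where the talk describes iterating a bit set.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical sketch: simplified layout, NOT the real Kafka fetch request format. */
public class FetchRequestSketch {
    // Stand-in for the fixed-size fields (client id, topic, etc.) that are written once.
    private static final int HEADER_BYTES = 16;
    // One entry per partition: partition id (int) followed by the fetch offset (long).
    private static final int ENTRY_BYTES = 4 + 8;

    private final long[] nextOffsets;    // sparse array: index = partition number
    private final int[] offsetPosition;  // index = partition number, value = byte position
                                         // of that partition's offset in the request (-1 if unassigned)
    private final int[] assigned;        // partitions this consumer fetches from
    private final ByteBuffer request;    // allocated once, reused for every fetch

    public FetchRequestSketch(int maxPartitions, int[] assignedPartitions) {
        nextOffsets = new long[maxPartitions];
        offsetPosition = new int[maxPartitions];
        Arrays.fill(offsetPosition, -1);
        assigned = assignedPartitions.clone();
        request = ByteBuffer.allocateDirect(HEADER_BYTES + assigned.length * ENTRY_BYTES);

        // The fixed part would be stamped out here once; this sketch leaves it zeroed.
        request.position(HEADER_BYTES);
        for (int partition : assigned) {
            request.putInt(partition);
            offsetPosition[partition] = request.position(); // remember where the offset lives
            request.putLong(0L);
        }
        request.flip(); // position = 0, limit = end of the request
    }

    /** Record the next offset to consume for a partition (e.g. after a fetch response). */
    public void advance(int partition, long nextOffset) {
        nextOffsets[partition] = nextOffset;
    }

    /** Patch the current offsets into the existing buffer instead of building a new request. */
    public ByteBuffer prepareNextRequest() {
        for (int partition : assigned) {
            // Absolute put: writes in place and does not move the buffer's position.
            request.putLong(offsetPosition[partition], nextOffsets[partition]);
        }
        request.rewind(); // a previous send may have advanced the position
        return request;
    }

    /** Read back the offset currently written in the request for a partition. */
    public long offsetInRequest(int partition) {
        return request.getLong(offsetPosition[partition]);
    }
}
```

In this sketch, building the next request in the steady state is a handful of array reads and absolute writes into a buffer that already exists, with no allocation and no rewriting of the fixed-size fields.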