Understanding Kafka Produce and Fetch API Calls for High-Throughput Applications | Mik Kocikowski, Cloudflare

Kafka Wire Protocol
Understanding Kafka Produce And Fetch API Calls For High Throughput Applications

Context
- My name is Mik Kocikowski
- I work on the Data Team at Cloudflare
- We collect and process petabytes of logs per day
- We use Kafka to buffer these logs

Ballpark scale
- Petabytes of data per day
- Hundreds of Kafka brokers in a "few" clusters
- Thousands of topic-partitions
- Read volume is about 3x the input volume

The problem
- At scale, client resource use was unpredictable
- At scale, client state was incomprehensible

The solution
- Understand what (and why) the client actually does
- Implement the simplest possible client that meets our use case

Talk trajectory
- Go through basic nomenclature
- Describe the wire protocol
- Show how api calls can result in complex client state
- Simplify client for predictable behavior and resource use

Clients talk with brokers using the “wire protocol”
- https://kafka.apache.org/protocol
- Kafka-specific (but simple) asynchronous binary protocol over TCP

API “keys”
- https://kafka.apache.org/protocol#protocol_api_keys
- Numbers that identify different API calls
- Produce == 0
- Fetch == 1 (“consume”)
- Metadata == 3
- ApiVersions == 18
- Most keys have multiple versions
- Some calls can be made to any broker
- Others need to be made to the partition leader or group coordinator

API “messages” (requests)
- https://kafka.apache.org/protocol#protocol_messages
- Bodies of the requests and responses
- Nested structs with simple binary marshaling
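
To make the "simple binary marshaling" concrete, here is a minimal sketch in Go of framing a request: an INT32 size prefix, then request header v1 (api_key, api_version, correlation_id, client_id), then the body. The function name `marshalRequest` and the constant names are illustrative assumptions, not taken from any particular client.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// API keys as listed in the Kafka protocol guide.
const (
	apiKeyProduce     int16 = 0
	apiKeyFetch       int16 = 1
	apiKeyMetadata    int16 = 3
	apiKeyApiVersions int16 = 18
)

// marshalRequest frames a request for the wire (hypothetical helper).
// All integers are big-endian; client_id is an INT16 length plus bytes.
func marshalRequest(apiKey, apiVersion int16, correlationID int32, clientID string, body []byte) []byte {
	b := make([]byte, 4) // placeholder for the INT32 size prefix
	b = binary.BigEndian.AppendUint16(b, uint16(apiKey))
	b = binary.BigEndian.AppendUint16(b, uint16(apiVersion))
	b = binary.BigEndian.AppendUint32(b, uint32(correlationID))
	b = binary.BigEndian.AppendUint16(b, uint16(len(clientID)))
	b = append(b, clientID...)
	b = append(b, body...)
	// The size prefix counts everything after itself.
	binary.BigEndian.PutUint32(b[:4], uint32(len(b)-4))
	return b
}

func main() {
	// ApiVersions v0 has an empty request body.
	req := marshalRequest(apiKeyApiVersions, 0, 1, "example", nil)
	fmt.Printf("% x\n", req)
}
```

Newer "flexible" request versions use a different header layout with tagged fields; this sketch covers only the older fixed layout.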

Record batches
- https://kafka.apache.org/documentation/#recordbatch
- Struct that encapsulates user data
- 1 or more records (data) + their metadata
- The unit at which data is compressed, stored, and retrieved
- Kafka >=0.11 (before that “message sets”)
- "Sweet spot" for us is a "few MB" per batch
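
As an illustration of "records + their metadata", the sketch below reads a few fields from the fixed 61-byte header that precedes the records in a record batch (magic 2). The struct and function names are assumptions made for this example, and only a subset of the header fields is kept.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// batchHeader holds a subset of the fixed-size record batch header.
type batchHeader struct {
	BaseOffset      int64
	BatchLength     int32
	Magic           int8
	Attributes      int16
	LastOffsetDelta int32
	NumRecords      int32
}

const batchHeaderLen = 61

// parseBatchHeader decodes the header fields at their fixed byte offsets.
func parseBatchHeader(b []byte) (batchHeader, error) {
	if len(b) < batchHeaderLen {
		return batchHeader{}, errors.New("short record batch header")
	}
	h := batchHeader{
		BaseOffset:  int64(binary.BigEndian.Uint64(b[0:8])),
		BatchLength: int32(binary.BigEndian.Uint32(b[8:12])),
		// bytes 12..16: partition leader epoch
		Magic: int8(b[16]),
		// bytes 17..21: CRC
		Attributes:      int16(binary.BigEndian.Uint16(b[21:23])),
		LastOffsetDelta: int32(binary.BigEndian.Uint32(b[23:27])),
		// bytes 27..57: timestamps, producer id/epoch, base sequence
		NumRecords: int32(binary.BigEndian.Uint32(b[57:61])),
	}
	if h.Magic != 2 {
		return batchHeader{}, errors.New("not a v2 record batch")
	}
	return h, nil
}

// Compression returns the codec stored in the low 3 bits of attributes
// (0 none, 1 gzip, 2 snappy, 3 lz4, 4 zstd).
func (h batchHeader) Compression() int16 { return h.Attributes & 0x07 }

func main() {
	raw := make([]byte, batchHeaderLen)
	raw[7] = 100 // base offset 100
	raw[16] = 2  // magic
	raw[22] = 4  // zstd
	raw[26] = 9  // last offset delta
	raw[60] = 10 // record count
	h, err := parseBatchHeader(raw)
	fmt.Println(h, h.Compression(), err)
}
```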

High level produce-fetch
- Records (user data) are collected into record batches
- Record batches are sent to Kafka via "Produce" requests (API key 0)
- Record batches are retrieved from Kafka via "Fetch" requests (API key 1)

High level “produce” flow (client perspective)
1. Connect to a random broker ("bootstrap")
2. Make a “Metadata” (key 3) call to get list of partition leaders for topic
3. Connect to brokers that lead individual partitions
4. Make “Produce” requests (key 0) to individual brokers
5. Goto #4

High level “fetch” flow (client perspective)
1. Connect to a random broker ("bootstrap")
2. Make a “Metadata” (key 3) call to get list of partition leaders for topic
3. Connect to brokers that lead individual partitions
4. Make “Fetch” (key 1) requests to individual brokers
5. Goto #4
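
Step 3 in both flows depends on knowing which broker leads which partition. A minimal sketch, assuming the Metadata response has already been decoded into a map of partition to leader broker id (the decoding itself is not shown, and `groupByLeader` is a hypothetical helper):

```go
package main

import (
	"fmt"
	"sort"
)

// groupByLeader inverts partition -> leader broker id into
// leader broker id -> sorted list of partitions it leads.
func groupByLeader(leaders map[int32]int32) map[int32][]int32 {
	out := make(map[int32][]int32)
	for partition, broker := range leaders {
		out[broker] = append(out[broker], partition)
	}
	for _, ps := range out {
		sort.Slice(ps, func(i, j int) bool { return ps[i] < ps[j] })
	}
	return out
}

func main() {
	// Hypothetical topic with three partitions on two brokers.
	leaders := map[int32]int32{0: 101, 1: 102, 2: 101}
	fmt.Println(groupByLeader(leaders))
}
```

Produce and Fetch requests for a partition then go only to the broker listed as its leader.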

Produce (key 0) requests
- https://kafka.apache.org/protocol#The_Messages_Produce
- Single request can carry data for multiple topics and partitions
- Broker must be leader for every topic-partition
- One record batch is sent per topic-partition per request
- “acks” and “timeout_ms” apply to the whole request

Produce v7 request
Produce Request (Version: 7) => transactional_id acks timeout [topic_data]
  transactional_id => NULLABLE_STRING
  acks => INT16
  timeout => INT32
  topic_data => topic [data]
    topic => STRING
    data => partition record_set
      partition => INT32
      record_set => RECORDS
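
The Produce v7 layout maps directly onto nested Go structs. The sketch below encodes only the request body (the size prefix and request header are separate). The type and method names are assumptions for illustration; the record set is treated as an already-encoded record batch written with an INT32 length prefix.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

type partitionData struct {
	Partition int32
	RecordSet []byte // one already-encoded record batch
}

type topicData struct {
	Topic string
	Data  []partitionData
}

type produceRequestV7 struct {
	TransactionalID *string // nil encodes as a null NULLABLE_STRING
	Acks            int16
	TimeoutMs       int32
	TopicData       []topicData
}

// appendString writes a STRING: INT16 length followed by the bytes.
func appendString(b []byte, s string) []byte {
	b = binary.BigEndian.AppendUint16(b, uint16(len(s)))
	return append(b, s...)
}

// marshal encodes the body field by field, in schema order.
func (r produceRequestV7) marshal() []byte {
	var b []byte
	if r.TransactionalID == nil {
		b = binary.BigEndian.AppendUint16(b, 0xffff) // length -1 means null
	} else {
		b = appendString(b, *r.TransactionalID)
	}
	b = binary.BigEndian.AppendUint16(b, uint16(r.Acks))
	b = binary.BigEndian.AppendUint32(b, uint32(r.TimeoutMs))
	b = binary.BigEndian.AppendUint32(b, uint32(len(r.TopicData))) // array count
	for _, t := range r.TopicData {
		b = appendString(b, t.Topic)
		b = binary.BigEndian.AppendUint32(b, uint32(len(t.Data)))
		for _, p := range t.Data {
			b = binary.BigEndian.AppendUint32(b, uint32(p.Partition))
			b = binary.BigEndian.AppendUint32(b, uint32(len(p.RecordSet)))
			b = append(b, p.RecordSet...)
		}
	}
	return b
}

func main() {
	req := produceRequestV7{
		Acks:      -1, // wait for all in-sync replicas
		TimeoutMs: 1000,
		TopicData: []topicData{{Topic: "logs", Data: []partitionData{
			{Partition: 0, RecordSet: []byte{1, 2, 3}}, // placeholder bytes, not a real batch
		}}},
	}
	fmt.Printf("% x\n", req.marshal())
}
```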

Produce (key 0) responses
- https://kafka.apache.org/protocol#The_Messages_Produce
- Success or failure are per topic-partition
- Partial failures possible, for example:
  - fail to ack replication
  - broker not leader for topic-partition

Produce response v7
Produce Response (Version: 7) => [responses] throttle_time_ms
  responses => topic [partition_responses]
    topic => STRING
    partition_responses => partition error_code base_offset log_append_time log_start_offset
      partition => INT32
      error_code => INT16
      base_offset => INT64
      log_append_time => INT64
      log_start_offset => INT64
  throttle_time_ms => INT32
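
Because success or failure is reported per topic-partition, a client has to walk the whole response. A minimal decoding sketch for the v7 response body follows; `cursor`, `parseProduceResponseV7` and `sampleResponse` are hypothetical names, and the sample bytes are hand-built for illustration rather than captured from a broker.

```go
package main

import (
	"errors"
	"fmt"
)

type partitionResult struct {
	Topic      string
	Partition  int32
	ErrorCode  int16
	BaseOffset int64
}

// cursor consumes bytes from the front and remembers the first error.
type cursor struct {
	b   []byte
	err error
}

func (c *cursor) take(n int) []byte {
	if c.err != nil {
		return nil
	}
	if n < 0 || n > len(c.b) {
		c.err = errors.New("short or malformed response")
		return nil
	}
	out := c.b[:n]
	c.b = c.b[n:]
	return out
}

// uint reads an n-byte big-endian integer (0 if an error occurred).
func (c *cursor) uint(n int) uint64 {
	var v uint64
	for _, x := range c.take(n) {
		v = v<<8 | uint64(x)
	}
	return v
}

func parseProduceResponseV7(b []byte) ([]partitionResult, int32, error) {
	c := &cursor{b: b}
	var out []partitionResult
	nTopics := int32(c.uint(4))
	for i := int32(0); i < nTopics && c.err == nil; i++ {
		topic := string(c.take(int(int16(c.uint(2)))))
		nParts := int32(c.uint(4))
		for j := int32(0); j < nParts && c.err == nil; j++ {
			r := partitionResult{Topic: topic}
			r.Partition = int32(c.uint(4))
			r.ErrorCode = int16(c.uint(2))
			r.BaseOffset = int64(c.uint(8))
			c.uint(8) // log_append_time, ignored in this sketch
			c.uint(8) // log_start_offset, ignored in this sketch
			out = append(out, r)
		}
	}
	throttle := int32(c.uint(4))
	if c.err != nil {
		return nil, 0, c.err
	}
	return out, throttle, nil
}

// sampleResponse: topic "t", partition 0 succeeded at base offset 42,
// partition 1 failed with error code 6 (broker is not the leader).
func sampleResponse() []byte {
	ff8 := []byte{0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff}
	b := []byte{0, 0, 0, 1, 0, 1, 't', 0, 0, 0, 2}
	b = append(b, 0, 0, 0, 0, 0, 0)        // partition 0, error_code 0
	b = append(b, 0, 0, 0, 0, 0, 0, 0, 42) // base_offset 42
	b = append(b, ff8...)                  // log_append_time -1
	b = append(b, 0, 0, 0, 0, 0, 0, 0, 0)  // log_start_offset 0
	b = append(b, 0, 0, 0, 1, 0, 6)        // partition 1, error_code 6
	b = append(b, ff8...)                  // base_offset -1
	b = append(b, ff8...)                  // log_append_time -1
	b = append(b, ff8...)                  // log_start_offset -1
	return append(b, 0, 0, 0, 0)           // throttle_time_ms 0
}

func main() {
	results, throttle, err := parseProduceResponseV7(sampleResponse())
	fmt.Println(results, throttle, err)
}
```

The sample shows a partial failure: one partition in the request was appended, the other has to be retried against a different broker.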

Fetch (key 1) requests
- https://kafka.apache.org/protocol#The_Messages_Fetch
- Single request can be for data from multiple topics and partitions
- Broker must be leader for every topic-partition
- Offset must be specified for every topic-partition
- “max_wait_ms” and “min_bytes” apply to the whole request

Fetch (key 1) responses
- https://kafka.apache.org/protocol#The_Messages_Fetch
- There will be 0 or more record batches for each successful topic-partition
- Record batch is the unit at which data is returned (offset alignment)
- Success or failure are per topic-partition
- Partial failures possible
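
Offset alignment in practice: a fetch for an offset in the middle of a batch returns the whole batch, so the client skips the leading records and continues from the end of the batch. A minimal sketch of that arithmetic, assuming contiguous offsets within a batch (no compaction); both function names are hypothetical.

```go
package main

import "fmt"

// nextFetchOffset returns the offset of the first record after a batch,
// computed from the batch header fields baseOffset and lastOffsetDelta.
func nextFetchOffset(baseOffset int64, lastOffsetDelta int32) int64 {
	return baseOffset + int64(lastOffsetDelta) + 1
}

// offsetsToSkip returns how many offsets at the start of a returned batch
// precede the offset that was actually requested.
func offsetsToSkip(requested, baseOffset int64) int64 {
	if requested <= baseOffset {
		return 0
	}
	return requested - baseOffset
}

func main() {
	// A batch covering offsets 100..109, fetched with requested offset 105.
	fmt.Println(offsetsToSkip(105, 100), nextFetchOffset(100, 9))
}
```

Fetching with the value from `nextFetchOffset` keeps every subsequent request aligned to a batch boundary.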

Broker connections
- Clients maintain one or more connections to each broker they talk to
- Connections in general are long lived
- Calls are asynchronous; responses are matched to requests by “correlation id”
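
To show what correlation ids imply for client state, here is a minimal sketch of the bookkeeping an asynchronous client needs on each connection: a table of in-flight requests keyed by correlation id. The `pending` type and its methods are assumptions for illustration, not the design of any specific client.

```go
package main

import (
	"fmt"
	"sync"
)

// pending tracks in-flight requests on one connection.
type pending struct {
	mu      sync.Mutex
	next    int32
	waiters map[int32]chan []byte
}

func newPending() *pending {
	return &pending{waiters: make(map[int32]chan []byte)}
}

// register allocates a correlation id and a channel for its response.
func (p *pending) register() (int32, <-chan []byte) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.next++
	ch := make(chan []byte, 1) // buffered so dispatch never blocks
	p.waiters[p.next] = ch
	return p.next, ch
}

// dispatch hands a response to whoever registered the correlation id.
// It reports false for unknown or already-answered ids.
func (p *pending) dispatch(id int32, resp []byte) bool {
	p.mu.Lock()
	ch, ok := p.waiters[id]
	delete(p.waiters, id)
	p.mu.Unlock()
	if ok {
		ch <- resp
	}
	return ok
}

func main() {
	p := newPending()
	id1, c1 := p.register()
	id2, c2 := p.register()
	// Deliver the responses in a different order than they were registered.
	p.dispatch(id2, []byte("second"))
	p.dispatch(id1, []byte("first"))
	fmt.Println(string(<-c1), string(<-c2))
}
```

Every entry in this table is state that has to be timed out, retried, or cleaned up when the connection breaks.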

Client state can become complex
- Multiple async requests awaiting responses
- Multiple topics per request
- Multiple partitions per topic
- … any of which can be “slow” or “broken”

Complex client state is bad
- More resources required (memory, cpu)
- Error handling and “retry” logic is convoluted
- Troubleshooting is hard (what exactly is slow / broken?)

Client state can be simplified
- Separate connection for each topic-partition
- All requests synchronous
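
With one connection per topic-partition and synchronous calls, the request/response cycle shrinks to a single function: write, read the size prefix, read the response, check the correlation id. The sketch below is a simplified illustration under those assumptions, not the code of the client mentioned later; `call`, `maxResponseSize` and `fakeBroker` are hypothetical names, and the fake stands in for a real TCP connection.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

const maxResponseSize = 64 << 20 // arbitrary safety cap for this sketch

// call writes one framed request and blocks until its response arrives.
// The request must already include the size prefix and header, so its
// correlation id sits at bytes 8..12.
func call(conn io.ReadWriter, request []byte) ([]byte, error) {
	if len(request) < 12 {
		return nil, errors.New("request too short")
	}
	if _, err := conn.Write(request); err != nil {
		return nil, err
	}
	var sizeBuf [4]byte
	if _, err := io.ReadFull(conn, sizeBuf[:]); err != nil {
		return nil, err
	}
	size := int32(binary.BigEndian.Uint32(sizeBuf[:]))
	if size < 4 || size > maxResponseSize {
		return nil, errors.New("invalid response size")
	}
	resp := make([]byte, size)
	if _, err := io.ReadFull(conn, resp); err != nil {
		return nil, err
	}
	// With one outstanding request the ids must match.
	if !bytes.Equal(resp[:4], request[8:12]) {
		return nil, errors.New("correlation id mismatch")
	}
	return resp[4:], nil
}

// fakeBroker answers every request with a 2-byte body "ok".
type fakeBroker struct{ out bytes.Buffer }

func (f *fakeBroker) Write(p []byte) (int, error) {
	f.out.Write([]byte{0, 0, 0, 6}) // size: correlation id + body
	f.out.Write(p[8:12])            // echo the correlation id
	f.out.WriteString("ok")
	return len(p), nil
}

func (f *fakeBroker) Read(p []byte) (int, error) { return f.out.Read(p) }

func main() {
	// ApiVersions v0 request: key 18, version 0, correlation id 7, null client id.
	req := []byte{0, 0, 0, 10, 0, 18, 0, 0, 0, 0, 0, 7, 0xff, 0xff}
	body, err := call(&fakeBroker{}, req)
	fmt.Println(string(body), err)
}
```

Any error from `call` means the connection, and therefore exactly one topic-partition, is in trouble, which is what makes the error handling binary.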

Simple client state is good
- Predictable per topic-partition resource use
- Binary error handling
- Troubleshooting is easier (isolate problems at connection level)

In practice
- We wrote our own Kafka client (in Go)
- In production for over a year
- Processes petabytes of data every day
- Something goes wrong all the time, but:
  - Resource consumption remains predictable
  - Errors are easily traceable
- https://github.com/mkocikowski/libkafka

Conclusion
- Simplicity is a requisite of scale
- Kafka at its core is simple
- Clients that follow the Java client design are complex
Thank you!

Bonus point: individual records and offsets
- The unit at which Kafka operates is a record batch
- 1 or more records per record batch
- Compression is applied per record batch
- Fetch requests most efficient when aligned to record batch boundaries
- Our client operates on record batches not on individual records
