When it absolutely, positively,
has to be there
Reliability Guarantees in Apache Kafka
@jeffholoman @gwenshap
Kafka
• High Throughput
• Low Latency
• Scalable
• Centralized
• Real-time
“If data is the lifeblood of high technology, Apache
Kafka is the circulatory system”
--Todd Palino
Kafka SRE @ LinkedIn
If Kafka is a critical piece of our pipeline
• Can we be 100% sure that our data will get there?
• Can we lose messages?
• How do we verify?
• Whose fault is it?
Distributed Systems
• Things fail
• Systems are designed to tolerate failure
• We must expect failures and design our code and configure our systems to handle them
Data Flow
[Diagram: the produce path from client machine to broker machine. An application thread hands data to the Kafka client, which sends asynchronously through the O/S socket buffer and NIC, across the network, into the broker's O/S socket buffer, then to the broker, the page cache, and finally disk. An ack or exception flows back to the client's callback. ✗ marks each point where a failure can occur.]
Data Flow
[Diagram: the same picture for consuming. Data flows from the broker's page cache (a miss goes to disk) through the O/S socket buffer and NIC, across the network, to the Kafka client and the application thread; consumed offsets are stored in ZooKeeper or Kafka. Again, ✗ marks each point where a failure can occur.]
Replication is your friend
• Kafka protects against failures by replicating data
• The unit of replication is the partition
• One replica is designated as the Leader
• Follower replicas fetch data from the leader
• The leader holds the list of “in-sync” replicas
Replication and ISRs
[Diagram: a producer writes to topic my_topic (Partitions: 3, Replicas: 3), spread across brokers 100, 101, and 102, each holding a copy of partitions 0, 1, and 2:]
Partition: 0   Leader: 100   ISR: 101,102
Partition: 1   Leader: 101   ISR: 100,102
Partition: 2   Leader: 102   ISR: 101,100
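You can check leaders and ISRs with the kafka-topics tool. A hedged sketch, reusing the broker IDs from the diagram and an illustrative ZooKeeper address; note that real output lists the leader inside the ISR as well:

$ bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my_topic
Topic: my_topic  Partition: 0  Leader: 100  Replicas: 100,101,102  Isr: 100,101,102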
ISR
• Two things make a replica in-sync:
– Lag behind the leader
• replica.lag.time.max.ms – a replica that didn’t fetch, or is behind
• replica.lag.max.messages – has gone away in 0.9
– Connection to Zookeeper
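As a sketch, the surviving lag setting lives in the broker's server.properties; the value below is the common default of that era, not a recommendation:

replica.lag.time.max.ms=10000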
Terminology
• Acked
– Producers will not retry sending
– Depends on the producer setting
• Committed
– Consumers can read
– Only when the message got to all ISRs
• replica.lag.time.max.ms
– How long can a dead replica prevent consumers from reading?
Replication
• acks = all
– Only waits for in-sync replicas to reply
[Diagram: Replicas 1, 2, and 3 each hold message 100.]
Replication
• Replica 3 stopped replicating for some reason
[Diagram: Replica 3 holds only message 100; Replicas 1 and 2 hold 100 and 101. Message 100 is acked under acks = all and is “committed”; message 101 is acked under acks = 1 but is not “committed”.]
Replication
[Diagram: Replica 3 holds message 100; Replicas 1 and 2 hold 100 and 101.]
• One replica drops out of ISR, or goes offline
• All messages are now acked and committed
Replication
• 2nd replica drops out, or is offline
[Diagram: Replica 3 holds 100; Replica 2 holds 100 and 101; Replica 1, now alone, goes on to accept 102, 103, and 104.]
Replication
[Diagram: same state: Replica 3 at 100, Replica 2 at 100 and 101, Replica 1 at 100 through 104.]
• Now we’re in trouble
Replication
• If Replica 2 or 3 comes back online before the leader, you will lose data
[Diagram: Replica 3 at 100, Replica 2 at 100 and 101, Replica 1 at 100 through 104. All of those are “acked” and “committed”.]
So what to do
• Disable Unclean Leader Election
– unclean.leader.election.enable = false
• Set replication factor
– default.replication.factor = 3
• Set minimum ISRs
– min.insync.replicas = 2
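As a broker-level sketch, the three settings above go in server.properties (0.9-era names, as used on this slide):

unclean.leader.election.enable=false
default.replication.factor=3
min.insync.replicas=2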
Warning
• min.insync.replicas is applied at the topic level
• If a topic was created before the server-level change, you must alter its configuration manually
• Before 0.9.0 you must alter the topic manually in any case (KAFKA-2114)
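A hedged example of fixing up an existing topic (0.8/0.9-era syntax; the ZooKeeper address is illustrative):

$ bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic my_topic --config min.insync.replicas=2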
Replication
• Replication = 3
• Min ISR = 2
[Diagram: Replicas 1, 2, and 3 each hold message 100.]
Replication
[Diagram: Replica 3 holds 100; Replicas 1 and 2 hold 100 and 101.]
• One replica drops out of ISR, or goes offline
Replication
[Diagram: Replica 3 holds 100; Replica 2 holds 100 and 101; only Replica 1 remains, and messages 102, 103, and 104 sit buffered in the producer.]
• 2nd replica fails out, or is out of sync
• With Min ISR = 2 no longer satisfied, new messages buffer in the producer instead of being committed
Producer Internals
• Producer sends batches of messages to a buffer
[Diagram: application threads call send(); messages M0 through M3 accumulate into Batches 1 through 3 in the buffer; a sender drains batches to the broker; the response updates the Future and fires the callback with metadata or an exception, and failed batches may be retried.]
Basics
• Durability can be configured with the producer configuration request.required.acks
– 0: The message is written to the network (buffer)
– 1: The message is written to the leader
– all: The producer gets an ack after all ISRs receive the data; the message is committed
• Make sure the producer doesn’t just throw messages away!
– block.on.buffer.full = true
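A minimal producer-config sketch matching these bullets; the broker address and serializers are illustrative, and block.on.buffer.full applies to the 0.8/0.9 producer:

Properties props = new Properties();
props.put("bootstrap.servers", "broker1:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "all");                  // wait for all in-sync replicas
props.put("block.on.buffer.full", "true"); // block instead of dropping messages
KafkaProducer<String, String> producer = new KafkaProducer<String, String>(props);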
“New” Producer
• All calls are non-blocking and async
• 2 options for checking for failures:
– Immediately block for the response: send().get()
– Do follow-up work in a Callback; close the producer after an error threshold
• Be careful about buffering these failures. Future work? KAFKA-1955
• Don’t forget to close the producer! producer.close() will block until in-flight requests complete
• retries (new producer config) defaults to 0
• message.send.max.retries (old producer config) defaults to 3
• In-flight requests could lead to message re-ordering
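Hedged sketches of the two options; the record and the error handling are illustrative:

// Option 1: block on every send. Simple, but serializes sends.
producer.send(record).get();

// Option 2: check failures in a callback. Fast, but track errors yourself.
producer.send(record, new Callback() {
  @Override
  public void onCompletion(RecordMetadata metadata, Exception e) {
    if (e != null) {
      // count or log the failure; close the producer past some threshold
    }
  }
});

producer.close(); // blocks until in-flight requests complete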
Consumer
• Three choices for Consumer API
– Simple Consumer
– High Level Consumer (ZookeeperConsumer)
– New KafkaConsumer
New Consumer – attempt #1
props.put("enable.auto.commit", "true");
props.put("auto.commit.interval.ms", "10000");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
  ConsumerRecords<String, String> records = consumer.poll(100);
  for (ConsumerRecord<String, String> record : records) {
    processAndUpdateDB(record);
  }
}
Commits automatically every 10 seconds. But what if we crash after 8 seconds?
New Consumer – attempt #2
props.put("enable.auto.commit", "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
  ConsumerRecords<String, String> records = consumer.poll(100);
  for (ConsumerRecord<String, String> record : records) {
    processAndUpdateDB(record);
    consumer.commitSync();
  }
}
What are you really committing?
New Consumer – attempt #3
props.put("enable.auto.commit", "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
while (true) {
  ConsumerRecords<String, String> records = consumer.poll(100);
  for (ConsumerRecord<String, String> record : records) {
    processAndUpdateDB(record);
    TopicPartition tp = new TopicPartition(record.topic(), record.partition());
    OffsetAndMetadata oam = new OffsetAndMetadata(record.offset() + 1);
    consumer.commitSync(Collections.singletonMap(tp, oam));
  }
}
Is this fast enough?
New Consumer – attempt #4
props.put("enable.auto.commit", "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(props);
consumer.subscribe(Arrays.asList("foo", "bar"));
int counter = 0;
while (true) {
  ConsumerRecords<String, String> records = consumer.poll(500);
  for (ConsumerRecord<String, String> record : records) {
    counter++;
    processAndUpdateDB(record);
    if (counter % 100 == 0) {
      TopicPartition tp = new TopicPartition(record.topic(), record.partition());
      OffsetAndMetadata oam = new OffsetAndMetadata(record.offset() + 1);
      consumer.commitSync(Collections.singletonMap(tp, oam));
    }
  }
}
Almost.
Consumer Offsets
[Diagram: partitions P0 P2 P3 P4 P5 P6, with the committed offset marked in each partition: Commit]
Consumer Offsets
[Diagram: the same partitions consumed by Threads 1 through 4 of one consumer; records processed after the last commit are re-read after a crash: Duplicates]
Rebalance Listener
public class MyRebalanceListener implements ConsumerRebalanceListener {
  @Override
  public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
  }
  @Override
  public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    commitOffsets();
  }
}
consumer.subscribe(Arrays.asList("foo", "bar"), new MyRebalanceListener());
Careful! This method will need to know the topic, partition, and offset of the last record you got; see the sketch below.
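One hedged way to give the listener that information: have the poll loop record the latest offset per partition in a map, and commit that map on revocation (commitOffsets() above stands in for exactly this):

Map<TopicPartition, OffsetAndMetadata> currentOffsets = new HashMap<TopicPartition, OffsetAndMetadata>();

// in the poll loop, after processing each record:
currentOffsets.put(new TopicPartition(record.topic(), record.partition()),
                   new OffsetAndMetadata(record.offset() + 1));

// in onPartitionsRevoked:
consumer.commitSync(currentOffsets);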
At Least Once Consuming
1. Commit your own offsets – set enable.auto.commit = false
2. Use a Rebalance Listener to limit duplicates
3. Make sure you commit only what you are done processing
4. Note: the new consumer is single-threaded – use one consumer per thread
Exactly Once Semantics
• At most once is easy
• At least once is not bad either – commit only after you are 100% sure the data is safe
• Exactly once is tricky
– Commit data and offsets in one transaction
– Idempotent producer
Using External Store
• Don’t use commitSync()
• Implement your own “commit” that saves both data and
offsets to external store.
• Use the RebalanceListener to find the correct offset
Seeking the right offset
public class SaveOffsetsOnRebalance implements ConsumerRebalanceListener {
  private Consumer<?, ?> consumer;

  @Override
  public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
    // save each partition's offset in an external store, using custom code not described here
    for (TopicPartition partition : partitions)
      saveOffsetInExternalStore(partition, consumer.position(partition));
  }

  @Override
  public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
    // read the offsets back from the external store, using custom code not described here
    for (TopicPartition partition : partitions)
      consumer.seek(partition, readOffsetFromExternalStore(partition));
  }
}
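Wiring it up is the same subscribe call as before; this assumes a constructor that captures the consumer, which the snippet above leaves out:

consumer.subscribe(Arrays.asList("foo", "bar"), new SaveOffsetsOnRebalance(consumer));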
Monitoring for Data Loss
• Monitor for producer errors – watch the retry numbers
• Monitor consumer lag – MaxLag or via offsets
• Standard schema:
– Each message should contain a timestamp and the originating service and host
• Each producer can report message counts and offsets to a special topic
• A “monitoring consumer” reports message counts to another special topic
• “Important consumers” also report message counts
• Reconcile the results
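A sketch of the count-reporting idea; the “audit” topic name and record layout are made up for illustration:

// each producer periodically reports what it sent, keyed by origin
producer.send(new ProducerRecord<String, String>(
    "audit", host + "/" + service, timestamp + "," + messageCount));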
Be Safe, Not Sorry
• acks = all
• block.on.buffer.full = true
• retries = MAX_INT
• ( max.in.flight.requests.per.connection = 1 )
• Call producer.close()
• Replication factor >= 3
• min.insync.replicas = 2
• unclean.leader.election.enable = false
• enable.auto.commit = false
• Commit after processing
• Monitor!
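Pulled together as one hedged client-config sketch (0.9-era names, values straight from this checklist; broker-side settings shown as comments):

// producer
props.put("acks", "all");
props.put("retries", String.valueOf(Integer.MAX_VALUE));
props.put("max.in.flight.requests.per.connection", "1");
props.put("block.on.buffer.full", "true");
// consumer
props.put("enable.auto.commit", "false");
// broker/topic: replication factor >= 3, min.insync.replicas=2,
// unclean.leader.election.enable=false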
Editor's Notes

1. Low-level diagram (not talking about producer/consumer design yet; maybe this is too low-level): show the flow of a network send -> O/S socket -> NIC -> network -> NIC -> O/S socket buffer -> socket -> internal message flow / socket server -> response back to the client, and how writes get persisted to disk, including O/S buffers and async writes. Then overlay the places where things can fail.
2. Highlight boxes with a different color.
3. Kafka exposes its binary TCP protocol via a Java API, which is what we'll be discussing here. Everything in the box is what happens inside the producer. Generally speaking, you have an application thread, or threads, that take individual messages and "send" them to Kafka. Under the covers, these messages are batched up where possible to amortize the overhead of the send, stored in a buffer, and communicated over to Kafka. After Kafka has completed its work, a response is returned for each message. This happens asynchronously, using Java's concurrent API. The response is either an exception or a metadata record. If metadata is returned, which contains the offset, partition, and topic, then things are good and we continue processing. If an error is returned, the producer will automatically retry the failed message, up to a configurable number of times or amount of time. When this exception occurs and retries are enabled, the retries go right back to the start of the batches being prepared for sending to Kafka.
4. Commit every 10 seconds, but we don't really have any control over what's processed, and this can lead to duplicates.
5. If you are doing too much work, commits don't count as heartbeats.
6. So let's say we have auto-commit enabled, and we are chugging along, counting on the consumer to commit our offsets for us. This is great because we don't have to code anything or think about the frequency of commits and its impact on throughput. Life is good. But now we've lost a thread or a process, and we don't really know where we are in the processing, because the last auto-commit committed things we hadn't actually written to disk.
7. So now we're in a situation where we think we've read all of our data, but we have gaps. The same risk applies if we lose a partition or broker and get a new leader.