Exactly-once Semantics in Apache Kafka

1
Introducing Exactly-Once Semantics in Apache Kafka™
Apurva Mehta, Software Engineer
Gehrig Kunz, Technical Product Marketing Manager

2
Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka: Is it Practical?
• Next Steps

3
Exactly Once Semantics is a hard problem

4
An overview of messaging semantics
• At-most once
• At-least once
• Exactly-once

5
Why exactly-once?
• Stream processing is becoming the norm; it’s more natural.
• Apache Kafka is the most popular streaming platform.
• Mission critical applications require stronger guarantees.

In other words: make stream processing easy,
simple, and reliable enough for everyone.

7
Apache Kafka’s existing semantics
At Least Once

8
Kafka’s Existing Semantics

14
What do we do now???

15
Kafka’s Existing Semantics: At Least Once

18
Why are duplicates introduced?
Various failures must be handled correctly:
• Broker can fail
• Producer-to-Broker RPC can fail
• Producer or Consumer client can fail
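
To make the failure mode concrete, here is a minimal sketch of how a lost acknowledgement plus a producer retry yields a duplicate. It is a toy simulation, not Kafka code: `ToyBroker`, `sendWithRetries`, and the message `m1` are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryDuplicateDemo {
    /** Toy broker (hypothetical): appends every request; it cannot tell a retry from a new message. */
    static final class ToyBroker {
        final List<String> log = new ArrayList<>();
        private boolean dropNextAck;

        ToyBroker(boolean dropFirstAck) {
            this.dropNextAck = dropFirstAck;
        }

        /** Appends the message, then returns true only if the ack reaches the producer. */
        boolean append(String message) {
            log.add(message);
            if (dropNextAck) {
                dropNextAck = false;
                return false; // the write succeeded, but the ack was lost in transit
            }
            return true;
        }
    }

    /** At-least-once producer: resend until an ack arrives. */
    static void sendWithRetries(ToyBroker broker, String message, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (broker.append(message)) {
                return;
            }
        }
        throw new IllegalStateException("no ack after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        ToyBroker broker = new ToyBroker(true); // simulate one lost ack
        sendWithRetries(broker, "m1", 3);
        System.out.println(broker.log); // prints [m1, m1]
    }
}
```

The producer cannot distinguish "write failed" from "ack lost", so retrying is the only safe choice for at-least-once delivery, and the log ends up with two copies.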

19
TL;DR – What we have today
• At-least-once, in-order delivery per partition.
• Producer retries can introduce duplicates and headaches.

20
The age old engineering question
Before we make this work, are we sure we should?

21
KafkaCash: A Peer-to-Peer Lending App

22
Help Bob reach $1000, send him $10

23
KafkaCash, powered by Kafka

27
How did Kafka add exactly once semantics?

28
Exactly-once semantics in Kafka, explained
Apache Kafka’s guarantees are stronger in 3 ways:
• Idempotent producer: exactly-once, in-order delivery
per partition.
• Transactions: atomic writes across partitions.
• Exactly-once stream processing across
read-process-write tasks.

29
Part 1/3 : Idempotent Producer
Exactly-once, in-order, delivery per partition

30
Idempotent Producer Semantics
A single successful producer.send() results in
exactly one copy of the message in the log, in all
circumstances.

31
Producer Configs
• enable.idempotence = true
• max.in.flight.requests.per.connection = 1
• acks = “all”
• retries > 0 (preferably MAX_INT)
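
As a rough sketch, the settings above could be collected into a `java.util.Properties` object like this. The helper name `idempotentProducerProps` and the broker address are assumptions made for the example; a real `KafkaProducer` would also need `key.serializer` and `value.serializer`.

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    /** Hypothetical helper: builds the producer settings listed on this slide. */
    static Properties idempotentProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.idempotence", "true");
        // One in-flight request per connection, as the slide specifies for Kafka 0.11.
        props.setProperty("max.in.flight.requests.per.connection", "1");
        props.setProperty("acks", "all");
        // "Preferably MAX_INT": keep retrying rather than give up on a send.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address, not taken from the talk.
        idempotentProducerProps("localhost:9092").list(System.out);
    }
}
```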

40
TL;DR: idempotent producer
• Works transparently -- only one config change.
• Sequence numbers and producer ids are in the log.
• Resilient to broker failures, producer retries, etc.
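
The role of producer ids and sequence numbers can be sketched with a toy model. This is a deliberate simplification and not the broker's actual implementation: `DedupBroker` is a hypothetical class, and it tracks one sequence per producer id, whereas Kafka keeps this state per partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SequenceDedupDemo {
    /** Toy broker (hypothetical) that rejects retries using (producerId, sequence). */
    static final class DedupBroker {
        final List<String> log = new ArrayList<>();
        // Highest sequence number appended so far, per producer id.
        private final Map<Long, Integer> lastSequence = new HashMap<>();

        /** Returns true if the message was appended, false if it was a duplicate. */
        boolean append(long producerId, int sequence, String message) {
            int last = lastSequence.getOrDefault(producerId, -1);
            if (sequence <= last) {
                return false; // already in the log: acknowledge without appending
            }
            if (sequence != last + 1) {
                throw new IllegalStateException("out-of-order sequence " + sequence);
            }
            log.add(message);
            lastSequence.put(producerId, sequence);
            return true;
        }
    }

    public static void main(String[] args) {
        DedupBroker broker = new DedupBroker();
        broker.append(42L, 0, "m0");
        broker.append(42L, 0, "m0"); // producer retry after a lost ack
        broker.append(42L, 1, "m1");
        System.out.println(broker.log); // prints [m0, m1]
    }
}
```

Because the retry carries the same sequence number as the original send, the broker can acknowledge it without writing a second copy.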

41
Part 2/3 : Transactions
Atomic writes across multiple partitions.

42
Transactions semantics
• Atomic writes across multiple partitions.
• All messages in a transaction are made visible together,
or none are.
• Consumers must be configured to skip uncommitted
messages.

43
Producer config for transactions
• transactional.id = ‘some string’
• Typically based on the partition identifier in a
partitioned, stateful app.
• Enables transaction recovery across producer sessions.
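
One way to derive a stable `transactional.id` from the partition identifier is sketched below. The naming scheme (`appName-partition`), the helper `transactionalProps`, and the application name `kafkacash` are assumptions for illustration; the point is only that a restarted instance working on the same partition ends up with the same id.

```java
import java.util.Properties;

public class TransactionalProducerConfig {
    /** Hypothetical helper: one transactional.id per input partition of the app. */
    static Properties transactionalProps(String appName, int inputPartition) {
        Properties props = new Properties();
        // Transactions build on the idempotent producer.
        props.setProperty("enable.idempotence", "true");
        // Stable across restarts, so a new session can recover the old session's transactions.
        props.setProperty("transactional.id", appName + "-" + inputPartition);
        return props;
    }

    public static void main(String[] args) {
        Properties props = transactionalProps("kafkacash", 3);
        System.out.println(props.getProperty("transactional.id")); // prints kafkacash-3
    }
}
```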

44
The transaction API
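producer.initTransactions();  // once per producer, before the first transaction
try {
    producer.beginTransaction();
    producer.send(record0);
    producer.commitTransaction();
} catch (ProducerFencedException e) {
    // Another producer with the same transactional.id has taken over;
    // this instance cannot abort or continue, only close.
    producer.close();
} catch (KafkaException e) {
    // Abort so that none of the sends in this transaction become visible.
    producer.abortTransaction();
}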

46
1. Initialize the producer
producer.initTransactions();

47
Initializing ‘transactions’

48
2. Begin transactions and send data
producer.beginTransaction();
producer.send(record0);

51
3. Commit transaction
producer.commitTransaction();

56
Consumer configs
• isolation.level:
• “read_committed”, or
• “read_uncommitted” (the default)

57
What do you get with isolation levels?
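• read_committed: consumers read only up to the first
still-open transaction, and skip messages from aborted
transactions.
• read_uncommitted: consumers read everything, including
messages from open and aborted transactions.
• In both modes, messages are read in offset order.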
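
The difference between the two isolation levels can be illustrated with a toy log. This is a simplified model rather than the consumer's real logic: `Entry`, `Kind`, and `read` are hypothetical names, every message is assumed to belong to a transaction, and commit or abort markers are modelled as ordinary log entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IsolationLevelDemo {
    enum Kind { DATA, COMMIT, ABORT }

    /** One log entry (hypothetical model): a data message or a transaction marker. */
    static final class Entry {
        final Kind kind;
        final String txn;
        final String value;

        Entry(Kind kind, String txn, String value) {
            this.kind = kind;
            this.txn = txn;
            this.value = value;
        }
    }

    /** Returns the message values a consumer would see, in offset order. */
    static List<String> read(List<Entry> log, boolean readCommitted) {
        // Outcome of every transaction that already has a marker in the log.
        Map<String, Kind> outcome = new HashMap<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                outcome.put(e.txn, e.kind);
            }
        }
        List<String> visible = new ArrayList<>();
        for (Entry e : log) {
            if (e.kind != Kind.DATA) {
                continue; // markers are never handed to the application
            }
            if (!readCommitted) {
                visible.add(e.value);
                continue;
            }
            Kind result = outcome.get(e.txn);
            if (result == null) {
                break; // first message of an open transaction: stop reading here
            }
            if (result == Kind.COMMIT) {
                visible.add(e.value); // messages of aborted transactions are skipped
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<Entry> log = Arrays.asList(
            new Entry(Kind.DATA, "t1", "a"),
            new Entry(Kind.DATA, "t2", "b"),
            new Entry(Kind.COMMIT, "t1", null),
            new Entry(Kind.DATA, "t3", "c"),   // t3 never gets a marker: still open
            new Entry(Kind.ABORT, "t2", null),
            new Entry(Kind.DATA, "t4", "d"),
            new Entry(Kind.COMMIT, "t4", null));
        System.out.println(read(log, false)); // prints [a, b, c, d]
        System.out.println(read(log, true));  // prints [a]
    }
}
```

In this model the committed message `d` stays hidden from the read_committed consumer until the open transaction `t3` completes, which is what preserves offset order.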

58
TL;DR: Transactions
• Atomic, multi-partition writes.
• Use the new producer APIs for transactions.
• Consumers can filter out uncommitted or aborted
transactional messages.

59
Part 3/3 : Stream Processing
Stream Processing with
Exactly Once Semantics

60
Streams config
• processing.guarantee = “exactly_once”
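
In the Kafka Streams `StreamsConfig`, this setting is keyed as `processing.guarantee`. A minimal sketch of a configuration that turns it on follows; the helper name, application id, and broker address are placeholders assumed for the example, and newer Kafka releases accept additional values for this setting, so check the documentation for your version.

```java
import java.util.Properties;

public class ExactlyOnceStreamsConfig {
    /** Hypothetical helper: Streams settings with exactly-once processing enabled. */
    static Properties exactlyOnceStreamsProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // The single switch that enables exactly-once read-process-write.
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        // Both arguments are placeholder values, not taken from the talk.
        Properties props = exactlyOnceStreamsProps("kafkacash-streams", "localhost:9092");
        System.out.println(props.getProperty("processing.guarantee")); // prints exactly_once
    }
}
```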

61
End-to-end exactly-once semantics
• The read-process-write operation is atomic.
• Thus streams tasks produce valid answers even when
failures happen.

63
Exactly Once Semantics in Kafka
Is it practical?

64
Performance boost for Apache Kafka 0.11!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to -20% disk utilization
• Details: https://bit.ly/kafka-eos-perf

65
Gains due to more efficient message format

66
What about the idempotent producer and transactions?
• Transactions: 3-5% overhead for 100ms transactions, 1KB
messages.
• Longer transactions and better batching result in better
performance.
• 20% overhead relative to at-most-once delivery without
ordering guarantees.
• Idempotent producer alone has negligible overhead.

67
Putting it together
• We talked through an idempotent producer
• How we added transactions with atomic writes
• The impact it has on stream processing

68
When is it available?
Available to use in Kafka 0.11, June 2017.

69
Where we’ve come
2007: High-throughput messaging broker
2008: Highly available replicated log
2012: Top-level Apache project
2016: Streams API, Connect API
2017: Exactly-once semantics

70
San Francisco
August 28, 2017
Organized by Confluent

71
What’s next for you
• Try it: download Confluent Open Source
• Join the community: slackpass.io/confluentcommunity
• Let us know what you think: @Confluent
