Pragmatic Guide to Apache Kafka’s Exactly Once Semantics
Gwen Shapira, Principal Engineer II
Copyright 2021, Confluent, Inc. All rights reserved. This document may not be reproduced in any manner without the express written permission of Confluent, Inc.
Exactly-once semantics is two features:
● Idempotent Producer
● Transactions:
  ○ Atomic multi-partition write
  ○ Read committed
Setting the stage
[Diagram: a stream processing application consumes records A B C D from an input topic, processes them, produces A’ to an output topic, and commits its offset to the offsets topic]
Writing Exactly Once to Kafka - What can go wrong?
● Duplicate messages caused by producer retry
● Re-processing due to application crash
● Re-processing due to zombie application instance
[Diagram: a producer retry writes A to the output twice; after a crash, Consumer 2 takes over from Consumer 1’s last committed offset and re-reads the input A B C D; a zombie Consumer 1 keeps processing alongside its replacement]
Idempotent Producer
What does it solve? Duplicates caused by retries.
[Diagram: the producer writes A’ to the output topic but gets no response, retries, and the output topic ends up with A’ twice]
How does it solve it?
[Diagram: each produce request carries a producer ID and a sequence number (p.id = 1, seq = 2). The broker appends A’ but the response is lost; on retry, the broker sees it already has p.id = 1, seq = 2, logs “Warning: Duplicate”, and acknowledges without writing A’ again]
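The dedup logic can be sketched in a few lines of plain Python. This is a toy illustration of the idea, not Kafka’s actual broker code: the partition remembers the highest sequence number appended per producer ID, and a retry with an already-seen sequence is acknowledged without being appended again.

```python
# Toy model of broker-side idempotence: dedup by (producer id, sequence).
# Illustration only - the real broker tracks a window of recent batches
# and also rejects sequence gaps.

class Partition:
    def __init__(self):
        self.log = []
        self.last_seq = {}   # producer id -> highest sequence appended

    def append(self, pid, seq, record):
        if seq <= self.last_seq.get(pid, -1):
            # Same producer retrying a batch we already have:
            # ack it, warn, but do not append a second copy.
            return "duplicate: ack without appending"
        self.log.append(record)
        self.last_seq[pid] = seq
        return "appended"
```

A retried produce with p.id = 1, seq = 2 leaves the log with a single A’, which is exactly the diagram above.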
How to use it?
enable.idempotence=true
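As a hedged sketch (the broker address is a placeholder), here are the producer settings involved, expressed as a plain config dict in the style Kafka clients accept. Setting enable.idempotence=true implies the other three; they are listed only to make the constraints explicit.

```python
# Minimal idempotent-producer configuration (illustrative values).
idempotent_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder broker address
    "enable.idempotence": True,
    # Implied by enable.idempotence - listed to make the contract explicit:
    "acks": "all",                                # must be all
    "retries": 2147483647,                        # must be > 0
    "max.in.flight.requests.per.connection": 5,   # must be <= 5
}
```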
What doesn’t it solve?
• Duplicates caused by calling producer.send() twice with the same record
• Duplicates from external sources
• Duplicates caused by two different producers
• Duplicates between application restarts
When to avoid it?
• If you don’t care about reliability.
• If you use very short-lived producers (a new producer per record, or close to it).

99.95% of the time, the idempotent producer is safe and recommended.
Transactions
What does it solve? Duplicates due to crashes.
[Diagram sequence: the application (1) consumes A B C D from the input topic, (2) produces A’ B’ C’ to the output topic, then crashes (X) before committing offsets to the offsets topic. On restart it consumes again from the last committed offset, re-processes, and writes a duplicate A’ to the output topic]
How does it solve it? Atomic multi-partition write.
We want two writes “at the same time”. Or at least: either we wrote both the output and the offset, or we pretend neither happened.
[Diagram sequence: (1) the producer tells the transaction coordinator “I’m going to write to the output and offsets topics”, which is recorded in the transaction log; (2) it writes A’ to the output topic and the offset to the offsets topic; (3) “Writing was successful. I’m about to commit!” - commit markers (C) are written to both topics; (4) the coordinator records the result in the transaction log: “Committed and we are done. TTYL”]
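The atomicity guarantee can be sketched as a runnable toy model (plain Python, not the real protocol): records carry their transaction id and only become visible to committed readers once the coordinator records a commit, so the output write and the offset write appear together or not at all.

```python
# Toy model of atomic multi-partition writes. Illustration only.

class Cluster:
    def __init__(self):
        self.topics = {}        # topic -> list of (txn_id, record)
        self.txn_log = []       # what the transaction coordinator records
        self.committed = set()  # committed transaction ids

    def send(self, txn_id, topic, record):
        # "I'm going to write to this topic" - registered in the txn log.
        self.txn_log.append(("add_partition", txn_id, topic))
        self.topics.setdefault(topic, []).append((txn_id, record))

    def commit(self, txn_id):
        self.txn_log.append(("commit", txn_id))
        self.committed.add(txn_id)

    def read_committed(self, topic):
        return [r for t, r in self.topics.get(topic, []) if t in self.committed]

c = Cluster()
c.send("tx1", "output", "A'")
c.send("tx1", "offsets", 1)
c.commit("tx1")               # both writes become visible together
c.send("tx2", "output", "B'") # crash before commit: B' is never visible
```

After the crash of tx2, a read_committed reader sees A’ in the output and offset 1 in the offsets topic, and nothing from the unfinished transaction.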
Transactions - Consumer view (read committed)
[Diagram sequence: A’ is written to the output topic, but a read_committed console consumer prints nothing until the commit marker (C) arrives; then it prints A’. Records D’ and E’ from the next transaction stay hidden, and are dropped when its abort marker (A) arrives. When a later record V commits, the console shows A’, V]
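The console behaviour above can be sketched as a filter. This is a simplification of what the client actually does: committed records pass, aborted records are dropped, and nothing past the first still-open transaction is delivered (the last stable offset).

```python
# Toy isolation-level filter. Each log entry is (record, txn_id);
# txn_id is None for non-transactional records.

def read_uncommitted(log):
    return [rec for rec, _txn in log]

def read_committed(log, committed, aborted):
    out = []
    for rec, txn in log:
        if txn is None or txn in committed:
            out.append(rec)
        elif txn in aborted:
            continue   # aborted records are skipped entirely
        else:
            break      # open transaction: hold back everything after it
    return out
```

With A’ committed, D’ aborted, and V committed, read_committed yields A’, V while read_uncommitted would show all three.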
What does it solve? Duplicates due to Zombies.
[Diagram sequence: an application instance consumes A B C and then hangs before producing or committing. It is presumed dead; a new instance takes over, consumes A B C D, writes A’ B’ C’ to the output topic, and commits offsets. Then the zombie wakes up, finishes its in-flight work, and writes a second A’ B’ C’ to the output topic - duplicates]
How does it solve it? Zombie fencing.
How do we know that we have a zombie? We give a unique transactional.id to every application instance.
How do we know who is the zombie? Apps register their transactional.id when they start and get an epoch. Newest epoch wins.
[Diagram: Application 1 hangs and its replacement registers the same transactional.id, receiving epoch 1. When the zombie wakes up and tries to produce A’ with epoch 0, the broker rejects it: “Dude, you are dead.”]
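Fencing can be sketched in a few lines (an illustration of the idea, not the real coordinator): each registration of a transactional.id bumps its epoch, and any write carrying an older epoch is rejected.

```python
# Toy model of zombie fencing by transactional.id + epoch.

class FencingCoordinator:
    def __init__(self):
        self.epochs = {}   # transactional.id -> current epoch

    def init_transactions(self, txn_id):
        # Each registration bumps the epoch; the previous holder is fenced.
        epoch = self.epochs.get(txn_id, -1) + 1
        self.epochs[txn_id] = epoch
        return epoch

    def produce(self, txn_id, epoch, record):
        if epoch < self.epochs[txn_id]:
            raise RuntimeError("fenced: a newer instance holds this transactional.id")
        return f"wrote {record}"
```

The zombie still holds epoch 0; the restarted instance registered and got epoch 1, so the zombie’s write raises instead of landing in the output topic. (In the Java client this surfaces as ProducerFencedException.)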
How to use it? (The sane way)
Use Kafka Streams with processing.guarantee = exactly_once
(If you have brokers 2.5+: exactly_once_beta)
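As a minimal sketch (the application id and bootstrap servers are placeholders), the Streams configuration is just:

```properties
application.id=my-streams-app
bootstrap.servers=localhost:9092
processing.guarantee=exactly_once
# with brokers 2.5+:
# processing.guarantee=exactly_once_beta
```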
Sidebar: Exactly Once Beta
Picking a good transactional.id is non-trivial.
What is “the same instance of the app”? In Kafka Streams it is “consuming from the same partition”.
Task = consumer + processor + producer.
exactly_once uses the task_id as the transactional.id.
But…
- A producer per task is heavy
- A new producer must be initialized on every rebalance
exactly_once_beta
• Does not use transactional.id for fencing
• Uses consumer group information instead: the group ID and the consumer group generation (epoch)
• Fencing happens during the offset commit, which includes the consumer group information
• Made possible by KIP-447
How to use it? (Hard and likely wrong way)
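The raw-client loop (the reason this is the hard way) looks roughly like the pseudocode below. The method names follow Kafka’s transactional producer API (initTransactions, beginTransaction, sendOffsetsToTransaction, commitTransaction in the Java client); everything else is a sketch, and the error handling that makes it correct - aborting on failure, handling fencing exceptions, rebalances - is exactly the part people get wrong.

```
producer.init_transactions()          # register transactional.id, obtain an epoch
while running:
    records = consumer.poll()
    producer.begin_transaction()
    for record in records:
        producer.send(output_topic, transform(record))
    # Commit offsets through the producer so they join the transaction:
    producer.send_offsets_to_transaction(offsets_of(records), consumer_group_metadata)
    producer.commit_transaction()     # or abort_transaction() on any error
```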
What does it solve?
The main use-case is accurate aggregation in stream processing applications.
Easy to use in any Kafka Streams application.
What doesn’t it solve?
● Side-effects during processing
● Reading from Kafka and writing to a database
● Reading from a database and writing to Kafka
● Replicating from one Kafka to another (unless replicating all topics)
● Publish-subscribe pattern (or rather - this depends a lot on the consumer)
When to avoid it?
If it doesn’t fit into a Kafka Streams app, it is probably not a good idea.
Don’t keep creating new transactional.ids.
Performance Notes
● Overhead on the producer is fixed per transaction:
  ○ Register the transactional.id once in the producer’s lifetime
  ○ Register each partition in the transaction, once per partition per transaction
  ○ One extra commit marker per partition
  ○ A synchronous commit
● Consumer:
  ○ Reads the extra commit markers
  ○ read_committed will wait for transaction commits, so large transactions increase end-to-end latency
Larger transactions == higher throughput (the fixed overhead is amortized over more records), but higher end-to-end latency.
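The amortization argument can be made concrete with a back-of-envelope calculation. The numbers below are made up for illustration; only the shape of the trade-off matters: a fixed per-transaction cost spread over more records lowers the effective cost per record, while the commit interval (and thus end-to-end latency for read_committed consumers) grows.

```python
# Back-of-envelope amortization of a fixed per-transaction cost.
# per_record_ms and per_txn_ms are invented numbers, not measurements.

def effective_cost_per_record(records_per_txn, per_record_ms=0.01, per_txn_ms=5.0):
    return per_record_ms + per_txn_ms / records_per_txn

small = effective_cost_per_record(10)     # 0.51 ms/record, commits often
large = effective_cost_per_record(1000)   # 0.015 ms/record, commits rarely
```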
Summary
Two things to remember:
Use the idempotent producer, but not with FaaS (very short-lived producers).
Use Kafka Streams with processing.guarantee = exactly_once, or exactly_once_beta if you have 2.5+ brokers.
Small Plug
This talk is based on a new chapter; the early release is available via O’Reilly Safari.
Thanks to Ron Dagostino, Justine Olshan, Lucas Bradstreet, Mike Bin, Bob Barrett, Boyang Chen, Guozhang Wang, and Jason Gustafson for all the help.
Good resources
https://github.com/apache/kafka/blob/trunk/tools/src/main/java/org/apache/kafka/tools/TransactionalMessageCopier.java
https://www.confluent.io/blog/enabling-exactly-once-kafka-streams/
https://www.confluent.io/blog/transactions-apache-kafka/
https://www.confluent.io/blog/simplified-robust-exactly-one-semantics-in-kafka-2-5/
https://cwiki.apache.org/confluence/display/KAFKA/KIP-129%3A+Streams+Exactly-Once+Semantics
Thank you!
@gwenshap
gwen@confluent.io
cnfl.io/meetups cnfl.io/slack
cnfl.io/blog
