Kafka (Exactly-once)
erhwenkuo@gmail.com
Agenda
• Why exactly-once?
• An overview of messaging semantics
• Why are duplicates introduced?
• What is exactly-once semantics?
• Exactly-once semantics in Kafka
Kafka Exactly-once
An overview of messaging semantics
Kafka message delivery semantics
• At most once: offsets are committed as soon as the message is received. If
processing fails, the message is lost (it won't be read again).
• At least once: offsets are committed after the message is processed. If
processing fails, the message is read again. This can result in duplicate
processing of messages, so make sure your processing is idempotent
(i.e. processing the message again won't affect your systems).
• Exactly once: very difficult to achieve; it needs strong engineering. (Kafka
has provided "exactly once" since v0.11.)
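The difference between the first two semantics comes down to the order of "commit offset" and "process message". The following is a minimal sketch in plain Python, not Kafka client code: `run_consumer`, `commit_first` and `crash_at` are hypothetical names, and the consumer is assumed to crash exactly once, between the two steps.

```python
def run_consumer(messages, commit_first, crash_at):
    """Replay `messages`, crashing once between the two steps at index `crash_at`.

    commit_first=True  models at-most-once  (commit, then process).
    commit_first=False models at-least-once (process, then commit).
    Returns the side effects (processed messages) in the order they happened.
    """
    committed = 0      # next offset to read after a restart
    processed = []     # what the application actually processed
    crashed = False
    position = 0
    steps = ("commit", "process") if commit_first else ("process", "commit")
    while position < len(messages):
        completed = True
        for step in steps:
            if step == "commit":
                committed = position + 1
            else:
                processed.append(messages[position])
            # Simulated crash right after the first of the two steps.
            if step == steps[0] and position == crash_at and not crashed:
                crashed = True
                completed = False
                break
        # After a crash the consumer restarts from the last committed offset.
        position = position + 1 if completed else committed
    return processed


at_most_once = run_consumer(["m0", "m1", "m2"], commit_first=True, crash_at=1)
at_least_once = run_consumer(["m0", "m1", "m2"], commit_first=False, crash_at=1)
print(at_most_once)   # m1 is lost
print(at_least_once)  # m1 is processed twice
```

With the commit first, the crashed message is never processed; with the commit last, it is processed again after the restart.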
Why exactly-once?
• Stream processing is becoming the norm; it's more natural.
• Apache Kafka is the most popular streaming platform.
• Mission-critical applications require stronger guarantees.
Apache Kafka’s existing semantics
At Least Once
Kafka's Existing Semantics: At-least-once
[Animated diagram: Producer → Partition (leader) of Topic xxx on the Kafka brokers → the log]
• Send(x, y): the producer sends a record with key x and value y.
• append(x, y): the partition leader appends the record to the log.
• ack: the broker acknowledges the write. The log holds (x, y).
• Send(a, b): the producer sends the next record.
• append(a, b): the leader appends it. The log holds (x, y), (a, b).
• ack: the acknowledgement does not reach the producer.
• Send(a, b): the producer retries the same record.
• append(a, b): the leader appends it again. The log holds (x, y), (a, b), (a, b).
• ack: the retry is acknowledged, but the log now contains a duplicate. This is at-least-once.
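The at-least-once sequence on these slides can be replayed with a toy model. This is a sketch in plain Python rather than Kafka client code: `Broker`, `send_with_retry` and `acks_lost` are hypothetical names, and the only failure modelled is an ack lost on its way back to the producer.

```python
class Broker:
    """Toy partition leader: appends every record it receives."""

    def __init__(self):
        self.log = []

    def append(self, key, value):
        self.log.append((key, value))
        return "ack"


def send_with_retry(broker, key, value, acks_lost=0, max_retries=5):
    """Send one record, retrying while the ack is lost on the way back.

    `acks_lost` is the number of acks the network drops for this record.
    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_retries + 2):
        broker.append(key, value)   # the write itself succeeds every time
        if acks_lost > 0:
            acks_lost -= 1          # producer never sees the ack, so it retries
            continue
        return attempt
    raise RuntimeError("gave up after retries")


broker = Broker()
send_with_retry(broker, "x", "y")               # normal send
send_with_retry(broker, "a", "b", acks_lost=1)  # ack lost once, one retry
print(broker.log)  # (a, b) appears twice
```

The producer cannot tell a lost ack from a lost write, so retrying is the only safe choice, and the broker has no way to recognise the retry as a repeat.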
Why are duplicates introduced?
Various failures must be handled correctly
• Broker can fail
• Producer-to-Broker RPC can fail
• Network between Producer & Broker can fail
• Producer client can fail
• Producer client can become a zombie
Semantic Weaknesses
At-least-once
• Producer retries are not safe
• Processed data is not written atomically with corresponding offsets
• No protection from evil zombies
How did Kafka add exactly-once semantics?
version >= 0.11
Exactly-once semantics in Kafka, explained
Apache Kafka's guarantees are stronger in 3 ways:
• Idempotent producer
  • Exactly-once, in-order delivery per partition.
• Transactions
  • Atomic writes across multiple topics/partitions.
• Exactly-once stream processing (Kafka Streams & KSQL)
  • Across read-process-write tasks.
Idempotent Producer
Exactly-once, in-order delivery per partition
Idempotent Producer Semantics
• Idempotence is the second name for exactly-once. To stop a message from
being processed multiple times, it must be persisted to the Kafka topic
only once.
• A single successful producer.send() results in exactly one copy of
the message in the log, in all circumstances.
• Idempotent delivery ensures that messages are delivered exactly
once to a particular topic partition during the lifetime of a single
producer.
How does the idempotent producer work?
Key Design Principle
• Exactly-once, in-order delivery per partition.
• Avoids data duplication.
• Works transparently: only one config change.
• Resilient to broker failures, producer retries, etc.
How does the idempotent producer work?
Message Binary Format Change
• Change the log message binary format
  • Add "ProducerId"
  • Add "Sequence" number
Message format: key, value, timestamp, headers, producerid, sequence
The idempotent producer
[Animated diagram: Producer (pid = 100) → Partition (leader) of Topic xxx on the Kafka brokers → the log]
• The producer has pid = 100 and starts at seq = 0.
• Send(x, y): the record carries pid = 100, seq = 0.
• append(x, y): the leader appends it and remembers pid = 100, seq = 0 for this producer.
• ack: the producer receives the acknowledgement and moves on to seq = 1.
• Send(a, b): the record carries pid = 100, seq = 1.
• append(a, b): the leader appends it and updates its state to pid = 100, seq = 1.
• ack: the acknowledgement does not reach the producer.
• Send(a, b): the producer retries with the same pid = 100, seq = 1.
• Broker found duplicate (pid + seq)! It does not append the record again and replies "ack - duplicate". The log holds (x, y) and (a, b) exactly once each.
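The duplicate check in this walk-through can be sketched as a toy model. This is plain Python, not the broker's actual implementation: `IdempotentBroker`, `IdempotentProducer` and `acks_lost` are hypothetical names, the "out of order" reply is an illustrative label, and the sketch assumes a single partition and ignores producer epochs.

```python
class IdempotentBroker:
    """Toy partition leader that remembers the last sequence number per pid."""

    def __init__(self):
        self.log = []
        self.last_seq = {}   # pid -> highest sequence number appended so far

    def append(self, pid, seq, key, value):
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            return "ack - duplicate"   # already in the log: do not append again
        if seq != last + 1:
            return "out of order"      # a gap means an earlier record is missing
        self.log.append((key, value))
        self.last_seq[pid] = seq
        return "ack"


class IdempotentProducer:
    """Toy producer that stamps every record with its pid and a sequence number."""

    def __init__(self, broker, pid):
        self.broker = broker
        self.pid = pid
        self.next_seq = 0

    def send(self, key, value, acks_lost=0):
        seq = self.next_seq
        while True:
            result = self.broker.append(self.pid, seq, key, value)
            if acks_lost > 0:
                acks_lost -= 1   # ack lost: retry with the SAME pid and seq
                continue
            self.next_seq += 1
            return result


broker = IdempotentBroker()
producer = IdempotentProducer(broker, pid=100)
print(producer.send("x", "y"))               # ack
print(producer.send("a", "b", acks_lost=1))  # ack - duplicate
print(broker.log)                            # each record appears once
```

Because the retry reuses the same (pid, seq) pair, the broker can recognise it and acknowledge without writing a second copy.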
Producer Configs
• enable.idempotence=true
• retries=Integer.MAX_VALUE (effectively infinite)
• acks=all
• max.in.flight.requests.per.connection=1 ??
Producer Configs
https://issues.apache.org/jira/browse/KAFKA-5494
Producer Configs (Revised)
• enable.idempotence=true
• retries=Integer.MAX_VALUE (effectively infinite)
• acks=all
• max.in.flight.requests.per.connection=3 (or any value up to 5)
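As a sketch of how the revised settings might be written down and sanity-checked before creating a producer: the keys below are the standard Kafka producer property names, while `check_idempotence_config` is a hypothetical helper, not part of any Kafka client. The limit of 5 in-flight requests assumes a client that includes the KAFKA-5494 change; the original 0.11 clients required a value of 1.

```python
# Standard Kafka producer property names; the values follow the revised slide.
idempotent_producer_config = {
    "enable.idempotence": True,
    "acks": "all",
    "retries": 2**31 - 1,   # Integer.MAX_VALUE, i.e. effectively infinite
    "max.in.flight.requests.per.connection": 3,
}


def check_idempotence_config(config):
    """Hypothetical helper: list the settings that conflict with idempotence."""
    problems = []
    if not config.get("enable.idempotence"):
        problems.append("enable.idempotence must be true")
    if str(config.get("acks")) not in ("all", "-1"):
        problems.append("acks must be all")
    if int(config.get("retries", 0)) <= 0:
        problems.append("retries must be greater than 0")
    if int(config.get("max.in.flight.requests.per.connection", 5)) > 5:
        problems.append("max.in.flight.requests.per.connection must be at most 5")
    return problems


print(check_idempotence_config(idempotent_producer_config))  # no problems expected
```

A dictionary like this could be passed to whichever client library is in use; the check only covers the four settings named on the slide.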

TDEA 2018 Kafka EOS (Exactly-once)

  • 1.
  • 2.
  • 3.
    e • ) notml tm • 2.6 6/-u y • .54 6 4 22 . 0 P 1 B • . A . EC CC 0 B A FC • 3.2 6 C 1 B • ( h SO u • ) a d rs • M • R h( u gi • M / C B L LTI erhwenkuo@gmail.com 3
  • 4.
    Agenda • Why exactly-once? •An overview of messaging semantics • Why are duplicates introduced? • What is exactly-once semantics? • Exactly-once semantics in Kafka 4
  • 5.
  • 6.
    An overview ofmessaging semantics Kafka message delivery semantics • At most once: offsets are committed as soon as the message is received. If the processing goes wrong, the message will be lost (it won’t be read again). • At least once: offsets are committed after the message is processed. If the processing goes wrong, the message will be read again. This can result in duplicate processing of messages. Make sure your processing is idempotent (i.e. processing again the message won’t impact your systems) • Exactly once: Very difficult to achieve / need strong engineering. (Kafka start to provide “exactly once” from v.0.11 6
  • 7.
    • Stream processingis becoming the norm; it’s more natural. • Apache Kafka is the most popular streaming platform. • Mission critical applications require stronger guarantees. Why exactly-once? 7
  • 8.
    Apache Kafka’s existingsemantics At Least Once 8
  • 9.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log 9 Producer configurations
  • 10.
    Kafka’s Existing Semantics At-least-once KeyValue x yx y Producer Partition (leader) Topic: xxx Kafka Brokers The log Send(x, y) 10 Producer configurations
  • 11.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log append(x, y) Key Value x yx y K V x yx y 11 Producer configurations
  • 12.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log ack K V x yx y 12 Producer configurations
  • 13.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log K V x yx y Key Value x ya b Send(a, b) 13 Producer configurations
  • 14.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log append(x, y) K V x yx y Key Value x ya b K V x ya b 14 Producer configurations
  • 15.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log K V x yx yack K V x ya b , 15 Producer configurations
  • 16.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log K V x yx yack K V x ya b 16 Producer configurations
  • 17.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log K V x yx y K V x ya b Key Value x ya b Send(a, b) , 17 Producer configurations
  • 18.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log append(x, y) K V x yx y Key Value x ya b K V x ya b K V x ya b 18 Producer configurations
  • 19.
    Kafka’s Existing Semantics At-least-once ProducerPartition (leader) Topic: xxx Kafka Brokers The log ack K V x yx y K V x ya b K V x ya b B At-least-once !, , 19 Producer configurations
  • 20.
    Various failures mustbe handled correctly • Broker can fail • Producer-to-Broker RPC can fail • Network between Producer & Broker can fail • Producer client can fail • Producer client can become zombie Why are duplicates introduced? 20
  • 21.
    Semantic Weaknesses At-least-once • Producerretries are not safe • Processed data is not written atomically with corresponding offsets • No protection from evil zombies 21 Producer
  • 22.
    How did Kafkaadd exactly once semantics? version >= 0.11 22
  • 23.
    Exactly-once semantics inKafka, explained Apache Kafka’s guarantees are stronger in 3 ways: • Idempotent producer • Exactly-once, in-order, delivery per partition. • Transactions • Atomic writes across multiple topics/partitions. • Exactly-once stream processing - (Kafka Stream & KSQL) • across read-process-write tasks 23
  • 24.
    Exactly-once, in-order, delivery perpartition Idempotent Producer 24
  • 25.
    Idempotent Producer Semantics •Idempotent is the second name to exactly once. To stop processing a message multiple times, message must be persisted to Kafka topic only once. • A single successful producer.send( ) will result in exactly one copy of the message in the log in all circumstances • Idempotent delivery ensures that messages are delivered exactly once to a particular topic partition during the lifetime of a single producer. 25
  • 26.
    How idempotent producerworks? Key Design Principle Idempotent producer • Exactly-once, in-order, delivery per partition. • Avoid data duplication • Works transparently -- only one config change. • Resilient to broker failures, producer retries, etc. 26
  • 27.
    How idempotent producerworks? Message Binary Format Change Idempotent producer • Change Log Message Binary Format • Add “ProducerId” • Add “Sequence” number offset Message Format key value timestamp headers producerid sequence 27
  • 28.
    The idempotent producer pid= 100pid = 100 seq = 0 Producer Partition (leader) Topic: xxx Kafka Brokers , The log 28 Producer configurations
  • 29.
    The idempotent producer pid= 100pid = 100 seq = 0 Producer Partition (leader) Topic: xxx Kafka Brokers The log Send(x, y) key value x yx y pid seq x y100 0 29
  • 30.
    The idempotent producer pid= 100 seq = 0 Producer Partition (leader) Topic: xxx Kafka Brokers The log key value x yx y pid seq x y100 0 pid = 100 append(x, y) key value x yx y pid seq x y100 0 30
  • 31.
    The idempotent producer pid= 100 seq = 0 Producer Partition (leader) Topic: xxx Kafka Brokers The log pid = 100 seq = 0 key value x yx y pid seq x y100 0ack 31
  • 32.
    The idempotent producer pid= 100 seq = 1 Producer Partition (leader) Topic: xxx Kafka Brokers The log pid = 100 seq = 0 key value x yx y pid seq x y100 0Send(a, b) key value x ya b pid seq x y100 1 32 pid = 100 seq = 0
  • 33.
    The idempotent producer pid= 100 seq = 1 Producer Partition (leader) Topic: xxx Kafka Brokers The log pid = 100 seq = 1 key value x yx y pid seq x y100 0 key value x ya b pid seq x y100 1 key value x ya b pid seq x y100 1 append(a, b) 33
  • 34.
    The idempotent producer pid= 100 seq = 1 Producer Partition (leader) Topic: xxx Kafka Brokers The log pid = 100 seq = 1 key value x yx y pid seq x y100 0 key value x ya b pid seq x y100 1 ack , 34
  • 35.
    The idempotent producer pid= 100 seq = 1 Producer Partition (leader) Topic: xxx Kafka Brokers The log pid = 100 seq = 1 key value x yx y pid seq x y100 0Send(a, b) key value x ya b pid seq x y100 1 key value x ya b pid seq x y100 1 , 35
  • 36.
    The idempotent producer Brokerfound duplicate (pid + seq)! pid = 100 seq = 1 Producer Partition (leader) Topic: xxx Kafka Brokers The log pid = 100 seq = 1 key value x yx y pid seq x y100 0 ack - duplicate key value x ya b pid seq x y100 1 + , - B , + 36
  • 37.
    Producer Configs • idempotent=true •retries=infinite • acks = all • max.inflight=1 ?? -() 1 ) 1 1 1 () ! 1( ) - .- , 37
  • 38.
  • 39.
    Producer Configs (Revised) •idempotent=true • retries=infinite • acks = all • max.inflight=3 (or whatever) , E ) 0 1) . -. ) K 39
  • 40.