Guozhang Wang, Lei Chen, Ayusman Dikshit, Jason Gustafson, Boyang Chen, Matthias J. Sax, John Roesler, Sophie Blee-Goldman, Bruno Cadonna, Apurva Mehta, Varun Madan, Jun Rao
Consistency and Completeness
Rethinking Distributed Stream Processing in Apache Kafka
Outline
• Stream processing (with Kafka): correctness challenges
• Exactly-once consistency with failures
• Completeness with out-of-order data
• Use case and conclusion
2
3
Stream Processing
• A different programming paradigm
• .. that brings computation to unbounded data
• .. but not necessarily transient, approximate, or lossy
4
Kafka: Streaming Platform
• Persistent Buffering
• Logical Ordering
• Scalable “source-of-truth”
Kafka Concepts: the Log
[Figure: a partition is an ordered log of messages; the producer appends to the end (offsets 4 through 12 shown), Consumer1 reads at offset 7, Consumer2 reads at offset 10.]
5
6
Kafka Concepts: the Log
[Figure: producers and consumers connect to a cluster of brokers; each broker hosts partitions of Topic 1 and Topic 2.]
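To make the log abstraction concrete, here is a minimal sketch (not from the slides; broker address, topic name, and serializers are illustrative) of appending records with a producer and reading them back from a chosen offset with a consumer:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;

public class LogBasics {
  public static void main(String[] args) {
    Properties pp = new Properties();
    pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    // The producer appends to the end of a partition's log.
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
      producer.send(new ProducerRecord<>("topic-1", "key", "value"));
    }

    Properties cp = new Properties();
    cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    cp.put(ConsumerConfig.GROUP_ID_CONFIG, "reader-1");
    cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    // Each consumer tracks its own position (offset) in the log and can seek anywhere in it.
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
      TopicPartition p0 = new TopicPartition("topic-1", 0);
      consumer.assign(Collections.singletonList(p0));
      consumer.seek(p0, 7L);  // e.g. start reading at offset 7, like Consumer1 in the figure
      ConsumerRecords<String, String> recs = consumer.poll(Duration.ofMillis(100));
      recs.forEach(r -> System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value()));
    }
  }
}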
7
High-Availability: Must-have
[Figure: the same cluster of brokers with topics, partitions, producers, and consumers, shown remaining available as brokers fail.]
[VLDB 2015]
8
Stream Processing with Kafka
[Figure: Your App consumes from Kafka Topics A and B, processes with local State, and produces to Kafka Topics C and D.]
9
Stream Processing with Kafka
[Figure: the same flow, now with acks for the records produced to Topics C and D and an offset commit for the source Topics A and B.]
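A minimal sketch of this non-transactional consume-process-produce loop (client variables and topic names are illustrative) makes the failure window visible: if the app crashes after its sends are acked but before the source offsets are committed, the same input is re-read and re-produced on restart, as the next slides show.

import java.time.Duration;
import java.util.Collections;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;

public class AtLeastOnceLoop {
  // Non-transactional consume-process-produce: duplicates are possible on failure.
  static void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
    consumer.subscribe(Collections.singletonList("topic-A"));
    while (true) {
      ConsumerRecords<String, String> recs = consumer.poll(Duration.ofMillis(100));
      for (ConsumerRecord<String, String> rec : recs) {
        String result = rec.value().toUpperCase();  // stand-in for real processing + state update
        producer.send(new ProducerRecord<>("topic-C", rec.key(), result));  // acked produce to sink
      }
      producer.flush();
      // A crash here leaves the outputs in topic-C, but the source offsets below were never
      // committed, so the whole batch is re-read and re-produced after restart -> duplicates.
      consumer.commitSync();
    }
  }
}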
10
Duplicate Results on Failures
[Figure: the Streams App processes records from Topics A and B, updates its State, and gets acks for its writes to Topics C and D, but crashes before committing the source offsets.]
11
Duplicate Results on Failures
[Figure: on restart, the app re-reads the uncommitted input, applies the same updates to its State again, and re-produces the same records to Topics C and D.]
12
Duplicate Results on Failures
[Figure: records 2, 3, and 4 each appear twice in the output stream.]
Duplicates propagated downstream and emitted externally
13
Challenge #1: Consistency (a.k.a. exactly-once)
A correctness guarantee for stream processing,
.. that for each received record,
.. its processing results will be reflected exactly once,
.. even under failures
It’s all about Time
• Event-time (when a record is created)
• Processing-time (when a record is processed)
14
15
[Figure: the eight Star Wars films (The Phantom Menace, Attack of the Clones, Revenge of the Sith, A New Hope, The Empire Strikes Back, Return of the Jedi, The Force Awakens, The Last Jedi) plotted with event-time = episode number (4, 5, 6, 1, 2, 3, 7, 8) against processing-time = release year (1977, 1980, 1983, 1999, 2002, 2005, 2015, 2017): the stream arrives out-of-order in event time.]
Incomplete results produced due to time skewness
16
Challenge #2: Completeness (with out-of-order data)
A correctness guarantee for stream processing,
.. that even with out-of-order data streams,
.. incomplete results would not be delivered
17
Blocking + Checkpointing
One stone to kill all birds?
• Block processing and result emitting until complete
• Hard trade-offs between latency and correctness
• Depends on global blocking markers
18
A Log-based Approach:
• Leverage persistent, immutable, ordered logs
• Decouple consistency and completeness handling
Kafka Streams
• New client library beyond producer and consumer
• Powerful yet easy-to-use
• Event time, stateful processing
• Out-of-order handling
• Highly scalable, distributed, fault tolerant
• and more..
19
20
Anywhere, anytime
Ok. Ok. Ok. Ok.
Streams DSL and KSQL
21
CREATE STREAM fraudulent_payments AS
SELECT * FROM payments
WHERE fraudProbability > 0.8;
val fraudulentPayments: KStream[String, Payment] = builder
  .stream[String, Payment]("payments-kafka-topic")
  .filter((_, payment) => payment.fraudProbability > 0.8)
fraudulentPayments.to("fraudulent-payments-topic")
[EDBT 2019]
22
Processing in Kafka Streams
[Figure: input Kafka Topic A and output Kafka Topic B, each with partitions P1 and P2.]
23
Processing in Kafka Streams
[Figure: a processor topology reads from Kafka Topic A and writes to Kafka Topic B.]
24
Processing in Kafka Streams
[Figure: the topology is instantiated as Task1 and Task2, one per input partition, running in the app instances MyApp.1 and MyApp.2.]
25
Processing in Kafka Streams
[Figure: each task maintains its own local State store.]
26
Processing in Kafka Streams
[Figure: each local State store is backed up to a Kafka Changelog Topic.]
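A minimal Kafka Streams application that exercises the pieces in these figures (a sketch; topic names, the application id, and the word-count logic are illustrative): the topology is split into one task per input partition, each task keeps its counts in a local state store, and that store is backed by an automatically created changelog topic.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

public class CountsApp {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");  // also prefixes the changelog topics
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
    props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

    StreamsBuilder builder = new StreamsBuilder();
    KStream<String, String> input = builder.stream("topic-A");  // one task per input partition
    KTable<String, Long> counts = input
        .groupByKey()
        // Local state store per task, backed by a Kafka changelog topic.
        .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));
    counts.toStream().to("topic-B", Produced.with(Serdes.String(), Serdes.Long()));

    new KafkaStreams(builder.build(), props).start();
  }
}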
27-29
Exactly-Once with Kafka
[Figure: the app consumes from source Kafka Topics A and B, updates its local State, produces to sink Kafka Topics C and D (acked), and commits offsets for the source topics.]
• Acked produce to sink topics
• Offset commit for source topics
• State update on processor
All or Nothing
30-34
Exactly-Once with Kafka Streams
In Kafka Streams, all three operations become appends to Kafka logs:
• Acked produce to sink topics → a batch of records sent to sink topics
• Offset commit for source topics → a batch of records sent to the offset topic
• State update on processor → a batch of records sent to changelog topics
All or Nothing
35
Exactly-Once with Kafka Streams
[Figure: a Streams task with its Input Topic, State, Changelog Topic, Output Topic, and Offset Topic; a Txn Coordinator records transaction state in the Txn Log Topic.]

producer.initTxn();

The producer registers its txn.id with the Txn Coordinator, which records "txn.id -> empty ()" in the Txn Log Topic.

36-37
Exactly-Once with Kafka Streams

producer.beginTxn();
recs = consumer.poll();
for (Record rec <- recs) {
  // process ..
}

The task begins a transaction, polls a batch of input records, and processes them against its local State.

38
Exactly-Once with Kafka Streams

producer.send("output", ..);
producer.send("changelog", ..);

On the first send to the output and changelog topics, those partitions are added to the transaction; the Txn Log now records "txn.id -> ongoing (output, changelog)".

39
Exactly-Once with Kafka Streams

producer.sendOffsets("input", ..);

The consumed input offsets are written to the Offset Topic as part of the same transaction; the Txn Log now records "txn.id -> ongoing (output, changelog, offset)".

40-41
Exactly-Once with Kafka Streams

producer.commitTxn();

The coordinator logs "txn.id -> prepare-commit (output, changelog, offset)", writes commit markers to the output, changelog, and offset partitions, and finally logs "txn.id -> complete-commit".

42-44
Exactly-Once with Kafka Streams

} catch (KafkaException e) {
  producer.abortTxn();
}

If any step fails, the transaction is aborted instead: the coordinator logs "txn.id -> prepare-abort (output, changelog, offset)", writes abort markers to the same partitions, and finally logs "txn.id -> complete-abort".

Putting the per-task loop together:

producer.initTxn();
try {
  producer.beginTxn();
  recs = consumer.poll();
  for (Record rec <- recs) {
    // process ..
    producer.send("output", ..);
    producer.send("changelog", ..);
    producer.sendOffsets("input", ..);
  }
  producer.commitTxn();
} catch (KafkaException e) {
  producer.abortTxn();
}
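The calls in this pseudocode are shorthand; with the actual Java clients the same loop looks roughly like the sketch below (broker address, topic names, serializers, and the single output topic are illustrative; Kafka Streams drives this machinery internally when processing.guarantee is set to exactly_once).

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class TransactionalLoop {
  public static void main(String[] args) {
    Properties pp = new Properties();
    pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-app-task-0");  // the txn.id registered with the coordinator
    pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
    pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");

    Properties cp = new Properties();
    cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    cp.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");
    cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");        // offsets are committed through the txn
    cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");  // downstream readers skip aborted data
    cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
    cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
         KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
      consumer.subscribe(Collections.singletonList("input"));
      producer.initTransactions();                                    // initTxn(): register txn.id

      while (true) {
        ConsumerRecords<String, String> recs = consumer.poll(Duration.ofMillis(100));
        if (recs.isEmpty()) continue;
        try {
          producer.beginTransaction();                                // beginTxn()
          Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
          for (ConsumerRecord<String, String> rec : recs) {
            String result = rec.value().toUpperCase();                // stand-in for "// process .."
            producer.send(new ProducerRecord<>("output", rec.key(), result));
            offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                        new OffsetAndMetadata(rec.offset() + 1));
          }
          producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata()); // sendOffsets()
          producer.commitTransaction();                               // commitTxn()
        } catch (KafkaException e) {
          // abortTxn(); note: fatal errors such as a fenced producer would instead
          // require closing and recreating the producer rather than aborting.
          producer.abortTransaction();
        }
      }
    }
  }
}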
45
Exactly-Once with Failures
[Figure: multiple Kafka Streams instances, each with a Process node and local State, reading from Kafka, writing to Kafka, and backing up state to Kafka changelog topics.]
46-47
Exactly-Once with Failures
[Figure: one instance fails; its task is reassigned to another instance.]
48-49
Exactly-Once with Failures
[Figure: the reassigned task restores its State from the Kafka changelog topic and resumes processing from the last committed offsets.]
50
What about Completeness?
• Option 1: defer emitting output and committing the txn
  • Effectively couples completeness with consistency
  • Increased end-to-end processing latency
• Option 2: emit output early when possible
  • Do not try to prevent incompleteness via coordination
  • Instead, compensate when out-of-order data happens
51
Remember the Logs
• Upstream-downstream communication can be replayed
• Emitted records are naturally ordered by offsets
52
Ordering and Monotonicity
• Stateless (filter, mapValues)
• Order-agnostic: no need to block on emitting
• Stateful (join, aggregate)
• Order-sensitive: current results depend on history
• Whether to block emitting results depends on the output type
KStream = interprets data as record stream
~ think: “append-only”
KTable = data as changelog stream
~ continuously updated materialized view
53-56
User purchase history as a KStream: (alice, eggs) (bob, bread) (alice, milk)
User employment profile as a KTable: (alice, lnkd) (bob, googl) (alice, msft)
After the first records for alice:
KStream: “Alice bought eggs.”
KTable: “Alice is now at LinkedIn.”
After the later records for alice arrive:
KStream: “Alice bought eggs and milk.” (records accumulate)
KTable: “Alice is now at Microsoft.” (the update replaces LinkedIn)
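A sketch of this purchase/employment example in the Java DSL (topic names are illustrative): the purchases topic is read as a KStream, so records for alice accumulate, while the employment topic is read as a KTable, so a later record for alice replaces the earlier one; a stateless filter can emit immediately, whereas a stateful count depends on history and is continuously updated.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class StreamVsTable {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    // Record stream ("append-only"): (alice, eggs), (bob, bread), (alice, milk) are three facts.
    KStream<String, String> purchases = builder.stream("user-purchases");

    // Changelog stream (materialized view): (alice, lnkd) then (alice, msft) -> alice maps to msft.
    KTable<String, String> employment = builder.table("user-employment");

    // Stateless, order-agnostic: results can be emitted immediately.
    purchases.filter((user, item) -> !item.isEmpty()).to("non-empty-purchases");

    // Stateful, order-sensitive: the current result depends on history; the KTable output
    // is a continuously updated view.
    KTable<String, Long> purchaseCounts = purchases.groupByKey().count();
    purchaseCounts.toStream().to("purchase-counts", Produced.with(Serdes.String(), Serdes.Long()));
  }
}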
57-59
KStream.leftJoin(KStream) -> KStream
KTable.leftJoin(KTable) -> KTable
[Figure: over time, a stream/table of (alice: A), (alice: B) is left-joined with a stream/table of (alice: 10). The KStream-KStream join emits (Alice: A/null), then (Alice: A/10), then (Alice: B/10); the KTable-KTable join does not emit the early (Alice: A/null) and produces (Alice: A/10), then (Alice: B/10) as updates.]
60-62
KStream.leftJoin(KStream) -> KStream, then .aggregate() -> KTable
KTable.leftJoin(KTable) -> KTable, then .aggregate() -> KTable
[Figure: aggregating the join results for alice (A/null, then A/10, then B/10) over time. The KStream-side aggregate keeps adding as new join results arrive: null + 10 + 10; the KTable-side aggregate is revised as its upstream result is updated: null, then 10, then still 10.]
[BIRTE 2015]
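This revision behavior is what a KTable aggregation provides: when an upstream row is updated, the old value is first subtracted from the aggregate and the new value is then added, so downstream results are corrected without blocking. A sketch in the Java DSL (topic names and the single grouping key are illustrative):

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class RevisedAggregate {
  public static void main(String[] args) {
    StreamsBuilder builder = new StreamsBuilder();

    // Table of (user -> amount), e.g. (alice -> 10); a newer row replaces the previous one.
    KTable<String, Long> amounts =
        builder.table("user-amounts", Consumed.with(Serdes.String(), Serdes.Long()));

    // Sum all amounts under one constant key (kept small for the sketch).
    KTable<String, Long> total = amounts
        .groupBy((user, amount) -> KeyValue.pair("all", amount),
                 Grouped.with(Serdes.String(), Serdes.Long()))
        .aggregate(
            () -> 0L,                                   // initializer
            (key, newAmount, agg) -> agg + newAmount,   // adder: the new row value is added
            (key, oldAmount, agg) -> agg - oldAmount);  // subtractor: the replaced row value is removed

    total.toStream().to("total-amounts", Produced.with(Serdes.String(), Serdes.Long()));
  }
}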
63-64
Use Case: Bloomberg Real-time Pricing
• One billion plus market events / day
• 160 cores / 2TB RAM deployed on k8s
• Exactly-once for market data stateful stream processing
65-68
Take-aways
• Apache Kafka: persistent logs to achieve correctness
• Transactional log appends for exactly-once
• Non-blocking output with revisions to handle out-of-order data

THANKS!
Guozhang Wang | guozhang@confluent.io | @guozhangwang
Read the full paper at: https://cnfl.io/sigmod
69
BACKUP SLIDES
Ongoing Work (3.0+)
• Scalability improvements
• Consistent state query serving
• Further reduce end-to-end latency
• … and more
70
 
Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...Instrumentation, measurement and control of bio process parameters ( Temperat...
Instrumentation, measurement and control of bio process parameters ( Temperat...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdfCCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
CCS355 Neural Networks & Deep Learning Unit 1 PDF notes with Question bank .pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 

Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka

  • 1. Guozhang Wang Lei Chen Ayusman Dikshit Jason Gustafson Boyang Chen Matthias J. Sax John Roesler Sophie Blee-Goldman Bruno Cadonna Apurva Mehta Varun Madan Jun Rao Consistency and Completeness Rethinking Distributed Stream Processing in Apache Kafka
  • 2. Outline • Stream processing (with Kafka): correctness challenges • Exactly-once consistency with failures • Completeness with out-of-order data • Use case and conclusion 2
  • 3. 3 Stream Processing • A different programming paradigm • .. that brings computation to unbounded data • .. but not necessarily transient, approximate, or lossy
  • 4. 4 • Persistent Buffering • Logical Ordering • Scalable “source-of-truth” Kafka: Streaming Platform
  • 5. Kafka Concepts: the Log 4 5 5 7 8 9 10 11 12 ... Producer Write Consumer1 Reads (offset 7) Consumer2 Reads (offset 10) Messages 3 5
  • 10. 8 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Your App Stream Processing with Kafka
  • 11. 9 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Your App Stream Processing with Kafka
  • 12. 9 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Your App Stream Processing with Kafka
  • 13. 9 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Your App Stream Processing with Kafka
  • 14. 9 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Your App Stream Processing with Kafka
  • 15. 9 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack ack Your App Stream Processing with Kafka
  • 16. 9 Process State Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack ack commit Your App Stream Processing with Kafka
  • 17. 10 Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D State Process Streams App Duplicate Results on Failures
  • 18. 10 Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D ack ack State Process Streams App Duplicate Results on Failures
  • 19. 10 Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D State Streams App Duplicate Results on Failures
  • 20. 11 State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Streams App Duplicate Results on Failures
  • 21. 11 State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Streams App Duplicate Results on Failures
  • 22. 11 State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D Streams App Duplicate Results on Failures
  • 23. 11 State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D State Streams App Duplicate Results on Failures
  • 24. 11 State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D State Streams App Duplicate Results on Failures
  • 25. 11 State Process Kafka Topic A Kafka Topic B Kafka Topic C Kafka Topic D State Streams App Duplicate Results on Failures
  • 28. 12 2 2 3 3 4 4 Duplicates propagated downstream and emitted externally Duplicate Results on Failures
  • 29. 13 Challenge #1: Consistency (a.k.a. exactly-once) An correctness guarantee for stream processing, .. that for each received record, .. its process results will be reflected exactly once, .. even under failures
  • 30. It’s all about Time • Event-time (when a record is created) • Processing-time (when a record is processed) 14
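Event time is what the out-of-order discussion below hinges on. As a minimal sketch (not from the talk), Kafka Streams assigns event time through a pluggable TimestampExtractor registered via the default.timestamp.extractor config; the Order class and its getEventTimeMs() accessor here are assumed purely for illustration.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

// Hedged sketch: pull event time out of the payload instead of using the record timestamp.
public class OrderTimestampExtractor implements TimestampExtractor {
  @Override
  public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
    Object value = record.value();
    if (value instanceof Order) {                  // Order is an assumed domain class
      return ((Order) value).getEventTimeMs();     // event time embedded in the payload
    }
    return record.timestamp();                     // fall back to the producer/broker timestamp
  }
}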
  • 31. Event-time 4 5 6 1 2 3 7 8 Processing-time 1977 1980 1983 1999 2002 2005 2015 2017 15 P H A N T O M M E N A C E A T T A C K O F T H E C L O N E S R E V E N G E O F T H E S I T H A N E W H O P E T H E E M P I R E S T R I K E S B A C K R E T U R N O F T H E J E D I T H E F O R C E A W A K E N S Out-of-Order T H E L A S T J E D I
  • 32. Event-time 4 5 6 1 2 3 7 8 Processing-time 1977 1980 1983 1999 2002 2005 2015 2017 15 P H A N T O M M E N A C E A T T A C K O F T H E C L O N E S R E V E N G E O F T H E S I T H A N E W H O P E T H E E M P I R E S T R I K E S B A C K R E T U R N O F T H E J E D I T H E F O R C E A W A K E N S Out-of-Order T H E L A S T J E D I
  • 33. Event-time 4 5 6 1 2 3 7 8 Processing-time 1977 1980 1983 1999 2002 2005 2015 2017 15 P H A N T O M M E N A C E A T T A C K O F T H E C L O N E S R E V E N G E O F T H E S I T H A N E W H O P E T H E E M P I R E S T R I K E S B A C K R E T U R N O F T H E J E D I T H E F O R C E A W A K E N S Out-of-Order T H E L A S T J E D I Incomplete results produced due to time skewness
  • 34. 16 Challenge #2: Completeness (with out-of-order data) A correctness guarantee for stream processing, .. that even with out-of-order data streams, .. incomplete results would not be delivered
  • 35. 17 Blocking + Checkpointing One stone to kill all birds? • Block processing and result emitting until complete • Hard trade-offs between latency and correctness • Depends on global blocking markers
  • 36. 18 A Log-based Approach: • Leverage persistent, immutable, ordered logs • Decouple consistency and completeness handling
  • 37. Kafka Streams • New client library beyond producer and consumer • Powerful yet easy-to-use • Event time, stateful processing • Out-of-order handling • Highly scalable, distributed, fault tolerant • and more.. 19
  • 39. Streams DSL and KSQL 21 CREATE STREAM fraudulent_payments AS SELECT * FROM payments WHERE fraudProbability > 0.8; val fraudulentPayments: KStream[String, Payment] = builder .stream[String, Payment](“payments-kafka-topic”) .filter((_ ,payment) => payment.fraudProbability > 0.8) fraudulentPayments.to(“fraudulent-payments-topic”) [EDBT 2019]
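For readers more familiar with the Java DSL, a roughly equivalent version of the Scala snippet above might look like the sketch below; the Payment class with a getFraudProbability() getter and the default serde configuration are assumptions.

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

StreamsBuilder builder = new StreamsBuilder();

// Read payments, keep the likely-fraudulent ones, and write them to an output topic.
KStream<String, Payment> payments = builder.stream("payments-kafka-topic");
KStream<String, Payment> fraudulentPayments =
    payments.filter((key, payment) -> payment.getFraudProbability() > 0.8);
fraudulentPayments.to("fraudulent-payments-topic");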
  • 40. 22 Kafka Topic B Kafka Topic A P1 P2 P1 P2 Processing in Kafka Streams
  • 41. 23 Kafka Topic B Kafka Topic A Processor Topology P1 P2 P1 P2 Processing in Kafka Streams
  • 42. 24 Kafka Topic A Kafka Topic B Processing in Kafka Streams P1 P2 P1 P2
  • 43. 24 Kafka Topic A Kafka Topic B Processing in Kafka Streams P1 P2 P1 P2
  • 44. MyApp.2 MyApp.1 Kafka Topic B Task2 Task1 25 Kafka Topic A State State Processing in Kafka Streams P1 P2 P1 P2
  • 45. MyApp.2 MyApp.1 Kafka Topic B Task2 Task1 25 Kafka Topic A State State Processing in Kafka Streams
  • 46. MyApp.2 MyApp.1 Kafka Topic B Task2 Task1 26 Kafka Topic A State State Processing in Kafka Streams
  • 47. MyApp.2 MyApp.1 Kafka Topic B Task2 Task1 26 Kafka Topic A State State Processing in Kafka Streams Kafka Changelog Topic
  • 48. MyApp.2 MyApp.1 Kafka Topic B Task2 Task1 26 Kafka Topic A State State Processing in Kafka Streams Kafka Changelog Topic
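The per-task state shown on these slides is what the changelog topic protects. As an illustrative sketch (topic and store names are assumptions; default String serdes for the input are assumed to be configured), a counting aggregation materialized to a named store gets a compacted changelog topic created behind it, which Streams replays to rebuild the store after a failure or task migration.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.state.KeyValueStore;

StreamsBuilder builder = new StreamsBuilder();

// Count events per key; "counts-store" is backed by an automatically created changelog topic.
KTable<String, Long> counts = builder
    .<String, String>stream("input-topic")
    .groupByKey()
    .count(Materialized.<String, Long, KeyValueStore<Bytes, byte[]>>as("counts-store"));

counts.toStream().to("counts-output-topic", Produced.with(Serdes.String(), Serdes.Long()));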
  • 49. 27 Process State Kafka Topic C Kafka Topic D ack ack Kafka Topic A Kafka Topic B commit Exactly-Once with Kafka
  • 50. 27 Process State Kafka Topic C Kafka Topic D ack ack Kafka Topic A Kafka Topic B commit • Offset commit for source topics Exactly-Once with Kafka
  • 51. 27 Process State Kafka Topic C Kafka Topic D ack ack Kafka Topic A Kafka Topic B commit • Offset commit for source topics • State update on processor Exactly-Once with Kafka
  • 52. 27 Process State Kafka Topic C Kafka Topic D ack ack Kafka Topic A Kafka Topic B commit • Acked produce to sink topics • Offset commit for source topics • State update on processor Exactly-Once with Kafka
  • 53. 27 • Acked produce to sink topics • Offset commit for source topics • State update on processor Exactly-Once with Kafka
  • 54. 28 • Acked produce to sink topics • Offset commit for source topics • State update on processor Exactly-Once with Kafka
  • 55. 29 • Acked produce to sink topics • Offset commit for source topics • State update on processor All or Nothing Exactly-Once with Kafka
  • 56. 30 Exactly-Once with Kafka Streams • Acked produce to sink topics • Offset commit for source topics • State update on processor All or Nothing
  • 57. 30 Exactly-Once with Kafka Streams • Acked produce to sink topics • Offset commit for source topics • State update on processor
  • 58. 31 Exactly-Once with Kafka Streams • Acked produce to sink topics • A batch of records sent to the offset topic • State update on processor
  • 59. 32 • Acked produce to sink topics • A batch of records sent to the offset topic Exactly-Once with Kafka Streams • A batch of records sent to changelog topics
  • 60. 33 Exactly-Once with Kafka Streams • A batch of records sent to sink topics • A batch of records sent to the offset topic • A batch of records sent to changelog topics
  • 61. 34 Exactly-Once with Kafka Streams All or Nothing • A batch of records sent to sink topics • A batch of records sent to the offset topic • A batch of records sent to changelog topics
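In a Kafka Streams application, all three of these writes are tied together by the transaction mechanism walked through below, and turning it on is a single config switch. A hedged sketch (application id and bootstrap servers are placeholders; the EXACTLY_ONCE_V2 constant exists from Kafka 3.0 onward, older releases use StreamsConfig.EXACTLY_ONCE):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-streams-app");     // assumed application id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
// Enable exactly-once processing for sink, offset, and changelog writes alike.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);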
  • 62. 35 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); <- Txn Coordinator Txn Log Topic
  • 63. 35 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); <- Txn Coordinator Txn Log Topic register txn.id
  • 64. 35 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); <- Txn Coordinator Txn Log Topic register txn.id
  • 65. 35 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); <- Txn Coordinator Txn Log Topic txn.id -> empty () register txn.id
  • 66. 36 Exactly-Once with Kafka Streams Input Topic Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); <- } Txn Coordinator Txn Log Topic State txn.id -> empty ()
  • 67. 36 Exactly-Once with Kafka Streams Input Topic Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); <- } Txn Coordinator Txn Log Topic State txn.id -> empty ()
  • 68. 37 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. } } Txn Coordinator Txn Log Topic txn.id -> empty ()
  • 69. 37 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. } } Txn Coordinator Txn Log Topic txn.id -> empty ()
  • 70. 38 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); } } Txn Coordinator Txn Log Topic txn.id -> empty ()
  • 71. 38 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); } } Txn Coordinator Txn Log Topic add partition txn.id -> empty ()
  • 72. 38 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); } } Txn Coordinator Txn Log Topic add partition txn.id -> empty ()
  • 73. 38 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); } } Txn Coordinator Txn Log Topic txn.id -> ongoing (output, changelog) add partition
  • 74. 38 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); } } Txn Coordinator Txn Log Topic txn.id -> ongoing (output, changelog) add partition
  • 75. 39 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } } Txn Coordinator Txn Log Topic txn.id -> ongoing (output, changelog)
  • 76. 39 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } } Txn Coordinator Txn Log Topic add offsets txn.id -> ongoing (output, changelog)
  • 77. 39 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } } Txn Coordinator Txn Log Topic add offsets txn.id -> ongoing (output, changelog)
  • 78. 39 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } } Txn Coordinator Txn Log Topic add offsets txn.id -> ongoing (output, changelog, offset)
  • 79. 39 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } } Txn Coordinator Txn Log Topic add offsets txn.id -> ongoing (output, changelog, offset)
  • 80. 40 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic txn.id -> ongoing (output, changelog, offset)
  • 81. 40 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic prepare commit txn.id -> ongoing (output, changelog, offset)
  • 82. 40 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic prepare commit txn.id -> ongoing (output, changelog, offset)
  • 83. 40 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic prepare commit txn.id -> prepare-commit (output, changelog, offset)
  • 84. 41 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-commit (output, changelog, offset)
  • 85. 41 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-commit (output, changelog, offset)
  • 86. 41 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-commit (output, changelog, offset)
  • 87. 41 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-commit (output, changelog, offset)
  • 88. 41 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } Txn Coordinator Txn Log Topic txn.id -> complete-commit (output, changelog, offset)
  • 89. 42 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> ongoing (output, changelog, offset)
  • 90. 42 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic prepare abort txn.id -> ongoing (output, changelog, offset)
  • 91. 42 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic prepare abort txn.id -> ongoing (output, changelog, offset)
  • 92. 42 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-abort (output, changelog, offset) prepare abort
  • 93. 43 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-abort (output, changelog, offset)
  • 94. 43 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-abort (output, changelog, offset)
  • 95. 43 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-abort (output, changelog, offset)
  • 96. 43 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> prepare-abort (output, changelog, offset)
  • 97. 43 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> complete-abort (output, changelog, offset)
  • 98. 44 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> complete-abort (output, changelog, offset)
  • 99. 44 Exactly-Once with Kafka Streams Input Topic State Process Streams Changelog Topic Output Topic Offset Topic producer.initTxn(); try { producer.beginTxn(); recs = consumer.poll(); for (Record rec <- recs) { // process .. producer.send(“output”, ..); producer.send(“changelog”, ..); producer.sendOffsets(“input”, ..); } producer.commitTxn(); } catch (KafkaException e) { producer.abortTxn(); } Txn Coordinator Txn Log Topic txn.id -> complete-abort (output, changelog, offset)
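Putting the commit and abort paths above together, here is a minimal sketch of the same read-process-write transaction with the actual Java client API (the real method names differ slightly from the slide pseudocode: initTransactions, sendOffsetsToTransaction, commitTransaction, abortTransaction). Topic names, the group id, and the transactional.id are assumptions, and production code would treat fatal errors such as ProducerFencedException by closing and restarting rather than aborting.

import java.time.Duration;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class ReadProcessWriteLoop {
  public static void main(String[] args) {
    Properties cp = new Properties();
    cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
    cp.put(ConsumerConfig.GROUP_ID_CONFIG, "my-app");                   // assumed group id
    cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");          // offsets are committed inside the txn
    cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");    // only read committed data
    cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

    Properties pp = new Properties();
    pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-app-txn-0");     // assumed; must be stable per task
    pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
    pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
         KafkaProducer<String, String> producer = new KafkaProducer<>(pp)) {
      consumer.subscribe(Collections.singletonList("input"));
      producer.initTransactions();                                      // register txn.id with the coordinator

      while (true) {
        ConsumerRecords<String, String> recs = consumer.poll(Duration.ofMillis(100));
        if (recs.isEmpty()) continue;
        producer.beginTransaction();
        try {
          Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
          for (ConsumerRecord<String, String> rec : recs) {
            // "process": here we simply forward; output (and changelog) writes join the txn
            producer.send(new ProducerRecord<>("output", rec.key(), rec.value()));
            offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                        new OffsetAndMetadata(rec.offset() + 1));
          }
          // Source offsets are committed as part of the same transaction.
          producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
          producer.commitTransaction();                                 // all or nothing
        } catch (KafkaException e) {
          producer.abortTransaction();                                  // every write in the txn is discarded
        }
      }
    }
  }
}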
  • 111. 50 What about Completeness? • Option 1: defer emitting output and committing txn • Effectively coupling completeness with consistency • Increased end-to-end processing latency • Option 2: emitting output early when possible • Not try to prevent incompleteness via coordination • Instead, compensate when out-of-order data happens
  • 112. 50 What about Completeness? • Option 1: defer emitting output and committing txn • Effectively coupling completeness with consistency • Increased end-to-end processing latency • Option 2: emitting output early when possible • Not try to prevent incompleteness via coordination • Instead, compensate when out-of-order data happens
  • 113. 51 2 2 3 3 4 4 Remember the Logs • upstream-downstream communication can be replayed • Emitted records are naturally ordered by offsets
  • 114. 52 Ordering and Monotonicity • Stateless (filter, mapValues) • Order-agnostic: no need to block on emitting • Stateful (join, aggregate) • Order-sensitive: current results depend on history • Whether to block result emission depends on the output type
  • 115. KStream = interprets data as record stream ~ think: “append-only” KTable = data as changelog stream ~ continuously updated materialized view 53
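In DSL terms (a small hedged sketch; topic names and default String serdes are assumptions), the two abstractions are built from topics like this:

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// Record stream: every purchase event is kept ("append-only").
KStream<String, String> purchases = builder.stream("user-purchases");

// Changelog stream: only the latest employer per user is retained,
// i.e. a continuously updated materialized view.
KTable<String, String> employers = builder.table("user-employment");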
  • 116. 54 alice eggs bob bread alice milk alice lnkd bob googl alice msft User purchase history User employment profile KStream KTable
  • 117. 55 alice eggs bob bread alice milk alice lnkd bob googl alice msft User purchase history User employment profile time “Alice bought eggs.” “Alice is now at LinkedIn.” KStream KTable
  • 118. 56 alice eggs bob bread alice milk alice lnkd bob googl alice msft User purchase history User employment profile time “Alice bought eggs and milk.” “Alice is now at LinkedIn Microsoft.” KStream KTable
  • 119. 57 time (Alice: A/null) alice 10 alice A alice B “do not emit” KStream.leftJoin(KStream) -> KStream KTable.leftJoin(KTable) -> KTable
  • 120. 58 time alice 10 alice A alice B (Alice: A/null) (Alice: A/10) “do not emit” (Alice: A/10) KStream.leftJoin(KStream) -> KStream KTable.leftJoin(KTable) -> KTable
  • 121. 59 time alice 10 alice A alice B (Alice: A/null) (Alice: A/10) (Alice: B/10) “do not emit” (Alice: A/10) (Alice: B/10) KStream.leftJoin(KStream) -> KStream KTable.leftJoin(KTable) -> KTable
  • 122. 60 time (Alice: null) (Alice: null) alice A/null A/10 alice B/10 alice KStream.leftJoin(KStream) -> KStream .aggregate() -> KTable KTable.leftJoin(KTable) -> KTable .aggregate() -> KTable
  • 123. 61 time (Alice: null 10) (Alice: null+10) alice A/null A/10 alice B/10 alice KStream.leftJoin(KStream) -> KStream .aggregate() -> KTable KTable.leftJoin(KTable) -> KTable .aggregate() -> KTable
  • 124. 62 time (Alice: 10 10) (Alice: null+10+10) KStream.leftJoin(KStream) -> KStream .aggregate() -> KTable KTable.leftJoin(KTable) -> KTable .aggregate() -> KTable alice A/null A/10 alice B/10 alice
  • 125. 62 time (Alice: 10 10) (Alice: null+10+10) KStream.leftJoin(KStream) -> KStream .aggregate() -> KTable KTable.leftJoin(KTable) -> KTable .aggregate() -> KTable alice A/null A/10 alice B/10 alice [BIRTE 2015]
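The two join flavors illustrated above can be written roughly as follows (a hedged sketch with assumed topic names and default serdes; JoinWindows.ofTimeDifferenceWithNoGrace is the 3.0+ spelling, earlier releases use JoinWindows.of). The stream-stream join emits one result per matching pair, while the table-table join emits revisions whenever either side is updated, which is how out-of-order updates are compensated downstream without blocking.

import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// Stream-stream left join: windowed and order-sensitive; the right side may be null if no match yet.
KStream<String, String> clicks = builder.stream("clicks");
KStream<String, String> views  = builder.stream("views");
KStream<String, String> joined = clicks.leftJoin(
    views,
    (click, view) -> click + "/" + view,
    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)));

// Table-table left join: each side is a changelog; a late update on either side
// produces a revised join result downstream.
KTable<String, String> purchases = builder.table("purchase-totals");
KTable<String, String> profiles  = builder.table("profiles");
KTable<String, String> enriched  = purchases.leftJoin(profiles,
    (total, profile) -> total + "@" + (profile == null ? "unknown" : profile));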
  • 126. 63 Use Case: Bloomberg Real-time Pricing
  • 127. 64 Use Case: Bloomberg Real-time Pricing
  • 128. 64 Use Case: Bloomberg Real-time Pricing
  • 129. 64 Use Case: Bloomberg Real-time Pricing • One billion plus market events / day • 160 cores / 2TB RAM deployed on k8s • Exactly-once for market data stateful stream processing
  • 130. • Apache Kafka: persistent logs to achieve correctness 65 Take-aways
  • 131. • Apache Kafka: persistent logs to achieve correctness • Transactional log appends for exactly-once 66 Take-aways
  • 132. • Apache Kafka: persistent logs to achieve correctness • Transactional log appends for exactly-once • Non-blocking output with revisions to handle out-of-order data 67 Take-aways
  • 133. • Apache Kafka: persistent logs to achieve correctness • Transactional log appends for exactly-once • Non-blocking output with revisions to handle out-of-order data 68 Take-aways Guozhang Wang | guozhang@confluent.io | @guozhangwang Read the full paper at: https://cnfl.io/sigmod
  • 134. • Apache Kafka: persistent logs to achieve correctness • Transactional log appends for exactly-once • Non-blocking output with revisions to handle out-of-order data 68 Take-aways THANKS! Guozhang Wang | guozhang@confluent.io | @guozhangwang Read the full paper at: https://cnfl.io/sigmod
  • 136. Ongoing Work (3.0+) • Scalability improvements • Consistent state query serving • Further reduce end-to-end latency • … and more 70