5.
Stream Processing with Kafka
[Diagram: a stream processor with local State reads from source topics (Kafka Topic A, Kafka Topic B) and writes results to sink topics (Kafka Topic C, Kafka Topic D)]
6.
Stream Processing with Kafka
[Diagram: the same topology, now annotated with "ack" on the produces to the sink topics and "commit" on the source topic offsets]
7.
Exactly-Once
• An application property for stream processing:
• .. for each received record,
• .. it is processed exactly once,
• .. even under failures
9.
Error Scenario #1: Duplicate Write
[Diagram: a produce to a sink topic succeeds, but its "ack" is lost, so the retried produce writes the same record twice]
11.
Error Scenario #2: Re-process
[Diagram: the produces to the sink topics are acked, but the processor fails before the offset "commit" takes effect, so the same input records are fetched and processed again after recovery]
14.
Error Scenario #3: Data loss
[Diagram: the source offsets are committed ("commit") before the produces to the sink topics are fully acked; after a failure the unproduced outputs are lost, and the inputs are never re-fetched]
18.
Exactly-Once does NOT mean..
• The Two Generals problem can now be solved
• .. or the FLP impossibility result is proven wrong
• .. or TCP at the transport level is “perfect”
• .. or you can get distributed consensus in any setting
19.
What can cause incorrect results?
• An unbounded network partition (provably unavoidable)
• A long GC pause or hard crash
• A bad config in your system
• A human operational error
• A bug in your code
21.
What can cause incorrect results?
• An unbounded network partition (provably unavoidable)
• A long GC pause or hard crash
• A bad config in your system
• A human operational error
• A bug in your code
0.01%
99.99%
Can we do better for the 99.99%?
35.
Exactly-Once Processing with Kafka
• Offset commit for source topics
• Value update on processor state
• Acked produce to sink topics
All or Nothing
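The three updates above must succeed or fail together. A minimal plain-Java sketch of that all-or-nothing behavior (no Kafka dependency; the class and method names here are illustrative, not Kafka APIs): the offset commit, state update, and sink produce are staged per record and either all become visible at the commit point or none do.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of an all-or-nothing processing step: offset commit,
// state update, and sink produce are staged, then applied atomically.
public class AllOrNothing {
    static long committedOffset = -1L;                 // source topic offset
    static Map<String, Long> state = new HashMap<>();  // processor state
    static List<String> sink = new ArrayList<>();      // sink topic records

    // Process one record; 'fail' simulates a crash before the commit point.
    static void process(long offset, String word, boolean fail) {
        long newCount = state.getOrDefault(word, 0L) + 1;  // staged state update
        String output = word + ":" + newCount;             // staged sink record
        if (fail) return;  // crash: none of the staged effects become visible
        // commit point: apply all three effects together
        committedOffset = offset;
        state.put(word, newCount);
        sink.add(output);
    }

    public static void main(String[] args) {
        process(0, "kafka", false);
        process(1, "kafka", true);  // failed attempt leaves no partial effects
        System.out.println(committedOffset + " " + state.get("kafka") + " " + sink);
        // prints: 0 1 [kafka:1] -- the failed record can be retried from offset 1
    }
}
```

Because the failed attempt leaves no partial effects, re-processing the record from the last committed offset cannot double-count it.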
36.
Kafka Streams (0.10+)
• New client library besides producer and consumer
• Powerful yet easy-to-use
• Event-at-a-time, Stateful
• Windowing with out-of-order handling
• Highly scalable, distributed, fault tolerant
• and more..
39.
Kafka Streams DSL
public static void main(String[] args) {
    // configs: application.id, bootstrap.servers, default serdes, ..
    Properties config = new Properties();
    // specify the processing topology by first reading in a stream from a topic
    KStreamBuilder builder = new KStreamBuilder();
    KStream<String, String> words = builder.stream("topic1");
    // count the words in this stream as an aggregated table
    KTable<String, Long> counts = words.countByKey("Counts");
    // write the result table to a new topic
    counts.to("topic2");
    // create a stream processing instance and start running it
    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();
}
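The countByKey step maintains a running count per key and emits the updated value downstream for every input record. A plain-Java sketch of that table semantics (no Kafka dependency; all names here are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of what the KTable "Counts" computes: each input key
// bumps a per-key counter, and every update is emitted downstream.
public class CountsSketch {
    static Map<String, Long> counts = new HashMap<>();   // stand-in for the KTable state
    static List<String> changelog = new ArrayList<>();   // stand-in for "topic2"

    static void process(String key) {
        long updated = counts.merge(key, 1L, Long::sum); // update the aggregate
        changelog.add(key + "=" + updated);              // emit the new value
    }

    public static void main(String[] args) {
        for (String w : new String[] {"kafka", "streams", "kafka"}) process(w);
        System.out.println(changelog);  // [kafka=1, streams=1, kafka=2]
    }
}
```

Note that the table emits one update per input record, not just the final count, which is why the output topic behaves like a changelog.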
56.
• All or Nothing for the following:
• Offset commit for source topics
• Value update on processor state
• Acked produce to sink topics
57.
Exactly-Once with Kafka Streams (0.11+)
• Process data in transactions of:
• A batch of input records from source topics
• A batch of output records to changelog topics
• A batch of output records to sink topics
config: processing.guarantee = exactly_once (default = at_least_once)
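In the shipped Kafka Streams API this is a single application property; a sketch of a minimal properties file (the broker address and application id are placeholders):

```
# Kafka Streams application properties
application.id = wordcount-app
bootstrap.servers = broker:9092
processing.guarantee = exactly_once   # default: at_least_once
```

No change to the topology code is needed; the library wraps each input/state/output batch in a Kafka transaction on the application's behalf.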