1
Exactly-once Data Processing
with Kafka Streams
Guozhang Wang
Kafka Meetup SF, July 27, 2017
2
Outline
• What is exactly-once for stream processing?
• How to achieve exactly-once with Kafka?
• Kafka Streams: exactly-once made easy
3
4
Stream Processing
[Diagram: Source -> Process (with State) -> Sink]
5
Stream Processing with Kafka
[Diagram, two frames: Kafka Topics A, B -> Process (with State) -> Kafka Topics C, D; the second frame adds an ack for each write to the sink topics and a commit on the source topics]
7
Exactly-Once
• An application property for stream processing,
• .. that for each received record,
• .. it will be processed exactly once,
• .. even under failures
9
Error Scenario #1: Duplicate Write
[Diagram, two frames: Kafka Topics A, B -> Process (with State) -> Kafka Topics C, D, with a single ack shown]
11
Error Scenario #2: Re-process
[Diagram, three frames: Kafka Topics A, B -> Process (with State) -> Kafka Topics C, D; the first frame shows the commit and both acks, the later frames show neither]
14
Error Scenario #3: Data loss
[Diagram, four frames: Kafka Topics A, B -> Process (with State) -> Kafka Topics C, D; the first two frames show both acks and the commit, the last two show a single ack]
18
Exactly-Once does NOT mean..
• Two Generals problem can now be solved
• .. or FLP result is proved wrong
• .. or TCP at transport level is “perfect”
• .. or you can get distributed consensus in any setting
19
What can cause incorrect results?
• Unbounded network partition (algorithmic proof)
• A long GC or hard crash
• A bad config in your system
• A human operating error
• A bug in your code
99.99%
0.01%
Can we do better for the 99.99%?
22
So how to achieve Exactly-Once?
23
Option #1: “Just give up”
[Diagram: a Streaming path and a Batch path, each with its own State, between Source and Sink]
24
Option #2: At-least-once + Dedup
[Diagram, four frames: Kafka Topics A, B -> Process (with State) -> Kafka Topics C, D, with acks and commit; the final frame shows the duplicated records 2, 2, 3, 3, 4, 4 going through a Dedup step]
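The Dedup step of Option #2 can be sketched as below. This is only an illustration and not from the slides: it assumes every record carries a unique id, the `Deduplicator` and `Rec` names are hypothetical, and the set of seen ids is kept in memory, which a real pipeline would have to persist and bound.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: drop records whose id has already been seen.
class Deduplicator {
    // Hypothetical record type: a unique id plus a payload.
    static final class Rec {
        final long id;
        final String value;
        Rec(long id, String value) { this.id = id; this.value = value; }
    }

    // Ids of all records passed through so far (in memory only).
    private final Set<Long> seen = new HashSet<>();

    // Returns only the records not seen before, in arrival order.
    List<Rec> dedup(List<Rec> incoming) {
        List<Rec> out = new ArrayList<>();
        for (Rec r : incoming) {
            if (seen.add(r.id)) { // add() returns false for a duplicate id
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Deduplicator d = new Deduplicator();
        List<Rec> in = new ArrayList<>();
        for (long id : new long[] {2, 2, 3, 3, 4, 4}) {
            in.add(new Rec(id, "v" + id));
        }
        System.out.println(d.dedup(in).size()); // prints 3
    }
}
```

The cost of this option is that the application has to supply the ids and maintain the seen-set itself.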
28
Option #3: The Kafka Way! (0.11+)
• Idempotent producer: each send is written exactly once per partition
• Transactional messaging: multiple sends committed atomically
29
Idempotent Producer
[Diagram, two frames: Producer -> Kafka Topic C, with an ack; labels: pid = 1 on the producer, and pid = 1, seq = 28 shown twice]
config: enable.idempotence = true
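How a (pid, seq) pair lets the broker discard a retried send can be sketched with a small in-memory model. This is a simplification for illustration, not the broker's actual code: `PartitionLog` is a hypothetical class, and it only remembers the highest sequence number per producer id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of a single partition's log on the broker.
class PartitionLog {
    private final List<String> records = new ArrayList<>();
    // Highest sequence number appended so far, keyed by producer id.
    private final Map<Long, Integer> lastSeq = new HashMap<>();

    // Appends the record and returns true, or returns false without
    // appending when this (pid, seq) pair has already been written.
    boolean append(long pid, int seq, String value) {
        Integer last = lastSeq.get(pid);
        if (last != null && seq <= last) {
            return false; // retry of an earlier send: drop it
        }
        records.add(value);
        lastSeq.put(pid, seq);
        return true;
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        log.append(1L, 28, "rec"); // first attempt; suppose the ack is lost
        log.append(1L, 28, "rec"); // producer retries with the same pid and seq
        System.out.println(log.size()); // prints 1
    }
}
```

On the client side nothing beyond the `enable.idempotence = true` setting shown on the slide is needed; the sequence numbers are handled by the producer itself.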
31
Atomic Multi-Sends (aka “transactions”)
[Diagram, two frames: Producer writes to Kafka Topics C and D and commits offsets for Kafka Topic A in one atomic commit]
try {
    producer.beginTxn();
    producer.send(rec1);              // topic C
    producer.send(rec2);              // topic D
    producer.sendOffsetsToTxn(A, 10); // consumed offset on source topic A
    producer.commitTxn();
} catch (KafkaException e) {
    producer.abortTxn();
}
Note: the method names above are slide shorthand; in the released Java producer they are beginTransaction(), sendOffsetsToTransaction(), commitTransaction() and abortTransaction().
33
Atomic Multi-Sends (aka “transactions”)
[Diagram: Consumer reads only committed records from Kafka Topics C and D]
consumer.subscribe(C, D);
recs = consumer.poll();
for (Record rec : recs) {
    // process ..
}
config: isolation.level = read_committed (default = read_uncommitted)
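The difference between the two isolation levels can be illustrated with a small in-memory model. `TxnLog` is a hypothetical class written for this sketch, not a Kafka API; it assumes records stay in the log and are tagged with a transaction id, with read_committed skipping aborted records and stopping at the first record of a transaction that is still open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of one partition holding transactional records.
class TxnLog {
    enum State { OPEN, COMMITTED, ABORTED }

    private static final class Entry {
        final long txnId;
        final String value;
        Entry(long txnId, String value) { this.txnId = txnId; this.value = value; }
    }

    private final List<Entry> entries = new ArrayList<>();
    private final Map<Long, State> txnState = new HashMap<>();

    // Records are written to the log immediately, before the outcome is known.
    void append(long txnId, String value) {
        txnState.putIfAbsent(txnId, State.OPEN);
        entries.add(new Entry(txnId, value));
    }

    void commit(long txnId) { txnState.put(txnId, State.COMMITTED); }
    void abort(long txnId)  { txnState.put(txnId, State.ABORTED); }

    // read_uncommitted: every record. read_committed: committed records only,
    // stopping at the first record of a still-open transaction.
    List<String> read(boolean readCommitted) {
        List<String> out = new ArrayList<>();
        for (Entry e : entries) {
            State s = txnState.get(e.txnId);
            if (!readCommitted) {
                out.add(e.value);
            } else if (s == State.OPEN) {
                break;
            } else if (s == State.COMMITTED) {
                out.add(e.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TxnLog log = new TxnLog();
        log.append(1, "a"); log.commit(1);
        log.append(2, "b"); log.abort(2);
        log.append(3, "c"); // transaction 3 is still open
        System.out.println(log.read(true));  // prints [a]
        System.out.println(log.read(false)); // prints [a, b, c]
    }
}
```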
34
Exactly-Once Processing with Kafka
[Diagram: Kafka Topics A, B -> Process (with State) -> Kafka Topics C, D, with an ack for each write to the sink topics and a commit on the source topics]
35
Exactly-Once Processing with Kafka
• Offset commit for source topics
• Value update on processor state
• Acked produce to sink topics
All or Nothing
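The all-or-nothing idea can be sketched with a toy processor that keeps a running sum. This is an illustrative model only: `AtomicProcessor` is a hypothetical class, and it stages changes in local variables and applies them together, which stands in for (and is not the same mechanism as) a Kafka transaction.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: offset, state and output become visible together or not at all.
class AtomicProcessor {
    int committedOffset = 0;                               // source offset
    long committedSum = 0;                                 // processor state
    final List<Long> committedOutput = new ArrayList<>();  // sink topic

    // Processes up to `batch` records starting at the committed offset.
    // If crashBeforeCommit is true, every staged change is discarded.
    void processBatch(long[] input, int batch, boolean crashBeforeCommit) {
        int offset = committedOffset;
        long sum = committedSum;
        List<Long> staged = new ArrayList<>();
        int end = Math.min(input.length, offset + batch);
        for (; offset < end; offset++) {
            sum += input[offset];
            staged.add(sum);
        }
        if (crashBeforeCommit) {
            return; // nothing was committed, so nothing is visible
        }
        committedOffset = offset;
        committedSum = sum;
        committedOutput.addAll(staged);
    }

    public static void main(String[] args) {
        long[] input = {1, 2, 3, 4};
        AtomicProcessor p = new AtomicProcessor();
        p.processBatch(input, 2, false); // commits records 0..1
        p.processBatch(input, 2, true);  // fails before commit
        p.processBatch(input, 2, false); // retry reprocesses records 2..3
        System.out.println(p.committedOutput); // prints [1, 3, 6, 10]
    }
}
```

Because the failed attempt leaves the offset, the state and the output untouched, the retry produces the same result as a run without failures.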
36
Kafka Streams (0.10+)
• New client library besides producer and consumer
• Powerful yet easy-to-use
• Event-at-a-time, Stateful
• Windowing with out-of-order handling
• Highly scalable, distributed, fault tolerant
• and more..
37
Anywhere, anytime
Ok. Ok. Ok. Ok.
38
Anywhere, anytime
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>0.10.0.0</version>
</dependency>
39
Kafka Streams DSL
public static void main(String[] args) {
    // builder (a KStreamBuilder) and config are assumed to be created beforehand; the slide omits them

    // specify the processing topology by first reading in a stream from a topic
    KStream<String, String> words = builder.stream("topic1");

    // count the words in this stream as an aggregated table
    KTable<String, Long> counts = words.countByKey("Counts");

    // write the result table to a new topic
    counts.to("topic2");

    // create a stream processing instance and start running it
    KafkaStreams streams = new KafkaStreams(builder, config);
    streams.start();
}
44
Processor Topology
KStream<..> stream1 = builder.stream("topic1");
KStream<..> stream2 = builder.stream("topic2");
KStream<..> joined = stream1.leftJoin(stream2, ...);
KTable<..> aggregated = joined.aggregateByKey(...);
aggregated.to("topic3");
[Diagram, built up over six frames next to the code; the final frame adds a State label]
50
Processing in Kafka Streams
[Diagram, four frames: Kafka Topics A and B, each with labels P1 and P2, feed the Processor Topology; the final frame shows Task1 in MyApp.1 and Task2 in MyApp.2]
54
States in Stream Processing
[Diagram: Task1 in MyApp.1 and Task2 in MyApp.2 read from Kafka Topics A and B; each task has its own State]
55
Fault Tolerance in Streams
[Diagram: three Process/State pairs inside Kafka Streams, with Kafka on the input and output sides and a Kafka Changelog for the state]
56
• All or Nothing for the following:
  • Offset commit for source topics
  • Value update on processor state
  • Acked produce to sink topics
57
Exactly-Once with Kafka Streams (0.11+)
• Process data in transactions of:
  • A batch of input records from source topics
  • A batch of output records to changelog topics
  • A batch of output records to sink topics
config: processing.guarantee = exactly_once (default = at_least_once)
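A minimal configuration sketch follows. It uses plain `java.util.Properties` with the string keys of the released Kafka Streams configuration, where the setting is named `processing.guarantee` and takes the value `exactly_once`; later Kafka releases also offer `exactly_once_v2`. The application id and broker address are placeholder assumptions, and `ExactlyOnceStreamsConfig` is a hypothetical helper class.

```java
import java.util.Properties;

// Hypothetical helper that builds a Kafka Streams configuration.
class ExactlyOnceStreamsConfig {
    static Properties build() {
        Properties props = new Properties();
        // Placeholder values: replace with the real application id and brokers.
        props.put("application.id", "my-app");
        props.put("bootstrap.servers", "localhost:9092");
        // Switch from the default at_least_once to exactly-once processing.
        props.put("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("processing.guarantee")); // prints exactly_once
    }
}
```

The resulting `Properties` object is what would be passed as `config` when constructing the `KafkaStreams` instance; the topology code itself does not change.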
58
Exactly-Once with Failures
[Diagram, four frames: three Process/State pairs inside Kafka Streams, with Kafka on the input and output sides and a Kafka Changelog for the state]
62
Exactly-Once life is goooood~
63
What if not all my data is in Kafka?
64
65
Connectors
• 40+ since first release this Feb (0.9+)
• 13 from & partners
66
End-to-End Exactly-Once
67
Take-aways
• Exactly-once: important property for stream processing
• Kafka Streams: exactly-once made easy
Join Kafka Summit 2017 SF (discount code available!)
Additional resources:
http://www.confluent.io/resources
Guozhang Wang | guozhang@confluent.io | @guozhangwang
68
Thank You!

More Related Content

What's hot

Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Guozhang Wang
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
confluent
 
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
confluent
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
confluent
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
confluent
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
Bjoern Rost
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
Yaroslav Tkachenko
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
confluent
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
confluent
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
Guozhang Wang
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
confluent
 
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...
HostedbyConfluent
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
confluent
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
confluent
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
Yaroslav Tkachenko
 
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRKafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
confluent
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architectures
confluent
 
Intro to AsyncAPI
Intro to AsyncAPIIntro to AsyncAPI
Intro to AsyncAPI
confluent
 
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
confluent
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
confluent
 

What's hot (20)

Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
Consistency and Completeness: Rethinking Distributed Stream Processing in Apa...
 
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
What's the time? ...and why? (Mattias Sax, Confluent) Kafka Summit SF 2019
 
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
Production Ready Kafka on Kubernetes (Devandra Tagare, Lyft) Kafka Summit SF ...
 
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQLSteps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
Steps to Building a Streaming ETL Pipeline with Apache Kafka® and KSQL
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Streaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQLStreaming ETL - from RDBMS to Dashboard with KSQL
Streaming ETL - from RDBMS to Dashboard with KSQL
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
Welcome to Kafka; We’re Glad You’re Here (Dave Klein, Centene) Kafka Summit 2020
 
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
Real Time Streaming Data with Kafka and TensorFlow (Yong Tang, MobileIron) Ka...
 
Exactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka StreamsExactly-once Stream Processing with Kafka Streams
Exactly-once Stream Processing with Kafka Streams
 
Introducing Kafka's Streams API
Introducing Kafka's Streams APIIntroducing Kafka's Streams API
Introducing Kafka's Streams API
 
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...
Improving Logging Ingestion Quality At Pinterest: Fighting Data Corruption An...
 
A Tour of Apache Kafka
A Tour of Apache KafkaA Tour of Apache Kafka
A Tour of Apache Kafka
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
 
Apache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know AboutApache Kafka: New Features That You Might Not Know About
Apache Kafka: New Features That You Might Not Know About
 
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRKafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
 
Real-world Streaming Architectures
Real-world Streaming ArchitecturesReal-world Streaming Architectures
Real-world Streaming Architectures
 
Intro to AsyncAPI
Intro to AsyncAPIIntro to AsyncAPI
Intro to AsyncAPI
 
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
Kafka Summit NYC 2017 - The Best Thing Since Partitioned Bread
 
Reliability Guarantees for Apache Kafka
Reliability Guarantees for Apache KafkaReliability Guarantees for Apache Kafka
Reliability Guarantees for Apache Kafka
 

Similar to Exactly-once Data Processing with Kafka Streams - July 27, 2017

Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
DataWorks Summit/Hadoop Summit
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
Guozhang Wang
 
I can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringI can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and Spring
Joe Kutner
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
HostedbyConfluent
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
confluent
 
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...
confluent
 
Performance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsPerformance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams Applications
Guozhang Wang
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
inside-BigData.com
 
Improving Streams Scalability with Transactional StateStores (KIP-892)
Improving Streams Scalability with Transactional StateStores (KIP-892)Improving Streams Scalability with Transactional StateStores (KIP-892)
Improving Streams Scalability with Transactional StateStores (KIP-892)
HostedbyConfluent
 
How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connector
confluent
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
Cliff Gilmore
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
HostedbyConfluent
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
confluent
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
confluent
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Guido Schmutz
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
LivePerson
 
Connect, Test, Optimize: The Ultimate Kafka Connector Benchmarking Toolkit
Connect, Test, Optimize: The Ultimate Kafka Connector Benchmarking ToolkitConnect, Test, Optimize: The Ultimate Kafka Connector Benchmarking Toolkit
Connect, Test, Optimize: The Ultimate Kafka Connector Benchmarking Toolkit
HostedbyConfluent
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Guido Schmutz
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
Guido Schmutz
 
Concurrency at the Database Layer
Concurrency at the Database Layer Concurrency at the Database Layer
Concurrency at the Database Layer
mcwilson1
 

Similar to Exactly-once Data Processing with Kafka Streams - July 27, 2017 (20)

Stream Processing made simple with Kafka
Stream Processing made simple with KafkaStream Processing made simple with Kafka
Stream Processing made simple with Kafka
 
Apache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream ProcessingApache Kafka, and the Rise of Stream Processing
Apache Kafka, and the Rise of Stream Processing
 
I can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and SpringI can't believe it's not a queue: Kafka and Spring
I can't believe it's not a queue: Kafka and Spring
 
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka StreamsStreams Don't Fail Me Now - Robustness Features in Kafka Streams
Streams Don't Fail Me Now - Robustness Features in Kafka Streams
 
Westpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache KafkaWestpac Bank Tech Talk 1: Dive into Apache Kafka
Westpac Bank Tech Talk 1: Dive into Apache Kafka
 
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...
Performance Analysis and Optimizations for Kafka Streams Applications (Guozha...
 
Performance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams ApplicationsPerformance Analysis and Optimizations for Kafka Streams Applications
Performance Analysis and Optimizations for Kafka Streams Applications
 
Containerizing Distributed Pipes
Containerizing Distributed PipesContainerizing Distributed Pipes
Containerizing Distributed Pipes
 
Improving Streams Scalability with Transactional StateStores (KIP-892)
Improving Streams Scalability with Transactional StateStores (KIP-892)Improving Streams Scalability with Transactional StateStores (KIP-892)
Improving Streams Scalability with Transactional StateStores (KIP-892)
 
How to Build an Apache Kafka® Connector
How to Build an Apache Kafka® ConnectorHow to Build an Apache Kafka® Connector
How to Build an Apache Kafka® Connector
 
Chicago Kafka Meetup
Chicago Kafka MeetupChicago Kafka Meetup
Chicago Kafka Meetup
 
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
Building an Interactive Query Service in Kafka Streams With Bill Bejeck | Cur...
 
Deploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and KubernetesDeploying Kafka Streams Applications with Docker and Kubernetes
Deploying Kafka Streams Applications with Docker and Kubernetes
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
Spark (Structured) Streaming vs. Kafka Streams - two stream processing platfo...
 
From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Connect, Test, Optimize: The Ultimate Kafka Connector Benchmarking Toolkit
Connect, Test, Optimize: The Ultimate Kafka Connector Benchmarking ToolkitConnect, Test, Optimize: The Ultimate Kafka Connector Benchmarking Toolkit
Connect, Test, Optimize: The Ultimate Kafka Connector Benchmarking Toolkit
 
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & PartitioningApache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
Apache Kafka - Event Sourcing, Monitoring, Librdkafka, Scaling & Partitioning
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Concurrency at the Database Layer
Concurrency at the Database Layer Concurrency at the Database Layer
Concurrency at the Database Layer
 

Exactly-once Data Processing with Kafka Streams - July 27, 2017

  • 1. Exactly-once Data Processing with Kafka Streams. Guozhang Wang, Kafka Meetup SF, July 27, 2017
  • 2. Outline
      • What is exactly-once for stream processing?
      • How to achieve exactly-once with Kafka?
      • Kafka Streams: exactly-once made easy
  • 5. Stream Processing with Kafka [diagram labels: Process, State, Kafka Topics A, B, C, D]
  • 6. Stream Processing with Kafka [diagram labels: Process, State, Kafka Topics A, B, C, D, ack, ack, commit]
  • 7. Exactly-Once: an application property for stream processing such that each received record is processed exactly once, even under failures.
  • 8. Stream Processing with Kafka [diagram labels: Process, State, Kafka Topics A, B, C, D, ack, ack, commit]
  • 9-10. Error Scenario #1: Duplicate Write [diagram labels: Process, State, Kafka Topics A, B, C, D, ack]
  • 11-13. Error Scenario #2: Re-process [diagram labels: Process, State, Kafka Topics A, B, C, D; commit, ack, ack on slide 11 only]
  • 14-17. Error Scenario #3: Data loss [diagram labels: Process, State, Kafka Topics A, B, C, D; ack, ack, commit on slides 14-15; ack on slides 16-17]
  • 18. Exactly-Once does NOT mean that:
      • the Two Generals problem can now be solved
      • the FLP result is proved wrong
      • TCP at the transport level is “perfect”
      • you can get distributed consensus in any setting
  • 19-21. What can cause incorrect results?
      • Unbounded network partition (algorithmic proof)
      • A long GC or hard crash
      • A bad config in your system
      • A human operating error
      • A bug in your code
      [slide annotations: 99.9%, 0.01%]
      Can we do better for the 99.99%?
  • 22. So how to achieve Exactly-Once?
  • 23. Option #1: “Just give up” [diagram labels: Streaming, Batch, Source, Sink, State, State]
  • 24-26. Option #2: At-least-once + Dedup [diagram labels: Process, State, Kafka Topics A, B, C, D; ack, ack, commit on slides 24 and 26]
  • 27. Option #2: At-least-once + Dedup [diagram: records 2, 2, 3, 3, 4, 4 and a Dedup step]
  • 28. Option #3: The Kafka Way! (0.11+)
      • Idempotent producer: send exactly-once per partition
      • Transactional messaging: multiple sends atomically
  • 29-30. Idempotent Producer [diagram labels: Producer, Kafka Topic C, ack, pid = 1, seq = 28 (shown twice)]
      config: enable.idempotence = true
  • 31-32. Atomic Multi-Sends (aka. “transactions”) [diagram labels: Producer, Kafka Topic A, Kafka Topic C, Kafka Topic D, Atomic Commit]
        // Slide shorthand; the KafkaProducer methods are beginTransaction(),
        // sendOffsetsToTransaction(), commitTransaction() and abortTransaction().
        try {
            producer.beginTxn();
            producer.send(rec1);              // topic C
            producer.send(rec2);              // topic D
            producer.sendOffsetsToTxn(A, 10); // offset 10 of topic A
            producer.commitTxn();
        } catch (KafkaException e) {
            producer.abortTxn();
        }
  • 33. Atomic Multi-Sends (aka. “transactions”) [diagram labels: Consumer, Kafka Topic C, Kafka Topic D, Read Committed]
        consumer.subscribe(C, D);
        recs = consumer.poll();
        for (Record rec : recs) {
            // process ..
        }
      config: isolation.level = read_committed (default = read_uncommitted)
  • 34. Exactly-Once Processing with Kafka [diagram labels: Process, State, Kafka Topics A, B, C, D, ack, ack, commit]
  • 35. Exactly-Once Processing with Kafka: All or Nothing for
      • Offset commit for source topics
      • Value update on processor state
      • Acked produce to sink topics
  • 36. Kafka Streams (0.10+)
      • New client library besides producer and consumer
      • Powerful yet easy-to-use
      • Event-at-a-time, stateful
      • Windowing with out-of-order handling
      • Highly scalable, distributed, fault tolerant
      • and more..
  • 39-43. Kafka Streams DSL
        public static void main(String[] args) {
            // builder and config are created elsewhere (not shown on the slide)

            // specify the processing topology by first reading in a stream from a topic
            KStream<String, String> words = builder.stream("topic1");

            // count the words in this stream as an aggregated table
            KTable<String, Long> counts = words.countByKey("Counts");

            // write the result table to a new topic
            counts.to("topic2");

            // create a stream processing instance and start running it
            KafkaStreams streams = new KafkaStreams(builder, config);
            streams.start();
        }
  • 44-49. Processor Topology [diagram label added on slide 49: State]
        KStream<..> stream1 = builder.stream("topic3");
        KStream<..> stream2 = builder.stream("topic3");
        KStream<..> joined = stream1.leftJoin(stream2, ...);
        KTable<..> aggregated = joined.aggregateByKey(...);
        aggregated.to("topic3");
  • 50. Processing in Kafka Streams [diagram labels: Kafka Topic A, Kafka Topic B]
  • 51. Processing in Kafka Streams [diagram labels: Kafka Topic A, Kafka Topic B, Processor Topology, P1, P2, P1, P2]
  • 52. Processing in Kafka Streams [diagram labels: Kafka Topic A, Kafka Topic B]
  • 53. Processing in Kafka Streams [diagram labels: Kafka Topic A, Kafka Topic B, MyApp.1, MyApp.2, Task1, Task2]
  • 54. States in Stream Processing [diagram labels: MyApp.1, MyApp.2, Kafka Topic A, Kafka Topic B, Task1, Task2, State, State]
  • 55. Fault Tolerance in Streams [diagram labels: three Process/State pairs, Kafka, Kafka Streams, Kafka, Kafka Changelog]
  • 56. All or Nothing for the following:
      • Offset commit for source topics
      • Value update on processor state
      • Acked produce to sink topics
  • 57. Exactly-Once with Kafka Streams (0.11+)
      • Process data in transactions of:
          • A batch of input records from source topics
          • A batch of output records to changelog topics
          • A batch of output records to sink topics
      config: processing.guarantee = exactly_once (default = at_least_once)
  • 63. What if not all my data is in Kafka?
  • 65. Connectors
      • 40+ since first release this Feb (0.9+)
      • 13 from [logo] & partners
  • 67. Take-aways
      • Exactly-once: important property for stream processing
      • Kafka Streams: exactly-once made easy
      Join Kafka Summit 2017 SF (discount code available!)
      Additional resources: http://www.confluent.io/resources
      Guozhang Wang | guozhang@confluent.io | @guozhangwang
  • 68. Thank You! Guozhang Wang, Kafka Meetup SF, July 27, 2017
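
The Idempotent Producer slides show a send with pid = 1 and seq = 28 being repeated with the same pid and seq. One way to picture why the repeat does not create a duplicate is a toy in-memory log that remembers the last sequence number it accepted per producer id. This is only a sketch under stated assumptions: the class `IdempotentLogSketch` and its `append` method are hypothetical names, this is not Kafka broker code, and the real broker tracks sequences per producer and partition and handles more cases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a single partition log; hypothetical, NOT Kafka broker code.
class IdempotentLogSketch {
    // Last sequence number accepted so far, keyed by producer id (pid).
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
    private final List<String> records = new ArrayList<>();

    /** Returns true if the record was appended, false if it was a duplicate retry. */
    boolean append(long pid, int seq, String value) {
        Integer last = lastSeqByPid.get(pid);
        if (last != null && seq <= last) {
            return false; // already written: safe to ack again without appending
        }
        if (last != null && seq != last + 1) {
            // Assumption of this sketch: a gap in sequence numbers is rejected.
            throw new IllegalStateException(
                "out-of-order sequence " + seq + ", expected " + (last + 1));
        }
        records.add(value);
        lastSeqByPid.put(pid, seq);
        return true;
    }

    List<String> records() {
        return records;
    }

    public static void main(String[] args) {
        IdempotentLogSketch log = new IdempotentLogSketch();
        boolean first = log.append(1L, 28, "rec");
        // The ack is lost, so the producer retries with the same pid and seq.
        boolean retry = log.append(1L, 28, "rec");
        System.out.println("first appended: " + first);
        System.out.println("retry appended: " + retry);
        System.out.println("log size: " + log.records().size());
    }
}
```

Because the retry carries the same (pid, seq) pair, the log can recognize it and acknowledge it again without writing the record twice, which corresponds to Error Scenario #1 (Duplicate Write) being avoided.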
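
The slides name one setting for each layer: `enable.idempotence` on the producer, `isolation.level` on the consumer, and a processing mode for Kafka Streams. The sketch below collects them as plain `java.util.Properties`, so it runs without the Kafka client library. Assumptions are labeled in the comments: the class name and the `transactional.id` value are hypothetical, connection settings are omitted, and the Streams key is written as `processing.guarantee` with value `exactly_once`, which is the name Kafka Streams 0.11 uses for this setting (newer releases also offer `exactly_once_v2`).

```java
import java.util.Properties;

// Collects the settings named on the slides as plain key/value pairs.
// Hypothetical helper class; connection settings such as bootstrap.servers,
// serializers and application.id are intentionally left out.
class ExactlyOnceConfigSketch {

    static Properties producerProps() {
        Properties props = new Properties();
        props.setProperty("enable.idempotence", "true");
        // Hypothetical value: a transactional producer needs a stable transactional.id.
        props.setProperty("transactional.id", "my-app-txn-1");
        return props;
    }

    static Properties consumerProps() {
        Properties props = new Properties();
        // Only read records from committed transactions (default is read_uncommitted).
        props.setProperty("isolation.level", "read_committed");
        return props;
    }

    static Properties streamsProps() {
        Properties props = new Properties();
        // Default is at_least_once.
        props.setProperty("processing.guarantee", "exactly_once");
        return props;
    }

    public static void main(String[] args) {
        System.out.println("producer: " + producerProps());
        System.out.println("consumer: " + consumerProps());
        System.out.println("streams:  " + streamsProps());
    }
}
```

With Kafka Streams only the last setting is needed in application code; the library configures its internal producers and consumers accordingly, which is the "exactly-once made easy" point of the talk.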