Exactly-once Stream Processing
Matthias J. Sax, Software Engineer
Apache Kafka committer and PMC member
matthias@confluent.io | @MatthiasJSax
@MatthiasJSax
Exactly-once: Delivery vs Semantics
Exactly-once Delivery
• Academic distributed system problem:
• Can we send a message and ensure it’s delivered to the receiver exactly once?
• Two Generals’ Problem (https://en.wikipedia.org/wiki/Two_Generals%27_Problem)
• Provably not possible!
Delivery != Semantics
2
@MatthiasJSax
Take input record, process it, update result, and record progress.
No Error. No Problem.
What is Exactly-once Semantics About?
3
@MatthiasJSax
What happens if something goes wrong?
Error while reading, processing, writing, or recording progress.
We retry!
But is it safe?
What is Exactly-once Semantics About?
4
@MatthiasJSax
5
Are retries safe? With exactly-once, yes!
Exactly-once is about masking errors via safe retries.
The result of an exactly-once retry is semantically the same as if no error had occurred.
What is Exactly-once Semantics About?
@MatthiasJSax
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics:
• Upstream write-only producer!
6
@MatthiasJSax
There is no* Write-only Exactly-once!
(*) Write-only exactly-once is possible for idempotent updates (but Kafka is append-only…)
@MatthiasJSax
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics:
• Upstream write-only producer!
• Downstream read-only consumer!
8
@MatthiasJSax
There is NO Read-only Exactly-once!
@MatthiasJSax
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics.
Kafka for processing
• Pattern: Consume -> Process -> Produce
• Built-in exactly-once via Kafka Streams (or DIY).
• Also possible with external source/target system!
10
@MatthiasJSax
Let’s Break it Down
Steps in a Processing Pipeline
• Read input:
• Does not modify state; re-reading is always safe.
• Process data:
• Stateless re-processing (filter, map, etc.) is always safe.
• Stateful re-processing: need to roll back state before we can retry.
• Update result:
• Need to “retract” (partial) results.
• Or: rely on idempotent updates. (There are dragons!)
• Record progress:
• Modifies state in the source system (or does it?)
11
@MatthiasJSax
Exactly-once
==
At-least-once + Idempotency
It depends…
@MatthiasJSax
Idempotent Updates (Internal State)?
Stateful processing
Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
• It’s context sensitive!
13
[Diagram: incrementing a counter is not retry-safe; first attempt: 73+1 takes Cnt: 73 to 74; retry of the same input: 74+1 takes Cnt: 74 to 75 (wrong result)]
@MatthiasJSax
Idempotent Updates? Maybe…
Stateful processing
Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
• It’s context sensitive!
• Idempotency requires context agnostic state modifications, e.g., set a new address.
14
[Diagram: setting an address is retry-safe; first attempt: Set “NY” takes City: LA to NY; retry of the same input: Set “NY” leaves City: NY unchanged (correct result); see the code sketch below]
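To make the contrast concrete, here is a minimal Java sketch (not from the slides; all names are illustrative): the counter increment depends on the value read before the update and is therefore not retry-safe, while setting an address is fully determined by the input and can be replayed safely.

import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: context-sensitive vs. context-agnostic state updates.
public class IdempotencyExample {

    static final Map<String, Integer> counts = new HashMap<>();
    static final Map<String, String> cities = new HashMap<>();

    // NOT idempotent: the new value depends on the value read before the update,
    // so replaying the same input after a failure counts it twice.
    static void increment(String key) {
        int current = counts.getOrDefault(key, 0); // read
        counts.put(key, current + 1);              // modify based on what was read
    }

    // Idempotent: the new value is fully determined by the input record,
    // so replaying "set NY" any number of times leaves the same state.
    static void setCity(String user, String city) {
        cities.put(user, city);
    }

    public static void main(String[] args) {
        increment("clicks");
        increment("clicks");    // retry of the same input: count is 2 instead of 1
        setCity("alice", "NY");
        setCity("alice", "NY"); // retry of the same input: still "NY", result unchanged
        System.out.println(counts + " " + cities);
    }
}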
@MatthiasJSax
Idempotent Updates (External State)
The issue of time travel…
15
[Diagram: updates Set “NY” (City: LA to NY) and Set “BO” (City: NY to BO) are applied in sequence; external readers observe LA, then NY, then BO]
@MatthiasJSax
Idempotent Updates (External State)
Retrying a sequence of updates:
16
[Diagram: retrying the sequence re-applies Set “NY” (City: BO back to NY) and then Set “BO” (City: NY to BO); an external reader observes the stale value NY again (wrong) before seeing BO (correct), i.e., the visible state travels back in time during the retry]
@MatthiasJSax
Idempotency is not enough.
All State Changes must be Atomic!
@MatthiasJSax
All State Changes must be Atomic
What is “state”?
• Internal processing state.
• External state, i.e., result state.
• External state, i.e., source progress.
Transactions to the rescue!
Do we want to (can we) do a cross-system distributed transaction?
Good news: we don’t have to…
18
@MatthiasJSax
Exactly-Once with Kafka and External Systems
19
Example: Downstream target RDBMS
[Diagram: state, result, and offsets are all written to the RDBMS via one atomic ACID transaction; the (async) offset update back to Kafka is not part of the transaction]
@MatthiasJSax
Exactly-Once with Kafka and External Systems
20
Example: Downstream target RDBMS
[Diagram: after a failure, reset the consumer to the offsets stored in the RDBMS (alongside state and result) and retry; a code sketch of this pattern follows]
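As a concrete illustration, here is a hedged sketch (not from the slides) of a consumer that writes results and its input offsets to a PostgreSQL target in one ACID transaction; the results and offsets tables, topic name, and connection string are all assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Sketch: exactly-once sink into an RDBMS by storing offsets with the result.
public class RdbmsSinkSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "rdbms-sink");
        props.put("enable.auto.commit", "false"); // progress lives in the DB, not in Kafka
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/sink");
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {

            db.setAutoCommit(false);
            TopicPartition tp = new TopicPartition("input", 0);
            consumer.assign(List.of(tp));

            // On (re)start: resume from the offset stored in the database.
            try (PreparedStatement q = db.prepareStatement(
                    "SELECT next_offset FROM offsets WHERE kafka_topic = ? AND kafka_partition = ?")) {
                q.setString(1, tp.topic());
                q.setInt(2, tp.partition());
                ResultSet rs = q.executeQuery();
                consumer.seek(tp, rs.next() ? rs.getLong(1) : 0L);
            }

            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    try (PreparedStatement res = db.prepareStatement(
                             "INSERT INTO results(k, v) VALUES (?, ?) ON CONFLICT (k) DO UPDATE SET v = EXCLUDED.v");
                         PreparedStatement off = db.prepareStatement(
                             "INSERT INTO offsets(kafka_topic, kafka_partition, next_offset) VALUES (?, ?, ?) "
                                 + "ON CONFLICT (kafka_topic, kafka_partition) DO UPDATE SET next_offset = EXCLUDED.next_offset")) {
                        res.setString(1, rec.key());
                        res.setString(2, rec.value());
                        res.executeUpdate();
                        off.setString(1, rec.topic());
                        off.setInt(2, rec.partition());
                        off.setLong(3, rec.offset() + 1);
                        off.executeUpdate();
                        db.commit(); // atomic: result and offset land together, or not at all
                    } catch (SQLException e) {
                        db.rollback(); // retry is safe: the stored offset still points at this record
                        throw e;
                    }
                }
                consumer.commitAsync(); // async offset update back to Kafka, informational only
            }
        }
    }
}

The offsets stored in the database are the source of truth here; the asynchronous commit back to Kafka (the dashed arrow on the slide) is purely for monitoring.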
@MatthiasJSax
Kafka Connect (Part 1)
Exactly-once Sink
• Has “nothing” to do with Kafka:
• Kafka provides source system progress tracking via offsets.
• Connect provides an API to fetch start offsets from the target system.
• Depends on target system properties / features.
• Each individual connector must implement it.
21
@MatthiasJSax
How does Kafka Tackle Exactly-once?
22
Kafka Transactions
Multi-partition/multi-topic atomic write:
[Diagram: a single transaction appends records to multiple topic-partitions (t1-p0, t1-p1, t2-p0, t2-p1, t2-p2); on commit, the writes across all partitions become visible atomically]
@MatthiasJSax
How does Kafka Tackle Exactly-once?
23
Kafka Transactions
Multi-partition/multi-topic atomic write:
producer.beginTransaction();
// state updates (changelogs + result)
producer.send(…);
producer.send(…);
…
producer.commitTransaction(); // or .abortTransaction()
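The snippet above omits configuration and initialization; a minimal runnable sketch might look like this (broker address, topic names, and the transactional.id are illustrative):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: multi-partition/multi-topic atomic write with a transactional producer.
public class TransactionalProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "my-app-producer-0"); // required for transactions; unique per producer instance
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions(); // registers the transactional.id and fences older "zombie" instances

            producer.beginTransaction();
            try {
                // state updates (changelogs + result)
                producer.send(new ProducerRecord<>("changelog", "key", "new-state"));
                producer.send(new ProducerRecord<>("result", "key", "output"));
                producer.commitTransaction(); // writes to all partitions become visible atomically
            } catch (Exception e) {
                producer.abortTransaction(); // none of the writes become visible
                // (fatal errors such as ProducerFencedException require closing the producer instead)
                throw e;
            }
        }
    }
}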
@MatthiasJSax
Exactly-Once with Kafka
24
Kafka as Sink
Requirement: ability to track source system progress.
• result
• state (via changelogs)
• source progress (via custom metadata topic)
(A sketch of this pattern follows below.)
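Here is a hedged sketch of that pattern, assuming an external source addressed by a monotonically increasing position and a (e.g., compacted) "source-progress" metadata topic; the helper methods and all names are hypothetical, not part of any Kafka API.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Sketch: external source -> Kafka with source progress tracked in a metadata topic.
public class ExternalSourceToKafkaSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "external-source-loader-0");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();

            // Hypothetical helper: read the last committed position from the
            // "source-progress" topic (e.g., with a read_committed consumer).
            long position = readLastCommittedPosition();

            while (true) {
                // Hypothetical helper: fetch the next record after `position` from the external source.
                String record = pollExternalSource(position);
                if (record == null) { Thread.sleep(100); continue; }

                producer.beginTransaction();
                producer.send(new ProducerRecord<>("result", record));
                // Source progress is written in the SAME transaction as the result,
                // so a retry after a failure resumes exactly where the last commit left off.
                producer.send(new ProducerRecord<>("source-progress", "loader-0", Long.toString(position + 1)));
                producer.commitTransaction();
                position++;
            }
        }
    }

    static long readLastCommittedPosition() { return 0L; }            // placeholder
    static String pollExternalSource(long position) { return null; }  // placeholder
}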
@MatthiasJSax
Kafka Connect (Part 2)
Exactly-once Source
• “Exactly-once, Again: Adding EOS Support for Kafka Connect Source Connectors”
• Tomorrow: 2pm
• Chris Egerton, Aiven
• KIP-618 (Apache Kafka 3.3):
• https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors
25
@MatthiasJSax
Kafka Streams
26
Kafka Transactions
Atomic read-process-write pattern:
@MatthiasJSax
Kafka Streams
27
[Diagram: the transaction atomically writes to __consumer_offsets, the state changelogs, and the result topics]
Kafka Transactions
Multi-partition/multi-topic atomic write:
@MatthiasJSax
Kafka Streams
28
Kafka Transactions
Multi-partition/multi-topic atomic write:
producer.beginTransaction();
// state updates (changelogs + result)
producer.send(…);
producer.send(…);
…
producer.sendOffsetsToTransaction(…);
producer.commitTransaction(); // or .abortTransaction()
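Putting the pieces together, here is a hedged sketch of the full consume-process-produce loop that Kafka Streams runs internally (configuration values, topic names, and the trivial toUpperCase "processing" step are illustrative):

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;

// Sketch: atomic read-process-write loop using Kafka transactions.
public class ReadProcessWriteLoop {
    public static void main(String[] args) {
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092");
        cProps.put("group.id", "my-processing-app");
        cProps.put("enable.auto.commit", "false");       // offsets are committed via the producer instead
        cProps.put("isolation.level", "read_committed"); // never consume data from aborted transactions
        cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("transactional.id", "my-processing-app-0");
        pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(pProps)) {

            producer.initTransactions();
            consumer.subscribe(List.of("input"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> rec : records) {
                        String result = rec.value().toUpperCase(); // "process" step (stateless here)
                        producer.send(new ProducerRecord<>("output", rec.key(), result));
                        offsets.put(new TopicPartition(rec.topic(), rec.partition()),
                                    new OffsetAndMetadata(rec.offset() + 1));
                    }
                    // Input progress is committed as part of the SAME transaction as the output.
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    producer.abortTransaction(); // output and offsets roll back together, so retrying is safe
                    // (fatal errors such as ProducerFencedException require closing the producer instead)
                }
            }
        }
    }
}

Kafka Streams implements essentially this loop for you when processing.guarantee is set to exactly_once_v2.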
@MatthiasJSax
Kafka Streams
Single vs Multi-cluster
Kafka Streams currently only works against a single broker cluster:
• Does not really matter. We still rely on the brokers as target system.
• Need source offsets but commit them via the producer.
• Single broker cluster only avoids “dual” commit of source offsets.
Supporting cross-cluster EOS with Kafka Streams is possible:
• Add a custom metadata topic to the target cluster.
• Replace sendOffsetsToTransaction() with send().
• Fetch consumer offsets manually from the metadata topic.
• Issues:
• EOS v2 implementation (producer per thread) not possible.
• Limited to single target cluster.
29
@MatthiasJSax
The Big Challenge
Error Handling in a (Distributed) Application
Kafka transactions allow fencing of “zombie” producers (see the sketch below).
Any EOS target system needs to support something similar (or rely on idempotency if possible).
Kafka Connect Sink Connectors:
• Idempotency or sink system fencing required—Connect framework cannot help at all.
Kafka Connect Source Connectors:
• Relies on producer fencing.
• Uses a producer per task (similar to Kafka Streams’ EOS v1 implementation).
Kafka Streams:
• Relies on producer fencing (EOS v1) or consumer fencing (EOS v2).
• EOS v2 implementation (producer per thread) relies on consumer/producer integration inside the same broker cluster.
30
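A minimal sketch of what fencing looks like from the application’s point of view (broker address, topic, and transactional.id are illustrative): two producer instances share the same transactional.id, and initializing the newer one bumps the epoch so the older "zombie" is rejected on its next transactional operation.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.ProducerFencedException;

// Sketch: the broker fences the older of two producers sharing a transactional.id.
public class FencingExample {
    static KafkaProducer<String, String> newProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("transactional.id", "payment-processor-0"); // same id => same transactional identity
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }

    public static void main(String[] args) {
        KafkaProducer<String, String> zombie = newProducer();
        zombie.initTransactions();

        KafkaProducer<String, String> successor = newProducer();
        successor.initTransactions(); // bumps the epoch for this transactional.id: "zombie" is now fenced

        try {
            zombie.beginTransaction();
            zombie.send(new ProducerRecord<>("result", "k", "stale write"));
            zombie.commitTransaction(); // rejected: the broker sees an old epoch
        } catch (ProducerFencedException fenced) {
            zombie.close(); // the only valid reaction: this instance must shut down
        }
        successor.close();
    }
}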
@MatthiasJSax
What to do in Practice?
Publishing with producer-only app?
The important thing is to figure out where to resume on restart:
• Is there any “source progress” information you can store?
• You need to add a consumer to your app!
• On app restart:
• Initialize producer to fence potential zombie and to force any pending TX to complete.
• Use a consumer (in read-committed mode) to inspect the target cluster’s data (see the sketch below).
Reading with consumer-only app?
• If there is no target data system, only idempotency can help.
• With no target data system, everything is basically a side-effect.
31
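A hedged sketch of that restart procedure for a producer-only app (topic, group, and transactional.id names are illustrative; it assumes each output record carries enough information to derive the source position):

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.TopicPartition;

// Sketch: restart procedure for a transactional producer-only application.
public class ProducerAppRestart {
    public static void main(String[] args) {
        // 1) Initialize the transactional producer first: this fences any zombie
        //    instance and forces any pending transaction to complete or abort.
        Properties pProps = new Properties();
        pProps.put("bootstrap.servers", "localhost:9092");
        pProps.put("transactional.id", "publisher-0");
        pProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        pProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaProducer<String, String> producer = new KafkaProducer<>(pProps);
        producer.initTransactions();

        // 2) Use a read_committed consumer to inspect what actually made it into the
        //    output topic and derive the source position to resume from.
        Properties cProps = new Properties();
        cProps.put("bootstrap.servers", "localhost:9092");
        cProps.put("group.id", "publisher-restart-check");
        cProps.put("isolation.level", "read_committed"); // ignore writes of aborted transactions
        cProps.put("enable.auto.commit", "false");
        cProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        cProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        String lastPublished = null;
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cProps)) {
            TopicPartition tp = new TopicPartition("output", 0);
            consumer.assign(List.of(tp));
            long end = consumer.endOffsets(List.of(tp)).get(tp);
            // Rewind a few records (transaction markers occupy offsets, too) and keep the last value read.
            consumer.seek(tp, Math.max(0L, end - 10));
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(5))) {
                lastPublished = rec.value();
            }
        }
        System.out.println("Resume publishing after: " + lastPublished);
        // ... continue publishing from the derived source position, then close the producer.
        producer.close();
    }
}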
@MatthiasJSax
Exactly-once Key Takeaways
(A) no producer-only EOS
(B) no consumer-only EOS
(C) read-process-write pattern
(1) need ability to track source system read progress
(2) require target system atomic write (plus fencing)
(3) source system progress is recorded in target system
Kafka built-in support via transactions + Zero coding with Kafka Streams
✅
@MatthiasJSax