SlideShare a Scribd company logo
1 of 68
1
Introducing Exactly Once
Semantics in Apache Kafka
Jason Gustafson, Guozhang Wang, Sriram
Subramaniam, and Apurva Mehta
2
On deck..
• Kafka’s existing delivery semantics.
• Why did we improve them?
• What’s new?
• How do you use it?
• Summary.
3
Apache Kafka’s existing semantics
4
Existing Semantics
5
Existing Semantics
6
Existing Semantics
7
Existing Semantics
8
Existing Semantics
9
Existing Semantics
10
Existing Semantics
11
Existing Semantics
12
Existing Semantics
13
TL;DR – What we have today
• At least once in order delivery per partition.
• Producer retries can introduce duplicates.
14
Why improve?
15
Why improve?
• Stream processing is becoming an ever bigger part of the
data landscape.
• Apache Kafka is the heart of the streams platform.
• Strengthening Kafka’s semantics expands the universe of
streaming applications.
16
A motivating example..
A peer to peer lending platform which processes micro-loans
between users.
17
A Peer to Peer Lender
18
The Basic Flow
19
Offset commits
20
Reprocessed transfer, eek!
21
Lost money! Eek eek!
22
What’s new?
23
What’s new
• Exactly once in order delivery per partition
• Atomic writes across multiple partitions
• Performance considerations
24
What’s new, Part 1
Exactly once, in order, delivery per partition
25
The idempotent producer
26
The idempotent producer
27
The idempotent producer
28
The idempotent producer
29
The idempotent producer
30
The idempotent producer
31
The idempotent producer
32
The idempotent producer
33
TL;DR
• Sequence numbers and producer ids:
• enable de-dup
• are in the log.
• Hence de-dup works transparently across leader
changes.
• Will not de-dup application-level resends.
• Works transparently – no API changes.
34
What’s new, part 2
Multi partition writes.
35
Introducing ‘transactions’
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.sendOffsetsToTxn(…);
producer.commitTransaction();
} catch (ProducerFencedException e) {
producer.close();
} catch (KafkaException e) {
producer.abortTransaction();
}
36
Introducing ‘transactions’
37
Initializing ‘transactions’
38
Transactional sends – part 1
39
Transactional sends – part 2
40
Commit – phase 1
41
Commit – phase 2
42
Commit – phase 2
43
Success!
44
Let’s review the APIs
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.sendOffsetsToTxn(…);
producer.commitTransaction();
} catch (ProducerFencedException e) {
producer.close();
} catch (KafkaException e) {
producer.abortTransaction();
}
45
Let’s review the APIs
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.sendOffsetsToTxn(…);
producer.commitTransaction();
} catch (ProducerFencedException e) {
producer.close();
} catch (KafkaException e) {
producer.abortTransaction();
}
46
Let’s review the APIs
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.sendOffsetsToTxn(…);
producer.commitTransaction();
} catch (ProducerFencedException e) {
producer.close();
} catch (KafkaException e) {
producer.abortTransaction();
}
47
Let’s review the APIs
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.sendOffsetsToTxn(…);
producer.commitTransaction();
} catch (ProducerFencedException e) {
producer.close();
} catch (KafkaException e) {
producer.abortTransaction();
}
48
Let’s review the APIs
producer.initTransactions();
try {
producer.beginTransaction();
producer.send(record0);
producer.send(record1);
producer.sendOffsetsToTxn(…);
producer.commitTransaction();
} catch (ProducerFencedException e) {
producer.close();
} catch (KafkaException e) {
producer.abortTransaction();
}
49
Consumer returns only committed messages
50
Some notes on consuming transactions
• Two ‘isolation levels’ : read_committed, and
read_uncommitted.
• Messages read in offset order.
• read_committed consumers read to the point where there
are no open transactions.
51
TL;DR
• Transaction coordinator and transaction log maintain
transaction state.
• Use the new producer APIs for transactions.
• Consumers can read only committed messages.
52
Part 3
Performance!
53
What’s new, part 3: Performance boost!
• Up to +20% producer throughput
• Up to +50% consumer throughput
• Up to -20% disk utilization
• Savings start when you batch
• Details: https://bit.ly/kafka-eos-perf
54
Too good to be true?
Let’s understand how!
55
The old message format
56
The new format
57
The new format -> new fields
58
The new format -> new fields
59
The new format -> delta encoding
60
A visual comparison with 7 records, 10 bytes each
61
TL;DR
• With a batch size of 2, the new format starts saving
space.
• Savings are maximal for large batches of small
messages.
• Hence higher throughput when IO bound.
• Works as soon as you upgrade to the new format.
62
Cool!
But how do I use this?
63
Producer Configs
• enable.idempotence = true
• max.inflight.requests.per.connection=1
• acks = “all”
• retries > 1 (preferably MAX_INT)
• transactional.id = ‘some unique id’
• enable.idempotence = true
64
Consumer configs
• isolation.level:
• “read_committed”, or
• “read_uncommitted”
65
Streams config
• processing.mode = “exactly_once”
66
Putting it together
• We understood Kafka’s existing delivery semantics
• Understood why we want to improve them
• Learned how these have been strengthened
• Learned how the new semantics work
67
When is it available?
Available to try in Kafka 0.11, June 2017.
68
Thank You!

More Related Content

What's hot

Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...
Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...
Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...
HostedbyConfluent
 

What's hot (20)

Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Apache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - VerisignApache Kafka 0.8 basic training - Verisign
Apache Kafka 0.8 basic training - Verisign
 
Microservices architecture
Microservices architectureMicroservices architecture
Microservices architecture
 
Event Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQEvent Sourcing & CQRS, Kafka, Rabbit MQ
Event Sourcing & CQRS, Kafka, Rabbit MQ
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safe
 
IBM MQ High Availability 2019
IBM MQ High Availability 2019IBM MQ High Availability 2019
IBM MQ High Availability 2019
 
IBM MQ Online Tutorials
IBM MQ Online TutorialsIBM MQ Online Tutorials
IBM MQ Online Tutorials
 
Kubernetes 101
Kubernetes 101Kubernetes 101
Kubernetes 101
 
Kafka Streams for Java enthusiasts
Kafka Streams for Java enthusiastsKafka Streams for Java enthusiasts
Kafka Streams for Java enthusiasts
 
Apache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel QuarkusApache Camel v3, Camel K and Camel Quarkus
Apache Camel v3, Camel K and Camel Quarkus
 
Qlik composeのご紹介
Qlik composeのご紹介Qlik composeのご紹介
Qlik composeのご紹介
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
CQRS and Event Sourcing for Java Developers
CQRS and Event Sourcing for Java DevelopersCQRS and Event Sourcing for Java Developers
CQRS and Event Sourcing for Java Developers
 
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and LinkerdService Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
Service Mesh with Apache Kafka, Kubernetes, Envoy, Istio and Linkerd
 
Building Cloud-Native Applications with Helidon
Building Cloud-Native Applications with HelidonBuilding Cloud-Native Applications with Helidon
Building Cloud-Native Applications with Helidon
 
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
Simplifying Distributed Transactions with Sagas in Kafka (Stephen Zoio, Simpl...
 
TECHTALK 20200825 RやPythonとの連携で実現するQlik Senseの高度な分析
TECHTALK 20200825 RやPythonとの連携で実現するQlik Senseの高度な分析TECHTALK 20200825 RやPythonとの連携で実現するQlik Senseの高度な分析
TECHTALK 20200825 RやPythonとの連携で実現するQlik Senseの高度な分析
 
IBM MQ V9 Overview
IBM MQ V9 OverviewIBM MQ V9 Overview
IBM MQ V9 Overview
 
Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...
Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...
Robinhood’s Kafkaproxy: Decoupling Kafka Consumer Logic from Application Busi...
 

Viewers also liked

Viewers also liked (8)

Working Effectively With Legacy Code
Working Effectively With Legacy CodeWorking Effectively With Legacy Code
Working Effectively With Legacy Code
 
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark ProcessingBulletproof Jobs: Patterns For Large-Scale Spark Processing
Bulletproof Jobs: Patterns For Large-Scale Spark Processing
 
Open Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache AtlasOpen Metadata and Governance with Apache Atlas
Open Metadata and Governance with Apache Atlas
 
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams APIuser Behavior Analysis with Session Windows and Apache Kafka's Streams API
user Behavior Analysis with Session Windows and Apache Kafka's Streams API
 
Avro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and HadoopAvro Tutorial - Records with Schema for Kafka and Hadoop
Avro Tutorial - Records with Schema for Kafka and Hadoop
 
optics ppt
optics pptoptics ppt
optics ppt
 
Intro to Pinot (2016-01-04)
Intro to Pinot (2016-01-04)Intro to Pinot (2016-01-04)
Intro to Pinot (2016-01-04)
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 

Similar to Introducing Exactly Once Semantics To Apache Kafka

Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 
The Future of Messaging: RabbitMQ and AMQP
The Future of Messaging: RabbitMQ and AMQP The Future of Messaging: RabbitMQ and AMQP
The Future of Messaging: RabbitMQ and AMQP
Eberhard Wolff
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
Sadayuki Furuhashi
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
Sadayuki Furuhashi
 

Similar to Introducing Exactly Once Semantics To Apache Kafka (20)

Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache KafkaKafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
Kafka Summit NYC 2017 - Introducing Exactly Once Semantics in Apache Kafka
 
Exactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache KafkaExactly-once Semantics in Apache Kafka
Exactly-once Semantics in Apache Kafka
 
Springone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and ReactorSpringone2gx 2014 Reactive Streams and Reactor
Springone2gx 2014 Reactive Streams and Reactor
 
BigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache ApexBigDataSpain 2016: Stream Processing Applications with Apache Apex
BigDataSpain 2016: Stream Processing Applications with Apache Apex
 
Apache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream ProcessorApache Flink(tm) - A Next-Generation Stream Processor
Apache Flink(tm) - A Next-Generation Stream Processor
 
Exactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J SaxExactly-once Stream Processing Done Right with Matthias J Sax
Exactly-once Stream Processing Done Right with Matthias J Sax
 
Stream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas WeiseStream Processing use cases and applications with Apache Apex by Thomas Weise
Stream Processing use cases and applications with Apache Apex by Thomas Weise
 
Stream processing in python with Apache Samza and Beam
Stream processing in python with Apache Samza and BeamStream processing in python with Apache Samza and Beam
Stream processing in python with Apache Samza and Beam
 
My internship presentation at WSO2
My internship presentation at WSO2My internship presentation at WSO2
My internship presentation at WSO2
 
Highly concurrent yet natural programming
Highly concurrent yet natural programmingHighly concurrent yet natural programming
Highly concurrent yet natural programming
 
Why scala is not my ideal language and what I can do with this
Why scala is not my ideal language and what I can do with thisWhy scala is not my ideal language and what I can do with this
Why scala is not my ideal language and what I can do with this
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
The Future of Messaging: RabbitMQ and AMQP
The Future of Messaging: RabbitMQ and AMQP The Future of Messaging: RabbitMQ and AMQP
The Future of Messaging: RabbitMQ and AMQP
 
/* pOrt80BKK */ - PHP Day - PHP Performance with APC + Memcached for Windows
/* pOrt80BKK */ - PHP Day - PHP Performance with APC + Memcached for Windows/* pOrt80BKK */ - PHP Day - PHP Performance with APC + Memcached for Windows
/* pOrt80BKK */ - PHP Day - PHP Performance with APC + Memcached for Windows
 
Journey into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka StreamsJourney into Reactive Streams and Akka Streams
Journey into Reactive Streams and Akka Streams
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
 
High Availability for OpenStack
High Availability for OpenStackHigh Availability for OpenStack
High Availability for OpenStack
 
Reactive programming with examples
Reactive programming with examplesReactive programming with examples
Reactive programming with examples
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 

Recently uploaded

School management system project report.pdf
School management system project report.pdfSchool management system project report.pdf
School management system project report.pdf
Kamal Acharya
 
Teachers record management system project report..pdf
Teachers record management system project report..pdfTeachers record management system project report..pdf
Teachers record management system project report..pdf
Kamal Acharya
 
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
tuuww
 

Recently uploaded (20)

Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientist
 
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
 
School management system project report.pdf
School management system project report.pdfSchool management system project report.pdf
School management system project report.pdf
 
Construction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptxConstruction method of steel structure space frame .pptx
Construction method of steel structure space frame .pptx
 
Electrical shop management system project report.pdf
Electrical shop management system project report.pdfElectrical shop management system project report.pdf
Electrical shop management system project report.pdf
 
Teachers record management system project report..pdf
Teachers record management system project report..pdfTeachers record management system project report..pdf
Teachers record management system project report..pdf
 
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
 
An improvement in the safety of big data using blockchain technology
An improvement in the safety of big data using blockchain technologyAn improvement in the safety of big data using blockchain technology
An improvement in the safety of big data using blockchain technology
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data StreamKIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
KIT-601 Lecture Notes-UNIT-3.pdf Mining Data Stream
 
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering Workshop
 
Attraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptxAttraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptx
 
retail automation billing system ppt.pptx
retail automation billing system ppt.pptxretail automation billing system ppt.pptx
retail automation billing system ppt.pptx
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdf
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdf
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
 
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
The battle for RAG, explore the pros and cons of using KnowledgeGraphs and Ve...
 
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES  INTRODUCTION UNIT-IENERGY STORAGE DEVICES  INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
 

Introducing Exactly Once Semantics To Apache Kafka

Editor's Notes

  1. Stress the application level resends bit. Encourage people to rely on the producer retries and not re-send messages from their apps.
  2. New concept – control messages. Mention that commit markers are special message with log the producer id and the result of the transaction. These messages are not passed on to application – the client interprets them and acts accordingly.
  3. Mention ’read_uncommitted’ Mention that the buffering is broker side.
  4. Mention transaction expiration.
  5. Specify that you don’t need to use any of the new features to get these performance savings.
  6. max.inflight is required for idempotence. It will cause a slowdown because you now have a sync producer.