Deep dive into Apache
Kafka consumption
Goals
• Better understanding of Apache Kafka architecture
and possible delivery guarantees
• The happy coding path towards fault-tolerant Kafka
consumption using Kafka Java client and Akka
Stream
Apache Kafka?
“Apache Kafka is publish-subscribe messaging system
rethought as a distributed commit log.”
[Diagram: two consumers poll messages from the partitions. Ordering is guaranteed per partition.]
[Diagram: a consumer polls messages and commits its offsets to the commit storage (ZK, Kafka, …)]
• commit: (Partition 0, Offset 5), (Partition 1, Offset 3), (Partition 2, Offset 10)
[Diagram: a restarting consumer gets its committed offsets from the commit storage (ZK, Kafka, …) before polling messages again]
• get: (Partition 0, Offset 5), (Partition 1, Offset 3), (Partition 2, Offset 10)
Same consumer-group (balance)
• Consumer 1 commits: (Partition 0, Offset 5), (Partition 1, Offset 3)
• Consumer 2 commits: (Partition 2, Offset 10)
Different consumer-groups (broadcast)
• Consumer 1 commits: (Partition 0, Offset 5), (Partition 1, Offset 3), (Partition 2, Offset 10)
• Consumer 2 commits: (Partition 0, Offset 2), (Partition 1, Offset 1), (Partition 2, Offset 3)
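The balance and broadcast behaviours can be sketched with a toy assignment function. This is only an illustration: `assign` and `broadcast` are hypothetical names, and the contiguous split below is an assumption chosen to match the diagram, not Kafka's actual assignment logic. It also assumes each group has at least one member.

```scala
object ConsumerGroupDemo {
  // Hypothetical: split the partitions into contiguous chunks, one chunk per group member.
  def assign(partitions: Vector[Int], members: Vector[String]): Map[String, Vector[Int]] = {
    val chunk = math.max(1, (partitions.length + members.length - 1) / members.length)
    val chunks = partitions.grouped(chunk).toVector.padTo(members.length, Vector.empty[Int])
    members.zip(chunks).toMap
  }

  // Every group is assigned all partitions independently of the other groups.
  def broadcast(partitions: Vector[Int],
                groups: Map[String, Vector[String]]): Map[String, Map[String, Vector[Int]]] =
    groups.map { case (group, members) => group -> assign(partitions, members) }
}
```

With three partitions, two consumers in one group split them (0 and 1 for the first, 2 for the second), while two consumers in two different groups each receive all three.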
Delivery guarantees: commit before
loop:
1. Get message
2. Commit offset
3. Begin message processing
4. End message processing
• Node failure / redeployment / processing failure between steps 2 and 4: message lost!
• At-most-once guarantee
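The commit-before loop can be simulated without a broker. This is a minimal sketch under stated assumptions: `Partition` is a hypothetical in-memory stand-in for one Kafka partition plus its commit storage, and the crash is simulated by an exception thrown while processing "b".

```scala
object AtMostOnceDemo {
  // Hypothetical in-memory partition; `committed` is the offset to resume from after a restart.
  final class Partition(val log: Vector[String]) { var committed: Int = 0 }

  // Runs until the log is exhausted or `process` throws (simulated crash).
  def consume(p: Partition, process: String => Unit): Unit = {
    var offset = p.committed
    while (offset < p.log.length) {
      val msg = p.log(offset)  // 1. get message
      p.committed = offset + 1 // 2. commit offset
      process(msg)             // 3-4. process message
      offset += 1
    }
  }

  def run(): Vector[String] = {
    val p = new Partition(Vector("a", "b", "c"))
    val processed = Vector.newBuilder[String]
    var crashed = false
    def handler(msg: String): Unit = {
      if (msg == "b" && !crashed) { crashed = true; throw new RuntimeException("crash") }
      processed += msg
    }
    try consume(p, handler) catch { case _: RuntimeException => () } // first run dies on "b"
    consume(p, handler) // restart resumes from the committed offset
    processed.result()
  }
}
```

After the restart the consumer resumes at "c", so "b" is never processed: each message is handled at most once.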
Delivery guarantees: commit after
loop:
1. Get message
2. Begin message processing
3. End message processing
4. Commit offset
• Node failure / redeployment / processing failure before step 4: message processed twice!
• At-least-once guarantee
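The commit-after loop can be simulated the same way. Again a sketch with a hypothetical in-memory `Partition`; here the simulated node dies right after processing "b" but before its offset is committed.

```scala
object AtLeastOnceDemo {
  // Hypothetical in-memory partition; `committed` is the offset to resume from after a restart.
  final class Partition(val log: Vector[String]) { var committed: Int = 0 }

  // Runs until the log is exhausted or `process` throws (simulated crash).
  def consume(p: Partition, process: String => Unit): Unit = {
    var offset = p.committed
    while (offset < p.log.length) {
      process(p.log(offset))   // 1-3. get and process message
      p.committed = offset + 1 // 4. commit offset
      offset += 1
    }
  }

  def run(): Vector[String] = {
    val p = new Partition(Vector("a", "b", "c"))
    val processed = Vector.newBuilder[String]
    var crashed = false
    def handler(msg: String): Unit = {
      processed += msg // the side effect happens first
      if (msg == "b" && !crashed) { crashed = true; throw new RuntimeException("died before commit") }
    }
    try consume(p, handler) catch { case _: RuntimeException => () } // first run dies after "b"
    consume(p, handler) // restart resumes from the last committed offset
    processed.result()
  }
}
```

Because the offset of "b" was never committed, the restarted consumer processes "b" a second time: nothing is lost, but duplicates are possible.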
Delivery guarantees: auto-commit
loop:
1. Get message
2. Begin message processing
3. End message processing
• Node failure / redeployment / processing failure: message lost OR processed twice!
• No guarantee
Delivery guarantees:
exactly-once?
• At-least-once + idempotent message processing
• ex: update a key-value DB that stores the last
state of a device
• At-least-once + atomic message processing and
storage of offset
• ex: store offset + message in a SQL DB in a
transaction, and use this DB as the main offset
storage
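Both options can be illustrated with a small sketch. The names `Msg`, `Snapshot` and `applyMsg` are hypothetical; the single immutable `Snapshot` value stands in for a database transaction that writes the device state and the offset together.

```scala
object ExactlyOnceDemo {
  final case class Msg(offset: Long, deviceId: String, state: String)

  // Device states and the next expected offset are stored together,
  // standing in for one atomic DB transaction.
  final case class Snapshot(nextOffset: Long, states: Map[String, String])

  // Applies a message only if its offset has not been stored yet.
  def applyMsg(s: Snapshot, m: Msg): Snapshot =
    if (m.offset < s.nextOffset) s // redelivered message: already applied, ignore
    else Snapshot(m.offset + 1, s.states.updated(m.deviceId, m.state))

  def run(): Snapshot = {
    val delivered = Vector(
      Msg(0, "dev-1", "on"),
      Msg(1, "dev-2", "off"),
      Msg(1, "dev-2", "off"), // redelivered after a simulated restart (at-least-once)
      Msg(2, "dev-1", "off")
    )
    delivered.foldLeft(Snapshot(0, Map.empty))(applyMsg)
  }
}
```

The redelivered message changes nothing, so the stored result is the same as if every message had been delivered exactly once.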
How can I apply these
concepts in my code?
Kafka Java client: at-least-once
Async non-blocking?
• In a Reactive/Scala world, message processing is usually asynchronous (non-blocking IO call to a DB, ask Akka actor, …):
    def processMsg(message: String): Future[Result]
• How to process your Kafka messages while staying reactive (i.e. not blocking threads)?
Kafka Java client: async non-blocking?
• Out-of-order processing!
• No guarantee anymore! (offset N can be committed before N-1, “shadowing” N-1)
• Unbounded number of messages in memory. If the Kafka message rate > processing speed, this can lead to Out Of Memory
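The shadowing problem can be reproduced deterministically with promises. This is a sketch under stated assumptions: the same-thread `ExecutionContext` and the `committed` variable are hypothetical stand-ins, and the naive strategy shown (commit each offset as soon as its own processing completes) is the one the bullets above warn about.

```scala
import scala.concurrent.{ExecutionContext, Promise}

object ShadowingDemo {
  // Runs callbacks on the calling thread so the demo is deterministic.
  private implicit val sameThread: ExecutionContext = new ExecutionContext {
    def execute(r: Runnable): Unit = r.run()
    def reportFailure(t: Throwable): Unit = throw t
  }

  def run(): (Long, Vector[Long]) = {
    var committed = -1L               // last committed offset
    var finished = Vector.empty[Long] // offsets whose processing actually completed
    val inFlight = Vector(0L, 1L, 2L).map(o => o -> Promise[Unit]())

    // Naive approach: start everything at once, commit whenever one message completes.
    inFlight.foreach { case (offset, p) =>
      p.future.foreach { _ => finished :+= offset; committed = offset }
    }

    // Message 2 completes first, then the process dies: 0 and 1 never finish.
    inFlight(2)._2.success(())
    (committed, finished)
  }
}
```

The committed offset is 2 although only message 2 was processed, so a restarted consumer would skip messages 0 and 1.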
What do we need?
Ordered asynchronous stream processing
with back pressure
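A hand-rolled approximation of that requirement, using only plain Scala Futures: the next message is pulled only once the previous one has completed, so processing is ordered and at most one message is in flight. `processInOrder` is a hypothetical helper, and this is not an implementation of Reactive Streams, only a sketch of the idea with a fixed demand of one.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object OrderedAsyncDemo {
  import ExecutionContext.Implicits.global

  // Pulls the next message only after the previous Future has completed.
  def processInOrder[A, B](messages: Iterator[A])(processMsg: A => Future[B]): Future[Vector[B]] = {
    def loop(acc: Vector[B]): Future[Vector[B]] =
      if (!messages.hasNext) Future.successful(acc)
      else processMsg(messages.next()).flatMap(b => loop(acc :+ b))
    loop(Vector.empty)
  }

  def run(): Vector[String] = {
    val result = processInOrder(Iterator("a", "b", "c"))(m => Future(m.toUpperCase))
    Await.result(result, 5.seconds) // blocking only to collect the demo result
  }
}
```

A demand of one trades throughput for simplicity; a streaming library can keep several messages in flight while still emitting results in order.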
ENTER REACTIVE STREAMS
Reactive Streams
• “Reactive Streams is an initiative to provide a
standard for asynchronous stream processing with
non-blocking back pressure.”
• Backed by Netflix, Pivotal, Red Hat, Twitter,
Lightbend (Typesafe), …
• Implementations: RxJava, Akka Stream, Reactor, …
Akka Stream
• Stream processing abstraction on top of Akka
Actors
• Types! Types are back!
• Source[A] ~> Flow[A, B] ~> Sink[B]
• Automatic back pressure
Reactive Kafka
• Akka Stream client for Kafka
• On top of Kafka Java client 0.9+
• https://github.com/akka/reactive-kafka
Reactive Kafka
• At-least-once semantics in case of node failure / redeployment
• Asynchronous processing without blocking any thread
• Back pressure
• Ordered processing
• But what if the processMsg function fails?
The difference between
Error and Failure
• Error: something went wrong, and this is deterministic (it will happen again if you do the same call)
  • ex: HTTP 4xx, Deserialisation exception, Duplicate key DB error
• Failure: something went wrong, and this is not deterministic (it may not happen again if you do the same call)
  • ex: HTTP 5xx, network exception
Error and Failure in Scala code using Scalactic
Future[Result Or Every[Error]]
• Future: can contain a Failure
• Every[Error]: can contain one or more Errors
Error and Failure in Scala code (non-async)
Try[Result Or Every[Error]]
• Try: can contain a Failure
• Every[Error]: can contain one or more Errors
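The same split can be sketched with the standard library alone, using `Either` in place of Scalactic's `Or` and `List` in place of `Every`. The `Decision` type and the mapping below (commit on success, skip deterministic errors, retry transient failures) are assumptions made for illustration, not something the slides spell out.

```scala
import scala.util.{Failure, Success, Try}

object ErrorVsFailureDemo {
  // Hypothetical domain error: deterministic, so retrying the same call will not help.
  final case class Error(reason: String)

  sealed trait Decision
  case object Commit extends Decision     // processed fine
  case object SkipAndLog extends Decision // deterministic error: log it and move on
  case object Retry extends Decision      // transient failure: try the same call again

  // Try carries a Failure (an exception); Left carries one or more Errors.
  def decide(outcome: Try[Either[List[Error], String]]): Decision = outcome match {
    case Success(Right(_)) => Commit
    case Success(Left(_))  => SkipAndLog
    case Failure(_)        => Retry
  }
}
```

Keeping the two cases in different layers of the type makes it hard to retry something that can never succeed, or to drop a message because of a transient outage.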
Fault-tolerant consumption
with Reactive Kafka
Keeping message ordering
even in failure cases
• Retrying message processing upon failures will block the processing of subsequent messages, but that’s ok if message processing is homogeneous
• ex: if processMsg of msg N results in a network failure calling a DB (say ELS), there is a high probability that processMsg of msg N+1 will encounter the same failure, so blocking is ok and even better to avoid losing messages due to transient failures
• If message processing is heterogeneous (calling different external systems according to the msg), it is better to implement different consumer-groups and/or have different topics
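Retrying while keeping the order can be sketched with plain Futures. `retry` is a hypothetical helper and the `processMsg` below is a stand-in whose first call fails with a transient exception; a real retry would usually also wait between attempts, which is left out here.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object RetryDemo {
  import ExecutionContext.Implicits.global

  // Retries a failed Future up to maxRetries more times (no backoff in this sketch).
  def retry[A](maxRetries: Int)(op: () => Future[A]): Future[A] =
    op().recoverWith { case _ if maxRetries > 0 => retry(maxRetries - 1)(op) }

  def run(): (Vector[String], Int) = {
    val calls = new java.util.concurrent.atomic.AtomicInteger(0)
    // Hypothetical processMsg: the very first call hits a transient failure.
    def processMsg(m: String): Future[String] = Future {
      if (calls.incrementAndGet() == 1) throw new java.io.IOException("transient")
      m.toUpperCase
    }
    // Messages are chained one after another, so retrying msg N delays msg N+1.
    val all = Vector("a", "b").foldLeft(Future.successful(Vector.empty[String])) { (accF, m) =>
      accF.flatMap(acc => retry(3)(() => processMsg(m)).map(acc :+ _))
    }
    (Await.result(all, 5.seconds), calls.get())
  }
}
```

Message "b" is only started once "a" has finally succeeded, so the output order matches the input order at the cost of one extra call.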
Reminder: Kafka guarantees ordering only per partition
• If #(consumer instances) < #(Kafka partitions), at least one consumer instance will process two or more partitions
Parallel processing between partitions
while keeping ordering per partition
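A minimal sketch of that idea with plain Futures: records are grouped by partition, each group is processed as a sequential chain, and the chains run concurrently. `Record` and `processPartition` are hypothetical names introduced for the example.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PerPartitionDemo {
  import ExecutionContext.Implicits.global

  final case class Record(partition: Int, offset: Long, value: String)

  // Processes the records of one partition strictly in offset order.
  def processPartition(records: Vector[Record])(processMsg: Record => Future[String]): Future[Vector[String]] =
    records.sortBy(_.offset).foldLeft(Future.successful(Vector.empty[String])) { (accF, r) =>
      accF.flatMap(acc => processMsg(r).map(acc :+ _))
    }

  def run(): Map[Int, Vector[String]] = {
    val polled = Vector(
      Record(0, 0, "a0"), Record(1, 0, "b0"), Record(0, 1, "a1"), Record(1, 1, "b1")
    )
    // One sequential chain per partition; the chains themselves run concurrently.
    val chains = polled.groupBy(_.partition).toVector.map { case (p, rs) =>
      processPartition(rs)(r => Future(r.value.toUpperCase)).map(p -> _)
    }
    Await.result(Future.sequence(chains), 5.seconds).toMap
  }
}
```

No ordering is promised between partitions 0 and 1, which is exactly the guarantee Kafka itself gives.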
Bonus: auto-adaptive micro-batching
windows per partition based on back
pressure signal
Dynamic trade-off between latency and throughput!
Conclusion
• Apache Kafka as a system is scalable and fault-tolerant, but fault-tolerant consumption can be tricky
• But with the right concepts and the right tools, we can make Kafka consumption fault-tolerant very easily (i.e. with a few lines of extra code)
Thank you!
Questions?

Deep dive into Apache Kafka consumption

  • 1. Deep dive into Apache Kafka consumption
  • 2. Goals • Better understanding of Apache Kafka architecture and possible delivery guarantees • The happy coding path towards fault-tolerant Kafka consumption using Kafka Java client and Akka Stream
  • 3. Apache Kafka? “Apache Kafka is publish-subscribe messaging system rethought as a distributed commit log.”
  • 4.
  • 5.
  • 8. Consumer Commit storage (ZK, Kafka, …) commit: (Partition 0, Offset 5)
 (Partition 1, Offset 3) (Partition 2, Offset 10) poll messages
  • 10. Consumer Commit storage (ZK, Kafka, …) get: (Partition 0, Offset 5)
 (Partition 1, Offset 3) (Partition 2, Offset 10) poll messages restarting…
  • 11. Consumer 1 Commit storage (ZK, Kafka, …) commit: (Partition 0, Offset 5)
 (Partition 1, Offset 3) poll messages Consumer 2 commit:
 (Partition 2, Offset 10) Same consumer-group
 (balance)
  • 12. Consumer 1 Commit storage (ZK, Kafka, …) commit: (Partition 0, Offset 5)
 (Partition 1, Offset 3) (Partition 2, Offset 10) poll messages Consumer 2 Different consumer-groups
 (broadcast) commit: (Partition 0, Offset 2)
 (Partition 1, Offset 1) (Partition 2, Offset 3)
  • 13. Delivery guarantees: commit before 1. Get message 2. Commit offset 3. Begin message processing 4. End message processing loop:
  • 14. Delivery guarantees: commit before. Loop: 1. Get message 2. Commit offset 3. Begin message processing 4. End message processing. Node failure / redeployment / processing failure after the commit: message lost! At-most-once guarantee
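The "commit before" loop can be sketched in plain Java. This is a minimal simulation, not Kafka client code: the list `log` stands in for one partition, the integer `committed` stands in for the commit storage, and `crashAt` is a hypothetical knob that simulates a node failure while a given offset is being processed.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for one partition and its committed offset (no Kafka dependency).
class CommitBeforeDemo {
    /**
     * Runs the "commit before" loop starting at the committed offset.
     * If crashAt >= 0, simulates a node failure while processing that offset.
     * Returns the committed offset; processed messages are appended to `processed`.
     */
    static int run(List<String> log, int committed, int crashAt, List<String> processed) {
        for (int offset = committed; offset < log.size(); offset++) {
            String msg = log.get(offset);   // 1. get message
            committed = offset + 1;         // 2. commit offset (next offset to read)
            if (offset == crashAt) {
                return committed;           // failure between step 2 and step 4
            }
            processed.add(msg);             // 3-4. process message
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2");
        List<String> processed = new ArrayList<>();
        int committed = run(log, 0, 1, processed);  // crash while processing m1
        run(log, committed, -1, processed);         // restart from the committed offset
        System.out.println(processed);              // m1 is never processed
    }
}
```

Because the offset is committed before processing finishes, the restart resumes after the message that was in flight, so that message is skipped.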
  • 15. Delivery guarantees: commit after. Loop: 1. Get message 2. Begin message processing 3. End message processing 4. Commit offset. Node failure / redeployment / processing failure before the commit: message processed twice! At-least-once guarantee
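The "commit after" loop can be simulated the same way. Again this is a sketch with in-memory stand-ins rather than the Kafka client: `log` plays the partition, `committed` plays the commit storage, and `crashAt` is a hypothetical parameter that simulates a failure after processing but before the commit.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory stand-in for one partition and its committed offset (no Kafka dependency).
class CommitAfterDemo {
    /**
     * Runs the "commit after" loop starting at the committed offset.
     * If crashAt >= 0, simulates a failure right after processing that offset,
     * before its commit. Returns the committed offset.
     */
    static int run(List<String> log, int committed, int crashAt, List<String> processed) {
        for (int offset = committed; offset < log.size(); offset++) {
            String msg = log.get(offset);   // 1. get message
            processed.add(msg);             // 2-3. process message
            if (offset == crashAt) {
                return committed;           // failure between step 3 and step 4
            }
            committed = offset + 1;         // 4. commit offset
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> log = List.of("m0", "m1", "m2");
        List<String> processed = new ArrayList<>();
        int committed = run(log, 0, 1, processed);  // crash after processing m1
        run(log, committed, -1, processed);         // restart from the committed offset
        System.out.println(processed);              // m1 appears twice
    }
}
```

No message is skipped, but the one that was processed and not yet committed is delivered again after the restart.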
  • 16. Delivery guarantees: auto-commit. Loop: 1. Get message 2. Begin message processing 3. End message processing. Node failure / redeployment / processing failure: message lost OR processed twice! No guarantee
  • 17. Delivery guarantees: exactly-once? • At-least-once + idempotent message processing • ex: update a key-value DB that stores the last state of a device • At-least-once + atomic message processing and storage of offset • ex: store offset + message in a SQL DB in a transaction, and use this DB as the main offset storage
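The first option (at-least-once plus idempotent processing) can be illustrated with the key-value example from the slide. This is a minimal sketch: the `HashMap` stands in for the key-value DB, and `DeviceState` is a hypothetical message shape introduced only for the illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IdempotentUpsertDemo {
    // Hypothetical message shape: a device id plus its latest reported state.
    static final class DeviceState {
        final String deviceId;
        final String state;

        DeviceState(String deviceId, String state) {
            this.deviceId = deviceId;
            this.state = state;
        }
    }

    // Upsert keyed by device id: applying the same message twice leaves the store unchanged.
    static void process(Map<String, String> store, DeviceState msg) {
        store.put(msg.deviceId, msg.state);
    }

    // Applies all messages in order to a fresh store and returns the final state.
    static Map<String, String> replay(List<DeviceState> messages) {
        Map<String, String> store = new HashMap<>();
        for (DeviceState m : messages) {
            process(store, m);
        }
        return store;
    }
}
```

Replaying a stream in which the last message is delivered twice gives the same final store as replaying it once, which is why a duplicate delivery is harmless for this kind of processing.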
  • 18. How can I apply these concepts in my code?
  • 19. Kafka Java client: at-least-once
  • 20. Async non-blocking? • In a Reactive/Scala world, message processing is usually asynchronous (non-blocking IO call to a DB, ask to an Akka actor, …): def processMsg(message: String): Future[Result] • How do you process your Kafka messages while staying reactive (i.e. without blocking threads)?
  • 21. Kafka Java client: async non-blocking?
  • 22. Kafka Java client: async non-blocking? • Out-of-order processing! • No guarantee anymore! (offset N can be committed before N-1, “shadowing” N-1) • Unbounded number of messages in memory: if the Kafka message rate exceeds the processing speed, this can lead to an Out Of Memory error
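The "shadowing" problem can be reproduced deterministically in plain Java. This is a sketch under assumptions: `CompletableFuture` stands in for the Scala `Future` returned by `processMsg`, the `commits` list stands in for the commit storage, and the futures are completed by hand to force offset 1 to finish before offset 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class OutOfOrderCommitDemo {
    // Returns the offsets in the order in which they were committed.
    static List<Long> commitOrder() {
        List<Long> commits = new ArrayList<>();

        // One pending asynchronous processing result per polled offset.
        CompletableFuture<Void> msg0 = new CompletableFuture<>();
        CompletableFuture<Void> msg1 = new CompletableFuture<>();

        // Naive approach: commit as soon as each message finishes processing.
        msg0.thenRun(() -> commits.add(0L));
        msg1.thenRun(() -> commits.add(1L));

        // Processing of offset 1 happens to finish first, so offset 1 is committed
        // while offset 0 is still in flight. A crash at this point would make the
        // restarted consumer resume after offset 1 and never reprocess offset 0.
        msg1.complete(null);
        msg0.complete(null);
        return commits;
    }
}
```

Calling `OutOfOrderCommitDemo.commitOrder()` returns the offsets in completion order rather than in log order, which is exactly the situation in which the at-least-once guarantee is lost.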
  • 23. What do we need? Ordered asynchronous stream processing with back pressure
  • 24. What do we need? Ordered asynchronous stream processing with back pressure ENTER REACTIVE STREAMS
  • 25. Reactive Streams • “Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure.” • Backed by Netflix, Pivotal, Red Hat, Twitter, Lightbend (Typesafe), … • Implementations: RxJava, Akka Stream, Reactor, …
  • 26. Akka Stream • Stream processing abstraction on top of Akka Actors • Types! Types are back! • Source[A] ~> Flow[A, B] ~> Sink[B] • Automatic back pressure
  • 27. Reactive Kafka • Akka Stream client for Kafka • On top of Kafka Java client 0.9+ • https://github.com/akka/reactive-kafka
  • 29. Reactive Kafka • At-least-once semantics in case of node failure / redeployment • Asynchronous processing without blocking any thread • Back pressure • Ordered processing • But what if the processMsg function fails?
  • 30. The difference between Error and Failure • Error: something went wrong, and this is deterministic (it will happen again if you do the same call). ex: HTTP 4xx, Deserialisation exception, Duplicate key DB error • Failure: something went wrong, and this is not deterministic (it may not happen again if you do the same call). ex: HTTP 5xx, network exception
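The distinction drives the retry decision: an Error is not worth retrying, a Failure is. A minimal sketch using the HTTP examples from the slide follows; the `Outcome` enum and the rule that every status below 400 counts as success are simplifying assumptions made for the illustration.

```java
class OutcomeClassifier {
    enum Outcome { SUCCESS, ERROR, FAILURE }

    // Maps an HTTP status code to the Error / Failure vocabulary of the slide.
    static Outcome classify(int httpStatus) {
        if (httpStatus >= 500) {
            return Outcome.FAILURE;  // non-deterministic: the same call may succeed later
        }
        if (httpStatus >= 400) {
            return Outcome.ERROR;    // deterministic: the same call will fail again
        }
        return Outcome.SUCCESS;      // assumption: anything below 400 counts as success
    }

    // Only failures are worth retrying; retrying an error repeats the same result.
    static boolean shouldRetry(Outcome outcome) {
        return outcome == Outcome.FAILURE;
    }
}
```

For example, `shouldRetry(classify(503))` is true while `shouldRetry(classify(404))` is false.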
  • 31. Error and Failure in Scala code using Scalactic: Future[Result Or Every[Error]] (the Or can contain one or more Errors; the Future can contain a Failure)
  • 32. Error and Failure in Scala code (non-async): Try[Result Or Every[Error]] (the Or can contain one or more Errors; the Try can contain a Failure)
  • 34. Keeping message ordering even in failure cases • Retrying message processing upon failures will block the processing of subsequent messages, but that’s ok if message processing is homogeneous • ex: if processMsg of msg N results in a network failure calling a DB (say ELS), there is a high probability that processMsg of msg N+1 will encounter the same failure, so blocking is ok and even preferable, since it avoids losing messages due to transient failures • If message processing is heterogeneous (calling different external systems according to the msg), it is better to implement different consumer-groups and/or have different topics
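Retrying in place while preserving order can be sketched as a sequential loop. This is an illustration under assumptions, not the Reactive Kafka implementation: `attempt` is a hypothetical stand-in for `processMsg` that returns false on a transient failure, `maxAttempts` is a hypothetical bound, and a real consumer would normally wait (with backoff) between attempts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class OrderedRetryDemo {
    /**
     * Processes messages strictly in order. A message whose attempt fails is
     * retried up to maxAttempts times, so message N+1 is never processed
     * before message N has succeeded. Returns the successfully processed messages.
     */
    static List<String> processInOrder(List<String> messages,
                                       Predicate<String> attempt,
                                       int maxAttempts) {
        List<String> done = new ArrayList<>();
        for (String msg : messages) {
            boolean ok = false;
            for (int i = 0; i < maxAttempts && !ok; i++) {
                ok = attempt.test(msg);  // false means a transient failure
            }
            if (!ok) {
                break;  // give up without moving on; a restart would resume at this message
            }
            done.add(msg);
        }
        return done;
    }
}
```

While one message is being retried, everything behind it in the partition waits, which is the blocking behaviour the slide describes as acceptable for homogeneous processing.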
  • 35. Consumer polls messages. Reminder: Kafka guarantees ordering only per partition. If #(consumer instances) < #(Kafka partitions), at least one consumer instance will process two or more partitions
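The counting argument can be made concrete with a simple round-robin assignment. This is only an illustration: Kafka's real partition assignors use their own strategies, and `assign` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

class AssignmentDemo {
    // Round-robin sketch: partition p is assigned to consumer p % consumers.
    // Returns, for each consumer index, the list of partitions it owns.
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> result = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            result.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            result.get(p % consumers).add(p);
        }
        return result;
    }
}
```

With 3 partitions and 2 consumers, `assign(3, 2)` gives consumer 0 the partitions 0 and 2 and consumer 1 the partition 1, so one instance necessarily handles two partitions.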
  • 36. Parallel processing between partitions while keeping ordering per partition
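One way to picture this idea without Akka Stream is a chain of `CompletableFuture`s per partition. This sketch is a stand-in, not the code from the slide: each partition's messages are chained one after another, while chains of different partitions run independently on a thread pool. The pool size of 4 and the `process` helper are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PerPartitionOrderingDemo {
    // Input: messages grouped by partition. Output: messages in processing order, per partition.
    static Map<Integer, List<String>> process(Map<Integer, List<String>> byPartition) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Map<Integer, List<String>> out = new ConcurrentHashMap<>();
        List<CompletableFuture<Void>> chains = new ArrayList<>();

        for (Map.Entry<Integer, List<String>> e : byPartition.entrySet()) {
            List<String> processed = new CopyOnWriteArrayList<>();
            out.put(e.getKey(), processed);

            // Each message starts only after the previous one of the same partition is done.
            CompletableFuture<Void> chain = CompletableFuture.completedFuture(null);
            for (String msg : e.getValue()) {
                chain = chain.thenRunAsync(() -> processed.add(msg), pool);
            }
            chains.add(chain);
        }

        // Wait for every partition chain, then release the pool threads.
        CompletableFuture.allOf(chains.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        return out;
    }
}
```

Messages from different partitions may interleave in any way, but within each partition the processing order always matches the input order.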
  • 37. Bonus: auto-adaptive micro-batching windows per partition based on back pressure signal Dynamic trade-off between latency and throughput!
  • 38. Conclusion • Apache Kafka as a system is scalable and fault-tolerant, but fault-tolerant consumption can be tricky • But with the right concepts and the right tools, we can make Kafka consumption fault-tolerant very easily (i.e. with a few lines of extra code)