Error handling with Apache Kafka
From patterns to code
Agenda
WHAT KAFKA DOES FOR YOU
(SINCE KAFKA 3.0)
acks: all
The leader waits for all in-sync replicas to acknowledge
the record. This ensures that the record is not lost as
long as at least one in-sync replica remains alive.
enable.idempotence: true
Ensures that exactly one copy of each message is written
in the stream even in the event of a broker failure
max.in.flight.requests.per.connection: 5
The maximum number of unacknowledged requests the
client will send on a single connection before blocking.
Together with enable.idempotence=true,
this ensures ordering preservation
WHAT YOU NEED TO DO
bootstrap.servers
List more than one broker to ensure that the
cluster can be discovered even if one broker is
down
transactional.id
Enables transactional delivery
If transactional.id is configured,
enable.idempotence is implied
OTHER PROPERTIES YOU
MIGHT WANT TO TUNE
retries
max.block.ms
delivery.timeout.ms
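The producer settings above can be collected in one place. The following is a minimal sketch in Python that builds the configuration as a plain dictionary of Kafka property names. The helper name producer_config, the broker addresses, and the transactional id are assumptions made for illustration, and the dictionary is not tied to any particular client library.

```python
def producer_config(bootstrap_servers, transactional_id=None):
    """Build a producer configuration as a dict of Kafka property names."""
    if len(bootstrap_servers) < 2:
        # One unreachable broker must not prevent cluster discovery.
        raise ValueError("list more than one broker in bootstrap.servers")
    config = {
        "bootstrap.servers": ",".join(bootstrap_servers),
        # Defaults since Kafka 3.0, spelled out here for clarity.
        "acks": "all",
        "enable.idempotence": True,
        "max.in.flight.requests.per.connection": 5,
    }
    if transactional_id is not None:
        # Setting transactional.id implies enable.idempotence.
        config["transactional.id"] = transactional_id
    return config


# Hypothetical broker addresses and transactional id.
config = producer_config(["broker-1:9092", "broker-2:9092"], "orders-tx-1")
```

Spelling out the defaults is optional, but it documents the guarantees the application relies on.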
WHAT KAFKA DOES FOR YOU
enable.auto.commit: true
If true, the consumer’s offset will be periodically
committed in the background.
auto.commit.interval.ms: 5000
This represents a reasonable trade-off between
performance and reliability
WHAT YOU NEED TO DO
bootstrap.servers
List more than one broker to ensure that the
cluster can be discovered even if one broker is
down
isolation.level: read_committed
Controls how to read messages written
transactionally. Non-transactional messages will
be returned unconditionally in either mode.
OTHER PROPERTIES YOU
MIGHT WANT TO TUNE
heartbeat.interval.ms
session.timeout.ms
connections.max.idle.ms
max.poll.interval.ms
request.timeout.ms
reconnect.backoff.max.ms
reconnect.backoff.ms
retry.backoff.ms
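To illustrate isolation.level, here is a simplified model of what a consumer sees in each mode. It assumes a toy log held in a Python list and a dictionary of transaction states; the function name visible_records and the data are hypothetical. In read_committed mode the sketch skips aborted records and stops at the first record of a still-open transaction, because records are returned in offset order and later records are held back until that transaction completes.

```python
def visible_records(log, txn_state, isolation_level="read_committed"):
    """Return the values a consumer would see from a toy log.

    log: list of (value, txn_id); txn_id is None for non-transactional records.
    txn_state: dict mapping txn_id to "committed", "aborted", or "open".
    """
    if isolation_level == "read_uncommitted":
        return [value for value, _ in log]
    visible = []
    for value, txn_id in log:
        state = "committed" if txn_id is None else txn_state[txn_id]
        if state == "open":
            break  # nothing past the first open transaction is returned yet
        if state == "committed":
            visible.append(value)  # aborted records are skipped
    return visible


log = [("a", None), ("b", "t1"), ("c", "t2"), ("d", None), ("e", "t3"), ("f", None)]
txn_state = {"t1": "committed", "t2": "aborted", "t3": "open"}

committed_view = visible_records(log, txn_state)
uncommitted_view = visible_records(log, txn_state, "read_uncommitted")
```

In this model the non-transactional record "f" is not dropped in read_committed mode; it is only delayed until transaction t3 commits or aborts.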
WHAT KAFKA DOES FOR YOU
unclean.leader.election.enable: false
Replicas that are not in sync cannot become
leaders. This prevents data loss
(at the expense of availability)
offsets.topic.replication.factor: 3
The replication factor for the offsets topic
(set higher to enhance reliability)
WHAT YOU NEED TO DO
(default.)replication.factor: 1 -> 3
Determines how many times a topic will be
replicated (including the leader)
min.insync.replicas: 1 -> 2
Specifies the minimum number of replicas that
must acknowledge a write for it to be considered
committed with acks=all. Otherwise, the producer
will raise an exception
broker.rack
Rack of the broker. Rack aware replication will try
to replicate across racks for higher fault tolerance
OTHER PROPERTIES YOU
MIGHT WANT TO TUNE
cleanup.policy
log.retention.*
delete.retention.ms
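The interplay between replication.factor and min.insync.replicas can be checked with a little arithmetic. The sketch below is a toy model, not broker code: it assumes every replica on a live broker is in sync and that each failed broker hosts a replica of the partition. The function name can_write is hypothetical.

```python
def can_write(replication_factor, min_insync_replicas, brokers_down):
    """True if an acks=all write can still succeed in the toy model."""
    in_sync = replication_factor - brokers_down
    return in_sync >= max(min_insync_replicas, 1)


# Recommended settings: replication.factor=3, min.insync.replicas=2
recommended = [can_write(3, 2, down) for down in range(4)]

# Defaults: replication.factor=1, min.insync.replicas=1
defaults = [can_write(1, 1, down) for down in range(2)]
```

With the recommended values, writes survive one broker failure, and a second failure makes the producer fail rather than accept a record that exists in a single copy.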
IMPROVE RELIABILITY
default.deserialization.exception.handler
If not set, the Kafka Streams engine will shut down
in the event of a deserialization error
default.production.exception.handler
If not set, the Kafka Streams engine will shut down
in the event of a serialization error
processing.guarantee: exactly_once_v2
As an alternative, the application can be designed
with idempotence in mind
retries: 2147483647
Setting a value greater than zero will cause
the client to resend any request that fails
with a potentially transient error
IMPROVE AVAILABILITY
num.standby.replicas: 0 -> 1
Standby replicas are shadow copies of local state
stores. They are used to minimize the latency of
task failover.
state.dir
Directory where the state is stored. Should be a
persistent volume in Kubernetes for faster
start-up time
DESCRIPTION
When an error occurs, the application stops. Manual
intervention is required.
WHEN TO USE?
When all input events must be processed in order
without exception.
Examples:
− Kafka Streams applications stop when an
unhandled error is encountered
− In Change Data Capture (CDC), events need to be
processed in order, to guarantee consistency
DISCUSSION
− This is the easiest and safest approach
(at the expense of availability)
− Stopping the world might not be acceptable
in a production environment
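The stop-on-error behaviour described above can be sketched as a loop that halts at the first failure and leaves the offset pointing at the failed record. This is a minimal in-memory illustration: a Python list stands in for a partition, and the function name consume_until_error is an assumption.

```python
def consume_until_error(records, handler, start_offset=0):
    """Process records in order and stop at the first failure.

    Returns (next_offset, error). The offset is not advanced past the
    failed record, so a restart resumes exactly where processing stopped.
    """
    offset = start_offset
    for record in records[start_offset:]:
        try:
            handler(record)
        except Exception as error:
            return offset, error
        offset += 1
    return offset, None


processed = []
records = ["1", "2", "oops", "4"]
next_offset, error = consume_until_error(
    records, lambda record: processed.append(int(record))
)
```

After manual intervention (here, repairing the bad record), calling the function again with start_offset=next_offset continues in the original order.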
DESCRIPTION
When an error occurs, the message is routed to a
dead letter topic. Processing of subsequent messages
carries on.
WHEN TO USE?
When retries do not make sense (e.g.
NullPointerException) or maximum number of
retries has been exhausted.
Example:
− Temperature measurements in an IoT cluster
DISCUSSION
− Ordering is no longer guaranteed
− If this is acceptable, guarantees a higher
availability than “stop on error”
− A process to handle the messages in the dead
letter topic must be defined
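As a rough sketch of the dead letter topic approach, the loop below routes a failing record to a separate list and carries on with the next one. Python lists stand in for Kafka topics, and the function name and the fields stored with each dead letter are assumptions chosen for illustration.

```python
def consume_with_dlt(records, handler, dead_letter_topic):
    """Process every record; route failures to the dead letter topic."""
    for offset, record in enumerate(records):
        try:
            handler(record)
        except Exception as error:
            # Keep enough context to diagnose and reprocess the record later.
            dead_letter_topic.append(
                {"offset": offset, "record": record, "error": repr(error)}
            )


processed = []
dead_letter_topic = []
consume_with_dlt(
    ["1", "2", "oops", "4"],
    lambda record: processed.append(int(record)),
    dead_letter_topic,
)
```

Record "4" is processed although "oops" failed before it, which is why ordering is no longer guaranteed once the dead letter is eventually handled.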
DESCRIPTION
When a transient error occurs, the message is routed
to a retry topic. A retry application retries (potentially
multiple times) processing the message.
WHEN TO USE?
In the case of transient errors.
Example:
− An external service (web service, DB) is used to
process the message and its availability cannot
be guaranteed.
DISCUSSION
− Ordering is no longer guaranteed
− Retries do not require manual intervention
(as opposed to DLT)
− There can be multiple retry topics
− Typically, when the maximum number of retries is
reached, the message is routed to a DLT
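The retry flow above can be sketched with an in-memory queue standing in for the retry topic. This is an illustration under assumptions, not how any framework implements it: the names TransientError and consume_with_retries are hypothetical, a single retry topic is used, and no delay is applied between attempts.

```python
from collections import deque


class TransientError(Exception):
    """Raised by the handler when a retry might succeed."""


def consume_with_retries(records, handler, max_retries=3):
    """Return the records that ended up in the dead letter topic."""
    retry_topic = deque()
    dead_letter_topic = []
    # Main application: failed records go to the retry topic.
    for record in records:
        try:
            handler(record)
        except TransientError:
            retry_topic.append((record, 1))
    # Retry application: retries each record up to max_retries times.
    while retry_topic:
        record, attempt = retry_topic.popleft()
        try:
            handler(record)
        except TransientError:
            if attempt >= max_retries:
                dead_letter_topic.append(record)
            else:
                retry_topic.append((record, attempt + 1))
    return dead_letter_topic


calls = {}
processed = []


def handler(record):
    calls[record] = calls.get(record, 0) + 1
    # "b" fails twice and then succeeds; "c" never succeeds.
    if record == "c" or (record == "b" and calls[record] <= 2):
        raise TransientError(record)
    processed.append(record)


dead_letters = consume_with_retries(["a", "b", "c", "d"], handler)
```

In this run "b" is processed after "d", and "c" reaches the dead letter topic after one initial attempt plus three retries.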
DESCRIPTION
When a message related to an item cannot be
processed, all messages related to that same item will
be queued until the problem is solved.
WHEN TO USE?
When the ordering must be maintained at all costs.
Example:
− Bank transactions
DISCUSSION
− Complex to implement. No out-of-the-box
support in Spring
− Can also be applied to a retry topic
− Different variants exist (e.g. with a
redirect topic, see
https://www.confluent.io/blog/error-handling-patterns-in-kafka/)
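One way to picture this pattern: once a message for a given item fails, every later message with the same key is parked behind it, while other keys keep flowing. The sketch below keeps the parked messages in a dictionary; in a real system they would live in a topic or a store. The function name and the account data are hypothetical.

```python
def consume_preserving_order(records, handler):
    """Process (key, value) records; park all records of a failed key.

    Returns a dict mapping each blocked key to its parked values,
    in their original order.
    """
    parked = {}
    for key, value in records:
        if key in parked:
            # An earlier message for this key failed: queue behind it.
            parked[key].append(value)
            continue
        try:
            handler(key, value)
        except Exception:
            parked[key] = [value]
    return parked


processed = []


def handler(key, value):
    # Assumed failure: the withdrawal on acct-2 cannot be processed yet.
    if (key, value) == ("acct-2", -50):
        raise RuntimeError("account service unavailable")
    processed.append((key, value))


records = [("acct-1", 100), ("acct-2", -50), ("acct-1", -30), ("acct-2", 20)]
parked = consume_preserving_order(records, handler)
```

Once the problem is solved, replaying parked["acct-2"] in list order restores the original sequence for that account.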
DESCRIPTION
We want to update the database and send a message
atomically in order to avoid data inconsistencies.
WHEN TO USE?
When two-phase commit is not possible or not
desirable.
DISCUSSION
− With Spring, transaction synchronization can be
used as an alternative
− Transaction synchronization uses the best effort
1-phase commit algorithm
− CDC is typically used for the relay service
(e.g. with Debezium)
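The approach described above is often called the transactional outbox. A minimal sketch, assuming an in-memory SQLite database and a polling relay instead of CDC: the business row and the outgoing message are written in one local transaction, and a separate relay step publishes the outbox rows. The table names, function names, and the publish callback are assumptions for illustration.

```python
import json
import sqlite3


def create_schema(conn):
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
    conn.execute(
        "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
    )


def place_order(conn, order_id, item):
    """Insert the order and its message atomically."""
    with conn:  # commits both inserts, or rolls both back on error
        conn.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        conn.execute(
            "INSERT INTO outbox (payload) VALUES (?)",
            (json.dumps({"order_id": order_id, "item": item}),),
        )


def relay(conn, publish):
    """Publish pending outbox rows in insertion order, then delete them."""
    rows = conn.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        publish(payload)  # a crash after this line causes a re-publish
        with conn:
            conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))


conn = sqlite3.connect(":memory:")
create_schema(conn)
topic = []  # stands in for a Kafka topic

place_order(conn, 1, "book")
try:
    place_order(conn, 1, "pen")  # duplicate key: the transaction rolls back
except sqlite3.IntegrityError:
    pass
relay(conn, topic.append)
```

The relay delivers at least once, so consumers of the topic should tolerate duplicates.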
DESCRIPTION
We want to implement transactions that
span multiple services. Messages are used
to start, commit and roll back local
transactions.
WHEN TO USE?
When two-phase commit is not possible
or not desirable.
DISCUSSION
We distinguish between:
− Choreography-based Saga: each local
transaction publishes domain events
that trigger local transactions in other
services
− Orchestration-based Saga: an
orchestrator coordinates the different
local transactions
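An orchestration-based Saga can be sketched as an orchestrator that runs local transactions one after another and, on failure, runs compensating actions for the completed steps in reverse order. In this illustration plain function calls stand in for the messages exchanged between services; the function run_saga, the step names, and the state dictionary are all hypothetical.

```python
def run_saga(steps):
    """Run (name, action, compensation) steps; undo completed ones on failure.

    Returns (succeeded, log).
    """
    log = []
    completed = []
    for name, action, compensation in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}: failed")
            # Roll back by compensating completed steps, newest first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
        log.append(f"{name}: committed")
        completed.append((name, compensation))
    return True, log


state = {"stock": 5, "charged": False}


def reserve_stock():
    state["stock"] -= 1


def release_stock():
    state["stock"] += 1


def charge_payment():
    raise RuntimeError("payment declined")  # assumed failure for the example


def refund_payment():
    state["charged"] = False


succeeded, log = run_saga([
    ("reserve-stock", reserve_stock, release_stock),
    ("charge-payment", charge_payment, refund_payment),
])
```

A choreography-based Saga would have no central run_saga function; each service would react to the previous service's domain event instead.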
https://github.com/cedric-schaller/kafka-error-handling
Error Handling with Kafka: From Patterns to Code

  • 1. Error handling with Apache Kafka From patterns to code
  • 2.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 14.
  • 15. WHAT KAFKA DOES FOR YOU (SINCE KAFKA 3.0) acks: all The leader waits for all in-sync replicas to acknowledge the record. This ensures that the record is not lost as long as at least one in-sync replica remains alive. enable.idempotence: true Ensures that exactly one copy of each message is written in the stream even in the event of a broker failure max.in.flight.requests.per.connection: 5 The maximum number of unacknowledged requests the client will send on a single connection before blocking. Together with enable.idempotence=true, this ensures ordering preservation WHAT YOU NEED TO DO bootstrap.servers List more than one broker to ensure that the cluster can be discovered even if one broker is down transactional.id Enables transactional delivery If transactional.id is configured, enable.idempotence is implied OTHER PROPERTIES YOU MIGHT WANT TO TUNE retries max.block.ms delivery.timeout.ms
  • 16. WHAT KAFKA DOES FOR YOU enable.auto.commit: true If true, the consumer’s offset will be periodically committed in the background. auto.commit.interval.ms: 5000 This represents a reasonable trade-off between performance and reliability WHAT YOU NEED TO DO bootstrap.servers List more than one broker to ensure that the cluster can be discovered even if one broker is down isolation.level: read_committed Controls how to read messages written transactionally. Non-transactional messages will be returned unconditionally in either mode. OTHER PROPERTIES YOU MIGHT WANT TO TUNE heartbeat.interval.ms session.timeout.ms connections.max.idle.ms max.poll.interval.ms request.timeout.ms reconnect.backoff.max.ms reconnect.backoff.ms retry.backoff.ms
  • 17. WHAT KAFKA DOES FOR YOU
    − unclean.leader.election.enable: false
      Replicas that are not in sync cannot become leaders. This prevents data loss (at the expense of availability).
    − offsets.topic.replication.factor: 3
      The replication factor for the offsets topic (set higher to enhance reliability).
    WHAT YOU NEED TO DO
    − (default.)replication.factor: 1 -> 3
      Determines how many times a topic will be replicated (including the leader).
    − min.insync.replicas: 1 -> 2
      Specifies the minimum number of replicas that must acknowledge a write for it to be considered committed with acks=all. Otherwise, the producer will raise an exception.
    − broker.rack
      Rack of the broker. Rack-aware replication will try to replicate across racks for higher fault tolerance.
    OTHER PROPERTIES YOU MIGHT WANT TO TUNE
    − cleanup.policy
    − log.retention.*
    − delete.retention.ms
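The interplay between `replication.factor` and `min.insync.replicas` on slide 17 comes down to simple arithmetic: with `acks=all`, writes keep succeeding as long as the number of in-sync replicas does not drop below `min.insync.replicas`. A small sketch (the function name is hypothetical):

```python
def tolerated_replica_losses(replication_factor, min_insync_replicas):
    """Replicas that can be lost while acks=all writes are still accepted."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and replication.factor")
    return replication_factor - min_insync_replicas


for rf, min_isr in [(1, 1), (3, 2), (3, 3)]:
    lost = tolerated_replica_losses(rf, min_isr)
    print(f"replication.factor={rf}, min.insync.replicas={min_isr}: "
          f"writes survive {lost} lost replica(s)")
```

With the recommended 3 and 2, one replica can fail without blocking producers, and every acknowledged record still exists on at least two replicas. Setting both values to 3 maximizes durability, but a single broker outage then stops writes.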
  • 18. IMPROVE RELIABILITY
    − default.deserialization.exception.handler
      If not set, the Kafka Streams engine will shut down in the event of a deserialization error.
    − default.production.exception.handler
      If not set, the Kafka Streams engine will shut down in the event of a serialization error.
    − processing.guarantee: exactly_once_v2
      As an alternative, the application can be designed with idempotence in mind.
    − retries: 2147483647
      Setting a value greater than zero will cause the client to resend any request that fails with a potentially transient error.
    IMPROVE AVAILABILITY
    − num.standby.replicas: 0 -> 1
      Standby replicas are shadow copies of local state stores. They are used to minimize the latency of task failover.
    − state.dir
      Directory where the state is stored. Should be a persistent volume in Kubernetes for faster start-up time.
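The choice behind `default.deserialization.exception.handler` on slide 18 is essentially "fail" versus "continue" (Kafka Streams ships `LogAndFailExceptionHandler` and `LogAndContinueExceptionHandler` for these two behaviours). The sketch below imitates that choice in plain Python with JSON payloads; the `consume` function and its policy strings are hypothetical and not part of Kafka Streams.

```python
import json


def consume(raw_records, on_deserialization_error):
    """Deserialize JSON records; the policy decides what a bad record does.

    on_deserialization_error is "fail" (propagate the error and stop) or
    "continue" (set the record aside and keep going).
    """
    processed, skipped = [], []
    for raw in raw_records:
        try:
            processed.append(json.loads(raw))
        except json.JSONDecodeError:
            if on_deserialization_error == "fail":
                raise
            skipped.append(raw)
    return processed, skipped


records = ['{"id": 1}', "not json", '{"id": 2}']

print(consume(records, "continue"))  # ([{'id': 1}, {'id': 2}], ['not json'])

try:
    consume(records, "fail")
except json.JSONDecodeError as error:
    print(f"application stopped: {error}")
```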
  • 20. DESCRIPTION
    When an error occurs, the application stops. Manual intervention is required.
    WHEN TO USE?
    When all input events must be processed in order without exception. Examples:
    − Kafka Streams applications stop when an unhandled error is encountered
    − In Change Data Capture (CDC), events need to be processed in order, to guarantee consistency
    DISCUSSION
    − This is the easiest and safest approach (at the expense of availability)
    − Stopping the world might not be acceptable in a production environment
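The behaviour described on slide 20 can be sketched as a processing loop that halts at the first failure and reports the offset to resume from once the problem has been fixed manually. This is an in-memory illustration; `run_stop_on_error` and the sample handler are hypothetical.

```python
def run_stop_on_error(messages, handler):
    """Process messages in order and stop at the first failure.

    Returns the offset of the failing message, or None if all succeeded.
    """
    for offset, message in enumerate(messages):
        try:
            handler(message)
        except Exception as error:
            print(f"stopping at offset {offset}: {error!r}")
            return offset  # nothing at or after this offset has been applied
    return None


applied = []


def apply_change(value):
    if value < 0:
        raise ValueError("negative values are not allowed")
    applied.append(value)


failed_at = run_stop_on_error([1, 2, -3, 4], apply_change)
print(applied, failed_at)  # [1, 2] 2
```

Because message 4 is never touched, ordering is preserved when processing resumes at offset 2.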
  • 21. DESCRIPTION
    When an error occurs, the message is routed to a dead letter topic. Processing of subsequent messages carries on.
    WHEN TO USE?
    When retries do not make sense (e.g. NullPointerException) or the maximum number of retries has been exhausted. Example:
    − Temperature measurements in an IoT cluster
    DISCUSSION
    − Ordering is no longer guaranteed
    − If this is acceptable, it guarantees higher availability than “stop on error”
    − A process to handle the messages in the dead letter topic must be defined
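A minimal in-memory sketch of the dead letter topic idea from slide 21: failing messages are set aside together with the error, and the loop carries on. A Python list stands in for the dead letter topic; `process_with_dlt` and the sample readings are hypothetical.

```python
def process_with_dlt(messages, handler):
    """Process every message; failures are collected instead of stopping."""
    dead_letters = []
    for offset, message in enumerate(messages):
        try:
            handler(message)
        except Exception as error:
            # Keep enough context for whoever handles the dead letters later.
            dead_letters.append({"offset": offset, "value": message, "error": repr(error)})
    return dead_letters


temperatures = []


def record_temperature(raw):
    temperatures.append(float(raw))  # raises ValueError for unparsable input


dead = process_with_dlt(["21.5", "n/a", "22.0"], record_temperature)
print(temperatures)  # [21.5, 22.0]
print([(d["offset"], d["value"]) for d in dead])  # [(1, 'n/a')]
```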
  • 22. DESCRIPTION
    When a transient error occurs, the message is routed to a retry topic. A retry application retries (potentially multiple times) processing the message.
    WHEN TO USE?
    In the case of transient errors. Example:
    − An external service (web service, DB) is used to process the message and its availability cannot be guaranteed.
    DISCUSSION
    − Ordering is no longer guaranteed
    − Retries do not require manual intervention (as opposed to DLT)
    − There can be multiple retry topics
    − Typically, when the maximum number of retries is reached, the message is routed to a DLT
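The retry flow on slide 22 can be sketched with a queue standing in for the retry topic and a list standing in for the DLT. This is a simplified simulation: `TransientError`, `process_with_retries` and the flaky service are hypothetical, there is a single retry topic, and no delay or back-off between attempts is modelled.

```python
from collections import deque


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a timeout)."""


def process_with_retries(messages, handler, max_retries=3):
    done, dead_letters = [], []
    retry_topic = deque()
    # Main consumer: one attempt, failures are parked on the retry topic.
    for message in messages:
        try:
            done.append(handler(message))
        except TransientError:
            retry_topic.append((message, 1))
    # Retry application: re-attempt each parked message up to max_retries times.
    while retry_topic:
        message, attempt = retry_topic.popleft()
        try:
            done.append(handler(message))
        except TransientError:
            if attempt >= max_retries:
                dead_letters.append(message)  # retries exhausted: route to the DLT
            else:
                retry_topic.append((message, attempt + 1))
    return done, dead_letters


failures_left = {"b": 2, "d": 10}  # how often the fake service fails per message


def flaky_service(message):
    if failures_left.get(message, 0) > 0:
        failures_left[message] -= 1
        raise TransientError(message)
    return message.upper()


done, dead = process_with_retries(["a", "b", "c", "d"], flaky_service)
print(done, dead)  # ['A', 'C', 'B'] ['d']
```

The output shows both discussion points: "B" is delivered after "C", so ordering is lost, and "d" ends up in the DLT once its three retries are used up.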
  • 23. DESCRIPTION
    When a message related to an item cannot be processed, all messages related to that same item will be queued until the problem is solved.
    WHEN TO USE?
    When the ordering must be maintained at all costs. Example:
    − Bank transactions
    DISCUSSION
    − Complex to implement. No out-of-the-box support in Spring
    − Can also be applied to a retry topic
    − Different variants exist (e.g. with a redirect topic, see https://www.confluent.io/blog/error-handling-patterns-in-kafka/)
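One way to picture the order-preserving approach on slide 23 is a per-item parking area: once a message for an item fails, that message and every later message for the same item is queued in arrival order, while other items keep flowing. The sketch below is an in-memory simplification, not the redirect-topic variant from the linked article; `process_keeping_order` and the account data are hypothetical.

```python
from collections import defaultdict


def process_keeping_order(messages, handler):
    """messages are (item_id, payload) pairs; returns (processed, parked)."""
    processed = []
    parked = defaultdict(list)
    for item_id, payload in messages:
        if item_id in parked:
            parked[item_id].append(payload)  # item is blocked: queue behind the failure
            continue
        try:
            handler(item_id, payload)
            processed.append((item_id, payload))
        except Exception:
            parked[item_id].append(payload)  # first failure blocks the item
    return processed, dict(parked)


balances = defaultdict(int)


def apply_transaction(account, amount):
    balances[account] += int(amount)  # raises ValueError for a malformed amount


transactions = [
    ("acc-1", "+100"),
    ("acc-2", "+50"),
    ("acc-1", "bad"),
    ("acc-1", "-30"),
    ("acc-2", "-20"),
]
processed, parked = process_keeping_order(transactions, apply_transaction)
print(dict(balances))  # {'acc-1': 100, 'acc-2': 30}
print(parked)          # {'acc-1': ['bad', '-30']}
```

After the problem with "bad" is resolved, the parked messages for acc-1 can be replayed in their original order.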
  • 27. DESCRIPTION
    We want to update the database and send a message atomically in order to avoid data inconsistencies.
    WHEN TO USE?
    When two-phase commit is not possible or not desirable.
    DISCUSSION
    − With Spring, transaction synchronization can be used as an alternative
    − Transaction synchronization uses the best-effort one-phase commit algorithm
    − CDC is typically used for the relay service (e.g. with Debezium)
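Slide 27 describes what is commonly called the transactional outbox pattern. A minimal sketch, assuming an in-memory SQLite database and a polling relay instead of CDC: the business row and the outgoing message are written in one local transaction, and a separate relay publishes the stored messages. Table names, the event shape and the `publish` callback are hypothetical.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "payload TEXT NOT NULL, sent INTEGER NOT NULL DEFAULT 0)"
)


def place_order(order_id, item):
    """Write the business row and the outgoing message in one local transaction."""
    with db:  # commits on success, rolls back if an exception escapes
        db.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        payload = json.dumps({"type": "OrderPlaced", "id": order_id, "item": item})
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def relay(publish):
    """Publish unsent outbox rows in insertion order, then mark them as sent."""
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(payload)  # a crash right after this line causes a re-publish later
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
    return len(rows)


place_order(1, "book")
try:
    place_order(1, "pen")  # duplicate key: neither the order nor a message is stored
except sqlite3.IntegrityError:
    pass

published = []
print(relay(published.append))  # 1
print(relay(published.append))  # 0
print(published)
```

Because the relay marks a row as sent only after publishing, delivery is at-least-once, so consumers of these messages should be idempotent.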
  • 28. DESCRIPTION
    We want to implement transactions that span multiple services. Messages are used to start, commit and roll back local transactions.
    WHEN TO USE?
    When two-phase commit is not possible or not desirable.
    DISCUSSION
    We distinguish between:
    − Choreography-based Saga: each local transaction publishes domain events that trigger local transactions in other services
    − Orchestration-based Saga: an orchestrator coordinates the different local transactions
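The orchestration-based variant from slide 28 can be sketched as an orchestrator that runs local transactions in sequence and, on failure, undoes the completed ones in reverse order. This is a simplified in-process illustration: direct function calls stand in for the command and reply messages that would flow through Kafka, and `run_saga` as well as the step names are hypothetical.

```python
def run_saga(steps):
    """steps is a list of (name, action, compensation) tuples.

    Runs the actions in order. If one fails, the compensations of the steps
    that already completed are run in reverse order.
    """
    completed = []
    log = []
    for name, action, compensation in steps:
        try:
            action()
            log.append(f"{name}: done")
            completed.append((name, compensation))
        except Exception as error:
            log.append(f"{name}: failed ({error})")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
    return True, log


state = {"reserved": 0, "shipped": 0}


def reserve_stock():
    state["reserved"] += 1


def release_stock():
    state["reserved"] -= 1


def charge_payment():
    raise RuntimeError("card declined")


def refund_payment():
    pass


def ship_order():
    state["shipped"] += 1


def cancel_shipment():
    state["shipped"] -= 1


ok, log = run_saga([
    ("reserve-stock", reserve_stock, release_stock),
    ("charge-payment", charge_payment, refund_payment),
    ("ship-order", ship_order, cancel_shipment),
])
print(ok)
for line in log:
    print(line)
```

The failed payment step triggers the release of the reserved stock, and the shipping step is never started. In a choreography-based saga there would be no central `run_saga`; each service would react to the events published by the others.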