#DevoxxFR
Apache Kafka
Patterns / AntiPatterns
Florent Ramière @framiere
Jean-Louis Boudart @jlboudart
Problem?
Silos explained by the Data Gravity concept
As data accumulates (builds mass), it becomes more likely that additional services and applications will be attracted to it.
This is the same effect gravity has on objects around a planet: as the mass or density increases, so does the strength of the gravitational pull.
With
How
Store & ETL Process
Publish & Subscribe
In short
From a simple idea
From a simple idea
...with great properties!
• Scalability
• Retention
• Durability
• Replication
• Security
• Resiliency
• Throughput
• Ordering
• Exactly-Once Semantics
• Transactions
• Idempotency
• Immutability
• …
So goooooood
What could potentially go wrong?
…which is true for any data system
Not thinking about Durability
Data durability: if you didn’t think about it… it’s not durable!
And you might have lost data!
Data durability: Kafka does not wait for a disk flush (fsync) by default.
Durability is achieved through replication.
Data durability: it depends on your configuration...
Data durability:
acks=1 (default value until Apache Kafka 3.0, which changed the default to acks=all): good for latency
acks=all: good for durability
acks=all: the leader will wait for the full set of in-sync replicas to acknowledge the record.
min.insync.replicas: the minimum number of in-sync replicas that must acknowledge a write when the producer uses acks=all (a broker or topic setting).
Default is 1.
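A worked example of the trade-off, assuming acks=all on the producer and unclean leader election disabled (the default). The class and method names are hypothetical; the arithmetic only follows from the definitions above: a write is accepted while the in-sync replica set has at least min.insync.replicas members, and an accepted write sits on at least that many replicas.

```java
/**
 * Hypothetical helper illustrating the acks=all / min.insync.replicas trade-off
 * (assumes unclean leader election is disabled).
 */
final class DurabilityMath {
    private DurabilityMath() {}

    /** Replicas that may be unavailable while acks=all writes are still accepted. */
    static int toleratedUnavailableReplicas(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException("need 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    /** Replica losses an acknowledged acks=all write is guaranteed to survive. */
    static int survivableReplicaLosses(int minInsyncReplicas) {
        // The write was acknowledged by at least min.insync.replicas replicas.
        return minInsyncReplicas - 1;
    }
}
```

With replication factor 3 and min.insync.replicas=2, one broker can be down while producing continues, and every acknowledged write survives the loss of one replica. With the default of 1, an acks=all write may be acknowledged by the leader alone once the ISR has shrunk to it.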
Data durability while producing?
Tune it with the parameters acks and min.insync.replicas
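A sketch of durability-oriented settings as plain `java.util.Properties`, using standard Kafka config names so it runs without the client library. The bootstrap address is a placeholder, and the values are an example, not a recommendation for every workload. Note that min.insync.replicas belongs to the topic (or broker), not to the producer.

```java
import java.util.Properties;

/** Sketch of producer and topic settings favouring durability (values are examples). */
final class DurableProducerConfig {
    private DurableProducerConfig() {}

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait for every in-sync replica to acknowledge each write.
        props.setProperty("acks", "all");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    /** Topic-level settings, e.g. passed when creating the topic with replication factor 3. */
    static Properties topicConfig() {
        Properties topic = new Properties();
        // With acks=all, a write needs at least 2 in-sync replicas to be accepted.
        topic.setProperty("min.insync.replicas", "2");
        return topic;
    }
}
```

The producer properties can be passed to `new KafkaProducer<>(props)`; the topic settings would be applied when the topic is created or altered.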
Defaults: the default values are optimized for availability and latency.
If durability is more important, tune them!
Deploying across multiple datacenters?
Multi-DC: it’s quite complicated...
It’s easy to get it wrong on many levels.
It could be a 3-hour talk.
Multi-DC: Disaster recovery for multi-datacenter deployments
What about the consumers?
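Consumers: a consumer can only read committed data, i.e. records that have been replicated to all in-sync replicas (offsets below the high watermark).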
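To make "committed" concrete, here is a toy model (not Kafka's code; the class and method names are hypothetical) of the high watermark: the smallest log end offset among the in-sync replicas. If the leader is at offset 10 but an in-sync follower has only replicated up to 8, consumers can read offsets 0 to 7.

```java
import java.util.Collection;

/** Toy model of which offsets a consumer may read from one partition. */
final class HighWatermark {
    private HighWatermark() {}

    /** The high watermark is the smallest log end offset among the in-sync replicas. */
    static long highWatermark(Collection<Long> isrLogEndOffsets) {
        return isrLogEndOffsets.stream()
                .mapToLong(Long::longValue)
                .min()
                .orElseThrow(() -> new IllegalArgumentException("ISR cannot be empty"));
    }

    /** Consumers only see offsets strictly below the high watermark. */
    static boolean visibleToConsumers(long offset, Collection<Long> isrLogEndOffsets) {
        return offset < highWatermark(isrLogEndOffsets);
    }
}
```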
Think about data durability and decide on the best trade-off for you
Throughput, latency, durability, availability
Optimizing your Apache Kafka deployment
Focusing only on the happy path
retries: causes the client to resend any record whose send fails with a potentially transient error.
Default value: 0 (since Apache Kafka 2.1 the default is Integer.MAX_VALUE, with retries bounded by delivery.timeout.ms)
retries: use built-in retries!
Bump it from 0 to infinity!
56
retries But you are exposed to a
different kind of issue…
57
58
enable.idempotence: when set to 'true', the producer will ensure that exactly one copy of each message is written.
Default value: false (the default became true in Apache Kafka 3.0)
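Why this removes retry duplicates can be shown with a toy model, assuming (as Kafka does) that the producer has an id and numbers its records with sequence numbers. The class below is a hypothetical simplification: the real broker tracks this per partition and also rejects out-of-order sequence numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy partition log (hypothetical, not Kafka's implementation) that drops retried
 * duplicates by remembering the last sequence number written per producer id.
 */
final class ToyIdempotentLog {
    private final List<String> records = new ArrayList<>();
    private final Map<Long, Integer> lastSequence = new HashMap<>();

    /** Returns true if the record was appended, false if it was a duplicate retry. */
    boolean append(long producerId, int sequence, String value) {
        Integer last = lastSequence.get(producerId);
        if (last != null && sequence <= last) {
            // Already written: acknowledge the retry without writing it twice.
            return false;
        }
        records.add(value);
        lastSequence.put(producerId, sequence);
        return true;
    }

    List<String> records() {
        return Collections.unmodifiableList(records);
    }
}
```

Scenario: producer 1 sends sequence 0, the acknowledgment is lost, the producer retries sequence 0, and the log still holds a single copy.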
Use built-in idempotency!
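A sketch of an idempotent producer configuration as plain `Properties`, using standard producer config names; the bootstrap address is a placeholder, and key/value serializers still have to be added before passing it to `new KafkaProducer<>(props)`. The constraints noted in the comments are the documented requirements for idempotence.

```java
import java.util.Properties;

/** Sketch of an idempotent producer configuration (serializers omitted). */
final class IdempotentProducerConfig {
    private IdempotentProducerConfig() {}

    static Properties props(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acks=all and retries > 0.
        props.setProperty("acks", "all");
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        // Retries stop once this overall send deadline expires (120000 ms is the default).
        props.setProperty("delivery.timeout.ms", "120000");
        // Must stay <= 5 with idempotence; per-partition ordering is preserved.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }
}
```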
But it does not save you from:
- Handling exceptions and failures
- Developing idempotent consumers
No idempotent consumer
At least once (default)
At most once
Exactly Once
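With at-least-once delivery, the same record can be delivered again, for example after a crash before the offset commit. A minimal sketch of an idempotent consumer, assuming each record carries a unique id (a hypothetical business key, or topic-partition-offset). The in-memory set is only for illustration: a real deduplication store must be durable and updated atomically with the side effect.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical idempotent handler: redelivered records are applied only once. */
final class IdempotentHandler {
    private final Set<String> processedIds = new HashSet<>();
    private final List<String> applied = new ArrayList<>();

    /** Applies the record unless its id was already seen; returns true when applied. */
    boolean handle(String recordId, String payload) {
        if (!processedIds.add(recordId)) {
            return false; // duplicate delivery: skip the side effect
        }
        applied.add(payload); // stand-in for the real side effect (DB write, API call...)
        return true;
    }

    int appliedCount() {
        return applied.size();
    }
}
```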
commit: manually committing aggressively...
adds a huge workload on Apache Kafka
commit: manually committing aggressively...
does not provide exactly-once semantics
Embrace at least once
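A toy sketch of at-least-once processing without committing after every record: process everything returned by one poll, then commit once. The class is hypothetical; with a real `KafkaConsumer` the commit would be a single `commitSync()` after the loop. As in Kafka, the committed value is the offset of the next record to read.

```java
import java.util.List;
import java.util.function.Consumer;

/** Toy at-least-once loop: process a whole poll batch, then commit once. */
final class BatchCommitLoop {
    private long committedOffset = -1; // stand-in for the group's committed position
    private int commitCalls = 0;

    /** Records have offsets batchStartOffset, batchStartOffset + 1, ... */
    void processBatch(long batchStartOffset, List<String> batch, Consumer<String> processor) {
        for (String record : batch) {
            // If this throws, nothing is committed and the batch is redelivered.
            processor.accept(record);
        }
        if (!batch.isEmpty()) {
            committedOffset = batchStartOffset + batch.size(); // next offset to consume
            commitCalls++;
        }
    }

    long committedOffset() {
        return committedOffset;
    }

    int commitCalls() {
        return commitCalls;
    }
}
```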
Rely on Kafka Streams with exactly-once!
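A sketch of the Kafka Streams setting that enables exactly-once processing, again as plain `Properties` (the application id and bootstrap address are placeholders).

```java
import java.util.Properties;

/** Sketch of Kafka Streams settings enabling exactly-once processing. */
final class ExactlyOnceStreamsConfig {
    private ExactlyOnceStreamsConfig() {}

    static Properties props(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // "exactly_once_v2" needs Kafka Streams >= 3.0 and brokers >= 2.5;
        // older Streams versions use "exactly_once" (the default is "at_least_once").
        props.setProperty("processing.guarantee", "exactly_once_v2");
        return props;
    }
}
```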
No exception handling
Future<RecordMetadata> send(ProducerRecord<K, V> record);
Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback);

producer.send(record, (metadata, exception) -> {
    // exception is silently ignored here
});
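The empty callback swallows failures. A sketch of a callback that at least counts and reports them, using a stdlib `BiConsumer<String, Exception>` as a hypothetical stand-in for Kafka's `Callback` so it runs without the client library; in the real interface the first argument is a `RecordMetadata`, and a non-null exception means the send failed. Callbacks run on the producer's I/O thread, so keep them fast.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;

/** Hypothetical callback shaped like Kafka's (metadata, exception) -> { ... }. */
final class CountingSendCallback implements BiConsumer<String, Exception> {
    private final AtomicLong succeeded = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    @Override
    public void accept(String metadata, Exception exception) {
        if (exception != null) {
            failed.incrementAndGet();
            // Never ignore this: log, alert, or route the record to a fallback.
            System.err.println("send failed: " + exception.getMessage());
        } else {
            succeeded.incrementAndGet();
        }
    }

    long succeeded() {
        return succeeded.get();
    }

    long failed() {
        return failed.get();
    }
}
```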
Error handling: we don’t expect the unexpected until the unexpected is expected.
Error handling: a message cannot be processed
Error handling: a message cannot be processed
A message doesn’t have the expected schema
Retry
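A minimal bounded retry with exponential backoff, written as a hypothetical helper rather than any Kafka API. Bounding the attempts matters: once they are exhausted, the caller still has to choose another strategy (dead letter queue, ignore, or stop).

```java
import java.util.function.Supplier;

/** Hypothetical helper: retry a processing step with exponential backoff. */
final class BoundedRetry {
    private BoundedRetry() {}

    static <T> T withRetries(Supplier<T> step, int maxAttempts, long initialBackoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoffMs = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // retries exhausted: let the caller decide what to do
                }
                sleep(backoffMs);
                backoffMs *= 2; // exponential backoff
            }
        }
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while backing off", ie);
        }
    }
}
```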
Infinite retry
Write to a dead letter queue and continue
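A toy sketch of dead-letter routing: a record that fails is set aside together with its error, and the loop keeps going. The class is hypothetical and the list stands in for a dead letter topic; in a real application the failed record would be produced to a separate topic (Kafka Connect, for instance, offers `errors.deadletterqueue.topic.name` for sink connectors).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy dead-letter routing: failures are parked and processing continues. */
final class DeadLetterRouter {
    /** A hypothetical dead-letter entry: the original payload plus why it failed. */
    static final class DeadLetter {
        final String payload;
        final String error;

        DeadLetter(String payload, String error) {
            this.payload = payload;
            this.error = error;
        }
    }

    private final List<DeadLetter> deadLetters = new ArrayList<>(); // stand-in for a DLQ topic

    /** Processes every record; returns how many succeeded. */
    int processAll(List<String> records, Consumer<String> processor) {
        int ok = 0;
        for (String record : records) {
            try {
                processor.accept(record);
                ok++;
            } catch (RuntimeException e) {
                deadLetters.add(new DeadLetter(record, e.toString()));
            }
        }
        return ok;
    }

    List<DeadLetter> deadLetters() {
        return deadLetters;
    }
}
```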
Ignore and continue
No silver bullet
Handle the exceptions!
https://eng.uber.com/reliable-reprocessing/
No data governance
97
98
99
100
Governance: changes in producers might impact consumers
Governance: Schema Registry
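A toy check in the spirit of Schema Registry's BACKWARD compatibility mode (a reader on the new schema must be able to decode data written with the old one), assuming schemas reduced to "field name → has a default value" and ignoring types. The class is hypothetical and far simpler than real Avro resolution; it only shows why adding a field without a default breaks existing data.

```java
import java.util.Map;

/** Toy BACKWARD-compatibility check: field name -> whether it has a default. */
final class ToyCompatibility {
    private ToyCompatibility() {}

    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema, Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean added = !oldSchema.containsKey(field.getKey());
            if (added && !field.getValue()) {
                return false; // old records lack this field and there is no default
            }
        }
        return true; // fields removed in the new schema are simply ignored by the reader
    }
}
```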
Share Schemas
Letting bad citizens wander around
Leverage security, ACLs and quotas
Security
Authorization and ACLs
Enforcing Client Quotas
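A sketch of client settings assuming SASL/SCRAM over TLS; the `client.id` value and the credentials are placeholders. Authentication gives each client a principal that ACLs can refer to, and quotas can be defined per user and/or per `client.id`.

```java
import java.util.Properties;

/** Sketch of an authenticated client configuration (values are placeholders). */
final class SecureClientConfig {
    private SecureClientConfig() {}

    static Properties props(String bootstrapServers, String user, String password) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Identify the application: quotas can target a user and/or a client.id.
        props.setProperty("client.id", "orders-service");
        // Encrypt traffic and authenticate, so ACLs apply to this principal.
        props.setProperty("security.protocol", "SASL_SSL");
        props.setProperty("sasl.mechanism", "SCRAM-SHA-512");
        props.setProperty("sasl.jaas.config",
                "org.apache.kafka.common.security.scram.ScramLoginModule required "
                        + "username=\"" + user + "\" password=\"" + password + "\";");
        return props;
    }
}
```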
Installing prod on Sunday night
Configuration: if you use the default configuration…
you will have issues!
Please read the docs
Running Kafka in Production
Running ZooKeeper in Production
Not configuring your OS
OS: tune at least your open file descriptors and mmap count.
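A rough sizing sketch for the mmap count, using the rule of thumb from Kafka's production documentation that each log segment uses two map areas (offset index and time index). The class is hypothetical and the partition/segment counts are example inputs; compare the result with the `vm.max_map_count` configured on your brokers (many Linux distributions default to roughly 65,000).

```java
/** Rough estimate of memory map areas needed by one broker. */
final class MapCountEstimate {
    private MapCountEstimate() {}

    /** Two map areas per log segment: one for the offset index, one for the time index. */
    static long mapAreas(long partitionsOnBroker, long segmentsPerPartition) {
        return 2L * partitionsOnBroker * segmentsPerPartition;
    }

    /** True if the estimate fits under the configured vm.max_map_count. */
    static boolean fits(long partitionsOnBroker, long segmentsPerPartition, long maxMapCount) {
        return mapAreas(partitionsOnBroker, segmentsPerPartition) <= maxMapCount;
    }
}
```

For example, 2,000 partitions with 20 segments each need about 80,000 map areas, which would not fit under a limit of 65,530.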
Configure your os
Running Kafka in Production
Disregarding Apache ZooKeeper
Not understanding Ordering
No monitoring
Too many partitions
Not enough partitions
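A commonly cited rule of thumb for a lower bound: with a target throughput t, and measured per-partition throughputs p (producer side) and c (consumer side), use at least max(t / p, t / c) partitions. The class is hypothetical and the numbers are example inputs; measure p and c on your own hardware.

```java
/** Rule-of-thumb minimum partition count: max(t / p, t / c), rounded up. */
final class PartitionSizing {
    private PartitionSizing() {}

    static int minPartitions(double targetMBps, double producerMBpsPerPartition,
                             double consumerMBpsPerPartition) {
        double forProducers = targetMBps / producerMBpsPerPartition;
        double forConsumers = targetMBps / consumerMBpsPerPartition;
        return (int) Math.ceil(Math.max(forProducers, forConsumers));
    }
}
```

With a 100 MB/s target, 10 MB/s per partition on the producer side and 20 MB/s on the consumer side, the producer side dominates and 10 partitions is the floor.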
Partition key choice
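A toy keyed partitioner showing the two consequences of the key choice: the same key always lands on the same partition (which is what gives per-key ordering), and a "hot" key concentrates load on one partition. Kafka's default partitioner hashes the serialized key with murmur2; `String.hashCode()` is used here only to keep the hypothetical sketch dependency-free.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy keyed partitioner (not Kafka's murmur2-based implementation). */
final class ToyKeyPartitioner {
    private ToyKeyPartitioner() {}

    static int partitionFor(String key, int numPartitions) {
        // Same key -> same partition; floorMod keeps the result non-negative.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Counts records per partition to reveal skew caused by a hot key. */
    static Map<Integer, Integer> distribution(List<String> keys, int numPartitions) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (String key : keys) {
            counts.merge(partitionFor(key, numPartitions), 1, Integer::sum);
        }
        return counts;
    }
}
```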
Topics vs Partitions
Calling external services in Kafka Streams
Questions

Editor's Notes

  1. This is how a company is built. Event-driven architecture isn’t new, but it is different enough now to be a new kind of animal. If we’re successful, this will become a major data platform in companies and will redefine the architecture of a digital company. The people here will be part of making that happen.
  2. So what is a streaming platform? There is a set of core capabilities around data streams that you need. The first is the ability to publish and subscribe to streams of data. This has been around for a long time: messaging systems have been able to do it. What’s different now is the ability to store data, and to do so properly, in a replicated manner. The final capability is the ability to process these streams of data. Starting off as a messaging system, Kafka has evolved over the years into a full-fledged distributed streaming platform that embodies the quintessential characteristics of this new category of infrastructure.