#DevoxxFR
Patterns – Anti-Patterns
Florent Ramière | florent@confluent.io
Damien Gasparina | damien@confluent.io
Problem?
Silos explained by the Data Gravity concept
As data accumulates (builds mass), there is a greater likelihood that additional services and applications will be attracted to it.
This is the same effect gravity has on objects around a planet: as the mass or density increases, so does the strength of the gravitational pull.
With
How
Store & ETL
Process
Publish & Subscribe
In short
From a simple idea
with great properties!
• Scalability
• Retention
• Durability
• Replication
• Security
• Resiliency
• Throughput
• Ordering
• Exactly-once semantics
• Transactions
• Idempotency
• Immutability
• …
So goooooood
What could potentially go wrong?
…which is true for any data system
Not thinking about Durability
Data durability: If you didn’t think about it… it’s not durable!
And you might have lost data!
Data durability: Kafka does not wait for a disk flush by default.
Durability is achieved through replication.
Data durability: It depends on your configuration…
Data durability:
acks=1 (default value): good for latency
acks=all: good for durability
acks=all: The leader will wait for the full set of in-sync replicas to acknowledge the record.
min.insync.replicas: the minimum number of in-sync replicas that must acknowledge a write. Default is 1.
Data durability while producing?
Tune it with the acks and min.insync.replicas parameters.
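A minimal sketch of a durability-oriented producer configuration (the broker address and serializers are placeholders; only plain config keys are used):

```java
import java.util.Properties;

public class DurableProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for the full set of in-sync replicas to acknowledge each record
        props.put("acks", "all");
        return props;
    }
}
```

Note that min.insync.replicas is a broker/topic-level setting, not a producer one: for example, a topic created with replication.factor=3 and min.insync.replicas=2 requires at least two replicas to acknowledge before acks=all succeeds.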
Defaults: The default values are optimized for availability & latency. If durability is more important, tune them!
Deploying on multiple datacenters?
Multi-DC: It’s quite complicated… It’s easy to get it wrong on many levels. It could be a 3-hour talk on its own.
Multi-DC: Disaster recovery for multiple datacenters
What about the consumers?
Consumers can only read committed data.
Think about data durability and decide on the best trade-off for you.
Throughput, latency, durability, availability:
Optimizing your Apache Kafka deployment
Focusing only on the happy path
retries: Causes the client to resend any record whose send fails with a potentially transient error.
Default value: 0
retries: Use built-in retries! Bump it from 0 to infinity!
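A sketch of the producer settings involved (the numeric values are illustrative, and delivery.timeout.ms is only available on newer clients):

```java
import java.util.Properties;

public class RetryingProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        // Resend records whose send fails with a transient error
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        // Bound the total time spent retrying a single record (illustrative value)
        props.put("delivery.timeout.ms", "120000");
        return props;
    }
}
```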
retries: But retries expose you to a different kind of issue: duplicate records…
enable.idempotence: When set to 'true', the producer will ensure that exactly one copy of each message is written.
Default value: false
Use built-in idempotency!
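A configuration sketch: enabling idempotence also requires acks=all and a positive retries value, so set them consistently.

```java
import java.util.Properties;

public class IdempotentProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        // The producer attaches sequence numbers so broker-side retries
        // never write the same record twice
        props.put("enable.idempotence", "true");
        // Idempotence requires acks=all and retries > 0
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        return props;
    }
}
```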
But it does not save you from:
- Managing exceptions and failures
- Developing idempotent consumers
No Idempotent consumer
At least once (default)
At most once
Exactly once
commit: Committing manually and aggressively…
Adds a huge workload on Apache Kafka.
commit: Committing manually and aggressively…
Does not provide exactly-once semantics.
Embrace at least once
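At-least-once with the plain consumer can be sketched like this (requires the kafka-clients dependency; the broker, group, and topic names are placeholders): process first, then commit once per batch rather than per record.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        props.put("group.id", "demo-group");            // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false"); // commit only after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                // One synchronous commit per poll: if we crash before this line,
                // the batch is redelivered (at-least-once), never lost
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        // idempotent business logic goes here
    }
}
```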
Rely on Kafka Streams with exactly-once semantics!
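A configuration sketch (the application id and broker address are placeholders); processing.guarantee switches Kafka Streams onto transactions and idempotent producers internally:

```java
import java.util.Properties;

public class ExactlyOnceStreamsConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "my-streams-app");  // placeholder
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        // End-to-end exactly-once inside the topology
        props.put("processing.guarantee", "exactly_once");
        return props;
    }
}
```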
No exception handling
Future<RecordMetadata> send(ProducerRecord<K, V> record);
Future<RecordMetadata> send(ProducerRecord<K, V> record,
Callback callback);
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        // handle the failure: log it, retry, or route to a dead-letter queue
    }
});
Error handling: We don’t expect the unexpected until the unexpected is expected.
Error handling: A message cannot be processed.
A message doesn’t have the expected schema.
Retry
Infinite retry
Write to a dead-letter queue and continue
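One possible shape for this (a sketch; the topic name and header key are made up for illustration, and it requires the kafka-clients dependency):

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class DeadLetterQueue {
    private final KafkaProducer<byte[], byte[]> producer;
    private final String dlqTopic; // e.g. "orders.dlq" (hypothetical name)

    public DeadLetterQueue(KafkaProducer<byte[], byte[]> producer, String dlqTopic) {
        this.producer = producer;
        this.dlqTopic = dlqTopic;
    }

    // Forward the raw record plus the failure reason, then let the main
    // consumer move on instead of blocking the whole partition
    public void handle(byte[] key, byte[] value, Exception cause) {
        ProducerRecord<byte[], byte[]> dead = new ProducerRecord<>(dlqTopic, key, value);
        dead.headers().add("x-failure-reason", // hypothetical header key
                String.valueOf(cause).getBytes(StandardCharsets.UTF_8));
        producer.send(dead);
    }
}
```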
Ignore and continue
No silver bullet
Handle the exceptions!
https://eng.uber.com/reliable-reprocessing/
No data governance
Governance: Changes in producers might impact consumers.
Governance: Schema Registry
Share Schemas
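With Confluent’s Avro serializers, sharing schemas is mostly a configuration concern. A sketch (the registry URL is a placeholder, and this assumes the kafka-avro-serializer dependency on the classpath):

```java
import java.util.Properties;

public class SchemaAwareProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder
        // Registers the Avro schema on first use and checks compatibility
        props.put("value.serializer",
                "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://schema-registry:8081"); // placeholder
        return props;
    }
}
```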
Let bad citizens wander around
Leverage security, ACLs, and quotas
Security
Authorization and ACLs
Enforcing Client Quotas
Installing prod on Sunday night
Configuration: If you use the default configuration… you will have issues!
Please read the doc
Running Kafka in Production
Running ZooKeeper in Production
Not configuring your OS
OS: Tune at least your open file descriptor limit and mmap count.
Configure your OS
Running Kafka in Production
Disregarding Apache ZooKeeper
Not understanding Ordering
No monitoring
No business monitoring
Too many partitions
Not enough partitions
Partition key choice
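What is at stake can be illustrated with a simplified stand-in for the default partitioner (Kafka actually hashes the serialized key with murmur2, not hashCode, but it is likewise deterministic): the partition is a pure function of the key, so all records with the same key land in the same partition and stay ordered.

```java
public class KeyPartitioningDemo {
    // Simplified stand-in: the real default partitioner uses murmur2 on
    // the serialized key, but the principle is identical
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // Same key -> same partition, so per-customer ordering is preserved
        System.out.println(partitionFor("customer-42", 12)
                == partitionFor("customer-42", 12)); // prints "true"
    }
}
```

The flip side is skew: a poorly chosen key (a constant, or one hot customer) concentrates traffic on a single partition.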
Topics vs Partitions
Calling external services in Kafka Streams
No Training/No Mentoring
Going too fast
Maturity model
01 Learn Kafka: understand streaming
02 Solve A Critical Need: set up secure Kafka & build your first app
03 Go To Production: monitor & manage a mission-critical solution
04 Break Silos: infrastructure & applications across LOBs; self-service on shared Kafka
05 Stream Everything
(Axes: Investment & Time vs. Value, starting from pre-streaming)
Not having the business context!
Business Value!
2 DC – Stretched
Questions ?
Florent Ramière | florent@confluent.io
Damien Gasparina | damien@confluent.io
(Yes we answer your emails!)

Paris Kafka Meetup - patterns anti-patterns

Editor's Notes

  • #5 This is how a company is built. Event-driven architecture isn’t new, but it is different enough now that it’s a new type of animal. If we’re successful, this will be a major data platform in companies and will redefine the architecture of a digital company. The people here will be part of making that happen.
  • #7 So what is a streaming platform? There is a set of core capabilities around data streams you have to have... The first is the ability to publish and subscribe to streams of data. This is something that’s been around for a long time; messaging systems have been able to do this. What’s different now is the ability to store data, and to do it properly in a replicated manner. The final capability is being able to process these streams of data. Having started off as a messaging system, over the years Kafka has evolved into a full-fledged distributed streaming platform that embodies the quintessential characteristics of this new category of infrastructure.