Why Apache Kafka Clusters Are Like Galaxies
And Other Cosmic Kafka Quandaries Explored
Paul Brebner
Instaclustr Technology Evangelist
June 5 2024
© 2024 NetApp, Inc. All rights reserved.
Performance Engineering Track on again at
Community over Code 7-10 October Denver 2024
Instaclustr Managed Platform
• Cloud Platform for Big Data Open Source Technologies
• Free 30 day trial
• Focus of this talk is on Apache Kafka®
Centenary of Franz Kafka’s death - June 3 2024
Head of Kafka, Prague (Paul Brebner)
Overview
1 Kafka Scalability
2 Kafka Clusters and Zipf’s Law
3 Kafka Clusters and Storage
4 Top 10 Kafka Clusters and Performance
Thanks to Instaclustr colleagues for Kafka cluster data:
Kafka Clusters & Storage - Alastair Daivis & Kafka Team
Top 10 Clusters - Joseph Clay & Ramana Selvaratnam (Technical Operations Team)
A note on Kafka cluster metrics
Performance metrics: easy per broker, harder per cluster and across all clusters
Size metrics: available
Focus of our metrics collection is per broker, not per cluster or all clusters
DALL·E 3
Part 1 Kafka Scalability
(Source: Shutterstock)
Kafka is a distributed stream processing system: it allows distributed producers to send messages to distributed consumers via a Kafka cluster.
What is Kafka?
Cluster = Brokers + Partitions
Enabling Write & Read Concurrency
[Diagram: a Producer writes to a Topic with Partitions 1 to n; the Consumers in a Consumer Group share the work of reading them]
Partitions enable Consumers to share work (cf. Amish barn raising) within a consumer group
Multiple Groups Enable Message Broadcasting
[Diagram: a Producer writes to a Topic with Partitions 1 to n; two Consumer Groups each read all of the partitions]
Messages are duplicated (cf. clones) across groups, as each consumer group receives a copy of each message.
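The two ideas above (work sharing within a group, broadcasting across groups) can be illustrated with a small simulation. This is a minimal sketch in plain Python, not Kafka client code: the round-robin assignment, the integer-key partitioner, and the group and consumer names are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_partitions(num_partitions, consumers):
    """Give each partition to exactly one consumer in a group (round-robin)."""
    assignment = defaultdict(list)
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


def deliver(messages, num_partitions, groups):
    """Deliver every message to every group, and to one consumer per group.

    messages: list of (integer key, value) pairs
    groups:   dict of group name -> list of consumer names
    Returns a dict of group name -> consumer name -> list of received values.
    """
    received = {g: {c: [] for c in members} for g, members in groups.items()}
    for group, members in groups.items():
        assignment = assign_partitions(num_partitions, members)
        owner_of = {p: c for c, parts in assignment.items() for p in parts}
        for key, value in messages:
            partition = key % num_partitions  # hypothetical partitioner
            received[group][owner_of[partition]].append(value)
    return received


# Hypothetical groups: two consumers in one group, three in the other.
groups = {"billing": ["b1", "b2"], "audit": ["a1", "a2", "a3"]}
messages = [(k, f"msg-{k}") for k in range(12)]
received = deliver(messages, num_partitions=6, groups=groups)

for group, per_consumer in received.items():
    total = sum(len(values) for values in per_consumer.values())
    print(group, total, {c: len(v) for c, v in per_consumer.items()})
```

Each group sees all 12 messages, while within a group the messages are split between the consumers that own the partitions.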
Partitions – concurrency mechanism – more is better – until it’s not
You need enough partitions to benefit from the cluster's concurrency, but not so many that the replication overhead reduces overall throughput
[Chart: Partitions vs. Throughput (M TPS); x-axis: partitions, 1 to 10000 (log scale); y-axis: 0 to 2.5 M TPS; series: ZK TPS (M), KRaft TPS (M), 2020 TPS (M); annotations: 2022 - Better, 2020 - Worse]
2022 results are better due to improvements to Kafka and hardware
• Horizontal Scalability (Brokers/Nodes)
• Vertical Scalability (more/faster cores per Broker)
• Hardware (cores, CPU speed/type, RAM, disk, network, etc)
• Partitions + Consumers
• Optimise number of Partitions
• Consumer speed optimization (slow consumers are bad: they raise latency and push you towards too many partitions)
• Kafka cluster and client configurations (many and complex)
• Goals are typically
• High Throughput
• Low latency (low tens of ms)
Kafka Scalability and Performance Summary
(Slow consumers are a problem: Getty)
Part 2 Kafka Clusters and Zipf’s Law – size distribution
Visual size comparison of the six largest Local Group galaxies, with details (Wikipedia)
• Distribution function
• Most frequent observation is twice as common as the next, and so on (i.e. frequency is proportional to 1/rank)
• Long-tailed distribution
• 80/20 rule (20% of people own 80% of $)
• C.f. Pareto (discrete vs. continuous)
• Log-log rank vs frequency/size gives approx. straight line
• Common examples
• Frequency of words
• Wealth distribution
• Animal species size
• Earthquakes
• City sizes
• Computer systems (e.g. workload modelling, subsystem capacity)
• Galaxy sizes
Scaling/power law
Zipf’s Law
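The 1/rank rule and the "approximately straight line on a log-log plot" property can be checked numerically. This is a minimal sketch: the function names, the largest size of 1000 and the count of 50 are illustrative assumptions, not data from the talk.

```python
import math


def zipf_sizes(largest, count, exponent=1.0):
    """Ideal Zipf sizes: the item at rank r has size largest / r**exponent."""
    return [largest / rank ** exponent for rank in range(1, count + 1)]


def loglog_slope(sizes):
    """Least-squares slope of log(size) against log(rank), largest first."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


sizes = zipf_sizes(largest=1000.0, count=50)
print(sizes[0] / sizes[1])   # the top item is twice the size of the second
print(loglog_slope(sizes))   # an ideal 1/rank series has a log-log slope of -1
```

For real data the fitted slope is rarely exactly -1; how far it departs from -1 is one way to judge how "Zipfian" a data set is.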
• Question: How large are the largest structures in the universe?
• Answer: Bigger!
• Zipf’s law predicted that
• bigger galaxies would be detected in older parts of the universe
• beyond the reach of the Hubble at the time
• confirmed with the James Webb telescope observations
• But what’s this got to do with Kafka?
Size and Scale Predictions
Apache Kafka + Galaxies?
Image from NASA’s James Webb Space Telescope showing older and bigger galaxy clusters
Raw Kafka Cluster Size Data - Summary Statistics
Summary Statistics (nodes/cluster, plotted on a log scale)
min 3, median 3, mode 3, average 4.52, stdev 7.02, max 96, count 797 (clusters), sum 3603 (nodes)
Histogram (size vs count) – skewed distribution
Raw Kafka Cluster Size Data
[Chart: histogram of cluster count (0 to 800) by nodes per cluster; cluster sizes on the x-axis: 3, 4, 6, 8, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 48, 60, 72, 78, 96]
What is the distribution? Definitely a long-tailed power law
Kafka Clusters and Zipf’s Law
[Chart: Cluster Size Distribution (largest to smallest); x-axis: Cluster (0 to 900); y-axis: Size (0 to 120)]
Approximately Zipfian
Kafka Clusters – log size vs log rank
[Chart: Kafka Clusters - Log size vs log rank; axes: Log rank (1 to 100) and Log size (1 to 1000)]
Can expect larger clusters (animals, galaxies etc)
So What? Kafka and Zipf’s Law (1)
African Elephant, 7 t
Maraapunisaurus, extinct dinosaur, 150 t
Extrapolation of size from Zipf’s law + largest observed cluster
Predicted larger clusters
[Chart: Kafka Clusters - Log size vs log rank, extended with predicted larger clusters; axes: Log rank (1 to 1000) and Log size (0.1 to 1000); annotation: Larger]
Estimate total nodes for more clusters
Animal transportation problem
So What? Kafka and Zipf’s Law (2)
How many animals can fit in a boat? Public Domain
Total weight of animals on Ark (assuming Elephant is the largest) tends to 90 tonnes
If you know the size of the biggest thing you can predict the total size
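One way to read "predict the total from the biggest" is to sum the ideal Zipf series largest / rank over however many ranks you assume exist. The sketch below does that for the 7 t elephant from the earlier slide. The exponent of 1 and the rank counts are assumptions, and because this sum keeps growing slowly as ranks are added, the sketch does not attempt to reproduce the 90 tonne figure quoted above.

```python
def predicted_total(largest, count, exponent=1.0):
    """Sum of ideal Zipf sizes largest / r**exponent for ranks 1..count."""
    return sum(largest / rank ** exponent for rank in range(1, count + 1))


# Largest animal assumed to be the 7 t elephant; the rank counts are hypothetical.
for count in (10, 100, 1000):
    print(count, round(predicted_total(7.0, count), 1))
```

The useful property is that the total grows far more slowly than the number of items, because each extra item is smaller than the one before it.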
Only increases total nodes by 25%
Doubling number of Kafka clusters
[Chart: Cumulative total nodes (0 to 5000) vs. number of clusters (0 to 1800); annotation: 100% more clusters gives 25% more nodes]
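The sub-linear growth can be explored with an idealised model. This is a sketch under stated assumptions: cluster sizes follow largest / rank**exponent exactly, the 96-node maximum and 797-cluster count come from the summary statistics, and the exponent values are hypothetical. The 25% on the slide comes from the author's own fit, which this sketch does not reproduce.

```python
def total_nodes(largest, clusters, exponent):
    """Total nodes if the cluster at rank r has largest / r**exponent nodes."""
    return sum(largest / rank ** exponent for rank in range(1, clusters + 1))


def growth_from_doubling(clusters, exponent, largest=96.0):
    """Fractional increase in total nodes when the cluster count doubles."""
    before = total_nodes(largest, clusters, exponent)
    after = total_nodes(largest, 2 * clusters, exponent)
    return after / before - 1.0


# Try a few hypothetical exponents; smaller exponents give a fatter tail.
for exponent in (0.6, 0.8, 1.0):
    print(exponent, f"{growth_from_doubling(797, exponent):.0%}")
```

In every case the growth is well under 100%, because the added clusters all sit in the small tail of the distribution.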
Part 3 Kafka cluster storage
DALL·E 3
Storage for all Kafka clusters
Available from a recent project
Correlation coefficient between size and disk = 0.9
5.6 PB Total Disk across all Kafka clusters
Raw Data – total disk per cluster
[Chart: Disk (TB) per cluster; x-axis: Nodes/cluster (0 to 120); y-axis: Disk (TB) (0 to 500)]
• Disk space used is a function of average write rate x average message size x retention period x RF (Little’s Law)
• Our metrics are total disk available, not disk used
• Some clusters are DEV not PROD – not real workloads, and RF may be < 3
• Approximation - number of nodes as a proxy for cluster size – actual instance sizes impact capacity
• Kafka log retention policy and time impact how many messages are retained
• Kafka clusters are sized for peak load not average load
• Some clusters may be older than others (disk can be increased)
• Write vs. Read workload imbalance
• Some clusters may have higher write workload rate (requiring more disk) vs.
• Higher read workload rates (requiring less disk)
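The disk formula in the first bullet can be written out directly. This is a minimal sketch: the function name and the example workload (100,000 messages/s, 1,000-byte messages, 7-day retention, RF 3) are hypothetical, and it ignores compression, indexes and headroom.

```python
def disk_used_tb(write_rate_msgs_per_s, avg_message_bytes,
                 retention_hours, replication_factor=3):
    """Disk used = write rate x message size x retention period x RF."""
    retention_seconds = retention_hours * 3600
    retained_bytes = (write_rate_msgs_per_s * avg_message_bytes
                      * retention_seconds * replication_factor)
    return retained_bytes / 1e12  # decimal terabytes


# Hypothetical workload, not taken from the cluster data.
print(disk_used_tb(100_000, 1_000, retention_hours=24 * 7))  # about 181.44 TB
```

Because clusters are sized for peak rather than average load, the disk provisioned would normally be larger than this estimate of disk used.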
What’s going on?
[Chart: Disk (TB) per cluster; x-axis: Nodes/cluster (0 to 120); y-axis: Disk (TB) (0 to 500)]
Part 4 Performance Metrics for Top Ten Kafka Clusters
Top 10 tallest buildings (Wikipedia)
But in reality more people are killed by horses, cows, dogs, and bees than kangaroos, sharks, snakes, crocodiles, emus, jellyfish, etc!
Most Dangerous Australian Critters?
Ranking can be tricky
Most “dangerous” = most teeth? Most venomous?
(Paul Brebner) (Wikimedia)
• For all clusters
• Size (number of nodes) and type
• Disk (from extra project)
• Performance Metrics are collected for all clusters
• But not easily available as the focus is per-cluster operations
• Requested Performance Metrics for Top Ten Clusters
• What did I get?
• Static (per cluster):
• Nodes, Topics, Partitions
• For 24 hours (per broker):
• Resource Utilisation: CPU (avg, max)
• Throughput: Bytes in (avg, max), Bytes out (avg, max), Messages in (avg, max) [Have to scale by number of nodes to get cluster metrics]
• Performance: Producer and consumer latency (avg, p99)
What metrics are available for Kafka clusters?
Broker metrics need scaling to cluster metrics
Variation in broker metric values
24 hour sampling loses accuracy
24 hour sample size is limited/biased
Real workloads not benchmarking
Ten biggest clusters by node count only
Speculative Results!
Warning!
Min, Avg, Max
Summary Statistics: Nodes, Topics, Partitions
Nodes: min 27, avg 56.4, max 96
Topics: min 7, avg 429.7, max 1755
Partitions: min 2598, avg 92145.3, max 508800
Summary Statistics: CPU, GB/s in/out, Message/s (in)
CPU (%): min 2, avg 24.5, max 67.5
Bytes in/out (GB/s): min 0.396, avg 3.14175, max 14.4
Messages in (M/s): min 0.12, avg 1.419, max 8.4
Producers faster than Consumers
Note that some clusters use EBS, others use SSDs (faster!)
Summary Statistics: Latency (ms)
Producer latency (ms): min 0.075, avg 3.2925, max 90
Consumer latency (ms): min 6.5, avg 106.65, max 700
50% of clusters have sub 50ms average latency
Consumer latency distribution
[Chart: Latency distribution (ms) – increasing; clusters 1 to 10; y-axis 0 to 350 ms]
150-3k Bytes
Summary Statistics: Message size (Bytes)
Message size (avg, Bytes): min 150, avg 1163.95, max 3000
0.4 to 25 Million/s
Using average message size, compute messages out → total messages in+out
[Chart: Msgs in+out (M/s); min, avg, max; y-axis 0 to 30]
1.4 to 28 – i.e. 28 consumer groups potentially
Fan out (ratio of consumer to producer messages)
[Chart: Fan out; min, avg, max; y-axis 0 to 30]
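The derived metrics on the last few slides follow from three small calculations: scale per-broker rates by the node count, divide bytes out by the average message size to estimate messages out, and divide messages out by messages in to get fan-out. The sketch below shows that arithmetic with hypothetical per-broker numbers. The function names are illustrative, and scaling by node count assumes load is spread evenly across brokers, which the talk notes is often not the case.

```python
def cluster_rate(per_broker_rate, nodes):
    """Scale an average per-broker rate up to a whole-cluster rate."""
    return per_broker_rate * nodes


def messages_out_per_s(bytes_out_per_s, avg_message_bytes):
    """Estimate messages out from bytes out and the average message size."""
    return bytes_out_per_s / avg_message_bytes


def fan_out(msgs_out_per_s, msgs_in_per_s):
    """Ratio of consumer (out) to producer (in) messages."""
    return msgs_out_per_s / msgs_in_per_s


# Hypothetical per-broker averages for a 10-node cluster.
nodes = 10
msgs_in = cluster_rate(20_000, nodes)             # messages/s in
bytes_out = cluster_rate(60_000_000, nodes)       # bytes/s out
msgs_out = messages_out_per_s(bytes_out, 1_000)   # assumes 1,000-byte messages

print(msgs_in + msgs_out)           # total messages in+out per second
print(fan_out(msgs_out, msgs_in))   # 3.0: consistent with three consumer groups
```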
Knowing metrics for top 10 clusters we can estimate total values for ALL CLUSTERS
27K topics (probably underestimate), 5.8 M partitions; 321-564 Million messages/s
Assuming Zipf distribution…
Grand Totals for All Kafka Clusters (plotted on a log scale)
Topics: 27.45 k
Partitions: 5.886 M
Msgs in+out (avg): 321.32 M/s
Msgs in+out (max): 564.97 M/s
Nodes – 27 to 96 (1% of clusters, 564 nodes total, 16% of total nodes overall)
Static data – top 10 clusters (largest on right)
Nodes/Cluster (clusters 1 to 10): 27, 36, 36, 48, 51, 60, 60, 72, 78, 96
Ranges, odd ones out
Biggest (10) cluster has most partitions; cluster 6 has “hottest” topics (max partitions/topic)
Topics/Partitions/Nodes
Topics/Cluster (clusters 1 to 10): 7, 631, 13, 1337, 57, 7, 27, 101, 362, 1755
Partitions/Cluster (clusters 1 to 10): 6675, 57672, 2598, 200490, 11940, 11394, 13038, 23046, 85800, 508800
[Charts: Partitions/Topic and Partitions/Node for clusters 1 to 10; annotations: Most topics, Most partitions, Hottest topics]
Cluster 4 has highest max = highest topics/partitions per cluster/node
Cluster 6 has highest average = highest partitions/topic (“hot” topics)
These are both “hot” clusters
CPU
[Chart: CPU (Avg, max) for clusters 1 to 10; y-axis 0% to 80%; annotations: Hottest, Hot]
Topics? Theory and our Technical operations people say probably not
as topics are not correlated to throughput (or size)
Correlation = 0.4, some known smaller clusters with way more topics (e.g. 10,000!)
Any obvious correlations to cluster size?
[Chart: Total topics in cluster (0 to 2000) vs. cluster size (0 to 120)]
Partitions are related to throughput and size in theory
Correlation = 0.63, and the largest cluster has most and above average partitions/nodes
Size/Partition correlation?
[Chart: Total partitions (0 to 600000) vs. cluster size (0 to 120)]
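The correlations quoted for topics and partitions can be recomputed from the per-cluster values shown on the static data slides. This is a sketch that assumes those values are listed in the same cluster order (1 to 10) on each slide, and that the quoted figures are Pearson coefficients. Under those assumptions the results should land close to the quoted 0.4 and 0.63.

```python
import math

# Per-cluster values as read off the top 10 slides (clusters 1 to 10).
nodes = [27, 36, 36, 48, 51, 60, 60, 72, 78, 96]
topics = [7, 631, 13, 1337, 57, 7, 27, 101, 362, 1755]
partitions = [6675, 57672, 2598, 200490, 11940,
              11394, 13038, 23046, 85800, 508800]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print("nodes vs topics:    ", round(pearson(nodes, topics), 2))
print("nodes vs partitions:", round(pearson(nodes, partitions), 2))
```

With only ten points, a single large cluster (here cluster 10) can dominate the coefficient, which is one reason to treat these correlations as indicative only.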
Average – poor correlation
Size/Throughput?
[Chart: Msgs in+out (avg/s) (0 to 25000000) vs. cluster size (0 to 120)]
Max – poor correlation
But avg & peak TP correlates with “hot” cluster
Real workloads in 24 hour sample period don’t necessarily correlate with cluster capacities
Size/Throughput?
[Chart: Msgs in+out (max/s) (0 to 30000000) vs. cluster size (0 to 120)]
• AWS ARM Graviton2 R6g: high price performance for memory-intensive workloads
• R6g.4xlarge 16 cores (EBS) (4 clusters)
• R6g.2xlarge 8 cores (EBS) (2 clusters)
• AWS ARM Graviton2 Im4gn Nitro SSD for I/O intensive workloads
• Im4gn.4xlarge 16 cores SSD (2 clusters, including “hot” cluster)
• AWS ARM Graviton2 M6g for balanced workloads
• M6gd.4xlarge 16 cores SSD (1 cluster)
• AWS x86 I3en for data-intensive workloads
• I3en.3xlarge 12 cores SSD (1 cluster)
A mix of EC2 instance types/sizes (4/5) and storage - EBS (6)/SSD (4)
Top 10 clusters have heterogeneous h/w
Good correlation (0.8) – definite increase in total cores for bigger clusters
Cores per Cluster
[Chart: Cores per cluster (0 to 1800) vs. Nodes per cluster (0 to 120)]
• Insights from our Techops team – thanks!
• Biggest cluster (#10)
• Over provisioned, 96 nodes, 1536 cores
• EBS (slow)
• Peak in messages/s = 1M/s
• Consumer latency 200 - 400ms
• Runs “cool” (18-45%)
• Most partitions (0.5088 Million)
• Hottest cluster (#6)
• 60 nodes, 960 cores
• Runs “hot” (45-55%)
• But lowest consumer latency
• Faster SSDs
• Few topics, most partitions/topic (hot “topics”)
Drill down
Biggest cluster vs “hottest” cluster
Average for cluster = 290 ms but actually a large variation across brokers
Also illustrates that metrics are per broker – and have wide variability
For target throughput how many cores and partitions are needed (in practice need both)?
Can only predict a range from this data (avg=conservative; max=optimistic)
Capacity Planning
Msgs/s per core: avg 6288.04, max 25583.88
Msgs/s per partition: avg 431.39, max 2155.57
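The per-core and per-partition rates above turn into a simple range estimate: divide the target throughput by the average rate for a conservative answer and by the maximum rate for an optimistic one. This is a minimal sketch of that arithmetic. The function name and the 50 million messages/s target (roughly twice the largest observed rate) are assumptions, and it treats cores and partitions independently even though in practice both are needed together.

```python
import math

# Rates from the capacity planning slide (messages/s per unit).
MSGS_PER_CORE = {"avg": 6288.04, "max": 25583.88}
MSGS_PER_PARTITION = {"avg": 431.39, "max": 2155.57}


def units_needed(target_msgs_per_s, msgs_per_unit):
    """Smallest whole number of cores (or partitions) that covers the target."""
    return math.ceil(target_msgs_per_s / msgs_per_unit)


target = 50_000_000  # hypothetical target throughput, messages/s

for label in ("avg", "max"):
    cores = units_needed(target, MSGS_PER_CORE[label])
    parts = units_needed(target, MSGS_PER_PARTITION[label])
    print(f"{label}: {cores} cores, {parts} partitions")
```

The gap between the two answers is wide, which matches the talk's point that only a range can be predicted from this data.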
Range: Avg (conservative), Max (optimistic)
Cores for target throughput (x2 max current cluster)
[Chart: TPS (Million/s) vs Cores; x-axis 0 to 60 M TPS; y-axis 0 to 9000 cores; series: Cores (avg), Cores (max)]
Range: Avg (conservative) max (optimistic)
Note: This is probably skewed because the large cluster with the most partitions has low throughput, and the “hot” cluster with the highest throughput has few partitions!
Partitions for target throughput (x2 max current cluster)
[Chart: TPS (Million/s) vs Partitions; x-axis 0 to 60 M TPS; y-axis 0 to 120000 partitions; series: Partitions (avg), Partitions (max)]
• Lots of small clusters
• Few big clusters
• Even bigger clusters are likely
• A wide distribution of sizes is observed
• Kafka is horizontally scalable
• Fits many different customer workloads
• Some customers have many smaller clusters
• Some clusters grow in size over time
Conclusions?
Kafka cluster size distribution is Zipfian
DALL·E 3
• Wide range of workloads, throughputs, hot vs cold CPU, fan-outs, latency, message size and hardware
• Some interesting “odd ones out”
• Biggest
• Hottest
• Performance metrics were
• biased & coarse grain
• due to broker level collection and 24 hour sample & average & summary
• and from real workloads not benchmarks
• Hard to find correlations and make accurate predictions
• Some broad correlations and range predictions possible
Conclusions?
Top 10 clusters are “diverse”
(Paul Brebner)
Adolf Hoffmeister & Franz Kafka (Wikimedia)
• Is normal for our managed Kafka clusters
• Usage/workload varies widely for customers
• Including topics, partitions, throughput, message sizes, client settings (e.g. batching), fan-out, latency SLAs etc
• Many bigger clusters are dedicated to very specific customer workloads
• Higher throughput clusters are not representative of lower throughput clusters
• Hardware varies and is optimized/customized to take into account specific customer workloads, cost and SLA requirements
Conclusions?
Custom Cluster Optimization and Sizing
DALL·E 3
• Performance prediction from coarse-grained metrics feels like Déjà vu
• 2007-2017 I developed an automated approach to Performance Modelling from distributed application traces
• This could work for Kafka
• Instrument Apache Kafka source code with OpenTelemetry to provide
• Kafka specific resource (CPU, IO, network) + time spans
• Run Kafka benchmarks on representative hardware
• Transform OT traces into a performance model
• Make more accurate predictions
Conclusions?
Performance Prediction
DALL·E 3
What next?
• Try us out!
• Free 30 day trial
• Developer size clusters
• www.instaclustr.com
• All my blogs (100+):
• https://instaclustr.com/paul-brebner
Thank you
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
Paul Brebner
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
Paul Brebner
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
Paul Brebner
 

Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandaries Explored)

  • 1. Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandaries Explored). Paul Brebner, Instaclustr Technology Evangelist. June 5 2024. © 2024 NetApp, Inc. All rights reserved.
  • 2. Performance Engineering Track is on again at Community over Code, Denver, 7-10 October 2024.
  • 3. Instaclustr Managed Platform
      • Cloud platform for Big Data open source technologies
      • Free 30 day trial
      • Focus of this talk is on Apache Kafka®
  • 4. Centenary of Franz Kafka’s death: June 3 2024. (Image: Head of Kafka, Prague; Paul Brebner)
  • 5. Overview
      • 1 Kafka Scalability
      • 2 Kafka Clusters and Zipf’s Law
      • 3 Kafka Clusters and Storage
      • 4 Top 10 Kafka Clusters and Performance
      Thanks to Instaclustr colleagues for Kafka cluster data:
      • Kafka Clusters & Storage: Alastair Daivis & Kafka Team
      • Top 10 Clusters: Joseph Clay & Ramana Selvaratnam (Technical Operations Team)
  • 6. A note on Kafka cluster metrics: the focus of our metrics collection is per broker, not per cluster or across all clusters. [Diagram: availability of size metrics and performance metrics for a broker, a cluster, and all clusters, on a scale from easy to harder. Image: DALL·E 3]
  • 7. Part 1: Kafka Scalability (Image source: Shutterstock)
  • 8. What is Kafka? Kafka is a distributed stream processing system: it allows distributed producers to send messages to distributed consumers via a Kafka cluster.
  • 9. Cluster = Brokers + Partitions: enabling write and read concurrency
  • 10. Partitions enable consumers to share work (cf. Amish barn raising) within a consumer group. [Diagram: a Producer writes to a Topic with Partitions 1, 2, ... n; the Consumers in a Consumer Group share work within the group.]
  • 11. Multiple Groups Enable Message Broadcasting: messages are duplicated (cf. clones) across groups, as each consumer group receives a copy of each message. [Diagram: a Producer writes to a Topic with Partitions 1, 2, ... n, which is read by two Consumer Groups of Consumers.]
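The two consumer-group behaviours on slides 10 and 11 can be sketched in a few lines of plain Python. This is a toy model, not the Kafka client API: the function names, the group and consumer names, and the round-robin assignment are all illustrative assumptions (Kafka itself supports several assignment strategies).

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Spread partitions round-robin over the consumers of ONE group.

    Illustrative assumption: real Kafka assignors are configurable.
    """
    assignment = defaultdict(list)
    for index, partition in enumerate(partitions):
        consumer = consumers[index % len(consumers)]
        assignment[consumer].append(partition)
    return dict(assignment)


def total_deliveries(messages_produced, group_count):
    """Every group receives its own copy of every message (broadcast)."""
    return messages_produced * group_count


# Hypothetical topic with 6 partitions and two consumer groups.
partitions = list(range(6))
groups = {"group-a": ["a1", "a2", "a3"], "group-b": ["b1", "b2"]}

assignments = {
    name: assign_partitions(partitions, members)
    for name, members in groups.items()
}
deliveries = total_deliveries(messages_produced=100, group_count=len(groups))

for name, assignment in assignments.items():
    print(name, assignment)
print("deliveries:", deliveries)
```

Within a group each partition has exactly one owner, so the work is shared; across groups the same 100 produced messages turn into 200 deliveries.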
  • 12. Partitions are the concurrency mechanism: more is better, until it’s not. You need sufficient partitions to benefit from the cluster concurrency, but not so many that the replication overhead impacts overall throughput. [Chart: Partitions vs. Throughput (M TPS); partitions from 1 to 10,000 (log scale), throughput from 0 to 2.5 M TPS; series: ZK TPS (M), KRaft TPS (M), 2020 TPS (M); annotations: 2022 better, 2020 worse.] The 2022 results are better due to improvements to Kafka and hardware.
  • 13. Kafka Scalability and Performance Summary
      • Horizontal scalability (brokers/nodes)
      • Vertical scalability (more/faster cores per broker)
      • Hardware (cores, CPU speed/type, RAM, disk, network, etc.)
      • Partitions + consumers
          • Optimise the number of partitions
          • Consumer speed optimisation (slow consumers are bad: high latency and too many partitions)
      • Kafka cluster and client configurations (many and complex)
      • Goals are typically
          • High throughput
          • Low latency (low 10s of ms)
      (Image: slow consumers are a problem; Getty)
  • 14. Part 2: Kafka Clusters and Zipf’s Law (size distribution). (Image: visual size comparison of the six largest Local Group galaxies, with details; Wikipedia)
  • 15. Zipf’s Law (a scaling/power law)
      • Distribution function
          • The most frequent observation is twice as common as the second, three times as common as the third, and so on (i.e. frequency is proportional to 1/rank)
          • Long-tailed distribution
          • 80/20 rule (20% of people own 80% of $)
          • Cf. Pareto (discrete vs. continuous)
          • A log-log plot of rank vs. frequency/size gives an approximately straight line
      • Common examples
          • Frequency of words
          • Wealth distribution
          • Animal species size
          • Earthquakes
          • City sizes
          • Computer systems (e.g. workload modelling, subsystem capacity)
          • Galaxy sizes
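The "1/rank" rule and the straight line on a log-log plot can be checked numerically. The sketch below generates an idealised, synthetic Zipf series (not the real cluster data) and fits the log-log slope by least squares; the function names are made up for illustration, and the value 96 is borrowed from the largest cluster size only as a convenient starting point.

```python
import math


def zipf_sizes(largest, count):
    """Idealised Zipf series: the size at rank r is largest / r."""
    return [largest / rank for rank in range(1, count + 1)]


def loglog_slope(sizes):
    """Least-squares slope of log(size) against log(rank), largest first."""
    ordered = sorted(sizes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(size) for size in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


sizes = zipf_sizes(largest=96.0, count=100)  # synthetic data
slope = loglog_slope(sizes)
print(f"rank 1 / rank 2 = {sizes[0] / sizes[1]:.1f}, log-log slope = {slope:.3f}")
```

For a perfect 1/rank series the slope is -1. Real data such as cluster sizes would only be approximately linear, and the fitted slope need not be exactly -1.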
  • 16. Apache Kafka + Galaxies? Size and Scale Predictions
      • Question: How large are the largest structures in the universe?
      • Answer: Bigger!
      • Zipf’s law predicted that bigger galaxies would be detected in older parts of the universe, beyond the reach of the Hubble at the time; this was confirmed by James Webb telescope observations
      • But what’s this got to do with Kafka?
      (Image from NASA’s James Webb Space Telescope showing older and bigger galaxy clusters)
  • 17. Raw Kafka Cluster Size Data: Summary Statistics. Nodes per cluster (charted on a log scale): min 3, median 3, mode 3, average 4.52, stdev 7.02, max 96, count 797 (clusters), sum 3603 (nodes).
  • 18. Raw Kafka Cluster Size Data: histogram (size vs. count) showing a skewed distribution. [Histogram: cluster count (0 to 800) for each cluster size in nodes: 3, 4, 6, 8, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 48, 60, 72, 78, 96.]
  • 19. Kafka Clusters and Zipf’s Law: What is the distribution? Definitely a long-tailed power law. [Chart: Cluster Size Distribution (largest to smallest); x-axis: cluster (0 to 900), y-axis: size (0 to 120).]
  • 20. Kafka Clusters, log size vs. log rank: approximately Zipfian. [Chart: Kafka Clusters - Log size vs log rank; both axes logarithmic.]
  • 21. So What? Kafka and Zipf’s Law (1): we can expect larger clusters (animals, galaxies, etc.). (Images: African Elephant, 7 t; Maraapunisaurus, extinct dinosaur, 150 t)
  • 22. Predicted larger clusters: extrapolation of size from Zipf’s law plus the largest observed cluster. [Chart: Kafka Clusters - Log size vs log rank, with the predicted larger clusters marked.]
  • 23. So What? Kafka and Zipf’s Law (2): estimate total nodes for more clusters. The animal transportation problem: how many animals can fit in a boat? (Image: public domain)
  • 24. If you know the size of the biggest thing, you can predict the total size. Total weight of animals on the Ark (assuming the elephant is the largest) tends to 90 tonnes.
  • 25. Doubling the number of Kafka clusters only increases total nodes by 25%. [Chart: cumulative total nodes (0 to 5000) vs. number of clusters (0 to 1800); annotation: 100% more clusters, 25% more nodes.]
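The idea that the biggest item lets you predict the total, and that doubling the number of items adds far less than double the total, can be sketched with an idealised Zipf sum. This is an assumption-laden toy: it uses a pure largest/rank^exponent model with a default exponent of 1 and no minimum cluster size, which is not necessarily the fitted model behind the slides, so it is not expected to reproduce the 25% or 90 tonne figures. The function names are hypothetical.

```python
def zipf_total(largest, count, exponent=1.0):
    """Total size of `count` items where rank r has size largest / r**exponent."""
    return sum(largest / rank ** exponent for rank in range(1, count + 1))


def growth_when_doubling(largest, count, exponent=1.0):
    """Relative increase in the total when the number of items doubles."""
    before = zipf_total(largest, count, exponent)
    after = zipf_total(largest, 2 * count, exponent)
    return after / before - 1.0


# Largest cluster (96 nodes) and cluster count (797) are taken from the
# summary statistics slide; the exponent of 1.0 is an assumption.
growth = growth_when_doubling(largest=96, count=797)
print(f"doubling the cluster count grows the modelled total by {growth:.1%}")
```

The exact percentage depends heavily on the exponent and on any floor on cluster size (the smallest observed cluster has 3 nodes), so the exponent is left as a parameter to experiment with.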
  • 26. Part 3: Kafka Cluster Storage. Storage data for all Kafka clusters, available from a recent project. (Image: DALL·E 3)
  • 27. Raw Data: total disk per cluster. 5.6 PB total disk across all Kafka clusters. Correlation coefficient between size and disk = 0.9. [Chart: Disk (TB) per cluster; x-axis: nodes/cluster (0 to 120), y-axis: disk (0 to 500 TB).]
  • 28. What’s going on?
      • Disk space used is a function of average write rate x average message size x retention period x RF (replication factor) (Little’s Law)
      • Our metrics are total disk available, not used
      • Some clusters are DEV not PROD: not real workloads, and RF may be < 3
      • Approximation: number of nodes is used as a proxy for cluster size, but actual instance sizes impact capacity
      • Kafka log retention policy and time impact how many messages are retained
      • Kafka clusters are sized for peak load, not average load
      • Some clusters may be older than others (disk can be increased)
      • Write vs. read workload imbalance
          • Some clusters may have a higher write workload rate (requiring more disk) vs.
          • higher read workload rates (requiring less disk)
      [Chart repeated: Disk (TB) per cluster vs. nodes/cluster.]
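The disk formula in the first bullet (write rate x message size x retention period x RF) is easy to turn into a back-of-the-envelope calculator. The sketch below is a minimal illustration: the function name and all input values are hypothetical, and it ignores compression, index files, and free-space headroom.

```python
def estimate_disk_bytes(write_rate_msgs_per_s, avg_message_bytes,
                        retention_seconds, replication_factor=3):
    """Disk used = write rate x message size x retention period x RF."""
    return (write_rate_msgs_per_s * avg_message_bytes
            * retention_seconds * replication_factor)


# Hypothetical workload: 100k msgs/s, 1 kB messages, 7 days retention, RF 3.
seven_days = 7 * 24 * 3600
example_bytes = estimate_disk_bytes(100_000, 1_000, seven_days, 3)
example_tb = example_bytes / 1e12
print(f"estimated disk used: {example_tb:.2f} TB")
```

Because clusters are sized for peak rather than average load, the provisioned disk reported in the slides would normally be larger than an estimate of used disk like this one.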
  • 29. Part 4: Performance Metrics for Top Ten Kafka Clusters. (Image: top 10 tallest buildings; Wikipedia)
  • 30. Most Dangerous Australian Critters? Ranking can be tricky. Most “dangerous” = most teeth? Most venomous? But in reality more people are killed by horses, cows, dogs, and bees than by kangaroos, sharks, snakes, crocodiles, emus, jellyfish, etc.! (Images: Paul Brebner; Wikimedia)
  • 31. What metrics are available for Kafka clusters?
      • For all clusters
          • Size (number of nodes) and type
          • Disk (from extra project)
      • Performance metrics are collected for all clusters
          • But not easily available, as the focus is per-cluster operations
      • Requested performance metrics for the top ten clusters
      • What did I get?
          • Static (per cluster): nodes, topics, partitions
          • For 24 hours (per broker):
              • Resource utilisation: CPU (avg, max)
              • Throughput: bytes in (avg, max), bytes out (avg, max), messages in (avg, max) [have to scale by number of nodes to get cluster metrics]
              • Performance: producer and consumer latency (avg, p99)
      Warning! Speculative results!
      • Broker metrics need scaling to cluster metrics
      • Variation in broker metric values
      • 24 hour sampling loses accuracy
      • 24 hour sample size is limited/biased
      • Real workloads, not benchmarking
      • Ten biggest clusters by node count only
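Scaling per-broker throughput to a cluster figure, as the note in square brackets describes, amounts to multiplying by the node count. The sketch below is a minimal illustration with made-up numbers; the class and field names are hypothetical, and it assumes load is evenly balanced across brokers, which the warning about variation in broker metric values says is only approximately true.

```python
from dataclasses import dataclass


@dataclass
class BrokerSample:
    """Per-broker 24 hour averages (field names are hypothetical)."""
    bytes_in_per_s: float
    bytes_out_per_s: float
    messages_in_per_s: float


def scale_to_cluster(sample, nodes):
    """Multiply per-broker rates by node count to approximate cluster rates.

    Assumes evenly balanced brokers. Only rates are scaled; utilisation and
    latency figures are not additive and are left out on purpose.
    """
    return BrokerSample(
        bytes_in_per_s=sample.bytes_in_per_s * nodes,
        bytes_out_per_s=sample.bytes_out_per_s * nodes,
        messages_in_per_s=sample.messages_in_per_s * nodes,
    )


# Made-up broker averages for a 60 node cluster.
per_broker = BrokerSample(bytes_in_per_s=20e6, bytes_out_per_s=60e6,
                          messages_in_per_s=25_000)
cluster = scale_to_cluster(per_broker, nodes=60)
print(cluster)
```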
  • 32. Summary Statistics: Nodes, Topics, Partitions (min, avg, max; charted on a log scale)
      • Nodes: min 27, avg 56.4, max 96
      • Topics: min 7, avg 429.7, max 1755
      • Partitions: min 2598, avg 92145.3, max 508800
  • 33. Summary Statistics: CPU, GB/s (in+out), Messages/s (in) (min, avg, max; charted on a log scale)
      • CPU: min 2, avg 24.5, max 67.5
      • Bytes in/out (GB/s): min 0.396, avg 3.14175, max 14.4
      • Messages in (M/s): min 0.12, avg 1.419, max 8.4
  • 34. Summary Statistics: Latency (ms) (min, avg, max; charted on a log scale). Producers are faster than consumers. Note that some clusters use EBS, others use SSDs (faster!).
      • Producer latency (ms): min 0.075, avg 3.2925, max 90
      • Consumer latency (ms): min 6.5, avg 106.65, max 700
  • 35. Consumer latency distribution: 50% of clusters have sub-50 ms average latency. [Chart: latency distribution (ms), in increasing order across the 10 clusters; y-axis 0 to 350 ms.]
  • 36. Summary Statistics: Message size (avg, bytes): 150 to 3k bytes (min 150, avg 1163.95, max 3000).
  • 37. Using average message size, compute messages out → total messages in+out: 0.4 to 25 million/s. [Chart: Msgs in+out (M/s); min, avg, max; y-axis 0 to 30.]
  • 38. Fan out (ratio of consumer to producer messages): 1.4 to 28, i.e. potentially 28 consumer groups. [Chart: fan out; min, avg, max; y-axis 0 to 30.]
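The derivation on slides 37 and 38 (bytes out divided by average message size gives messages out, and messages out divided by messages in gives fan out) can be sketched as follows. The function names and input numbers are hypothetical, and the sketch assumes the same average message size applies to inbound and outbound traffic.

```python
def messages_per_second(bytes_per_second, avg_message_bytes):
    """Convert a byte rate into a message rate using the average message size."""
    return bytes_per_second / avg_message_bytes


def fan_out(messages_in_per_s, messages_out_per_s):
    """Ratio of consumer (out) to producer (in) messages."""
    return messages_out_per_s / messages_in_per_s


# Made-up cluster-level figures.
avg_message_bytes = 1_000
bytes_out_per_s = 3_000_000_000
messages_in_per_s = 1_500_000

messages_out_per_s = messages_per_second(bytes_out_per_s, avg_message_bytes)
total_messages_per_s = messages_in_per_s + messages_out_per_s
ratio = fan_out(messages_in_per_s, messages_out_per_s)
print(f"out: {messages_out_per_s:,.0f}/s, in+out: {total_messages_per_s:,.0f}/s, "
      f"fan out: {ratio:.1f}")
```

A fan out of 2 in this toy example would correspond to roughly two consumer groups each reading every message.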
  • 39. Assuming Zipf distribution… Knowing metrics for the top 10 clusters, we can estimate total values for ALL CLUSTERS. Grand totals for all Kafka clusters: topics 27.45 k (probably an underestimate); partitions 5.89 M; msgs in+out 321.3 M/s (avg) to 565.0 M/s (max).
  • 40. Static data: top 10 clusters (largest on right). Nodes: 27 to 96 (1% of clusters, 564 nodes total, 16% of total nodes overall). Nodes/cluster for clusters 1 to 10: 27, 36, 36, 48, 51, 60, 60, 72, 78, 96.
  • 41. Topics/Partitions/Nodes: ranges and odd ones out. The biggest cluster (10) has the most partitions; cluster 6 has the “hottest” topics (max partitions/topic).
      • Topics/cluster (clusters 1 to 10): 7, 631, 13, 1337, 57, 7, 27, 101, 362, 1755
      • Partitions/cluster (clusters 1 to 10): 6675, 57672, 2598, 200490, 11940, 11394, 13038, 23046, 85800, 508800
      [Charts: Topics/Cluster, Partitions/Cluster, Partitions/Topic, Partitions/Node for clusters 1 to 10; annotations: most topics, most partitions, hottest topics.]
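The "hottest topics" observation follows directly from the per-cluster topic and partition counts on this slide. The sketch below recomputes partitions per topic from those published numbers; the variable names are illustrative.

```python
# Per-cluster counts from slide 41, clusters 1 to 10 (smallest to largest).
topics = [7, 631, 13, 1337, 57, 7, 27, 101, 362, 1755]
partitions = [6675, 57672, 2598, 200490, 11940, 11394, 13038, 23046, 85800, 508800]

partitions_per_topic = [p / t for p, t in zip(partitions, topics)]

# Cluster numbers are 1-based, list indexes are 0-based.
hottest_cluster = max(range(len(topics)), key=lambda i: partitions_per_topic[i]) + 1
most_partitions_cluster = partitions.index(max(partitions)) + 1

for number, ratio in enumerate(partitions_per_topic, start=1):
    print(f"cluster {number}: {ratio:.1f} partitions/topic")
print("hottest topics:", hottest_cluster, "most partitions:", most_partitions_cluster)
```

Cluster 6 comes out on top because it spreads 11394 partitions over only 7 topics.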
  • 42. CPU: Cluster 4 has the highest max (= highest topics/partitions per cluster/node). Cluster 6 has the highest average (= highest partitions/topic, i.e. “hot” topics). These are both “hot” clusters. [Chart: CPU (avg, max) for clusters 1 to 10; y-axis 0% to 80%; annotations: hottest, hot.]
  • 43. Any obvious correlations to cluster size? Topics? Theory and our Technical Operations people say probably not, as topics are not correlated to throughput (or size). Correlation = 0.4; some known smaller clusters have way more topics (e.g. 10,000!). [Chart: total topics in cluster (0 to 2000) vs. nodes (0 to 120).]
  • 44. Size/Partition correlation? In theory, partitions are related to throughput and size. Correlation = 0.63, and the largest cluster has the most partitions and above-average partitions per node. [Chart: total partitions vs cluster size (nodes)]
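The correlations quoted for topics and partitions against cluster size can be reproduced from the per-cluster data on the earlier slides. The sketch below assumes the quoted figures are Pearson correlation coefficients (the slides do not say which measure was used) and computes them by hand so that it needs only the standard library.

```python
import math

# Per-cluster values from the top 10 slides, indexed by cluster number 1 to 10.
nodes = [27, 36, 36, 48, 51, 60, 60, 72, 78, 96]
topics = [7, 631, 13, 1337, 57, 7, 27, 101, 362, 1755]
partitions = [6675, 57672, 2598, 200490, 11940, 11394, 13038, 23046, 85800, 508800]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_topics = pearson(nodes, topics)
r_partitions = pearson(nodes, partitions)

print(f"nodes vs topics:     r = {r_topics:.2f}")
print(f"nodes vs partitions: r = {r_partitions:.2f}")
```

With only ten data points, both values are dominated by the largest cluster, so they are better read as rough indications than as firm relationships.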
  • 45. Size/Throughput? Average throughput: poor correlation with cluster size. [Chart: Msgs in+out (avg/s) vs cluster size (nodes)]
  • 46. Size/Throughput? Max throughput: poor correlation with cluster size, but average and peak throughput do correlate with the “hot” cluster. Real workloads in a 24-hour sample period don’t necessarily correlate with cluster capacities. [Chart: Msgs in+out (max/s) vs cluster size (nodes)]
  • 47. Top 10 clusters have heterogeneous h/w: a mix of EC2 instance types/sizes (4/5) and storage, EBS (6)/SSD (4)
      • AWS ARM Graviton2 R6g: high price performance for memory-intensive workloads
          • R6g.4xlarge, 16 cores (EBS) (4 clusters)
          • R6g.2xlarge, 8 cores (EBS) (2 clusters)
      • AWS ARM Graviton2 Im4gn: Nitro SSD for I/O-intensive workloads
          • Im4gn.4xlarge, 16 cores, SSD (2 clusters, including the “hot” cluster)
      • AWS ARM Graviton2 M6g: for balanced workloads
          • M6gd.4xlarge, 16 cores, SSD (1 cluster)
      • AWS x86 I3en: for data-intensive workloads
          • I3en.3xlarge, 12 cores, SSD (1 cluster)
  • 48. Cores per Cluster. Good correlation (0.8): a definite increase in total cores for bigger clusters. [Chart: cores per cluster vs nodes per cluster]
  • 49. Drill down: biggest cluster vs “hottest” cluster. Insights from our Techops team, thanks!
      • Biggest cluster (#10)
          • Over-provisioned: 96 nodes, 1536 cores
          • EBS (slow)
          • Peak in messages/s = 1M/s
          • Consumer latency 200 to 400 ms
          • Runs “cool” (18-45%)
          • Most partitions (0.5088 million)
      • Hottest cluster (#6)
          • 60 nodes, 960 cores
          • Runs “hot” (45-55%)
          • But lowest consumer latency
          • Faster SSDs
          • Few topics, most partitions/topic (“hot” topics)
  • 50. Average for the cluster = 290 ms, but there is actually a large variation across brokers. This also illustrates that metrics are collected per broker and have wide variability.
  • 51. Capacity Planning. For a target throughput, how many cores and partitions are needed (in practice you need both)? Only a range can be predicted from this data (avg = conservative; max = optimistic). Msgs/s per core: avg 6,288, max 25,584. Msgs/s per partition: avg 431, max 2,156. [Chart: Msgs/s per core and partition, Avg and Max]
  • 52. Cores for target throughput (x2 max current cluster). Range: avg (conservative) to max (optimistic). [Chart: TPS (Million/s) vs Cores; series Cores (avg) and Cores (max)]
  • 53. Partitions for target throughput (x2 max current cluster). Range: avg (conservative) to max (optimistic). Note: this is probably skewed, because the largest cluster has the most partitions but low throughput, while the “hot” cluster has the highest throughput but few partitions! [Chart: TPS (Million/s) vs Partitions; series Partitions (avg) and Partitions (max)]
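The range predictions on the capacity planning slides can be approximated by dividing a target throughput by the observed per-core and per-partition rates. This is a minimal sketch that assumes throughput scales linearly with cores and partitions, which the slides themselves caution is skewed; the function name `required_range` and the 50 million msgs/s target are illustrative.

```python
import math

# Observed rates from the capacity planning slide (msgs/s, in+out).
MSGS_PER_CORE = {"avg": 6288.039891, "max": 25583.88158}
MSGS_PER_PARTITION = {"avg": 431.386635, "max": 2155.566642}


def required_range(target_msgs_per_s, rate):
    """Return (optimistic, conservative) resource counts for a target throughput.

    Assumes linear scaling: the max rate gives the optimistic (smaller) count,
    the avg rate gives the conservative (larger) count.
    """
    optimistic = math.ceil(target_msgs_per_s / rate["max"])
    conservative = math.ceil(target_msgs_per_s / rate["avg"])
    return optimistic, conservative


target = 50_000_000  # illustrative target: 50 million msgs/s
cores = required_range(target, MSGS_PER_CORE)
parts = required_range(target, MSGS_PER_PARTITION)

print(f"Cores:      {cores[0]:,} (optimistic) to {cores[1]:,} (conservative)")
print(f"Partitions: {parts[0]:,} (optimistic) to {parts[1]:,} (conservative)")
```

In practice both constraints apply at once, so a cluster sized this way would need enough cores and enough partitions, and the wide gap between the two ends of each range reflects how diverse the underlying workloads are.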
  • 54. Conclusions? Kafka cluster size distribution is Zipfian
      • Lots of small clusters
      • Few big clusters
      • Even bigger clusters are likely
      • A wide distribution of sizes is observed
      • Kafka is horizontally scalable
      • Fits many different customer workloads
      • Some customers have many smaller clusters
      • Some clusters grow in size over time
  • 55. Conclusions? Top 10 clusters are “diverse”
      • Wide range of workloads, throughputs, hot vs cold CPU, fan-outs, latency, message size and hardware
      • Some interesting “odd ones out”
          • Biggest
          • Hottest
      • Performance metrics were biased and coarse-grained, due to broker-level collection and a 24-hour sample reduced to averages and summaries, and came from real workloads, not benchmarks
      • Hard to find correlations and make accurate predictions
      • Some broad correlations and range predictions are possible
  • 56. Conclusions? Custom Cluster Optimization and Sizing
      • Is normal for our managed Kafka clusters
      • Usage/workload varies widely between customers, including topics, partitions, throughput, message sizes, client settings (e.g. batching), fan-out, latency SLAs, etc.
      • Many bigger clusters are dedicated to very specific customer workloads
      • Higher-throughput clusters are not representative of lower-throughput clusters
      • Hardware varies and is optimized/customized for specific customer workloads, cost and SLA requirements
  • 57. Conclusions? Performance Prediction
      • Performance prediction from coarse-grained metrics feels like déjà vu
      • From 2007 to 2017 I developed an automated approach to performance modelling from distributed application traces
      • This could work for Kafka
          • Instrument the Apache Kafka source code with OpenTelemetry to provide Kafka-specific resource (CPU, IO, network) and time spans
          • Run Kafka benchmarks on representative hardware
          • Transform the OpenTelemetry traces into a performance model
          • Make more accurate predictions
  • 58. What next?
      • Try us out!
          • Free 30 day trial
          • Developer size clusters
          • www.instaclustr.com
      • All my blogs (100+):
          • https://instaclustr.com/paul-brebner
  • 59. Thank you