SlideShare a Scribd company logo
1 of 48
Download to read offline
© Instaclustr Pty Limited, 2023
Fast Open Source Software —
Without The Fury
Paul Brebner
Open Source Technology Evangelist
www.instaclustr.com/paul-brebner/
Community Over Code Asia 2023
Fast Open Source Software
(Source: wikimedia.org)
© Instaclustr Pty Limited, 2023
Enabled by Scalable Big Data
Technologies (e.g. Apache Kafka®)
(Source: wikimedia.org)
© Instaclustr Pty Limited, 2023
And Cloud Infrastructure
(Source: wikimedia.org)
© Instaclustr Pty Limited, 2023
Cars run on roads
S/W runs on H/W
Without the Fury
Fury 1 Too Many Kafka Topics
Fury 2 Slow Consumers
Fury 3 Too Many Kafka Partitions
Fury 4 Single Threaded Consumers—
concurrency limited by
partitions
© Instaclustr Pty Limited, 2023
(Source: Shutterstock)
(Source: Shutterstock)
Cloud Platform for Big Data
Open Source Technologies
Focus of this talk is on
Apache Kafka®
Instaclustr Managed Platform
© Instaclustr Pty Limited, 2023
© Instaclustr Pty Limited, 2023
Kafka is a distributed streams processing
system—it allows distributed producers to send
messages to distributed consumers via a Kafka
cluster.
What is
Kafka?
Partitions Enable Concurrency:
Cluster and Producers
© Instaclustr Pty Limited, 2023
Partitions Enable Concurrency:
Cluster and Consumers
© Instaclustr Pty Limited, 2023
(and followers)
Partition n
Topic
Partition 1
Producer
Partition 2
Consumer Group
Consumer
Consumer
Consumers share
work within groups
Consumer
Partitions enable Consumers to share work
(c.f. Amish Barn raising) within a consumer group
© Instaclustr Pty Limited, 2023
Multiple groups enable message broadcasting.
Messages are duplicated (c.f. clones) across groups, as
each consumer group receives a copy of each message.
Multiple Groups Enable Message Broadcasting
Consumer
Consumer
Consumer
Consumer
Topic
Partition 1
Partition 2
Partition n
Producer
Consumer Group
Consumer Group
Messages are
duplicated across
Consumer groups
Messages are duplicated (c.f. clones) across groups,
as each consumer group receives a copy of each message
© Instaclustr Pty Limited, 2023
©Instaclustr Pty Limited, 2023
“Kongo” Logistics IoT Application
Fury 1: Too Many Kafka Topics
Design Choices:
Many vs. One Topic?
• 100s of locations (Warehouses, Trucks)
• Each location has a topic and multiple
consumer groups (so all the Goods in a
location receive relevant events)
Option 1:
§ Many topics
§ Many consumer groups per topic ->
high fan-out
1. Many (100s) of Topics
© Instaclustr Pty Limited, 2023
2. One Topic, One Consumer Group
• One topic for all locations
• Using an external notification
mechanism for event broadcasting
(Guava Event Bus)
© Instaclustr Pty Limited, 2023
Single Topic/Single Consumer
Group Wins
© Instaclustr Pty Limited, 2023
Many topics, many
consumer groups, 7200
Single topic, single
consumer group,
1120000
0
200000
400000
600000
800000
1000000
1200000
Many topics Single topic
TPS
155 times
better!
But why?
Trains Are More Scalable Than Cars
CRH2 EMU at Xi'an, near Anyuan Gate
(Source: Wikipedia)
© Instaclustr Pty Limited, 2023
High Fan-Out = Lots of On/Off Ramps
© Instaclustr Pty Limited, 2023
(Source: Shutterstock)
Explanation:
High fan-out =
lots of output
data and many
consumer
groups
(resource
intensive)
Many Topics = Traffic Jam
© Instaclustr Pty Limited, 2023
(Source: Shutterstock)
Explanation:
More topics à
more partitions
But Kafka Is Scalable! Bigger Clusters
(Source: Wikimedia)
Add more lanes!
Vertical/Scale-up
Increase Node
Sizes
© Instaclustr Pty Limited, 2023
Horizontal/Scale-Out
Add More Nodes
Add more roads!
(Source: Shutterstock)
© Instaclustr Pty Limited, 2023
©Instaclustr Pty Limited, 2023
Fury 2: Slow Consumers
Kafka+Cassandra Anomaly Detector
Application
Massively Scalable Anomaly
Detection: Tuning Knobs (Orange h/w, Yellow s/w)
• Initially just increased h/w resources
• Scaling was “easy” with Kubernetes
• Easy to create lots of consumers (100s)
• Initially single threaded Kafka consumer
(no thread pools)
© Instaclustr Pty Limited, 2023
But Scalability Not Great:
Scaling Is Too Easy—Scalability Harder
0
1
2
3
4
5
6
7
8
0 100 200 300 400 500 600 700
Billions
checks/day
Total Cores
Total Cores vs. Billions of Checks/Day (pre-tuning)
© Instaclustr Pty Limited, 2023
Slow Kafka Consumers Problem
Default Kafka consumers:
• Are single-threaded
• If the processing is “slow” then queuing
occurs—as the thread is blocked—reducing
throughput
• Solutions include speed up the processing, or
increase the number of consumers
• But more consumers à more partitions
• As each consumer needs 1 or more partitions
(Source: Getty Images)
© Instaclustr Pty Limited, 2023
Single Threaded Kafka Consumers
And Slow Processing = Slow Consumers
S
L
O
W
> 10 ms
• Slow consumers à
need more consumers
and also more
partitions for higher
throughput
• But more consumers
is slow, try speeding
them up
© Instaclustr Pty Limited, 2023
We Need Some Car Mods (Hacks)
© Instaclustr Pty Limited, 2023
(Source: Getty Images)
Multi-Threaded/Two Pool Consumers
The famous Bondi Ocean Pool
(in Sydney, Australia) has 2 pools
(Source: Shutterstock)
© Instaclustr Pty Limited, 2023
Tuning: Optimize Consumer Speed/
Concurrency Using 2 Stage Pipeline
1
2
Less consumers
(around 100) gives
higher throughput—
a surprise!
1. Speed up polling (thread pool 1)
2. Maximize anomaly detector
concurrency (thread pool 2)
Result—Reduces the number of
consumers and therefore partitions
needed and gives higher
throughput—why?
Don’t more partitions give higher
throughput?! Answer in part 3.
© Instaclustr Pty Limited, 2023
Scalability Post-Tuning—7.5 to 19 Billion
Checks/Day—2.5 Times Improvement
0
2
4
6
8
10
12
14
16
18
20
0 100 200 300 400 500 600 700
Billions
checks/day
Total Cores
Total Cores vs. Billions of Checks/Day (pre-tuning)
Billions of checks/day (pre-tuning) Billions of checks/day (post-tuning)
© Instaclustr Pty Limited, 2023
©Instaclustr Pty Limited, 2023
Fury 3: Too Many Partitions
What’s really going on under
the Kafka Bonnet?
(Source: Adobe Stock)
(Source: Adobe Stock)
Partitions = Pistons (Cylinders)
©Instaclustr Pty Limited, 2021
(Source: Getty Images)
© Instaclustr Pty Limited, 2023
But How Many?
©Instaclustr Pty Limited, 2021
Isetta 1-cylinder car
(Source: Wikimedia)
1 piston isn’t very
powerful:
Isetta (bubble car)
single-cylinder, 10HP,
top speed 55MPH
© Instaclustr Pty Limited, 2023
…16 Pistons Is a Lot!
©Instaclustr Pty Limited, 2021
Cadillac V-16
175HP,
100 MPH!
By Ramgeis - fotografiert von Ramgeis in Pebble Beach, Kalifornien im August 2004,
CC BY-SA 3.0
© Instaclustr Pty Limited, 2023
Can You Have “Too Many”?
©Instaclustr Pty Limited, 2021
Source: Wikimedia
YES!
Experimental 42-cylinder
2,350 hp (plane) engine!
© Instaclustr Pty Limited, 2023
Benchmarking (2020): Partitions and
Replication Factor Are the Culprits
0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1 10 100 1000 10000
TPS
Partitions
Kafka Partitions vs. Throughput
Cluster: 3 nodes x 4 cores = 12 cores total
Replication Factor 3 (TPS) Replication Factor 1 (TPS)
You need sufficient
partitions to benefit
from the cluster
concurrency—
And not too many
that the replication
overhead impacts
overall throughput
© Instaclustr Pty Limited, 2023
2022 Kraft/Zookeeper Modes vs.
2020 Results Better, 1000 Partitions Is Ok
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
0
0.5
1
1.5
2
2.5
1 10 100 1000 10000
Partitions vs. Throughput (M TPS)
ZK TPS (M) KRAFT TPS (M) 2020 TPS (M)
2022 - Better
2020 - Worse
© Instaclustr Pty Limited, 2023
©Instaclustr Pty Limited, 2023
Fury 4: Single Threaded Consumers
What’s New? Kafka Parallel
Consumer = A New Engine!
(Source: Adobe Stock)
"La Jamais Contente", first car to reach 100 km/h
in 1899 – 68hp electric!
(Source: Wikimedia)
NIO EP9 – Chinese Electric Super Car:
4 Engines, 1,341 HP, 0-300KM/H in 16s
(Source: Wikimedia)
Theory: Little’s Law:
Concurrency = Throughput x Time
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
Rearranged:
• Throughput =
Concurrency/Time
• Concurrency is Partitions =
Consumers
• Using default consumer the
throughput drops with
increasing time
• Only solution is to increase
partitions
© Instaclustr Pty Limited, 2023
For Given Target Throughput (1M TPS)
Increasing Partitions With Increasing Time
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
© Instaclustr Pty Limited, 2023
Order in Kafka Is Partition Based—
So How To Increase Consumer Concurrency?
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
(Source: Adobe Stock)
© Instaclustr Pty Limited, 2023
Kafka Parallel Consumers:
Multi-Threaded Consumer
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
• Multiple ordering options—c.f. default Kafka only guarantees order within partitions!
PARTITION à KEY à UNORDERED
Increasing concurrency à
• Concurrency from 1 to lots —depends on client resources, and Partitions/Key space sizes
• KEY has higher concurrency than Partition and is ordered by KEY—reasonable compromise
• UNORDERED is unordered
© Instaclustr Pty Limited, 2023
Kafka Parallel Consumers
Multi-Threaded Consumer = Buses
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
(Source: Getty Images)
© Instaclustr Pty Limited, 2023
Theoretical Improvement for Each
Mode – max 3 orders of magnitude
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
1,000 partitions
100 consumers max
© Instaclustr Pty Limited, 2023
Experimental Results:
3, 50, and 200 times improvement, unordered best
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
1 consumer
10 partitions
100 keys
10ms latency
© Instaclustr Pty Limited, 2023
Watch Out for the Kafka Furies
2022 - Better
2020 - Worse
Too many topics
(= too many partitions) Too many consumer groups
Slow consumers
(= too many partitions)
Insufficient/
too many partitions
Single threaded
consumer
(= too many partitions)
© Instaclustr Pty Limited, 2023
Speed can be achieved with train-buses
You need sufficient partitions to benefit from the cluster concurrency
And not too many that the replication overhead impacts overall throughput
2022 - Better
2020 - Worse
© Instaclustr Pty Limited, 2023 Source: Wikimedia
Minimize Topics and Partitions = Tracks
- Buses are fast & self-driving on tracks
Minimize Consumer Groups = Interchanges
- At interchanges, buses fan out onto roads,
reducing passenger transfers
Maximize Consumer Concurrency = Buses
- Multiple passengers,
integrated with road system
Adelaide’s O-Bahn Busway train-bus system
Track + Buses
www.instaclustr.com
info@instaclustr.com
@instaclustr
Q & A
© Instaclustr Pty Limited, 2023
www.instaclustr.com
info@instaclustr.com
@instaclustr
THANK
YOU!
© Instaclustr Pty Limited, 2023

More Related Content

Similar to DEVELOPING FAST APPLICATIONS WITH OPEN SOURCE SOFTWARE - WITHOUT THE FURY

40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power
Deepak Shankar
 
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PROIDEA
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
blewington
 
MuleSoft Meetup Singapore #8 March 2021
MuleSoft Meetup Singapore #8 March 2021MuleSoft Meetup Singapore #8 March 2021
MuleSoft Meetup Singapore #8 March 2021
Julian Douch
 
Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...
Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...
Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...
Cisco Service Provider
 
Cassandra summit-2013
Cassandra summit-2013Cassandra summit-2013
Cassandra summit-2013
dfilppi
 

Similar to DEVELOPING FAST APPLICATIONS WITH OPEN SOURCE SOFTWARE - WITHOUT THE FURY (20)

40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:High-performance 32G Fibre Channel Module on MDS 9700 Directors:
High-performance 32G Fibre Channel Module on MDS 9700 Directors:
 
Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power Evaluating UCIe based multi-die SoC to meet timing and power
Evaluating UCIe based multi-die SoC to meet timing and power
 
Kafka vs kinesis
Kafka vs kinesisKafka vs kinesis
Kafka vs kinesis
 
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
PLNOG 13: Alexis Dacquay: Handling high-bandwidth-consumption applications in...
 
MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1MIG 5th Data Centre Summit 2016 PTS Presentation v1
MIG 5th Data Centre Summit 2016 PTS Presentation v1
 
PLNOG 17 - Alexis Dacquay - 100 G, Skalowalność i Widoczność
PLNOG 17 - Alexis Dacquay - 100 G, Skalowalność i WidocznośćPLNOG 17 - Alexis Dacquay - 100 G, Skalowalność i Widoczność
PLNOG 17 - Alexis Dacquay - 100 G, Skalowalność i Widoczność
 
MuleSoft Meetup Singapore #8 March 2021
MuleSoft Meetup Singapore #8 March 2021MuleSoft Meetup Singapore #8 March 2021
MuleSoft Meetup Singapore #8 March 2021
 
Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...
Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...
Cisco cBR-8 Evolved CCAP: Deliver Scalable Network and Service Growth at a Lo...
 
Data Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
Data Networks: Next-Generation Optical Access toward 10 Gb/s EverywhereData Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
Data Networks: Next-Generation Optical Access toward 10 Gb/s Everywhere
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
Cray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge SimulationsCray HPC Environments for Leading Edge Simulations
Cray HPC Environments for Leading Edge Simulations
 
MQTT and SensorThings API MQTT Extension
MQTT and SensorThings API MQTT ExtensionMQTT and SensorThings API MQTT Extension
MQTT and SensorThings API MQTT Extension
 
Cassandra summit-2013
Cassandra summit-2013Cassandra summit-2013
Cassandra summit-2013
 
Vector Search / Generative AI introduction at Pulsar Meetup
Vector Search / Generative AI introduction at Pulsar MeetupVector Search / Generative AI introduction at Pulsar Meetup
Vector Search / Generative AI introduction at Pulsar Meetup
 
Maximising ROI and User Experience with Outdoor Small Cells: Airspan
Maximising ROI and User Experience with Outdoor Small Cells: AirspanMaximising ROI and User Experience with Outdoor Small Cells: Airspan
Maximising ROI and User Experience with Outdoor Small Cells: Airspan
 
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...MayaData  Datastax webinar - Operating Cassandra on Kubernetes with the help ...
MayaData Datastax webinar - Operating Cassandra on Kubernetes with the help ...
 
Appliance Launch Webcast
Appliance Launch WebcastAppliance Launch Webcast
Appliance Launch Webcast
 
OpenStack Architected Like AWS (and GCP)
OpenStack Architected Like AWS (and GCP)OpenStack Architected Like AWS (and GCP)
OpenStack Architected Like AWS (and GCP)
 

Recently uploaded

CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
Wonjun Hwang
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
panagenda
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
FIDO Alliance
 

Recently uploaded (20)

2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)CORS (Kitworks Team Study 양다윗 발표자료 240510)
CORS (Kitworks Team Study 양다윗 발표자료 240510)
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
Event-Driven Architecture Masterclass: Engineering a Robust, High-performance...
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Portal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russePortal Kombat : extension du réseau de propagande russe
Portal Kombat : extension du réseau de propagande russe
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdfFrisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
Frisco Automating Purchase Orders with MuleSoft IDP- May 10th, 2024.pptx.pdf
 
Generative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdfGenerative AI Use Cases and Applications.pdf
Generative AI Use Cases and Applications.pdf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
Easier, Faster, and More Powerful – Alles Neu macht der Mai -Wir durchleuchte...
 
الأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهلهالأمن السيبراني - ما لا يسع للمستخدم جهله
الأمن السيبراني - ما لا يسع للمستخدم جهله
 
Design Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptxDesign Guidelines for Passkeys 2024.pptx
Design Guidelines for Passkeys 2024.pptx
 
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties ReimaginedEasier, Faster, and More Powerful – Notes Document Properties Reimagined
Easier, Faster, and More Powerful – Notes Document Properties Reimagined
 
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider  Progress from Awareness to Implementation.pptxTales from a Passkey Provider  Progress from Awareness to Implementation.pptx
Tales from a Passkey Provider Progress from Awareness to Implementation.pptx
 

DEVELOPING FAST APPLICATIONS WITH OPEN SOURCE SOFTWARE - WITHOUT THE FURY

  • 1. © Instaclustr Pty Limited, 2023 Fast Open Source Software — Without The Fury Paul Brebner Open Source Technology Evangelist www.instaclustr.com/paul-brebner/ Community Over Code Asia 2023
  • 2. Fast Open Source Software (Source: wikimedia.org) © Instaclustr Pty Limited, 2023
  • 3. Enabled by Scalable Big Data Technologies (e.g. Apache Kafka®) (Source: wikimedia.org) © Instaclustr Pty Limited, 2023
  • 4. And Cloud Infrastructure (Source: wikimedia.org) © Instaclustr Pty Limited, 2023 Cars run on roads S/W runs on H/W
  • 5. Without the Fury Fury 1 Too Many Kafka Topics Fury 2 Slow Consumers Fury 3 Too Many Kafka Partitions Fury 4 Single Threaded Consumers— concurrency limited by partitions © Instaclustr Pty Limited, 2023 (Source: Shutterstock) (Source: Shutterstock)
  • 6. Cloud Platform for Big Data Open Source Technologies Focus of this talk is on Apache Kafka® Instaclustr Managed Platform © Instaclustr Pty Limited, 2023
  • 7. © Instaclustr Pty Limited, 2023 Kafka is a distributed streams processing system—it allows distributed producers to send messages to distributed consumers via a Kafka cluster. What is Kafka?
  • 8. Partitions Enable Concurrency: Cluster and Producers © Instaclustr Pty Limited, 2023
  • 9. Partitions Enable Concurrency: Cluster and Consumers © Instaclustr Pty Limited, 2023 (and followers)
  • 10. Partition n Topic Partition 1 Producer Partition 2 Consumer Group Consumer Consumer Consumers share work within groups Consumer Partitions enable Consumers to share work (c.f. Amish Barn raising) within a consumer group © Instaclustr Pty Limited, 2023
  • 11. Multiple groups enable message broadcasting. Messages are duplicated (c.f. clones) across groups, as each consumer group receives a copy of each message. Multiple Groups Enable Message Broadcasting Consumer Consumer Consumer Consumer Topic Partition 1 Partition 2 Partition n Producer Consumer Group Consumer Group Messages are duplicated across Consumer groups Messages are duplicated (c.f. clones) across groups, as each consumer group receives a copy of each message © Instaclustr Pty Limited, 2023
  • 12. ©Instaclustr Pty Limited, 2023 “Kongo” Logistics IoT Application Fury 1: Too Many Kafka Topics
  • 13. Design Choices: Many vs. One Topic? • 100s of locations (Warehouses, Trucks) • Each location has a topic and multiple consumer groups (so all the Goods in a location receive relevant events) Option 1: § Many topics § Many consumer groups per topic -> high fan-out 1. Many (100s) of Topics © Instaclustr Pty Limited, 2023
  • 14. 2. One Topic, One Consumer Group • One topic for all locations • Using an external notification mechanism for event broadcasting (Guava Event Bus) © Instaclustr Pty Limited, 2023
  • 15. Single Topic/Single Consumer Group Wins © Instaclustr Pty Limited, 2023 Many topics, many consumer groups, 7200 Single topic, single consumer group, 1120000 0 200000 400000 600000 800000 1000000 1200000 Many topics Single topic TPS 155 times better! But why?
  • 16. Trains Are More Scalable Than Cars CRH2 EMU at Xi'an, near Anyuan Gate (Source: Wikipedia) © Instaclustr Pty Limited, 2023
  • 17. High Fan-Out = Lots of On/Off Ramps © Instaclustr Pty Limited, 2023 (Source: Shutterstock) Explanation: High fan-out = lots of output data and many consumer groups (resource intensive)
  • 18. Many Topics = Traffic Jam © Instaclustr Pty Limited, 2023 (Source: Shutterstock) Explanation: More topics à more partitions
  • 19. But Kafka Is Scalable! Bigger Clusters (Source: Wikimedia) Add more lanes! Vertical/Scale-up Increase Node Sizes © Instaclustr Pty Limited, 2023
  • 20. Horizontal/Scale-Out Add More Nodes Add more roads! (Source: Shutterstock) © Instaclustr Pty Limited, 2023
  • 21. ©Instaclustr Pty Limited, 2023 Fury 2: Slow Consumers Kafka+Cassandra Anomaly Detector Application
  • 22. Massively Scalable Anomaly Detection: Tuning Knobs (Orange h/w, Yellow s/w) • Initially just increased h/w resources • Scaling was “easy” with Kubernetes • Easy to create lots of consumers (100s) • Initially single threaded Kafka consumer (no thread pools) © Instaclustr Pty Limited, 2023
  • 23. But Scalability Not Great: Scaling Is Too Easy—Scalability Harder 0 1 2 3 4 5 6 7 8 0 100 200 300 400 500 600 700 Billions checks/day Total Cores Total Cores vs. Billions of Checks/Day (pre-tuning) © Instaclustr Pty Limited, 2023
  • 24. Slow Kafka Consumers Problem Default Kafka consumers: • Are single-threaded • If the processing is “slow” then queuing occurs—as the thread is blocked—reducing throughput • Solutions include speed up the processing, or increase the number of consumers • But more consumers à more partitions • As each consumer needs 1 or more partitions (Source: Getty Images) © Instaclustr Pty Limited, 2023
  • 25. Single Threaded Kafka Consumers And Slow Processing = Slow Consumers S L O W > 10 ms • Slow consumers à need more consumers and also more partitions for higher throughput • But more consumers is slow, try speeding them up © Instaclustr Pty Limited, 2023
  • 26. We Need Some Car Mods (Hacks) © Instaclustr Pty Limited, 2023 (Source: Getty Images)
  • 27. Multi-Threaded/Two Pool Consumers The famous Bondi Ocean Pool (in Sydney, Australia) has 2 pools (Source: Shutterstock) © Instaclustr Pty Limited, 2023
  • 28. Tuning: Optimize Consumer Speed/ Concurrency Using 2 Stage Pipeline 1 2 Less consumers (around 100) gives higher throughput— a surprise! 1. Speed up polling (thread pool 1) 2. Maximize anomaly detector concurrency (thread pool 2) Result—Reduces the number of consumers and therefore partitions needed and gives higher throughput—why? Don’t more partitions give higher throughput?! Answer in part 3. © Instaclustr Pty Limited, 2023
  • 29. Scalability Post-Tuning—7.5 to 19 Billion Checks/Day—2.5 Times Improvement 0 2 4 6 8 10 12 14 16 18 20 0 100 200 300 400 500 600 700 Billions checks/day Total Cores Total Cores vs. Billions of Checks/Day (pre-tuning) Billions of checks/day (pre-tuning) Billions of checks/day (post-tuning) © Instaclustr Pty Limited, 2023
  • 30. ©Instaclustr Pty Limited, 2023 Fury 3: Too Many Partitions What’s really going on under the Kafka Bonnet? (Source: Adobe Stock) (Source: Adobe Stock)
  • 31. Partitions = Pistons (Cylinders) ©Instaclustr Pty Limited, 2021 (Source: Getty Images) © Instaclustr Pty Limited, 2023
  • 32. But How Many? ©Instaclustr Pty Limited, 2021 Isetta 1-cylinder car (Source: Wikimedia) 1 piston isn’t very powerful: Isetta (bubble car) single-cylinder, 10HP, top speed 55MPH © Instaclustr Pty Limited, 2023
  • 33. …16 Pistons Is a Lot! ©Instaclustr Pty Limited, 2021 Cadillac V-16 175HP, 100 MPH! By Ramgeis - fotografiert von Ramgeis in Pebble Beach, Kalifornien im August 2004, CC BY-SA 3.0 © Instaclustr Pty Limited, 2023
  • 34. Can You Have “Too Many”? ©Instaclustr Pty Limited, 2021 Source: Wikimedia YES! Experimental 42-cylinder 2,350 hp (plane) engine! © Instaclustr Pty Limited, 2023
  • 35. Benchmarking (2020): Partitions and Replication Factor Are the Culprits 0 100000 200000 300000 400000 500000 600000 700000 800000 900000 1 10 100 1000 10000 TPS Partitions Kafka Partitions vs. Throughput Cluster: 3 nodes x 4 cores = 12 cores total Replication Factor 3 (TPS) Replication Factor 1 (TPS) You need sufficient partitions to benefit from the cluster concurrency— And not too many that the replication overhead impacts overall throughput © Instaclustr Pty Limited, 2023
  • 36. 2022 Kraft/Zookeeper Modes vs. 2020 Results Better, 1000 Partitions Is Ok You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 0 0.5 1 1.5 2 2.5 1 10 100 1000 10000 Partitions vs. Throughput (M TPS) ZK TPS (M) KRAFT TPS (M) 2020 TPS (M) 2022 - Better 2020 - Worse © Instaclustr Pty Limited, 2023
  • 37. ©Instaclustr Pty Limited, 2023 Fury 4: Single Threaded Consumers What’s New? Kafka Parallel Consumer = A New Engine! (Source: Adobe Stock) "La Jamais Contente", first car to reach 100 km/h in 1899 – 68hp electric! (Source: Wikimedia) NIO EP9 – Chinese Electric Super Car: 4 Engines, 1,341 HP, 0-300KM/H in 16s (Source: Wikimedia)
  • 38. Theory: Little’s Law: Concurrency = Throughput x Time You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse Rearranged: • Throughput = Concurrency/Time • Concurrency is Partitions = Consumers • Using default consumer the throughput drops with increasing time • Only solution is to increase partitions © Instaclustr Pty Limited, 2023
  • 39. For Given Target Throughput (1M TPS) Increasing Partitions With Increasing Time You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse © Instaclustr Pty Limited, 2023
  • 40. Order in Kafka Is Partition Based— So How To Increase Consumer Concurrency? You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse (Source: Adobe Stock) © Instaclustr Pty Limited, 2023
  • 41. Kafka Parallel Consumers: Multi-Threaded Consumer You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse • Multiple ordering options—c.f. default Kafka only guarantees order within partitions! PARTITION à KEY à UNORDERED Increasing concurrency à • Concurrency from 1 to lots —depends on client resources, and Partitions/Key space sizes • KEY has higher concurrency than Partition and is ordered by KEY—reasonable compromise • UNORDERED is unordered © Instaclustr Pty Limited, 2023
  • 42. Kafka Parallel Consumers Multi-Threaded Consumer = Buses You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse (Source: Getty Images) © Instaclustr Pty Limited, 2023
  • 43. Theoretical Improvement for Each Mode – max 3 orders of magnitude You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse 1,000 partitions 100 consumers max © Instaclustr Pty Limited, 2023
  • 44. Experimental Results: 3, 50, and 200 times improvement, unordered best You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse 1 consumer 10 partitions 100 keys 10ms latency © Instaclustr Pty Limited, 2023
  • 45. Watch Out for the Kafka Furies 2022 - Better 2020 - Worse Too many topics (= too many partitions) Too many consumer groups Slow consumers (= too many partitions) Insufficient/ too many partitions Single threaded consumer (= too many partitions) © Instaclustr Pty Limited, 2023
  • 46. Speed can be achieved with train-buses You need sufficient partitions to benefit from the cluster concurrency And not too many that the replication overhead impacts overall throughput 2022 - Better 2020 - Worse © Instaclustr Pty Limited, 2023 Source: Wikimedia Minimize Topics and Partitions = Tracks - Buses are fast & self-driving on tracks Minimize Consumer Groups = Interchanges - At interchanges, buses fan out onto roads, reducing passenger transfers Maximize Consumer Concurrency = Buses - Multiple passengers, integrated with road system Adelaide’s O-Bahn Busway train-bus system Track + Buses