SlideShare a Scribd company logo
1 of 68
Download to read offline
Apache ZooKeeper and Apache Curator:
Meet the Dining Philosophers
Paul Brebner
Instaclustr by NetApp
© Instaclustr Pty Limited, 2022
ApacheCon NA 2022
Who Am I?
Previously
§ R&D in distributed systems and performance engineering
Last 5 years
§ Technology Evangelist for Instaclustr by NetApp
§ 100+ Blogs, Demo Applications, Talks
§ Benchmarking and Performance Insights
§ Open Source Technologies including
o Apache Cassandra®, Spark™, ZooKeeper, Kafka®
o OpenSearch®, Redis™, PostgreSQL®
o Uber’s Cadence®
I live in Australia!
© Instaclustr Pty Limited, 2022
A ZooKeeper Walks Into a Pub…
© Instaclustr Pty Limited, 2022
The STORY
Actually an Outback Pub
Outback = “Out of the back of …”, e.g. “Back O’ Bourke”
© Instaclustr Pty Limited, 2022
Google Maps
Bourke
Outback
© Instaclustr Pty Limited, 2022
Daly Waters Pub in outback Northern Territory, Australia
(Source: Wikimedia Commons)
The Pub
And Meets Some Unexpected
Guests…
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
(Source: Wikimedia Commons)
…Who Appear to Be Fighting
Over Cutlery
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
(Source: Wikimedia Commons/Shutterstock)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“Ludwig, I’m
hungry please
lend me a fork.”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“Karl, I don’t fully
understand what
you mean by the
word ‘fork’.”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“Ludwig, you are not
sharing fairly! Niccolo,
have you finished with
your fork yet?”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“No, it’s mine! I
will never ever
give you my fork.”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“Who invented
these silly rules
anyway?”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“Dijkstra—a
‘computer scientist’—
whatever that is!”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
“Why can’t I stage a
proletarian revolution and
take control of all the
means of production—
I mean forks?”
(Source: Wikimedia Commons)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
"Ritual leads to
greater social
harmony”
(Source: Wikimedia Commons)
The ZooKeeper, being
trained in the harmonious
running of zoos, goes
over to see if he can help
out with their resource
sharing problem…
© Instaclustr Pty Limited, 2022
Gentlemen,
perhaps I can be of
assistance. Given
my often complex
and dangerous job
as a ZooKeeper I’m
sure I can help!
© Instaclustr Pty Limited, 2022
None of
my Tasmanian
Devils have eaten any
of the Wallabies—yet!
Not Lunch
(Source: Shutterstock)
© Instaclustr Pty Limited, 2022
Karl (Marx)
Ludwig
(Wittgenstein)
Niccolo
(Machiavelli)
Aristotle
Confucius
Sir, thank you very much!
We accept your kind offer,
as we are very Hungry!
Although we have no conception
of how you will solve our
intractable problem.
But then at least we can get back
to the more serious matters of
Thinking and Eating, hopefully in
approximately equal amounts.
The Dining Philosophers Reply
(for that was the name of their club)
Introducing the ZooKeeper
§ A Mature Apache Technology
• De facto technology for distributed systems
coordination and meta-data storage
• Originated as a sub-project of Hadoop
• Used by lots of projects including Kafka
§ Instaclustr Uses ZooKeeper
• For managed Kafka clusters
• For our managed High Availability
PostgreSQL® service (distributed config
store for Patroni)
• And offers it as …
© Instaclustr Pty Limited, 2022
An Instaclustr Managed Service
© Instaclustr Pty Limited, 2022
Introducing the ZooKeeper
Optimized for
1. Consistency
2. Availability
3. Performance
© Instaclustr Pty Limited, 2022
Consistency With ZAB
§ Single leader for atomic writes
§ Multiple replicas
§ ZAB
• ZooKeeper Atomic Broadcast
• Protocol for sequential ordering and consistency
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Availability
© Instaclustr Pty Limited, 2022
§ No Clusters, but Ensembles
• 1 Leader, multiple replicas
§ If the leader fails then another replica takes over
§ Ensemble available if majority (> ½) of servers running
§ Ensembles should have an odd numbers of servers, e.g.
• 3 servers supports 1 server failure (2 out of 3 running)
• 5 servers support 2 server failures (3 out of 5 running)
• 7 servers support 3 server failures (4 out of 7 running)
§ Reads go to any server (so will succeed if >= 1)
§ Writes go to leader only (and succeed if > ½ servers running)
Performance
© Instaclustr Pty Limited, 2022
§ Objects are in memory
§ Reads are fast, but
§ Writes are slower due to writing disk log
§ Read intensive workloads are best
§ Objects should be small
• Written in Java
• GC can become an overhead
§ What are objects? (Source: Shutterstock)
ZNODES
© Instaclustr Pty Limited, 2022
In Zoology, a phylogenetic tree, or
“tree of life”, shows the relationship
between different species
(Source: Shutterstock)
§ Appropriately (for a ZooKeeper) the only data structure is a tree (of life)
§ Hierarchical namespace:
E.g. “/Animalia/Chordata/Mammalia/Marsupialia/Dasyuromorphia/Dasyuridae/Sarcophilus/TasmanianDevil”
§ C.f. File system
• Each node can have data and children
• Nodes are called ZNODES
• But “nodes” in an Ensemble are called ”servers”
• ZNODES have
o Version control
o Atomic reads/writes
o Replicated (in ensemble)
Trees
© Instaclustr Pty Limited, 2022
ZNODES
© Instaclustr Pty Limited, 2022
§ Persistent or Ephemeral
§ Persistent
• Default
• Outlive client connection that created them
§ Ephemeral
• Transient
• Exist only for the duration of the client session that
created them
• Useful for participant discovery and election triggering
once client dies
(Source: Shutterstock)
Operations on ZNODES
© Instaclustr Pty Limited, 2022
§ Operations
• Create
• Delete
• Exists
• getData
• getChildren
• setData
§ Operations Are Low Level
• Just create and delete ZNODES in hierarchical namespace and
• Read and write data to/from them atomically
• Enable simple naming, configuration and group membership services
• Example recipes for higher operations
(Source: Shutterstock)
Wombats are also “low-level” –
They live under ground.
ZNODES: Watches
© Instaclustr Pty Limited, 2022
§ Clients can set watches on ZNODES
§ Change to ZNODE triggers watch and sends
notification to client
§ But only once! A one-time trigger
§ Set another watch for further notifications
(Source: Shutterstock)
The Dining Philosophers Problem
© Instaclustr Pty Limited, 2022
1 5
3
4
2
The Dining Philosophers Problem
§ Classic distributed systems concurrency and synchronization problem
§ Dijkstra 1965
§ Limited resources—Forks (1 per Philosopher)
§ Multiple consumers (Philosophers)
§ N Philosophers (5) seated around circular table
§ Each has an (infinite) bowl of food in front of them
§ Fork on left and right, same number as Philosophers
§ To eat, a Philosopher must have both left and right forks
• No talking allowed
• No eating with fingers or only one fork (think of them as chopsticks)
• No stealing forks from your neighbour (or anyone else) if they are holding it
© Instaclustr Pty Limited, 2022
Algorithm
Rodin – The Thinker (Wikimedia Commons)
Thinking
Hungry
Eating
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Algorithm
Philosophers alternate between Thinking, Wanting to Eat (Hungry), and Eating
§ Think
• “Think” for a random period of time
§ Hungry
• Enter “Hungry” state
• Wait until the left fork is free and take it
• Wait until the right fork is free and take it
§ Eating
• “Eat” for a random period of time
• Put both forks back on the table
§ Start from Thinking again
Philosophers are silent, can’t “negotiate” or “demand” forks
§ Insufficient forks for everyone to be Eating at same time
§ So some Philosophers will have to wait for one or both forks while Hungry
© Instaclustr Pty Limited, 2022
Problems and Solutions
§ Problems
• Starvation
• Deadlock
• Race Conditions
• Fairness
§ Good Solutions
• Minimize Hungry time
• Maximize Eating concurrency
§ Simplest Solutions
• Semaphores, timeouts around fork waiting, and a centralized Waiter (or ZooKeeper)
• But watch out as the waiter can become the bottleneck
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
My Version Has a Boss
and Timeouts
Use leader election to elect a Boss Philosopher
• Think
§ Check if I am the leader, if I am, then announce a topic of my choice.
§ “Think” for a random period of time.
§ If I was the leader, give up the leadership.
• Hungry
§ Enter “Hungry” state.
§ Wait (with timeout) until the left fork is free and take it.
§ If I have the left fork then wait (with timeout) until the right fork is free and
take it.
• Eating
§ If I have both forks, then “Eat” for a random period of time.
§ If I have any forks then put them back on the table.
§ Start from the top again.
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Example Trace
© Instaclustr Pty Limited, 2022
§ Run Boss Election… [this takes some time]
§ Marx is Thinking…
§ Machiavelli is Thinking…
§ Wittgenstein is Thinking…
§ Aristotle is Thinking…
§ Confucius is Thinking…
Example Trace
© Instaclustr Pty Limited, 2022
wants
w
a
n
t
s
§ Marx is Hungry, wants left fork 1
§ Wittgenstein is Hungry, wants left fork 2
Example Trace
§ Marx got left fork 1
§ Marx wants right fork 2
§ Wittgenstein got left fork 2
• (Marx misses out!)
© Instaclustr Pty Limited, 2022
w
a
n
t
s
1
2
5
4
3
Example Trace
§ Wittgenstein wants right fork 3
© Instaclustr Pty Limited, 2022
w
a
n
t
s
1
2
5
4
3
wants
Example Trace
§ Wittgenstein got right fork 3
§ Wittgenstein is Eating
© Instaclustr Pty Limited, 2022
w
a
n
t
s
1
2
5
4
3
Example Trace
§ Confucius is Hungry, wants left fork 5
© Instaclustr Pty Limited, 2022
w
a
n
t
s
1
2
5
4
3
w
a
n
t
s
Example Trace
§ Confucius got left fork 5
© Instaclustr Pty Limited, 2022
w
a
n
t
s
1
2
5
4
3
Example Trace
§ Confucius wants right fork 1
© Instaclustr Pty Limited, 2022
w
a
n
t
s
1
2
5
4
3
w
a
n
t
s
Example Trace
§ *** Marx gave up waiting for right fork 2
§ Marx putting left fork back: 1
§ Marx is Thinking…
§ Marx is the BOSS!
§ Marx says “Everyone now think about Aesthetics”
© Instaclustr Pty Limited, 2022
2
5
4
3
w
a
n
t
s
1
Example Trace
§ Confucius got right fork 1
§ Confucius is Eating
© Instaclustr Pty Limited, 2022
2
5
4
3
1
Example Trace
© Instaclustr Pty Limited, 2022
§ Confucius finished eating!
§ Putting both forks back: 5, 1
§ Confucius is Thinking…
§ Wittgenstein finished eating!
§ Putting both forks back: 2, 3
§ Wittgenstein is Thinking…
Example Trace
© Instaclustr Pty Limited, 2022
§ Marx is no longer the BOSS
§ Run Boss Election
§ Marx is Hungry, wants left fork 1
wants
Example Trace
© Instaclustr Pty Limited, 2022
§ Marx got left fork 1
§ Marx wants right fork 2
5
4
3
2
1
w
a
n
t
s
Marx Is Finally Happy
© Instaclustr Pty Limited, 2022
§ Marx got right fork 2
§ Marx is Eating (finally!!!)
5
4
3
2
1
Implementation
with Apache Curator
Curators organize and interpret exhibits—
even Zoo animals and Zookeepers!
A Curator also walks into a Pub…
© Instaclustr Pty Limited, 2022
Implementation
with Apache Curator
Dangerous Animals:
Snakes
Stingers
Crocs
Spiders
Sharks
Bluebottles
Shells
Ticks
Ants
Centipedes
Cassowaries…
Don’t Panic!
You are actually more
likely to be injured by
a horse in Australia
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Apache Curator
§ High-level Java client for Apache ZooKeeper
Koalas are also “high-level” – they live in trees
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Client
Apache Curator
§ Lots of recipes
• Elections
• Locks
• Barriers
• Counters
• Catches
• Nodes/Watches
• Queues
Koalas only need 1 recipe: Eucalyptus leaves
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Create and Start Client
int sleepMsBetweenRetries = 100;
int maxRetries = 3;
String zk = "127.0.0.1:2181";
RetryPolicy retryPolicy = new
RetryNTimes(maxRetries, sleepMsBetweenRetries);
CuratorFramework client = CuratorFrameworkFactory
.newClient(zk, retryPolicy);
client.start();
© Instaclustr Pty Limited, 2022
Curator Recipes Used
§ Leader Latch for elections
§ Shared Lock for forks
§ Shared Counter for metrics
© Instaclustr Pty Limited, 2022
Leader Latch
// For each Philosopher thread
LeaderLatch leaderLatch = new LeaderLatch(client, "/mutex/leader/topic", threadName);
leaderLatch.start();
// Think
if (leaderLatch.hasLeadership())
System.out.println("I’m now the BOSS, think about X");
// Think for a while …
// If thread was the leader then relinquish leadership, but rejoin pool again.
if (leaderLatch.hasLeadership())
{
System.out.println("I’m no longer the BOSS");
leaderLatch.close();
leaderLatch = new LeaderLatch(client, "/mutex/leader/topic", threadName);
leaderLatch.start();
}
// Hungry, try to eat, etc.
© Instaclustr Pty Limited, 2022
Shared Lock for Fork Sharing
int numForks = 5; // number of forks = number of Philosophers
static InterProcessSemaphoreMutex[] forks = new InterProcessSemaphoreMutex[numForks];
// Create forks
for (int i=0; i < numForks; i++)
forks[i] = new InterProcessSemaphoreMutex(client, "/mutex/fork" + i);
// In each Philosopher thread:
// Hungry, want left fork
int leftFork = 0;
boolean gotLeft = forks[leftFork].acquire(maxForkWaitTime, TimeUnit.MILLISECONDS);
// to release a fork
forks[leftFork].release();
© Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Shared Counter for Metrics
(using optimistic concurrency – try until success)
int meals = new SharedCount(client, "/counters/meals", 0);
try {
meals.start();
meals.setCount(0);
} catch (Exception e1) { e1.printStackTrace(); }
...
// In each Philosopher thread:
// Each time a Philosopher eats, increment the meal counter
// trySetCount returns true if it succeeds, else false and try again
boolean success = false;
while (!success) {
int before = meals.getCount();
success = meals.trySetCount(before + 1); }
...
// At the end of Dining, print out total meals
System.out.println("Total meals = " + meals.getCount()); © Instaclustr Pty Limited, 2022
(Source: Shutterstock)
Results: Baseline, Single ZK Server
§ 5 Philosophers, very low load on server so “best case” for this version of the
algorithm
§ Utilization for each state:
• Philosophers Thinking = 61%
• Philosophers Hungry = 12%
• Philosophers Eating = 27%
§ In “perfect” world with no fork contention
• Thinking and Eating would be 50% and Hungry 0%
© Instaclustr Pty Limited, 2022
Results: 3 server ZK Ensemble
© Instaclustr Pty Limited, 2022
(Source: Maritza Rios – Wikimedia Commons)
Load Testing on Ensembles
§ 3 x large (2 vcpus) vs.
xlarge (4 vcpus) EC2
servers (6 cores vs. 12
cores)
§ 3600 meals (peak at 150
Philosophers) vs 6400
meals (peak at 200
Philosophers)
© Instaclustr Pty Limited, 2022
Increase in Hungry %:
Faster Increase for 2 cores
© Instaclustr Pty Limited, 2022
Zookeeper Throughput:
1,700 ops/s c.f. 3,600 ops/s
© Instaclustr Pty Limited, 2022
Notes
§ At maximum capacity, leader server saturated, replicas had spare capacity
• As write heavy workload
§ Availability
• Works as advertised
• Killing 1/3 server, leader failed over immediately to replica
• Killing 2/3 servers, Curator client error
© Instaclustr Pty Limited, 2022
Kafka leaves the Zoo(Keeper)
§ Kafka currently uses ZK to
• Store partition and broker metadata
• Elect a broker as Kafka Controller
§ KIP-500 Replace ZK with self-managed Metadata
Quorum KIP-595
• Kafka should be more scalable, faster and support millions of
partitions (!)
§ Other projects also leaving ZooKeeper?
• Flink, Pinot
• Cloud native deployments use Kubernetes etcd service
§ ZooKeeper vs. KRaft
• Latest results!
• Performance Engineering Track, Thursday after lunch
© Instaclustr Pty Limited, 2022
(Source: Wikipedia)
(Source: Shutterstock)
Instaclustr
Managed Platform
A complete ecosystem to support your
mission critical application.
Our other Managed Open Source
Technologies +
© Instaclustr Pty Limited, 2022
Blog: www.instaclustr.com/blog/apache-zookeeper-meets-the-
dining-philosophers/
Further Information
1
All My Blogs: www.instaclustr.com/paul-brebner/
2
Managed ZooKeeper: www.instaclustr.com/platform/managed-
apache-zookeeper/
3
© Instaclustr Pty Limited, 2022
www.instaclustr.com
info@instaclustr.com
@instaclustr
THANK
YOU!
© Instaclustr Pty Limited, 2022
© Instaclustr Pty Limited, 2022
https://www.instaclustr.com/company/policies/terms-conditions/
Except as permitted by the copyright law applicable to you, you may
not reproduce, distribute, publish, display, communicate or transmit
any of the content of this document, in any form, but any means,
without the prior written permission of Instaclustr Pty Limited

More Related Content

More from Paul Brebner

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...Paul Brebner
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaPaul Brebner
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Paul Brebner
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardPaul Brebner
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaPaul Brebner
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Paul Brebner
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Paul Brebner
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialPaul Brebner
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Paul Brebner
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Paul Brebner
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980'sPaul Brebner
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...Paul Brebner
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...Paul Brebner
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...Paul Brebner
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache KafkaPaul Brebner
 

More from Paul Brebner (20)

The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
The Impact of Hardware and Software Version Changes on Apache Kafka Performan...
 
Spinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache KafkaSpinning your Drones with Cadence Workflows and Apache Kafka
Spinning your Drones with Cadence Workflows and Apache Kafka
 
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
Change Data Capture (CDC) With Kafka Connect® and the Debezium PostgreSQL Sou...
 
Scaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/HardScaling Open Source Big Data Cloud Applications is Easy/Hard
Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/HardOPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
OPEN Talk: Scaling Open Source Big Data Cloud Applications is Easy/Hard
 
A Visual Introduction to Apache Kafka
A Visual Introduction to Apache KafkaA Visual Introduction to Apache Kafka
A Visual Introduction to Apache Kafka
 
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
Massively Scalable Real-time Geospatial Anomaly Detection with Apache Kafka a...
 
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
Building a real-time data processing pipeline using Apache Kafka, Kafka Conne...
 
Grid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and PotentialGrid Middleware – Principles, Practice and Potential
Grid Middleware – Principles, Practice and Potential
 
Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...Grid middleware is easy to install, configure, secure, debug and manage acros...
Grid middleware is easy to install, configure, secure, debug and manage acros...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
Melbourne Big Data Meetup Talk: Scaling a Real-Time Anomaly Detection Applica...
 
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
Massively Scalable Real-time Geospatial Data Processing with Apache Kafka and...
 
0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's0b101000 years of computing: a personal timeline - decade "0", the 1980's
0b101000 years of computing: a personal timeline - decade "0", the 1980's
 
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
ApacheCon Berlin 2019: Kongo:Building a Scalable Streaming IoT Application us...
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
ApacheCon2019 Talk: Improving the Observability of Cassandra, Kafka and Kuber...
 
How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...How to Improve the Observability of Apache Cassandra and Kafka applications...
How to Improve the Observability of Apache Cassandra and Kafka applications...
 
A visual introduction to Apache Kafka
A visual introduction to Apache KafkaA visual introduction to Apache Kafka
A visual introduction to Apache Kafka
 

Recently uploaded

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 

Recently uploaded (20)

A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 

Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers

  • 1. Apache ZooKeeper and Apache Curator: Meet the Dining Philosophers Paul Brebner Instaclustr by NetApp © Instaclustr Pty Limited, 2022 ApacheCon NA 2022
  • 2. Who Am I? Previously § R&D in distributed systems and performance engineering Last 5 years § Technology Evangelist for Instaclustr by NetApp § 100+ Blogs, Demo Applications, Talks § Benchmarking and Performance Insights § Open Source Technologies including o Apache Cassandra®, Spark™, ZooKeeper, Kafka® o OpenSearch®, Redis™, PostgreSQL® o Uber’s Cadence® I live in Australia! © Instaclustr Pty Limited, 2022
  • 3. A ZooKeeper Walks Into a Pub… © Instaclustr Pty Limited, 2022 The STORY
  • 4. Actually an Outback Pub Outback = “Out of the back of …”, e.g. “Back O’ Bourke” © Instaclustr Pty Limited, 2022 Google Maps Bourke Outback
  • 5. © Instaclustr Pty Limited, 2022 Daly Waters Pub in outback Northern Territory, Australia (Source: Wikimedia Commons) The Pub
  • 6. And Meets Some Unexpected Guests… © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius (Source: Wikimedia Commons)
  • 7. …Who Appear to Be Fighting Over Cutlery © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius (Source: Wikimedia Commons/Shutterstock)
  • 8. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “Ludwig, I’m hungry please lend me a fork.” (Source: Wikimedia Commons)
  • 9. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “Karl, I don’t fully understand what you mean by the word ‘fork’.” (Source: Wikimedia Commons)
  • 10. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “Ludwig, you are not sharing fairly! Niccolo, have you finished with your fork yet?” (Source: Wikimedia Commons)
  • 11. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “No, it’s mine! I will never ever give you my fork.” (Source: Wikimedia Commons)
  • 12. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “Who invented these silly rules anyway?” (Source: Wikimedia Commons)
  • 13. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “Dijkstra—a ‘computer scientist’— whatever that is!” (Source: Wikimedia Commons)
  • 14. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius “Why can’t I stage a proletarian revolution and take control of all the means of production— I mean forks?” (Source: Wikimedia Commons)
  • 15. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius "Ritual leads to greater social harmony” (Source: Wikimedia Commons)
  • 16. The ZooKeeper, being trained in the harmonious running of zoos, goes over to see if he can help out with their resource sharing problem… © Instaclustr Pty Limited, 2022 Gentlemen, perhaps I can be of assistance. Given my often complex and dangerous job as a ZooKeeper I’m sure I can help!
  • 17. © Instaclustr Pty Limited, 2022 None of my Tasmanian Devils have eaten any of the Wallabies—yet! Not Lunch (Source: Shutterstock)
  • 18. © Instaclustr Pty Limited, 2022 Karl (Marx) Ludwig (Wittgenstein) Niccolo (Machiavelli) Aristotle Confucius Sir, thank you very much! We accept your kind offer, as we are very Hungry! Although we have no conception of how you will solve our intractable problem. But then at least we can get back to the more serious matters of Thinking and Eating, hopefully in approximately equal amounts. The Dining Philosophers Reply (for that was the name of their club)
  • 19. Introducing the ZooKeeper § A Mature Apache Technology • De facto technology for distributed systems coordination and meta-data storage • Originated as a sub-project of Hadoop • Used by lots of projects including Kafka § Instaclustr Uses ZooKeeper • For managed Kafka clusters • For our managed High Availability PostgreSQL® service (distributed config store for Patroni) • And offers it as … © Instaclustr Pty Limited, 2022
  • 20. An Instaclustr Managed Service © Instaclustr Pty Limited, 2022
  • 21. Introducing the ZooKeeper Optimized for 1. Consistency 2. Availability 3. Performance © Instaclustr Pty Limited, 2022
  • 22. Consistency With ZAB § Single leader for atomic writes § Multiple replicas § ZAB • ZooKeeper Atomic Broadcast • Protocol for sequential ordering and consistency © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 23. Availability © Instaclustr Pty Limited, 2022 § No Clusters, but Ensembles • 1 Leader, multiple replicas § If the leader fails then another replica takes over § Ensemble available if majority (> ½) of servers running § Ensembles should have an odd numbers of servers, e.g. • 3 servers supports 1 server failure (2 out of 3 running) • 5 servers support 2 server failures (3 out of 5 running) • 7 servers support 3 server failures (4 out of 7 running) § Reads go to any server (so will succeed if >= 1) § Writes go to leader only (and succeed if > ½ servers running)
  • 24. Performance © Instaclustr Pty Limited, 2022 § Objects are in memory § Reads are fast, but § Writes are slower due to writing disk log § Read intensive workloads are best § Objects should be small • Written in Java • GC can become an overhead § What are objects? (Source: Shutterstock)
  • 25. ZNODES © Instaclustr Pty Limited, 2022 In Zoology, a phylogenetic tree, or “tree of life”, shows the relationship between different species (Source: Shutterstock)
  • 26. § Appropriately (for a ZooKeeper) the only data structure is a tree (of life) § Hierarchical namespace: E.g. “/Animalia/Chordata/Mammalia/Marsupialia/Dasyuromorphia/Dasyuridae/Sarcophilus/TasmanianDevil” § C.f. File system • Each node can have data and children • Nodes are called ZNODES • But “nodes” in an Ensemble are called ”servers” • ZNODES have o Version control o Atomic reads/writes o Replicated (in ensemble) Trees © Instaclustr Pty Limited, 2022
  • 27. ZNODES © Instaclustr Pty Limited, 2022 § Persistent or Ephemeral § Persistent • Default • Outlive client connection that created them § Ephemeral • Transient • Exist only for the duration of the client session that created them • Useful for participant discovery and election triggering once client dies (Source: Shutterstock)
  • 28. Operations on ZNODES © Instaclustr Pty Limited, 2022 § Operations • Create • Delete • Exists • getData • getChildren • setData § Operations Are Low Level • Just create and delete ZNODES in hierarchical namespace and • Read and write data to/from them atomically • Enable simple naming, configuration and group membership services • Example recipes for higher operations (Source: Shutterstock) Wombats are also “low-level” – They live under ground.
  • 29. ZNODES: Watches © Instaclustr Pty Limited, 2022 § Clients can set watches on ZNODES § Change to ZNODE triggers watch and sends notification to client § But only once! A one-time trigger § Set another watch for further notifications (Source: Shutterstock)
  • 30. The Dining Philosophers Problem © Instaclustr Pty Limited, 2022 1 5 3 4 2
  • 31. The Dining Philosophers Problem § Classic distributed systems concurrency and synchronization problem § Dijkstra 1965 § Limited resources—Forks (1 per Philosopher) § Multiple consumers (Philosophers) § N Philosophers (5) seated around circular table § Each has an (infinite) bowl of food in front of them § Fork on left and right, same number as Philosophers § To eat, a Philosopher must have both left and right forks • No talking allowed • No eating with fingers or only one fork (think of them as chopsticks) • No stealing forks from your neighbour (or anyone else) if they are holding it © Instaclustr Pty Limited, 2022
  • 32. Algorithm Rodin – The Thinker (Wikimedia Commons) Thinking Hungry Eating © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 33. Algorithm Philosophers alternate between Thinking, Wanting to Eat (Hungry), and Eating § Think • “Think” for a random period of time § Hungry • Enter “Hungry” state • Wait until the left fork is free and take it • Wait until the right fork is free and take it § Eating • “Eat” for a random period of time • Put both forks back on the table § Start from Thinking again Philosophers are silent, can’t “negotiate” or “demand” forks § Insufficient forks for everyone to be Eating at same time § So some Philosophers will have to wait for one or both forks while Hungry © Instaclustr Pty Limited, 2022
  • 34. Problems and Solutions § Problems • Starvation • Deadlock • Race Conditions • Fairness § Good Solutions • Minimize Hungry time • Maximize Eating concurrency § Simplest Solutions • Semaphores, timeouts around fork waiting, and a centralized Waiter (or ZooKeeper) • But watch out as the waiter can become the bottleneck © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 35. My Version Has a Boss and Timeouts Use leader election to elect a Boss Philosopher • Think § Check if I am the leader, if I am, then announce a topic of my choice. § “Think” for a random period of time. § If I was the leader, give up the leadership. • Hungry § Enter “Hungry” state. § Wait (with timeout) until the left fork is free and take it. § If I have the left fork then wait (with timeout) until the right fork is free and take it. • Eating § If I have both forks, then “Eat” for a random period of time. § If I have any forks then put them back on the table. § Start from the top again. © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 36. Example Trace © Instaclustr Pty Limited, 2022 § Run Boss Election… [this takes some time] § Marx is Thinking… § Machiavelli is Thinking… § Wittgenstein is Thinking… § Aristotle is Thinking… § Confucius is Thinking…
  • 37. Example Trace © Instaclustr Pty Limited, 2022 wants w a n t s § Marx is Hungry, wants left fork 1 § Wittgenstein is Hungry, wants left fork 2
  • 38. Example Trace § Marx got left fork 1 § Marx wants right fork 2 § Wittgenstein got left fork 2 • (Marx misses out!) © Instaclustr Pty Limited, 2022 w a n t s 1 2 5 4 3
  • 39. Example Trace § Wittgenstein wants right fork 3 © Instaclustr Pty Limited, 2022 w a n t s 1 2 5 4 3 wants
  • 40. Example Trace § Wittgenstein got right fork 3 § Wittgenstein is Eating © Instaclustr Pty Limited, 2022 w a n t s 1 2 5 4 3
  • 41. Example Trace § Confucius is Hungry, wants left fork 5 © Instaclustr Pty Limited, 2022 w a n t s 1 2 5 4 3 w a n t s
  • 42. Example Trace § Confucius got left fork 5 © Instaclustr Pty Limited, 2022 w a n t s 1 2 5 4 3
  • 43. Example Trace § Confucius wants right fork 1 © Instaclustr Pty Limited, 2022 w a n t s 1 2 5 4 3 w a n t s
  • 44. Example Trace § *** Marx gave up waiting for right fork 2 § Marx putting left fork back: 1 § Marx is Thinking… § Marx is the BOSS! § Marx says “Everyone now think about Aesthetics” © Instaclustr Pty Limited, 2022 2 5 4 3 w a n t s 1
  • 45. Example Trace § Confucius got right fork 1 § Confucius is Eating © Instaclustr Pty Limited, 2022 2 5 4 3 1
  • 46. Example Trace © Instaclustr Pty Limited, 2022 § Confucius finished eating! § Putting both forks back: 5, 1 § Confucius is Thinking… § Wittgenstein finished eating! § Putting both forks back: 2, 3 § Wittgenstein is Thinking…
  • 47. Example Trace © Instaclustr Pty Limited, 2022 § Marx is no longer the BOSS § Run Boss Election § Marx is Hungry, wants left fork 1 wants
  • 48. Example Trace © Instaclustr Pty Limited, 2022 § Marx got left fork 1 § Marx wants right fork 2 5 4 3 2 1 w a n t s
  • 49. Marx Is Finally Happy © Instaclustr Pty Limited, 2022 § Marx got right fork 2 § Marx is Eating (finally!!!) 5 4 3 2 1
  • 50. Implementation with Apache Curator Curators organize and interpret exhibits— even Zoo animals and Zookeepers! A Curator also walks into a Pub… © Instaclustr Pty Limited, 2022
  • 51. Implementation with Apache Curator Dangerous Animals: Snakes Stingers Crocs Spiders Sharks Bluebottles Shells Ticks Ants Centipedes Cassowaries… Don’t Panic! You are actually more likely to be injured by a horse in Australia © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 52. Apache Curator § High-level Java client for Apache ZooKeeper Koalas are also “high-level” – they live in trees © Instaclustr Pty Limited, 2022 (Source: Shutterstock) Client
  • 53. Apache Curator § Lots of recipes • Elections • Locks • Barriers • Counters • Catches • Nodes/Watches • Queues Koalas only need 1 recipe: Eucalyptus leaves © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 54. Create and Start Client int sleepMsBetweenRetries = 100; int maxRetries = 3; String zk = "127.0.0.1:2181"; RetryPolicy retryPolicy = new RetryNTimes(maxRetries, sleepMsBetweenRetries); CuratorFramework client = CuratorFrameworkFactory .newClient(zk, retryPolicy); client.start(); © Instaclustr Pty Limited, 2022
  • 55. Curator Recipes Used § Leader Latch for elections § Shared Lock for forks § Shared Counter for metrics © Instaclustr Pty Limited, 2022
  • 56. Leader Latch // For each Philosopher thread LeaderLatch leaderLatch = new LeaderLatch(client, "/mutex/leader/topic", threadName); leaderLatch.start(); // Think if (leaderLatch.hasLeadership()) System.out.println("I’m now the BOSS, think about X"); // Think for a while … // If thread was the leader then relinquish leadership, but rejoin pool again. if (leaderLatch.hasLeadership()) { System.out.println("I’m no longer the BOSS"); leaderLatch.close(); leaderLatch = new LeaderLatch(client, "/mutex/leader/topic", threadName); leaderLatch.start(); } // Hungry, try to eat, etc. © Instaclustr Pty Limited, 2022
  • 57. Shared Lock for Fork Sharing int numForks = 5; // number of forks = number of Philosophers static InterProcessSemaphoreMutex[] forks = new InterProcessSemaphoreMutex[numForks]; // Create forks for (int i=0; i < numForks; i++) forks[i] = new InterProcessSemaphoreMutex(client, "/mutex/fork" + i); // In each Philosopher thread: // Hungry, want left fork int leftFork = 0; boolean gotLeft = forks[leftFork].acquire(maxForkWaitTime, TimeUnit.MILLISECONDS); // to release a fork forks[leftFork].release(); © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 58. Shared Counter for Metrics (using optimistic concurrency – try until success) int meals = new SharedCount(client, "/counters/meals", 0); try { meals.start(); meals.setCount(0); } catch (Exception e1) { e1.printStackTrace(); } ... // In each Philosopher thread: // Each time a Philosopher eats, increment the meal counter // trySetCount returns true if it succeeds, else false and try again boolean success = false; while (!success) { int before = meals.getCount(); success = meals.trySetCount(before + 1); } ... // At the end of Dining, print out total meals System.out.println("Total meals = " + meals.getCount()); © Instaclustr Pty Limited, 2022 (Source: Shutterstock)
  • 59. Results: Baseline, Single ZK Server § 5 Philosophers, very low load on server so “best case” for this version of the algorithm § Utilization for each state: • Philosophers Thinking = 61% • Philosophers Hungry = 12% • Philosophers Eating = 27% § In “perfect” world with no fork contention • Thinking and Eating would be 50% and Hungry 0% © Instaclustr Pty Limited, 2022
  • 60. Results: 3 server ZK Ensemble © Instaclustr Pty Limited, 2022 (Source: Maritza Rios – Wikimedia Commons)
  • 61. Load Testing on Ensembles § 3 x large (2 vcpus) vs. xlarge (4 vcpus) EC2 servers (6 cores vs. 12 cores) § 3600 meals (peak at 150 Philosophers) vs 6400 meals (peak at 200 Philosophers) © Instaclustr Pty Limited, 2022
  • 62. Increase in Hungry %: Faster Increase for 2 cores © Instaclustr Pty Limited, 2022
  • 63. Zookeeper Throughput: 1,700 ops/s c.f. 3,600 ops/s © Instaclustr Pty Limited, 2022
  • 64. Notes § At maximum capacity, leader server saturated, replicas had spare capacity • As write heavy workload § Availability • Works as advertised • Killing 1/3 server, leader failed over immediately to replica • Killing 2/3 servers, Curator client error © Instaclustr Pty Limited, 2022
  • 65. Kafka leaves the Zoo(Keeper) § Kafka currently uses ZK to • Store partition and broker metadata • Elect a broker as Kafka Controller § KIP-500 Replace ZK with self-managed Metadata Quorum KIP-595 • Kafka should be more scalable, faster and support millions of partitions (!) § Other projects also leaving ZooKeeper? • Flink, Pinot • Cloud native deployments use Kubernetes etcd service § ZooKeeper vs. KRaft • Latest results! • Performance Engineering Track, Thursday after lunch © Instaclustr Pty Limited, 2022 (Source: Wikipedia) (Source: Shutterstock)
  • 66. Instaclustr Managed Platform A complete ecosystem to support your mission critical application. Our other Managed Open Source Technologies + © Instaclustr Pty Limited, 2022
  • 67. Blog: www.instaclustr.com/blog/apache-zookeeper-meets-the- dining-philosophers/ Further Information 1 All My Blogs: www.instaclustr.com/paul-brebner/ 2 Managed ZooKeeper: www.instaclustr.com/platform/managed- apache-zookeeper/ 3 © Instaclustr Pty Limited, 2022
  • 68. www.instaclustr.com info@instaclustr.com @instaclustr THANK YOU! © Instaclustr Pty Limited, 2022 © Instaclustr Pty Limited, 2022 https://www.instaclustr.com/company/policies/terms-conditions/ Except as permitted by the copyright law applicable to you, you may not reproduce, distribute, publish, display, communicate or transmit any of the content of this document, in any form, but any means, without the prior written permission of Instaclustr Pty Limited