Coordination service:
A coordination service helps distributed applications offload common
challenges such as:
- Synchronization between the nodes of the cluster.
- Distributing a common configuration to the nodes of the cluster.
- Grouping and naming services for the nodes of the cluster.
- Leader election among the nodes.
Znodes:
ZooKeeper (Zk) helps the nodes of a distributed application coordinate with
each other by providing a common namespace.
Nodes use this namespace to save and retrieve shared information needed for
coordination.
The namespace is hierarchical, much like a tree data structure.
Each element in the namespace is called a znode. A znode is identified by a
path whose components are separated by a slash (/), which indicates its
position relative to the root.
The namespace is held in memory, which keeps access fast.
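The hierarchical namespace described above can be sketched as a small in-memory model. This is only an illustration: `ToyNamespace` and `parent_path` are hypothetical names and not part of any ZooKeeper client library.

```python
# Toy model of a hierarchical znode namespace (hypothetical, not the ZooKeeper API).

def parent_path(path: str) -> str:
    """Return the parent of an absolute znode path; the root is its own parent."""
    if not path.startswith("/"):
        raise ValueError("znode paths are absolute and start with '/'")
    head, _, _ = path.rstrip("/").rpartition("/")
    return head or "/"


class ToyNamespace:
    def __init__(self) -> None:
        # Maps the full path of each znode to its data; the root always exists.
        self.nodes = {"/": b""}

    def create(self, path: str, data: bytes = b"") -> None:
        """Create a znode; its parent must already exist."""
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        if parent_path(path) not in self.nodes:
            raise KeyError(f"parent of {path} does not exist")
        self.nodes[path] = data

    def children(self, path: str) -> list:
        """Return the names (last path component) of the direct children of path."""
        return sorted(
            p.rpartition("/")[2]
            for p in self.nodes
            if p != "/" and parent_path(p) == path
        )
```

For example, after creating `/app`, `/app/config` and `/app/workers`, `children("/app")` returns the two child names, while creating `/missing/child` fails because its parent does not exist.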
Ensemble:
Like the distributed applications it serves, ZooKeeper itself is distributed,
i.e., a set of ZooKeeper nodes works together to achieve its goal. This
group of ZooKeeper nodes is called an ensemble.
Clients can talk to any node within the ensemble (via the Zk client library).
Each client periodically sends heartbeats to its server and receives an
acknowledgement that confirms its connectivity.
Each node in the ensemble is aware of the other nodes and talks to them to
share information.
What is Zookeeper?
[Figure labels: Znodes - namespace; Ensemble]
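Zookeeper Data Model
Each znode stores a stat structure that contains zxids (transaction IDs),
version numbers and timestamps; the znode also carries an ACL. The client
receives the stat structure when it reads the znode. The version in the stat
structure is used to validate updates and deletes from clients: the client
passes the version it expects, and the request is rejected if the znode has
changed since.
Every create or update refreshes the stat structure: the version number
increments, the zxid advances, the modification timestamp is reset, etc.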
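The version check described above behaves like a compare-and-set. The sketch below is a toy model under that assumption: `Znode`, `BadVersion` and `set_data` are hypothetical names, and only the data version is modeled (no zxid or timestamps). Treating an expected version of -1 as "match any version" mirrors the convention of the ZooKeeper API.

```python
# Toy model of version-checked updates (hypothetical names, not the ZooKeeper API).
from dataclasses import dataclass


class BadVersion(Exception):
    """Raised when the caller's expected version does not match the znode."""


@dataclass
class Znode:
    data: bytes
    version: int = 0  # incremented on every successful update


def set_data(node: Znode, data: bytes, expected_version: int) -> int:
    """Update the znode only if the caller saw its latest version.

    An expected_version of -1 skips the check.
    Returns the new version number.
    """
    if expected_version != -1 and expected_version != node.version:
        raise BadVersion(
            f"expected version {expected_version}, znode is at {node.version}"
        )
    node.data = data
    node.version += 1
    return node.version
```

Two clients that both read version 0 cannot both succeed: the first update moves the znode to version 1, so the second update, still passing 0, is rejected and that client has to re-read before retrying.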
Types of ZNodes:
● Ephemeral:
○ Exists only as long as the session that created it.
○ Cannot have children.
● Persistent:
○ Unlike ephemeral znodes, these persist across sessions.
● Sequential:
○ Has a monotonically increasing number appended to its name,
which keeps the name unique.
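A rough sketch of how the three znode types behave, using a toy in-memory store. `ToyStore` and its methods are hypothetical and only mimic the rules listed above; the 10-digit zero-padded suffix follows the format ZooKeeper uses for sequential names, and the counter here is simply kept per parent.

```python
# Toy model of ephemeral, persistent and sequential znodes (not the ZooKeeper API).

class ToyStore:
    def __init__(self) -> None:
        self.owner = {"/": None}  # path -> owning session (None means persistent)
        self.counters = {}        # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        """Create a znode and return its final path."""
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        parent = path.rpartition("/")[0] or "/"
        if parent not in self.owner:
            raise ValueError(f"parent {parent} does not exist")
        if self.owner[parent] is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            # Append a monotonically increasing, zero-padded counter.
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"
        if path in self.owner:
            raise ValueError(f"{path} already exists")
        self.owner[path] = session if ephemeral else None
        return path

    def close_session(self, session) -> None:
        """Remove every ephemeral znode owned by the closed session."""
        if session is None:
            return
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]
```

Creating `/q/x-` twice with `sequential=True` yields two distinct names, and closing a session removes only the ephemeral znodes that session created.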
Zookeeper Sessions
- The client library configuration contains the list of all ZooKeeper servers.
- The client establishes a connection to a random server from the list.
- The connected server sends an auth token (the session ID and password)
once the connection succeeds.
- The client and the connected server periodically exchange
heartbeats to confirm that each is alive.
- If the client loses connectivity, the client library, upon timeout,
connects to another server from the configured list. This switch is
transparent to the client application.
- During reconnection, the auth token from the previous connection is
presented so that the client can re-attach to its existing session.
Zookeeper Watches
- Read operations such as getData(), getChildren() and exists() take an
optional parameter that sets a watch on the target znode.
- A watch is a one-time trigger: the server sends a single change event to
the watchers of the znode. Later changes to the znode do not notify the
watchers unless the watch is set again.
- There are 2 kinds of watches:
- Data watches: watch for a change in the data of a znode.
Set by getData() and exists().
Triggered by setData(), and also by create() and delete() of the znode.
- Children watches: watch for the addition or deletion of a
child under a parent znode.
Set by getChildren().
Triggered by create() and delete() of a child.
Example:
- Servers A and B create ephemeral znodes 1 and 2
respectively.
- When A dies, its session expires and znode 1 is removed; B, which is
watching 1, is notified.
- B can now use this information to take corrective
action.
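The one-time nature of a watch can be illustrated with a toy znode that forgets its watchers once they have fired. `WatchedNode` is a hypothetical class; the event name string is only loosely modeled on ZooKeeper's event types.

```python
# Toy model of one-shot data watches (hypothetical, not the ZooKeeper API).

class WatchedNode:
    def __init__(self, data: bytes = b"") -> None:
        self.data = data
        self._watchers = []  # callbacks waiting for the next change

    def get_data(self, watcher=None) -> bytes:
        """Read the data and optionally register a one-shot watch."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set_data(self, data: bytes) -> None:
        """Change the data and fire every pending watch exactly once."""
        self.data = data
        fired, self._watchers = self._watchers, []  # watches are cleared here
        for callback in fired:
            callback("NodeDataChanged")
```

A watcher registered once sees only the first of two consecutive updates. To keep observing the znode it has to read again and re-register, which is why the recipes later in this deck call exists() or getChildren() again after each notification.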
Zookeeper Data Access
Read Path:
- In the leader/follower model, reads are eventually consistent.
- The client connects to one of the Zk servers and requests the znode at a
given path.
- The connected Zk server authenticates the client and serves the read from
its locally stored copy of the namespace.
- Since it is a local copy, it can be stale.
- Each server keeps its own copy of the namespace and answers reads from it
without contacting the leader, trading read freshness for speed and
availability.
Write Path:
- The client connects to one of the Zk servers and requests a create, update
or delete of a znode at a path in the namespace.
- Since all writes are handled by the leader, the connected Zk server
forwards the write to the leader.
- The leader persists the data, broadcasts the write to all followers in the
cluster and awaits their responses.
- If a majority of the servers write the change into their local namespace
and respond, a quorum is reached and the write succeeds.
- The Zk server the client originally connected to reports the write as a
success to the client.
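The majority rule in the write path comes down to simple arithmetic. A minimal sketch, assuming the leader's own write counts as one acknowledgement; the function names are illustrative only.

```python
# Majority-quorum arithmetic for an ensemble of a given size (illustrative helper names).

def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble has at least one server")
    return ensemble_size // 2 + 1


def write_committed(acks: int, ensemble_size: int) -> bool:
    """True once enough servers (leader included) have acknowledged the write."""
    return acks >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)
```

With these definitions an ensemble of 5 needs 3 acknowledgements and tolerates 2 failures, while an ensemble of 4 also needs 3 but tolerates only 1, which is why odd ensemble sizes are the usual choice.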
Zk commands
Recipe - Barrier
Intent: Enforce a barrier while performing a crucial
job.
● Client calls exists(/b, true) to check whether the barrier
exists and to set a watch.
● If barrier /b doesn't exist, the client creates an ephemeral
node and proceeds with its job:
create(/b, EPHEMERAL)
● If barrier /b exists, the client waits for the watch
to trigger. At this point multiple clients may be
waiting on a watch for the same barrier /b.
● Once the client job is done, the client deletes the
barrier:
delete(/b)
● Deleting the barrier node triggers a notification
to all watchers.
● Other waiting clients can now retry with a call
to exists(/b, true).
Usage: Critical updates or housekeeping tasks that must force
other processes to wait.
[Flowchart: client calls exists(/b, true). No: create(/b, EPHEMERAL), run
client job, delete(/b), notify state change to watchers. Yes: wait for the
notification, then call exists(/b, true) again.]
Create and delete are atomic operations performed by the leader upon
agreement with a quorum. If several clients race to create the same node, the
leader orders the requests sequentially, so only one create succeeds.
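The barrier flow can be traced with a toy state machine. `ToyBarrier` is a hypothetical stand-in for the znode /b and its watchers; it runs in a single process and only illustrates the order of events, not real distributed behavior.

```python
# Toy, single-process model of the barrier recipe (not the ZooKeeper API).

class ToyBarrier:
    def __init__(self) -> None:
        self.holder = None   # client whose ephemeral /b currently exists
        self.waiters = []    # clients that left a watch via exists(/b, true)

    def try_enter(self, client: str) -> bool:
        """Model exists(/b, true) followed by create(/b, EPHEMERAL)."""
        if self.holder is None:
            self.holder = client         # the create succeeded
            return True
        if client not in self.waiters:
            self.waiters.append(client)  # /b exists: wait on the watch
        return False

    def leave(self, client: str) -> list:
        """Model delete(/b); returns the clients notified by their watches."""
        if self.holder != client:
            raise ValueError(f"{client} does not hold the barrier")
        self.holder = None
        notified, self.waiters = self.waiters, []  # watches fire once
        return notified
```

After the holder leaves, every notified client retries `try_enter`; exactly one of them wins and the others wait again, matching the retry step in the recipe.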
Recipe - Cluster Management
Intent: Notify nodes about the arrival or departure of other nodes in the
cluster.
● Create a PERSISTENT parent node /member
● Each client sets a watch on the children of the parent node /member
getChildren(/member, true)
● Each client creates an EPHEMERAL child node under /member
create(/member/host1, EPHEMERAL)
● Each client writes its status (CPU, memory, failures etc.) to its own
node in the hierarchy.
● The watches fire for all watchers when a child node is added or
removed; a client sets the watch again to keep receiving notifications.
Usage: Cluster monitoring or management for elastic scaling.
[Figure: clients c1, c2 and c3 watch the parent /member; each creates and
updates its own child /member/c1, /member/c2, /member/c3; Zk notifies the
watchers.]
When client c3 creates /member/c3, Zk notifies the other
watchers, viz., c1 and c2.
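A watch notification only says that the children changed, so a client typically compares the new child list with the one it saw before. A minimal sketch of that comparison; `membership_change` is a hypothetical helper name.

```python
# Compare two snapshots of the children of /member (hypothetical helper).

def membership_change(before, after):
    """Return (joined, left) as sorted lists of member names."""
    before_set, after_set = set(before), set(after)
    joined = sorted(after_set - before_set)
    left = sorted(before_set - after_set)
    return joined, left
```

For instance, going from `["c1", "c2"]` to `["c1", "c2", "c3"]` reports `c3` as joined; if c1's session later expires and its ephemeral node disappears, the next comparison reports `c1` as left.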
Recipe - Queues
Intent: Provide ordered (FIFO) access to data.
● Create a PERSISTENT parent node /queue
● Each client creates an EPHEMERAL & SEQUENTIAL child node
under /queue. Because the node is sequential, a monotonically
increasing number is appended to its name, e.g., /queue/X-0000000001,
/queue/X-0000000002...
create(/queue/X-, EPHEMERAL_SEQUENTIAL)
● A client that wants to access the nodes in insertion order
lists all children of /queue and sorts them by sequence number.
getChildren(/queue, true)
By enabling the watch on the parent, the accessor client is
notified when a child is created or removed by another client.
Usage: Processing items from multiple producers in the order they were added.
[Figure: clients c1, c2 and c3 create /queue/x-0001, /queue/x-0002 and
/queue/x-0003 under /queue; client c4 calls getChildren and watches for
changes to the children.]
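Since the child list is not guaranteed to come back in insertion order, the consumer sorts it by the numeric suffix. A small sketch, assuming child names end in `-` followed by the sequence number as in the examples above; the helper names are illustrative.

```python
# Order queue children by their sequence suffix (illustrative helper names).

def sequence_number(name: str) -> int:
    """Extract the numeric suffix after the last '-' in a child name."""
    return int(name.rsplit("-", 1)[1])


def fifo_order(children):
    """Return the child names sorted from oldest to newest."""
    return sorted(children, key=sequence_number)
```

Sorting by the parsed number rather than by the whole name keeps the order correct even when clients use different name prefixes.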
Recipe - Locks
Intent: Avoid race conditions by enforcing a lock/key pattern
1. Create a PERSISTENT parent node /lock
2. Each client creates an EPHEMERAL & SEQUENTIAL child node
under /lock. Because the node is sequential, a monotonically
increasing number is appended to its name, e.g., /lock/X-0000000001,
/lock/X-0000000002…
create(/lock/X-, EPHEMERAL_SEQUENTIAL)
3. Locks are granted in insertion order, from smallest to largest.
To check whether its znode is the lowest, the client invokes
getChildren(/lock, false)
4. If the 1st znode in the list of children is its own, the lock is
acquired. The client proceeds to do its job. Upon completion, it
releases the lock by deleting its znode.
delete(/lock/X-0000000001)
5. Else, it waits for its turn by setting a watch on its predecessor
znode. (If its immediate predecessor no longer exists, it looks for the
one before it, and so on until it finds an existing one.)
exists(/lock/X-<n-1>, true)
6. When its predecessor znode is deleted or updated, the client is
notified.
7. When a client receives this event, it goes to step 3.
[Figure: clients c1, c2 and c3 create /lock/x-0001, /lock/x-0002 and
/lock/x-0003 under /lock; each client calls getChildren() and watches its
predecessor.]
Each client checks for the existence of its predecessor.
In case its predecessor dies, it also needs to check with the parent whether
it is now the 1st existing child.
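The decision a client makes in steps 3 to 5 can be written as a pure function over the current child list. This is a sketch under the assumption that child names end in `-<sequence number>`; `lock_step` is a hypothetical helper and performs no ZooKeeper calls.

```python
# Decide whether a client holds the lock or which znode it should watch
# (hypothetical helper, operates on a plain list of child names).

def sequence_number(name: str) -> int:
    """Extract the numeric suffix after the last '-' in a child name."""
    return int(name.rsplit("-", 1)[1])


def lock_step(children, mine):
    """Return ("acquired", None) or ("wait", predecessor_to_watch).

    children: names currently under /lock, in any order.
    mine: the name of this client's own znode; must be in children.
    """
    ordered = sorted(children, key=sequence_number)
    index = ordered.index(mine)
    if index == 0:
        return ("acquired", None)
    # Only existing znodes are listed, so the entry just before ours is the
    # nearest predecessor that still exists.
    return ("wait", ordered[index - 1])
```

Because each client watches only its nearest existing predecessor, releasing the lock wakes up a single waiter rather than all of them.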
Recipe - Leader selection
Intent: Leader election
1. Create a PERSISTENT parent node /election
2. Each Zk server creates an EPHEMERAL & SEQUENTIAL child
node under /election. Because the node is sequential, a
monotonically increasing number is appended to its name, e.g.,
/election/X-0000000001, /election/X-0000000002…
create(/election/X-, EPHEMERAL_SEQUENTIAL)
3. Each Zk server checks whether its znode is the smallest among all children
getChildren(/election, false)
4. If yes, it becomes the leader.
5. Else, it sets a watch on the znode just smaller than its own (its
closest predecessor).
exists(/election/X-<n-1>, true)
6. If the leader dies, so does its ephemeral znode, which triggers a watch
event only for its successor (the next in line, which is watching it).
7. When a node receives this event, it goes to step 3.
[Figure: Zk 1 (leader), Zk 2 and Zk 3 create /election/x-0001,
/election/x-0002 and /election/x-0003 under /election; each server calls
getChildren() and watches its predecessor.]
Each server checks for the existence of its predecessor.
In case its predecessor dies, it also needs to check with the parent whether
it is now the 1st existing child.
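The election steps above can be simulated by recomputing the leader and the watch chain from the list of children. `election_state` is a hypothetical helper over plain name lists, assuming names end in `-<sequence number>`; it does not talk to ZooKeeper.

```python
# Toy simulation of leader election by smallest sequence number (hypothetical helper).

def sequence_number(name: str) -> int:
    """Extract the numeric suffix after the last '-' in a child name."""
    return int(name.rsplit("-", 1)[1])


def election_state(children):
    """Return (leader, watches) for a non-empty list of candidate znodes.

    watches maps each non-leader znode to the predecessor it watches.
    """
    ordered = sorted(children, key=sequence_number)
    leader = ordered[0]
    watches = {
        node: ordered[i - 1] for i, node in enumerate(ordered) if i > 0
    }
    return leader, watches
```

With three candidates, only the second one watches the leader. When the leader's znode disappears, that single watcher is notified, recomputes the state from the remaining children and finds itself to be the smallest.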

Zookeeper Architecture

  • 1.
  • 2. What is Zookeeper?
      Coordination service: A coordination service helps distributed applications offload common challenges such as:
      - Synchronization between the nodes of the cluster.
      - Distributing a common configuration between the nodes of the cluster.
      - Grouping and naming services for each of the nodes of the cluster.
      - Leader election between the nodes.
      Znodes (namespace): Zookeeper (Zk) helps the nodes of a distributed application coordinate with each other by providing a common namespace. Nodes can use this namespace to save and retrieve any shared info to help coordination. The namespace is hierarchical, much like a tree data structure. Each element in this namespace is called a znode and is identified by a name whose components are separated by a slash (/) to indicate its hierarchical path from the root. The namespace is stored in memory and therefore provides fast access.
      Ensemble: Like the distributed applications that it serves, Zookeeper itself is distributed, i.e., a set of Zookeeper nodes work together to achieve its goal. This group of Zookeeper nodes is called an ensemble. Clients can talk to any node within the ensemble (via the zk client lib). Clients periodically send heartbeats to the server and receive an ack back to reaffirm their connectivity. Each node in the ensemble is aware of and talks to the other nodes to share info.
  • 3. Zookeeper Data Model
      Each znode stores a stat structure that contains the zxid (transaction ID), version number, timestamp, and ACL. The client receives this stat structure at the time of a read. The stat structure helps validate updates/deletes from the client.
      With every creation/update the stat structure is updated: the version number increments, the zxid increments, the timestamp is reset, etc.
      Types of znodes:
      ● Ephemeral:
        ○ Exists only as long as the session that created it exists.
        ○ Cannot have children.
      ● Persistent:
        ○ Unlike ephemeral znodes, these persist across sessions.
      ● Sequential:
        ○ Contains a monotonically increasing number as part of its name, which keeps the name unique.
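As a rough illustration of how a version number in the stat structure can validate an update, here is a minimal in-memory sketch. It is a toy model, not the ZooKeeper client API: `ToyNamespace`, `Stat`, `set_data` and `BadVersionError` are hypothetical names chosen for this example, and ACLs are left out.

```python
import itertools
import time
from dataclasses import dataclass


class BadVersionError(Exception):
    """Raised when the caller's expected version is out of date."""


@dataclass(frozen=True)
class Stat:
    zxid: int       # id of the transaction that last changed the znode
    version: int    # number of data changes since creation
    mtime: float    # wall-clock time of the last change


class ToyNamespace:
    """In-memory stand-in for the znode tree (hypothetical, for illustration)."""

    def __init__(self):
        self._next_zxid = itertools.count(1)
        self._nodes = {}  # path -> (data, Stat)

    def create(self, path, data):
        if path in self._nodes:
            raise KeyError(f"{path} already exists")
        stat = Stat(next(self._next_zxid), 0, time.time())
        self._nodes[path] = (data, stat)
        return stat

    def get_data(self, path):
        # A read hands back the data together with its stat structure.
        return self._nodes[path]

    def set_data(self, path, data, expected_version):
        _, stat = self._nodes[path]
        # Reject the update if someone else changed the znode since our read.
        if expected_version != stat.version:
            raise BadVersionError(
                f"expected version {expected_version}, found {stat.version}"
            )
        new_stat = Stat(next(self._next_zxid), stat.version + 1, time.time())
        self._nodes[path] = (data, new_stat)
        return new_stat
```

A client would read the znode, keep `stat.version`, and pass it back with its update; a second writer holding the same stale version is rejected instead of silently overwriting the first.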
  • 4. Zookeeper Sessions
      - The client lib configuration contains the list of all Zookeeper servers.
      - The client establishes a connection to a random server from the list.
      - The connected server sends an auth token upon successful connection.
      - The client and the connected server periodically exchange heartbeats to confirm that each is alive.
      - If the client loses connectivity, the client lib will, upon timeout, connect to some other server from the config list. This switch is transparent to the client application.
      - During reconnection, the auth token from the previous connection is presented so that the client can be validated and reattached to its lost session.
      Zookeeper Watches
      - Client ops like getData(), getChildren(), exists(), etc. have an optional parameter to enable a watch on the target znode.
      - A watch is one-shot: the Zookeeper server notifies the watchers of a znode of a single change event. Successive changes to the znode will not notify the watchers unless the watch is set again.
      - There are 2 kinds of watches:
        - Data watches: watch for a change in data on a znode. getData() and exists() set data watches. They are also triggered by create() and delete() of the znode.
        - Children watches: watch for the addition/deletion of a child node under a parent znode. getChildren() sets a children watch. It is triggered by create() and delete() of child znodes.
      Example:
      - Servers A & B create ephemeral nodes 1 & 2 respectively.
      - When A dies, B, which is watching 1, is notified as 1 is removed along with A’s session.
      - B can now use this info to take evasive action.
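The single-notification behaviour of watches can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions: `ToyWatchedStore` and its methods are hypothetical names, only data watches are modelled, and callbacks run synchronously rather than being delivered over a session.

```python
from collections import defaultdict


class ToyWatchedStore:
    """Toy key/value store with one-shot data watches (not the real API)."""

    def __init__(self):
        self._data = {}
        self._watches = defaultdict(list)  # path -> pending callbacks

    def get_data(self, path, watcher=None):
        # Passing a watcher registers a watch that fires at most once.
        if watcher is not None:
            self._watches[path].append(watcher)
        return self._data.get(path)

    def set_data(self, path, value):
        self._data[path] = value
        # pop() removes the pending watchers, so a later change will not
        # notify them again unless they re-register.
        for watcher in self._watches.pop(path, []):
            watcher(path)
```

A watcher that wants to follow every change has to read the znode again with the watch enabled each time it is notified; changes made between the notification and the re-read are not reported individually.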
  • 5. Zookeeper Data Access
      Read Path:
      - In a Leader/Follower model, reads are eventually consistent.
      - The client connects to one of the zk servers and requests the znode at a path.
      - The connected zk server authenticates the client and serves the read from its locally stored namespace.
      - Since it is a local copy, it can be stale.
      - For reads, zk servers choose availability over consistency; hence each server stores its own copy of the namespace.
      Write Path:
      - The client connects to one of the zk servers and requests a create/delete of a znode at a path in the namespace.
      - Since all writes are handled by the leader, the connected zk server forwards the write to the leader.
      - The leader persists the data, broadcasts the write to all followers in the cluster, and awaits their responses.
      - If a majority of them write to their local namespace and respond, we have a quorum and the write is a success.
      - The initially connected zk server reports the write as a success to the client.
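The majority rule on the write path, and the stale local reads that follow from it, can be sketched as below. This is a deliberately simplified model and not the actual replication protocol: `ToyServer` and `leader_write` are hypothetical names, a follower that is "down" simply never acknowledges, and the write is applied only after enough acknowledgements are counted.

```python
class ToyServer:
    """One ensemble member holding its own copy of the namespace."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.namespace = {}

    def read(self, path):
        # Reads are served from the local copy and may therefore be stale.
        return self.namespace.get(path)


def leader_write(leader, followers, path, data):
    """Return True if a majority of the ensemble accepted the write."""
    ensemble_size = 1 + len(followers)
    quorum = ensemble_size // 2 + 1
    # The leader counts itself plus every follower that responds.
    ackers = [leader] + [f for f in followers if f.up]
    if len(ackers) < quorum:
        return False
    for server in ackers:
        server.namespace[path] = data
    return True
```

With a three-node ensemble and one unreachable follower the write still succeeds, but a client reading from that follower sees the old state until it catches up.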
  • 7. Recipe - Barrier
      Intent: Enforce a barricade while performing a crucial job.
      ● The client calls exists(/b, true) to check whether the barrier exists and to set a watch.
      ● If barrier /b doesn’t exist, the client creates an ephemeral node and proceeds with its job: create(/b, EPHEMERAL)
      ● If barrier /b exists, the client waits for the watch to trigger. At this point there may be multiple clients waiting and watching for the same barrier /b.
      ● Once the client job is done, the client deletes the barrier.
      ● The deletion of the barrier node triggers a notification to all watchers.
      ● The other waiting clients can now retry with calls to exists(/b, true).
      Usage: Critical updates/housekeeping tasks that force other processes to wait.
      Note: create and delete are atomic ops performed by the leader upon agreement with the quorum. When several clients race to create the barrier, the leader orders their creation requests so that they are applied sequentially.
      [Flowchart: each client checks exists(/b, true). If no: create(/b, EPHEMERAL), run the client job, delete /b, notify the state change to watchers. If yes: wait for the notification.]
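The barrier flow can be walked through with a toy, single-process model. Everything here is an assumption made for illustration: `ToyBarrier`, `try_enter` and `leave` are hypothetical names, the `owner` field stands in for the ephemeral znode /b, and the check-then-create pair is collapsed into one call because the toy has no concurrency.

```python
class ToyBarrier:
    """Single-process stand-in for the /b barrier znode (not the real API)."""

    def __init__(self):
        self.owner = None    # which client currently holds /b, if any
        self._waiting = []   # callbacks left behind by exists(/b, true)

    def try_enter(self, client, on_release):
        # Barrier absent: the create succeeds and the client may proceed.
        if self.owner is None:
            self.owner = client
            return True
        # Barrier present: leave a one-shot watch and wait.
        self._waiting.append(on_release)
        return False

    def leave(self, client):
        if self.owner != client:
            raise RuntimeError(f"{client} does not hold the barrier")
        self.owner = None
        # Deleting the barrier notifies every watcher exactly once.
        callbacks, self._waiting = self._waiting, []
        for callback in callbacks:
            callback()
```

After a release, every waiting client is notified and retries, but only the first retry gets in; the rest must wait again.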
  • 8. Recipe - Cluster Management
      Intent: Notify nodes about the arrival or departure of other nodes in the cluster.
      ● Create a PERSISTENT parent node /member
      ● Each client sets a watch on the children of the parent node /member: getChildren(/member, true). (A watch set with exists(/member, true) is a data watch on /member itself and does not fire when children are added or removed.)
      ● Each client creates an EPHEMERAL child node under /member: create(/member/host1, false)
      ● Each client writes its status (CPU, memory, failure, etc.) to its own node in the hierarchy.
      ● The watch on the parent is triggered for all watchers whenever a child node is created or deleted. Status updates written to a child reach only those clients that also set a data watch on that child, e.g. with getData().
      Usage: Cluster monitoring or management for elastic scaling.
      [Diagram: clients c1, c2 and c3 watch the parent /member and create/update /member/c1, /member/c2 and /member/c3; zk notifies the watchers.]
      When client c3 creates /member/c3, zk notifies the other watchers, viz. c1 and c2.
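A children-watch notification only says that the set of children changed, not which member arrived or left, so a client typically re-reads the children and compares them with its previous snapshot. A minimal sketch of that comparison, where `diff_members` is a hypothetical helper name:

```python
def diff_members(previous, current):
    """Compare two child listings of /member; return (joined, departed)."""
    before, after = set(previous), set(current)
    joined = sorted(after - before)
    departed = sorted(before - after)
    return joined, departed
```

For example, going from children `["c1", "c2"]` to `["c2", "c3"]` reports that `c3` joined and `c1` departed.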
  • 9. Recipe - Queues
      Intent: Create ordered (FIFO) data access.
      ● Create a PERSISTENT parent node /queue
      ● Each client creates an EPHEMERAL & SEQUENTIAL child node under /queue. Since it is sequential, a monotonically increasing number is appended to its name, e.g., /queue/X-0001, /queue/X-0002, ...: create(/queue/X-, false)
      ● A client that wants to access the nodes in insertion order simply fetches all the children of /queue and orders them by sequence number: getChildren(/queue, true). By enabling the watch on the parent, the accessor client is notified when a child is created or removed externally.
      Usage: Cluster monitoring or management for elastic scaling.
      [Diagram: clients c1, c2 and c3 create /queue/x-0001, /queue/x-0002 and /queue/x-0003; client c4 calls getChildren on /queue and watches for changes to its children.]
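Since a child listing is not guaranteed to come back in creation order, the consumer has to sort by the sequence suffix. A minimal sketch, assuming child names of the form `<prefix>-<number>` as in the examples above; `sequence_of` and `fifo_order` are hypothetical helper names:

```python
def sequence_of(name):
    """Extract the numeric suffix that follows the last '-' in a child name."""
    return int(name.rsplit("-", 1)[1])


def fifo_order(children):
    """Return the child names in insertion (sequence number) order."""
    return sorted(children, key=sequence_of)
```

Sorting on the parsed number rather than on the raw string keeps the order correct even if the zero padding of the names is inconsistent.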
  • 10. Recipe - Locks
      Intent: Avoid race conditions by enforcing a lock/key pattern.
      1. Create a PERSISTENT parent node /lock
      2. Each client creates an EPHEMERAL & SEQUENTIAL child node under /lock. Since it is sequential, a monotonically increasing number is appended to its name, e.g., /lock/X-0001, /lock/X-0002, ...: create(/lock/X-, false)
      3. Locks are granted in insertion order, from smallest to largest. A client that wants to check whether it is the lowest invokes getChildren(/lock, false)
      4. If the 1st znode in the list of children is its own, the lock is acquired. The client proceeds to do its job and, upon completion, releases the lock by deleting its znode: delete(/lock/X-0001)
      5. Else, it waits for its turn by setting a watch on its predecessor znode (if its immediate predecessor doesn’t exist, it looks for the one before, and so on until it finds one): exists(/lock/X-<n - 1>, true)
      6. When its predecessor znode is deleted/updated, the client is notified.
      7. When a client receives this event, it goes to step 3.
      [Diagram: clients c1, c2 and c3 create /lock/x-0001, /lock/x-0002 and /lock/x-0003; each client calls getChildren() and watches its predecessor.]
      Note: A client checks for the existence of its predecessor. In the event that its predecessor dies, it also needs to check with the parent whether it is now the 1st existing child.
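The decision a client makes in steps 3 to 5 can be written as a pure function over the current child listing. This is a sketch under assumptions: names follow the `<prefix>-<number>` pattern from the examples, and `sequence_of` and `next_lock_action` are hypothetical helper names, not part of any client library.

```python
def sequence_of(name):
    """Extract the numeric suffix that follows the last '-' in a child name."""
    return int(name.rsplit("-", 1)[1])


def next_lock_action(children, mine):
    """Decide what the owner of znode `mine` should do next.

    Returns ("acquired", None) if `mine` has the lowest sequence number,
    otherwise ("watch", predecessor) with the closest existing znode
    that precedes `mine`.
    """
    ordered = sorted(children, key=sequence_of)
    position = ordered.index(mine)
    if position == 0:
        return ("acquired", None)
    return ("watch", ordered[position - 1])
```

Because the function works on whatever children currently exist, a client whose immediate predecessor has died is pointed at the next surviving predecessor, and it holds the lock only once nothing precedes it.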
  • 11. Recipe - Leader selection
      Intent: Leader election.
      1. Create a PERSISTENT parent node /election
      2. Each candidate (labelled Zk 1, Zk 2 and Zk 3 in the diagram) creates an EPHEMERAL & SEQUENTIAL child node under /election. Since it is sequential, a monotonically increasing number is appended to its name, e.g., /election/X-0001, /election/X-0002, ...: create(/election/X-, false)
      3. Each candidate checks whether its znode is the smallest among all children: getChildren(/election, false)
      4. If yes, it becomes the leader.
      5. Else, it sets a watch on the znode just smaller than its own (its closest predecessor): exists(/election/X-<n - 1>, true)
      6. If the leader dies, so does its ephemeral znode, triggering a watch event to only its successor (the next in line, which is watching it).
      7. When a candidate receives this event, it goes to step 3.
      [Diagram: Zk 1 (leader), Zk 2 and Zk 3 create /election/x-0001, /election/x-0002 and /election/x-0003; each calls getChildren() and watches its predecessor.]
      Note: A candidate checks for the existence of its predecessor. In the event that its predecessor dies, it also needs to check with the parent whether it is now the 1st existing child.
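To make the "only its successor is notified" point concrete, the watch chain can be computed from a child listing. This is an illustrative sketch rather than a client implementation: `sequence_of`, `elect` and `notified_when_deleted` are hypothetical helper names, and names are assumed to follow the `<prefix>-<number>` pattern.

```python
def sequence_of(name):
    """Extract the numeric suffix that follows the last '-' in a child name."""
    return int(name.rsplit("-", 1)[1])


def elect(children):
    """Return (leader, watches) for the current children of /election.

    `watches` maps each non-leader znode to the predecessor it watches.
    """
    ordered = sorted(children, key=sequence_of)
    leader = ordered[0]
    watches = {node: pred for pred, node in zip(ordered, ordered[1:])}
    return leader, watches


def notified_when_deleted(watches, deleted):
    """List the znodes whose owners are notified when `deleted` goes away."""
    return [node for node, watched in watches.items() if watched == deleted]
```

Each znode is watched by at most one other, so the loss of the leader wakes a single candidate instead of the whole group; that candidate then repeats the check from step 3.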
  • 12. References
      ● Brief Architecture: https://data-flair.training/blogs/zookeeper-architecture/
      ● Datamodel: https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#ch_zkDataModel
      ● Zk API: https://www.tutorialspoint.com/zookeeper/zookeeper_api.htm
      ● Overview: https://www.slideshare.net/scottleber/apache-zookeeper