2. What is a Distributed System?
A distributed system consists of multiple computers that communicate through a computer network and interact with each other to achieve a common goal.
(Wikipedia)
3. Coordination in a Distributed System
Coordination: An act that multiple nodes must perform together.
Examples:
Leader Election
Managing group membership
Managing metadata
Synchronization (semaphores, mutexes, ...)
4. Coordination in a Distributed System
To coordinate, processes can:
Exchange messages over the network
Read/write shared storage
Use distributed locks
Problems with exchanging messages:
Message delays
Differences in processor speed
Clock drift
5. Use Case: Master-Worker Applications
Problems
Master crashes
Worker crashes
Communication failures
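6. Use Case: Master-Worker Applications
Problems with Master Crashes
Use a backup master
How does the backup master recover the latest state?
The backup master may suspect that the primary has crashed when it has not (e.g., the primary is only slow)
→ Split-brain scenario: two masters act at the same time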
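The split-brain risk comes from timeout-based crash detection: the backup only sees heartbeats, so a delayed heartbeat looks exactly like a dead primary. The following is a minimal sketch, assuming a hypothetical `HeartbeatMonitor` class and a simulated clock (plain numbers instead of real time), showing how a live primary can be wrongly suspected.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Backup master's view of the primary, based only on heartbeat arrivals."""

    timeout: float               # seconds of silence before suspecting a crash
    last_heartbeat: float = 0.0  # arrival time of the newest heartbeat seen

    def on_heartbeat(self, arrival_time: float) -> None:
        # Ignore heartbeats that arrive out of order.
        self.last_heartbeat = max(self.last_heartbeat, arrival_time)

    def suspects_crash(self, now: float) -> bool:
        return now - self.last_heartbeat > self.timeout


# Simulated clock: the primary is alive throughout, but the network delays
# every heartbeat sent after t=2.0 until t=6.0.
monitor = HeartbeatMonitor(timeout=2.5)
monitor.on_heartbeat(1.0)
monitor.on_heartbeat(2.0)
print(monitor.suspects_crash(now=5.0))  # True: the backup would take over
monitor.on_heartbeat(6.0)
print(monitor.suspects_crash(now=6.5))  # False: the primary was never down
```

No choice of timeout removes this: a shorter one causes more false suspicions, a longer one delays failover.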
7. Use Case: Master-Worker Applications
Problems with Worker Crashes
The master must detect worker crashes
Recover the tasks assigned to the crashed worker
Problems with Communication Failures
Ensure the same task is executed only once
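One common way to keep two workers from executing the same task is to make claiming a task a conditional create that fails if the claim already exists. A minimal sketch, using a hypothetical in-memory store rather than a real ZooKeeper client; the `/tasks/<id>/owner` layout and all names are illustrative assumptions, and parent nodes are not modeled.

```python
from typing import Dict


class NodeExistsError(Exception):
    """Raised when a path already exists (modeled on ZooKeeper's NodeExists error)."""


class InMemoryStore:
    """Hypothetical stand-in for a coordination service; parent nodes are not modeled."""

    def __init__(self) -> None:
        self.nodes: Dict[str, bytes] = {}

    def create(self, path: str, data: bytes) -> str:
        # Create-if-absent is atomic here only because everything runs in one thread.
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data
        return path


def try_claim(store: InMemoryStore, task_id: str, worker: str) -> bool:
    """Return True if `worker` won the right to execute `task_id`."""
    try:
        store.create(f"/tasks/{task_id}/owner", worker.encode())
        return True
    except NodeExistsError:
        return False


store = InMemoryStore()
print(try_claim(store, "t1", "worker-a"))  # True
print(try_claim(store, "t1", "worker-b"))  # False: already claimed
```

This only guarantees at most one owner per task; if the owner crashes after claiming, the claim still has to be released (for example by making it an ephemeral node) before the task can be reassigned.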
8. Introduction to ZooKeeper
An open-source, high-performance coordination service for distributed applications
Was a subproject of Hadoop but is now an Apache top-level project
Exposes common services through a simple interface
Leader Election
Naming
Configuration management
Locks & Synchronization
Group Service
→ You don't have to write them from scratch
9. ZooKeeper Use cases
Distributed Cluster Management
Node join/leave
Node statuses in real time
Distributed synchronization
Locks
Barriers
Queues
10. ZooKeeper Use cases
Apache HBase uses ZooKeeper to
Elect a cluster master
Keep track of available servers
Keep cluster metadata
Apache Kafka uses ZooKeeper to
Detect crashes
Implement topic discovery
Maintain state for topics
11. ZooKeeper Guarantees
Sequential Consistency: Updates from a client are applied in the order they were sent.
Atomicity: Updates either succeed or fail; there are no partial results.
Single System Image: A client sees the same view of the service regardless of the ZK server it connects to.
Reliability: Once applied, an update persists until a client overwrites it.
Timeliness: The clients’ view of the system is guaranteed to be up to date within a certain time bound (eventual consistency).
12. ZooKeeper Services
All machines store a copy of the data (in memory)
A leader is elected on service startup
Each client connects to a single server and maintains a TCP connection to it.
Clients can read from any server; writes go through the leader and need agreement from a majority of servers.
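The majority rule is what lets the ensemble tolerate failures without split brain: any two majorities of the same ensemble overlap, so two sides of a network partition can never both make progress. A small sketch of the arithmetic, with hypothetical helper names:

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that is a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that can fail while a majority is still reachable."""
    return ensemble_size - quorum_size(ensemble_size)


def has_quorum(servers: int, ensemble_size: int) -> bool:
    """True if `servers` (acknowledging a write, or reachable within a
    network partition) form a majority of the ensemble."""
    return servers >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2

# A 5-server ensemble split 3/2: only the larger side can make progress.
print(has_quorum(3, 5), has_quorum(2, 5))  # True False
```

Under this arithmetic a 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are the usual choice.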
13. ZooKeeper Data Model
ZooKeeper has a hierarchical namespace.
Each node is called a ZNode.
Every ZNode holds data (as a byte[]).
ZNode paths:
canonical, absolute, slash-separated
no relative references.
names can have Unicode characters
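The path rules above can be checked mechanically. A simplified sketch with a hypothetical `is_valid_path` helper; it covers only the rules listed on this slide plus the ban on the null character, not every restriction the ZooKeeper server enforces.

```python
def is_valid_path(path: str) -> bool:
    """Simplified check for a canonical, absolute, slash-separated ZNode path."""
    if not path.startswith("/"):
        return False  # must be absolute
    if path == "/":
        return True   # the root node
    if path.endswith("/"):
        return False  # no trailing slash
    for part in path[1:].split("/"):
        if part == "":
            return False  # '//' produces an empty component
        if part in (".", ".."):
            return False  # no relative references
        if "\u0000" in part:
            return False  # the null character is never allowed
    return True


print(is_valid_path("/app/config/db"))   # True
print(is_valid_path("/app/../secrets"))  # False: relative reference
print(is_valid_path("/café/配置"))        # True: Unicode names are fine
```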
14. ZNode
Each ZNode maintains a stat structure with version numbers for data changes and ACL changes, plus timestamps.
Version numbers increase with each change
Data is read and written in its entirety
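Version numbers enable optimistic concurrency: a client passes the version it last read, and the write is rejected if someone else changed the node in between. A toy in-memory sketch, assuming hypothetical `ZNode`, `Stat` and `BadVersionError` classes modeled on ZooKeeper's conditional `setData`, where an expected version of -1 means "any version":

```python
from dataclasses import dataclass
from typing import Tuple


class BadVersionError(Exception):
    """Raised when the expected version does not match (like ZooKeeper's BadVersion)."""


@dataclass
class Stat:
    version: int = 0  # data version, incremented on every successful write


class ZNode:
    """Hypothetical in-memory node; only the data version is modeled."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.stat = Stat()

    def get_data(self) -> Tuple[bytes, Stat]:
        # The whole value is returned; there are no partial reads.
        return self.data, Stat(self.stat.version)

    def set_data(self, data: bytes, expected_version: int) -> Stat:
        # -1 skips the check (unconditional write).
        if expected_version not in (-1, self.stat.version):
            raise BadVersionError(
                f"expected version {expected_version}, found {self.stat.version}")
        self.data = data  # the whole value is replaced
        self.stat.version += 1
        return Stat(self.stat.version)


node = ZNode(b"max_conn=10")
_, seen_by_a = node.get_data()
_, seen_by_b = node.get_data()
node.set_data(b"max_conn=20", seen_by_a.version)      # A wins: version 0 -> 1
try:
    node.set_data(b"max_conn=30", seen_by_b.version)  # B still expects version 0
except BadVersionError as err:
    print("B must re-read and retry:", err)
```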
15. ZNode types
Persistent Nodes
exist until explicitly deleted
Ephemeral Nodes
exist as long as the creating session is active
cannot have children
Sequence Nodes (Unique Naming)
ZooKeeper appends a monotonically increasing counter to the end of the path
applies to both persistent & ephemeral nodes
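A toy model of how the three node types behave. Following ZooKeeper, the sequence counter is kept per parent and appended as a 10-digit zero-padded number; the `Namespace` class, integer session ids, and the omission of parent-existence checks and node data are simplifying assumptions, not a real client API.

```python
from collections import defaultdict
from typing import Dict, Optional


class Namespace:
    """Hypothetical toy namespace: persistent, ephemeral and sequential nodes.

    The sequence counter is per parent and rendered as a 10-digit
    zero-padded suffix. Parent existence and node data are not modeled.
    """

    def __init__(self) -> None:
        self.owner: Dict[str, Optional[int]] = {}  # path -> session id, None if persistent
        self.counters: Dict[str, int] = defaultdict(int)  # parent -> next sequence number

    def create(self, path: str, session: int,
               ephemeral: bool = False, sequential: bool = False) -> str:
        parent = path.rsplit("/", 1)[0] or "/"
        if self.owner.get(parent) is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if sequential:
            path = f"{path}{self.counters[parent]:010d}"
            self.counters[parent] += 1
        if path in self.owner:
            raise ValueError(f"node already exists: {path}")
        self.owner[path] = session if ephemeral else None
        return path

    def close_session(self, session: int) -> None:
        # Ephemeral nodes disappear when the session that created them ends.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]


ns = Namespace()
ns.create("/election", session=0)  # persistent
a = ns.create("/election/n_", session=1, ephemeral=True, sequential=True)
b = ns.create("/election/n_", session=2, ephemeral=True, sequential=True)
print(a, b)  # /election/n_0000000000 /election/n_0000000001
ns.close_session(1)
print(sorted(ns.owner))  # ['/election', '/election/n_0000000001']
```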
16. ZNode watches
Clients can set watches on znodes:
NodeChildrenChanged
NodeCreated
NodeDataChanged
NodeDeleted
Changes to a znode trigger the watch, and ZooKeeper sends the client a notification.
Watches are one-time triggers.
Watch events are always ordered.
A client sees a watch event before it sees the new ZNode data.
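Because watches are one-time triggers, a client that wants continuous notifications has to re-register the watch inside its callback. A toy sketch with a hypothetical `WatchRegistry` (not a ZooKeeper API) showing both behaviors:

```python
from collections import defaultdict
from typing import Callable, Dict, List

Watcher = Callable[[str, str], None]  # (event_type, path)


class WatchRegistry:
    """Toy model of one-time ZNode watches: a watch fires once, then is removed."""

    def __init__(self) -> None:
        self.watches: Dict[str, List[Watcher]] = defaultdict(list)

    def add_watch(self, path: str, watcher: Watcher) -> None:
        self.watches[path].append(watcher)

    def trigger(self, path: str, event_type: str) -> None:
        # Remove all watchers from the path *before* notifying: they are one-shot.
        for watcher in self.watches.pop(path, []):
            watcher(event_type, path)


registry = WatchRegistry()
once, always = [], []

registry.add_watch("/config", lambda event, path: once.append(event))


def keep_watching(event: str, path: str) -> None:
    always.append(event)
    registry.add_watch(path, keep_watching)  # re-arm the one-time watch


registry.add_watch("/config", keep_watching)

registry.trigger("/config", "NodeDataChanged")
registry.trigger("/config", "NodeDataChanged")
print(once)    # ['NodeDataChanged']  (fired once, then gone)
print(always)  # ['NodeDataChanged', 'NodeDataChanged']
```

In real ZooKeeper, changes made between the notification and the re-registration are not reported individually, so a callback typically re-reads the node when it re-arms the watch.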
17. ZNode APIs
String create(path, data, acl, flags)
void delete(path, expectedVersion)
Stat setData(path, data, expectedVersion)
(data, Stat) getData(path, watch)
Stat exists(path, watch)
String[] getChildren(path, watch)
→ Each API also has an asynchronous version
20. Recipe: Leader Election
Continuous watching of a znode requires re-registering the watch after every event (trigger)
Too many watches on a single znode create the “herd effect”: bursts of traffic that limit scalability
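A rough cost model of the herd effect in the naive recipe, where every follower watches the leader's node. The sketch assumes each woken follower sends one `getChildren` request after its notification; the function names are hypothetical.

```python
from typing import Dict, List


def naive_watch_targets(nodes: List[str]) -> Dict[str, str]:
    """Naive recipe: every follower watches the leader (lowest sequence number).

    Assumes all names share a prefix and a zero-padded sequence suffix,
    so lexicographic order equals sequence order.
    """
    ordered = sorted(nodes)
    leader = ordered[0]
    return {follower: leader for follower in ordered[1:]}


def burst_on_leader_exit(targets: Dict[str, str], leader: str) -> int:
    """Requests triggered by deleting the leader's node, assuming one
    notification plus one getChildren re-read per woken follower."""
    woken = [follower for follower, watched in targets.items() if watched == leader]
    return 2 * len(woken)


for n in (10, 1000):
    nodes = [f"n_{i:010d}" for i in range(n)]
    print(n, burst_on_leader_exit(naive_watch_targets(nodes), nodes[0]))
# 10 18
# 1000 1998
```

The burst grows linearly with the number of participants, yet only one of the woken followers becomes the new leader.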
21. Recipe: Leader Election (Improved)
1. All participants create an ephemeral-sequential node under the same election path.
2. The node with the smallest sequence number is the leader.
3. Each “follower” node watches the node with the next lower sequence number.
4. When the watched node is removed, read the election path again: become the leader if this node now has the lowest sequence number, otherwise watch the new next-lower node.
5. Upon session expiration, check the election state and rejoin the election if needed.
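Steps 2 to 4 reduce to one decision each participant makes whenever it (re)reads the election path. A minimal sketch with a hypothetical `election_state` helper, assuming child names end in ZooKeeper's 10-digit sequence suffix:

```python
from typing import List, Optional


def election_state(children: List[str], me: str) -> Optional[str]:
    """Return None if `me` is the leader, else the single node `me` should watch.

    `children` are the names under the election path, each ending in a
    10-digit sequence number (e.g. 'n_0000000007').
    """
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    index = ordered.index(me)
    if index == 0:
        return None             # lowest sequence number: leader
    return ordered[index - 1]   # watch only the next-lower node


children = ["n_0000000003", "n_0000000001", "n_0000000002"]
print(election_state(children, "n_0000000001"))  # None: leader
print(election_state(children, "n_0000000003"))  # n_0000000002

# The leader's node disappears; only n_0000000002 was watching it.
children.remove("n_0000000001")
print(election_state(children, "n_0000000002"))  # None: new leader
print(election_state(children, "n_0000000003"))  # n_0000000002 (unchanged)
```

Deleting a node wakes only the one participant watching it, so the cost of a leader change stays constant instead of growing with the number of participants. If a non-leader node dies instead, its watcher re-runs the same check and simply moves its watch one node down.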
25. ZooKeeper Programming
ZooKeeper APIs are difficult to use
Connection Issues:
Initial connection: The client must complete a handshake before executing any operations (create(), delete(), ...)
Session expiration: Clients are expected to watch for this state
and close and re-create the ZooKeeper instance.
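A sketch of the session-expiration rule above: close the expired handle, build a new one, and restore whatever the old session owned. `SessionManager`, `FakeClient` and the state strings (named after the Java client's `Expired` and `Disconnected` states) are illustrative assumptions, not a real client library.

```python
import itertools
from typing import Any, Callable, List


class FakeClient:
    """Hypothetical stand-in for a real client handle."""

    def __init__(self, session_id: int) -> None:
        self.session_id = session_id
        self.closed = False

    def close(self) -> None:
        self.closed = True


class SessionManager:
    """Keeps a usable client handle, replacing it when the session expires."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self.client_factory = client_factory
        self.client = client_factory()
        # Callbacks that restore what the old session owned (ephemeral nodes, watches).
        self.on_new_session: List[Callable[[Any], None]] = []

    def handle_state(self, state: str) -> None:
        if state == "Expired":
            self.client.close()  # an expired handle cannot be reused
            self.client = self.client_factory()
            for restore in self.on_new_session:
                restore(self.client)
        # "Disconnected" needs no action here: the client library keeps
        # trying to reconnect while the session may still be valid.


ids = itertools.count(1)
manager = SessionManager(lambda: FakeClient(next(ids)))
restored: List[int] = []
manager.on_new_session.append(lambda client: restored.append(client.session_id))

first = manager.client
manager.handle_state("Disconnected")  # same handle kept
manager.handle_state("Expired")       # old handle closed, new one created
print(first.closed, manager.client.session_id, restored)  # True 2 [2]
```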
26. ZooKeeper Programming
ZooKeeper APIs are difficult to use
Recoverable Errors:
When creating a sequential ZNode on the server, there is the
possibility that the server will successfully create the ZNode but
crash prior to returning the node name to the client.
There are several recoverable exceptions thrown by the
ZooKeeper client. Users are expected to catch these exceptions
and retry the operation.
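The sequential-node case cannot be fixed by a blind retry, because the first attempt may already have created a node. A toy sketch of one workaround, similar in spirit to Apache Curator's "protected" mode: embed a client-generated GUID in the node name and, after a connection loss, look for an existing child with that GUID before retrying. `FlakyServer`, `protected_create` and the `_c_<guid>-` naming scheme are assumptions for illustration.

```python
import uuid
from typing import Dict, List


class ConnectionLossError(Exception):
    """Recoverable error: the request may or may not have been applied."""


class FlakyServer:
    """Toy server that can create a sequential node and then lose the reply."""

    def __init__(self) -> None:
        self.children: Dict[str, List[str]] = {"/queue": []}
        self.counter = 0
        self.drop_next_reply = False

    def create_sequential(self, parent: str, prefix: str) -> str:
        name = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.children[parent].append(name)
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionLossError()  # node exists, but the client never learns its name
        return f"{parent}/{name}"

    def get_children(self, parent: str) -> List[str]:
        return list(self.children[parent])


def protected_create(server: FlakyServer, parent: str, prefix: str, retries: int = 3) -> str:
    """Create a sequential node without duplicates, even if a reply is lost."""
    tagged = f"_c_{uuid.uuid4()}-{prefix}"  # GUID identifies this client's attempt
    for _ in range(retries):
        try:
            return server.create_sequential(parent, tagged)
        except ConnectionLossError:
            # Did an earlier attempt succeed? Look for our GUID before retrying.
            for child in server.get_children(parent):
                if child.startswith(tagged):
                    return f"{parent}/{child}"
    raise ConnectionLossError(f"gave up after {retries} attempts")


server = FlakyServer()
server.drop_next_reply = True
path = protected_create(server, "/queue", "task-")
print(len(server.get_children("/queue")))  # 1: no duplicate node
```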
27. ZooKeeper Programming
ZooKeeper APIs are difficult to use
Recipes:
The standard ZooKeeper "recipes" (locks, leader election, etc.) are only minimally described and subtly difficult to write correctly.