An Introduction to Apache Zookeeper


What is Apache Zookeeper used for? How does it work, and what is its architecture?

  1. Apache Zookeeper
     ● What is it?
     ● How does it work?
     ● Problems
     ● Architecture
     ● Reliability
  2. Zookeeper – What is it?
     ● A coordination service
     ● For distributed systems
     ● Makes cluster coordination
       – Fast
       – Scalable
     ● Runs on a cluster of servers
     ● Offers predefined functionality
     ● No need to “reinvent the wheel”
     ● Best for read-dominant workloads
  3. Zookeeper – How does it work?
     ● Works via an ensemble of servers
     ● Clients write changes to the ensemble
     ● Changes are processed strictly in the order received
     ● The ensemble elects a leader
     ● The leader is re-elected in case of failure
     ● Designed for small data volumes / metadata
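The ordering and leader election described above can be sketched as a toy, single-process model. It is only an illustration under stated assumptions: the `Server` and `Ensemble` classes are hypothetical, quorum acknowledgement is left out, and the change id merely plays the role of ZooKeeper's transaction id (zxid). The election rule (prefer the live server with the most recent change, then the highest server id) is a simplification in the spirit of ZooKeeper's leader election, not the real Zab protocol. In the real system clients may connect to any server, and write requests are forwarded to the leader.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """One ensemble member in a toy model (hypothetical, not ZooKeeper code)."""
    server_id: int
    alive: bool = True
    # Changes applied by this server, as (change_id, change) pairs, in order.
    log: list = field(default_factory=list)

    @property
    def last_change_id(self) -> int:
        return self.log[-1][0] if self.log else 0


class Ensemble:
    def __init__(self, size: int):
        self.servers = [Server(i) for i in range(1, size + 1)]
        self.next_change_id = 1
        self.leader = self.elect_leader()

    def elect_leader(self) -> Server:
        # Prefer the live server with the most recent history, then the
        # highest server id (max() raises ValueError if no server is alive).
        live = [s for s in self.servers if s.alive]
        return max(live, key=lambda s: (s.last_change_id, s.server_id))

    def write(self, change: str) -> int:
        # Every write goes through the leader; re-elect first if it has failed.
        if not self.leader.alive:
            self.leader = self.elect_leader()
        # The leader stamps each change with a strictly increasing id, so all
        # live servers apply changes in the same order.
        change_id = self.next_change_id
        self.next_change_id += 1
        for server in self.servers:
            if server.alive:
                server.log.append((change_id, change))
        return change_id
```

For example, in `Ensemble(3)` server 3 starts as leader. If it is marked as failed, the next `write` elects server 2 (tied with server 1 on the latest change id, but with the higher server id), and server 3's log stops at change 1, which is the gap it would have to re-sync on restart.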
  4. Zookeeper – How does it work?
     ● A change is considered successful if
       – It is acknowledged by a quorum
       – A quorum is a strict majority of the ensemble (more than half)
     ● A server will disconnect its clients if
       – It can't contact a quorum within a given time
     ● The cluster keeps functioning while a majority of servers is up
     ● When a failed server restarts, it will
       – Re-sync with the ensemble
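The quorum rule comes down to simple arithmetic: with n servers a quorum is n // 2 + 1, and the ensemble survives as many failures as leave a quorum standing. A minimal sketch; the helper names are hypothetical and only model the counting, not networking or timeouts.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (more than half)."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def is_committed(acks: int, ensemble_size: int) -> bool:
    """A change succeeds once a quorum of servers has acknowledged it."""
    return acks >= quorum_size(ensemble_size)


def can_serve(reachable: int, ensemble_size: int) -> bool:
    """A server keeps serving clients only while it can reach a quorum
    (counting itself); otherwise it should drop its client connections."""
    return reachable >= quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Note that exactly half is not enough: 2 of 4 servers cannot form a quorum. A 4-server ensemble therefore tolerates only one failure, the same as a 3-server one, which is why ensembles usually have an odd number of servers.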
  5. Distributed System Problems
     What types of problems do distributed systems encounter?
     ● Inconsistency
       – Configuration inconsistencies across the cluster
     ● Race conditions
       – Unexpected behaviour due to the timing of events
     ● Deadlocks
       – Contention for resources, with processes waiting on each other indefinitely
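A race condition is easy to reproduce with a read-modify-write: two clients read the same counter, both add one, and one update is lost. ZooKeeper lets a write carry the version the client last read (the Java client's `setData(path, data, version)`, where a version of `-1` skips the check), so a stale write is rejected instead of silently overwriting. The sketch below models only that idea in memory; `VersionedNode`, `BadVersionError` and `increment` are hypothetical names, not a ZooKeeper client API.

```python
class BadVersionError(Exception):
    """Raised when a conditional write sees a version it did not expect."""


class VersionedNode:
    """Hypothetical in-memory stand-in for one data node with a version."""

    ANY_VERSION = -1  # mirrors ZooKeeper's convention that -1 skips the check

    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0

    def get(self) -> tuple[bytes, int]:
        return self.data, self.version

    def set(self, data: bytes, expected_version: int) -> int:
        # Reject the write if another client changed the node after we read it.
        if expected_version not in (self.ANY_VERSION, self.version):
            raise BadVersionError(
                f"expected version {expected_version}, found {self.version}")
        self.data = data
        self.version += 1
        return self.version


def increment(node: VersionedNode) -> None:
    """Optimistic read-modify-write: retry until no concurrent change intervenes."""
    while True:
        data, version = node.get()
        try:
            node.set(str(int(data) + 1).encode(), version)
            return
        except BadVersionError:
            continue  # someone else won the race; re-read and try again


# Two clients read the same counter before either of them writes.
counter = VersionedNode(b"0")
a_data, a_version = counter.get()
b_data, b_version = counter.get()
counter.set(str(int(a_data) + 1).encode(), a_version)      # client A: 0 -> 1
try:
    counter.set(str(int(b_data) + 1).encode(), b_version)  # client B's stale write
except BadVersionError:
    increment(counter)                                     # B re-reads: 1 -> 2
print(counter.get())  # prints (b'2', 2)
```

Without the version check, client B would have written `1` over client A's `1` and one increment would have been lost.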
  6. Zookeeper – Name Space
     ● Zookeeper nodes (znodes) use a hierarchical name space
     ● Like a file system, but for low data volumes
     ● Path elements are separated by a slash (“/”)
     ● Every node in the name space is identified by its path
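The name space can be modelled, very roughly, as a map from absolute slash-separated paths to small pieces of data. A minimal in-memory sketch, assuming only the high-level behaviour of creating nodes and listing children: a node can be created only under an existing parent, and each node is addressed by its full path. `Namespace` is a hypothetical class; it reports errors as plain `KeyError`/`ValueError` rather than ZooKeeper's own exceptions and ignores everything else a real znode carries, such as its version and access control.

```python
class Namespace:
    """Hypothetical in-memory model of a hierarchical name space of nodes."""

    def __init__(self) -> None:
        self._nodes: dict[str, bytes] = {"/": b""}  # path -> data; root always exists

    @staticmethod
    def _validate(path: str) -> None:
        # Paths are absolute and slash-separated, with no empty elements.
        if path == "/":
            return
        if not path.startswith("/") or "" in path.split("/")[1:]:
            raise ValueError(f"invalid path: {path!r}")

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path: str, data: bytes = b"") -> None:
        self._validate(path)
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        if self._parent(path) not in self._nodes:
            raise KeyError(f"parent does not exist: {self._parent(path)}")
        self._nodes[path] = data

    def get(self, path: str) -> bytes:
        return self._nodes[path]

    def get_children(self, path: str) -> list[str]:
        # Names of the direct children only, sorted here for readability.
        if path not in self._nodes:
            raise KeyError(f"no such node: {path}")
        prefix = "/" if path == "/" else path + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != path and p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

For instance, after creating `/app`, `/app/config` and `/app/locks`, `get_children("/app")` lists `config` and `locks`, while creating `/missing/child` fails because its parent does not exist.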
