Introduction to Zookeeper
Anh Le
@BigSonata
What is a Distributed System?
A distributed system consists of multiple
computers that communicate through a
computer network and interact with each
other to achieve a common goal.
-Wikipedia
Coordination in a Distributed System?
Coordination: An act that multiple nodes must perform together.
Examples:
Leader Election
Managing group membership
Managing metadata
Synchronization (Semaphore, Mutex...)
Coordination in a Distributed System?
To coordinate, processes can
Exchange messages through network
Read/Write using shared storage
Use distributed locks
Problems with exchanging messages
Message delays
Processor speed
Clock drift
Use case for Master-Worker Applications
Problems
Master crashes
Worker crashes
Communication failures
Use case for Master-Worker Applications
Problems with master crashes
Use a backup master
How does it recover the latest state?
What if the backup master wrongly suspects that the primary master has crashed?
=> Split-brain scenario
Use case for Master-Worker Applications
Problems with worker crashes
Master must detect worker crashes
Recover assigned tasks
Problems with communication failures
Execute the same task only once
Introduction to ZooKeeper
An open source, performant coordination
service for distributed applications
Was a subproject of Hadoop but is now an
Apache top-level project
Exposes common services through a simple interface
Leader Election
Naming
Configuration management
Locks & Synchronization
Group Service
→ You don't have to write them from scratch
ZooKeeper Use cases
Distributed Cluster Management
Node join/leave
Node statuses in real time
Distributed synchronization
Locks
Barriers
Queues
ZooKeeper Use cases
Apache HBase uses ZooKeeper to
Elect a cluster master
Keep track of available servers
Keep cluster metadata
Apache Kafka uses ZooKeeper to
Detect crashes
Implement topic discovery
Maintain state for topics
ZooKeeper Guarantees
Sequential Consistency: Updates from a client are applied in the order they were sent.
Atomicity: Updates either succeed or fail; there are no partial results.
Single System Image: A client sees the same view of the service
regardless of the ZooKeeper server it connects to.
Reliability: Once applied, an update persists until a client
overwrites it.
Timeliness: The client's view of the system is guaranteed to be
up to date within a certain time bound (eventual consistency).
ZooKeeper Services
All servers store a copy of the data (in memory)
A leader is elected on service startup
Each client connects to a single server and maintains a TCP
connection to it.
Clients can read from any server; writes go through the leader and
need agreement from a majority of servers.
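As a rough illustration of the majority rule, the sketch below computes the quorum size and the number of server failures an ensemble can survive. The function names are hypothetical and are not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble must have at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may fail while a majority can still be formed."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this arithmetic a 4-server ensemble tolerates no more failures than a 3-server one, which is one reason odd ensemble sizes are the usual choice.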
ZooKeeper Data Model
ZooKeeper has a hierarchical namespace.
Each node is called a ZNode.
Every ZNode has data (given as byte[])
ZNode paths:
canonical, absolute, slash-separated
no relative references.
names can have Unicode characters
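The path rules can be sketched as a small validator. This is a simplified check written for illustration: the function name is an assumption, and the real client applies further restrictions (for example on certain control characters) that are not modeled here.

```python
def is_valid_znode_path(path: str) -> bool:
    """Check the path rules listed above (simplified model)."""
    if not path.startswith("/"):
        return False               # must be absolute
    if path == "/":
        return True                # the root znode
    if path.endswith("/"):
        return False               # canonical: no trailing slash
    for name in path[1:].split("/"):
        if name in ("", ".", ".."):
            return False           # no empty names or relative references
    return True


print(is_valid_znode_path("/app/config"))     # absolute and canonical
print(is_valid_znode_path("/app/../config"))  # contains a relative reference
```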
ZNode
Maintains a stat structure with version
numbers for data changes and ACL changes,
plus timestamps.
Version numbers increase with each
change
Data is read and written in its entirety
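Version numbers make conditional updates possible: a write succeeds only if the caller has seen the latest version. The following is an in-memory model of that idea, not the real client; `ToyZNode` and `BadVersionError` are names made up for this sketch.

```python
class BadVersionError(Exception):
    """Raised when the caller's expected version is stale."""


class ToyZNode:
    """In-memory stand-in for one znode; not the real client API."""

    def __init__(self, data: bytes = b""):
        self.data = data
        self.version = 0  # data version, bumped on every successful write

    def get_data(self):
        return self.data, self.version

    def set_data(self, data: bytes, expected_version: int) -> int:
        # -1 means "skip the version check", mirroring ZooKeeper's convention.
        if expected_version != -1 and expected_version != self.version:
            raise BadVersionError(
                f"expected {expected_version}, found {self.version}")
        self.data = data  # the whole payload is replaced, never patched
        self.version += 1
        return self.version


node = ToyZNode(b"a")
_, seen = node.get_data()
node.set_data(b"b", seen)        # succeeds; the version becomes 1
try:
    node.set_data(b"c", seen)    # stale version 0 is rejected
except BadVersionError:
    pass
```

A client that loses this race would typically re-read the znode and retry with the new version.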
ZNode types
Persistent Nodes
exist until explicitly deleted
Ephemeral Nodes
exist as long as the session that created them is active
can’t have children
Sequence Nodes (Unique Naming)
append a monotonically increasing counter to the end of the path
apply to both persistent & ephemeral nodes
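The three node types can be modeled in a few lines. The class below is an in-memory toy, not the ZooKeeper client: `ToyTree`, its method names, and the session strings are assumptions for illustration. It does not check that the parent exists, which the real service does.

```python
class ToyTree:
    """Tiny in-memory model of znode creation modes (not the real API)."""

    def __init__(self):
        self.nodes = {}      # path -> owning session id, or None if persistent
        self.counters = {}   # parent path -> next sequence number

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        parent = path.rsplit("/", 1)[0] or "/"
        if self.nodes.get(parent) is not None:
            raise ValueError("ephemeral znodes cannot have children")
        if sequential:
            n = self.counters.get(parent, 0)
            self.counters[parent] = n + 1
            path = f"{path}{n:010d}"   # 10-digit, zero-padded counter
        if path in self.nodes:
            raise ValueError("znode already exists")
        self.nodes[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes disappear with the session that created them.
        for p in [p for p, owner in self.nodes.items() if owner == session]:
            del self.nodes[p]


tree = ToyTree()
tree.create("/workers")
a = tree.create("/workers/w-", session="s1", ephemeral=True, sequential=True)
b = tree.create("/workers/w-", session="s2", ephemeral=True, sequential=True)
tree.close_session("s1")   # removes a, keeps b and the persistent parent
```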
ZNode watches
Clients can set watches on znodes:
NodeChildrenChanged
NodeCreated
NodeDataChanged
NodeDeleted
Changes to a znode trigger the watch, and ZooKeeper sends
the client a notification.
Watches are one-time triggers.
Watch events are always delivered in order.
A client sees the watch event before it sees the new ZNode data.
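The one-time nature of watches is easy to get wrong, so here is a minimal in-memory model of it. `WatchedNode` and `on_change` are hypothetical names; the real client delivers events through its own watcher interface.

```python
class WatchedNode:
    """In-memory znode with one-time watches (a model, not the real client)."""

    def __init__(self, data=b""):
        self.data = data
        self._watchers = []

    def get_data(self, watch=None):
        if watch is not None:
            self._watchers.append(watch)   # registered for the next change only
        return self.data

    def set_data(self, data):
        self.data = data
        fired, self._watchers = self._watchers, []   # clear before notifying
        for callback in fired:
            callback("NodeDataChanged")


events = []


def on_change(event):
    events.append(event)


node = WatchedNode(b"v1")
node.get_data(watch=on_change)
node.set_data(b"v2")             # fires the watch
node.set_data(b"v3")             # no watch is registered, so nothing fires
node.get_data(watch=on_change)   # re-register to keep observing
node.set_data(b"v4")             # fires again
```

Note that the change to `v3` goes unobserved: a client that needs every state must re-read the data each time it re-registers.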
ZNode APIs
String create(path, data, acl, flags)
void delete(path, expectedVersion)
Stat setData(path, data, expectedVersion)
(data, Stat) getData(path, watch)
Stat exists(path, watch)
String[] getChildren(path, watch)
→ Each API also has an asynchronous version
ZooKeeper Recipes
Recipe: Leader Election
/master
Recipe: Leader Election
Continuous watching of a znode requires resetting the watch
after every event
Too many watches on a single znode create the “herd
effect”, causing bursts of traffic and limiting scalability
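To make the herd effect concrete, the sketch below models the simple recipe in which every participant tries to create a single /master znode and the losers watch it. It is a counting model only; `SimpleElection` and the client names are assumptions.

```python
class SimpleElection:
    """Model of the single-znode recipe: whoever creates /master leads."""

    def __init__(self):
        self.master = None        # session that owns the ephemeral /master
        self.watchers = []        # sessions watching /master
        self.notifications = 0    # total watch events sent so far

    def try_lead(self, session):
        if self.master is None:
            self.master = session
            return True
        self.watchers.append(session)   # lost the race: watch and wait
        return False

    def master_fails(self):
        self.master = None
        woken, self.watchers = self.watchers, []   # one-time watches
        self.notifications += len(woken)
        # Every woken client retries at once: this is the herd.
        return [s for s in woken if self.try_lead(s)]


election = SimpleElection()
for i in range(100):
    election.try_lead(f"client-{i}")
new_leaders = election.master_fails()
print(election.notifications, new_leaders)
```

In this model a single leader failure wakes all 99 waiting clients, although only one of them can win.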
Recipe: Leader Election (Improved)
1. All participants create an ephemeral-sequential
node on the same election path.
2. The node with the smallest sequence number is
the leader.
3. Each “follower” node watches the node with the
next lower sequence number.
4. Upon leader removal, go to the
election path and find the new leader,
or become the leader if you hold the lowest sequence
number.
5. Upon session expiration, check the election state
and rejoin the election if needed.
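Steps 2 to 4 can be sketched as a pure function over the list of children under the election path. This is an illustration under stated assumptions, not a full recipe: the `n_` prefix and the function names are made up, and session handling (step 5) is left out.

```python
def sequence_of(name: str) -> int:
    """Parse the 10-digit counter that sequential znodes carry as a suffix."""
    return int(name[-10:])


def election_role(children, me):
    """Return ("leader", None) or ("follower", znode_to_watch) for `me`."""
    ordered = sorted(children, key=sequence_of)
    index = ordered.index(me)
    if index == 0:
        return "leader", None
    return "follower", ordered[index - 1]


children = ["n_0000000007", "n_0000000003", "n_0000000005"]
first = election_role(children, "n_0000000003")
last = election_role(children, "n_0000000007")

# The leader's session ends, so its ephemeral znode disappears.
children.remove("n_0000000003")
promoted = election_role(children, "n_0000000005")
```

Because each follower watches only its predecessor, removing one znode notifies a single client instead of all of them.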
Zookeeper Programming
https://github.com/anhldbk/Zookeeper-Demo
Zookeeper Programming
Difficult to use Zookeeper APIs
Connection Issues:
Initial connection: Requires a handshake before executing any
operations (create(), delete()...)
Session expiration: Clients are expected to watch for this state
and close and re-create the ZooKeeper instance.
Zookeeper Programming
Difficult to use Zookeeper APIs
Recoverable Errors:
When creating a sequential ZNode on the server, there is the
possibility that the server will successfully create the ZNode but
crash prior to returning the node name to the client.
There are several recoverable exceptions thrown by the
ZooKeeper client. Users are expected to catch these exceptions
and retry the operation.
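One way to handle both points is to retry recoverable errors with backoff and to embed a client-chosen id in the sequential znode name, so that a retry can detect a znode created just before the reply was lost. The sketch below runs against a fake in-memory server; `ConnectionLossError`, `retry`, `create_sequential_once`, and `flaky_create` are hypothetical names, not ZooKeeper or Curator APIs.

```python
import time


class ConnectionLossError(Exception):
    """Stand-in for a recoverable client error (hypothetical name)."""


def retry(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run `operation`, retrying recoverable errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionLossError:
            if attempt == attempts - 1:
                raise                      # out of retries: surface the error
            sleep(base_delay * 2 ** attempt)


def create_sequential_once(list_children, create, client_id):
    """Create a sequential znode unless an earlier attempt already succeeded."""
    for name in list_children():
        if name.startswith(client_id + "-"):
            return name                    # found the znode from a lost reply
    return create(client_id + "-")


# Fake server: the first create succeeds, but its reply never arrives.
children = []
calls = {"n": 0}


def flaky_create(prefix):
    name = f"{prefix}{len(children):010d}"
    children.append(name)
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionLossError
    return name


result = retry(
    lambda: create_sequential_once(lambda: list(children), flaky_create, "client42"),
    sleep=lambda _: None,   # skip real sleeping in this demo
)
```

Without the id check, the retry would create a second sequential znode and the client would silently hold two entries.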
Zookeeper Programming
Difficult to use Zookeeper APIs
Recipes:
The standard ZooKeeper "recipes" (locks, leaders, etc.) are only
minimally described and subtly difficult to write correctly.
Zookeeper Programming with Curator
Curator - The Netflix ZooKeeper library
Zookeeper for Our Systems