Distributed Coordination
with ZooKeeper and Curator
Tibor Sulyán
tibor_sulyan@epam.com
April 25, 2015
2CONFIDENTIAL
CAP Theorem by Eric Brewer
• Consistency
• Availability
• Partition Tolerance
Introduction
[Figure: replicated nodes holding the values 1 and 2, with "write" and "read" operations and a "?" next to the read]
Agenda
1. What is ZooKeeper?
2. ZooKeeper features
3. Coordination Recipes
4. Using Curator with ZooKeeper
5. Deploying ZooKeeper clusters
"ZooKeeper is a centralized service for maintaining configuration information, naming,
providing distributed synchronization, and providing group services."
What is ZooKeeper about?
[Figure: zNode tree with the persistent (P) nodes /root, /root/data, and /root/state and the ephemeral sequential (ES) node /root/state/service-000000001; three clients connect through the ZK API to an ensemble of five zk servers]
• Filesystem-like hierarchical structure
– Elements are called zNodes
• zNode operations
– Basic CRUD
– Transactional execution of multiple operations
– Watches
– Versioned changes
• zNode metadata
– Data
– Children
– Metadata (Stat structure)
• zNode types
– P: persistent
– E: ephemeral
– PS: persistent sequential
– ES: ephemeral sequential
ZooKeeper Data Model
• Ephemeral zNodes
– Session-scoped
– Exists as long as the ephemeral owner's
session is active
– Not persisted
– No children
• Sequence (sequential) zNodes
– Upon creation, the zNode name is suffixed with a monotonically increasing integer
– The value is unique among the children of the parent zNode
• Watches
– Can be set on read operations
(getData(), getChildren(), exists())
– One-time trigger when a zNode changes
ZooKeeper Data Model
/ (P)
  servers (P)
    server_A (E)
    server_B (E)
  leader (P)
    server_A0000000001 (ES)
    server_B0000000002 (ES)
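The sequential naming shown in the tree can be sketched without a ZooKeeper connection. The snippet assumes the 10-digit, zero-padded counter format that the ZooKeeper documentation describes for sequence nodes. The class `SequenceNames` and both helpers are hypothetical illustrations, not part of the ZooKeeper API.

```java
import java.util.Locale;

// Hypothetical helper: models how a sequential zNode name is built and parsed.
final class SequenceNames {
    private SequenceNames() {}

    /** Appends a 10-digit zero-padded counter to the requested name. */
    static String withSequence(String requestedName, int counter) {
        return String.format(Locale.ROOT, "%s%010d", requestedName, counter);
    }

    /** Parses the trailing 10-digit counter back out of a sequential zNode name. */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
    }

    public static void main(String[] args) {
        String name = withSequence("server_A", 1);
        System.out.println(name + " has sequence " + sequenceOf(name));
    }
}
```

Because the counter is zero-padded to a fixed width, sorting the child names lexicographically also sorts them by creation order, which the recipes later in the deck rely on.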
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// This class acts as the default watcher.
class ZooKeeperClient implements Watcher {
    ...
    // Connect to the ensemble with a 30-second session timeout.
    // 'this' is registered as the default watcher.
    ZooKeeper zooKeeper = new ZooKeeper("localhost:2181,localhost:2182,localhost:2183",
            30_000, this);

    @Override
    public void process(WatchedEvent event) {
        // Receives zNode changes and connection state changes.
        // Can be invoked before the ZooKeeper constructor returns!
    }
}
ZooKeeper API – connect, default watcher
// synchronous node creation
try {
    // create() returns the actual path of the created zNode
    String path = zooKeeper.create("/test", "data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.EPHEMERAL);
} catch (KeeperException e) {
    switch (e.code()) {
        case CONNECTIONLOSS:
            // recoverable error: retry the operation
            break;
        default:
            // other error codes are not handled in this example
            break;
    }
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
}
// asynchronous node creation
zooKeeper.create("/test", "data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL,
        new AsyncCallback.StringCallback() {
            @Override
            public void processResult(int rc, String path, Object ctx, String name) {
                switch (KeeperException.Code.get(rc)) {
                    // handle errors (retry on CONNECTIONLOSS)
                }
            }
        }, null /* no context passed to the callback */);
ZooKeeper API – create operations & recoverable errors
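The "retry on CONNECTIONLOSS" comments above can be expanded into a small bounded retry loop. This is a minimal sketch that runs without ZooKeeper: `RetryOnConnectionLoss`, `RecoverableException`, and `callWithRetry` are hypothetical names, and the exception is only a stand-in for a recoverable `KeeperException`.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a bounded retry loop for recoverable errors.
final class RetryOnConnectionLoss {
    private RetryOnConnectionLoss() {}

    /** Stand-in for a recoverable error such as CONNECTIONLOSS. */
    static final class RecoverableException extends RuntimeException {
        RecoverableException(String message) {
            super(message);
        }
    }

    /** Runs the operation, retrying up to maxRetries times on recoverable errors. */
    static <T> T callWithRetry(Supplier<T> operation, int maxRetries) {
        int retries = 0;
        while (true) {
            try {
                return operation.get();
            } catch (RecoverableException e) {
                if (retries++ >= maxRetries) {
                    throw e; // retries exhausted: surface the last error
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String path = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RecoverableException("connection loss");
            }
            return "/test";
        }, 5);
        System.out.println(path + " created after " + calls[0] + " attempts");
    }
}
```

A blind retry is only safe for idempotent operations. A create may already have succeeded on the server when the connection dropped, which is the situation the protected-mode slides later in the deck address.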
// synchronous update: sets "newdata" for /test regardless of its current version (-1)
// error handling omitted
Stat stat = zooKeeper.setData("/test", "newdata".getBytes(), -1);
// sets "newerdata" only if the current data version is 5
zooKeeper.setData("/test", "newerdata".getBytes(), 5);
ZooKeeper API – versioned update operations
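The version argument acts as an optimistic concurrency check. The in-memory model below is a sketch of that behaviour only: `VersionedNode` is a hypothetical class, and it signals a version mismatch with a boolean, whereas ZooKeeper reports it as a `KeeperException` (bad version).

```java
// Hypothetical in-memory model of a versioned zNode update.
final class VersionedNode {
    private byte[] data;
    private int version; // starts at 0 and grows by one per successful update

    VersionedNode(byte[] initialData) {
        this.data = initialData.clone();
    }

    /**
     * Updates the data if expectedVersion matches the current version,
     * or unconditionally if expectedVersion is -1.
     */
    synchronized boolean setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            return false; // another writer got there first
        }
        data = newData.clone();
        version++;
        return true;
    }

    synchronized int version() {
        return version;
    }

    synchronized byte[] data() {
        return data.clone();
    }

    public static void main(String[] args) {
        VersionedNode node = new VersionedNode("data".getBytes());
        System.out.println(node.setData("newdata".getBytes(), -1));  // unconditional update
        System.out.println(node.setData("newerdata".getBytes(), 5)); // stale or wrong version
    }
}
```

A typical read-modify-write loop reads the data together with its version, computes the new value, and retries from the read when the conditional write is rejected.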
// check if zNode exists without setting a watch
// error handling omitted
Stat stat = zooKeeper.exists("/parent/child1", false);
// get data & set the default watcher
stat = new Stat();
byte[] data = zooKeeper.getData("/parent/child1", true, stat);
// use a separate Watcher
stat = zooKeeper.exists("/parent/child2", new Watcher() {
    @Override
    public void process(WatchedEvent event) {
        // react to the event, e.g. node deletion
    }
});
ZooKeeper API – read operations & setting watches
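The one-time trigger semantics of these watches can be modelled in a few lines. This is a sketch under stated assumptions: `OneShotWatches` is a hypothetical in-memory registry, not ZooKeeper code, and it ignores event types and session handling.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory model of one-time watch triggers.
final class OneShotWatches {
    private final Map<String, List<Consumer<String>>> watchers = new HashMap<>();

    /** Registers a watcher that fires at most once for the given path. */
    void watch(String path, Consumer<String> watcher) {
        watchers.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    /** Simulates a change: notifies and discards every watcher set on the path. */
    void nodeChanged(String path) {
        List<Consumer<String>> fired = watchers.remove(path);
        if (fired != null) {
            fired.forEach(w -> w.accept(path));
        }
    }

    public static void main(String[] args) {
        OneShotWatches watches = new OneShotWatches();
        watches.watch("/parent/child1", p -> System.out.println("changed: " + p));
        watches.nodeChanged("/parent/child1"); // fires the watcher
        watches.nodeChanged("/parent/child1"); // nothing registered any more
    }
}
```

A client that wants continuous notifications has to set the watch again when it handles the event, usually by repeating the read that set it, and it has to accept that changes between the trigger and the new read are observed only through that read.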
• Atomic updates
• Sequential Consistency
• Single System Image
• Timeliness
• Reliability
• Availability
ZooKeeper Guarantees
[Figure: three clients connected to an ensemble of five zk servers]
Phases: propagate, propose, vote, commit, notify
Sequential Consistency
[Figure: timeline of a setData call across the client, the leader, and three followers; events marked on the time axis: sync return, callback called, watch triggered]
Phases: propagate, propose, commit, notify
Timeliness
[Figure: timeline across client 1, client 2, the leader, and three followers, showing setData (v2) and the points at which v2 appears]
• ZooKeeper process failures are tolerated if
a quorum is present
• Simplest quorum: majority-based
• Avoids split-brain scenarios
Availability
[Figure: three clients connected to an ensemble of five zk servers: behaviour on follower failures]
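The majority rule on this slide reduces to simple arithmetic. The sketch below uses a hypothetical `MajorityQuorum` class and assumes every server carries one equally weighted vote.

```java
// Hypothetical helper for majority-quorum arithmetic (one vote per server).
final class MajorityQuorum {
    private MajorityQuorum() {}

    /** Smallest number of servers that forms a strict majority. */
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    /** Number of servers that may fail while a quorum still exists. */
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - quorumSize(ensembleSize);
    }

    static boolean hasQuorum(int ensembleSize, int aliveServers) {
        return aliveServers >= quorumSize(ensembleSize);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 9; n++) {
            System.out.println(n + " servers: quorum " + quorumSize(n)
                    + ", tolerated failures " + toleratedFailures(n));
        }
    }
}
```

Two disjoint sets of servers can never both be strict majorities, which is why the rule avoids split-brain. An even-sized ensemble tolerates no more failures than the next smaller odd size, so odd sizes are the usual choice.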
[Figure: the same five-server ensemble with three clients: behaviour on leader failure]
ZooKeeper Recipes
Distributed Coordination Recipes
• Shared Data
• Group Membership
    / (P)
      serviceInstances (P)
        serverA (E)
        serverB (E)
• Service Discovery
    / (P)
      service (P)
        serviceInfo (E)
• Lock / Mutex
• Leader Election
    / (P)
      service (P)
        service_0000000001 (ES)
        service_0000000002 (ES)
Leader Election Recipe
/ (P)
  service (P)
    service_0000000001 (ES)
    service_0000000002 (ES)
    service_0000000003 (ES)
[Figure: service instances connected to an ensemble of five zk servers; the watch labels point at service_0000000001 (four times) and service_0000000002 (once)]
n-1 watches are set on the same node
Improvement: each instance watches the sequence node immediately preceding its own
instead of the first one
Improved Leader Election Recipe
/ (P)
  service (P)
    service_0000000001 (ES)
    service_0000000002 (ES)
    service_0000000003 (ES)
[Figure: service instances connected to an ensemble of five zk servers; the watch labels point at service_0000000001 (twice) and service_0000000002 (once)]
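The decision each participant makes in the improved recipe can be sketched as a pure function over the child names of /service. The code assumes names that end in a 10-digit sequence number, as in the tree above. `LeaderElectionSketch` and its methods are hypothetical and leave out the ZooKeeper calls for creating nodes and setting watches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: who leads, and which node should each participant watch?
final class LeaderElectionSketch {
    private LeaderElectionSketch() {}

    /** Extracts the trailing 10-digit sequence number of a child name. */
    static int sequenceOf(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    static List<String> sortedBySequence(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(LeaderElectionSketch::sequenceOf));
        return sorted;
    }

    /** The participant with the lowest sequence number is the leader. */
    static String leader(List<String> children) {
        return sortedBySequence(children).get(0);
    }

    /** Returns the immediate predecessor of ownNode, or empty if ownNode leads. */
    static Optional<String> nodeToWatch(List<String> children, String ownNode) {
        List<String> sorted = sortedBySequence(children);
        int index = sorted.indexOf(ownNode);
        if (index < 0) {
            throw new IllegalArgumentException("unknown node: " + ownNode);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }

    public static void main(String[] args) {
        List<String> children = Arrays.asList(
                "service_0000000003", "service_0000000001", "service_0000000002");
        System.out.println("leader: " + leader(children));
        System.out.println("service_0000000003 watches: "
                + nodeToWatch(children, "service_0000000003").orElse("nothing"));
    }
}
```

When a watched node disappears, its watcher lists the children again and repeats the same decision, so a leader failure wakes up one participant instead of all of them.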
• Higher-level client API for ZooKeeper
• Hides most of the complexity of communicating with the ZK ensemble
• Provides ready-made implementations of the recipes
Curator and ZooKeeper
[Figure: three clients, each using Curator on top of the ZK API, connected to an ensemble of five zk servers]
// create & start framework instance
CuratorFramework framework =
        CuratorFrameworkFactory.newClient("localhost:2181,localhost:2182,localhost:2183",
                new ExponentialBackoffRetry(1000, 20)); // base sleep time 1000 ms, at most 20 retries
framework.start();
// foreground operation
Stat stat = framework.setData().forPath("/a/b/c/d", "testdata".getBytes());
// background operation
framework.setData().inBackground().forPath("/a/b/c/d/e", "testdata".getBytes());
Curator API
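The idea behind an exponential backoff retry policy can be shown with a deterministic calculation. This is a generic sketch and not Curator's formula: `BackoffSketch`, `sleepMs`, and the cap parameter are assumptions made for illustration, and real policies usually add random jitter.

```java
// Hypothetical sketch of capped exponential backoff (no jitter).
final class BackoffSketch {
    private BackoffSketch() {}

    /** Sleep before retry number retryCount (0-based): base * 2^retryCount, capped at capMs. */
    static long sleepMs(long baseSleepMs, int retryCount, long capMs) {
        int exponent = Math.min(retryCount, 30); // bound the shift to keep the product small
        return Math.min(capMs, baseSleepMs * (1L << exponent));
    }

    public static void main(String[] args) {
        for (int retry = 0; retry < 6; retry++) {
            System.out.println("retry " + retry + ": sleep "
                    + sleepMs(1000, retry, 30_000) + " ms");
        }
    }
}
```

Growing the delay between attempts keeps a recovering ensemble from being flooded by clients that all retry at the same fixed interval.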
Protected EPHEMERAL_SEQUENTIAL nodes
Curator Features
framework.create().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath("/cluster/service");
/ (P)
  cluster (P)
    service_0000000002 (ES)
    service_0000000003 (ES)
[Figure: three clients connected to an ensemble of five zk servers]
• connection loss – reconnect attempt begins
• reconnect successful within session timeout – retrying path creation
Protected EPHEMERAL_SEQUENTIAL nodes
Curator Features
framework.create().withProtection().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath("/cluster/service");
/ (P)
  cluster (P)
    _c_16c39a25-87b4-4a54-bd05-1666a3e718de_service_0000000002 (ES)
[Figure: three clients connected to an ensemble of five zk servers]
• connection loss – reconnect attempt begins
• reconnect successful within session timeout – checking for a zNode with the same GUID
• no extra zNode created
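The check performed after the reconnect can be sketched as a search through the parent's children for the client's own GUID. This is only a model of the idea: `ProtectedCreateSketch` and its methods are hypothetical, and the `_c_<GUID>_<name>` layout is copied from the node name shown on the slide, so the exact format Curator uses may differ.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the "find my earlier create" check used by protected mode.
final class ProtectedCreateSketch {
    private ProtectedCreateSketch() {}

    static final String PREFIX = "_c_";

    /** Builds the requested node name with the client's GUID embedded (layout assumed from the slide). */
    static String protectedName(String guid, String nodeName) {
        return PREFIX + guid + "_" + nodeName;
    }

    /** Looks for a child created earlier with the same GUID. */
    static Optional<String> findExisting(List<String> children, String guid) {
        String marker = PREFIX + guid + "_";
        return children.stream().filter(child -> child.startsWith(marker)).findFirst();
    }

    public static void main(String[] args) {
        String guid = "16c39a25-87b4-4a54-bd05-1666a3e718de";
        List<String> children = Arrays.asList(
                "service_0000000001",
                protectedName(guid, "service_") + "0000000002");
        // After a reconnect: reuse the node if the first create went through.
        System.out.println(findExisting(children, guid).orElse("not found, create again"));
    }
}
```

If the search finds a match, the first create succeeded before the connection dropped and the client adopts that node. Otherwise it is safe to issue the create again.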
• Performance Considerations
• Using Observers to scale
• Using Hierarchical Quorums for multi-datacenter setup
• Surviving network partition with read-only mode
ZooKeeper in the real world
• Replicated data is kept entirely in memory by the ZooKeeper processes
– a full GC pause can drop a server out of the ensemble
• Synchronous filesystem writes in the commit phase
– can take seconds on an overloaded storage device
– use a dedicated device for the ZooKeeper transaction logs
• Maximum zNode size is 1 MB by default
– data + metadata must fit within this limit
– configurable using a system property (jute.maxbuffer), but increasing it is not recommended
• Watches and performance
– Too many watches on a single node: herd effect
– Too many watches overall: increased memory footprint
Performance considerations
Phases: propagate, propose, vote, commit, notify
Using Observers to scale
[Figure: timeline of a setData call across the client, the leader, three followers, and an observer; events marked on the time axis: sync return, callback called, watch triggered]
Observers:
• no proposals
• no votes
• can't be leaders
Hierarchical Quorums
[Figure: nine servers zk1 to zk9 arranged in three groups (group 1, group 2, group 3)]
Majority quorums:
• any 4 zk failures are tolerated
A datacenter goes down:
• the remaining ensemble becomes much less resilient
Hierarchical quorums:
• Disjoint groups are formed
• Quorum requires a majority of votes from a majority of groups
• Up to 5 failures can be tolerated (e.g. one whole group plus one server in each remaining group)
• Better for clusters spanning multiple datacenters
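The hierarchical rule, a majority of votes from a majority of groups, can be checked with a short function. The sketch assumes equally weighted servers and uses a hypothetical `HierarchicalQuorum` class. It models only the counting rule, not ZooKeeper's group and weight configuration.

```java
// Hypothetical sketch of the hierarchical quorum rule with equal weights.
final class HierarchicalQuorum {
    private HierarchicalQuorum() {}

    /** True when a majority of groups each have a majority of their servers alive. */
    static boolean hasQuorum(int[] groupSizes, int[] aliveInGroup) {
        if (groupSizes.length != aliveInGroup.length) {
            throw new IllegalArgumentException("one alive count per group is required");
        }
        int groupsWithMajority = 0;
        for (int i = 0; i < groupSizes.length; i++) {
            if (aliveInGroup[i] > groupSizes[i] / 2) {
                groupsWithMajority++;
            }
        }
        return groupsWithMajority > groupSizes.length / 2;
    }

    public static void main(String[] args) {
        int[] sizes = {3, 3, 3};
        // One datacenter lost plus one server in each remaining group: 5 failures.
        System.out.println(hasQuorum(sizes, new int[] {0, 2, 2}));
        // Only 4 failures, but two groups have lost their internal majority.
        System.out.println(hasQuorum(sizes, new int[] {1, 1, 3}));
    }
}
```

The second call in `main` shows the trade-off: the scheme survives the loss of a whole group, but it does not tolerate every combination of four failures the way a plain majority quorum of nine servers does.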
Read-only mode
[Figure: nine servers zk1 to zk9; zk1, zk2, and zk3 are shown detached from the rest of the ensemble]
Network partition: a datacenter gets detached
Partitioned ZooKeeper servers can operate in read-only mode
• not connected to the ensemble
• no writes allowed
• read requests are still served
By default, read-only mode is disabled
• ACLs
• Quota support
• Authentication support
• Transaction logging
• Connection state handling
• Weighted hierarchical quorums
• Configuration
• Dynamic reconfiguration
• ...
• More info:
– ZooKeeper documentation
  http://zookeeper.apache.org/doc/trunk/index.html
– Curator resources
  http://curator.apache.org
– ZAB protocol in detail
  http://web.stanford.edu/class/cs347/reading/zab.pdf
  http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf
– ZooKeeper book
  http://shop.oreilly.com/product/0636920028901.do
Topics not covered
THANK YOU!

Rise of the Machines: Known As Drones...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

Tech Talks_25.04.15_Session 3_Tibor Sulyan_Distributed coordination with zookeeper

  • 1. Distributed Coordination with ZooKeeper and Curator Tibor Sulyán tibor_sulyan@epam.com April 25, 2015
  • 2. Introduction: CAP Theorem by Eric Brewer • Consistency • Availability • Partition Tolerance [Diagram: a write and a read against replicated values 1 and 2; the value returned by the read is marked with a question mark]
  • 3. Agenda 1. What is ZooKeeper? 2. ZooKeeper features 3. Coordination Recipes 4. Using Curator with ZooKeeper 5. Deploying ZooKeeper clusters
  • 4. What is ZooKeeper about? "ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services." [Diagram: three clients use the ZK API to talk to an ensemble of five zk servers; example zNode tree: /root (P), /root/data (P), /root/state (P), /root/state/service-000000001 (ES)]
  • 5. ZooKeeper Data Model • Filesystem-like hierarchical structure – Elements are called zNodes • zNode operations – Basic CRUD – Transactional execution of multiple operations – Watches – Versioned changes • zNode metadata – Data – Children – Metadata (Stat structure) • zNode types: P persistent, E ephemeral, PS persistent sequential, ES ephemeral sequential
  • 6. ZooKeeper Data Model • Ephemeral zNodes – Session-scoped – Exist as long as the ephemeral owner's session is active – Not persisted – No children • Sequence (sequential) zNodes – Upon creation, the zNode name is suffixed with an integer value – The value is unique among the children of the parent zNode • Watches – Can be set on read operations (getData(), getChildren(), exists()) – One-time trigger when a zNode changes [Diagram: / (P) contains servers (P) with E server_A and E server_B, and leader (P) with ES server_A0000000001 and ES server_B0000000002]
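The sequential naming on this slide can be sketched in plain Java. ZooKeeper documents the suffix as a 10-digit, zero-padded counter; the class and method names below (`SequenceNames`, `withSequence`, `sequenceOf`) are hypothetical helpers for illustration and not part of the ZooKeeper API.

```java
import java.util.Locale;

// Hypothetical helper, not a ZooKeeper class: models how sequential zNode names look.
final class SequenceNames {
    private static final int SUFFIX_LENGTH = 10;

    // Builds a name the way the server does: requested prefix + zero-padded counter.
    static String withSequence(String prefix, int counter) {
        return prefix + String.format(Locale.ROOT, "%010d", counter);
    }

    // Reads the trailing 10 digits of a sequential zNode name back as a number.
    static int sequenceOf(String nodeName) {
        if (nodeName.length() < SUFFIX_LENGTH) {
            throw new IllegalArgumentException("not a sequential name: " + nodeName);
        }
        return Integer.parseInt(nodeName.substring(nodeName.length() - SUFFIX_LENGTH));
    }
}
```

Sorting children by `sequenceOf` rather than by full name matters when different clients use different prefixes, as in the `server_A...` / `server_B...` example above.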
  • 7. ZooKeeper API – connect, default watcher
    // this class will act as default watcher
    class ZooKeeperClient implements Watcher {
        ...
        // connect to the ensemble. 'this' refers to a watcher (aka default watcher)
        ZooKeeper zooKeeper = new ZooKeeper("localhost:2181,localhost:2182,localhost:2183", 30_000, this);

        @Override
        public void process(WatchedEvent event) {
            // zNode changes & connection state changes
            // can be invoked before the constructor returns!
        }
    }
  • 8. ZooKeeper API – create operations & recoverable errors
    // synchronous node creation; create() returns the actual path of the new zNode
    try {
        String path = zooKeeper.create("/test", "data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    } catch (KeeperException e) {
        switch (e.code()) {
            case CONNECTIONLOSS:
                // retry operation
                break;
        }
    } catch (InterruptedException e) {
        // create() can also be interrupted; restore the interrupt flag
        Thread.currentThread().interrupt();
    }

    // asynchronous node creation
    zooKeeper.create("/test", "data".getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL,
        new StringCallback() {
            @Override
            public void processResult(int rc, String path, Object ctx, String name) {
                switch (Code.get(rc)) {
                    // handle errors (retry on CONNECTIONLOSS)
                }
            }
        }, null /* no context passed to callback */);
  • 9. ZooKeeper API – versioned update operations
    // synchronous update: sets "newdata" for /test regardless of its current version (-1 matches any version)
    // error handling omitted
    Stat stat = zooKeeper.setData("/test", "newdata".getBytes(), -1);
    // sets "newerdata" only if the data version is 5
    zooKeeper.setData("/test", "newerdata".getBytes(), 5);
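The version check behaves like an optimistic compare-and-set. The following is a minimal in-memory model of that rule, not the ZooKeeper client: `VersionedNode` is a hypothetical class, and it signals a mismatch with `IllegalStateException` where the real client reports a bad-version `KeeperException`.

```java
// Hypothetical in-memory model of a zNode's versioned data; not part of ZooKeeper.
final class VersionedNode {
    private byte[] data;
    private int version = 0; // data version, incremented on every successful update

    VersionedNode(byte[] initial) {
        this.data = initial.clone();
    }

    // Applies the update only if expectedVersion is -1 (match any) or equals the
    // current version. Returns the new version, throws on a stale expectation.
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion + ", actual " + version);
        }
        data = newData.clone();
        return ++version;
    }

    synchronized int version() {
        return version;
    }

    synchronized byte[] data() {
        return data.clone();
    }
}
```

A writer that loses the race re-reads the data and its version, recomputes the update, and tries again with the fresh version.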
  • 10. ZooKeeper API – read operations & setting watches
    // check if zNode exists without setting a watch (pass true to set the default watcher)
    // error handling omitted
    Stat stat = zooKeeper.exists("/parent/child1", false);
    // get data & set default watcher; dataStat is filled in with the zNode's metadata
    Stat dataStat = new Stat();
    byte[] data = zooKeeper.getData("/parent/child1", true, dataStat);
    // use a separate Watcher
    stat = zooKeeper.exists("/parent/child2", new Watcher() {
        @Override
        public void process(WatchedEvent event) {
            // react to node deletion
        }
    });
  • 11. ZooKeeper Guarantees • Atomic updates • Sequential Consistency • Single System Image • Timeliness • Reliability • Availability [Diagram: three clients connected to a five-server ensemble, zk 1 to zk 5]
  • 13. Timeliness [Sequence diagram over time: client 1 issues setData (v2); phases labeled propagate, propose, commit, notify; participants: client 1, client 2, one leader, three followers; v2 appears twice along the timeline]
  • 14. Availability • ZooKeeper process failures are tolerated if a quorum is present • Simplest quorum: majority-based • Avoids split-brain scenarios [Diagram: behaviour on follower failures; three clients connected to a five-server ensemble, zk 1 to zk 5]
  • 15. Availability • ZooKeeper process failures are tolerated if a quorum is present • Simplest quorum: majority-based • Avoids split-brain scenarios [Diagram: behaviour on leader failure; three clients connected to a five-server ensemble, zk 1 to zk 5, with zk 1 and zk 2 highlighted]
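The majority rule on the two Availability slides reduces to simple arithmetic. A small sketch follows; `QuorumMath` and its method names are hypothetical and only illustrate the calculation.

```java
// Hypothetical helper illustrating majority-quorum arithmetic.
final class QuorumMath {
    // Smallest number of servers that forms a strict majority of the ensemble.
    static int majority(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    // Number of servers that may fail while a majority is still available.
    static int toleratedFailures(int ensembleSize) {
        return ensembleSize - majority(ensembleSize);
    }

    // True if the surviving servers can still form a quorum.
    static boolean hasQuorum(int ensembleSize, int aliveServers) {
        return aliveServers >= majority(ensembleSize);
    }
}
```

For the five-server ensemble in the diagrams, a quorum needs 3 servers and 2 failures are tolerated. Two sides of a network partition can never both hold a strict majority, which is why split-brain is avoided. An even ensemble size adds no resilience: 6 servers still tolerate only 2 failures.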
  • 17. Distributed Coordination Recipes • Shared Data • Group Membership: / (P) → serviceInstances (P) → serverA (E), serverB (E) • Service Discovery: / (P) → service (P) → serviceInfo (E) • Lock • Mutex • Leader Election: / (P) → service (P) → service_0000000001 (ES), service_0000000002 (ES)
  • 18. Leader Election Recipe [Diagram: service instances connected to a five-server ensemble, zk 1 to zk 5; / (P) → service (P) → ES service_0000000001, ES service_0000000002, ES service_0000000003; watch labels: service_0000000001 (four times), service_0000000002 (once)] • n-1 watches are set on the same node • Improvement: each instance watches the sequence node immediately preceding its own instead of the first one
  • 19. Improved Leader Election Recipe [Diagram: service instances connected to a five-server ensemble, zk 1 to zk 5; / (P) → service (P) → ES service_0000000001, ES service_0000000002, ES service_0000000003; watch labels: service_0000000001 (twice), service_0000000002 (once)]
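The decision each service instance makes in the improved recipe can be sketched without a running ensemble: given the children of the election zNode and its own node name, an instance either is the leader or learns which single predecessor to watch. `ElectionStep` and `nodeToWatch` are hypothetical names, and the sketch assumes every child name ends in the 10-digit sequence suffix.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for one step of the improved leader election recipe.
final class ElectionStep {
    // Trailing 10 digits of a sequential zNode name.
    static int sequenceOf(String name) {
        return Integer.parseInt(name.substring(name.length() - 10));
    }

    // Returns empty if ownNode has the lowest sequence number (this instance leads),
    // otherwise the name of the node directly before it, which is the one to watch.
    static Optional<String> nodeToWatch(List<String> children, String ownNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingInt(ElectionStep::sequenceOf));
        int index = sorted.indexOf(ownNode);
        if (index < 0) {
            throw new IllegalArgumentException("own node not among children: " + ownNode);
        }
        return index == 0 ? Optional.empty() : Optional.of(sorted.get(index - 1));
    }
}
```

When the watched predecessor disappears, the instance would list the children again and repeat the step, so each deletion wakes one instance instead of all of them.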
  • 20. Curator and ZooKeeper • Higher-level client API for ZooKeeper • Hides most of the complexity of communicating with the ZK ensemble • Implemented recipes [Diagram: three clients, each using Curator on top of the ZK API, connected to a five-server ensemble]
  • 21. Curator API
    // create & start framework instance
    CuratorFramework framework = CuratorFrameworkFactory.newClient(
        "localhost:2181,localhost:2182,localhost:2183",
        new ExponentialBackoffRetry(1000, 20));
    framework.start();
    // foreground operation
    Stat stat = framework.setData().forPath("/a/b/c/d", "testdata".getBytes());
    // background operation
    framework.setData().inBackground().forPath("/a/b/c/d/e", "testdata".getBytes());
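The retry policy passed to the factory takes a base sleep time and a maximum retry count. The general idea of retrying a recoverable failure with exponential backoff can be sketched in plain Java as follows. This is not Curator's implementation (which may differ, for example by randomising the delay): `BackoffRetry` is a hypothetical class, `maxSleepMs` is an added assumption, and `IOException` stands in for a recoverable error such as CONNECTIONLOSS.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical retry helper; a deterministic sketch of exponential backoff.
final class BackoffRetry {
    // Delay before retry number retryCount (0-based): base * 2^retryCount, capped.
    static long sleepMs(long baseSleepMs, int retryCount, long maxSleepMs) {
        long factor = 1L << Math.min(retryCount, 30); // bound the shift to avoid overflow
        return Math.min(maxSleepMs, baseSleepMs * factor);
    }

    // Runs the operation, retrying up to maxRetries times on IOException.
    static <T> T call(Callable<T> operation, long baseSleepMs, int maxRetries, long maxSleepMs)
            throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return operation.call();
            } catch (IOException recoverable) {
                if (attempt >= maxRetries) {
                    throw recoverable; // retries exhausted, surface the last failure
                }
                Thread.sleep(sleepMs(baseSleepMs, attempt, maxSleepMs));
            }
        }
    }
}
```

Only errors known to be recoverable should be retried this way; any other exception propagates immediately.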
  • 22. Curator Features: Protected EPHEMERAL_SEQUENTIAL nodes
    framework.create().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).forPath("/cluster/service");
    • connection loss – reconnect attempt begins • reconnect successful within session timeout – retrying path creation [Diagram: three clients connected to a five-server ensemble, zk 1 to zk 5; / (P) → cluster (P) now contains both ES service_0000000002 and ES service_0000000003]
  • 23. Curator Features: Protected EPHEMERAL_SEQUENTIAL nodes
    framework.create().withMode(CreateMode.EPHEMERAL_SEQUENTIAL).withProtection().forPath("/cluster/service");
    • connection loss – reconnect attempt begins • reconnect successful within session timeout – checking for a zNode with the same GUID • no extra zNode created [Diagram: three clients connected to a five-server ensemble, zk 1 to zk 5; / (P) → cluster (P) contains ES _c_16c39a25-87b4-4a54-bd05-1666a3e718de_service_0000000002]
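The check performed after reconnecting can be pictured as a search among the parent's children for a name carrying the client's GUID. The sketch below illustrates that idea only and is not Curator's code: `ProtectedCreate` and `findExisting` are hypothetical, and the `_c_` prefix is taken from the node name shown on the slide.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds a zNode created earlier with the same protection GUID.
final class ProtectedCreate {
    static final String PREFIX = "_c_"; // prefix as it appears in the slide's node name

    // Returns the child whose name starts with the prefix followed by the GUID, if any.
    static Optional<String> findExisting(List<String> children, String guid) {
        String marker = PREFIX + guid;
        return children.stream()
                .filter(name -> name.startsWith(marker))
                .findFirst();
    }
}
```

If a match is found, the earlier create() did reach the server before the connection was lost, so the client can adopt that zNode instead of creating a second one.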
  • 24. ZooKeeper in the real world • Performance Considerations • Using Observers to scale • Using Hierarchical Quorums for multi-datacenter setup • Surviving network partition with read-only mode
  • 25. Performance considerations • Replicated data is kept entirely in memory by ZooKeeper processes – a full GC can drop a server out of the ensemble • Synchronous filesystem writes in the commit phase – can take seconds on an overloaded storage device – use a dedicated device for ZooKeeper transaction logs • Maximum zNode size is 1 MB by default – data + metadata should fit in – configurable using a system property, but increasing it is not recommended • Watches and performance – too many watches on a single node: herd effect – too many watches overall: increased memory footprint
  • 26. Using Observers to scale • Observers: – no proposals – no votes – can't be leaders [Sequence diagram over time: a client calls setData; phases labeled propagate, propose, vote, commit, notify; events: sync return, callback called, watch triggered; participants: client, one leader, three followers, one observer]
  • 27. Hierarchical Quorums [Diagram: nine servers, zk1 to zk9, split into group 1, group 2 and group 3] • Majority quorums: – any 4 zk failures are tolerated • A datacenter goes down: – remaining ensemble becomes much less resilient • Hierarchical quorums: – Disjoint groups are formed – Quorum requires a majority of votes from a majority of groups – up to 5 failures can be tolerated (for example one whole group plus one server in each remaining group) – Better for clusters spanning multiple datacenters
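The hierarchical rule can be checked with a few lines of Java. This sketch assumes every server has equal weight and that the groups are disjoint; `HierarchicalQuorum` is a hypothetical class, not ZooKeeper's quorum verifier.

```java
import java.util.List;
import java.util.Set;

// Hypothetical check of the hierarchical quorum rule, with equal server weights.
final class HierarchicalQuorum {
    // groups: disjoint lists of server ids; alive: ids of servers that are up.
    // A quorum exists if a majority of groups each have a majority of their servers.
    static boolean hasQuorum(List<List<Integer>> groups, Set<Integer> alive) {
        int groupsWithMajority = 0;
        for (List<Integer> group : groups) {
            long votes = group.stream().filter(alive::contains).count();
            if (votes > group.size() / 2) {
                groupsWithMajority++;
            }
        }
        return groupsWithMajority > groups.size() / 2;
    }
}
```

With three groups of three, servers {1, 2, 4, 5} form a quorum even though 5 of 9 servers are down. The placement of failures matters, though: {1, 4, 7, 8, 9} has five live servers but a majority in only one group, so it is not a quorum.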
  • 28. Read-only mode • Network partitions, a datacenter gets detached • Partitioned ZooKeeper servers can operate in read-only mode – not connected to the ensemble – no writes allowed – read requests are still served • By default read-only mode is disabled [Diagram: nine servers, zk1 to zk9; zk1, zk2 and zk3 are detached from the rest of the ensemble]
  • 29. Topics not covered • ACLs • Quota support • Authentication support • Transaction logging • Connection state handling • Weighted hierarchical quorums • Configuration • Dynamic reconfiguration • ... • More info: – ZooKeeper documentation http://zookeeper.apache.org/doc/trunk/index.html – Curator resources http://curator.apache.org – ZAB protocol in detail http://web.stanford.edu/class/cs347/reading/zab.pdf http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf – ZooKeeper book http://shop.oreilly.com/product/0636920028901.do