ZooKeeper vs etcd 3 vs Other Distributed Coordination Systems
etcd v3
API
● Raft - consensus algorithm
● Put
● Get
● Range - get the values of all keys from one key to another
● Transactions - Read, compare, modify, write combinations
● Watch - On a key or a range. Streaming API
● Example : https://coreos.com/etcd/docs/latest/rfc/v3api.html
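For concreteness, here is a minimal sketch of these KV operations using the jetcd Java client (io.etcd.jetcd). The endpoint and key names are illustrative, and builder/method names vary somewhat between jetcd versions:

```java
import java.nio.charset.StandardCharsets;
import io.etcd.jetcd.ByteSequence;
import io.etcd.jetcd.Client;
import io.etcd.jetcd.KV;
import io.etcd.jetcd.kv.GetResponse;
import io.etcd.jetcd.op.Cmp;
import io.etcd.jetcd.op.CmpTarget;
import io.etcd.jetcd.op.Op;
import io.etcd.jetcd.options.GetOption;
import io.etcd.jetcd.options.PutOption;

public class EtcdKvDemo {
    public static void main(String[] args) throws Exception {
        // Assumes a local etcd 3 member on the default client port.
        Client client = Client.builder().endpoints("http://127.0.0.1:2379").build();
        KV kv = client.getKVClient();

        ByteSequence key = ByteSequence.from("app/config/db", StandardCharsets.UTF_8);
        ByteSequence value = ByteSequence.from("primary", StandardCharsets.UTF_8);

        // Put and Get: every call is asynchronous and returns a CompletableFuture.
        kv.put(key, value).get();
        GetResponse got = kv.get(key).get();
        got.getKvs().forEach(k ->
            System.out.println(k.getValue().toString(StandardCharsets.UTF_8)));

        // Range: fetch every key in [app/config/a, app/config/z).
        ByteSequence start = ByteSequence.from("app/config/a", StandardCharsets.UTF_8);
        ByteSequence end = ByteSequence.from("app/config/z", StandardCharsets.UTF_8);
        GetResponse range = kv.get(start, GetOption.newBuilder().withRange(end).build()).get();
        System.out.println("keys in range: " + range.getCount());

        // Transaction: compare-and-swap executed as one atomic request.
        kv.txn()
          .If(new Cmp(key, Cmp.Op.EQUAL, CmpTarget.value(value)))
          .Then(Op.put(key, ByteSequence.from("replica", StandardCharsets.UTF_8),
                       PutOption.DEFAULT))
          .commit()
          .get();

        client.close();
    }
}
```

The transaction is the read-compare-modify-write combination from the list above: the comparisons in If gate, atomically, whether the Then operations apply.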
Guarantees
● Atomicity
● Consistency
● Sequential Consistency https://en.wikipedia.org/wiki/Sequential_consistency
● Serializable Isolation
● Durability
● Linearizability (Except for watches)
● References:
○ Guarantees - https://coreos.com/etcd/docs/latest/api_v3.html#kv-api-guarantees
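The linearizability guarantee applies to the default read path, which goes through quorum; clients can opt into serializable reads, which any member answers from its local state (faster, but possibly stale). A hedged sketch, reusing the kv and key from the sketch above:

```java
// Serializable reads skip quorum and may return stale data; the default
// (linearizable) read path does not. Reuses `kv` and `key` from above.
GetOption serializable = GetOption.newBuilder().withSerializable(true).build();
GetResponse localRead = kv.get(key, serializable).get();
```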
Pros
● Has a Java client
● Has a distributed lock implementation in the v3 client; it will move to the server side in the 3.2 release (sketched below).
● Incremental snapshots - avoids pauses when creating snapshots.
● No garbage-collection pauses - off-heap storage.
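etcd 3.2+ exposes the lock service on the server side, and recent jetcd releases wrap it. A hedged sketch, reusing the client from the sketch above; the lock name and lease TTL are illustrative:

```java
// Acquire a server-side lock bound to a lease, so a crashed holder only
// blocks others until its lease expires. Reuses `client` from above.
// Extra imports: io.etcd.jetcd.Lock, io.etcd.jetcd.Lease.
Lock locks = client.getLockClient();
Lease leases = client.getLeaseClient();
long leaseId = leases.grant(30).get().getID();   // 30-second TTL
ByteSequence lockKey = locks
    .lock(ByteSequence.from("locks/reindex", StandardCharsets.UTF_8), leaseId)
    .get()
    .getKey();
try {
    // critical section
} finally {
    locks.unlock(lockKey).get();
}
```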
Pros ...
● Performance of etcd 3 is comparable to ZooKeeper's (benchmarked with snapshots disabled)
● Low latency
● Low storage usage
● Watchers are redesigned, replacing the older event model with one that
streams and multiplexes events over key intervals.
● gRPC message handling is about 2x faster than the JSON parsing in etcd2
● Leases for TTLs. Leases are also multiplexed in a single stream (sketched below).
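A hedged sketch of leases, reusing the client and kv from the sketches above; the key, value, and 10-second TTL are illustrative:

```java
// Attach a lease to a key: the key is deleted automatically when the lease
// expires, unless the client refreshes it. Reuses `client` and `kv` from above.
Lease ttlLeases = client.getLeaseClient();
long ttlLeaseId = ttlLeases.grant(10).get().getID();   // 10-second TTL
kv.put(ByteSequence.from("svc/instances/i-1", StandardCharsets.UTF_8),
       ByteSequence.from("10.0.0.7:8080", StandardCharsets.UTF_8),
       PutOption.newBuilder().withLeaseId(ttlLeaseId).build()).get();
// One refresh; long-lived clients use the multiplexed keepAlive stream instead.
ttlLeases.keepAliveOnce(ttlLeaseId).get();
```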
Pros
● Unlike ZooKeeper and Consul, which return one event per watch request, etcd can
continuously watch from the current revision.
● Multiplexes watches on a single connection.
● ZooKeeper loses old events, while etcd 3 holds a sliding window of past events, so a
client that disconnects does not lose the events that occurred before it reconnects
(sketched after this list).
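A hedged sketch of resuming a watch from an older revision, reusing the client from the sketches above; the prefix and saved revision are illustrative:

```java
// Watch a key interval starting from a saved revision: etcd replays the
// events still inside its sliding window, then streams new ones.
// Extra imports: io.etcd.jetcd.Watch, io.etcd.jetcd.options.WatchOption.
long lastSeenRevision = 42;  // hypothetical revision saved before disconnecting
ByteSequence prefix = ByteSequence.from("app/config/", StandardCharsets.UTF_8);
WatchOption resume = WatchOption.newBuilder()
        .withPrefix(prefix)              // watch the whole key interval
        .withRevision(lastSeenRevision)  // replay from here, then stream
        .build();
client.getWatchClient().watch(prefix, resume, response ->
        response.getEvents().forEach(ev ->
                System.out.println(ev.getEventType() + " "
                        + ev.getKeyValue().getKey().toString(StandardCharsets.UTF_8))));
```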
Cons
● The client may be uncertain about the status of an operation if it times out or if
there is a network disruption between the client and the etcd member. etcd may also
abort operations during a leader election; it does not send abort responses to clients'
outstanding requests in this event.
● In a network split, the minority side may still serve (serializable) read requests.
Cons
● The Java client doesn't support watches yet; it is immature and not well tested.
● Serializable read requests will continue to be served in a network split, even by a
minority partition that held the leader at the time of the split.
References
● etcd CTO’s presentation and slides
● Complete API of etcd
● etcd blog post
Apache Curator (ZooKeeper)
Reference Material
● ZooKeeper Tech Notes
● Usually ~10K write ops/sec. More important, write throughput does not scale as you
add servers; read throughput does.
● ZAB for consensus.
● Even though Paxos is beautifully elegant in describing the essence of
distributed consensus, the absence of a comprehensive and prescriptive
specification has rendered it inaccessible and notoriously difficult to
implement in practical systems.
Pros
● Non-blocking full snapshots (fuzzy snapshots that become consistent when replayed against the transaction log)
● Efficient memory management.
● Reliable, has been there for a long time.
● A simplified API
● Automatic ZooKeeper connection management with retries
● Complete, well-tested implementations of ZooKeeper recipes
● A framework that makes writing new ZooKeeper recipes much easier.
● Event support
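To make the "recipes" point concrete, here is a small sketch using Curator's automatic connection management plus the InterProcessMutex recipe; the connect string and lock path are illustrative:

```java
import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class CuratorLockDemo {
    public static void main(String[] args) throws Exception {
        // Curator manages the ZooKeeper connection and retries for us.
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // A well-tested recipe: a distributed mutex backed by ephemeral znodes.
        InterProcessMutex lock = new InterProcessMutex(client, "/locks/reindex");
        if (lock.acquire(5, TimeUnit.SECONDS)) {
            try {
                System.out.println("lock held; doing exclusive work");
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}
```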
Pros ...
● In a network partition, both the minority and majority partitions start a leader
election, so the minority partition stops serving operations.
● In that scenario, watchers registered in the minority partition are notified with a
KeeperState.Disconnected event, so clients can reconnect to the operating partition
later (see the sketch below).
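In Curator terms, that raw disconnect surfaces through the connection-state listener (SUSPENDED, then LOST once the session expires). A sketch, reusing the client from the example above:

```java
// Curator maps ZooKeeper's KeeperState.Disconnected onto SUSPENDED/LOST.
// Reuses `client` from the CuratorLockDemo sketch above.
client.getConnectionStateListenable().addListener((c, newState) -> {
    switch (newState) {
        case SUSPENDED: // connection lost; we may be in the minority partition
        case LOST:      // session expired; our ephemeral nodes are gone
            System.out.println("disconnected: " + newState);
            break;
        case RECONNECTED:
            System.out.println("back on the serving partition");
            break;
        default:
            break;
    }
});
```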
Cons
● Snapshots (where data is written to disk, if enabled) cause ZooKeeper's performance
to vary and sometimes to pause (leader election + snapshot creation).
Cons ...
● Garbage collection
● Pauses when creating snapshots
Consul
Overview
● Uses Serf, a solution for cluster membership, failure detection, and orchestration.
● Broadcasts custom events.
● Gossip protocol for communication.
● Client-to-server communication uses RPC.
● Complete architecture
● Ability to fire and listen for events
Features
● Has a distributed key-value (KV) store for storing the service database.
● Provides comprehensive service health checking using both built-in solutions and
user-provided custom solutions.
● Provides a REST-based HTTP API for interaction (see the sketch after this list).
● The service database can be queried using DNS.
● Does dynamic load balancing.
● Supports single data center and can be scaled to support multiple data
centers.
● Integrates well with Docker.
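The HTTP API is plain REST, so any client works. A sketch using Java's built-in java.net.http client against a local agent; the key, value, and service name are illustrative:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConsulDemo {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();

        // Store a value in the KV store (reads return values base64-encoded in JSON).
        HttpRequest put = HttpRequest
                .newBuilder(URI.create("http://localhost:8500/v1/kv/config/db-url"))
                .PUT(HttpRequest.BodyPublishers.ofString("jdbc:postgresql://db:5432/app"))
                .build();
        http.send(put, HttpResponse.BodyHandlers.ofString());

        // Query the service database: only instances passing their health checks.
        HttpRequest health = HttpRequest
                .newBuilder(URI.create("http://localhost:8500/v1/health/service/web?passing"))
                .GET()
                .build();
        String json = http.send(health, HttpResponse.BodyHandlers.ofString()).body();
        System.out.println(json);
    }
}
```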
Pros
● Has a Java client
● Supports multiple data centers
● Focused on service discovery
● Consul uses client agents to run health checks as well, letting developers build
large clusters without concentrating load on a small set of servers.
Cons
● Uses multicasting and unicasting for member discovery, which can flood the
network.
Hazelcast
Topology
● “One of the main features of Hazelcast is that it does not have a master
member. Each cluster member is configured to be the same in terms of
functionality. The oldest member (the first member created in the cluster)
automatically performs the data assignment to cluster members. If the oldest
member dies, the second oldest member takes over.”
● “Lite members are intended for use in computationally-heavy task executions
and listener registrations. Although they do not own any partitions, they can
access partitions that are owned by other members in the cluster.”
Pros
● Java Client
● Distributed implementations of java.util.{Queue, Set, List, Map}.
● Distributed implementation of java.util.concurrent.locks.Lock.
● Distributed implementation of java.util.concurrent.ExecutorService.
● Distributed MultiMap for one-to-many relationships.
● Distributed Topic for publish/subscribe messaging.
● Distributed Query, MapReduce and Aggregators.
● Synchronous (write-through) and asynchronous (write-behind) persistence.
● Transaction support.
● Specification compliant JCache implementation.
● Native Java, .NET, C++ clients, Memcache and REST clients.
● Socket level encryption support for secure clusters.
● Second level cache provider for Hibernate.
● Monitoring and management of the cluster via JMX.
● Dynamic HTTP session clustering.
● Support for cluster info and membership events.
● Dynamic discovery, scaling, partitioning with backups and fail-over.
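A small sketch showing how these distributed structures look in code (Hazelcast 3.x API; 4.x moved locks to the CP subsystem). The map, lock, and topic names are illustrative:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ILock;
import com.hazelcast.core.IMap;
import com.hazelcast.core.ITopic;

public class HazelcastDemo {
    public static void main(String[] args) {
        // Starting an instance joins (or forms) the cluster; no master to configure.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // Distributed java.util.Map: entries are partitioned across members.
        IMap<String, String> users = hz.getMap("users");
        users.put("u1", "alice");

        // Distributed lock (Hazelcast 3.x).
        ILock lock = hz.getLock("reindex");
        lock.lock();
        try {
            System.out.println("exclusive section");
        } finally {
            lock.unlock();
        }

        // Publish/subscribe topic.
        ITopic<String> topic = hz.getTopic("events");
        topic.addMessageListener(msg -> System.out.println(msg.getMessageObject()));
        topic.publish("user-created");

        hz.shutdown();
    }
}
```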
Pros ...
● Has built-in event listeners, and we can write new ones as well (see the sketch
after this list).
● Awesome docs
● Almost all the features we want are built in.
● No external dependencies: a single JAR, written in Java.
● Peer-to-peer
● Keeps backups of each data entry on multiple members
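A hedged sketch of an entry listener, reusing the users map from the example above (Hazelcast 3.x listener API; extra import com.hazelcast.map.listener.EntryAddedListener):

```java
// Fires for entries added on any member; `true` includes the value in the event.
users.addEntryListener(
        (EntryAddedListener<String, String>) event ->
                System.out.println("added: " + event.getKey() + " -> " + event.getValue()),
        true);
```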
Cons
● Uses only heap memory - no persistent-storage support in the open-source
edition.
● Does sharding, but the docs say redundant copies are kept.
What to choose?
● Depends heavily on the requirements and the surrounding developer ecosystem.
○ For Java-based environments, ZooKeeper will be better.
○ For Go-based environments, etcd will be better.
● If you need other services, such as service discovery, Consul or Hazelcast will be
better.
● This presentation is intended to list pros and cons. Since it was made in the last
quarter of 2016, these technologies/tools may have changed a lot by now.
○ For example, etcd has been massively improved throughout.
Thank you!
