Introduction to ZooKeeper - TriHUG May 22, 2012



Presentation given at TriHUG (Triangle Hadoop User Group) on May 22, 2012. Gives a basic overview of Apache ZooKeeper as well as some common use cases, 3rd party libraries, and "gotchas"

Demo code available at



  1. 1. Apache ZooKeeper: An Introduction and Practical Use Cases
  2. 2. Who am I?
     ● David Arthur
     ● Engineer at Lucid Imagination
     ● Hadoop user
     ● Python enthusiast
     ● Father
     ● Gardener
  3. 3. Play along!
     Grab the source for this presentation (demo code link above). You'll need Java, Ant, and bash.
  4. 4. Apache ZooKeeper
     ● Formerly a Hadoop sub-project
     ● ASF TLP (top level project) since Nov 2010
     ● 7 PMC members, 8 committers - most from Yahoo! and Cloudera
     ● Ugly logo
  5. 5. One liner
     "ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical name space of data registers"
     - ZooKeeper wiki
  6. 6. Who uses it?
     Everyone*
     ● Yahoo!
     ● HBase
     ● Solr
     ● LinkedIn (Kafka, Hedwig)
     ● Many more*
  7. 7. What is it good for?
     ● Configuration management - machines bootstrap config from a centralized source, facilitates simpler deployment/provisioning
     ● Naming service - like DNS, mappings of names to addresses
     ● Distributed synchronization - locks, barriers, queues
     ● Leader election - a common problem in distributed coordination
     ● Centralized and highly reliable (simple) data registry
  8. 8. Namespace (ZNodes)
     parent : "foo"
     |-- child1 : "bar"
     |-- child2 : "spam"
     `-- child3 : "eggs"
         `-- grandchild1 : "42"
     Every znode has data (given as byte[]) and can optionally have children.
  9. 9. Sequential znode
     Nodes created in "sequential" mode will append a 10 digit zero padded monotonically increasing number to the name.
     create("/demo/seq-", ..., ..., PERSISTENT_SEQUENTIAL) x4
     /demo
     |-- seq-0000000000
     |-- seq-0000000001
     |-- seq-0000000002
     `-- seq-0000000003
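The 10-digit zero-padded suffix above is easy to model. This sketch reproduces the naming scheme only; it is an illustration, not ZooKeeper's actual server-side code:

```java
public class SequentialNames {
    // Models how the server appends a 10-digit, zero-padded,
    // monotonically increasing counter to the name of a
    // PERSISTENT_SEQUENTIAL znode.
    static String sequentialName(String prefix, int counter) {
        return String.format("%s%010d", prefix, counter);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(sequentialName("/demo/seq-", i));
        }
        // /demo/seq-0000000000 through /demo/seq-0000000003
    }
}
```

The fixed width matters: it keeps lexicographic order equal to creation order, which is what lock and queue recipes rely on when they sort children.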
  10. 10. Ephemeral znode
     Nodes created in "ephemeral" mode will be deleted when the originating client goes away.
     create("/demo/foo", ..., ..., PERSISTENT);
     create("/demo/bar", ..., ..., EPHEMERAL);
     Connected:
     /demo
     |-- foo
     `-- bar
     Disconnected:
     /demo
     `-- foo
  11. 11. Simple API
     Pretty much everything lives under the ZooKeeper class
     ● create
     ● exists
     ● delete
     ● getData
     ● setData
     ● getChildren
  12. 12. Synchronicity
     sync and async versions of API methods
     exists("/demo", null);
     exists("/demo", null, new StatCallback() {
       @Override
       public void processResult(int rc, String path, Object ctx, Stat stat) {
         ...
       }
     }, null);
  13. 13. Watches
     Watches are a one-shot callback mechanism for changes on connection and znode state
     ● Client connects/disconnects
     ● ZNode data changes
     ● ZNode children change
  14. 14. Demo time!
     For those playing along, you'll need to get ZooKeeper running. Using the default port (2181), run:
     ant zk
     Or specify a port like:
     ant zk -Dzk.port=2181
  15. 15. Things to "watch" out for
     ● Watches are one-shot - if you want continuous monitoring of a znode, you have to reset the watch after each event
     ● Too many client watches on a single znode create a "herd effect" - lots of clients get notifications at the same time and cause spikes in load
     ● Potential for missing changes
     ● All watches are executed in a single, separate thread (be careful about synchronization)
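The first gotcha is easiest to see with a toy in-memory model of the one-shot semantics (this is not the real ZooKeeper client API): a registered watch fires once and is dropped, so continuous monitoring requires re-registering inside the callback.

```java
import java.util.ArrayList;
import java.util.List;

public class OneShotWatchDemo {
    interface Watch { void process(String event); }

    // Toy model of one-shot watch semantics: each registered watch
    // fires at most once per registration, then is discarded.
    static class WatchedNode {
        private final List<Watch> watches = new ArrayList<>();
        void watch(Watch w) { watches.add(w); }
        void change(String event) {
            List<Watch> fired = new ArrayList<>(watches);
            watches.clear();                  // one-shot: drop before firing
            for (Watch w : fired) w.process(event);
        }
    }

    // Counts how many of `changes` events one watcher observes,
    // with and without resetting the watch inside the callback.
    static int observed(boolean reRegister, int changes) {
        WatchedNode node = new WatchedNode();
        int[] count = {0};
        Watch w = new Watch() {
            public void process(String event) {
                count[0]++;
                if (reRegister) node.watch(this);  // reset after each event
            }
        };
        node.watch(w);
        for (int i = 0; i < changes; i++) node.change("dataChanged");
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println(observed(false, 3)); // watch fired once, then lost
        System.out.println(observed(true, 3));  // every change observed
    }
}
```

The model also hints at the "missing changes" gotcha: anything that happens between a watch firing and the re-registration is simply not delivered.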
  16. 16. Building blocks
     ● Hierarchical nodes
     ● Parent and leaf nodes can have data
     ● Two special types of nodes - ephemeral and sequential
     ● Watch mechanism
     ● Consistency guarantees
       ○ Order of updates is maintained
       ○ Updates are atomic
       ○ Znodes are versioned for MVCC
       ○ Many more
  17. 17. The Fun Stuff
     Recipes:
     ● Lock
     ● Barrier
     ● Queue
     ● Two-phase commit
     ● Leader election
     ● Group membership
  18. 18. Demo Time!
     Group membership (i.e., the easy one)
     Recipe:
     ● Members register a sequential ephemeral node under the group node
     ● Everyone keeps a watch on the group node for new children
  19. 19. Lots of boilerplate
     ● Synchronize the asynchronous connection (using a latch or something)
     ● Handling disconnects/reconnects
     ● Exception handling
     ● Ensuring paths exist (nothing like mkdir -p)
     ● Resetting watches
     ● Cleaning up
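The first bullet - synchronizing the asynchronous connect with a latch - is the classic piece of boilerplate. In real code the latch is counted down inside the Watcher passed to the ZooKeeper constructor when a SyncConnected event arrives; since this sketch does not assume a running server, the handshake is simulated with a plain thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectLatchDemo {
    // The ZooKeeper constructor returns immediately; the session only
    // becomes usable once the connection callback fires. Standard fix:
    // block on a latch that the callback counts down.
    static boolean connect() throws InterruptedException {
        CountDownLatch connected = new CountDownLatch(1);

        // Stand-in for the Watcher callback firing on SyncConnected.
        new Thread(connected::countDown).start();

        // Don't wait forever; surface a timeout to the caller instead.
        return connected.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(connect() ? "connected" : "timed out");
    }
}
```

Curator (below) absorbs exactly this kind of boilerplate, including retries on the reconnect path.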
  20. 20. What happens?
     ● Everyone writes their own high level wrapper/connection manager
       ○ ZooKeeperWrapper
       ○ ZooKeeperSession
       ○ (\w+)ZooKeeper
       ○ ZooKeeper(\w+)
  21. 21. Open Source, FTW!
     Luckily, some smart people have open sourced their ZooKeeper utilities/wrappers
     ● Netflix Curator - Netflix/curator
     ● LinkedIn - linkedin/linkedin-zookeeper
     ● Many others
  22. 22. Netflix Curator
     ● Handles the connection management
     ● Implements many recipes
       ○ leader election
       ○ locks, queues, and barriers
       ○ counters
       ○ path cache
     ● Bonus: service discovery implementation (we use this)
  23. 23. Demo Time!
     Group membership refactored with Curator
     ● EnsurePath is nice
     ● Robust connection management is awesome
     ● Exceptions are more sane
  24. 24. Thoughts on Curator
     i.e., my non-expert subjective opinions
     ● Good level of abstraction - doesn't do anything "magical"
     ● Doesn't hide ZooKeeper
     ● Weird API design (builder soup)
     ● Extensive, well tested recipe support
     ● It works!
  25. 25. ZooKeeper in the wild
     Some use cases
  26. 26. Use case: Solr 4.0
     Used in "Solr cloud" mode for:
     ● Cluster management - what machines are available and where are they located
     ● Leader election - used for picking a shard as the "leader"
     ● Consolidated config storage
     ● Watches allow for very non-chatty steady-state
     ● Herd effect not really an issue
  27. 27. Use case: Kafka
     ● LinkedIn's distributed pub/sub system
     ● Queues are persistent
     ● Clients request a slice of a queue (offset, length)
     ● Brokers are registered in ZooKeeper, clients load balance requests among live brokers
     ● Client state (last consumed offset) is stored in ZooKeeper
     ● Client rebalancing algorithm, similar to leader election
  28. 28. Use case: LucidWorks Big Data
     ● We use Curator's service discovery to register REST services
     ● Nice for SOA
     ● Took 1 dev (me) 1 day to get something functional (mostly reading Curator docs)
     ● So far, so good!
  29. 29. Review of "gotchas"
     ● Watch execution is single threaded and synchronized
     ● Can't reliably get every change for a znode
     ● Excessive watchers on the same znode (herd effect)
     Some new ones:
     ● GC pauses: if your application is prone to long GC pauses, make sure your session timeout is sufficiently long
     ● Catch-all watches: if you use one Watcher for everything, it can be tedious to infer exactly what happened
  30. 30. Four letter words
     The ZooKeeper server responds to a few "four letter word" commands via TCP or telnet*
     > echo ruok | nc localhost 2181
     imok
     I'm glad you're OK, ZooKeeper - really I am.*
  31. 31. Quorums
     In a multi-node deployment (aka, a ZooKeeper quorum), it is best to use an odd number of machines.
     ZooKeeper uses majority voting, so it can tolerate ceil(N/2)-1 machine failures and still function properly.
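The arithmetic behind "use an odd number" is worth spelling out: an ensemble of N servers needs a strict majority (floor(N/2)+1) to make progress, so it tolerates N minus that majority, which works out to ceil(N/2)-1 failures. A quick check:

```java
public class QuorumMath {
    // An ensemble of N servers needs a strict majority (N/2 + 1,
    // integer division) to make progress, so it tolerates
    // N - majority = ceil(N/2) - 1 failures.
    static int toleratedFailures(int ensembleSize) {
        int majority = ensembleSize / 2 + 1;
        return ensembleSize - majority;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " servers tolerate "
                + toleratedFailures(n) + " failure(s)");
        }
        // 3 and 4 servers both tolerate 1 failure; 5 and 6 both
        // tolerate 2. The extra even machine adds voting and
        // replication load without adding any fault tolerance.
    }
}
```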
  32. 32. Multi-tenancy
     ZooKeeper supports "chroot" at the session level. You can add a path to the connection string that will be implicitly prefixed to everything you do:
     new ZooKeeper("localhost:2181/my/app");
     Curator also supports this, but at the application level:
     CuratorFrameworkFactory.builder()
       .namespace("/my/app");
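The prefixing itself is simple to model. This sketch (an illustration of the mapping, not the client library's code) shows how a client-side path resolves to a server-side path under a session chroot:

```java
public class ChrootDemo {
    // Models how a session chroot maps client-visible paths to the
    // real server-side paths. Illustration only.
    static String serverPath(String chroot, String clientPath) {
        if (clientPath.equals("/")) return chroot; // root of the chrooted view
        return chroot + clientPath;
    }

    public static void main(String[] args) {
        // A client connected with "localhost:2181/my/app" that creates
        // "/config" actually writes to "/my/app/config" on the server.
        System.out.println(serverPath("/my/app", "/config"));
    }
}
```

This is what makes chroot useful for multi-tenancy: two applications can share one ensemble while each sees only its own subtree.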
  33. 33. Python client
     Dumb wrapper around the C client, not very Pythonic
     import zookeeper
     zk_handle = zookeeper.init("localhost:2181")
     zookeeper.exists(zk_handle, "/demo")
     zookeeper.get_children(zk_handle, "/demo")
     Stuff in contrib didn't work for me, I used a statically linked version: zc-zookeeper-static
  34. 34. Other clients
     Included in ZooKeeper under src/contrib:
     ● C (this is what the Python client uses)
     ● Perl (again, using the C client)
     ● REST (JAX-RS via Jersey)
     ● FUSE? (strange)
     3rd-party client implementations:
     ● Scala, courtesy of Twitter
     ● Several others
  35. 35. Overview
     ● Basics of ZooKeeper (znode types, watches)
     ● High-level recipes (group membership, et al.)
     ● Lots of boilerplate for basic functionality
     ● 3rd party helpers (Curator, et al.)
     ● Gotchas and other miscellany
  36. 36. Questions?
     David