Apache ZooKeeper TechTuesday


Published in: Technology

  1. Apache ZooKeeper: Why use it? What to expect in the future? (Andrei Savu, @TechTuesday)
  2. Outline
     • Why use it?
     • Crash Course
     • Practical Example
     • What to expect in the future (3.4.0 release)?
     • GSoC 2010
     • New Contrib
     • Work in Progress
  3. Crash Course
  4. What is ZooKeeper? A highly available, scalable, distributed service for configuration, consensus, group membership, leader election, naming, and coordination.
  5. What is ZooKeeper? (2)
     • replicated in-memory tree data structure
     • somewhat similar to a file system
     • no partial reads / writes
     • no renames
     • ordered updates
     • strong persistence guarantees
     • conditional updates (version)
     • watches for data changes
     • ephemeral nodes
     • generated file names
  6. ZooKeeper Data Model
     • hierarchical namespace
     • each znode has data and children
     • data is read and written in its entirety
  7. Basic ZooKeeper API
     • string create(path, data, acl, flags)
     • delete(path, expected_version)
     • stat set_data(path, data, expected_version)
     • (data, stat) get_data(path, watch)
     • stat exists(path, watch)
     • string[] get_children(path, watch)
  8. ZooKeeper Service Facts
     1) all servers store a copy of the data in memory
     2) the leader is elected at startup
     3) followers respond to clients
     4) all updates go through the leader
     5) responses are sent when a majority of servers have persisted the change
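
The data model and the versioned API from the crash course can be illustrated with a toy in-memory model. This is only a sketch: `ToyZnodeTree`, `BadVersionError`, and `NoNodeError` are hypothetical names, nothing here talks to a real ensemble, and watches, ACLs, ephemeral nodes, and `delete` are left out. The sequential-name counter is also simplified to a single global counter.

```python
class BadVersionError(Exception):
    """Raised when expected_version does not match the znode's version."""


class NoNodeError(Exception):
    """Raised when a path (or its parent) does not exist."""


class ToyZnodeTree:
    """Hypothetical in-memory stand-in for the znode namespace."""

    def __init__(self):
        # path -> [data, version]; the root always exists
        self._nodes = {"/": [b"", 0]}
        self._counter = 0  # simplified source of sequential suffixes

    def create(self, path, data, sequence=False):
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise NoNodeError(parent)
        if sequence:
            # "generated file names": append a zero-padded counter
            path = "%s%010d" % (path, self._counter)
            self._counter += 1
        if path in self._nodes:
            raise ValueError("node exists: " + path)
        self._nodes[path] = [data, 0]
        return path

    def get_data(self, path):
        if path not in self._nodes:
            raise NoNodeError(path)
        data, version = self._nodes[path]
        return data, version

    def set_data(self, path, data, expected_version):
        if path not in self._nodes:
            raise NoNodeError(path)
        node = self._nodes[path]
        # Conditional update: -1 matches any version, otherwise it must be equal.
        if expected_version != -1 and expected_version != node[1]:
            raise BadVersionError(path)
        node[0] = data  # data is replaced in its entirety
        node[1] += 1
        return node[1]

    def get_children(self, path):
        prefix = path.rstrip("/") + "/"
        names = [p[len(prefix):] for p in self._nodes
                 if p != "/" and p.startswith(prefix)]
        # keep direct children only
        return sorted(n for n in names if "/" not in n)


tree = ToyZnodeTree()
tree.create("/config", b"v1")
data, version = tree.get_data("/config")
new_version = tree.set_data("/config", b"v2", version)  # version matches
try:
    tree.set_data("/config", b"v3", version)  # stale version is rejected
    stale_write_rejected = False
except BadVersionError:
    stale_write_rejected = True
```

The stale write fails because another update already bumped the version, which is the pattern a client can use for optimistic, compare-and-set style updates.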
  9. Practical Example
  10. Distributed Queue (Python)
     • http://www.cloudera.com/blog/2009/05/building-a-distributed-concurrent-queue-with-apache-zookeeper/
     • http://github.com/henryr/pyzk-recipes
     • Retry operation on ConnectionLoss: http://github.com/andreisavu/pyzk-recipes
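
The "retry operation on ConnectionLoss" idea can be sketched as a small decorator. This is not the code from the linked repositories: `ConnectionLoss` is a hypothetical stand-in for whatever connection-loss error the client library raises, and `flaky_get_children` simulates a call that fails twice before succeeding.

```python
import functools
import time


class ConnectionLoss(Exception):
    """Hypothetical stand-in for the client's connection-loss error."""


def retry_on_connection_loss(max_attempts=3, delay=0.0):
    """Re-run the wrapped call when it raises ConnectionLoss."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except ConnectionLoss:
                    if attempt == max_attempts:
                        raise  # give up and surface the error
                    time.sleep(delay * attempt)  # linear backoff
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on_connection_loss(max_attempts=3)
def flaky_get_children():
    # Simulated operation: fails on the first two calls, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionLoss()
    return ["item-0000000000", "item-0000000001"]


result = flaky_get_children()
```

Blind retries are safest for idempotent operations such as reads. A retried create of a sequential node may already have succeeded on the server before the connection dropped, so a queue implementation would need to check for that case.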
  11. GSoC 2010: 3 projects / 5 months
  12. GSoC project 1: Monitoring & Web-based Interface (Status: committed to the trunk)
     1. JIRA: ZOOKEEPER-701
     2. Progress Tracking Wiki
     3. Monitoring for Ganglia, Nagios and Cacti
        1. contrib / monitoring
        2. 'mntr' four-letter word
     4. Web interface available as a Hue application
        1. contrib / huebrowser
        2. complete install instructions
        3. requirements: REST gateway, Hue 1.0
  13. GSoC project 2: Read-Only Mode (Status: under review, ready to be committed)
     1. JIRA: ZOOKEEPER-704
     2. Progress Tracking Wiki
     3. Description: "When a ZooKeeper server loses contact with over half of the other servers in an ensemble ('loses a quorum'), it stops responding to client requests. For some applications, it would be beneficial if a server still responded to read requests when the quorum is lost, but caused an error condition when a write request was attempted."
  14. GSoC project 3: Failure Detector Model (Status: under review)
     1. JIRA: ZOOKEEPER-702
     2. Progress Tracking Wiki
     3. Detectors: Phi Accrual, Chen, Bertier, Fixed Heartbeat
     4. Why? Check the concluding remarks on the wiki.
     5. Conclusion snippet: "in scenarios where we have a changing network behavior, such in a WAN, the adaptive methods can be a good pick"
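
To give a feel for what an adaptive detector does, here is a minimal sketch of a phi accrual detector. It follows the general textbook formulation (suspicion level phi = -log10 of the probability that a heartbeat arrives later than the time already elapsed, with inter-arrival times modelled as normally distributed) and is not the ZOOKEEPER-702 patch. `PhiAccrualDetector`, `window_size`, and `min_std` are hypothetical names chosen for illustration.

```python
import math
from collections import deque


class PhiAccrualDetector:
    """Hypothetical sketch: suspicion grows as a heartbeat becomes overdue."""

    def __init__(self, window_size=100, min_std=0.01):
        self._intervals = deque(maxlen=window_size)  # recent inter-arrival times
        self._last = None                            # time of the last heartbeat
        self._min_std = min_std                      # avoids division by zero

    def heartbeat(self, now):
        """Record a heartbeat that arrived at time `now` (seconds)."""
        if self._last is not None:
            self._intervals.append(now - self._last)
        self._last = now

    def phi(self, now):
        """Return the suspicion level at time `now`."""
        if not self._intervals:
            return 0.0  # not enough history to suspect anything
        n = len(self._intervals)
        mean = sum(self._intervals) / n
        var = sum((x - mean) ** 2 for x in self._intervals) / n
        std = max(math.sqrt(var), self._min_std)
        # P(next heartbeat arrives later than the elapsed time), normal model
        z = (now - self._last - mean) / std
        p_later = 0.5 * math.erfc(z / math.sqrt(2))
        if p_later <= 0.0:
            return float("inf")  # probability underflowed: certainly overdue
        return -math.log10(p_later)


detector = PhiAccrualDetector()
for t in range(5):
    detector.heartbeat(float(t))  # heartbeats at t = 0, 1, 2, 3, 4

early = detector.phi(4.5)    # well before the next expected heartbeat
on_time = detector.phi(5.0)  # exactly when the next one is expected
late = detector.phi(6.0)     # a full interval overdue
```

A fixed-heartbeat detector would instead compare the elapsed time against a constant timeout. The accrual approach lets the threshold adapt to the observed arrival pattern, which matches the slide's point about changing network behavior.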
  15. Contrib
  16. Large Scale Pub/Sub (Hedwig)
     1. JIRA: ZOOKEEPER-775
     2. Uses ZooKeeper and BookKeeper
     3. Committed to the trunk
     4. Developed at Yahoo! Research
     5. Used for PNUTS cross data center replication
     6. http://vimeo.com/13282102
  17. Work in Progress (only some interesting JIRAs)
  18. #834 Children for ephemerals
     • JIRA: ZOOKEEPER-834
     • Allow ephemeral nodes to have children owned by the same session.
     • Useful when publishing status information.
     • No need to serialize basic data structures (hash tables).
     • Similar to /proc in *nix systems.
     • Examples: /agent-01/ip, /agent-01/memory, /agent-01/load
  19. #829 /zookeeper/sessions/*
     • JIRA: ZOOKEEPER-829
     • Requested by HBase developers: "we'd like the ability to forcible expire someone else's ZK session"
  20. Plenty of bug fixes. Join the community!
  21. Resources
     • http://hadoop.apache.org/zookeeper/
     • http://wiki.apache.org/hadoop/ZooKeeper/ProjectDescription
     • http://wiki.apache.org/hadoop/ZooKeeper/Tao
  22. Thanks! Questions?