Your SlideShare is downloading. ×
A Python Petting Zoo
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

A Python Petting Zoo

2,259
views

Published on

Python and ZooKeeper

Python and ZooKeeper

Published in: Technology

0 Comments
6 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,259
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
18
Comments
0
Likes
6
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. A Python Petting Zoo Python and ZooKeeper Devon Jones Senior Software Engineer knewt.ly/pettingzoo-slides
  • 2. Knewton is an education technology company with a goal ofbringing adaptive education to the masses. Knewton makes itpossible to break courses into tiny parts that are delivered toeach student as personalized, real-time recommendations.Knewton recommends the best work for an individual learner bycalculating data on what we know about a student, similarstudents, the learning objective, and the content itself at a givenpoint in time.The Knewton platform today has tens of thousands of students,will have over 600k students starting Sept. 2012, and will soonhave millions of students.
  • 3. What is this about?● What is ZooKeeper, how is it useful● State of ZooKeeper on python● The release of PettingZoo, Knewtons ZooKeeper recipes for managing a distributed machine learning cluster
  • 4. What is ZooKeeper?According to ZooKeepers Apache Site:ZooKeeper is a centralized service for maintaining configuration information,naming, providing distributed synchronization, and providing group services. Allof these kinds of services are used in some form or another by distributedapplications. Each time they are implemented there is a lot of work that goesinto fixing the bugs and race conditions that are inevitable. Because of thedifficulty of implementing these kinds of services, applications initially usuallyskimp on them, which make them brittle in the presence of change and difficultto manage. Even when done correctly, different implementations of theseservices lead to management complexity when the applications are deployed. Right, but what is it?
  • 5. What is ZooKeeper?ZooKeeper is a distributed filesystem based onthe PAXOS algorithim with a few valuablefeatures.● No single point of failure● Strictly ordered, observable state● Events● Sequential and ephemeral primitives● ACLs
  • 6. PAXOShttp://groups.csail.mit.edu/tds/papers/DePrisco/WDAG97.pdf● Distributed Finite State Machine● Pairwise connections among hosts● Rounds of Consensus Propositions conducted by leader● Time bound contracts on interactions● Useful to have internal concurrency contracts
  • 7. What is ZooKeeper for?ZooKeeper is a platform for creating protocolsfor synchronization of distributed systems.Uses include:● Distributed configuration● Queues● Implementation of distributed concurrency primitives such as locks, barriers, latches, counters, etc.In short, its a system for managing sharedstate between distributed systems.
  • 8. Consistency GuaranteesZooKeeper makes a number of promises:● Sequential Consistency● Atomicity● Single System Image● Reliability● Timeliness
  • 9. Watches● Can be set by any read operation● Will fire an event to the client who set them once and only once● Can be set on nodes or on their data.● Will always be sent to you in a fixed order
  • 10. Sequential Nodes● Appends a monotonically increasing number to the end of the znode (file)● Can be used directly for leader election● Provides communication of servers view of ordering
  • 11. Ephemeral Nodes● Only exist for the life of a connection● If your connection does not respond to a keepalive request, will disappear● Used to ensure reliability against service disruption for most recipes● Used to trigger events, such as reconfiguration if a service goes down that published discovery configs
  • 12. RecipesZooKeepers documentation explains how toimplement a number of recipes:● Barriers● Locks● Queues● Counters● Two Phase Commit● Leader Election
  • 13. RecipesAs a community, we need well tested versionsof these recipes as well as other valuableprotocols built on ZooKeeper
  • 14. State of ZooKeeper● Very low level● Coding for it is very complex● Lots of edge cases● ZooKeeper needs high level, well tested libraries● Very few complete, high level solutions exist
  • 15. ZooKeeper LibrariesThe first high level library with significant recipeimplementations has emerged from Netflix. Itsname is Curator.Unfortunately for us, its in Java, not Python.
  • 16. Curatorhttps://github.com/Netfix/curator● State ○ High level api ○ Non-Resilient Client ○ Documentation ○ Tests, Embeddable ZooKeeper for testing● Recipes ○ Leader Latch, Leader Election ○ Multiple Locks and Semaphores ○ Multiple Queues ○ Barrier, Double Barrier ○ Shared Counter/Distributed Atomic Long
  • 17. State of ZooKeeper on Python● Current state is in a lot of flux● Went from only low level bindings to a number of incomplete bindings in first half of 2012● So far nothing like Curator has emerged (but it appears to be brewing)
  • 18. One of the top ranked results for Python ZooKeeper
  • 19. Summary of Python ZooKeeper bindings● There are about 10 presently● Many suffer from not handling known edge cases in ZooKeeper● Some suffer problems with resilient connections● The following is derived from the python ZooKeeper binding census of Ben Bangert
  • 20. Official Bindings● State ○ Complete access to the ZooKeeper C bindings ○ Full of sharp edges ○ Not a resilient client ○ No recipes ○ Threads communication with ZooKeeper in a C thread ○ Foundation for most other libraries ○ Very low level
  • 21. BindingsTwisted:● txzookeepergevent:● gevent-zookeeper● kazoo
  • 22. Bindings (Cont)High level clients:● zc.zk● kazoo● twitter zookeeperOthers:● zkpython (low level)● pykeeper (not resiliant)● zoop (not resiliant)
  • 23. Bindings (Cont)Recipes only:● zktools● PettingZoo
  • 24. State of ZooKeeper on Python: Kind of a MessA project exists to merge some of the high levelbindings in an attempt to create a pythonequivalent of Curator: https://github.com/python-zkStarted by Ben Bangert to merge Kazoo & zc.zk with an attempt to implement all Curatorrecipes.
  • 25. PettingZoohttps://github.com/Knewton/pettingzoo-python● State ○ Relies on zc.zk ○ Documented, doc strings ○ Tests (mock ZooKeeper) ○ All recipes implemented in a Java version as well● Recipes ○ Distributed Config ○ Distributed Bag ○ Leader Queue ○ Role Match
  • 26. PettingZoo● In heavy development● Distributed Discovery, Distributed Bag are well tested and used in production● Leader Queue and Role Match are tested, but undeployed● PettingZoo will be ported to or merged with the kazoo effort when it is ready
  • 27. Our ProblemNeed to be able to do stream processing ofobservations of student interactions with coursematerial. This involves multiple models thathave interdependent parameters. This requires:● Sharding along different axes dependent upon the models● Subscriptions between models for parameters● Dynamic reconfiguration of the environment to deal with current load
  • 28. Distributed DiscoveryAllows services in a dynamic, distributedenvironment to be able to be quickly alerted ofservice address changes.● Most service discovery recipes only contain host:port, Distributed Discovery can share arbitrary data as well (using yaml)● Can handle load balancing through random selection of config● Handles rebalancing on pool change
  • 29. How does this help us scale?● Makes discovery of dependencies simple● Adds to reliability of system by quickly removing dead resources● Makes dynamic reconfiguration simple as additional resources become available
  • 30. Distributed Discovery Example: Writefrom pettingzoo.discovery import write_distributed_configfrom pettingzoo.utils import connect_to_zkconn = connect_to_zk( localhost:2181 )config = { header: { service_class : kestrel, metadata : { protocol : memcached , version: 1.0 } }, host: localhost , port: 22133}write_distributed_config(conn, kestrel, platform , config)
  • 31. Distributed Discovery Example: Readfrom pettingzoo.discovery import DistributedMultiDiscoveryfrom pettingzoo.utils import connect_to_zkconn = connect_to_zk( localhost:2181 )dmd = DistributedMultiDiscovery(conn)conf = Nonedef update_configs (path, updated_configs): conf.update(updated_configs)conf = dmd.load_config( kestrel, platform , callback=update_configs)
  • 32. Distributed BagRecipe for a distributed bag (dbag) that allowsprocesses to share a collection. Anyparticipant can post or remove data, alerting allothers. ● Used as a part of Role Match ● Useful for any case where processes need to share configuration determined at runtime
  • 33. How does this help us scale?● Can quickly alert processes as to who is subscribing to them● Reduces load by quickly yanking dead subscriptions● Provides event based subscriptions, making implementation simpler
  • 34. Distributed Bag ● Sequential items Item contain the actual data Items 1Item 2Item ● Can be ephemeral 3 ● Clients set delete watch on discrete items<bag> ● Token is set to id of highest item ● Clients set a child Tokens Token watch on the "Tokens" 3 node ● Can determine exact adds and deletes with a constant number of messages per delta
  • 35. Distributed Bag Exampleimport yamlfrom pettingzoo.dbag import DistributedBagfrom pettingzoo.utils import connect_to_zk...conn = connect_to_zk( localhost:2181 )dbag = DistributedBag(conn, /some/data )docs = {}def acb(cb_dbag, cb_id): docs[cb_id] = cb_dbag.get(cb_id)def rcb(cb_dbag, cb_id): docs.remove(cb.id)dbag.add_listeners(add_callback=acb, remove_callback=rcb)add_id = dbag.add(yaml.load(document), ephemeral=True)docs[add_id] = document
  • 36. Leader QueueRecipe is similar to Leader Election, but makesit easy to monitor your spare capacity.● Used in Role Match● As services are ready to do work, they create an ephemeral, sequential node in the queue.● Any member always knows if either they are in the queue or at the front● Watch lets leader know when it is elected
  • 37. How does this help us scale?● Gives a convenient method of assigning work● Makes monitoring current excess capacity easy
  • 38. Leader Queue ● Candidates register with sequential, ephemeral nodes C_1 ● Candidate sets delete watch on predecessor ● Candidate is elected when it is the smallest node<queue> C_3 ● When elected, candidate takes over its new role ● When ready, candidate removes itself from the queue C_4 ● Only one candidate needs to call get_children upon any node exiting
  • 39. Leader Queue Usage Examplefrom pettingzoo.leader_queue import LeaderQueue, Candidatefrom pettingzoo.utils import connect_to_zkclass SomeCandidate (Candidate): def on_elected (self): <do something sexy>conn = connect_to_zk( localhost:2181 )leaderq = LeaderQueue(conn)leaderq.add_candidate(SomeCandidate())
  • 40. Role MatchAllows systems to expose needed, long livedjobs, and for services to take over those jobsuntil all are filled.● Dbag used to expose jobs● Leader queue used to hold applicants● Records which jobs are presently held with ephemeral node● Lets a new process take over if a worker dies● We use it for sharding/segmentation to dynamically adjust the shards as needed
  • 41. How does this help us scale?● Core of our ability to dynamically adjust shards● Lets the controlling process adjust problem spaces and have those tasks become automatically filled● Monitoring is easy to identify who is working on what, when
  • 42. Role Match job Distributed ● Leader monitors for Bag open jobs ● Job holder creates an ephemeral assignment<match> applicant Leader Queue ● Assignment id matches job id, indicating that it is Assgn claimed assignment 1 A_2
  • 43. Future: Distributed ConfigNext project is Distributed Config.● Allows service config to be recorded and changed with a yaml config● Every process that connects creates a child node of the appropriate service● Any change in a child nodes config overrides the overall service config for that process● Any change of the parent or child fires a watch to let the process know that its config has changed
  • 44. Questions? devon@knewton.comknewt.ly/pettingzoo-slides knewt.ly/pettingzoo
  • 45. Appendix: Official Bindings● State ○ Complete access to the ZooKeeper C bindings ○ Full of sharp edges ○ Not a resilient client ○ No recipes ○ Threads communication with ZooKeeper in a C thread ○ Foundation for most other libraries ○ Very low level
  • 46. Appendix: zkpythonhttps://github.com/duncf/zkpython/improvements to a fork of the official bindings● Status ○ Resilient Client ○ No docs beyond the official bindings ○ Good tests● Recipes ○ Basic Lock (Using unique id rather than UUID)
  • 47. Appendix: pykeeperhttps://github.com/nkvoll/pykeeper● State ○ Non-resilient Client (not resilient to errors) ○ High level client ○ Documented, No doc strings ○ Tests (Requires running ZooKeeper)
  • 48. Appendix: zoophttps://github.com/davidmiller/zoop● State ○ Doesnt handle node create edge-case ○ Doesnt handle retryable exceptions ○ Documented, doc strings ○ Tests (Requires running ZooKeeper)● Recipes ○ Revocable Lock (Doesnt handle create node edge- case, uses a permanent node instead of ephemeral)
  • 49. Appendix: txzookeeperhttps://launchpad.net/txzookeeper● State ○ Resilient Client ○ Doc strings, no additional documentation ○ Supports twisted (only) ○ Well tested● Recipes ○ Basic Lock (Not using UUID) ○ Queue ○ ReliableQueue ○ SerializedQueue
  • 50. Appendix: twitter zookeeperhttps://github.com/twitter/commons/tree/master/src/python/twitter/common/zookeeper● State ○ Resilient Client ○ Handles node create edge-case ○ Some documentation ○ Well Tested ○ Tied to a lot of twitter commons code, difficult to extract● Recipes ○ Service Registration/Discovery
  • 51. Appendix: gevent-zookeeperhttps://github.com/jrydberg/gevent-zookeeper/● State ○ Supports gevent ○ No documentation ○ No tests
  • 52. Appendix: Kazoohttps://github.com/nimbusproject/kazoo● Status ○ Resilient Client ○ Doc strings, no additional documentation ○ Supports gevent ○ Minimal tests● Recipes ○ Basic Lock (Uses UUID properly)
  • 53. Appendix: zc.zkhttps://github.com/python-zk/zc.zk● State ○ Resilient Client ○ High level client ○ Higher level automatic watch functionality ○ Higher level restoration of ephemeral nodes ○ Tests (Mock ZooKeeper) ○ Documentation, no doc strings● Recipes ○ Service Registration/Discovery
  • 54. Appendix: zktoolshttps://github.com/mozilla-services/zktools● State ○ Relies on zc.zk ○ Documented, doc strings ○ Tests (Requires running ZooKeepe)● Recipes ○ Shared Read/Write Locks ○ AsyncLock ○ Revocable Locks
  • 55. Appendix: PettingZoohttps://github.com/Knewton/pettingzoo-python● State ○ Relies on zc.zk ○ Documented, doc strings ○ Tests (mock ZooKeeper) ○ All Recipes implemented in a java version as well● Recipes ○ Distributed Config ○ Distributed Bag ○ Leader Queue ○ Role Match

×