DISTRIBUTED COORDINATION WITH PYTHON
Ben Bangert
Mozilla
Tools of the Trade
DISTRIBUTED COORDINATION IS NOT...
• Distributed Databases (Cassandra, Riak)
• Distributed Computing (Hadoop, etc.)
• Dist...
The Common Element
Apache ZooKeeper
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and prov...
ZOOKEEPER
WHY NOT USE...
• Memcached?
• MongoDB?
• Postgres/MySQL?
Hierarchical data structure in znodes
• Session Based
• Znode watches
• Ephemeral and Sequential Znodes
• Last for duration of client session
• Session dies when connection is closed or expires
• Can’t have children znodes
EPH...
SEQUENTIAL ZNODES
• Supply a node name (or not), get node name back with a trailing sequence number (0001, 0002, 0003, etc...
BASIC COMMANDS
• create(PATH, DATA...)
• get(PATH...)
• get_children(PATH...)
• set(PATH, DATA...)
• delete(PATH...)
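The commands above correspond to methods of the same names on kazoo's `KazooClient`. A minimal sketch, assuming an already started client is passed in; the helper name `roundtrip` and whatever path is passed to it are hypothetical and exist only for illustration:

```python
def roundtrip(client, path, first, second):
    """Run create/get/set/get_children/delete once against `client`.

    `client` is assumed to be a started kazoo.client.KazooClient (or any
    object exposing the same methods). Znode data is always bytes.
    """
    # create(): makepath=True also creates any missing parent znodes.
    client.create(path, first, makepath=True)

    # get() returns a (data, stat) tuple.
    data, stat = client.get(path)

    # set(): passing the version we just read turns the write into a
    # compare-and-set that fails if another client wrote in between.
    client.set(path, second, version=stat.version)
    updated, _ = client.get(path)

    # get_children() returns child names, not full paths.
    parent = path.rsplit("/", 1)[0] or "/"
    children = client.get_children(parent)

    # delete() removes the znode again.
    client.delete(path)
    return data, updated, children
```

The return value holds the data as first written, the data after the update, and the names of the parent's children at that moment.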
PYTHON CLIENTS
• txzookeeper
• kazoo
• unified client that works with gevent
• implements wire protocol in pure Python
USE KAZOO
EASY TO USE
from kazoo.client import KazooClient
client = KazooClient()
client.start()
USE CASES
CONFIGURATION
• Store settings in node data
• Organize node structure
• Set watches on nodes of interest
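One way to combine settings in node data with a watch is kazoo's `DataWatch` recipe, which calls a function with the current data and again on every change. A minimal sketch, assuming the settings are stored as JSON (the talk does not name a format); `decode_settings` and `watch_settings` are hypothetical helper names:

```python
import json


def decode_settings(data):
    """Turn raw znode data into a dict; missing or empty nodes give {}."""
    if not data:
        # DataWatch passes None when the znode does not exist, and a
        # freshly created znode may hold empty bytes.
        return {}
    return json.loads(data.decode("utf-8"))


def watch_settings(client, path, on_change):
    """Call on_change(settings_dict) now and whenever `path` changes.

    `client` is assumed to be a started kazoo.client.KazooClient.
    """
    def _changed(data, stat):
        on_change(decode_settings(data))

    # client.DataWatch re-arms the underlying ZooKeeper watch after each
    # event, so the callback keeps firing for later updates as well.
    return client.DataWatch(path, _changed)
```

Keeping the decoding separate from the watch registration makes the parsing testable without a ZooKeeper server.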
PARTY MEMBERSHIP
• Join a party, find out who else is around
• Elect a leader if desired
• Recipe in Kazoo
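A minimal sketch of both bullets using kazoo's `Party` and `Election` recipes, assuming a started `KazooClient`; the function names, paths, and identifiers are hypothetical:

```python
def join_party(client, path, identifier):
    """Join the party at `path` and report who is currently present."""
    # client.Party is kazoo's party recipe bound to this client.
    party = client.Party(path, identifier)
    party.join()
    # Iterating a party yields the identifiers of its current members.
    return party, sorted(party)


def run_as_leader(client, path, identifier, lead):
    """Block until this process is elected, then call lead().

    Leadership is given up as soon as lead() returns or raises.
    """
    election = client.Election(path, identifier)
    election.run(lead)
```

Membership is tied to the client session, so a member that crashes drops out of the party once its session ends.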
LOCKS
• Lock a resource for a single client
• Lock a resource for multiple clients (Semaphore)
• Hard to write properly
• ...
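A minimal sketch of the single-client and multi-client cases with kazoo's `Lock` and `Semaphore` recipes, assuming a started `KazooClient`; the function names and the default of three leases are hypothetical:

```python
def run_exclusive(client, path, identifier, work):
    """Run work() while holding the lock at `path`."""
    lock = client.Lock(path, identifier)
    # The context manager blocks until the lock is acquired and
    # releases it on exit, even if work() raises.
    with lock:
        return work()


def run_limited(client, path, identifier, work, max_leases=3):
    """Run work() while holding one of `max_leases` semaphore leases."""
    semaphore = client.Semaphore(path, identifier, max_leases=max_leases)
    with semaphore:
        return work()
```

Every client that shares a semaphore path should pass the same `max_leases` value, otherwise they disagree on how many holders are allowed.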
BUILDING HIGHER LEVEL ABSTRACTIONS ON ZOOKEEPER
CAVEAT
DO NOT IMPLEMENT YOURSELF
USE THE RECIPE
BASIC STEPS
• Create lock parent node if needed
• Create ephemeral+sequence node under parent, store node name returned
• ...
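The ordering part of the lock recipe can be sketched without a server. This is for understanding only, in line with the caveat above to use the recipe rather than implement it. It assumes the child names end in ZooKeeper's zero-padded 10-digit sequence counter; the function names and the example names in the comments are hypothetical:

```python
SEQUENCE_DIGITS = 10  # ZooKeeper appends a zero-padded 10-digit counter


def sequence_number(node_name):
    """Extract the trailing sequence number from a sequential znode name."""
    return int(node_name[-SEQUENCE_DIGITS:])


def sort_by_sequence(children):
    """Order lock children by sequence number, lowest first.

    Sorting by the numeric suffix rather than the whole name matters
    when contenders use different name prefixes.
    """
    return sorted(children, key=sequence_number)


def holds_lock(children, my_node):
    """True if `my_node` is first in line among `children`."""
    ordered = sort_by_sequence(children)
    return bool(ordered) and ordered[0] == my_node
```

Here `children` stands for the list returned by `get_children` on the lock parent, and `my_node` for the name returned when the ephemeral sequential node was created.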
THINGS TO WATCH OUT FOR
• Avoid the thundering herd, use watches only when needed
• When our node isn’t the lowest, watch ...
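Choosing which node to watch can be sketched as a pure function. A minimal illustration, assuming child names end in ZooKeeper's 10-digit sequence counter; `node_to_watch` is a hypothetical name and this is not kazoo's implementation:

```python
def node_to_watch(children, my_node):
    """Return the contender directly ahead of `my_node`, or None.

    None means `my_node` has the lowest sequence number and therefore
    holds the lock, so no watch is needed.
    """
    ordered = sorted(children, key=lambda name: int(name[-10:]))
    index = ordered.index(my_node)
    if index == 0:
        return None
    # Watching only the immediate predecessor means a release wakes a
    # single waiter instead of every contender at once.
    return ordered[index - 1]
```

When the watch fires, the waiter should list the children and run the check again, because the predecessor may have gone away without ever holding the lock.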
HANDLING FAILURE
ROBUST CODE TAKES EFFORT
• What happens when a server fails?
• What happens when the client fails?
• What happens when we ...
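These questions map onto the connection states kazoo reports to registered listeners. A minimal sketch, not a reconstruction of the code shown in the talk: it assumes the listener receives the string values of kazoo's `KazooState` constants, and `SessionGuard` is a hypothetical name:

```python
import threading

# String values assumed to match kazoo.client.KazooState's constants.
CONNECTED, SUSPENDED, LOST = "CONNECTED", "SUSPENDED", "LOST"


class SessionGuard:
    """Connection-state listener that records whether it is safe to act."""

    def __init__(self):
        self.connected = threading.Event()  # set only while CONNECTED
        self.expired = threading.Event()    # set once the session is LOST

    def __call__(self, state):
        # The listener runs on the client's connection handling, so it
        # only flips flags and leaves the real work to the main loop.
        if state == CONNECTED:
            self.connected.set()
        elif state == SUSPENDED:
            # We do not know whether the server or the network failed:
            # stop acting on locks and membership until this resolves.
            self.connected.clear()
        elif state == LOST:
            # Session expired: ephemeral nodes and locks are gone.
            self.connected.clear()
            self.expired.set()
```

A guard would be registered with `client.add_listener(guard)` before `client.start()`. The worker loop can then wait on `guard.connected` before touching shared resources and exit the process once `guard.expired` is set.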
STOPPING WHEN UNCERTAIN
A BIT BETTER VERSION...
EVEN BETTER
FAILURE WILL HAPPEN
• Fail fast, fail completely.
• Session expiration is a good time to sys.exit
• Always include jitter ...
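Jitter spreads out retries so that clients which failed together do not all reconnect at the same instant. A minimal sketch of exponential backoff with jitter; it is an illustration rather than kazoo's own retry code, and the function name and default values are assumptions:

```python
import random


def backoff_delays(tries, base=0.1, factor=2.0, max_delay=60.0,
                   max_jitter=0.5, rng=random):
    """Yield `tries` sleep intervals that grow exponentially with jitter."""
    delay = base
    for _ in range(tries):
        # Add a random extra of up to max_jitter * delay so that clients
        # which failed at the same moment do not retry in lockstep.
        jitter = rng.uniform(0.0, max_jitter) * delay
        yield min(delay + jitter, max_delay)
        delay = min(delay * factor, max_delay)
```

A caller would sleep for each yielded interval between attempts and give up, or exit, once the generator is exhausted.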
• Distributed systems are hard
• Use existing battle-proven tools (ZooKeeper, Kazoo)
• Always consider everything that can...
FIN
QUESTIONS?

Distributed Coordination with Python

This talk covers why Apache ZooKeeper is a good fit for coordinating processes in a distributed environment, earlier attempts at a Python client and the current state-of-the-art Python client library, how merging several Python client libraries into one has paid off, the features available to Python processes, and how to gracefully handle failures in a set of distributed processes.

Published in: Technology

  1. DISTRIBUTED COORDINATION WITH PYTHON
     Ben Bangert, Mozilla
  2. Tools of the Trade
  3. DISTRIBUTED COORDINATION IS NOT...
     • Distributed Databases (Cassandra, Riak)
     • Distributed Computing (Hadoop, etc.)
     • Distributed Event Analysis (Storm)
  4. The Common Element
  5. Apache ZooKeeper
  6. ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
  7. ZOOKEEPER
  8. WHY NOT USE...
     • Memcached?
     • MongoDB?
     • Postgres/MySQL?
  9. Hierarchical data structure in znodes
  10. • Session Based
      • Znode watches
      • Ephemeral and Sequential Znodes
  11. EPHEMERAL ZNODES
      • Last for duration of client session
      • Session dies when connection is closed or expires
      • Can’t have children znodes
  12. SEQUENTIAL ZNODES
      • Supply a node name (or not), get node name back with a trailing sequence number (0001, 0002, 0003, etc.)
      • Can be combined with ephemeral flag
  13. BASIC COMMANDS
      • create(PATH, DATA...)
      • get(PATH...)
      • get_children(PATH...)
      • set(PATH, DATA...)
      • delete(PATH...)
  14. PYTHON CLIENTS
      • txzookeeper
      • kazoo
      • unified client that works with gevent
      • implements wire protocol in pure Python
  15. USE KAZOO
  16. EASY TO USE
      from kazoo.client import KazooClient
      client = KazooClient()
      client.start()
  17. USE CASES
  18. CONFIGURATION
      • Store settings in node data
      • Organize node structure
      • Set watches on nodes of interest
  19. PARTY MEMBERSHIP
      • Join a party, find out who else is around
      • Elect a leader if desired
      • Recipe in Kazoo
  20. LOCKS
      • Lock a resource for a single client
      • Lock a resource for multiple clients (Semaphore)
      • Hard to write properly
      • Recipe in Kazoo
  21. BUILDING HIGHER LEVEL ABSTRACTIONS ON ZOOKEEPER
  22. CAVEAT
  23. DO NOT IMPLEMENT YOURSELF
      USE THE RECIPE
  24. BASIC STEPS
      • Create lock parent node if needed
      • Create ephemeral+sequence node under parent, store node name returned
      • Get children of lock node
      • Sort children list by sequence number
      • First child in the list has the lock!
  25. THINGS TO WATCH OUT FOR
      • Avoid the thundering herd, use watches only when needed
      • When our node isn’t the lowest, watch the one in front of us
      • Only one client wanting a lock is ‘woken’ when the lock is released by a different client
  26. HANDLING FAILURE
  27. ROBUST CODE TAKES EFFORT
      • What happens when a server fails?
      • What happens when the client fails?
      • What happens when we don’t know if the server has failed?
  28. STOPPING WHEN UNCERTAIN
  29. A BIT BETTER VERSION...
  30. EVEN BETTER
  31. FAILURE WILL HAPPEN
      • Fail fast, fail completely.
      • Session expiration is a good time to sys.exit
      • Always include jitter (kazoo includes jitter on its connection and command retry operations)
      • Consider what exceptions can occur in any code relying on a distributed system
  32. • Distributed systems are hard
      • Use existing battle-proven tools (ZooKeeper, Kazoo)
      • Always consider everything that can fail, and how
      • Be wary of tools that don’t tell you how they fail
      • Read Kyle Kingsbury’s Jepsen posts to see examples of systems failing: http://aphyr.com/tags/jepsen
  33. FIN
  34. QUESTIONS?