  1. Patterns in Distributed Computing. Mike Perham, http://mikeperham.com
  2. Me? DataFabric
  3. seti@home, distributed.net, folding@home
  4. They have nothing to do with us.
  5. Say Hello to Production: App1, App2, App3, DB
  6. You have a Recurring Task
  7. Cron? Which machine?
  8. Add a dedicated slice? App1, App2, App3, Jobs, DB
  9. Problem: Scaling past One Machine. App1, App2, App3, DB
  10. Assumption: Our solution is a redundant set of processes
  11. Let’s get Formal
  12. Difficulties: Asynchrony, Locality, Failure, Byzantine?
  13. Hurdle: Group Membership. App1, App2, App3
  14. Group Membership. Truly anonymous membership is impossible. Algorithm scalability? An all-to-all exchange grows as n² messages: 10 nodes = 100 messages = 20 KB; 1,000 nodes = 1,000,000 messages = 200 MB! Can’t run often!
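      To make the slide’s arithmetic concrete, here is a tiny illustrative sketch. The ~200 bytes per message figure is an assumption back-derived from the slide’s own numbers (100 messages ≈ 20 KB).

        # All-to-all membership exchange: every node messages every node, so cost grows as n².
        BYTES_PER_MESSAGE = 200  # assumed; implied by 100 messages ≈ 20 KB on the slide

        def membership_cost(nodes)
          messages  = nodes * nodes
          kilobytes = messages * BYTES_PER_MESSAGE / 1000.0
          format("%d nodes => %d messages => %.0f KB", nodes, messages, kilobytes)
        end

        puts membership_cost(10)     # 10 nodes => 100 messages => 20 KB
        puts membership_cost(1_000)  # 1000 nodes => 1000000 messages => 200000 KB (200 MB)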
  15. Now that everyone is in the same room...
  16. Hurdle: Consensus. “The process of agreeing on one result among a group of participants.” (Wikipedia)
  17. Leader Election
  18. Leader Election
  19. Leader Election: Breaking symmetry. Performance vs. Reliability.
  20. Consensus: Once we have consensus, everything else follows. But at what cost?
  21. Cold Reality: This stuff is unpredictable, hard to test, full of nasty edge cases.
  22. So what do we do?
  23. Get Real! Start making trade-offs.
  24. What are your actual reliability requirements?
  25. Reliability Scale
  26. Reliability Scale: Formally Correct Code
  27. Reliability Scale: Formally Correct Code ... My Code
  28. Reliability Scale: Formally Correct Code ... My Code (and probably yours too)
  29. Reliability Scale: Formally Correct Code ... Memcached ... My Code (and probably yours too)
  30. Single Point of Failure? Oh noes!
  31. Politics: http://github.com/mperham/politics. A Ruby library providing utilities and algorithms for solving common distributed computing problems.
  32. Problem: The database is frequently scanned for events of interest; we don’t want to create duplicate events.
  33. TokenWorker: Leader Election via Memcached
  34. TokenWorker Notes: M processes; 1 becomes leader for the given time period and performs the work. Fault tolerant, not scalable. As reliable as memcached. The trick is the memcached::add API.
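      The “add” trick can be sketched outside the library. Below is a minimal, illustrative version of the idea, not the Politics implementation; it assumes the dalli memcached client, whose Dalli::Client#add succeeds only when the key is not already set. The library’s own usage example follows on the next slide.

        require 'dalli'

        # Minimal memcached-based leader election: memcached's "add" only succeeds
        # when the key does not already exist, so exactly one process wins the
        # token per iteration. If the leader dies, the key expires and the next
        # round elects a new leader.
        class IterationLeader
          def initialize(cache, token_key, iteration_length)
            @cache = cache                        # e.g. Dalli::Client.new('localhost:11211')
            @token_key = token_key                # key shared by all candidate processes
            @iteration_length = iteration_length  # seconds the token (and leadership) lasts
          end

          # True only for the single process that grabbed the token this iteration.
          def leader?
            @cache.add(@token_key, Process.pid, @iteration_length)
          end

          def run
            loop do
              puts "PID #{Process.pid} is the leader, doing the work" if leader?
              sleep @iteration_length
            end
          end
        end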
  35. TokenWorker example:

        module Aggregator
          class Engine
            include Politics::TokenWorker

            def initialize
              register_worker 'dash-aggregator', :iteration_length => 60.seconds,
                              :servers => memcached_servers
            end

            def start
              process do
                MetricAggregator.new.aggregate
              end
            end
          end
        end
  36. See TokenWorker ScreenCast
  37. Problem: Have a known space to scan regularly, divisible into N parts. Have 12 databases which need to be scanned for events every two minutes.
  38. StaticQueueWorker: Peer-to-Peer Work Coordination
  39. StaticQueueWorker Notes: Work is divided into N buckets. M-1 processes can be working on those N buckets concurrently. Scalable and fault tolerant. Peers discover each other via Bonjour and communicate via DRb.
  40. StaticQueueWorker example:

        module Politics
          class QueueWorkerExample
            include Politics::StaticQueueWorker
            TOTAL_BUCKETS = 20

            def initialize
              register_worker 'queue-example', TOTAL_BUCKETS, :iteration_length => 60,
                              :servers => memcached_servers
            end

            def start
              process_bucket do |bucket|
                puts "PID #{$$} processing bucket #{bucket}/#{TOTAL_BUCKETS} at #{Time.now}..."
                sleep 1.5
              end
            end
          end
        end
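      Connecting this back to the problem on slide 37, a hypothetical adaptation might use one bucket per database. DATABASE_URLS and scan_for_events below are invented for illustration; register_worker, process_bucket, and memcached_servers mirror the example above.

        module Aggregator
          class DatabaseScanner
            include Politics::StaticQueueWorker

            # Hypothetical list of the 12 databases from slide 37.
            DATABASE_URLS = (1..12).map { |i| "postgres://db#{i}.internal/events" }

            def initialize
              # One bucket per database, re-scanned every two minutes.
              register_worker 'db-scanner', DATABASE_URLS.size,
                              :iteration_length => 120, :servers => memcached_servers
            end

            def start
              process_bucket do |bucket|
                scan_for_events(DATABASE_URLS[bucket])  # hypothetical helper that scans one database
              end
            end
          end
        end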
  41. See StaticQueueWorker ScreenCast
  42. What if you need the good stuff, with no SPOF?
  43. Paxos: “A family of protocols for solving consensus in a network of unreliable processors.” (Wikipedia)
  44. Paxos: The Holy Grail of Distributed Algorithms. Google has spent tens of man-years on their C++ implementation. See “Paxos Made Live”.
  45. Paxos, Phase 1 (Prepare): proposer C, acceptors A and B.
  46. Paxos, Phase 1 (Prepare): C sends Prepare(24) to A and B.
  47. Paxos, Phase 1 (Prepare): A and B reply “Ok” to Prepare(24).
  48. Paxos, Phase 2 (Accept): C sends Accept(24/A) to A and B.
  49. Paxos, Phase 2 (Accept): A and B reply “Ok” to Accept(24/A).
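      A minimal happy-path sketch of the acceptor side of the exchange above, for illustration only; it ignores competing proposers, persistence, and the failure recovery the next slide warns about.

        # Happy-path Paxos acceptor for the Prepare/Accept exchange on slides 45-49.
        class Acceptor
          def initialize
            @promised = 0    # highest proposal number we have promised to honor
            @accepted = nil  # [number, value] of the last proposal we accepted, if any
          end

          # Phase 1: prepare(24) => promise to ignore anything numbered lower.
          def prepare(number)
            if number > @promised
              @promised = number
              { ok: true, accepted: @accepted }   # the "Ok" replies on slide 47
            else
              { ok: false }
            end
          end

          # Phase 2: accept(24, "A") => accept unless we promised a higher number.
          def accept(number, value)
            if number >= @promised
              @promised = number
              @accepted = [number, value]
              { ok: true }                        # the "Ok" replies on slide 49
            else
              { ok: false }
            end
          end
        end

        acceptor = Acceptor.new
        acceptor.prepare(24)      # => {:ok=>true, :accepted=>nil}
        acceptor.accept(24, "A")  # => {:ok=>true}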
  50. Paxos Notes: The happy path is easy. Recovering from failures is hard. Slow, because we need consensus. Started implementing, quickly stopped.
  51. The Future: Plan to continue work on Paxos, if there is interest. Other ideas for utilities around distributed computing themes? Contact me.
  52. Summary: Formal: Group Membership and Consensus. Informal: TokenWorker and StaticQueueWorker. Paxos.
  53. EOF
  54. github.com/mperham/politics, mikeperham.com, mperham@gmail.com
