How to Run from a Zombie: CloudStack Distributed Process Management

Exploration of CloudStack's distributed process management requirements and the challenges they present in the context of CAP theorem. These challenges will be addressed through a distributed process model that emphasizes efficiency, fault tolerance, and operational transparency.

Published in: Technology, Business

  1. HOW TO RUN FROM A ZOMBIE: CLOUDSTACK DISTRIBUTED PROCESS MANAGEMENT
     John Burwell (jburwell@apache.org | jburwell@basho.com | @john_burwell)
     Tuesday, June 25, 13
  2. I Am Not A Zombie
     • Apache CloudStack PMC Member
     • Consulting Engineer @ Basho Technologies
     • Ran operations and designed automated provisioning for hybrid analytic/virtualization clouds
     • Led architectural design and server-side development of a SaaS physical security platform
  3. Current Process Management
     • No consistent system-wide model
     • Fail slowly, fail quietly
     • Resource overcommitment issues
     • Lack of instrumentation
  4. What is a cloud?
  6. Hopefully not ...
  10. Hosts, Virtual Routers, Virtual Machines, Primary Storage, Networks, Secondary Storage, Load Balancers, Zone, Cluster, Pod
  12. Diagram labels: Resource, Process, State, Partition, Orchestration. Callout: A “thing” with a bounded capacity
  18. At its core, CloudStack ...
      • Integrates infrastructure components
      • Manages resources
  20. Consistency, Availability, Partition Tolerance: PICK 2
  22. CloudStack provides zones, clusters, and pods to partition resources.
  23. Orchestration operations are eventually consistent
  25. ... but resource operations must be consistent (serialized).
  27. A system cannot be simultaneously consistent and available.
  28. Orchestration Processes: AP. Resource Management Processes: CP.
  32. CP Resource?
      • Ordered/Serialized operations
      • Prevent overcommitment
      • Execution location independent
      • Lock free
  33. Orchestration Coordination
      1. Build a list of commands to be executed against a resource
      2. Enqueue the list of commands to the resource management layer for execution
      3. A process applies the commands to the resource
      4. Aggregate the results from the reply
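
The four coordination steps on slide 33 could be sketched roughly as follows. This is a minimal single-process Python illustration, not CloudStack code: `ResourceManager`, `register`, `submit`, `reserve`, and the dictionary used as resource state are all hypothetical names chosen for the example, and an in-memory `queue.Queue` stands in for whatever transport a real deployment would use.

```python
import queue
import threading


class ResourceManager:
    """Hypothetical resource management layer: one queue and one worker per resource."""

    def __init__(self):
        self._queues = {}

    def register(self, resource_id, state):
        q = queue.Queue()
        self._queues[resource_id] = q
        # A single dedicated worker is the only consumer of this resource's queue.
        threading.Thread(target=self._run, args=(q, state), daemon=True).start()

    def _run(self, q, state):
        while True:
            commands, reply = q.get()
            # Step 3: a process applies the commands to the resource, in order.
            reply.put([command(state) for command in commands])

    def submit(self, resource_id, commands):
        # Step 2: enqueue the list of commands for the target resource.
        reply = queue.Queue(maxsize=1)
        self._queues[resource_id].put((commands, reply))
        return reply


def reserve(amount):
    """Build a command that reserves `amount` units of capacity (illustrative only)."""
    def command(state):
        state["used"] += amount
        return state["used"]
    return command


def orchestrate(manager, resource_id):
    # Step 1: build the list of commands to execute against one resource.
    commands = [reserve(2), reserve(3)]
    reply = manager.submit(resource_id, commands)
    # Step 4: aggregate the results from the reply.
    results = reply.get(timeout=5)
    return len(results), results[-1]


manager = ResourceManager()
manager.register("host-1", {"used": 0})
print(orchestrate(manager, "host-1"))  # (2, 5)
```

The orchestration side never touches the resource state directly; it only builds commands and waits on a reply, which is what keeps the resource's operations serialized in this sketch.
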
  34. Diagram labels: Resource, Process, State, Queue, Unit of Work, Exclusive Consumer (relationships marked 1:1)
  37. Unit Of Work (UoW)
      • Definition: An ordered list of commands executed against one and only one resource.
      • Created in the Orchestration layer
      • Executed by processes in the resource management layer
      • Failure of a command halts UoW execution
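
The Unit of Work definition on slide 37 might look something like the sketch below. It is only an illustration under assumptions: the `UnitOfWork` class, its fields, and the `add_vm` and `fail` commands are hypothetical, and commands are modeled as plain callables that take the resource state.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class UnitOfWork:
    """An ordered list of commands bound to exactly one resource (hypothetical shape)."""
    resource_id: str
    commands: List[Callable[[dict], object]]
    results: list = field(default_factory=list)
    error: Optional[Exception] = None

    def execute(self, state):
        """Run the commands in order; the first failure halts the whole UoW."""
        for command in self.commands:
            try:
                self.results.append(command(state))
            except Exception as exc:
                self.error = exc
                return False
        return True


def add_vm(state):
    state["vms"] += 1
    return state["vms"]


def fail(state):
    raise RuntimeError("out of capacity")


state = {"vms": 0}
uow = UnitOfWork("host-1", [add_vm, fail, add_vm])
ok = uow.execute(state)
print(ok, uow.results, state["vms"])  # False [1] 1
```

The third command never runs because the second one failed, which matches the "failure of a command halts UoW execution" rule.
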
  38. Instrumentation
      • Collect and report statistics on a per resource basis
      • Inspect and remove pending UoWs for a resource
      • Kill a running process
      • View a history of UoWs completed by a resource
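
Most of the instrumentation hooks listed on slide 38 fall out naturally once each resource owns its queue. The sketch below is a hypothetical single-threaded illustration (`InstrumentedResource` and all of its methods are invented names); it covers statistics, pending-UoW inspection and removal, and history, but does not attempt to show killing a running process.

```python
import time
from collections import deque


class InstrumentedResource:
    """Hypothetical per-resource queue with basic counters and a bounded history."""

    def __init__(self, history_size=100):
        self.pending = deque()                    # (uow_id, callable) pairs
        self.history = deque(maxlen=history_size) # (uow_id, outcome, seconds)
        self.completed = 0
        self.failed = 0

    def enqueue(self, uow_id, work):
        self.pending.append((uow_id, work))

    def pending_ids(self):
        """Inspect pending UoWs without consuming them."""
        return [uow_id for uow_id, _ in self.pending]

    def remove_pending(self, uow_id):
        """Remove a pending UoW; returns True if something was removed."""
        before = len(self.pending)
        self.pending = deque(item for item in self.pending if item[0] != uow_id)
        return len(self.pending) < before

    def run_next(self):
        uow_id, work = self.pending.popleft()
        started = time.monotonic()
        try:
            work()
            outcome = "ok"
            self.completed += 1
        except Exception:
            outcome = "failed"
            self.failed += 1
        self.history.append((uow_id, outcome, time.monotonic() - started))

    def stats(self):
        return {
            "pending": len(self.pending),
            "completed": self.completed,
            "failed": self.failed,
        }
```

A caller could, for example, enqueue three UoWs, call `remove_pending` on one of them, run the remaining two, and then read `stats()` and `history` to see what the resource actually did.
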
  39. When Gravity Fails
      • Process execution fails
      • Resources become unavailable
      • Slow consumers
  40. Fail Fast; Fail Loudly
      • If the resource can be returned to a consistent state, reply with the process failure
      • If the resource cannot be returned to a consistent state, transition the resource to a failure state, drain the queue of pending UoWs, and reply with the process failure for each UoW
      • The orchestration layer will determine the appropriate recovery strategy (e.g. retry request on another resource)
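
The two failure paths on slide 40 can be illustrated with a small sketch. Everything here is an assumption made for the example: `UnrecoverableError` is a hypothetical marker for "the resource could not be returned to a consistent state", the state names are invented, and replies are simply collected in a list rather than sent to a real orchestration layer.

```python
from collections import deque


class UnrecoverableError(Exception):
    """Hypothetical marker: the resource could not be returned to a consistent state."""


class ResourceProcess:
    def __init__(self):
        self.state = "available"
        self.pending = deque()  # pending units of work as (name, callable) pairs
        self.replies = []       # (name, outcome) pairs returned to orchestration

    def enqueue(self, name, work):
        self.pending.append((name, work))

    def run_next(self):
        name, work = self.pending.popleft()
        try:
            self.replies.append((name, ("ok", work())))
        except UnrecoverableError as exc:
            # Inconsistent resource: move to a failure state, then drain the
            # queue and reply with a failure for every pending UoW.
            self.state = "failed"
            self.replies.append((name, ("failed", str(exc))))
            while self.pending:
                queued_name, _ = self.pending.popleft()
                self.replies.append((queued_name, ("failed", "resource failed")))
        except Exception as exc:
            # Consistent state restored: only this UoW fails, the resource stays up.
            self.replies.append((name, ("failed", str(exc))))
```

Choosing a recovery strategy, such as retrying on another resource, is deliberately left out; per the slide, that decision belongs to the orchestration layer, which only needs the failure replies.
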
  41. Preventing A Logjam
      • Bounded Queues
      • Request and Message Timeouts
      • A failure to enqueue a request or a request timeout triggers the resource’s circuit breaker
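
A bounded queue combined with a circuit breaker, as described on slide 41, might be sketched like this. `GuardedQueue`, `ResourceUnavailable`, `offer`, and `reset` are hypothetical names; the sketch only covers the "failure to enqueue" trigger, and a request timeout would trip the same flag in a fuller version.

```python
import queue


class ResourceUnavailable(Exception):
    """Raised to the caller when the resource is not accepting work (hypothetical)."""


class GuardedQueue:
    def __init__(self, capacity, enqueue_timeout):
        self._queue = queue.Queue(maxsize=capacity)  # bounded queue
        self._timeout = enqueue_timeout
        self.breaker_open = False

    def offer(self, uow):
        if self.breaker_open:
            # Fail fast: do not even try to enqueue while the breaker is open.
            raise ResourceUnavailable("circuit breaker is open")
        try:
            self._queue.put(uow, timeout=self._timeout)
        except queue.Full:
            # Could not enqueue within the timeout: trip the breaker.
            self.breaker_open = True
            raise ResourceUnavailable("queue full; circuit breaker tripped") from None

    def reset(self):
        """Close the breaker again, e.g. after the resource has recovered."""
        self.breaker_open = False
```

With `GuardedQueue(capacity=1, enqueue_timeout=0.01)`, a second `offer` made while nothing is consuming the queue raises `ResourceUnavailable`, and later offers fail immediately until `reset()` is called.
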
  42. How could we implement this model?
  43. Lightweight Threads
      A thread that is not scheduled by the operating system, avoiding context switch overhead.
  44. Actor Model
      • An actor represents state and behavior
      • Communicate by message passing
      • Each actor is allocated a lightweight thread and mailbox
      • Location independent
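
The actor model on slide 44 can be illustrated with a toy implementation. This sketch is an assumption-laden stand-in, not how Akka or Quasar work internally: `Actor`, `tell`, `receive`, and `CounterActor` are hypothetical names, and it uses ordinary operating-system threads from Python's `threading` module rather than lightweight threads, so it shows the mailbox and message-passing structure but not the scalability benefit.

```python
import queue
import threading

_STOP = object()  # sentinel message used to shut an actor down


class Actor:
    """Toy actor: private state, a mailbox, and one thread draining the mailbox."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def tell(self, message, reply_to=None):
        """Send a message; callers never touch the actor's state directly."""
        self._mailbox.put((message, reply_to))

    def stop(self):
        self._mailbox.put((_STOP, None))
        self._thread.join()

    def _loop(self):
        while True:
            message, reply_to = self._mailbox.get()
            if message is _STOP:
                return
            result = self.receive(message)
            if reply_to is not None:
                reply_to.put(result)

    def receive(self, message):
        raise NotImplementedError


class CounterActor(Actor):
    def __init__(self):
        self.count = 0      # state must exist before the mailbox thread starts
        super().__init__()

    def receive(self, message):
        if message == "increment":
            self.count += 1
        return self.count


actor = CounterActor()
replies = queue.Queue()
for _ in range(3):
    actor.tell("increment", replies)
print([replies.get(timeout=5) for _ in range(3)])  # [1, 2, 3]
actor.stop()
```

Because only the actor's own thread ever reads the mailbox, messages are processed one at a time in arrival order, which gives the serialization the talk asks for without explicit locks in user code.
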
  45. Diagram labels: Orchestration, Unit of Work, Mailbox, Resource Actor, FSM
  48. Java Actor Frameworks
      • Akka (http://akka.io)
      • Quasar (https://github.com/puniverse/quasar)
  49. Summary
      • Orchestration and Resource Management must be properly divided to satisfy CAP
      • To provide resource serialization guarantees, assign a queue and a process to each resource
      • Fail fast, fail loudly
      • An Actor Model based on lightweight threads may provide the scalability required to dedicate a queue and process per resource
  50. Thoughts? Questions?
  51. Thank you! Slides available @ http://speakerdeck.com/jburwell