Diego: Re-envisioning the Elastic Runtime (Cloud Foundry Summit 2014)

Keynote delivered by Onsi Fakhouri, Engineering Manager at Pivotal.

Diego is a ground-up rewrite of the DEA (Droplet Execution Agent), a major component of the Cloud Foundry Elastic Runtime. This talk motivates the need for Diego, explains the philosophy behind it, and presents a few choice technical details to illustrate some of the more interesting ideas we've been playing with.

Published in: Technology, Business

  1. 1. Onsi Fakhouri DIEGO Elastic Runtime 2.0 TECHNICAL
  2. 2. What? Why? Show me… The future DIEGO Elastic Runtime 2.0
  3. 3. DIEGO Elastic Runtime 2.0 What? Why? Show me… The future
  4. 4. Cloud Controller What is being rewritten? Stage App Run n App Instances (and keep them running) http://… Push App > cf Route to App
  5. 5. DEA Pool (Droplet Execution Agent) What is being rewritten? http://… Push App > cf Cloud Controller Router (API)
  6. 6. What is being rewritten? http://… Push App > cf Cloud Controller Router DEA Pool (Droplet Execution Agent) (API)
  7. 7. What is being rewritten? http://… Push App > cf Cloud Controller Router DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps (API)
  8. 8. What is being rewritten? http://… Push App > cf Cloud Controller Router DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization (API)
  9. 9. What is being rewritten? http://… Push App > cf Cloud Controller Router DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization Health Manager (API)
  10. 10. What is being rewritten? Push App http://… > cf Cloud Controller Router Health Manager DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization NATS (message bus) (API)
  11. 11. What is being rewritten? Push App http://… > cf Cloud Controller Router Health Manager DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization NATS (message bus) (API)
  12. 12. What? Why? Show me… The future DIEGO Elastic Runtime 2.0
  13. 13. Why rewrite? Push App http://… > cf Cloud Controller Router Health Manager NATS (message bus) DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization
  14. 14. Why rewrite? Hard to add new features Hard to maintain existing features Why?
  15. 15. Why rewrite? Cloud Controller Router Health Manager NATS (message bus) DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization Tight Coupling Poor separation of concerns Orchestration
  16. 16. Why rewrite? Tight Coupling Poor separation of concerns Orchestration > cf scale
  17. 17. Why rewrite? Tight Coupling Poor separation of concerns Orchestration Cloud Controller > cf scale
  18. 18. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale “Make it so”
  19. 19. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale start/stop
  20. 20. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden
  21. 21. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden
  22. 22. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden
  23. 23. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden start start
  24. 24. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden start start
  25. 25. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden start start
  26. 26. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden start fails
  27. 27. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden start fails
  28. 28. Why rewrite? Cloud Controller Tight Coupling Poor separation of concerns Orchestration > cf scale DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden start fails Too much responsibility
  29. 29. Why rewrite? Tight Coupling Poor separation of concerns Cloud Controller DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden DEA Warden Triangular Dependencies
  30. 30. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden DEA Warden Cloud Controller DEA Warden
  31. 31. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden DEA Warden Cloud Controller DEA Warden When it’s time to upgrade the DEAs
  32. 32. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden DEA Warden Cloud Controller DEA Warden When it’s time to upgrade the DEAs we perform a rolling deploy
  33. 33. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden DEA Warden DEA Warden Cloud Controller
  34. 34. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden DEA Warden DEA Warden Cloud Controller “bye!”
  35. 35. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller “bye!” DEA Warden DEA Warden
  36. 36. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden
  37. 37. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start!
  38. 38. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start!
  39. 39. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear!
  40. 40. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear!
  41. 41. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear! Problematic
  42. 42. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear! Problematic
  43. 43. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear! Problematic ?? ??
  44. 44. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear! Problematic
  45. 45. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear! Problematic
  46. 46. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden start! all clear! Problematic
  47. 47. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden all clear! Problematic start!
  48. 48. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden all clear! Problematic start!
  49. 49. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Health Manager DEA Warden DEA Warden Cloud Controller start! “bye!” DEA Warden DEA Warden all clear! Problematic start!
  50. 50. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Orchestration
  51. 51. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Orchestration complex interactions
  52. 52. Why rewrite? Tight Coupling Poor separation of concerns Triangular Dependencies Orchestration hard to test complex interactions
  53. 53. Why rewrite? Tight Coupling Poor separation of concerns hard to test hard to reason through complex interactions Triangular Dependencies Orchestration
  54. 54. Why rewrite? Domain Specific (app, app, app, app)
  55. 55. Why rewrite? Domain Specific (app, app, app, app) Push App http://… > cf Cloud Controller Router Health Manager NATS (message bus) DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization App
  56. 56. Push App http://… > cf Cloud Controller Router Health Manager NATS (message bus) DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization App Why rewrite? Domain Specific (app, app, app, app) App App Apps Apps App App App App App App App App App App App App App App App App App App
  57. 57. Why rewrite? Domain Specific (app, app, app, app) Hard to extend to new domains (e.g. cron-like jobs) Push App http://… > cf Cloud Controller Router Health Manager NATS (message bus) DEA Pool (Droplet Execution Agent) DEA Staging Apps Running Apps Warden Containerization App App App Apps Apps App App App App App App App App App App App App App App App App App App
  58. 58. Why rewrite? Platform Specific
  59. 59. DEA Staging Apps Running Apps Warden Containerization Why rewrite? Platform Specific
  60. 60. DEA Staging Apps Running Apps Warden Containerization Why rewrite? Platform Specific
  61. 61. DEA Staging Apps Running Apps Warden Containerization Why rewrite? Platform Specific DEA Staging Apps Running Apps Warden Containerization
  62. 62. DEA Staging Apps Running Apps Warden Containerization DEA Staging Apps Running Apps Warden Containerization Why rewrite? Platform Specific
  63. 63. DEA Staging Apps Running Apps Warden Containerization DEA Staging Apps Running Apps Warden Containerization Why rewrite? Platform Specific hard to maintain
  64. 64. DEA Staging Apps Running Apps Warden Containerization Why rewrite? Long-lived processes Tons of concurrency Low-level OS interactions
  65. 65. Why rewrite? Platform Specific Domain Specific (app, app, app, app) Tight Coupling Poor separation of concerns Orchestration Triangular Dependencies Hard to add new features Hard to maintain existing features
  66. 66. What? Why? Show me… The future DIEGO Elastic Runtime 2.0
  67. 67. Show me Diego Strong concurrency support Written in Golang Strongly typed Explicit error handling Promotes developer discipline Strong low-level OS support
  68. 68. Show me Diego Domain Specific (app, app, app, app) One-off Tasks (guaranteed to only run once) Long Running Processes (n monitored instances) The Right(?) Abstraction
  69. 69. Cloud Controller Show me Diego The Right(?) Abstraction
  70. 70. Cloud Controller Show me Diego The Right(?) Abstraction Executor Pool Run Tasks Launch Long Running Processes
  71. 71. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction Run Tasks Launch Long Running Processes Stager Stage App Run Task
  72. 72. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction Run Tasks Launch Long Running Processes App-Manager Run App Launch LRP Stager Stage App Run Task
  73. 73. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction App-Manager Run App Launch LRP Run Tasks Launch Long Running Processes Stager Stage App Run Task Express specific domain
  74. 74. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction App-Manager Launch LRP Run Tasks Launch Long Running Processes Stager Run Task Express specific domain In terms of generic recipes Run App Stage App
  75. 75. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction App-Manager Stager Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Launch LRP Run Task Run App Stage App
  76. 76. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction App-Manager Stager Express specific domain In terms of generic recipes Exec Recipes Exec Run Tasks Launch LRPs Rep Launch LRP Run Task Run App Stage App
  77. 77. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction App-Manager Stager Express specific domain In terms of generic recipes Exec Recipes Exec Garden Manage Containers Run Tasks Launch LRPs Rep Launch LRP Run Task Run App Stage App
  78. 78. Cloud Controller Executor Pool Show me Diego The Right(?) Abstraction App-Manager Stager Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers Launch LRP Run Task Run App Stage App
  79. 79. Cloud Controller Executor Pool Show me Diego App-Manager Stager Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers Generic Specific Launch LRP Run Task Run App Stage App
  80. 80. Cloud Controller Executor Pool Show me Diego App-Manager Stager Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers Generic Specific Launch LRP Run Task Run App Stage App New features go here! (e.g. cron-like tasks)
  81. 81. Cloud Controller Executor Pool Show me Diego App-Manager Stager Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers Generic Specific Flexibility Launch LRP Run Task Run App Stage App New features go here! (e.g. cron-like tasks)
  82. 82. Show me Diego Platform Specific
  83. 83. Show me Diego Platform Independent ✓ Cloud Controller Executor Pool App-Manager Run App Launch LRP Stager Stage App Run Task Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers
  84. 84. Cloud Controller Executor Pool App-Manager Run App Launch LRP Stager Stage App Run Task Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers Show me Diego Platform Independent ✓ ✓ ✓ ✓ ✓ ✓ ✓
  85. 85. Cloud Controller Executor Pool App-Manager Run App Launch LRP Stager Stage App Run Task Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers Show me Diego ✓ ✓ ✓ ✓ ✓ ✓ Platform Independent ✓
  86. 86. Show me Diego Linux Backend Run Containers Win Backend Run Containers Just 2 Things: Platform Independent ✓
  87. 87. Show me Diego Linux Backend Run Containers Win Backend Run Containers Just 2 Things: Platform Independent ✓
  88. 88. Tight Coupling Poor separation of concerns Orchestration Triangular Dependencies Show me Diego
  89. 89. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Orchestration
  90. 90. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Start! Start! Stop! Orchestration
  91. 91. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 Orchestration
  92. 92. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 Hold auctions… Orchestration
  93. 93. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 Hold auctions… … to distribute LRPs Orchestration
  94. 94. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 Hold auctions… … to distribute LRPs
  95. 95. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 Hold auctions… … to distribute LRPs Triangular Dependencies
  96. 96. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 Triangular Dependencies self managing monitoring healing
  97. 97. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 self managing monitoring healing Triangular Dependencies
  98. 98. Health Manager Cloud Controller Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec Want 3 self managing monitoring healing eventually consistent Triangular Dependencies
  99. 99. Show me Diego Cloud Controller Rep Exec Rep Exec Rep Exec Rep Exec Want 3 self managing monitoring healing eventually consistent
  100. 100. Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec self managing monitoring healing eventually consistent robust Cloud Controller Want 3
  101. 101. Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec but…
  102. 102. Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec distributed auction is complex emergent behavior
  103. 103. Show me Diego Rep Exec Rep Exec Rep Exec Rep Exec distributed auction is complex emergent behavior Simulation-Driven Development
  104. 104. Show me Diego complex interactions hard to test hard to reason through
  105. 105. Show me Diego simulation driven complex interactions hard to test hard to reason through
  106. 106. complex interactions hard to test hard to reason through Show me Diego simulation driven Cloud Controller Executor Pool App-Manager Run App Launch LRP Stager Stage App Run Task Express specific domain In terms of generic recipes Run Tasks Launch LRPs Rep Exec Recipes Exec Garden Manage Containers Linux Backend Run Containers
  107. 107. Show me Diego executor rep stager 14 small single-responsibility components! app-manager auctioneer converger etcd-metrics-server etcd file-server garden linux-circus metricz route-emitter tps simulation driven complex interactions hard to test hard to reason through
  108. 108. Show me Diego executor rep stager app-manager auctioneer converger etcd-metrics-server etcd file-server garden linux-circus metricz route-emitter tps ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓✓ ✓ ✓ ✓ unit-tested✓ simulation driven complex interactions hard to test hard to reason through
  109. 109. Show me Diego executor rep stager app-manager auctioneer converger etcd-metrics-server etcd file-server garden linux-circus metricz route-emitter tps ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓✓ ✓ ✓ ✓ ?unit-tested✓ simulation driven complex interactions hard to test hard to reason through
  110. 110. Show me Diego rep✓ garden ✓ linux-circus✓ auctioneer✓ metricz✓ route-emitter✓ stager✓ app-manager✓ executor✓ file-server ✓ tps✓ etcd✓ converger ✓ etcd-metrics-server✓ unit-tested✓ simulation driven Actors complex interactions hard to test hard to reason through
  111. 111. Show me Diego unit-tested✓ simulation driven Diego is a play Actors rep✓ garden ✓ linux-circus✓ auctioneer✓ metricz✓ route-emitter✓ stager✓ app-manager✓ executor✓ file-server ✓ tps✓ etcd✓ converger ✓ etcd-metrics-server✓ complex interactions hard to test hard to reason through
  112. 112. Show me Diego rep✓ garden ✓ linux-circus✓ auctioneer✓ metricz✓ route-emitter✓ stager✓ app-manager✓ executor✓ file-server ✓ tps✓ etcd✓ converger ✓ etcd-metrics-server✓ communication and role encoded via shared library script shared narrative unit-tested✓ simulation driven Diego is a play Actors complex interactions hard to test hard to reason through
  113. 113. Show me Diego executor rep stager app-manager auctioneer converger etcd-metrics-server etcd file-server garden linux-circus metricz route-emitter tps ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ communication and role encoded via shared library script ✓integration tests ✓ Diego is a play Actors shared narrative unit-tested✓ simulation driven complex interactions hard to test hard to reason through
  114. 114. Show me Diego complexity in a distributed system of this scope is real and necessary Diego embraces this and tries to make its complexity: explicit transparent ∴ easier to reason about integration tests ✓ shared narrative unit-tested✓ simulation driven complex interactions hard to test hard to reason through
  115. 115. Show me Diego flexible abstraction extensible robust agile Tasks/LRPs Platform-Independent Self-Managing Handle on Complexity
  116. 116. What? Why? Show me… The future DIEGO Elastic Runtime 2.0
  117. 117. The future staging running + buildpacks placement pools .NET process types auto-rebalancing 0-downtime deploys dockerfiles custom health-checks shell access persistent disk
  118. 118. DIEGO Elastic Runtime 2.0 Rep Exec Rep Exec Rep Exec Rep Exec
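
Slides 92 to 95 say that Diego holds auctions to distribute LRPs across the Rep/Exec pool, but the transcript does not show how bids are scored. The following is a minimal Go sketch of one way such an auction could work. Everything in it is an assumption made for illustration: the `Rep` struct, the `Bid` and `Auction` functions, the cell names, and the scoring rule (favor cells with more free memory and fewer copies of the same app) are hypothetical and are not taken from the talk or from Diego's code.

```go
package main

import "fmt"

// Rep is a hypothetical stand-in for one Rep/Exec pair in the executor pool.
type Rep struct {
	Name      string
	FreeMemMB int
	Instances map[string]int // app name -> instances already running here
}

// Bid returns a score for hosting one more instance of app (lower is better)
// and false if the instance does not fit. The scoring rule is an assumption
// made for this sketch, not something stated in the slides.
func (r *Rep) Bid(app string, memMB int) (float64, bool) {
	if r.FreeMemMB < memMB {
		return 0, false
	}
	// Penalize cells that already run this app, then prefer more free memory.
	score := float64(r.Instances[app]) + float64(memMB)/float64(r.FreeMemMB)
	return score, true
}

// Auction places up to `want` instances of app, holding one auction per
// instance, and returns the names of the winning reps in placement order.
func Auction(reps []*Rep, app string, memMB, want int) []string {
	var winners []string
	for i := 0; i < want; i++ {
		var best *Rep
		bestScore := 0.0
		for _, r := range reps {
			score, ok := r.Bid(app, memMB)
			if ok && (best == nil || score < bestScore) {
				best, bestScore = r, score
			}
		}
		if best == nil {
			break // no rep can host this instance
		}
		// The winner reserves the resources before the next auction runs.
		best.FreeMemMB -= memMB
		best.Instances[app]++
		winners = append(winners, best.Name)
	}
	return winners
}

// newReps builds a small, made-up pool of three cells.
func newReps() []*Rep {
	return []*Rep{
		{Name: "cell-a", FreeMemMB: 1024, Instances: map[string]int{}},
		{Name: "cell-b", FreeMemMB: 512, Instances: map[string]int{}},
		{Name: "cell-c", FreeMemMB: 2048, Instances: map[string]int{}},
	}
}

func main() {
	// "Want 3": ask for three 256 MB instances of a hypothetical app "web".
	fmt.Println(Auction(newReps(), "web", 256, 3)) // prints [cell-c cell-a cell-b]
}
```

With this scoring rule the three instances land on three different cells, which is the kind of spreading behavior an auction is meant to produce. In this sketch a single loop plays the auctioneer; the deck's point about emergent behavior is that the real auction is distributed, which is why the slides pair it with simulation-driven development.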

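
Slides 91 to 100 contrast imperative "Start! Stop!" orchestration with a declarative request ("Want 3") that the system converges toward, described as self-healing and eventually consistent. The Go sketch below illustrates that idea under stated assumptions: the `DesiredLRP` type, the `Converge` function, and the use of instance indices are hypothetical names chosen for this example and are not Diego's actual data model.

```go
package main

import "fmt"

// DesiredLRP is a hypothetical record of what was requested ("Want 3").
type DesiredLRP struct {
	App       string
	Instances int
}

// Converge compares the desired state with the instance indices that are
// actually running and returns which indices to start and which to stop.
// Running it repeatedly moves the system toward the desired state.
func Converge(desired DesiredLRP, running []int) (start, stop []int) {
	seen := make(map[int]bool)
	for _, idx := range running {
		if idx >= desired.Instances || seen[idx] {
			stop = append(stop, idx) // surplus or duplicate instance
			continue
		}
		seen[idx] = true
	}
	for idx := 0; idx < desired.Instances; idx++ {
		if !seen[idx] {
			start = append(start, idx) // missing instance
		}
	}
	return start, stop
}

func main() {
	// Index 1 has crashed, index 2 is running twice, index 5 is left over
	// from an earlier, larger scale setting.
	start, stop := Converge(DesiredLRP{App: "web", Instances: 3}, []int{0, 2, 2, 5})
	fmt.Println("start:", start, "stop:", stop) // prints start: [1] stop: [2 5]
}
```

Because the function only looks at desired versus actual state, it does not matter why an instance went missing (a crash, a rolling deploy, a lost message); the next pass repairs it.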