
Control Theory in Container Fleet Management


Video and slides synchronized, mp3 and slide download available at URL

Vallery Lancey covers basic principles of observing systems, controller design, and PID controllers. In particular, she dives into container scaling controllers, using both first principles and proven designs from Kubernetes and Mesos. Filmed at

Vallery Lancey is currently the Lead DevOps Engineer at Checkfront, working on infrastructure automation and reliability.

Published in: Technology

  1. 1. Control Theory In Container Orchestration Vallery Lancey Lead DevOps Engineer, Checkfront
  2. 2. News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site worldwide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on! controllers-observing-systems
  3. 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco
  4. 4. @vllry Container Orchestration Fundamentals
  5. 5. @vllry Goals of Container Management ● Reproducibility. ● Cohabitation. ● Auto-management of instances.
  6. 6. @vllry System Management Traditional: a sysadmin examines the system, makes a judgement, and performs an action. Automatic: the system tracks its own state, and translates the state to some internal action.
  7. 7. @vllry Key Auto-Management Features ● Allocate appropriate resources. ● Manage network based on container health & state. ● Reap unhealthy containers. ● Maintain container headcount. ● Auto-scale container groups.
  8. 8. @vllry Control Theory
  9. 9. @vllry What is Control Theory? ● Engineering topic: how to manage a system using human and internal controls. ● Used heavily in... ○ Physical device design ○ Plant/factory management ○ Electrical engineering
  10. 10. @vllry A Controller ● Inputs dictate what the controller should do (setpoint). ● Outputs dictate what the controlled process should do.
  11. 11. @vllry Open Loop Controllers ● A controller with only inputs and outputs is an open loop controller. ● Can’t respond to feedback from the controlled process.
  12. 12. @vllry Closed Loop Controller ● Contains feedback from the process to the controller. ● The controller is able to self-correct to achieve the desired outcome.
  13. 13. @vllry
  14. 14. @vllry The Math is Unfortunate ● Control theory is split into linear problems (the process variable, PV, changes linearly with the control output) and nonlinear problems. ● Most of our problems are nonlinear. ● Nonlinear problems have fewer known methods, and are often reduced to simplified linear problems.
  15. 15. @vllry Applying Control Theory To Containers
  16. 16. @vllry while True { currentState = getCurrentState(); desiredState = getDesiredState(); makeConform(currentState, desiredState) } Here currentState is the process variable and desiredState is the setpoint.
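The loop on this slide can be made runnable in a few lines. This is only a sketch: `control_loop`, `step_toward`, the `max_iterations` bound, and the toy integer "process" are illustrative assumptions, not part of the talk or of any orchestrator's API.

```python
from typing import Callable


def control_loop(
    get_current_state: Callable[[], int],
    get_desired_state: Callable[[], int],
    make_conform: Callable[[int, int], None],
    max_iterations: int,
) -> int:
    """Run the observe/compare/act cycle; return how many corrective actions ran."""
    actions = 0
    # A real controller would loop forever; the bound keeps this sketch finite.
    for _ in range(max_iterations):
        current = get_current_state()  # process variable (PV)
        desired = get_desired_state()  # setpoint (SP)
        if current == desired:
            continue  # already conforming, nothing to do this cycle
        make_conform(current, desired)
        actions += 1
    return actions


# Hypothetical toy process: an integer that moves one step per corrective action.
state = {"value": 0}


def step_toward(current: int, desired: int) -> None:
    state["value"] += 1 if desired > current else -1


actions_taken = control_loop(
    lambda: state["value"], lambda: 3, step_toward, max_iterations=10
)
```

Because the loop re-reads the state on every cycle, it is a closed loop controller: each action is chosen from the measured result of the previous one.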
  17. 17. @vllry Container Lifecycle: Readiness Probe ● When a container is launched, we don’t want to serve it traffic before it’s ready. ● A readiness probe uses some “OK” response (e.g. HTTP 200) to decide when the container is ready. ● What do we need to build this? ○ Container lifecycle status ○ Probe destination ○ Probe behaviour config
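A readiness check of this kind might be sketched as follows. The names `wait_until_ready`, `success_threshold`, `max_attempts`, and `make_fake_probe` are hypothetical, and the probe is a plain callable so that the sketch needs no network access; a real probe would issue an HTTP request to the probe destination and sleep between attempts.

```python
from typing import Callable, Iterable


def wait_until_ready(
    probe: Callable[[], bool],
    success_threshold: int = 1,
    max_attempts: int = 10,
) -> bool:
    """Return True once `probe` succeeds `success_threshold` times in a row."""
    consecutive = 0
    for _ in range(max_attempts):
        if probe():
            consecutive += 1
            if consecutive >= success_threshold:
                return True
        else:
            consecutive = 0  # a failure resets the streak
    return False  # never became ready within the allowed attempts


def make_fake_probe(status_codes: Iterable[int]) -> Callable[[], bool]:
    """Build a stand-in probe that replays canned HTTP status codes."""
    codes = iter(status_codes)
    # Once the canned codes run out, keep answering 503 (not ready).
    return lambda: next(codes, 503) == 200


ready = wait_until_ready(
    make_fake_probe([503, 200, 503, 200, 200]), success_threshold=2
)
```

The three inputs map onto the slide's list: the return value feeds the container lifecycle status, the callable stands in for the probe destination, and the two keyword arguments are the probe behaviour config.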
  18. 18. @vllry
  19. 19. @vllry
  20. 20. @vllry
  21. 21. @vllry
  22. 22. @vllry
  23. 23. @vllry Replica Headcount ● How do we ensure the right number of container copies exist? ● Need to maintain the desired replica count (input). ● Need to check the current number of containers (feedback). ● Need to create or terminate containers accordingly (output).
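The input, feedback, and output described on this slide fit in one small function. This is a sketch under assumptions: `reconcile_replicas` is a hypothetical name, containers are represented by plain strings, and the choice of which surplus containers to terminate (the last ones in the list) is arbitrary.

```python
from typing import List, Tuple


def reconcile_replicas(desired: int, running: List[str]) -> Tuple[int, List[str]]:
    """Compare desired count (input) with running containers (feedback).

    Returns the output as (number to create, names to terminate).
    """
    diff = desired - len(running)
    if diff > 0:
        return diff, []  # too few: create the shortfall
    if diff < 0:
        return 0, running[diff:]  # too many: terminate the surplus at the end
    return 0, []  # headcount already matches


to_create, to_terminate = reconcile_replicas(1, ["web-a", "web-b", "web-c"])
```

The function only computes the correction; keeping the decision separate from the code that actually starts or stops containers makes the controller easy to test in isolation.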
  24. 24. @vllry Replication Controller
  25. 25. @vllry
  26. 26. @vllry Autoscaling
  27. 27. @vllry Autoscaling Deployments ● Need to track a specified metric (CPU use, network I/O, etc). ● Need to increase or decrease replicas if the metric is sufficiently above or below the target. ● Should respond quickly and without overcompensating.
  28. 28. @vllry Bang-Bang Controller ● Controller with upper and lower bounds, where the setpoint is never exactly met. ● The process is turned on when one extreme is hit, and turned off when the other is hit.
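A bang-bang controller can be sketched as a two-state switch with hysteresis. The class name, the thermostat-style bounds, and the sample readings below are illustrative assumptions.

```python
class BangBangController:
    """Switch a process on at the lower bound and off at the upper bound."""

    def __init__(self, lower: float, upper: float) -> None:
        if lower >= upper:
            raise ValueError("lower bound must be below upper bound")
        self.lower = lower
        self.upper = upper
        self.on = False  # assume the process starts switched off

    def update(self, measurement: float) -> bool:
        if measurement <= self.lower:
            self.on = True  # hit the low extreme: turn the process on
        elif measurement >= self.upper:
            self.on = False  # hit the high extreme: turn the process off
        # Between the bounds the previous state is kept (hysteresis).
        return self.on


# Hypothetical heater example: on at or below 18.0, off at or above 22.0.
thermostat = BangBangController(lower=18.0, upper=22.0)
outputs = [thermostat.update(t) for t in (20.0, 17.5, 20.0, 22.5, 20.0)]
```

The same reading of 20.0 yields different outputs depending on which bound was hit last, which is why the measured value oscillates between the bounds instead of settling on a setpoint.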
  29. 29. @vllry
  30. 30. @vllry
  31. 31. @vllry Challenges in Designing a Controller ● Accepting a “close enough” error, rather than thrashing. ● Responding quickly without overcompensating. ○ Predict the right replica setpoint. ○ Account for the delay between a setpoint (SP) change and the resulting process variable (PV) change.
  32. 32. @vllry Delayed Response
  33. 33. @vllry Bootup Time ● Containers take time to boot (surprise!) ○ Resource allocation. ○ Image pull & app startup.
  34. 34. @vllry
  35. 35. @vllry
  36. 36. @vllry Accounting for the Delay ● Must guess if there is no context. ○ Can wait out the grace period, or... ○ Can define some percentage of the grace period after which to overscale. ● Custom controllers can allow context. ○ Can have a statistical model of boot time. ○ Can use a custom readiness probe that shows progress (whitebox).
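One way to read the "guess" strategy is to keep trusting replicas that are still booting until they have used up some fraction of the grace period, and only then launch extras. The sketch below assumes that interpretation; `replicas_to_add`, `pending_ages`, and `patience` are hypothetical names.

```python
from typing import List


def replicas_to_add(
    needed: int,
    ready: int,
    pending_ages: List[float],
    grace_period: float,
    patience: float = 1.0,
) -> int:
    """Decide how many extra replicas to launch while others are still booting.

    pending_ages: seconds since each not-yet-ready replica was launched.
    patience: fraction of the grace period to wait before overscaling.
    """
    cutoff = grace_period * patience
    # Replicas younger than the cutoff are assumed to become ready soon.
    still_trusted = sum(1 for age in pending_ages if age < cutoff)
    return max(0, needed - ready - still_trusted)


# Waiting out the full 60 s grace period trusts both booting replicas.
wait_it_out = replicas_to_add(5, 2, [10.0, 50.0], grace_period=60.0)
# With patience 0.5, the replica that has been booting for 50 s is no longer trusted.
overscale_early = replicas_to_add(5, 2, [10.0, 50.0], grace_period=60.0, patience=0.5)
```

A lower `patience` recovers faster from slow or stuck boots at the cost of sometimes launching more replicas than needed.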
  37. 37. @vllry Matching Demand
  38. 38. @vllry Scale Ramp-Up ● Scaling up quickly is especially important. ● Typical controller approaches: ○ Immediately add enough replicas to meet the target load per replica at the current load. ○ Keep scaling up each loop, until satisfied. ● Can we keep scaling both fast and precise?
  39. 39. @vllry
  40. 40. @vllry
  41. 41. @vllry PID: Proportional The proportional component is a linear response to the magnitude of the error.
  42. 42. @vllry PID: Integral The integral component is a compensator. It responds to the magnitude and duration of the error.
  43. 43. @vllry PID: Derivative The derivative component is a predictor of the future error, based on the trend of the current error.
  44. 44. @vllry PID Controllers ● Use the proportional, integral, and derivative components to react, compensate, and predict for required output. ● Each component is tuned using a constant.
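The three components and their tuning constants can be combined in a textbook discrete PID update. This is a minimal sketch, not the speaker's implementation: the class name, the gains `kp`, `ki`, `kd`, and the sample values are assumptions chosen so the arithmetic is easy to follow.

```python
from typing import Optional


class PIDController:
    """Textbook discrete PID: output = kp*error + ki*integral + kd*derivative."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp = kp  # proportional gain: react to the current error
        self.ki = ki  # integral gain: compensate for accumulated error
        self.kd = kd  # derivative gain: predict from the error trend
        self._integral = 0.0
        self._previous_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self._integral += error * dt
        if self._previous_error is None:
            derivative = 0.0  # no trend is available on the first sample
        else:
            derivative = (error - self._previous_error) / dt
        self._previous_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


pid = PIDController(kp=2.0, ki=0.5, kd=1.0)
first = pid.update(setpoint=10.0, measurement=6.0, dt=1.0)
second = pid.update(setpoint=10.0, measurement=8.0, dt=1.0)
```

On the second sample the error is shrinking, so the derivative term is negative and pulls the output down, which is the "predict" behaviour described on the derivative slide.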
  45. 45. @vllry Autoscaling With a PID Controller ● Proportional and integral components drive scaling. ● Integral and derivative components increase scale speed, at the cost of overcompensating. ● Derivative is “less accurate” but can help in sharp rises/drops.
  46. 46. @vllry Autoscaling in Kubernetes ● Kubernetes uses a proportional controller (with a lot of checks and balances). ● Prioritizes gradual resolution over unstable resolution. ● Scaling (Horizontal Pod Autoscaler) updates Deployment spec - doesn’t touch pods itself.
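The proportional rule can be sketched from the formula in the Kubernetes Horizontal Pod Autoscaler documentation, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function below is a simplified illustration with hypothetical names; it omits the real controller's other checks, such as min/max replica bounds, stabilization, and handling of pods that are not ready.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Proportional scaling: replicas grow or shrink with metric/target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # "close enough": avoid thrashing
    return math.ceil(current_replicas * ratio)


scale_up = desired_replicas(4, current_metric=90.0, target_metric=60.0)
hold = desired_replicas(4, current_metric=63.0, target_metric=60.0)
scale_down = desired_replicas(4, current_metric=30.0, target_metric=60.0)
```

The tolerance band is one of the "checks and balances": small deviations from the target leave the replica count untouched rather than triggering a rescale on every loop.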
  47. 47. @vllry In Summary ● Ensure any controller has the necessary feedback to properly achieve its outcome. ● Strictly define expectations of any controller. ● Build discrete, transparent, and testable controllers. ● Ensure shared state has a single source (CP). ● Custom controllers are common based on app behaviour and expectations.
  48. 48. @vllry Oh Yeah, Hi! ● I’m a software/systems person at Checkfront (online bookings) ● I work with Kubernetes & “cloud stuff”.
  49. 49. @vllry Thank You! Brian Liles & coordinators & staff Joe Beda Tim St. Clair
  50. 50. @vllry Audience Questions @vllry