
Advanced Container Scheduling


  1. 1. Advanced container scheduling. Ric Harvey, Technical Developer Evangelist (@ric__Harvey). With a massive thanks to Abby Fuller, Senior Technical Evangelist (@abbyfuller).
  2. 2. What is container scheduling and why do you care?
  3. 3. Container scheduling is how your containers are placed and run on your instances.
  4. 4. Managing one container is easy. [Diagram: a single server with a guest OS running App1 and App2, each with its own bins/libs]
  5. 5. Managing many containers is hard. [Diagram: a large grid of servers, each with its own guest OS]
  6. 6. Scheduling with ECS
  7. 7. Core components: scheduling engine, placement engine, extensions
  8. 8. Scheduling engines
  9. 9. Types of schedulers: services, batch, events, daemon
  10. 10. Placement engine
  11. 11. Task Placement Engine: constraint names and examples
        • AMI ID: attribute:ecs.ami-id == ami-eca289fb
        • Availability Zone: attribute:ecs.availability-zone == us-east-1a
        • Instance Type: attribute:ecs.instance-type == t2.small
        • Distinct Instances: type="distinctInstance"
        • Custom: attribute:stack == prod
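To make the constraint examples on this slide concrete, here is a minimal Python sketch of how an `attribute:<name> == <value>` expression could be checked against an instance's attributes. The `matches` helper and the instance IDs are hypothetical, invented for illustration. The toy evaluator handles only the `==` form and is not the real ECS expression parser, which supports more operators.

```python
# Toy evaluator for the simple `attribute:<name> == <value>` form.
# This is an illustration only, not the real ECS cluster query language parser.

def matches(expression, attributes):
    """Return True if an instance's attributes satisfy `attribute:name == value`."""
    left, right = (part.strip() for part in expression.split("==", 1))
    if not left.startswith("attribute:"):
        raise ValueError(f"unsupported expression: {expression!r}")
    name = left[len("attribute:"):]
    return attributes.get(name) == right


# Hypothetical container instances, described by their attributes.
instances = {
    "i-aaa": {"ecs.instance-type": "t2.small", "ecs.availability-zone": "us-east-1a", "stack": "prod"},
    "i-bbb": {"ecs.instance-type": "t2.micro", "ecs.availability-zone": "us-east-1b", "stack": "test"},
}

# Constraints written as a list of type/expression dictionaries.
placement_constraints = [
    {"type": "memberOf", "expression": "attribute:ecs.instance-type == t2.small"},
    {"type": "memberOf", "expression": "attribute:stack == prod"},
]

# An instance is eligible only if it satisfies every constraint.
eligible = [
    instance_id
    for instance_id, attrs in instances.items()
    if all(matches(c["expression"], attrs) for c in placement_constraints)
]
print(eligible)
```

Only the instance whose attributes satisfy both expressions remains eligible.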
  12. 12. Task Placement selection
        • Cluster constraints: satisfy CPU, memory, and port requirements
        • Custom constraints: filter for location, instance-type, AMI, or custom attribute constraints
        • Placement strategies: identify instances that meet spread or binpack placement strategy
        • Apply filter: select final container instances for placement
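The selection stages on this slide can be sketched as a small filtering pipeline. This is a simplified model under stated assumptions, not how ECS is implemented: the `Instance` class, the `select_instance` function, and the sample data are all hypothetical, and the strategy stage shows only binpack on memory.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    cpu: int                      # remaining CPU units
    memory: int                   # remaining memory in MiB
    used_ports: set = field(default_factory=set)
    attributes: dict = field(default_factory=dict)


def select_instance(instances, task, required_attributes):
    """Run the four stages in order and return the chosen instance, or None."""
    # Stage 1, cluster constraints: enough CPU and memory, and the port is free.
    candidates = [
        i for i in instances
        if i.cpu >= task["cpu"]
        and i.memory >= task["memory"]
        and task["port"] not in i.used_ports
    ]
    # Stage 2, custom constraints: every required attribute must match.
    candidates = [
        i for i in candidates
        if all(i.attributes.get(k) == v for k, v in required_attributes.items())
    ]
    # Stage 3, placement strategy: binpack on memory (least remaining first).
    candidates.sort(key=lambda i: i.memory)
    # Stage 4, apply filter: take the best remaining candidate.
    return candidates[0] if candidates else None


cluster = [
    Instance("a", cpu=1024, memory=512, used_ports={80}, attributes={"stack": "prod"}),
    Instance("b", cpu=1024, memory=2048, attributes={"stack": "prod"}),
    Instance("c", cpu=1024, memory=1024, attributes={"stack": "prod"}),
    Instance("d", cpu=2048, memory=768, attributes={"stack": "test"}),
]
task = {"cpu": 256, "memory": 512, "port": 80}
chosen = select_instance(cluster, task, {"stack": "prod"})
print(chosen.name if chosen else None)
```

Here instance `a` drops out because port 80 is taken, `d` fails the custom attribute, and binpack prefers `c` over `b` because `c` has less memory left.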
  13. 13. Supported placement strategies: binpacking, spread, affinity, distinct instance
  14. 14. Task Placement chaining: chain multiple strategies, for example spread tasks across zones AND binpack within each zone.
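As a rough illustration of chaining, the sketch below first spreads across Availability Zones and then binpacks on memory inside the chosen zone. The `placement_strategy` list shows the spread-then-binpack order as plain dictionaries. The `place_tasks` simulation, the instance names, and the memory figures are hypothetical, and the real scheduler's tie-breaking and bookkeeping may well differ.

```python
from collections import Counter

# The chained strategy, in order: spread across zones, then binpack on memory.
placement_strategy = [
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
    {"type": "binpack", "field": "memory"},
]


def place_tasks(instances, count, task_memory):
    """Simulate placing `count` identical tasks; return the chosen instance names."""
    running_per_zone = Counter()
    placements = []
    for _ in range(count):
        # Only instances with enough free memory can take the task.
        fits = [i for i in instances if i["free_memory"] >= task_memory]
        if not fits:
            break
        # Spread: prefer the zone(s) currently running the fewest tasks.
        fewest = min(running_per_zone[i["zone"]] for i in fits)
        zone_candidates = [i for i in fits if running_per_zone[i["zone"]] == fewest]
        # Binpack: within those zones, pick the instance with the least free memory.
        chosen = min(zone_candidates, key=lambda i: i["free_memory"])
        chosen["free_memory"] -= task_memory
        running_per_zone[chosen["zone"]] += 1
        placements.append(chosen["name"])
    return placements


demo_instances = [
    {"name": "a1", "zone": "us-east-1a", "free_memory": 1024},
    {"name": "a2", "zone": "us-east-1a", "free_memory": 512},
    {"name": "b1", "zone": "us-east-1b", "free_memory": 2048},
]
demo_placements = place_tasks(demo_instances, count=4, task_memory=256)
print(demo_placements)
```

The tasks alternate between zones, and within zone `us-east-1a` the fuller instance `a2` is filled before `a1` is touched.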
  15. 15. What does a container manager do?
  16. 16. Container managers: • Track available resources • Watch resource changes • Accept resource requests • Guarantee accuracy and consistency
  17. 17. Resource constraints • CPU • Memory • Ports • Disk Space • IOPS • Network bandwidth
  18. 18. What manages and enforces resource usage for ECS? [Diagram: an EC2 instance running Docker and the ECS agent (ecs-agent), with tasks and their containers]
  19. 19. How do load balancers fit in?
  20. 20. What are load balancers? At a high level, load balancers do the same thing: distribute (balance) traffic between targets. Targets could be different tasks in a service, IP addresses, or EC2 instances in a cluster.
  21. 21. Different types of load balancers
        • ELB Classic: the original. Balances traffic between EC2 instances.
        • Application Load Balancer: request level (layer 7). Great for microservices. Path-based HTTP/HTTPS routing (/web, /messages), content-based routing, IP routing. Only in VPC.
        • Network Load Balancer: connection level (layer 4). Routes to targets (EC2, containers, IPs). High throughput, low latency. Great for spiky traffic patterns. Requires no warming. Can assign an Elastic IP per subnet.
        View the entire breakdown here: https://aws.amazon.com/elasticloadbalancing/details/#details
  22. 22. What does this have to do with scheduling? • First, ELB is what actually distributes the request. So, deployments and scheduling can be tweaked at that level: for example, changing the connection draining timeout can speed up deployments. • Secondly, your ELB can influence your resource management. For example, dynamic port allocation with ALB.
  23. 23. The importance of images
  24. 24. Docker image size • A major component of resource management is the size of your Docker images. They add up quickly, with big consequences. • In general, the more layers you have and the larger those layers are, the larger your final image will be. This eats up disk space. • You don’t always need the recommended packages (apt-get install --no-install-recommends)
  25. 25. OK, so how can I reduce image sizes? • Sharing is caring. • Use shared base images where possible • Limit the data written to the container layer • Chain RUN statements • Prevent cache misses at build for as long as possible
  26. 26. Let’s talk cache • Docker caching is complicated! • Calling RUN, ADD or COPY will add layers. Other instructions will not (Docker 1.10 and above) • How the cache works: starting from a parent image that is already in the cache, Docker compares the next instruction against all child images derived from that parent to see if one was built with the same instruction. If so, the cache is used*** • For ADD and COPY, a checksum of the file contents is used. For every other instruction, Docker looks only at the string of the command, not the contents of the packages (for example, with apt-get update)
  27. 27. *** (sometimes footnotes need their own slides) So what happens if my command string is always the same, but I need to rerun the command? For example, with git commands. You can ignore the cache (docker build --no-cache), or some people break it by changing the string each time (like with a timestamp)
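The cache behaviour described on the last two slides can be modelled with a toy cache key. This is only a sketch of the idea, not Docker's actual implementation: `cache_key` is a hypothetical function that hashes the parent key plus the instruction string, and mixes in file contents only when they are supplied (standing in for ADD and COPY).

```python
import hashlib


def cache_key(parent_key, instruction, file_contents=None):
    """Toy model of a build-cache key for one Dockerfile instruction."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    if file_contents is not None:
        # ADD / COPY: the contents of the copied files are part of the key.
        for path in sorted(file_contents):
            h.update(path.encode())
            h.update(file_contents[path])
    return h.hexdigest()


base = "base-image"

# RUN: same string means same key, even if the upstream packages have changed.
run_monday = cache_key(base, "RUN apt-get update")
run_friday = cache_key(base, "RUN apt-get update")

# COPY: same string, different file contents, so a different key.
copy_v1 = cache_key(base, "COPY app.py /app/", {"app.py": b"print('v1')"})
copy_v2 = cache_key(base, "COPY app.py /app/", {"app.py": b"print('v2')"})

# Cache busting: changing the command string (here with a made-up timestamp).
run_busted = cache_key(base, "RUN echo 1700000000 && git pull")
run_busted_later = cache_key(base, "RUN echo 1700000060 && git pull")

print(run_monday == run_friday, copy_v1 == copy_v2, run_busted == run_busted_later)
```

In this model an unchanged `RUN apt-get update` always hits the cache, which is why the command can go stale, while editing a copied file or changing the command string forces a miss.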
  28. 28. In the image itself, clean as you go: • If you download and install a package (like with curl and tar), remove the compressed original in the same RUN instruction, so the archive is never committed to a layer.
  29. 29. Take advantage of the OS built-ins
  30. 30. Clean up after your images, both in the image and on the system. Docker image prune: $ docker image prune -a Alternatively, go even further with Docker system prune: $ docker system prune -a
  31. 31. Garbage collection • Clean up after your containers! Beyond image and system prune: • Make sure your orchestration platform (like ECS or K8s) is garbage collecting: • ECS • Kubernetes • 3rd party tools like spotify-gc
  32. 32. Instance registration
  33. 33. Instance registration • When an instance launches and is registered with the ECS cluster, it reports its total amount of resources
  34. 34. Modifying exposed resources • You can also modify which resources the ecs-agent exposes by configuring the agent.
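One way to picture this: the agent reads its settings from a KEY=VALUE configuration file, and `ECS_RESERVED_MEMORY` is an agent setting that holds memory back from tasks. The sketch below is a simplified illustration, assuming the registered memory is just the total minus the reserved amount. `parse_agent_config`, `registered_memory`, and the sample values are hypothetical.

```python
def parse_agent_config(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def registered_memory(total_mib, config):
    """Memory (MiB) left for tasks after the reserved amount is held back.

    Simplifying assumption: registered = total - ECS_RESERVED_MEMORY.
    """
    reserved = int(config.get("ECS_RESERVED_MEMORY", 0))
    return max(total_mib - reserved, 0)


# Illustrative agent configuration; the cluster name and values are made up.
example_config = """
# agent settings
ECS_CLUSTER=demo
ECS_RESERVED_MEMORY=256
"""

exposed = registered_memory(2048, parse_agent_config(example_config))
print(exposed)
```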
  35. 35. Accepting resource requests
  36. 36. For tasks, scheduling a task starts that task if there are available resources. [Diagram: a task definition (container definitions and volume definitions) launched on a container instance as containers with a shared data volume]
  37. 37. Starting a task: the user or scheduler calls the StartTask API. Container Instance: what set of resources should we subtract from? Task Definitions: what resources does the application need?
  38. 38. Starting a task: we take that information, check it against our regional Cluster Management Engine, and either approve or reject the request. The Cluster Management Engine has been designed to provide distributed transactions with Availability Zone isolation, so even if there is an issue in one Availability Zone you can continue to schedule to your cluster.
  39. 39. Starting a task: once a request is approved, we propagate down to the Agent Communication Service that a node needs to change its state.
  40. 40. Starting a task: the Agent Communication Service pushes this information down the WebSocket that the container instance opened.
  41. 41. Starting a task: we then acknowledge to the service, via the SubmitStateChange API, that we have performed (or failed to perform) the specified action. At this point the task is happily running and tracked, but how do we keep in sync?
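The approve-or-reject step in the walkthrough above comes down to resource accounting. The sketch below is a single-process toy, not the distributed Cluster Management Engine: the `ClusterState` class is hypothetical, its method names simply echo the API names on the slides, and the numbers are made up.

```python
class ClusterState:
    """Tracks the free CPU and memory of each registered container instance."""

    def __init__(self):
        self.available = {}  # instance id -> {"cpu": units, "memory": MiB}

    def register(self, instance_id, cpu, memory):
        # Instance registration: the instance reports its total resources.
        self.available[instance_id] = {"cpu": cpu, "memory": memory}

    def start_task(self, instance_id, cpu, memory):
        """Approve (True) and subtract resources, or reject (False)."""
        free = self.available[instance_id]
        if cpu > free["cpu"] or memory > free["memory"]:
            return False
        free["cpu"] -= cpu
        free["memory"] -= memory
        return True

    def submit_state_change(self, instance_id, cpu, memory, status):
        # When the agent reports a stopped task, its resources are returned.
        if status == "STOPPED":
            free = self.available[instance_id]
            free["cpu"] += cpu
            free["memory"] += memory


state = ClusterState()
state.register("i-aaa", cpu=1024, memory=1024)

first = state.start_task("i-aaa", cpu=512, memory=768)    # fits
second = state.start_task("i-aaa", cpu=512, memory=768)   # not enough memory left
state.submit_state_change("i-aaa", cpu=512, memory=768, status="STOPPED")
third = state.start_task("i-aaa", cpu=512, memory=768)    # fits again
print(first, second, third)
```

The second request is rejected because only 256 MiB remain, and it succeeds once the first task's stop has been reported and its resources released.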
  42. 42. Cluster query language
  43. 43. Filtering: match on Instance family or type
  44. 44. Filtering: match on multiple attributes
  45. 45. Filtering: match on custom attributes
  46. 46. Task Placement Examples
  47. 47. Placement: Targeting Instance Type & Zone [Diagram: a cluster of g2.2xlarge, t2.micro, t2.small, and t2.medium instances]
  48. 48. Placement: Spread across Zone and Binpack [Diagram: a cluster of g2.2xlarge, t2.micro, t2.small, and t2.medium instances]
  49. 49. Placement: Services – Distinct Instances [Diagram: a cluster of g2.2xlarge, t2.micro, t2.small, and t2.medium instances]
  50. 50. Console: Getting Started with Placement
  51. 51. Console: Placement Templates to Get Started
  52. 52. Console: Customizing Placement Strategies
  53. 53. Scheduling
  54. 54. Load-based task scheduling
  55. 55. Run tasks in response to CloudWatch alarms
  56. 56. Load-based scheduling
  57. 57. Time-based task scheduling
  58. 58. Run tasks in response to a cron expression, or at a specific time
  59. 59. Time-based task scheduling • Schedule on fixed time intervals (e.g. a number of minutes, hours, or days) • Or use cron expressions • Set Amazon ECS as a CloudWatch Events target
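CloudWatch Events schedules are written either as `rate(...)` or as `cron(...)` expressions. The helper below is a small sketch that builds the `rate` form. `rate_expression` is a hypothetical function, and the cron string is just one illustrative value (every day at 12:00 UTC).

```python
def rate_expression(value, unit):
    """Build a fixed-interval schedule string such as 'rate(5 minutes)'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit!r}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for a value of 1 and plural otherwise.
    suffix = "" if value == 1 else "s"
    return f"rate({value} {unit}{suffix})"


# Cron form: minutes, hours, day-of-month, month, day-of-week, year.
daily_noon_utc = "cron(0 12 * * ? *)"

every_five_minutes = rate_expression(5, "minute")
every_hour = rate_expression(1, "hour")
print(every_five_minutes, every_hour, daily_noon_utc)
```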
