Scheduling Containers on Amazon ECS

One of the core principles behind the design of Amazon ECS is the separation of scheduling logic from state management. This allows you to use the Amazon ECS schedulers, write your own schedulers, or integrate with third-party schedulers.
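
As a rough illustration of what "write your own scheduler" can mean, the sketch below keeps only the scheduling logic: read the current state, select a container instance, place the task. Everything in it is an assumption made for the example: the `ContainerInstance` and `TaskRequest` types, the resource numbers, and the pick-the-tightest-fit heuristic are hypothetical. In a real scheduler the instance data would come from the ECS List* and Describe* operations and the placement would go through StartTask; here an in-memory list stands in for both so the sketch runs without AWS access.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContainerInstance:
    """Hypothetical in-memory view of one instance's remaining resources."""
    instance_id: str
    cpu: int        # remaining CPU units
    memory: int     # remaining memory in MiB
    used_ports: set = field(default_factory=set)


@dataclass
class TaskRequest:
    """Hypothetical resource request, loosely modeled on a task definition."""
    cpu: int
    memory: int
    host_ports: tuple = ()


def fits(instance: ContainerInstance, task: TaskRequest) -> bool:
    """True if the instance has enough CPU, memory, and free host ports."""
    return (
        instance.cpu >= task.cpu
        and instance.memory >= task.memory
        and not instance.used_ports.intersection(task.host_ports)
    )


def select_instance(instances, task) -> Optional[ContainerInstance]:
    """Scheduling logic only: pick the fitting instance with the least spare
    memory (a simple bin-packing heuristic chosen for this example)."""
    candidates = [i for i in instances if fits(i, task)]
    return min(candidates, key=lambda i: i.memory, default=None)


def place(instances, task) -> Optional[str]:
    """Select an instance, then record the reservation locally.
    In a real scheduler this step is the StartTask call, not a local write."""
    chosen = select_instance(instances, task)
    if chosen is None:
        return None
    chosen.cpu -= task.cpu
    chosen.memory -= task.memory
    chosen.used_ports.update(task.host_ports)
    return chosen.instance_id


cluster = [
    ContainerInstance("instance-a", cpu=2048, memory=4096),
    ContainerInstance("instance-b", cpu=1024, memory=1024),
]
web = TaskRequest(cpu=10, memory=500, host_ports=(80,))
first = place(cluster, web)
second = place(cluster, web)
print(first, second)
```

The second placement lands on a different instance than the first because host port 80 is already taken there, which is the kind of constraint a resource manager has to track alongside CPU and memory.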

In this session, we will explore the advanced cluster management capabilities of Amazon ECS and dive deep into the Amazon ECS Service Scheduler, which supports long-running applications by monitoring container health, restarting failed containers, and load balancing across containers. We will explain how you can communicate with the Amazon ECS API to integrate your own custom schedulers. We will then walk through how we built an Apache Mesos scheduler driver that lets Mesos scheduling frameworks work with Amazon ECS without requiring a Mesos cluster. We will also demo using Marathon to schedule Docker containers on an Amazon ECS cluster.
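
To make the Service Scheduler's job more concrete, here is a minimal sketch of a reconcile loop: determine the desired state, compare it with the current state, and perform one action at a time. It loosely follows the deployment figures (a PRIMARY deployment wanting 10 tasks, an ACTIVE one running 5, 50% minimum and 200% maximum healthy) and the transition conditions that appear in the slides, but it is one possible reading of them, not the actual ECS implementation. The `Deployment` type, the action names, and the rounding of the two bounds are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    status: str    # "PRIMARY" (newest), "ACTIVE" (being replaced), or "INACTIVE"
    desired: int
    running: int


def next_action(dep, all_running, primary_desired, min_pct=50, max_pct=200):
    """Compare desired and current state for one deployment and pick an action."""
    upper = primary_desired * max_pct // 100             # assumed cap on total running tasks
    lower = math.ceil(primary_desired * min_pct / 100)   # assumed floor on total running tasks
    if dep.status == "PRIMARY":
        if dep.running == dep.desired:
            return "steady"
        if dep.running < dep.desired and all_running < upper:
            return "deploy_task"
        return "wait"
    if dep.status == "ACTIVE":
        if dep.running == 0:
            return "mark_inactive"
        if all_running > lower:
            return "kill_task"
        return "wait"
    return "steady"  # INACTIVE deployments need no further work


def reconcile(deployments, max_rounds=100):
    """Apply actions until nothing changes; return the lowest total seen."""
    primary = next(d for d in deployments if d.status == "PRIMARY")
    lowest = sum(d.running for d in deployments)
    for _ in range(max_rounds):
        progressed = False
        for dep in deployments:
            total = sum(d.running for d in deployments)
            action = next_action(dep, total, primary.desired)
            if action == "deploy_task":
                dep.running += 1
            elif action == "kill_task":
                dep.running -= 1
            elif action == "mark_inactive":
                dep.status = "INACTIVE"
            progressed = progressed or action in ("deploy_task", "kill_task", "mark_inactive")
            lowest = min(lowest, sum(d.running for d in deployments))
        if not progressed:
            break
    return lowest


new = Deployment("ecs-svc/2", "PRIMARY", desired=10, running=0)
old = Deployment("ecs-svc/1", "ACTIVE", desired=5, running=5)
lowest_total = reconcile([new, old])
print(new, old, lowest_total)
```

In this run the new deployment ends with 10 running tasks, the old one drains to 0 and is marked INACTIVE, and the total number of running tasks never drops below 5. The sketch ignores load balancer registration, connection draining, and scale-down of a PRIMARY deployment.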

AWS DevDay San Francisco, June 21, 2016.
Presenter: Dan Gerdesmeier, Sr. Software Development Engineer

  1. 1. Scheduling Containers on Amazon ECS. Dan Gerdesmeier, Sr. Software Development Engineer. June 21st, 2016. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. What is Amazon ECS? Amazon EC2 Container Service (ECS) is a highly scalable, high-performance container management service. You can use Amazon ECS to schedule the placement of containers across your cluster. You can also integrate your own scheduler or a third-party scheduler to meet business or application-specific requirements.
  3. 3. Agenda • Introduction to Scheduling • Resource Management in ECS • Scheduling in ECS • Demo
  4. 4. Introduction to Scheduling
  5. 5. Why is Scheduling Important? • Utilization and Cost • Application Performance • Placement and Launch Time • Application Availability
  6. 6. Types of Schedulers • Monolithic Schedulers • Two-Level Schedulers • Shared-State Schedulers
  7. 7. Monolithic Scheduler
  8. 8. [Diagram: one Scheduler and a cluster with three each of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
  9. 9. Monolithic Scheduler Pros • Entire State • Consistent View • Easy to Understand Cons • Head-of-Line blocking • Difficult to maintain • Single Point of Failure
  10. 10. Two-Level Scheduling
  11. 11. [Diagram: Resource Manager, Scheduler 1, Scheduler 2, and Scheduler 3, and a cluster with three each of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
  12. 12. Two-Level Scheduling Pros • Avoids Contention • Enables Multiple Schedulers Cons • Partial State • Difficult to Repartition • Difficult to Achieve High Utilization
  13. 13. Shared State
  14. 14. [Diagram: Resource Manager, Scheduler 1, Scheduler 2, and Scheduler 3, and a cluster with three each of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
  15. 15. [Diagram: Resource Manager, Scheduler 1, Scheduler 2, and Scheduler 3, and a cluster with three each of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
  16. 16. Shared State Scheduling Pros • Full State • Enables Multiple Schedulers • No Head-Of-Line Blocking Cons • Contention can Slow Progress • Multiple Copies of State • Stale Information
  17. 17. Tiered Scheduling
  18. 18. [Diagram: Resource Manager, Scheduler 1, and a cluster with three each of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
  19. 19. Resource Management
  20. 20. What is a Resource Manager? • Maintains Available Resources • Tracks Resource Changes • Accepts Resource Requests • Guarantees Accuracy and Consistency
  21. 21. Resources • CPU • Memory • Ports • Disk Space • Disk IOPS • Network Bandwidth
  22. 22. [Diagram: Container Instance running Docker and the ECS Agent, with Tasks and their Containers]
  23. 23. Available Resources register-container-instance --total-resources [ { "name" : "cpu", "type" : "integerValue", "integerValue" : 2048 }, … ]
  24. 24. Modifying Exposed Resources
  25. 25. Requesting Resources register-task-definition --family demo --cli-input-json { "name": "simple-demo", "image": "my-demo", "cpu": 10, "memory": 500, "portMappings": [ { "containerPort": 80, "hostPort": 80 } ... }
  26. 26. Accepting Resource Requests
  27. 27. Tasks [Diagram: Container Definitions and Volume Definitions; Containers with a Shared Data Volume launch on a Container Instance]
  28. 28. Starting a Task [Diagram: User / Scheduler, API, StartTask]
  29. 29. Starting a Task [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine]
  30. 30. Starting a Task [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine, Agent Communication]
  31. 31. Starting a Task [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine, Agent Communication, WebSocket, Container Instance with Docker and ECS Agent, Task, Container]
  32. 32. Starting a Task [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine, Agent Communication, SubmitStateChange, Container Instance with Docker and ECS Agent, Tasks, Containers]
  33. 33. Tracking Resource Changes
  34. 34. Terminated Task [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine, Agent Communication, SubmitStateChange, Container Instance with Docker and ECS Agent, Task, Container]
  35. 35. Missing Container Instance [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine, Agent Communication, "?", Container Instance with Docker and ECS Agent, Task, Container]
  36. 36. Terminated Container Instance [Diagram: User / Scheduler, API, StartTask, Cluster Management Engine, Agent Communication, Termination Notifier, Container Instance with Docker and ECS Agent, Task, Container]
  37. 37. Guaranteeing Accuracy and Consistency
  38. 38. Amazon ECS under the Hood [Diagram: ordered sequence of entries IDN-1 through IDN+6, with a WRITE and a READ marker labeled IDN+5]
  39. 39. Amazon ECS under the Hood [Diagram: ordered sequence of entries IDN-1 through IDN+6, with two WRITE and two READ markers labeled IDN+2, IDN+3, and IDN+5]
  40. 40. Scheduling in ECS [Diagram: a cluster with three each of c3.xlarge, r3.8xlarge, c3.8xlarge, c3.4xlarge, and r3.2xlarge instances]
  41. 41. ECS Schedulers. Batch Jobs (ECS Task scheduler): • Run tasks once • Batch jobs • RunTask (random) • StartTask (placed). Long-Running Apps (ECS Service scheduler): • Health management • Scale-up and scale-down • AZ aware • Grouped Containers
  42. 42. Custom Schedulers 1. Calls the ECS List* and Describe* API operations to determine the current state of the cluster. 2. Selects one (or more) container instances according to the logic implemented. 3. Calls StartTask API to start a task on the selected container instance.
  43. 43. Scheduling API Examples • start-task --cluster default --task-definition demo:1 --container-instances be21c208-1554-4e38-b9e2-fe236bf0a555 • run-task --task-definition demo:1 • create-service --task-definition demo:1 --service-name demo-service --desired-count 1 --deployment-configuration maximumPercent=200,minimumHealthyPercent=100
  44. 44. Integration with Third-Party Schedulers • ECS allows you to use third-party schedulers, e.g. Marathon and Chronos • Integration via the ECS API • For Mesos schedulers, the ECSSchedulerDriver interprets the command given when scheduling jobs with Mesos and starts a task with TaskDefinition family:revision • https://github.com/awslabs/ecs-mesos-scheduler-driver
  45. 45. Multiple Schedulers on the Same Cluster
  46. 46. Amazon ECS Service Scheduler
  47. 47. Service Scheduling Responsibilities • Determine Desired State • Check Against Current State • Perform Action
  48. 48. Discovering Differences. Deployment ecs-svc/1: Status PRIMARY, Desired 5, Pending 0, Running 0. Minimum Healthy: 50%. Maximum Healthy: 200%.
  49. 49. Service State Machine [Diagram: states Steady State, Determine Placement Options, Deploy Task; transition labels RUNNING == DESIRED, RUNNING != DESIRED && STATUS == PRIMARY, ALL_RUNNING < MAX_HEALTHY]
  50. 50. Discovering Differences. Deployment ecs-svc/2: Status PRIMARY, Desired 10, Pending 0, Running 0. Deployment ecs-svc/1: Status ACTIVE, Desired 5, Pending 0, Running 5. Minimum Healthy: 50%. Maximum Healthy: 200%.
  51. 51. Service State Machine [Diagram: states Steady State, Determine Placement Options, Deploy Task, Clear Deployment, Kill Task, Mark Inactive; transition labels RUNNING == DESIRED, RUNNING != DESIRED && STATUS == PRIMARY, ALL_RUNNING < MAX_HEALTHY, ALL_RUNNING > MAX_HEALTHY, RUNNING != DESIRED && STATUS == ACTIVE, ALL_RUNNING == 0]
  52. 52. Service State Machine [Diagram: states Steady State, Determine Placement Options, Deploy Task, Register ELB, Clear Deployment, Wait for Drain, Deregister & Kill Task, Mark Inactive; transition labels RUNNING == DESIRED, RUNNING != DESIRED && STATUS == PRIMARY, ALL_RUNNING < MAX_HEALTHY, ALL_RUNNING > MAX_HEALTHY, RUNNING != DESIRED && STATUS == ACTIVE, ALL_RUNNING == 0]
  53. 53. Other Considerations • Task AutoScaling • Availability Zone Balancing • Permissions and Errors • Task Health
  54. 54. Demo
  55. 55. Thank You!