Advanced Container Management and Scheduling

Advanced container scheduling
Abby Fuller
@abbyfuller

What is container scheduling and why do you care?

Container scheduling is how your containers are placed and run on your
instances.

Managing one container is easy
[Diagram: one server running a guest OS, with App1 and App2 each on their own bins/libs]

Managing many containers is hard
[Diagram: a grid of dozens of servers, each with its own guest OS]

Core components
• Scheduling engine
• Placement engine
• Extensions

Types of schedulers
• Services
• Batch
• Events
• Daemon

Task Placement Engine
Name                 Example
AMI ID               attribute:ecs.ami-id == ami-eca289fb
Availability Zone    attribute:ecs.availability-zone == us-east-1a
Instance Type        attribute:ecs.instance-type == t2.small
Distinct Instances   type="distinctInstances"
Custom               attribute:stack == prod

Task Placement selection
• Cluster constraints: satisfy CPU, memory, and port requirements
• Custom constraints: filter for location, instance type, AMI, or
custom attribute constraints
• Placement strategies: identify instances that meet the spread or
binpack placement strategy
• Apply filter: select the final container instances for placement
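The four selection stages above can be sketched as a simple filter pipeline. This is a toy model and not the ECS implementation: the `Instance` class, the `select_instance` function, and the sample cluster are all hypothetical, and the strategy stage assumes binpack on memory.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    """Hypothetical view of a container instance's free resources."""
    name: str
    zone: str
    instance_type: str
    free_cpu: int
    free_memory: int
    used_ports: set = field(default_factory=set)


def select_instance(instances, cpu, memory, port, allowed_types):
    # Stage 1: cluster constraints (CPU, memory, and port must fit).
    fits = [i for i in instances
            if i.free_cpu >= cpu and i.free_memory >= memory
            and port not in i.used_ports]
    # Stage 2: custom constraints (here, only an instance-type filter).
    allowed = [i for i in fits if i.instance_type in allowed_types]
    # Stage 3: placement strategy (assumed binpack: least free memory first).
    ranked = sorted(allowed, key=lambda i: i.free_memory)
    # Stage 4: select the final instance, or None if nothing qualifies.
    return ranked[0] if ranked else None


cluster = [
    Instance("a", "us-east-1a", "t2.small", 1024, 2048),
    Instance("b", "us-east-1a", "t2.small", 1024, 512, {80}),
    Instance("c", "us-east-1b", "t2.small", 1024, 1024),
    Instance("d", "us-east-1b", "g2.2xlarge", 8192, 15000),
]
chosen = select_instance(cluster, cpu=256, memory=512, port=80,
                         allowed_types={"t2.small"})
print(chosen.name)
```

In this sample, instance "b" drops out at stage 1 because port 80 is already taken, and "d" drops out at stage 2 because of its instance type.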

Supported placement strategies
• Binpacking
• Spread
• Affinity
• Distinct instance

Task Placement chaining
Spread tasks across zone AND binpack
within zone. Chain multiple strategies.
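As a rough illustration of chaining, the list below uses the `placementStrategy` request shape (spread on Availability Zone, then binpack on memory), followed by a toy placement loop that applies the same ordering locally. The `place_tasks` function and the sample instances are hypothetical and only approximate what a real scheduler does.

```python
from collections import Counter

# Chained strategies in request form: spread across zones, then binpack.
placement_strategy = [
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
    {"type": "binpack", "field": "memory"},
]


def place_tasks(instances, task_memory, count):
    """Place `count` tasks: spread across zones, then binpack on memory."""
    tasks_per_zone = Counter({i["zone"]: 0 for i in instances})
    placements = []
    for _ in range(count):
        candidates = [i for i in instances if i["free_memory"] >= task_memory]
        if not candidates:
            break  # no instance can fit another task
        # Spread: prefer the zone with the fewest tasks placed so far.
        # Binpack: within that zone, prefer the least remaining memory.
        best = min(candidates,
                   key=lambda i: (tasks_per_zone[i["zone"]], i["free_memory"]))
        best["free_memory"] -= task_memory
        tasks_per_zone[best["zone"]] += 1
        placements.append(best["name"])
    return placements


instances = [
    {"name": "a1", "zone": "us-east-1a", "free_memory": 2048},
    {"name": "a2", "zone": "us-east-1a", "free_memory": 1024},
    {"name": "b1", "zone": "us-east-1b", "free_memory": 2048},
]
print(place_tasks(instances, task_memory=512, count=4))
```

The tasks alternate between the two zones, and within us-east-1a the smaller instance fills up before the larger one is touched.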

What does a container manager do?

Container managers:
• Track available resources
• Watch resource changes
• Accept resource requests
• Guarantee accuracy and consistency

Resource constraints
• CPU
• Memory
• Ports
• Disk Space
• IOPS
• Network bandwidth

What manages and enforces resource usage for ECS?
[Diagram: an EC2 instance running Docker and the ECS agent (ecs-agent), with two tasks, each containing a container]

What are load balancers?
At a high level, load balancers do the same thing: distribute
(balance) traffic between targets. Targets could be different tasks
in a service, IP addresses, or EC2 instances in a cluster.

Different types of load balancers
• ELB Classic: the original. Balances traffic between EC2 instances.
• Application Load Balancer: request level (layer 7). Great for
microservices. Path-based HTTP/HTTPS routing (/web, /messages),
content-based routing, IP routing. Only in VPC.
• Network Load Balancer: connection level (layer 4). Routes to targets
(EC2, containers, IPs). High throughput, low latency. Great for spiky
traffic patterns. Requires no warming. Can assign an Elastic IP per subnet.
View the entire breakdown here:
https://aws.amazon.com/elasticloadbalancing/details/#details

What does this have to do with
scheduling?
• First, ELB is what actually distributes the request. So,
deployments and scheduling can be tweaked at that level: for
example, shortening the connection draining timeout can speed
up deployments.
• Second, your ELB can influence your resource management.
For example, dynamic port allocation with ALB lets several copies
of the same task share one instance.

Docker image size
• A major component of resource management is the size of your
Docker images. They add up quickly, with big consequences.
• The more layers you have (in general), and the larger those
layers are, the larger your final image will be. This eats up disk
space.
• You don’t always need the recommended packages
(--no-install-recommends)

OK, so how can I reduce image sizes?
• Sharing is caring.
• Use shared base images where possible
• Limit the data written to the container
layer
• Chain RUN statements
• Prevent cache misses at build for as
long as possible

Let’s talk cache
• Docker caching is complicated!
• Calling RUN, ADD, or COPY will add layers. Other instructions
will not (Docker 1.10 and above).
• How the cache works: starting from a parent image already in the
cache, Docker compares the next instruction against all child images
built from that parent to see if one used the same instruction. If so,
the cache is used***
• For ADD and COPY, a checksum of the file contents is used. For every
other instruction, Docker looks only at the string of the command, not
the contents of the packages (for example, with apt-get update).

*** (sometimes footnotes need their own slides)
So what happens if my command string is always the same, but I
need to rerun the command? For example, with git commands.
You can ignore the cache (docker build --no-cache), or some people
break it by changing the string each time (like with a timestamp).
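The cache rules above can be illustrated with a toy cache-key function. This is a simplified model and not Docker's actual code: `cache_key` is a hypothetical helper that hashes the parent key and the instruction string, and adds file contents only for ADD and COPY.

```python
import hashlib


def cache_key(parent_key, instruction, files=None):
    """Toy cache key for one build step (illustrative, not Docker's code).

    `files` maps path -> bytes for ADD/COPY; other instructions ignore it.
    """
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    if instruction.split()[0] in ("ADD", "COPY"):
        # ADD/COPY: the file contents are checksummed into the key.
        for path in sorted(files or {}):
            h.update(path.encode())
            h.update(files[path])
    return h.hexdigest()


base = "base-image"
# Same RUN string gives the same key, even if the remote package index changed.
run1 = cache_key(base, "RUN apt-get update")
run2 = cache_key(base, "RUN apt-get update")
# Same COPY string, different file contents: different key, so a cache miss.
copy1 = cache_key(base, "COPY app.py /app/", {"app.py": b"print(1)"})
copy2 = cache_key(base, "COPY app.py /app/", {"app.py": b"print(2)"})
# Cache busting: changing the command string (a timestamp) changes the key.
git1 = cache_key(base, "RUN git pull # 2018-01-01T00:00")
git2 = cache_key(base, "RUN git pull # 2018-01-02T00:00")
print(run1 == run2, copy1 == copy2, git1 == git2)
```

Under this model, an unchanged RUN string is always a cache hit, which is why a command like a git pull needs either a cache bypass or a changed string to run again.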

In the image itself, clean as you go:
• If you download and install a package (like with curl and tar),
remove the compressed original in the same layer:
RUN mkdir -p /app/cruft/ \
    && curl -SL http://cruft.com/bigthing.tar.xz -o /app/cruft/bigthing.tar.xz \
    && tar -xJf /app/cruft/bigthing.tar.xz -C /app/cruft/ \
    && make -C /app/cruft/ all \
    && rm /app/cruft/bigthing.tar.xz

Take advantage of the OS built-ins
RUN apt-get update && apt-get install -y \
    aufs-tools \
    automake \
    build-essential \
    ruby1.9.1 \
    ruby1.9.1-dev \
    s3cmd=1.1.* \
    && rm -rf /var/lib/apt/lists/*

Clean up after your images, both in the image and on the system
Docker image prune:
$ docker image prune -a
Alternatively, go even further with Docker system prune:
$ docker system prune -a

Garbage collection
• Clean up after your containers! Beyond image and system prune:
• Make sure your orchestration platform (like ECS or K8s) is garbage
collecting:
  • ECS
  • Kubernetes
• Third-party tools like Spotify's docker-gc

Instance registration
• When an instance launches and is registered with the ECS cluster, it reports its total
resources:
register-container-instance --total-resources
[
  {
    "name": "cpu",
    "type": "INTEGER",
    "integerValue": 2048
  },
  …
]

Modifying exposed resources
• You can also modify which resources the ecs-agent exposes by
configuring the agent, for example by setting ECS_RESERVED_MEMORY
or ECS_RESERVED_PORTS in /etc/ecs/ecs.config.

For tasks, scheduling a task starts that task if there are available resources
[Diagram: a task definition (volume definitions and container definitions) launches containers and a shared data volume on a container instance]

Starting a task
[Diagram: User / Scheduler → StartTask → API]
• Container instance: what set of resources should we subtract from?
• Task definition: what resources does the application need?

Starting a task
[Diagram: User / Scheduler → StartTask → API → Cluster Management Engine]
We take that information, check it against our regional Cluster Management Engine, and either
approve or reject the request.
The Cluster Management Engine is designed to provide distributed transactions with Availability
Zone isolation, so even if there is an issue in one Availability Zone, you can still
schedule to your cluster.
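The approve-or-reject decision can be sketched as a small resource ledger: an instance registers its totals, and each StartTask request either subtracts from them or is rejected. This is a toy, single-process model; the `ClusterState` class, the instance ID, and the memory figures are hypothetical, and it ignores the distributed-transaction side entirely.

```python
class ClusterState:
    """Toy resource ledger: register instances, then approve or reject tasks."""

    def __init__(self):
        self.remaining = {}  # instance id -> {"cpu": int, "memory": int}

    def register_instance(self, instance_id, total_resources):
        # Mirrors the registration step: the instance reports its totals.
        self.remaining[instance_id] = dict(total_resources)

    def start_task(self, instance_id, task_resources):
        """Subtract the task's needs from the instance, or reject the request."""
        free = self.remaining[instance_id]
        if any(free.get(name, 0) < need
               for name, need in task_resources.items()):
            return False  # rejected: not enough of some resource
        for name, need in task_resources.items():
            free[name] -= need
        return True  # approved


state = ClusterState()
state.register_instance("i-example", {"cpu": 2048, "memory": 4096})
web = {"cpu": 1024, "memory": 3000}
print(state.start_task("i-example", web))  # first copy fits
print(state.start_task("i-example", web))  # second copy exceeds free memory
```

A rejected request leaves the ledger untouched, so the remaining capacity always reflects only the tasks that were actually approved.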

Starting a task
[Diagram: User / Scheduler → StartTask → API → Agent Communication]
Once a request is approved, we notify the Agent Communication Service that a node
needs to change its state.

Starting a task
[Diagram: User / Scheduler → StartTask → API → Agent Communication → WebSocket → ECS Agent on the container instance, which runs Docker, the task, and its container]
The Agent Communication Service pushes this information down the WebSocket that the
container instance opened.

Starting a task
[Diagram: the ECS Agent on the container instance, running tasks and containers under Docker, calls SubmitStateChange on the API]
The agent then acknowledges to the service that it has performed (or failed to perform) the
specified action.
At this point the task is happily running and tracked, but how do we keep in sync?

Filtering: match on instance family or type

Filtering: match on multiple attributes

Filtering: match on custom attributes
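The three filtering cases named in these slide titles can be written as `memberOf` placement constraints using cluster query expressions. The expressions below are examples chosen for illustration, not the ones shown on the original slides, and `matches` is a hypothetical local helper that only mimics the filtering with glob patterns.

```python
from fnmatch import fnmatch

# Constraint shapes as passed in the placementConstraints request field.
constraints = {
    "instance_family": {
        "type": "memberOf",
        "expression": "attribute:ecs.instance-type =~ t2.*",
    },
    "multiple_attributes": {
        "type": "memberOf",
        "expression": (
            "attribute:ecs.instance-type =~ t2.* and "
            "attribute:ecs.availability-zone == us-east-1a"
        ),
    },
    "custom_attribute": {
        "type": "memberOf",
        "expression": "attribute:stack == prod",
    },
}


def matches(attributes, wanted):
    """Toy local check: every wanted attribute must match its glob pattern."""
    return all(fnmatch(attributes.get(name, ""), pattern)
               for name, pattern in wanted.items())


instance = {
    "ecs.instance-type": "t2.small",
    "ecs.availability-zone": "us-east-1a",
    "stack": "prod",
}
print(matches(instance, {"ecs.instance-type": "t2.*", "stack": "prod"}))
```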

Placement: Targeting Instance Type & Zone
[Diagram: a cluster of g2.2xlarge, t2.medium, t2.small, and t2.micro instances, with tasks placed on the targeted instance type and zone]

Placement: Spread across Zone and Binpack
[Diagram: a cluster of g2.2xlarge, t2.medium, t2.small, and t2.micro instances, with tasks spread across zones and binpacked within each zone]

Placement: Services – Distinct Instances
[Diagram: a cluster of g2.2xlarge, t2.medium, t2.small, and t2.micro instances, with each service task placed on a distinct instance]

Console: Getting Started with Placement

Console: Placement Templates to Get Started

Console: Customizing Placement Strategies

Run tasks in response to CloudWatch alarms

Run tasks in response to a cron expression, or at a specific time

Time-based task scheduling
• Schedule on fixed time intervals (e.g., a number of minutes, hours, or days)
• Or use cron expressions.
• Set Amazon ECS as a CloudWatch Events target
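The two schedule styles map to `rate(...)` and `cron(...)` schedule expressions on a CloudWatch Events rule. The sketch below only builds the expression strings and rule parameters; the rule names and the `rate_expression` helper are hypothetical, and no AWS call is made.

```python
def rate_expression(value, unit):
    """Build a rate() expression; unit is 'minute', 'hour', or 'day'."""
    if unit not in ("minute", "hour", "day"):
        raise ValueError(f"unsupported unit: {unit}")
    if value < 1:
        raise ValueError("value must be a positive integer")
    # The unit is singular for 1 and plural otherwise.
    return f"rate({value} {unit if value == 1 else unit + 's'})"


# Fixed-interval schedule: run every 5 minutes.
rate_rule = {
    "Name": "every-five-minutes",
    "ScheduleExpression": rate_expression(5, "minute"),
}
# Cron schedule: run at 12:00 UTC every day (six fields, "?" for day-of-week).
cron_rule = {
    "Name": "daily-noon",
    "ScheduleExpression": "cron(0 12 * * ? *)",
}
print(rate_rule["ScheduleExpression"], cron_rule["ScheduleExpression"])
```

A rule like either of these would then be given the ECS cluster and task definition as its target.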
