These are the slides from a talk I recently gave at a Terraform meetup on how we do rolling ECS host updates. We use an event emitter based architecture to update our ECS hosts. Each time Terraform creates a new launch configuration, it emits an AWS Cloudwatch Event. This event is used to trigger an AWS Step function which detaches the old instances from the ASG, waits for capacity, migrates all the ECS services to the new instances and terminates the old EC2 instances.
12. Update EC2 hosts in a ECS cluster: Use cases
➢ You have a custom AMI for your ECS cluster(s).
➢ You want to always rollout the latest ECS-optimized AMIs.
➢ You want to rotate the admin keys.
➢ Change Instance type.
➢ Use an updated user_data script.
13. Update EC2 hosts in a ECS cluster: The process
➢ Terraform emits an AWS cloudwatch event once launch
configuration was created.
➢ Detach “old instances“ from ASG and wait for capacity.
➢ “Move” services from old instances to new instances.
➢ Terminate old instances when no more tasks running.
➢ Alert on failures.
14. Terraform + AWS Events + AWS Step functions =
Awesome
I created a new
launch configuration
lc-1234 for ASG
asg-1234 belonging
to ECS cluster
cluster-A
15. AWS CloudWatch Events
time
Task A
started
bar
Task C
started
Task B
stopped
ECS
Host
bla
baz
custom event
custom event
custom event
26. #1 ECS agent disconnects - Initial solution
➢ Cron job on ECS hosts to notify via SNS event and restart
ECS agent.
➢ Chances of ECS agent failing again due to some inherent
problem within the instance are high.
28. #1 ECS agent disconnects - Better solution
➢ Detect ECS agent disconnects.
➢ Bootup new ECS host and wait for it to be healthy.
➢ “Move” all the existing containers from the problematic
instance to a new Instance.
➢ Terminate the problematic instance.
➢ Alert on failures.
30. #1 ECS agent disconnects: Detection
How do we detect ECS agent disconnects?
AWS Cloudwatch EVENTS to the
rescue!!!
31. #1 ECS agent disconnects: ECS Events
time
Task A
started
bar
Task C
started
Task B
stopped foo baz
ECS agent
disconnected
ECS agent
connected
ECS agent
disconnected