Managing ECS hosts with AWS Lambda and Step Functions
Terraform at Comtravo
➢ Six environments maintained by Terraform.
➢ Integrated into our CI/CD pipeline.
➢ Each environment has:
○ 500+ AWS components.
○ 43 Lambdas.
○ 25 microservices.
CI/CD at Comtravo: Mono-repo Pull request
CI/CD at Comtravo: Mono-repo Merge to master
ECS at Comtravo
ECS: Many interesting challenges
One such challenge:
Update EC2 hosts in an ECS cluster
Update EC2 hosts in an ECS cluster: Use cases
➢ You have a custom AMI for your ECS cluster(s).
➢ You want to always roll out the latest ECS-optimized AMIs.
➢ You want to rotate the admin keys.
➢ You want to change the instance type.
➢ You want to use an updated user_data script.
Update EC2 hosts in an ECS cluster: The process
➢ Terraform emits an AWS CloudWatch event once a new launch configuration has been created.
➢ Detach “old instances” from the ASG and wait for capacity.
➢ “Move” services from old instances to new instances.
➢ Terminate old instances once no tasks are running on them.
➢ Alert on failures.
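The process above maps naturally onto a Step Functions state machine. The following is a minimal sketch of such a definition in Amazon States Language, built as a Python dictionary. The state names, the Lambda function names, the account and region in the ARN prefix, the 60-second waits, and the `$.capacityReady` and `$.runningTasks` fields are all assumptions made for illustration; the deck does not show the real definition.

```python
import json

# Hypothetical ARN prefix; the real Lambda names are not shown in the slides.
LAMBDA_PREFIX = "arn:aws:lambda:eu-west-1:1234567890:function:"


def task(function_name, next_state):
    """Build a Task state that invokes a Lambda and routes any error to the alert state."""
    return {
        "Type": "Task",
        "Resource": LAMBDA_PREFIX + function_name,
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "AlertOnFailure"}],
        "Next": next_state,
    }


def wait(seconds, next_state):
    """Build a Wait state that pauses before the next polling step."""
    return {"Type": "Wait", "Seconds": seconds, "Next": next_state}


def build_state_machine():
    """Return a state machine definition for the host rotation steps."""
    return {
        "Comment": "Rotate ECS hosts after a launch configuration change",
        "StartAt": "DetachOldInstances",
        "States": {
            "DetachOldInstances": task("detach-old-instances", "WaitForCapacity"),
            "WaitForCapacity": wait(60, "CheckCapacity"),
            "CheckCapacity": task("check-capacity", "CapacityReady"),
            "CapacityReady": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.capacityReady", "BooleanEquals": True,
                     "Next": "DrainOldInstances"}
                ],
                "Default": "WaitForCapacity",
            },
            "DrainOldInstances": task("drain-old-instances", "WaitForDrain"),
            "WaitForDrain": wait(60, "CountRunningTasks"),
            "CountRunningTasks": task("count-running-tasks", "TasksDrained"),
            "TasksDrained": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.runningTasks", "NumericEquals": 0,
                     "Next": "TerminateOldInstances"}
                ],
                "Default": "WaitForDrain",
            },
            "TerminateOldInstances": task("terminate-old-instances", "Done"),
            "Done": {"Type": "Succeed"},
            # The alert task has no Catch of its own, so it cannot loop back to itself.
            "AlertOnFailure": {
                "Type": "Task",
                "Resource": LAMBDA_PREFIX + "alert-on-failure",
                "Next": "Failed",
            },
            "Failed": {"Type": "Fail", "Error": "HostRotationFailed"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_state_machine(), indent=2))
```

The Wait and Choice pairs express "wait for capacity" and "no more tasks running" as polling loops, so no Lambda has to stay alive while instances boot or drain. In this sketch each Lambda is expected to return the fields the Choice states read.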
Terraform + AWS Events + AWS Step Functions = Awesome
“I created a new launch configuration lc-1234 for ASG asg-1234 belonging to ECS cluster cluster-A”
AWS CloudWatch Events
[Diagram: a timeline of events. ECS events (Task A started, Task C started, Task B stopped, ECS Host) are interleaved with other events (bar, bla, baz) and three custom events.]
Terraform Event Emitter
resource "null_resource" "launch-config-update" {
  provisioner "local-exec" {
    # A heredoc with shell line continuations is used here because
    # a quoted string cannot span several lines.
    command = <<EOT
python ${path.module}/scripts/emit_launchconfig_event.py \
  --launch_configuration_name ${aws_launch_configuration.ecs-lc.name} \
  --autoscaling_group_name ${aws_autoscaling_group.ecs-asg.name} \
  --ami ${var.aws_ami} \
  --cluster ${var.cluster}
EOT
  }

  # Re-run the provisioner whenever a new launch configuration is created.
  triggers = {
    launchConfigurationName = "${aws_launch_configuration.ecs-lc.name}"
  }
}
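The deck does not show `emit_launchconfig_event.py` itself. The following is a minimal sketch of what such an emitter could look like, assuming it publishes through the CloudWatch Events `PutEvents` API with boto3. The `--environment` flag and the assumption that `--cluster` carries the cluster ARN are hypothetical; the other flags are the ones passed by the Terraform resource above.

```python
import argparse
import json


def build_entry(launch_configuration_name, autoscaling_group_name, ami,
                cluster_arn, environment):
    """Build one PutEvents entry describing the new launch configuration."""
    detail = {
        "ami": ami,
        "status": "ACTIVE",
        "autoscalingGroupName": autoscaling_group_name,
        "environment": environment,
        "clusterArn": cluster_arn,
        "launchConfigurationName": launch_configuration_name,
    }
    return {
        "Source": "comtravo.terraform." + environment,
        "DetailType": "ECS Launch Configuration Change",
        "Detail": json.dumps(detail),  # PutEvents expects the detail as a JSON string
        "Resources": [launch_configuration_name],
    }


def main(argv=None):
    """Parse the command line and publish the event."""
    parser = argparse.ArgumentParser(description="Emit a launch configuration event")
    parser.add_argument("--launch_configuration_name", required=True)
    parser.add_argument("--autoscaling_group_name", required=True)
    parser.add_argument("--ami", required=True)
    parser.add_argument("--cluster", required=True)  # assumed to be the cluster ARN
    parser.add_argument("--environment", default="alpha")  # hypothetical flag
    args = parser.parse_args(argv)

    entry = build_entry(args.launch_configuration_name, args.autoscaling_group_name,
                        args.ami, args.cluster, args.environment)

    import boto3  # imported here so build_entry works without the AWS SDK installed

    response = boto3.client("events").put_events(Entries=[entry])
    if response["FailedEntryCount"]:
        raise SystemExit("PutEvents rejected the entry: %s" % response["Entries"])
```

A real script would call `main()` from its entry point. Keeping `build_entry` free of AWS calls makes the payload easy to check in a unit test before anything is sent.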
Terraform Event
{
  "version": "0",
  "id": "f24d8f1c-8c3f-9b62-cb3c-54430739fc55",
  "source": "comtravo.terraform.alpha",
  "account": "1234567890",
  "time": "2018-05-09T13:35:43Z",
  "region": "eu-west-1",
  "resources": [
    "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003"
  ],
  "detail": {
    "ami": "ami-bfb5fec6",
    "status": "ACTIVE",
    "agentConnected": false,
    "autoscalingGroupName": "ct-backend-ecs-alpha-t2.large-generic20180503065507554700000005",
    "environment": "alpha",
    "clusterArn": "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha",
    "launchConfigurationName": "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003"
  },
  "detail-type": "ECS Launch Configuration Change"
}
AWS CloudWatch Event Rules
resource "aws_cloudwatch_event_rule" "ecs-manager" {
name = "capture-ecs-events-${terraform.workspace}"
description = "Capture ECS related events"
event_pattern = <<PATTERN
{
"source": [
"comtravo.terraform.${terraform.workspace}"
],
"detail-type": [
"ECS Launch Configuration Change"
],
"detail": {
"clusterArn": [
"arn:aws:ecs:${var.region}:${var.ct_account_id}:cluster/ct-backend-ecs-${terraform.workspace}"
],
"status": ["ACTIVE"]
}
}
PATTERN
}
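To make the filtering behaviour of the rule concrete, here is a simplified sketch of how an event pattern is compared with an event. It only handles nested objects and lists of exact values, which is all the rule above uses; the real CloudWatch Events matcher supports more operators. The sample event and pattern are shortened versions of the ones in the slides.

```python
def matches(pattern, event):
    """Return True when every field named in the pattern is satisfied by the event.

    Simplified model: a pattern value is either a nested object or a list of
    accepted exact values. Fields of the event that the pattern does not
    mention are ignored.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # An array in the event matches if any of its elements is accepted.
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


PATTERN = {
    "source": ["comtravo.terraform.alpha"],
    "detail-type": ["ECS Launch Configuration Change"],
    "detail": {
        "clusterArn": ["arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha"],
        "status": ["ACTIVE"],
    },
}

EVENT = {
    "source": "comtravo.terraform.alpha",
    "detail-type": "ECS Launch Configuration Change",
    "region": "eu-west-1",
    "detail": {
        "ami": "ami-bfb5fec6",
        "status": "ACTIVE",
        "clusterArn": "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha",
    },
}

if __name__ == "__main__":
    print(matches(PATTERN, EVENT))
```

Because the source embeds the workspace name, an event emitted for one environment does not trigger the rule of another environment.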
AWS Step Functions
DEMO
Questions
You all have been awesome!!!
Extras
ECS Challenge #1
ECS AGENT DISCONNECTS
#1 ECS agent disconnects - Initial solution
➢ A cron job on each ECS host notifies us via SNS and restarts the ECS agent.
➢ Drawback: the ECS agent is likely to fail again, because the underlying problem usually lies within the instance itself.
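A minimal sketch of such a cron job follows. It is not the script used at Comtravo: it assumes the agent's local introspection endpoint on port 51678 is used as the health probe, that the host manages the agent through systemd, and that a topic with the hypothetical ARN below exists. A local probe only shows that the agent process answers, not that it is connected to the ECS service.

```python
import subprocess
import urllib.request

INTROSPECTION_URL = "http://localhost:51678/v1/metadata"  # ECS agent introspection API
TOPIC_ARN = "arn:aws:sns:eu-west-1:1234567890:ecs-agent-alerts"  # hypothetical topic
RESTART_COMMAND = ["systemctl", "restart", "ecs"]  # assumption: systemd-based AMI


def agent_is_responding(url=INTROSPECTION_URL, timeout=5):
    """Return True when the local ECS agent answers its introspection endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:  # covers connection refused, timeouts and HTTP errors
        return False


def check_and_heal(is_responding, notify, restart):
    """Run one cron tick: if the agent is down, notify first, then restart it."""
    if is_responding():
        return "healthy"
    notify("ECS agent is not responding; restarting it")
    restart()
    return "restarted"


def run_once():
    """Wire the real probe, SNS and restart command together."""
    import boto3  # imported here so the logic above is testable without the AWS SDK

    sns = boto3.client("sns")
    return check_and_heal(
        agent_is_responding,
        lambda message: sns.publish(
            TopicArn=TOPIC_ARN, Subject="ECS agent restart", Message=message
        ),
        lambda: subprocess.run(RESTART_COMMAND, check=True),
    )
```

The probe, the notification and the restart are passed in as callables so the decision logic can be exercised without a running agent or AWS credentials.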
#1 ECS agent disconnects - Better solution
➢ Detect ECS agent disconnects.
➢ Boot up a new ECS host and wait for it to be healthy.
➢ “Move” all the existing containers from the problematic instance to a new instance.
➢ Terminate the problematic instance.
➢ Alert on failures.
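The "move" and "terminate" steps can be sketched with boto3 as follows. This is an illustration under assumptions, not the Lambda code from the talk: it sets the container instance to DRAINING so that the ECS service scheduler starts replacement tasks elsewhere, polls until the instance is empty, and only then terminates it. The function name, the polling interval and the poll limit are hypothetical. Booting the replacement host is left to the Auto Scaling group.

```python
import time


def replace_instance(ecs, ec2, cluster, container_instance_arn, instance_id,
                     poll_seconds=30, max_polls=40):
    """Drain one container instance, wait until it is empty, then terminate it.

    `ecs` and `ec2` are boto3 clients (or test doubles with the same methods).
    Returns True if the instance was terminated, False if it never emptied,
    so the caller can alert on the failure.
    """
    ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    for _ in range(max_polls):
        described = ecs.describe_container_instances(
            cluster=cluster, containerInstances=[container_instance_arn]
        )
        instance = described["containerInstances"][0]
        if instance["runningTasksCount"] + instance["pendingTasksCount"] == 0:
            ec2.terminate_instances(InstanceIds=[instance_id])
            return True
        time.sleep(poll_seconds)
    return False
```

With a disconnected agent the task counts may never reach zero, because the agent is what stops tasks on the host, so the timeout path matters here. In a Step Functions workflow the polling loop would typically become a Wait and Choice pair rather than a sleeping Lambda.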
#1 ECS agent disconnects: Detection
How do we detect ECS agent disconnects?
AWS CloudWatch Events to the rescue!
#1 ECS agent disconnects: ECS Events
[Diagram: a timeline of events: Task A started, bar, Task C started, Task B stopped, foo, baz, ECS agent disconnected, ECS agent connected, ECS agent disconnected.]
#1 ECS agent disconnects: Filter ECS Events
{
"detail": {
"agentConnected": [
false
],
"clusterArn": [
"arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-qa"
],
"status": [
"ACTIVE"
]
},
"detail-type": [
"ECS Container Instance State Change"
],
"source": [
"aws.ecs"
]
}
#1 ECS agent disconnects: Trigger step function
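One way to start the workflow from a matched event is a small Lambda that calls `StartExecution`. The sketch below is an assumption about how this could be wired, not the setup shown in the talk; a CloudWatch Events rule can also target a state machine directly. The state machine ARN is hypothetical. The `containerInstanceArn` and `ec2InstanceId` fields are part of the detail of an "ECS Container Instance State Change" event.

```python
import json

# Hypothetical ARN of the state machine that replaces a problematic host.
STATE_MACHINE_ARN = "arn:aws:states:eu-west-1:1234567890:stateMachine:ecs-host-replacer"


def build_execution_input(event):
    """Extract the fields the workflow needs from the container instance event."""
    detail = event["detail"]
    return {
        "clusterArn": detail["clusterArn"],
        "containerInstanceArn": detail["containerInstanceArn"],
        "ec2InstanceId": detail["ec2InstanceId"],
    }


def handler(event, context, sfn=None, state_machine_arn=STATE_MACHINE_ARN):
    """Lambda entry point: start one execution per matched event."""
    if sfn is None:
        import boto3  # imported here so the handler is testable with a fake client

        sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=event["id"],  # design choice: the event id doubles as the execution name
        input=json.dumps(build_execution_input(event)),
    )
    return response["executionArn"]
```

Reusing the event id as the execution name is a design choice intended to keep a redelivered event from starting a second run for the same disconnect.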

Zero downtime ECS host updates with Terraform
