SlideShare a Scribd company logo
1 of 34
Download to read offline
Managing ECS hosts with AWS lambda and step
functions
Terraform at Comtravo
Terraform at Comtravo
➢ Six environments maintained by Terraform.
➢ Integrated into our CI/CD pipeline.
➢ Each environment has:
○ 500+ AWS components.
○ 43 Lambdas.
○ 25 microservices.
CI/CD at Comtravo: Mono-repo Pull request
CI/CD at Comtravo: Mono-repo Pull request
CI/CD at Comtravo: Mono-repo Merge to master
CI/CD at Comtravo: Mono-repo Merge to master
ECS at Comtravo
ECS: Many interesting challenges
One such challenge:
Update EC2 hosts in a ECS cluster
Update EC2 hosts in a ECS cluster: Use cases
➢ You have a custom AMI for your ECS cluster(s).
➢ You want to always rollout the latest ECS-optimized AMIs.
➢ You want to rotate the admin keys.
➢ Change Instance type.
➢ Use an updated user_data script.
Update EC2 hosts in a ECS cluster: The process
➢ Terraform emits an AWS cloudwatch event once launch
configuration was created.
➢ Detach “old instances“ from ASG and wait for capacity.
➢ “Move” services from old instances to new instances.
➢ Terminate old instances when no more tasks running.
➢ Alert on failures.
Terraform + AWS Events + AWS Step functions =
Awesome
I created a new
launch configuration
lc-1234 for ASG
asg-1234 belonging
to ECS cluster
cluster-A
AWS CloudWatch Events
time
Task A
started
bar
Task C
started
Task B
stopped
ECS Host
bla baz
custom event
custom event
custom event
Terraform Event Emitter
resource "null_resource" "launch-config-update" {
provisioner "local-exec" {
command = "python ${path.module}/scripts/emit_launchconfig_event.py
--launch_configuration_name ${aws_launch_configuration.ecs-lc.name}
--autoscaling_group_name ${aws_autoscaling_group.ecs-asg.name}
--ami ${var.aws_ami}
--cluster ${var.cluster}"
}
triggers {
launchConfigurationName = "${aws_launch_configuration.ecs-lc.name}"
}
}
Terraform Event
{
"version": "0",
"id": "f24d8f1c-8c3f-9b62-cb3c-54430739fc55",
"source": "comtravo.terraform.alpha",
"account": "1234567890",
"time": "2018-05-09T13:35:43Z",
"region": "eu-west-1",
"resources": [
"ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003"
],
"detail": {
"ami": "ami-bfb5fec6",
"status": "ACTIVE",
"agentConnected": false,
"autoscalingGroupName": "ct-backend-ecs-alpha-t2.large-generic20180503065507554700000005",
"environment": "alpha",
"clusterArn": "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha"
"launchConfigurationName": "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003"
},
"detailType": "ECS Launch Configuration Change"
}
AWS CloudWatch Event Rules
resource "aws_cloudwatch_event_rule" "ecs-manager" {
name = "capture-ecs-events-${terraform.workspace}"
description = "Capture ECS related events"
event_pattern = <<PATTERN
{
"source": [
"comtravo.terraform.${terraform.workspace}"
],
"detail-type": [
"ECS Launch Configuration Change"
],
"detail": {
"clusterArn": [
"arn:aws:ecs:${var.region}:${var.ct_account_id}:cluster/ct-backend-ecs-${terraform.workspace}"
],
"status": ["ACTIVE"]
}
}
PATTERN
}
AWS Step functions
DEMO
Questions
You all have been awesome!!!
Extras
ECS Challenge #1
ECS AGENT DISCONNECTS
#1 ECS agent disconnects - Initial solution
➢ Cron job on ECS hosts to notify via SNS event and restart
ECS agent.
➢ Chances of ECS agent failing again due to some inherent
problem within the instance are high.
#1 ECS agent disconnects - Initial solution
#1 ECS agent disconnects - Better solution
➢ Detect ECS agent disconnects.
➢ Bootup new ECS host and wait for it to be healthy.
➢ “Move” all the existing containers from the problematic
instance to a new Instance.
➢ Terminate the problematic instance.
➢ Alert on failures.
#1 ECS agent disconnects - Better solution
#1 ECS agent disconnects: Detection
How do we detect ECS agent disconnects?
AWS Cloudwatch EVENTS to the
rescue!!!
#1 ECS agent disconnects: ECS Events
time
Task A
started
bar
Task C
started
Task B
stopped foo baz
ECS agent
disconnected
ECS agent
connected
ECS agent
disconnected
#1 ECS agent disconnects: Filter ECS Events
{
"detail": {
"agentConnected": [
false
],
"clusterArn": [
"arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-qa"
],
"status": [
"ACTIVE"
]
},
"detail-type": [
"ECS Container Instance State Change"
],
"source": [
"aws.ecs"
]
}
#1 ECS agent disconnects: Trigger step function
#1 ECS agent disconnects: ECS Events

More Related Content

What's hot

Declarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with TerraformDeclarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with TerraformRadek Simko
 
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Big Data Spain
 
Infrastructure as code with Terraform
Infrastructure as code with TerraformInfrastructure as code with Terraform
Infrastructure as code with TerraformSam Bashton
 
Real World Optimization
Real World OptimizationReal World Optimization
Real World OptimizationDavid Golden
 
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the CloudAWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the CloudSharma Podila
 
Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017Philipp Garbe
 
Testing & deploying terraform
Testing & deploying terraformTesting & deploying terraform
Testing & deploying terraformFarid Neshat
 
Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018Anton Babenko
 
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)Stephane Jourdan
 
Terraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeTerraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeMartin Schütte
 
Using Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal KubernetesUsing Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal KubernetesHimani Agrawal
 
Scaling terraform
Scaling terraformScaling terraform
Scaling terraformPaolo Tonin
 
Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Jonathon Brouse
 

What's hot (19)

Declarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with TerraformDeclarative & workflow based infrastructure with Terraform
Declarative & workflow based infrastructure with Terraform
 
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
Elasticsearch (R)Evolution — You Know, for Search… by Philipp Krenn at Big Da...
 
Infrastructure as code with Terraform
Infrastructure as code with TerraformInfrastructure as code with Terraform
Infrastructure as code with Terraform
 
Terraform at Scale
Terraform at ScaleTerraform at Scale
Terraform at Scale
 
London Hug 19/5 - Terraform in Production
London Hug 19/5 - Terraform in ProductionLondon Hug 19/5 - Terraform in Production
London Hug 19/5 - Terraform in Production
 
Terraform
TerraformTerraform
Terraform
 
Real World Optimization
Real World OptimizationReal World Optimization
Real World Optimization
 
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the CloudAWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud
AWS re:Invent 2014 talk: Scheduling using Apache Mesos in the Cloud
 
Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017
 
Testing & deploying terraform
Testing & deploying terraformTesting & deploying terraform
Testing & deploying terraform
 
Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018Terraform modules and best-practices - September 2018
Terraform modules and best-practices - September 2018
 
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
Using Terraform.io (Human Talks Montpellier, Epitech, 2014/09/09)
 
From * to Symfony2
From * to Symfony2From * to Symfony2
From * to Symfony2
 
Terraform -- Infrastructure as Code
Terraform -- Infrastructure as CodeTerraform -- Infrastructure as Code
Terraform -- Infrastructure as Code
 
Scalable Event Tracking
Scalable Event TrackingScalable Event Tracking
Scalable Event Tracking
 
Using Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal KubernetesUsing Libvirt with Cluster API to manage baremetal Kubernetes
Using Libvirt with Cluster API to manage baremetal Kubernetes
 
Scaling terraform
Scaling terraformScaling terraform
Scaling terraform
 
Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017Terraform at Scale - All Day DevOps 2017
Terraform at Scale - All Day DevOps 2017
 
Flamingo in Production
Flamingo in ProductionFlamingo in Production
Flamingo in Production
 

Similar to Zero down time ECS cluster upgrades

Zero downtime ECS host updates with Terraform
Zero downtime ECS host updates with TerraformZero downtime ECS host updates with Terraform
Zero downtime ECS host updates with TerraformPuneeth Nanjundaswamy
 
From Kubernetes to OpenStack in Sydney
From Kubernetes to OpenStack in SydneyFrom Kubernetes to OpenStack in Sydney
From Kubernetes to OpenStack in SydneySK Telecom
 
Artem Zhurbila - docker clusters (solit 2015)
Artem Zhurbila - docker clusters (solit 2015)Artem Zhurbila - docker clusters (solit 2015)
Artem Zhurbila - docker clusters (solit 2015)Artem Zhurbila
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界Amazon Web Services
 
Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1JurajHantk
 
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsOrchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsCloud Native Day Tel Aviv
 
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski buildacloud
 
Deploying on Kubernetes - An intro
Deploying on Kubernetes - An introDeploying on Kubernetes - An intro
Deploying on Kubernetes - An introAndré Cruz
 
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...Tobias Schneck
 
ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021
ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021
ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021WDDay
 
Oliver leech cloudstack
Oliver leech   cloudstackOliver leech   cloudstack
Oliver leech cloudstackShapeBlue
 
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHow Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHenning Jacobs
 
Machine learning at scale with aws sage maker
Machine learning at scale with aws sage makerMachine learning at scale with aws sage maker
Machine learning at scale with aws sage makerPhilipBasford
 
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes MeetupCreating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes MeetupTobias Schneck
 
Phil Basford - machine learning at scale with aws sage maker
Phil Basford - machine learning at scale with aws sage makerPhil Basford - machine learning at scale with aws sage maker
Phil Basford - machine learning at scale with aws sage makerAWSCOMSUM
 
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...Tenchi Security
 
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...Alexandre Sieira
 
Serverless Multi Region Cache Replication
Serverless Multi Region Cache ReplicationServerless Multi Region Cache Replication
Serverless Multi Region Cache ReplicationSanghyun Lee
 

Similar to Zero down time ECS cluster upgrades (20)

Zero downtime ECS host updates with Terraform
Zero downtime ECS host updates with TerraformZero downtime ECS host updates with Terraform
Zero downtime ECS host updates with Terraform
 
From Kubernetes to OpenStack in Sydney
From Kubernetes to OpenStack in SydneyFrom Kubernetes to OpenStack in Sydney
From Kubernetes to OpenStack in Sydney
 
ProxySQL at Scale on AWS.pdf
ProxySQL at Scale on AWS.pdfProxySQL at Scale on AWS.pdf
ProxySQL at Scale on AWS.pdf
 
Artem Zhurbila - docker clusters (solit 2015)
Artem Zhurbila - docker clusters (solit 2015)Artem Zhurbila - docker clusters (solit 2015)
Artem Zhurbila - docker clusters (solit 2015)
 
以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界以Device Shadows與Rules Engine串聯實體世界
以Device Shadows與Rules Engine串聯實體世界
 
Autoscaling in kubernetes v1
Autoscaling in kubernetes v1Autoscaling in kubernetes v1
Autoscaling in kubernetes v1
 
Ceilometer + Heat = Alarming
Ceilometer + Heat = Alarming Ceilometer + Heat = Alarming
Ceilometer + Heat = Alarming
 
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell ScruggsOrchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
Orchestration Tool Roundup - Arthur Berezin & Trammell Scruggs
 
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
Troubleshooting Strategies for CloudStack Installations by Kirk Kosinski
 
Deploying on Kubernetes - An intro
Deploying on Kubernetes - An introDeploying on Kubernetes - An intro
Deploying on Kubernetes - An intro
 
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...Kubernetes Cluster API - managing the infrastructure of  multi clusters (k8s ...
Kubernetes Cluster API - managing the infrastructure of multi clusters (k8s ...
 
ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021
ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021
ОЛЕКСАНДР ЛИПКО «Graceful Shutdown Node.js + k8s» Online WDDay 2021
 
Oliver leech cloudstack
Oliver leech   cloudstackOliver leech   cloudstack
Oliver leech cloudstack
 
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:InventHow Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
How Zalando runs Kubernetes clusters at scale on AWS - AWS re:Invent
 
Machine learning at scale with aws sage maker
Machine learning at scale with aws sage makerMachine learning at scale with aws sage maker
Machine learning at scale with aws sage maker
 
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes MeetupCreating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
Creating Kubernetes multi clusters with ClusterAPI @ Stuttgart Kubernetes Meetup
 
Phil Basford - machine learning at scale with aws sage maker
Phil Basford - machine learning at scale with aws sage makerPhil Basford - machine learning at scale with aws sage maker
Phil Basford - machine learning at scale with aws sage maker
 
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
 
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
Shopping for Vulnerabilities - How Cloud Service Provider Marketplaces can He...
 
Serverless Multi Region Cache Replication
Serverless Multi Region Cache ReplicationServerless Multi Region Cache Replication
Serverless Multi Region Cache Replication
 

Recently uploaded

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSRajkumarAkumalla
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 

Recently uploaded (20)

High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICSHARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
HARDNESS, FRACTURE TOUGHNESS AND STRENGTH OF CERAMICS
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 

Zero down time ECS cluster upgrades

  • 1.
  • 2. Managing ECS hosts with AWS lambda and step functions
  • 4. Terraform at Comtravo ➢ Six environments maintained by Terraform. ➢ Integrated into our CI/CD pipeline. ➢ Each environment has: ○ 500+ AWS components. ○ 43 Lambdas. ○ 25 microservices.
  • 5. CI/CD at Comtravo: Mono-repo Pull request
  • 6. CI/CD at Comtravo: Mono-repo Pull request
  • 7. CI/CD at Comtravo: Mono-repo Merge to master
  • 8. CI/CD at Comtravo: Mono-repo Merge to master
  • 10. ECS: Many interesting challenges
  • 11. One such challenge: Update EC2 hosts in a ECS cluster
  • 12. Update EC2 hosts in a ECS cluster: Use cases ➢ You have a custom AMI for your ECS cluster(s). ➢ You want to always rollout the latest ECS-optimized AMIs. ➢ You want to rotate the admin keys. ➢ Change Instance type. ➢ Use an updated user_data script.
  • 13. Update EC2 hosts in a ECS cluster: The process ➢ Terraform emits an AWS cloudwatch event once launch configuration was created. ➢ Detach “old instances“ from ASG and wait for capacity. ➢ “Move” services from old instances to new instances. ➢ Terminate old instances when no more tasks running. ➢ Alert on failures.
  • 14. Terraform + AWS Events + AWS Step functions = Awesome I created a new launch configuration lc-1234 for ASG asg-1234 belonging to ECS cluster cluster-A
  • 15. AWS CloudWatch Events time Task A started bar Task C started Task B stopped ECS Host bla baz custom event custom event custom event
  • 16. Terraform Event Emitter resource "null_resource" "launch-config-update" { provisioner "local-exec" { command = "python ${path.module}/scripts/emit_launchconfig_event.py --launch_configuration_name ${aws_launch_configuration.ecs-lc.name} --autoscaling_group_name ${aws_autoscaling_group.ecs-asg.name} --ami ${var.aws_ami} --cluster ${var.cluster}" } triggers { launchConfigurationName = "${aws_launch_configuration.ecs-lc.name}" } }
  • 17. Terraform Event { "version": "0", "id": "f24d8f1c-8c3f-9b62-cb3c-54430739fc55", "source": "comtravo.terraform.alpha", "account": "1234567890", "time": "2018-05-09T13:35:43Z", "region": "eu-west-1", "resources": [ "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003" ], "detail": { "ami": "ami-bfb5fec6", "status": "ACTIVE", "agentConnected": false, "autoscalingGroupName": "ct-backend-ecs-alpha-t2.large-generic20180503065507554700000005", "environment": "alpha", "clusterArn": "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-alpha" "launchConfigurationName": "ct-backend-ecs-alpha-t2.large-generic20180509133303168200000003" }, "detailType": "ECS Launch Configuration Change" }
  • 18. AWS CloudWatch Event Rules resource "aws_cloudwatch_event_rule" "ecs-manager" { name = "capture-ecs-events-${terraform.workspace}" description = "Capture ECS related events" event_pattern = <<PATTERN { "source": [ "comtravo.terraform.${terraform.workspace}" ], "detail-type": [ "ECS Launch Configuration Change" ], "detail": { "clusterArn": [ "arn:aws:ecs:${var.region}:${var.ct_account_id}:cluster/ct-backend-ecs-${terraform.workspace}" ], "status": ["ACTIVE"] } } PATTERN }
  • 20. DEMO
  • 21.
  • 23. You all have been awesome!!!
  • 25. ECS Challenge #1 ECS AGENT DISCONNECTS
  • 26. #1 ECS agent disconnects - Initial solution ➢ Cron job on ECS hosts to notify via SNS event and restart ECS agent. ➢ Chances of ECS agent failing again due to some inherent problem within the instance are high.
  • 27. #1 ECS agent disconnects - Initial solution
  • 28. #1 ECS agent disconnects - Better solution ➢ Detect ECS agent disconnects. ➢ Bootup new ECS host and wait for it to be healthy. ➢ “Move” all the existing containers from the problematic instance to a new Instance. ➢ Terminate the problematic instance. ➢ Alert on failures.
  • 29. #1 ECS agent disconnects - Better solution
  • 30. #1 ECS agent disconnects: Detection How do we detect ECS agent disconnects? AWS Cloudwatch EVENTS to the rescue!!!
  • 31. #1 ECS agent disconnects: ECS Events time Task A started bar Task C started Task B stopped foo baz ECS agent disconnected ECS agent connected ECS agent disconnected
  • 32. #1 ECS agent disconnects: Filter ECS Events { "detail": { "agentConnected": [ false ], "clusterArn": [ "arn:aws:ecs:eu-west-1:1234567890:cluster/ct-backend-ecs-qa" ], "status": [ "ACTIVE" ] }, "detail-type": [ "ECS Container Instance State Change" ], "source": [ "aws.ecs" ] }
  • 33. #1 ECS agent disconnects: Trigger step function
  • 34. #1 ECS agent disconnects: ECS Events