Container-based Microservices DevOps in
AWS
How Perfecto Did it and What We Learned So Far
© 2018, Perfecto Mobile Ltd. All Rights Reserved.
About Perfecto
1/10/2018
How We Started
We started 11 years ago, developing monolithic
servers in our own data centers (DCs)
We were moving slowly…
But Then we Heard Some
Buzzwords
And we decided we wanted to
move faster and have more
impact on the company's
business
Big Change
• Waterfall → Agile
• Monolith servers → Microservices
• Deployment in DC → Cloud
• Dependencies → Autonomous teams
The 3 Components of the Change
• Technology
• Methodology
• Culture
Autonomous Teams
DevOps
Dev
QA
Development
Continuous integration
Continuous deployment
Monitoring
Budget control
Technologies we Use (partial
list…)
Why we Chose ECS for Container
Orchestration
• We were new to the containers world,
but we understood container
orchestration is a key decision
• We looked at ECS, Kubernetes,
Swarm and other alternatives.
• ECS seemed best in terms of time to
value
Our First Microservice
• Deployed in ECS
• ELB + ECS tasks = ECS
service
• EC2 instances are
managed in an Auto
Scaling Group
• Service Discovery using
Route53
• Task per EC2 instance
(ELB static port
limitation)
Decisions we Took (1)
• Deploying in a single availability zone
• One of those decisions you regret -
overhead of changing it grows with
time
Decisions We took (2)
• Single VPC for all teams
• Seems natural – it’s network, right?
• Pros
• Less work for teams
• Simpler to move services between teams
• Cons
• Dependency between teams. Who owns the VPC?
• Simpler to take shortcuts (e.g. use VPN to DC)
• Budget control is more difficult - no option for account per team (need
to tag all services)
Decisions we took (3)
• ECS cluster per… what?
• Options
• One cluster to rule them all
• Cluster per service (group of microservices)
• Cluster per team
• We let our teams decide between the last two options
• No dependencies between teams
• Better budget control
• Reduce blast radius of ECS cluster issues (more on that soon)
Infrastructure as code is the only way
to go
• We (try to) do everything (except for very small
and initial POCs) with CloudFormation
• Every time you make a change in the UI, CLI or API without
CloudFormation – think again
• CloudFormation templates stored in Git
• CloudFormation invoked by Jenkins
• We maintain shared CloudFormation templates
used by all teams to create ECS clusters, services
and more.
Working with CloudFormation
• There is a learning curve
• Templates can become long and unreadable
• Split to sub-templates
• Consider generating templates
• CloudFormation behavior can be surprising, but
it is consistent
• Practice in production-like environments (dev/staging)
• Using the UI is dangerous
• Automate all CloudFormation invocations
• Read-only access to UI
• Protect your stacks using stack policies
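Two of the points above, generating templates and protecting stacks with stack policies, can be sketched in a few lines of Python. This is only an illustration under assumptions: the team and service names, the log-group naming scheme and the 30-day retention are hypothetical, and the talk does not show the actual shared templates. The resource types and the stack policy document shape follow the public CloudFormation format.

```python
import json


def build_cluster_template(team: str, service_names: list[str]) -> dict:
    """Build a CloudFormation template (as a dict) with one ECS cluster
    and one CloudWatch log group per service. Names are hypothetical."""
    resources = {
        "Cluster": {
            "Type": "AWS::ECS::Cluster",
            "Properties": {"ClusterName": f"{team}-cluster"},
        }
    }
    for name in service_names:
        # Logical IDs must be alphanumeric, so "device-api" becomes "DeviceApi".
        logical_id = "".join(part.capitalize() for part in name.split("-")) + "LogGroup"
        resources[logical_id] = {
            "Type": "AWS::Logs::LogGroup",
            "Properties": {"LogGroupName": f"/{team}/{name}", "RetentionInDays": 30},
        }
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"ECS cluster and log groups for team {team}",
        "Resources": resources,
    }


def build_stack_policy(protected_logical_ids: list[str]) -> dict:
    """Allow all updates, except replacing or deleting the protected resources."""
    statements = [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"}
    ]
    for logical_id in protected_logical_ids:
        statements.append({
            "Effect": "Deny",
            "Action": ["Update:Replace", "Update:Delete"],
            "Principal": "*",
            "Resource": f"LogicalResourceId/{logical_id}",
        })
    return {"Statement": statements}


if __name__ == "__main__":
    template = build_cluster_template("billing", ["device-api", "reports"])
    print(json.dumps(template, indent=2))
    print(json.dumps(build_stack_policy(["Cluster"]), indent=2))
```

The generated JSON would be committed to Git and passed to CloudFormation by the CI job, the same way a hand-written template is.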
Moving to ALBs
• ALB – Application Load Balancer, can (should) replace ELB
• Why
• Cost - 1 ALB can replace X ELBs - Less expensive for clusters with
large number of services
• Dynamic port management – Allows deploying multiple services on one
EC2 instance, more flexibility
• ELB is (kind of) legacy – e.g. not supported in Fargate
• Routes requests to backend containers based on request path
rules
• Challenge with ALBs – no URL rewrite in rules. If you have no
control over the request path in the deployed services, you will
need a reverse proxy.
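The routing behavior described above can be modeled with a small Python sketch. It is not ALB code: the rule list, the target names and the use of `fnmatchcase` as a stand-in for ALB wildcard matching are all assumptions made for illustration. The point it shows is that the matched request is forwarded with its path unchanged.

```python
from fnmatch import fnmatchcase

# (path pattern, target) pairs, evaluated in priority order.
# Patterns and target names are hypothetical.
RULES = [
    ("/orders/*", "orders-service"),
    ("/users/*", "users-service"),
]
DEFAULT_TARGET = "frontend"


def route(path: str) -> tuple[str, str]:
    """Return (target, forwarded_path) for a request path.

    The forwarded path is always the original path: the rule only
    selects a target, it never rewrites the URL.
    """
    for pattern, target in RULES:
        if fnmatchcase(path, pattern):
            return target, path
    return DEFAULT_TARGET, path


if __name__ == "__main__":
    print(route("/orders/42"))
    print(route("/health"))
```

A service that expects to be served from `/` would still receive `/orders/42` here, which is why a reverse proxy is needed when the service's request path cannot be changed.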
ELB vs ALB
What about logs?
• We’re using CloudWatch logs
• Not perfect, but very simple to integrate with
anything in AWS
• Container logs
• Standard container logs can be sent to
CloudWatch – that is easy, supported natively in
Docker
• To take application log files to CloudWatch – we’re
using a ”satellite container” (AKA sidecar) per task
- https://github.com/moshebs/docker-awslogs
• ELB/ALB access logs:
• Sent to S3, natively supported by ELB/ALB
• CloudWatch event from S3 → Lambda that parses
the logs and pushes them to CloudWatch logs
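The parsing step of such a Lambda might look roughly like the sketch below. It covers ALB access logs only (classic ELB logs use a different layout), reads just the leading fields of each line, and uses an illustrative sample line; the function names and the message format are assumptions, not the code used at Perfecto. The returned event has the `timestamp` (milliseconds) and `message` keys that CloudWatch Logs expects for log events.

```python
import shlex
from datetime import datetime, timedelta, timezone

# Leading fields of an ALB access log entry, in order.
FIELDS = [
    "type", "time", "elb", "client", "target",
    "request_processing_time", "target_processing_time",
    "response_processing_time", "elb_status_code", "target_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
]
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_alb_log_line(line: str) -> dict:
    """Split one access log line into named fields (quoted fields stay whole)."""
    tokens = shlex.split(line)
    if len(tokens) < len(FIELDS):
        raise ValueError("not an ALB access log line")
    entry = dict(zip(FIELDS, tokens))
    method, url, protocol = entry["request"].split(" ", 2)
    entry.update(method=method, url=url, protocol=protocol)
    return entry


def to_log_event(entry: dict) -> dict:
    """Convert a parsed entry into a CloudWatch Logs style event."""
    when = datetime.strptime(entry["time"], "%Y-%m-%dT%H:%M:%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    millis = (when - EPOCH) // timedelta(milliseconds=1)
    message = f'{entry["method"]} {entry["url"]} -> {entry["elb_status_code"]}'
    return {"timestamp": millis, "message": message}


# Illustrative sample line, not taken from a real load balancer.
SAMPLE = (
    'http 2018-01-10T12:00:00.500000Z app/my-alb/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/orders/42 HTTP/1.1" "curl/7.46.0" - -'
)

if __name__ == "__main__":
    print(to_log_event(parse_alb_log_line(SAMPLE)))
```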
Logs
Monitoring with Prometheus
Dashboards with Grafana
Monitoring in Perfecto
• Each team owns their own monitoring system
• Deployment
• Maintenance
• Building dashboards
• Getting alerts, usually in Slack
• Deployed using CloudFormation
• All teams use the same templates
• Configuration using sidecar containers
What we monitor
• EC2 instance metrics – by deploying Prometheus
node_exporter on the EC2 instances
• Application metrics
• If metrics are shared between the microservice nodes – scrape through
LB
• Otherwise – scrape each microservice task (how do you find them?
Next slide…)
• 3rd party (e.g. Mongo, RabbitMQ, Redis) – standard
open-source exporters
• CloudWatch metrics – using cloudwatch_exporter (but be
careful, pulling metrics from CloudWatch is expensive!)
Monitoring Architecture
Scraping ECS Tasks
• The challenge:
• Prometheus needs to know where each task runs, and what port to use
for scraping
• But Prometheus supports filtering EC2 instances by tags only
• ECS decides which task goes where
• The solution – a container that dynamically tags EC2 instances
according to ECS tasks running on them
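The core of such a tagging container could be sketched as follows. This is a minimal illustration, not the actual implementation: the tag prefix, the shape of the task records and the function names are hypothetical, and the sketch assumes at most one task of a given service per instance. It only computes which tags each instance should carry; reading task placement from ECS and writing the tags to EC2 are left out.

```python
from collections import defaultdict

# Hypothetical tag naming scheme: one tag per service, value is the host port.
TAG_PREFIX = "prometheus-scrape-"


def desired_tags(tasks: list[dict]) -> dict:
    """Map instance ID -> {tag key: host port} from running task records.

    Each task record is a dict with 'service', 'instance_id' and 'host_port'.
    """
    tags = defaultdict(dict)
    for task in tasks:
        key = TAG_PREFIX + task["service"]
        tags[task["instance_id"]][key] = str(task["host_port"])
    return dict(tags)


def tag_changes(current: dict, desired: dict) -> tuple[dict, list]:
    """Compare one instance's current and desired tags.

    Returns (tags to set, tag keys to remove). Only tags that use
    TAG_PREFIX are ever removed, so unrelated tags are left alone.
    """
    to_set = {k: v for k, v in desired.items() if current.get(k) != v}
    to_remove = sorted(
        k for k in current if k.startswith(TAG_PREFIX) and k not in desired
    )
    return to_set, to_remove


if __name__ == "__main__":
    running = [
        {"service": "orders", "instance_id": "i-aaa", "host_port": 32768},
        {"service": "users", "instance_id": "i-aaa", "host_port": 32769},
        {"service": "orders", "instance_id": "i-bbb", "host_port": 32770},
    ]
    print(desired_tags(running))
```

Prometheus can then discover instances through its EC2 service discovery and use the tag value as the scrape port.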
Scraping ECS Tasks
ECS Biggest Challenge
• The integration between ECS
and Auto Scaling Group is not
perfect
• ASG changes ignore ECS tasks
• Let’s look at 2 examples
Auto Scaling Group Downscale
[Diagram: Auto Scaling Group with instances VM1 to VM7]
Upgrade of ECS-Optimized AMI
[Diagram: Auto Scaling Group with instances VM1 to VM5]
Simple Workaround
• You can control when the EC2 instance sends the “I’m ready”
signal to CloudFormation (in fact you must send it in the
userdata)
• Add a sleep, to allow the ECS task to start
• Helps with the AMI upgrade scenario only
• Upgrades are slower, but a bit safer
• In practice – this really helped us
Better Solution
• Auto Scaling Groups have lifecycle hooks
• We can add a hook to prevent VM shutdown until the task in the new VM is ready.
[Diagram: Auto Scaling Group (VM1 to VM5) → Shutdown Hook → SNS → ECS: deregister VM2 → Wait for task → Complete lifecycle action]
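A handler for the termination hook could be sketched as below. Several things here are assumptions: the function name and its parameters are hypothetical, the lookup from EC2 instance ID to ECS container instance ARN is left to the caller, and the sketch drains the container instance rather than deregistering it as the diagram shows. The `ecs` and `autoscaling` arguments are expected to be boto3 clients (or test doubles with the same methods), and the function is meant to be invoked repeatedly until it reports completion.

```python
import json


def handle_termination_notice(sns_message, cluster, container_instance_arn,
                              ecs, autoscaling):
    """Process one lifecycle hook notification delivered through SNS.

    Returns "ignored", "waiting" or "completed".
    """
    notice = json.loads(sns_message)
    if notice.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return "ignored"

    hook = {
        "LifecycleHookName": notice["LifecycleHookName"],
        "AutoScalingGroupName": notice["AutoScalingGroupName"],
        "LifecycleActionToken": notice["LifecycleActionToken"],
    }

    # Ask ECS to move tasks off this instance (assumption: draining is
    # used here instead of the deregistration shown in the diagram).
    ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )

    # Any task still listed on the instance means we are not done yet.
    remaining = ecs.list_tasks(
        cluster=cluster, containerInstance=container_instance_arn
    )["taskArns"]
    if remaining:
        # Extend the hook timeout so the instance is not terminated early.
        autoscaling.record_lifecycle_action_heartbeat(**hook)
        return "waiting"

    # No tasks left: let the Auto Scaling Group terminate the instance.
    autoscaling.complete_lifecycle_action(LifecycleActionResult="CONTINUE", **hook)
    return "completed"
```

Because the handler is safe to call again with the same message, it can be re-triggered on a timer or by republishing the notification until it returns "completed".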
But the truth is…
• We don’t want to manage VMs at all
• We just want to deploy containers over CPU and memory
• Enter Fargate – serverless containers
• We plan to try it soon, but we’re still missing
• Storage attachment
• Availability outside of us-east-1
moshe_benshoham
mosheb@perfectomobile.com
Thank You!

 
20240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 202420240609 QFM020 Irresponsible AI Reading List May 2024
20240609 QFM020 Irresponsible AI Reading List May 2024
 

Container-based Microservices DevOps in AWS

  • 1. Container-based Microservices DevOps in AWS How Perfecto Did it and What We Learned So Far © 2018, Perfecto Mobile Ltd. All Rights Reserved.
  • 2. About Perfecto (1/10/2018)
  • 3. How We Started • We started 11 years ago, developing monolith servers in our own DCs • We were moving slowly…
  • 4. But Then We Heard Some Buzzwords • And we decided we wanted to move faster and have more impact on the company's business
  • 5. Big Change • Waterfall → Agile • Monolith servers → Microservices • Deployment in DC → Cloud • Dependencies → Autonomous teams
  • 6. The 3 Components of the Change • technology • methodology • culture
  • 7. Autonomous Teams • DevOps, Dev, QA • Development • Continuous integration • Continuous deployment • Monitoring • Budget control
  • 8. Technologies we Use (partial list…)
  • 9. Why we Chose ECS for Container Orchestration • We were new to the containers world, but we understood container orchestration is a key decision • We looked at ECS, Kubernetes, Swarm and other alternatives. • ECS seemed best in terms of time to value
  • 10. Our First Microservice • Deployed in ECS • ELB + ECS tasks = ECS service • EC2 instances are managed in an Auto Scaling Group • Service Discovery using Route53 • Task per EC2 instance (ELB static port limitation)
  • 11. Decisions we Took (1) • Deploying in a single availability zone • One of those decisions you regret - overhead of changing it grows with time
  • 12. Decisions We took (2) • Single VPC for all teams • Seems natural – it’s network, right? • Pros • Less work for teams • Simpler to move services between teams • Cons • Dependency between teams. Who owns the VPC? • Easier to take shortcuts (e.g. use VPN to DC) • Budget control is more difficult - no option for account per team (need to tag all services)
  • 13. Decisions we took (3) • ECS cluster per… what? • Options • One cluster to rule them all • Cluster per service (group of microservices) • Cluster per team • We let our teams decide between the last two options • No dependencies between teams • Better budget control • Reduce blast radius of ECS cluster issues (more on that soon)
  • 14. Infrastructure as code is the only way to go • We (try to) do everything (except for very small and initial POCs) with CloudFormation • Every time you make a change in the UI, CLI or API without CloudFormation – think again • CloudFormation templates stored in Git • CloudFormation invoked by Jenkins • We maintain shared CloudFormation templates used by all teams to create ECS clusters, services and more.
  • 15. Working with CloudFormation • There is a learning curve • Templates can become long and unreadable • Split into sub-templates • Consider generating templates • CloudFormation behavior can be surprising, but it is consistent • Practice in production-like environments (dev/staging) • Using the UI is dangerous • Automate all CloudFormation invocations • Read-only access to UI • Protect your stacks using stack policies
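
The "protect your stacks using stack policies" point on slide 15 can be illustrated with a small sketch. The slides do not show Perfecto's actual policies, so the function name `build_stack_policy` and the logical IDs used here are hypothetical. The sketch only builds the policy document: everything may be updated, except that replacement and deletion are denied for the resources you list.

```python
import json


def build_stack_policy(protected_logical_ids):
    """Return a CloudFormation stack policy document as a dict.

    All updates are allowed by default; replacement and deletion are
    denied for each listed logical resource ID (hypothetical names).
    """
    statements = [
        {"Effect": "Allow", "Action": "Update:*", "Principal": "*", "Resource": "*"}
    ]
    for logical_id in sorted(protected_logical_ids):
        statements.append(
            {
                "Effect": "Deny",
                "Action": ["Update:Replace", "Update:Delete"],
                "Principal": "*",
                "Resource": "LogicalResourceId/" + logical_id,
            }
        )
    return {"Statement": statements}


if __name__ == "__main__":
    # "EcsCluster" and "ServiceLoadBalancer" are made-up logical IDs.
    policy = build_stack_policy(["EcsCluster", "ServiceLoadBalancer"])
    print(json.dumps(policy, indent=2))
```

In an automated pipeline, the resulting JSON could presumably be supplied as the stack policy body when the stack is created or updated, so that the protection lives in Git next to the templates.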
  • 16. Moving to ALBs • ALB – Application Load Balancer, can (should) replace ELB • Why • Cost - 1 ALB can replace X ELBs - Less expensive for clusters with a large number of services • Dynamic port management – Allows deploying multiple services on one EC2 instance, more flexibility • ELB is (kind of) legacy – e.g. not supported in Fargate • Routes requests to backend containers based on request path rules • Challenge with ALBs – no URL rewrite in rules. If you have no control over the request path in the deployed services, you will need a reverse proxy.
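
To make the routing and rewrite points of slide 16 concrete, here is a minimal sketch in plain Python. It is a model, not the ALB API: the rule list, the target names and the helper names (`route`, `strip_prefix`) are all assumptions for illustration. `route` mimics priority-ordered path rules with `*` and `?` wildcards, and `strip_prefix` shows the kind of rewrite a reverse proxy would have to do when the service itself expects paths without the routing prefix.

```python
import re


def _pattern_to_regex(pattern):
    """Translate a path pattern with * and ? wildcards into a regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # any run of characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def route(path, rules, default_target):
    """Return the target of the first matching rule, lowest priority number first."""
    for _priority, pattern, target in sorted(rules, key=lambda rule: rule[0]):
        if _pattern_to_regex(pattern).fullmatch(path):
            return target
    return default_target


def strip_prefix(path, prefix):
    """Rewrite done by a reverse proxy: drop the routing prefix from the path."""
    if path == prefix or path.startswith(prefix + "/"):
        return path[len(prefix):] or "/"
    return path


# Hypothetical rules: (priority, path pattern, target name).
RULES = [
    (10, "/orders/*", "orders-service"),
    (20, "/users/*", "users-service"),
]

if __name__ == "__main__":
    print(route("/orders/42", RULES, "default-service"))
    print(strip_prefix("/orders/42", "/orders"))
```

The load balancer forwards `/orders/42` unchanged, so a service that only knows `/42` needs something in front of it that applies the equivalent of `strip_prefix`.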
  • 17. ELB vs ALB
  • 18. What about logs? • We’re using CloudWatch logs • Not perfect, but very simple to integrate with anything in AWS • Container logs • Standard container logs can be sent to CloudWatch – that is easy, supported natively in Docker • To take application log files to CloudWatch – we’re using a “satellite container” (AKA sidecar) per task - https://github.com/moshebs/docker-awslogs • ELB/ALB access logs: • Sent to S3, natively supported by ELB/ALB • CloudWatch event from S3 → Lambda that parses the logs and pushes them to CloudWatch logs
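
The parsing step of the access-log Lambda from slide 18 might look roughly like the sketch below. The slides do not show the real Lambda, so the function name, the selected fields and the sample line are assumptions. The sketch assumes the space-separated ALB access log layout (quoted request and user agent), which differs from the classic ELB layout, and it produces the `timestamp` (milliseconds) plus `message` shape that CloudWatch Logs events use. Reading the object from S3 and sending the events are left out.

```python
import json
import shlex
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Made-up sample line in the ALB access log layout.
SAMPLE_LINE = (
    'http 2018-01-10T00:00:00.000000Z app/my-alb/50dc6c495c0c9188 '
    '192.168.131.39:2817 10.0.0.1:80 0.000 0.001 0.000 200 200 34 366 '
    '"GET http://www.example.com:80/orders/42 HTTP/1.1" "curl/7.46.0" - - '
    'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/73e2d6bc24d8a067 '
    '"Root=1-58337262-36d228ad5d99923122bbe354"'
)


def alb_line_to_log_event(line):
    """Convert one ALB access log line into a CloudWatch-Logs-style event dict."""
    fields = shlex.split(line)  # honours the double-quoted fields
    when = datetime.strptime(fields[1], "%Y-%m-%dT%H:%M:%S.%fZ")
    when = when.replace(tzinfo=timezone.utc)
    method, url, _protocol = fields[12].split(" ", 2)
    message = {
        "client": fields[3],
        "target": fields[4],
        "elb_status_code": int(fields[8]),
        "method": method,
        "url": url,
        "user_agent": fields[13],
    }
    return {
        # Integer milliseconds since the Unix epoch.
        "timestamp": (when - EPOCH) // timedelta(milliseconds=1),
        "message": json.dumps(message),
    }


if __name__ == "__main__":
    print(alb_line_to_log_event(SAMPLE_LINE))
```

Emitting the message as JSON is a design choice of this sketch; it tends to make the entries easier to filter later than the raw space-separated line.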
  • 19. Logs
  • 20. Monitoring with Prometheus
  • 21. Dashboards with Grafana
  • 22. Monitoring in Perfecto • Each team owns their own monitoring system • Deployment • Maintenance • Building dashboards • Getting alerts, usually in Slack • Deployed using CloudFormation • All teams use the same templates • Configuration using sidecar containers
  • 23. What we monitor • EC2 instance metrics – by deploying Prometheus node_exporter on the EC2 instances • Application metrics • If metrics are shared between the microservice nodes – scrape through LB • Otherwise – scrape each microservice task (how do you find them? Next slide…) • 3rd party (e.g. Mongo, RabbitMQ, Redis) – standard open-source exporters • CloudWatch metrics – using cloudwatch_exporter (but be careful, pulling metrics from CloudWatch is expensive!)
  • 24. Monitoring Architecture
  • 25. Scraping ECS Tasks • The challenge: • Prometheus needs to know where each task runs, and what port to use for scraping • But Prometheus supports filtering EC2 instances by tags only • ECS decides which task goes where • The solution – a container that dynamically tags EC2 instances according to the ECS tasks running on them
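
The core of the tagging container from slide 25 can be sketched as a pure function that compares the tags an instance has with the tags it should have. The slides do not describe the actual tag scheme, so the `prom_scrape_` prefix, the "service name maps to host ports" convention and both function names are assumptions made for illustration. Fetching the running tasks from ECS and writing the tags to EC2 are deliberately left out.

```python
def desired_scrape_tags(tasks_on_instance, prefix="prom_scrape_"):
    """Map each service to a tag whose value lists its host ports on this instance.

    tasks_on_instance is a list of (service_name, host_port) pairs.
    """
    ports_by_service = {}
    for service, host_port in tasks_on_instance:
        ports_by_service.setdefault(service, set()).add(host_port)
    return {
        prefix + service: ",".join(str(port) for port in sorted(ports))
        for service, ports in ports_by_service.items()
    }


def plan_tag_changes(current_tags, tasks_on_instance, prefix="prom_scrape_"):
    """Return (tags to create or update, tag keys to delete) for one instance.

    Tags without the prefix (for example Name) are never touched.
    """
    desired = desired_scrape_tags(tasks_on_instance, prefix)
    to_set = {key: value for key, value in desired.items()
              if current_tags.get(key) != value}
    to_delete = sorted(key for key in current_tags
                       if key.startswith(prefix) and key not in desired)
    return to_set, to_delete


if __name__ == "__main__":
    current = {"Name": "ecs-node-1", "prom_scrape_billing": "32770"}
    tasks = [("orders", 32768), ("users", 32769)]
    print(plan_tag_changes(current, tasks))
```

Computing a diff rather than rewriting every tag on each pass keeps the number of tagging calls low when nothing has moved. A value with several ports (two tasks of one service on the same instance) would likely need extra handling on the Prometheus side.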
  • 26. Scraping ECS Tasks
  • 27. ECS Biggest Challenge • The integration between ECS and Auto Scaling Group is not perfect • ASG changes ignore ECS tasks • Let’s look at 2 examples
  • 28. Auto Scaling Group Downscale (diagram: VM1 VM2 VM3 VM4 VM5 VM6 VM7)
  • 29. Upgrade of ECS-Optimized AMI (diagram: Auto Scaling Group with VM1 VM2 VM3 VM4 VM5)
  • 30. Simple Workaround • You can control when the EC2 instance sends the “I’m ready” signal to CloudFormation (in fact you must send it in the userdata) • Add a sleep, to allow the ECS task to start • Helps with the AMI upgrade scenario only • Upgrades are slower, but a bit safer • In practice – this really helped us
  • 31. Better Solution • Auto Scaling Group has lifecycle hooks • We can add a hook to prevent VM shutdown until the task in the new VM is ready. (diagram: Auto Scaling Group with VM1 VM2 VM3 VM4 VM5; Shutdown Hook → SNS → ECS Deregister VM2 → Wait for task → Complete lifecycle action)
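
The lifecycle-hook flow of slide 31 could be handled by something like the sketch below. It is not Perfecto's implementation. Two assumptions are worth stating: the clients are passed in (they are expected to behave like boto3 `ecs` and `autoscaling` clients), and instead of deregistering the instance as in the diagram, the sketch sets it to `DRAINING` and treats "no tasks left on the old instance" as the signal that it is safe to continue. Pagination of large clusters is ignored.

```python
import json


def find_container_instance(ecs, cluster, ec2_instance_id):
    """Return the ECS container instance ARN for an EC2 instance, or None."""
    arns = ecs.list_container_instances(cluster=cluster)["containerInstanceArns"]
    if not arns:
        return None
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )
    for instance in described["containerInstances"]:
        if instance["ec2InstanceId"] == ec2_instance_id:
            return instance["containerInstanceArn"]
    return None


def handle_termination(sns_message, ecs, autoscaling, cluster):
    """Drain the instance; complete the lifecycle action once no tasks remain.

    Returns "ignored", "waiting" (call again later) or "completed".
    """
    notice = json.loads(sns_message)
    if notice.get("LifecycleTransition") != "autoscaling:EC2_INSTANCE_TERMINATING":
        return "ignored"  # e.g. the test notification sent when a hook is created

    arn = find_container_instance(ecs, cluster, notice["EC2InstanceId"])
    if arn is not None:
        ecs.update_container_instances_state(
            cluster=cluster, containerInstances=[arn], status="DRAINING"
        )
        remaining = ecs.list_tasks(cluster=cluster, containerInstance=arn)["taskArns"]
        if remaining:
            return "waiting"  # tasks are still running on the old instance

    # Nothing left on the instance: let the Auto Scaling Group terminate it.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=notice["LifecycleHookName"],
        AutoScalingGroupName=notice["AutoScalingGroupName"],
        LifecycleActionToken=notice["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
    return "completed"
```

A caller that gets "waiting" back would need to retry, for example by re-publishing the same message to the SNS topic, until the hook's timeout is reached.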
  • 32. But the truth is… • We don’t want to manage VMs at all • We just want to deploy containers over CPU and memory • Enter Fargate – serverless containers • We plan to try it soon, but we’re still missing • Storage attachment • Availability outside of us-east-1
  • 33. Thank You! moshe_benshoham mosheb@perfectomobile.com

Editor's Notes

  1. One VPC for all • Here we will show a diagram of a set of microservices • Deployed in ECS • ECS cluster running on top of ASG (we started in a single AZ – don’t do that!) • Using ELB (one task per EC2 instance) • Service discovery using DNS in Route53 • Network level • One VPC for all • Each cluster in separate subnet • Using security groups to control access between machines • Expose service to the Internet – using CheckPoint