Come to this talk to dive deep on running containers at any scale. Learn firsthand best practices for deploying microservice architectures to Amazon EC2 Container Service (ECS), as well as everything you need to build a continuous delivery pipeline for your containers.
AWS Speaker: Paul Maddox, Specialist Solutions Architect, DevOps & Developer Technologies - Amazon Web Services
Customer Speaker: Cobus Bernard - DevOps Team Lead, HealthQ
2. What to Expect from the Session
• Microservices: What, why?
• Docker / Amazon EC2 Container Service deep dive
• HealthQ: Hands-on Microservices Learnings
4. What are microservices?
“A software architecture style in which complex
applications are composed of small, independent
processes communicating with each other using
language-agnostic APIs. These services are small, highly
decoupled and focus on doing a small task, facilitating a
modular approach to system-building.” - Wikipedia
https://en.wikipedia.org/wiki/Microservices
9. Microservices Architecture
[Diagram: Order UI, User UI, and Shipping UI, each backed by its own service – Order Service, User Service, and Shipping Service]
10. Microservices Architecture – Scaling
[Diagram: the same UIs and services, with each component scaled out independently – multiple copies of each UI and each service]
14. Microservice Challenge #1 – Resource Management
Managing a large fleet by hand is impossible:
[Diagram: a grid of dozens of servers, each with its own guest OS, spread across AZ 1, AZ 2, and AZ 3]
15. Microservices Challenge #2 – Monitoring
A microservices architecture will have 10s, 100s, 1000s,
maybe even 10,000s of individual services:
• How do you know if an individual service is healthy?
• How do you measure the performance of an individual
service?
• How do you troubleshoot and debug an individual
service?
16. Microservices Challenge #3: Service Discovery
Each microservice scales up and down independently of
one another:
• How does Service A know the URLs for all instances of
Service B?
• How do you allow services to scale independently while
still using load balancers?
• How does a new instance of a service announce itself to
other services?
17. Microservices Challenge #4: Config Management
Each microservice should be made up of one or more
immutable containers that are consistent across
environments (staging, prod, etc.):
• How do I configure a container at runtime?
• How do I manage passwords / API keys / secrets?
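One common pattern (a sketch, not the only approach) is to inject configuration through the ECS task definition: plain settings as environment variables, and secrets referenced from a store such as SSM Parameter Store so they never appear in the image. ECS added native `secrets` support after this talk; the names below (`order-service`, the parameter ARN, the account ID) are hypothetical:

```json
{
  "family": "order-service",
  "containerDefinitions": [
    {
      "name": "order-service",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/order-service:latest",
      "environment": [
        { "name": "ENVIRONMENT", "value": "staging" }
      ],
      "secrets": [
        {
          "name": "DB_PASSWORD",
          "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/order-service/db-password"
        }
      ]
    }
  ]
}
```

The same immutable image is then promoted from staging to prod; only the injected values differ between environments.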
18. Microservices Challenge #5: Deployment
A microservices architecture will have 10s, 100s, 1000s,
maybe even 10,000s of individual services:
• Each service will be developed, tested, and deployed on
its own timeline – How do you manage this across large
numbers of services?
• Services are polyglot – different languages, frameworks
– how do you efficiently deploy them?
• How do you decide which hosts to deploy a service on?
20. Introducing Amazon ECS
• Fully managed elastic service – You don’t need
to run anything, and the service scales as your
microservices architecture grows
• Shared state optimistic scheduling
• Fully ACID compliant resource and state
management
• Integration with CloudWatch service for
monitoring and logging
• Integration with Code* services for continuous
integration and delivery (CI/CD)
26. Automatic Service Scaling
[Diagram: an Application Load Balancer in front of an ECS service whose tasks (A, B, C) span Availability Zones A and B. Amazon ECS publishes metrics to Amazon CloudWatch; CloudWatch scaling policies add/remove ECS tasks]
27. Resource Management – Anatomy of Task Placement
1. Cluster constraints: satisfy CPU, memory, and port requirements
2. Custom constraints: filter for location, instance-type, AMI, or custom attribute constraints
3. Placement strategies: identify instances that meet the spread or binpack placement strategy
4. Apply filter: select the final container instances for placement
28. Resource Management – Placement Constraints
AMI ID: attribute:ecs.ami-id == ami-eca289fb
Availability Zone: attribute:ecs.availability-zone == us-east-1a
Instance Type: attribute:ecs.instance-type == t2.small
Distinct Instances: type="distinctInstances"
Custom: attribute:stack == prod
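These constraints and strategies can be combined in a service or run-task request; a hedged sketch (the expressions follow the ECS cluster query language):

```json
{
  "placementConstraints": [
    { "type": "memberOf", "expression": "attribute:ecs.instance-type == t2.small" }
  ],
  "placementStrategy": [
    { "type": "spread", "field": "attribute:ecs.availability-zone" },
    { "type": "binpack", "field": "memory" }
  ]
}
```

Here tasks are first restricted to t2.small instances, spread across Availability Zones, and then binpacked by memory within each zone.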
43. Deployment – In Place – Doubling
[Diagram: two EXISTING tasks, one in each Availability Zone]
Scenario: the service's task definition is updated to a new revision with parameters:
Desired Count = 2
Minimum Healthy Percent = 100%
Maximum Percent = 200%
These settings permit the service to grow to double its desired size during deployment.
44. Deployment – In Place – Doubling
[Diagram: two EXISTING and two NEW tasks across the Availability Zones]
Two new tasks are started, growing the number of tasks to 200% of the desired count, which is the maximum permitted.
Desired Count = 2
Minimum Healthy Percent = 100%
Maximum Percent = 200%
45. Deployment – In Place – Doubling
[Diagram: only the two NEW tasks remain, one in each Availability Zone]
After the new tasks are verified to be healthy by the Elastic Load Balancer health check, the two previous tasks with the older task definition are drained and stopped.
Desired Count = 2
Minimum Healthy Percent = 100%
Maximum Percent = 200%
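In the ECS API, the two percentages above live in the service's deployment configuration; a doubling deploy like this one corresponds to a service definition fragment along these lines (a sketch, not the speaker's exact setup):

```json
{
  "desiredCount": 2,
  "deploymentConfiguration": {
    "minimumHealthyPercent": 100,
    "maximumPercent": 200
  }
}
```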
46. Deployment – In Place – Rolling
[Diagram: two EXISTING tasks, one in each Availability Zone]
Scenario: the service's task definition is updated to a new revision with parameters:
Desired Count = 2
Minimum Healthy Percent = 50%
Maximum Percent = 100%
These settings constrain the service to never exceed its desired size but allow it to halve the number of tasks during deployment.
47. Deployment – In Place – Rolling
[Diagram: one EXISTING task remains]
First, an existing task is stopped, which brings the healthy percentage of the service to 50% and makes room on the cluster for new tasks.
Desired Count = 2
Minimum Healthy Percent = 50%
Maximum Percent = 100%
48. Deployment – In Place – Rolling
[Diagram: one EXISTING and one NEW task]
A task using the new task definition is started, bringing the service back to 100%.
Desired Count = 2
Minimum Healthy Percent = 50%
Maximum Percent = 100%
49. Deployment – In Place – Rolling
[Diagram: one NEW task remains]
After the new task is verified to be healthy by the Elastic Load Balancer health check, the next existing task with the older task definition is drained and stopped.
Desired Count = 2
Minimum Healthy Percent = 50%
Maximum Percent = 100%
50. Deployment – In Place – Rolling
[Diagram: two NEW tasks, one in each Availability Zone]
The second new task is started on the cluster, bringing the service back to 100%.
Desired Count = 2
Minimum Healthy Percent = 50%
Maximum Percent = 100%
52. Deployment – Blue Green (DNS or Target Group)
[Diagram: two parallel environments, one running the EXISTING tasks and one running the NEW tasks, reached via www.myproduct.com and next.myproduct.com; traffic is cut over by switching DNS or the target group]
53. Best Practices
• Use Elastic Load Balancing health checks to
prevent botched deploys
• For higher confidence, integrate automated
testing against a new environment or
monitoring of a canary before cutover
• Ensure your application can function against
the same backend schema for adjacent
releases
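The health check that gates these cutovers is configured on the load balancer's target group; a sketch of the relevant settings (path and thresholds are illustrative, not prescriptive):

```json
{
  "HealthCheckPath": "/healthz",
  "HealthCheckIntervalSeconds": 10,
  "HealthyThresholdCount": 3,
  "UnhealthyThresholdCount": 2,
  "Matcher": { "HttpCode": "200" }
}
```

Tightening the interval and thresholds trades deployment speed against confidence that a new task is genuinely healthy before old tasks are drained.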
59. Content
1. Who we are
2. History
3. Monolith Breakup
4. Automation
5. Automation Wins
6. Service Discovery
7. Logging
8. Docker
60. Who we are
HealthQ Technology: The technical partner for LifeQ
LifeQ: A platform consisting of hosted internet services that
enables the bio-mathematical algorithms and Virtual
Human Model (VHM) to receive data from integrated data
sources and makes the analytics available to other parties
in the ecosystem.
61. History
• Single Repo, 5 deployable services
• Versioned together, deployed together
• Scala clustering
• "Microservices" - A monolith in 5 parts
• Slow deployments
62. Breakup
• Split out 1 service into own repo
• New instances, ASG, IAM, CodeDeploy, etc
• Infra created by hand
• Took 2 ~ 3 days for development
• Another day for staging
63. Automation
• Microservices: moving complexity to Infrastructure
• Terraform'ing & Chef'ing all AWS resources
• Destroying hand-crafted snowflakes
• Spinning up with Terraform in new VPC
• Low cost to test old and new side-by-side
64. Automation wins
• Still splitting out services
• New services added: 5
• 30mins to create new infra (with reviews)
• Environments identical, only size & quantities differ
• Dev -> Staging: under 5mins
65. Service discovery
• Host-based routing on internal ALB
• ASG registers instances automatically
• Route53 created by Terraform
• New services instantly available
• Simplifies configs "rabbitmq.core.healthq.internal"
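A minimal Terraform sketch of this pattern, pairing a host-based ALB listener rule with a Route 53 alias record (resource names are hypothetical, and the provider syntax shown is modern Terraform rather than necessarily what HealthQ used):

```hcl
# Route requests for rabbitmq.core.healthq.internal to the RabbitMQ target group
resource "aws_lb_listener_rule" "rabbitmq" {
  listener_arn = aws_lb_listener.internal.arn

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.rabbitmq.arn
  }

  condition {
    host_header {
      values = ["rabbitmq.core.healthq.internal"]
    }
  }
}

# DNS name that resolves to the internal ALB, so configs can just reference the hostname
resource "aws_route53_record" "rabbitmq" {
  zone_id = aws_route53_zone.internal.zone_id
  name    = "rabbitmq.core.healthq.internal"
  type    = "A"

  alias {
    name                   = aws_lb.internal.dns_name
    zone_id                = aws_lb.internal.zone_id
    evaluate_target_health = true
  }
}
```

Because the ASG registers instances with the target group automatically, a new service only needs one listener rule and one record to become reachable.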
66. Logging
• Socket connection directly to Logstash (LS)
• Elasticsearch backpressure, Logstash restarts
• Services restart when Logstash isn't up
• Current: Filebeats, ELK
• Future: Docker, CloudWatch, ELK
67. The Future: Docker
Better hardware density, lower costs
Simpler deployments
Simpler server configurations
Faster deployments
Spin up entire system locally (~39 containers)
Editor's Notes
Monolith – Single Unit, tightly coupled, hard to change, slow
Microservice – atomic unit, do one thing well. Clear interfaces, easy to change, rapid iteration.
A task is an instantiation of a task definition.
You can have a task with just 1 container…or up to 10 that work together on a single machine. Maybe nginx in front of rails, or redis behind rails.
Run tasks on container instances, as many as will fit.
Often people wonder about cross-host links; those don't go in your task. Put them behind an ELB or a discovery system, and make multiple tasks.
ECS has two APIs for scheduling. Run task looks at the first 100 instances in a cluster, and randomly places tasks in a spot that’ll fit. It is good for short-lived containers like batch jobs.
The second scheduler is the service scheduler. This is good for long-running applications. You reference a task definition and count and optionally an ELB
Port Binding, Concurrency, Disposability
Dev/Prod parity, Logs, Admin Processes
Mention expiring the logs
Codebase, Dependencies, Config
Port Binding, Concurrency, Disposability
Introduce Event Stream
This function only looks at tasks moving into the running state, but could be written to handle start and stop as well.
CloudWatch Events to Lambda, and the Lambda function updates Route 53.
CloudWatch rule to filter on task RUNNING state changes
Lambda function to filter for task start/stop events and update Route 53
Route 53 stores the active private IPs for containers
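The CloudWatch Events / Lambda / Route 53 flow described in these notes can be sketched as a pure helper that turns an ECS task state-change event into a Route 53 ChangeBatch. This is not the speaker's actual code: the zone suffix `core.internal`, the record layout, and the TTL are hypothetical, and a real Lambda would pass the result to `route53.change_resource_record_sets`.

```python
def change_batch_for_event(event, hosted_zone_suffix="core.internal"):
    """Translate an ECS Task State Change event into a Route 53 ChangeBatch.

    Returns None for events that should be ignored (intermediate states,
    or tasks without a discoverable private IP).
    """
    detail = event.get("detail", {})
    status = detail.get("lastStatus")
    if status not in ("RUNNING", "STOPPED"):
        return None  # ignore intermediate states such as PENDING

    # Derive a service name from the task group, e.g. "service:order-service"
    group = detail.get("group", "")
    service = group.split(":", 1)[1] if ":" in group else group

    # Find the task's private IP in the attached ENI details (awsvpc mode)
    ip = None
    for attachment in detail.get("attachments", []):
        for kv in attachment.get("details", []):
            if kv.get("name") == "privateIPv4Address":
                ip = kv["value"]
    if not ip:
        return None

    # UPSERT the record when a task starts, DELETE it when the task stops
    action = "UPSERT" if status == "RUNNING" else "DELETE"
    return {
        "Changes": [{
            "Action": action,
            "ResourceRecordSet": {
                "Name": f"{service}.{hosted_zone_suffix}",
                "Type": "A",
                "TTL": 10,
                "ResourceRecords": [{"Value": ip}],
            },
        }]
    }
```

Keeping the event-to-change translation pure makes it easy to unit test without touching AWS; only the final API call needs credentials.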