© 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
STEVEN BRYEN | AWS TECHNICAL & DEVELOPER EVANGELISM | @steven_bryen
sbryen@amazon.com
LONDON – MARCH 2019
Resiliency & Availability Design Patterns
for the Cloud
RELIABILITY AND RESILIENCY
99.99%
“Everything fails all the time.”
Werner Vogels
CTO – Amazon.com
Reliability:
handles the “known unknowns”
Resiliency:
handles the “unknown unknowns”
How do we build resilient software
systems?
…
“Quality is not an act, it is a habit”
Aristotle, some time around 350 BC
Distributed Systems
are hard
Complex systems
Amazon Twitter Netflix
Let’s talk about Multi-AZ for
Maximum availability
Region
Availability zone a Availability zone b Availability zone c
Application
Better to react without reacting
But Why?
3 AZs are better than 2
Region
Availability zone a Availability zone b Availability zone c
Application
Requires 8 Instances
Region
Availability zone a Availability zone b Availability zone c
Application
Requires 6 Instances
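The two instance counts above are consistent with a simple capacity calculation. As a rough sketch, assume the application needs 4 instances' worth of capacity (a figure not stated on the slides) and must keep that capacity after losing any one Availability Zone. The function name `instances_required` is made up for this illustration.

```python
import math


def instances_required(needed: int, az_count: int) -> int:
    """Instances to deploy, spread evenly, so `needed` survive the loss of one AZ."""
    if az_count < 2:
        raise ValueError("need at least two AZs to survive the loss of one")
    # The AZs that remain after a failure must carry the full load between them.
    per_az = math.ceil(needed / (az_count - 1))
    return per_az * az_count


for azs in (2, 3):
    print(f"{azs} AZs: {instances_required(4, azs)} instances")
```

Under these assumptions, spreading over more zones lowers the amount of spare capacity that has to sit idle in each one.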
https://aws.amazon.com/wellarchitected
Let’s talk about auto scaling
Auto-Scaling
Fixed Variable
Auto-scaling for self-healing
• Set min > 0
[Diagram: an Auto Scaling group spanning Availability zone 1 and Availability zone 2 in an AWS Region, with one instance marked X]
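The effect of a minimum size above zero can be illustrated with a toy reconciliation loop. This is only a simulation: the `Group` class and its methods are hypothetical names, not the Auto Scaling API.

```python
from dataclasses import dataclass, field
from itertools import count

_ids = count(1)


@dataclass
class Group:
    """Toy stand-in for an Auto Scaling group (hypothetical, not an AWS API)."""
    min_size: int
    healthy: set = field(default_factory=set)

    def reconcile(self) -> None:
        # Launch replacements until the healthy count is back at the minimum.
        while len(self.healthy) < self.min_size:
            self.healthy.add(f"i-{next(_ids):04d}")

    def fail(self, instance_id: str) -> None:
        self.healthy.discard(instance_id)


group = Group(min_size=2)
group.reconcile()                      # initial launch: two instances
group.fail(next(iter(group.healthy)))  # one instance dies
group.reconcile()                      # the group heals back to min_size
print(len(group.healthy))
```

With `min_size=0` the loop never launches a replacement, which is why the slide asks for a minimum above zero.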
Let’s talk about decoupling and async
Pattern 5: Decoupling with async pattern
Synchronous: Process A calls Process B, is left waiting while B is working, then gets the result.
Asynchronous: Process A calls Process B, continues while B is working, then gets or fetches the result.
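The difference between the two call styles can be sketched in a few lines. This is a minimal in-process illustration using a thread pool; `process_b` is a made-up stand-in for the slow callee, and a real system would more likely hand the work to a queue or another service.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def process_b(x: int) -> int:
    """Stand-in for slow work done by Process B."""
    time.sleep(0.1)
    return x * 2


# Synchronous: Process A waits while B is working.
sync_result = process_b(21)

# Asynchronous: Process A hands the work off, continues, and fetches the result later.
with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(process_b, 21)
    other_work = sum(range(1000))    # A continues while B is working
    async_result = future.result()   # get or fetch the result

print(sync_result, async_result)
```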
API: {DO foo}
PUT JOB: {JobID: 0001, Task: DO foo}
API: {JobID: 0001}
GET JOB: {JobID: 0001, Task: DO foo}
{JobID: 0001, Result: bar}
[Diagram: three API instances, a queue/streaming layer, two worker instances and a cache node]
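The message flow above (submit a job, receive a JobID, fetch the result later) can be sketched in-process. Here a `queue.Queue` stands in for the queue/streaming layer and a dict for the cache node; the function names `api_submit`, `worker` and `api_fetch` are assumptions made for this illustration.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the queue/streaming layer
cache = {}             # stands in for the cache node


def api_submit(job_id: str, task: str) -> dict:
    """API instance: PUT JOB, then answer immediately with the JobID."""
    jobs.put({"JobID": job_id, "Task": task})
    return {"JobID": job_id}


def worker() -> None:
    """Worker instance: GET JOB, do the work, store the result in the cache."""
    while True:
        job = jobs.get()
        if job is None:            # sentinel: shut the worker down
            break
        cache[job["JobID"]] = {"JobID": job["JobID"], "Result": "bar"}
        jobs.task_done()


def api_fetch(job_id: str):
    """API instance: return the result if it is ready, else None."""
    return cache.get(job_id)


t = threading.Thread(target=worker, daemon=True)
t.start()
ticket = api_submit("0001", "DO foo")
jobs.join()                        # wait for the demo job to finish
print(api_fetch(ticket["JobID"]))
jobs.put(None)
t.join()
```

Because the API only enqueues and returns, a slow or failed worker delays the result without blocking the caller.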
[Diagram: a user, three API instances, a queue/streaming layer, two worker instances and a cache node. Labels: Push Notification, Fetch results]
Degrade & prioritize traffic
with queues
[Diagram: three API instances, a HighPriorityQueue and a LowPriorityQueue, and two worker instances]
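One way a worker might prioritize between the two queues is to drain the high-priority one first. The sketch below uses in-process queues and made-up messages; it illustrates the idea and is not tied to any particular queue service.

```python
import queue

high_priority = queue.Queue()
low_priority = queue.Queue()


def next_message():
    """Worker poll: always drain the high-priority queue first."""
    for q in (high_priority, low_priority):
        try:
            return q.get_nowait()
        except queue.Empty:
            continue
    return None


low_priority.put("rebuild recommendations")
high_priority.put("checkout order 42")
low_priority.put("send newsletter")

processed = []
while (msg := next_message()) is not None:
    processed.append(msg)
print(processed)
```

Under load, low-priority messages simply wait longer (or can be shed), which is the degradation the slide title refers to.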
Let’s talk about timeouts, backoff &
retries!
What happens if the DB “slows down”?
Timeout client side = ?    Timeout backend side = ?
[Diagram: Users, App, Conn Pool and DB, with four INSERT statements]
Timeout client side = 10s    Timeout backend side = Not implemented
[Diagram: User 1, App, Conn Pool and DB]
INSERT
Retry INSERT
Retry INSERT
Retry
ERROR: Failed to get connection from pool
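The failure sequence above can be reproduced with a toy connection pool. This is a sketch under assumptions: a pool of three connections, and hypothetical names `ConnectionPool` and `PoolExhausted`; real database drivers expose their own pool and timeout settings.

```python
import queue


class PoolExhausted(Exception):
    pass


class ConnectionPool:
    """Toy fixed-size pool (hypothetical, not a real driver API)."""

    def __init__(self, size: int):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, wait: float = 0.05) -> str:
        try:
            return self._free.get(timeout=wait)
        except queue.Empty:
            raise PoolExhausted("Failed to get connection from pool") from None

    def release(self, conn: str) -> None:
        self._free.put(conn)


pool = ConnectionPool(size=3)

# The INSERT and two retries each take a connection. With no backend timeout,
# the slow queries never finish, so nothing is released.
stuck = [pool.acquire() for _ in range(3)]

try:
    pool.acquire()             # the next retry finds the pool empty
    outcome = "ok"
except PoolExhausted as exc:
    outcome = str(exc)
print(outcome)

# With a backend timeout, a stuck query is cancelled and its connection returns.
pool.release(stuck.pop())
print(pool.acquire())
```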
http://docs.python-requests.org/en/master/user/advanced/#timeouts
How else could we have prevented the error?
[Diagram: User 1, Conn Pool and DB]
INSERT
Retry INSERT
Retry INSERT
Retry
ERROR: Failed to get connection from pool
Exponential Backoff?
Timeout client side = 10s    Timeout backend side = 10s
[Diagram: User 1, Conn Pool and DB, with three INSERT statements. Labels: Backoff, Releasing connections]
Wait 2s before Retry
Wait 4s before Retry
Wait 8s before Retry
Wait 16s before Retry
Simple Exponential Backoff is not enough: Add Jitter
[Charts: No jitter, With jitter]
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
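To make the contrast concrete, here is a small sketch of plain exponential backoff next to the "Full Jitter" variant described in the linked AWS Architecture Blog post, where each delay is drawn uniformly between zero and the capped backoff. The function names and the `base` and `cap` defaults are assumptions chosen for illustration.

```python
import random


def backoff(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Plain exponential backoff: base * 2**attempt, capped at `cap` seconds."""
    return min(cap, base * 2 ** attempt)


def backoff_full_jitter(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Full jitter: a uniform random delay between 0 and the capped backoff."""
    return random.uniform(0, backoff(attempt, base, cap))


schedule = [backoff(n) for n in range(1, 5)]
jittered = [backoff_full_jitter(n) for n in range(1, 5)]
print(schedule)
print(jittered)
```

Without jitter, clients that failed together retry together; randomizing the delay spreads those retries out in time.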
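Example: add jitter 0-1000ms
import random
import time

import requests

MAX_TRIES = 12

def get_item(self, url, n=1):
    """GET url, retrying with exponential backoff plus 0-1000 ms of jitter."""
    try:
        # 10 s client-side timeout (an added assumption, matching the earlier slide).
        res = requests.get(url, timeout=10)
    except requests.RequestException:
        if n > MAX_TRIES:
            return None  # give up once the retry budget is spent
        n += 1
        # Exponential backoff (2 ** n seconds) plus up to 1 s of random jitter.
        time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0))
        return self.get_item(url, n)
    else:
        return res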
Let’s talk about databases.
Database Federation
[Diagram: three application instances; a Users DB and a Products DB, each a DB instance with a DB instance read replica]
Database Sharding
User      ShardID
002345    A
002346    B
002347    C
002348    B
002349    A
[Diagram: three application instances; shards A, B and C, each a DB instance with a DB instance read replica]
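The user-to-shard table above amounts to a lookup that the application consults before each query. A minimal sketch follows; the host names in `SHARD_DSN` and the function `dsn_for_user` are hypothetical.

```python
# Lookup table from the slide: user ID -> shard ID.
SHARD_OF_USER = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}

# Hypothetical host names, one primary DB instance per shard.
SHARD_DSN = {
    "A": "db-shard-a.example.internal",
    "B": "db-shard-b.example.internal",
    "C": "db-shard-c.example.internal",
}


def dsn_for_user(user_id: str) -> str:
    """Route a user's queries to the database instance that owns their shard."""
    try:
        shard = SHARD_OF_USER[user_id]
    except KeyError:
        raise LookupError(f"no shard assigned for user {user_id}") from None
    return SHARD_DSN[shard]


print(dsn_for_user("002347"))
```

A failure of one shard then affects only the users mapped to it.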
Read / Write separation
Supports degradation through Read-Only mode
[Diagram: three application instances; one DB instance with three DB instance read replicas]
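Read/write separation and the read-only fallback can be sketched as a small router. The `Router` class, its endpoint names and the simple `SELECT` check are assumptions made for illustration; production code would classify statements more carefully.

```python
import itertools


class Router:
    """Send writes to the primary and spread reads over the replicas (toy sketch)."""

    def __init__(self, primary: str, replicas: list):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)
        self.primary_available = True

    def endpoint_for(self, sql: str) -> str:
        is_read = sql.lstrip().upper().startswith("SELECT")
        if is_read:
            return next(self._replicas)    # round-robin over read replicas
        if not self.primary_available:
            # Read-only mode: reads keep working, writes are refused.
            raise RuntimeError("read-only mode: writes are temporarily disabled")
        return self.primary


router = Router("primary", ["replica-1", "replica-2", "replica-3"])
print(router.endpoint_for("SELECT * FROM orders"))
print(router.endpoint_for("INSERT INTO orders VALUES (1)"))

router.primary_available = False           # primary fails: degrade to read-only
print(router.endpoint_for("SELECT * FROM orders"))
```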
Let’s talk about shuffle sharding.
Cascading Failures
Measure for this: blast radius
[Diagram: eight nodes, numbered 1 to 8]
Blast Radius
No of Nodes: 8
Shard Size: 2
Overlap    % of Customers Impacted
0          53.6%
1          42.8%
2          3.6%
Blast Radius
No of Nodes: 100
Shard Size: 5
Overlap    % of Customers Impacted
0          77%
1          21%
2          1.8%
3          0.06%
4          0.0006%
5          0.0000013%
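The percentages in both tables agree, up to rounding, with a hypergeometric calculation. The sketch below reads "overlap" as the number of nodes a customer's shard shares with another, randomly chosen shard of the same size; that reading is an interpretation, not something the slides state.

```python
from math import comb


def overlap_probability(nodes: int, shard_size: int, overlap: int) -> float:
    """P(two random shards of `shard_size` nodes share exactly `overlap` nodes)."""
    return (comb(shard_size, overlap)
            * comb(nodes - shard_size, shard_size - overlap)
            / comb(nodes, shard_size))


for nodes, size in ((8, 2), (100, 5)):
    print(f"nodes={nodes}, shard size={size}")
    for k in range(size + 1):
        print(f"  overlap {k}: {overlap_probability(nodes, size, k):.7%}")
```

Full overlap is the case where one customer's problem takes down every node another customer depends on, so the last row of each table is the one that matters most.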
Shuffle Sharding
• Needs a client that retries or is fault tolerant
• Works for servers, queues or other resources
• Needs a routing mechanism, such as per-customer DNS names
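One possible way to assign shuffle shards is to derive a stable pseudo-random subset of nodes from each customer's identifier. The sketch below assumes 8 nodes and a shard size of 2; the node names and the `shuffle_shard` function are hypothetical, and a real deployment would publish the result through its routing layer.

```python
import hashlib
import random

NODES = [f"node-{i}" for i in range(1, 9)]   # 8 nodes (assumed)
SHARD_SIZE = 2


def shuffle_shard(customer_id: str) -> list:
    """Pick a stable pseudo-random subset of nodes for one customer."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))  # seed derived from the ID
    return sorted(rng.sample(NODES, SHARD_SIZE))


for customer in ("alice", "bob", "carol"):
    print(customer, shuffle_shard(customer))
```

Because the seed depends only on the customer ID, every caller computes the same shard without shared state.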
Let’s talk about chaos!
GameDay at Amazon
Creating Resiliency Through Destruction
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
Chaos engineering
https://github.com/Netflix/SimianArmy
“Chaos Engineering is the discipline of
experimenting on a distributed system
in order to build confidence in the system’s
capability to withstand turbulent conditions
in production.”
http://principlesofchaos.org
Failure injection
Start small & build confidence
• Application level
• Host failure
• Resource attacks (CPU, memory, …)
• Network attacks (dependencies, latency, …)
• Region attacks
• “Paul” attack
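At the application level, failure injection can start as small as a decorator that makes a call fail some fraction of the time. This is a minimal sketch; `inject_failure`, its parameters and the `get_price` function are invented for illustration and are not taken from any chaos engineering tool.

```python
import functools
import random
import time


def inject_failure(rate: float = 0.1, latency: float = 0.0, exc=RuntimeError):
    """Decorator: with probability `rate`, add latency and then raise `exc`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if random.random() < rate:
                time.sleep(latency)             # simulated slow dependency
                raise exc("injected failure")   # simulated dependency error
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_failure(rate=0.2, latency=0.01)
def get_price(item: str) -> float:
    return 9.99


results = {"ok": 0, "failed": 0}
for _ in range(100):
    try:
        get_price("book")
        results["ok"] += 1
    except RuntimeError:
        results["failed"] += 1
print(results)
```

Starting with a low rate in a test environment fits the "start small & build confidence" advice before moving on to host, network or region experiments.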
@adhorn
Plan for the worst, prepare for the
unexpected.
Thank You!

More Related Content

What's hot

Building Event-driven Architectures with Amazon EventBridge
Building Event-driven Architectures with Amazon EventBridge Building Event-driven Architectures with Amazon EventBridge
Building Event-driven Architectures with Amazon EventBridge
James Beswick
 
AWS Workshop Series: Microsoft licensing and active directory on AWS
AWS Workshop Series: Microsoft licensing and active directory on AWSAWS Workshop Series: Microsoft licensing and active directory on AWS
AWS Workshop Series: Microsoft licensing and active directory on AWS
Amazon Web Services
 
Design patterns for microservice architecture
Design patterns for microservice architectureDesign patterns for microservice architecture
Design patterns for microservice architecture
The Software House
 
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Amazon Web Services
 
Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019
Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019 Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019
Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019
Amazon Web Services
 
Aligning to the NIST Cybersecurity Framework in the AWS
Aligning to the NIST Cybersecurity Framework in the AWSAligning to the NIST Cybersecurity Framework in the AWS
Aligning to the NIST Cybersecurity Framework in the AWS
Amazon Web Services
 
Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...
Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...
Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...
Amazon Web Services
 
Guide to an API-first Strategy
Guide to an API-first StrategyGuide to an API-first Strategy
Guide to an API-first Strategy
Kellton Tech Solutions Ltd
 
ServiceNow Overview
ServiceNow OverviewServiceNow Overview
ServiceNow Overview
Jeremy Smith
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
confluent
 
Become an AWS IAM Policy Ninja
Become an AWS IAM Policy NinjaBecome an AWS IAM Policy Ninja
Become an AWS IAM Policy Ninja
Amazon Web Services
 
Aws EC2 ENI, ENA, EFA
Aws EC2 ENI, ENA, EFAAws EC2 ENI, ENA, EFA
Aws EC2 ENI, ENA, EFA
Aléx Carvalho
 
MuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft AutomationMuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft Automation
Jitendra Bafna
 
(SEC315) AWS Directory Service Deep Dive
(SEC315) AWS Directory Service Deep Dive (SEC315) AWS Directory Service Deep Dive
(SEC315) AWS Directory Service Deep Dive
Amazon Web Services
 
A Capability Blueprint for Microservices
A Capability Blueprint for MicroservicesA Capability Blueprint for Microservices
A Capability Blueprint for Microservices
Matt McLarty
 
Best Practices for Hosting Web Applications on AWS
Best Practices for Hosting Web Applications on AWSBest Practices for Hosting Web Applications on AWS
Best Practices for Hosting Web Applications on AWS
Amazon Web Services
 
Simplify & Standardise Your Migration to AWS with a Migration Landing Zone
Simplify & Standardise Your Migration to AWS with a Migration Landing ZoneSimplify & Standardise Your Migration to AWS with a Migration Landing Zone
Simplify & Standardise Your Migration to AWS with a Migration Landing Zone
Amazon Web Services
 
Event-driven microservices
Event-driven microservicesEvent-driven microservices
Event-driven microservices
Andrew Schofield
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
Amazon Web Services
 
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
Amazon Web Services
 

What's hot (20)

Building Event-driven Architectures with Amazon EventBridge
Building Event-driven Architectures with Amazon EventBridge Building Event-driven Architectures with Amazon EventBridge
Building Event-driven Architectures with Amazon EventBridge
 
AWS Workshop Series: Microsoft licensing and active directory on AWS
AWS Workshop Series: Microsoft licensing and active directory on AWSAWS Workshop Series: Microsoft licensing and active directory on AWS
AWS Workshop Series: Microsoft licensing and active directory on AWS
 
Design patterns for microservice architecture
Design patterns for microservice architectureDesign patterns for microservice architecture
Design patterns for microservice architecture
 
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
Running a High-Performance Kubernetes Cluster with Amazon EKS (CON318-R1) - A...
 
Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019
Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019 Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019
Security best practices the well-architected way - SDD318 - AWS re:Inforce 2019
 
Aligning to the NIST Cybersecurity Framework in the AWS
Aligning to the NIST Cybersecurity Framework in the AWSAligning to the NIST Cybersecurity Framework in the AWS
Aligning to the NIST Cybersecurity Framework in the AWS
 
Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...
Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...
Moving Large Scale Contact Centers to Amazon Connect (BAP324) - AWS re:Invent...
 
Guide to an API-first Strategy
Guide to an API-first StrategyGuide to an API-first Strategy
Guide to an API-first Strategy
 
ServiceNow Overview
ServiceNow OverviewServiceNow Overview
ServiceNow Overview
 
Microservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka EcosystemMicroservices in the Apache Kafka Ecosystem
Microservices in the Apache Kafka Ecosystem
 
Become an AWS IAM Policy Ninja
Become an AWS IAM Policy NinjaBecome an AWS IAM Policy Ninja
Become an AWS IAM Policy Ninja
 
Aws EC2 ENI, ENA, EFA
Aws EC2 ENI, ENA, EFAAws EC2 ENI, ENA, EFA
Aws EC2 ENI, ENA, EFA
 
MuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft AutomationMuleSoft Surat Meetup#54 - MuleSoft Automation
MuleSoft Surat Meetup#54 - MuleSoft Automation
 
(SEC315) AWS Directory Service Deep Dive
(SEC315) AWS Directory Service Deep Dive (SEC315) AWS Directory Service Deep Dive
(SEC315) AWS Directory Service Deep Dive
 
A Capability Blueprint for Microservices
A Capability Blueprint for MicroservicesA Capability Blueprint for Microservices
A Capability Blueprint for Microservices
 
Best Practices for Hosting Web Applications on AWS
Best Practices for Hosting Web Applications on AWSBest Practices for Hosting Web Applications on AWS
Best Practices for Hosting Web Applications on AWS
 
Simplify & Standardise Your Migration to AWS with a Migration Landing Zone
Simplify & Standardise Your Migration to AWS with a Migration Landing ZoneSimplify & Standardise Your Migration to AWS with a Migration Landing Zone
Simplify & Standardise Your Migration to AWS with a Migration Landing Zone
 
Event-driven microservices
Event-driven microservicesEvent-driven microservices
Event-driven microservices
 
Amazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for KubernetesAmazon EKS - Elastic Container Service for Kubernetes
Amazon EKS - Elastic Container Service for Kubernetes
 
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
AWS Landing Zone Deep Dive (ENT350-R2) - AWS re:Invent 2018
 

Similar to Resiliency and Availability Design Patterns for the Cloud

Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018
Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018
Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018
Amazon Web Services
 
ServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless MythsServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless Myths
Tim Wagner
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
Amazon Web Services
 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Amazon Web Services
 
A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...
A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...
A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...
Amazon Web Services
 
Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...
Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...
Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...
Amazon Web Services
 
Scaling up to and beyond 10M users
Scaling up to and beyond 10M usersScaling up to and beyond 10M users
Scaling up to and beyond 10M users
Amazon Web Services
 
Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018
Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018
Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018
Amazon Web Services
 
Serverless best practices plus design principles 20m version
Serverless   best practices plus design principles 20m versionServerless   best practices plus design principles 20m version
Serverless best practices plus design principles 20m version
Heitor Lessa
 
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Web Services Korea
 
Making Headless Drupal Serverless
Making Headless Drupal ServerlessMaking Headless Drupal Serverless
Making Headless Drupal Serverless
Amazon Web Services
 
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...
Amazon Web Services
 
AWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS Summit
AWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS SummitAWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS Summit
AWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS Summit
Amazon Web Services
 
Modern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayModern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat Way
Amazon Web Services
 
Jets: A Ruby Serverless Framework
Jets: A Ruby Serverless FrameworkJets: A Ruby Serverless Framework
Jets: A Ruby Serverless Framework
Tung Nguyen
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
Adrian Hornsby
 
Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...
Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...
Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...
Amazon Web Services
 
Using Mobile to Engage Your Audience
Using Mobile to Engage Your AudienceUsing Mobile to Engage Your Audience
Using Mobile to Engage Your Audience
Amazon Web Services
 
Future of Enterprise IT
Future of Enterprise ITFuture of Enterprise IT
Future of Enterprise IT
Amazon Web Services
 
AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享
AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享
AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享
Amazon Web Services
 

Similar to Resiliency and Availability Design Patterns for the Cloud (20)

Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018
Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018
Automating Compliance on AWS (HLC302-S-i) - AWS re:Invent 2018
 
ServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless MythsServerlessConf 2018 Keynote - Debunking Serverless Myths
ServerlessConf 2018 Keynote - Debunking Serverless Myths
 
Keynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos EngineeringKeynote - Adrian Hornsby on Chaos Engineering
Keynote - Adrian Hornsby on Chaos Engineering
 
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
Breaking Containers: Chaos Engineering for Modern Applications on AWS (CON310...
 
A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...
A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...
A Self-Defending Border: Protect Your Web-Facing Workloads with AWS Security ...
 
Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...
Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...
Automate & Audit Cloud Governance & Compliance in Your Landing Zone (ENT315-R...
 
Scaling up to and beyond 10M users
Scaling up to and beyond 10M usersScaling up to and beyond 10M users
Scaling up to and beyond 10M users
 
Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018
Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018
Monitoring Serverless Applications (SRV303-S) - AWS re:Invent 2018
 
Serverless best practices plus design principles 20m version
Serverless   best practices plus design principles 20m versionServerless   best practices plus design principles 20m version
Serverless best practices plus design principles 20m version
 
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
Amazon Polly와 Cloud9을 활용한 서버리스 웹 애플리케이션 및 CI/CD 배포 프로세스 구축 (김현수, AWS 솔루션즈 아키텍...
 
Making Headless Drupal Serverless
Making Headless Drupal ServerlessMaking Headless Drupal Serverless
Making Headless Drupal Serverless
 
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...
[NEW LAUNCH!] Introduction to AWS Global Accelerator (NET330) - AWS re:Invent...
 
AWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS Summit
AWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS SummitAWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS Summit
AWS Systems Manager: Bridging Operational Models - SRV212 - Chicago AWS Summit
 
Modern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat WayModern Application Delivery on AWS: the Red Hat Way
Modern Application Delivery on AWS: the Red Hat Way
 
Jets: A Ruby Serverless Framework
Jets: A Ruby Serverless FrameworkJets: A Ruby Serverless Framework
Jets: A Ruby Serverless Framework
 
Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.Chaos Engineering: Why Breaking Things Should Be Practised.
Chaos Engineering: Why Breaking Things Should Be Practised.
 
Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...
Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...
Autonomous DevSecOps: Five Steps to a Self-Driving Cloud (ENT214-S) - AWS re:...
 
Using Mobile to Engage Your Audience
Using Mobile to Engage Your AudienceUsing Mobile to Engage Your Audience
Using Mobile to Engage Your Audience
 
Future of Enterprise IT
Future of Enterprise ITFuture of Enterprise IT
Future of Enterprise IT
 
AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享
AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享
AWS 主題演講:聚焦企業工作負載 (enterprise workloads) 與全球案例分享
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Resiliency and Availability Design Patterns for the Cloud

  • 1. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Resiliency & Availability Design Patterns for the Cloud. STEVEN BRYEN | AWS TECHNICAL & DEVELOPER EVANGELISM | @steven_bryen | sbryen@amazon.com | LONDON – MARCH 2019
  • 2. RELIABILITY AND RESILIENCY
  • 3. 99.99%
  • 4. “Everything fails all the time.” Werner Vogels, CTO – Amazon.com
  • 5. Reliability: handles the “known unknowns”. Resiliency: handles the “unknown unknowns”.
  • 6. How do we build resilient software systems?
  • 7. …
  • 8. “Quality is not an act, it is a habit.” Aristotle, some time around 350 BC
  • 9. Distributed systems are hard
  • 10. Complex systems: Amazon, Twitter, Netflix
  • 11. Let’s talk about Multi-AZ for maximum availability
  • 12. [Diagram: Region; Availability zones a, b, c; Application]
  • 13. [Diagram: Region; Availability zones a, b, c; Application]
  • 14. Better to react without reacting
  • 15. [Diagram: Region; Availability zones a, b, c; Application]
  • 16. But why?
  • 17. 3 AZs are better than 2
  • 18. [Diagram: Region; Availability zones a, b, c; Application] Requires 8 instances
  • 19. [Diagram: Region; Availability zones a, b, c; Application] Requires 6 instances
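The instance counts on slides 18 and 19 are consistent with a simple capacity calculation. The sketch below assumes (the slides do not state this) that the application needs 4 instances to carry its load and must still have that capacity after losing any one Availability Zone. `instances_required` is a hypothetical helper name.

```python
import math

def instances_required(needed: int, zones: int) -> int:
    """Total instances so that `needed` are still running after one zone is lost."""
    if zones < 2:
        raise ValueError("need at least 2 zones to survive a zone failure")
    # The surviving (zones - 1) zones must hold the full required capacity.
    per_zone = math.ceil(needed / (zones - 1))
    return per_zone * zones

print(instances_required(4, 2))  # 8
print(instances_required(4, 3))  # 6
```

Under that assumption, spreading over three zones means each zone carries less spare capacity, which is why the total drops from 8 to 6.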
  • 20. https://aws.amazon.com/wellarchitected
  • 21. Let’s talk about auto scaling
  • 22. Auto-Scaling: Fixed, Variable
  • 23. Auto-scaling for self-healing • Set min > 0 [Diagram: AWS Region; Auto Scaling group; Availability zone 1; Availability zone 2; X]
  • 24. Let’s talk about decoupling and async
  • 25. Pattern 5: Decoupling with async pattern. Synchronous: Process A is waiting while Process B is working, then gets the result. Asynchronous: Process A continues while Process B is working, then gets or fetches the result.
  • 26. [Diagram: API Instances; Queue/Streaming; Worker Instances; Cache node] API: {DO foo}; PUT JOB: {JobID: 0001, Task: DO foo}; API: {JobID: 0001}; GET JOB: {JobID: 0001, Task: DO foo}; {JobID: 0001, Result: bar}
  • 27. [Diagram: User; API Instances; Queue/Streaming; Worker Instances; Cache node] Push Notification; Fetch results
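The job flow on slides 26 and 27 can be sketched in a single process. This is a minimal illustration, not the speaker's implementation: `queue.Queue` stands in for the queue/streaming layer, a dictionary stands in for the cache node, and `submit`, `fetch`, and `worker` are hypothetical names.

```python
import queue
import threading
import uuid

jobs = queue.Queue()            # stands in for the queue/streaming layer
results = {}                    # stands in for the cache node
results_lock = threading.Lock()

def submit(task: str) -> str:
    """API side: enqueue the task and return a job ID immediately."""
    job_id = uuid.uuid4().hex
    jobs.put({"JobID": job_id, "Task": task})
    return job_id

def fetch(job_id: str):
    """API side: return the result if it is ready, else None (caller polls again)."""
    with results_lock:
        return results.get(job_id)

def worker() -> None:
    """Worker side: take jobs off the queue and write results to the cache."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        result = job["Task"].upper()   # placeholder for the real work
        with results_lock:
            results[job["JobID"]] = result
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()
job_id = submit("do foo")
jobs.join()                      # demo only: wait until the worker has finished
print(fetch(job_id))             # DO FOO
jobs.put(None)
```

The caller gets a job ID back at once and is never blocked on the worker, so a slow or failed worker delays results without taking the API down with it.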
  • 28. Degrade & prioritize traffic with queues [Diagram: API Instances; HighPriorityQueue; LowPriorityQueue; Worker Instances]
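One way a worker could consume the two queues on slide 28 is sketched below. It assumes a simple policy that the slide does not spell out: always drain the high-priority queue first, and skip low-priority work entirely while the system is degraded. `next_job` and `shed_low` are hypothetical names.

```python
import queue

high = queue.Queue()   # HighPriorityQueue
low = queue.Queue()    # LowPriorityQueue

def next_job(shed_low: bool = False):
    """Return the next job: high priority first, low priority only if not shedding."""
    try:
        return high.get_nowait()
    except queue.Empty:
        pass
    if shed_low:
        return None    # degraded mode: leave low-priority work queued
    try:
        return low.get_nowait()
    except queue.Empty:
        return None

low.put("rebuild-report")
high.put("checkout")
order = [next_job(), next_job(), next_job()]
print(order)  # ['checkout', 'rebuild-report', None]
```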
  • 29. Let’s talk about timeouts, backoff & retries!
  • 30. What happens if the DB “slows down”? [Diagram: Users; App; DB; Conn Pool; INSERT, INSERT, INSERT, INSERT] Timeout client side, timeout backend side: ??
  • 31. [Diagram: User 1; App; DB; Conn Pool] Timeout client side = 10s. Timeout backend side = Not implemented. INSERT; Retry INSERT; Retry INSERT; Retry. ERROR: Failed to get connection from pool
  • 32. http://docs.python-requests.org/en/master/user/advanced/#timeouts
  • 33. http://docs.python-requests.org/en/master/user/advanced/#timeouts
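The pool exhaustion on slide 31 happens when callers hold or wait for connections with no upper bound. The sketch below is one possible shape for a pool that bounds the wait and always releases; it is an illustration with hypothetical names (`ConnectionPool`, `PoolTimeout`) and placeholder connections, not code from the talk. For outbound HTTP calls, the Requests page linked on slides 32 and 33 documents the `timeout` argument, which serves the same purpose.

```python
import queue
from contextlib import contextmanager

class PoolTimeout(Exception):
    """Raised when no connection becomes free within the deadline."""

class ConnectionPool:
    def __init__(self, size: int):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")   # placeholder for a real connection

    @contextmanager
    def connection(self, timeout: float):
        try:
            conn = self._free.get(timeout=timeout)   # bounded wait
        except queue.Empty:
            raise PoolTimeout(f"no connection within {timeout}s") from None
        try:
            yield conn
        finally:
            self._free.put(conn)          # always release, even on error

pool = ConnectionPool(size=1)
with pool.connection(timeout=0.1):
    try:
        with pool.connection(timeout=0.1):   # pool is exhausted here
            pass
    except PoolTimeout as exc:
        failed_fast = str(exc)
with pool.connection(timeout=0.1) as conn_again:
    reused = conn_again
print(failed_fast, reused)
```

The second caller fails after 0.1 s instead of queuing indefinitely, and the connection is usable again as soon as the first caller leaves the `with` block.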
  • 34. How else could we have prevented the error? [Diagram: User 1; DB; Conn Pool] INSERT; Retry INSERT; Retry INSERT; Retry. ERROR: Failed to get connection from pool
  • 35. Exponential Backoff? [Diagram: User 1; DB; Conn Pool; INSERT, INSERT, INSERT] Timeout client side = 10s. Timeout backend side = 10s. Wait 2s before retry; wait 4s before retry; wait 8s before retry; wait 16s before retry. Backoff; releasing connections
  • 36. Simple exponential backoff is not enough: add jitter [Charts: no jitter; with jitter] https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
  • 38. Example: add jitter 0-1000ms
        import random
        import time

        import requests

        def get_item(self, url, n=1):
            MAX_TRIES = 12
            try:
                res = requests.get(url)
            except requests.RequestException:
                # Give up once the retry budget is exhausted.
                if n > MAX_TRIES:
                    return None
                n += 1
                # Exponential backoff (2**n seconds) plus 0-1000 ms of jitter.
                time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0))
                return self.get_item(url, n)
            else:
                return res
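The article linked on slide 36 also describes a "full jitter" variant, where each delay is drawn uniformly between 0 and a capped exponential value. The sketch below shows that variant as an iterative helper. The function names, default values, and the injectable `sleep` and `rng` parameters are assumptions made for illustration and testing.

```python
import random
import time

def backoff_delays(base: float, cap: float, count: int, rng=random.random):
    """Yield `count` full-jitter delays: uniform in [0, min(cap, base * 2**n))."""
    for n in range(count):
        yield rng() * min(cap, base * (2 ** n))

def call_with_retries(func, tries=5, base=0.1, cap=5.0, sleep=time.sleep):
    """Call func(); on failure wait a jittered delay and retry, up to `tries` calls."""
    delays = list(backoff_delays(base, cap, tries - 1))
    for attempt in range(tries):
        try:
            return func()
        except Exception:        # narrow this to retryable errors in real code
            if attempt == tries - 1:
                raise            # retry budget exhausted: surface the error
            sleep(delays[attempt])

calls = {"n": 0}

def flaky():
    """Fails twice, then succeeds (stands in for a transient outage)."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

slept = []
result = call_with_retries(flaky, sleep=slept.append)
print(result, len(slept))  # ok 2
```

The cap keeps late retries from waiting arbitrarily long, and the random factor spreads out clients that all failed at the same moment.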
  • 39. Let’s talk about databases.
  • 40. Database Federation [Diagram: Instances; Users DB; Products DB; each DB instance with a DB instance read replica]
  • 41. Database Sharding [Diagram: Instances; shards A, B, C; each DB instance with a DB instance read replica]
        User    ShardID
        002345  A
        002346  B
        002347  C
        002348  B
        002349  A
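The lookup table on slide 41 maps each user to a shard. A minimal sketch of routing with such a table follows; the host names are invented for illustration, and `endpoint_for` is a hypothetical helper.

```python
# User-to-shard mapping, copied from the table on slide 41.
SHARD_OF_USER = {
    "002345": "A",
    "002346": "B",
    "002347": "C",
    "002348": "B",
    "002349": "A",
}

# Hypothetical endpoints: one database per shard.
SHARD_ENDPOINTS = {
    "A": "db-a.example.internal",
    "B": "db-b.example.internal",
    "C": "db-c.example.internal",
}

def endpoint_for(user_id: str) -> str:
    """Return the database endpoint that holds this user's data."""
    shard = SHARD_OF_USER[user_id]   # KeyError for unknown users
    return SHARD_ENDPOINTS[shard]

print(endpoint_for("002346"))  # db-b.example.internal
```

With this layout, losing one shard affects only the users mapped to it.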
  • 43. Read / Write separation. Supports degradation through Read-Only mode [Diagram: Instances; DB instance; three DB instance read replicas]
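A sketch of how an application might route statements under read/write separation and fall back to read-only mode is shown below. The `Router` class, the `primary_healthy` flag, and the rule "anything starting with SELECT is a read" are simplifying assumptions, not something the slide specifies.

```python
class ReadOnlyMode(Exception):
    """Raised for writes while the primary is unavailable."""

class Router:
    def __init__(self, primary: str, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.primary_healthy = True
        self._next = 0

    def target_for(self, statement: str) -> str:
        """Send reads to replicas (round robin) and writes to the primary."""
        if statement.lstrip().upper().startswith("SELECT"):
            replica = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return replica
        if not self.primary_healthy:
            # Degraded: reads keep working, writes are rejected.
            raise ReadOnlyMode("writes are disabled while the primary is unavailable")
        return self.primary

router = Router("primary", ["replica-1", "replica-2"])
print(router.target_for("SELECT * FROM users"))     # replica-1
print(router.target_for("INSERT INTO users ..."))   # primary
router.primary_healthy = False
print(router.target_for("SELECT * FROM users"))     # replica-2
```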
  • 44. Let’s talk about shuffle sharding.
  • 48. Cascading Failures
  • 49. Measure for this: blast radius
  • 51. [Diagram: nodes numbered 1 to 8]
  • 52. No. of nodes: 8. Shard size: 2. Blast radius:
        Overlap  % of customers impacted
        0        53.6%
        1        42.8%
        2        3.6%
  • 53. No. of nodes: 100. Shard size: 5. Blast radius:
        Overlap  % of customers impacted
        0        77%
        1        21%
        2        1.8%
        3        0.06%
        4        0.0006%
        5        0.0000013%
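The percentages on slides 52 and 53 can be reproduced with a hypergeometric calculation. The slides do not show a formula, so this is a reconstruction under one assumption: every customer's shard is a uniformly random choice of `shard_size` nodes out of `nodes`. `overlap_probability` is a hypothetical helper name.

```python
from math import comb

def overlap_probability(nodes: int, shard_size: int, overlap: int) -> float:
    """Probability that another customer's shard shares exactly `overlap` nodes with yours."""
    return (
        comb(shard_size, overlap)                        # which of your nodes are shared
        * comb(nodes - shard_size, shard_size - overlap) # the rest come from other nodes
        / comb(nodes, shard_size)                        # all possible shards
    )

for nodes, size in ((8, 2), (100, 5)):
    for k in range(size + 1):
        print(f"nodes={nodes} shard={size} overlap={k}: "
              f"{overlap_probability(nodes, size, k):.7%}")
```

For 8 nodes and shards of 2 this gives 15/28, 12/28, and 1/28. The middle value is about 42.86%, which the slide shows as 42.8%. With 100 nodes and shards of 5, only 1 in 75,287,520 possible shards overlaps completely with a given customer's shard.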
  • 54. Shuffle Sharding • Needs a client that retries or is fault tolerant • Works for servers, queues or other resources • Needs a routing mechanism such as per-customer DNS names
  • 55. Let’s talk about chaos!
  • 56. GameDay at Amazon: Creating Resiliency Through Destruction https://www.youtube.com/watch?v=zoz0ZjfrQ9s
  • 57. Chaos engineering https://github.com/Netflix/SimianArmy
  • 58. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” http://principlesofchaos.org
  • 59. Failure injection. Start small & build confidence • Application level • Host failure • Resource attacks (CPU, memory, …) • Network attacks (dependencies, latency, …) • Region attacks • “Paul” attack
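Application-level failure injection, the first item on slide 59, can start as small as a decorator. The sketch below is a hypothetical illustration and not a tool named in the talk: `inject_failure` makes a wrapped call fail with a chosen probability, optionally after added latency.

```python
import functools
import random
import time

def inject_failure(rate: float, delay: float = 0.0, exc=RuntimeError,
                   rng=random.random, sleep=time.sleep):
    """Decorator: with probability `rate`, add `delay` seconds and raise `exc`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if rng() < rate:             # rng() is in [0, 1)
                if delay:
                    sleep(delay)         # simulate a slow dependency
                raise exc("injected failure")
            return func(*args, **kwargs)
        return wrapper
    return decorator

@inject_failure(rate=0.0)
def healthy():
    return "ok"

@inject_failure(rate=1.0, exc=TimeoutError)
def broken():
    return "ok"

print(healthy())  # ok
try:
    broken()
except TimeoutError as err:
    print(err)    # injected failure
```

Starting with a low `rate` in a test environment and raising it gradually follows the slide's advice to start small and build confidence.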
  • 60. @adhorn
  • 61. @adhorn
  • 62. Plan for the worst, prepare for the unexpected.
  • 63. Thank You! STEVEN BRYEN | AWS TECHNICAL & DEVELOPER EVANGELISM | @steven_bryen | sbryen@amazon.com | LONDON – MARCH 2019