SlideShare a Scribd company logo
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
N O R D I C S
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Resiliency and Availability Design
Patterns for the Cloud
Adrian Hornsby
Sr. Technical Evangelist
Amazon Web Services
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Distributed Systems
are hard
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Complex systems
Amazon Twitter Netflix
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Partial failure mode
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Resiliency: Ability for a system to handle and
eventually recover from unexpected conditions
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How do we build resilient software
systems?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
…
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
“Quality is not an act, it is a habit”
Aristotle, some time around 350BC
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://aws.amazon.com/wellarchitected
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about Multi-AZ
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-AZ architecture
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-AZ architecture
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
X
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-AZ architecture
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
X
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-AZ architecture
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance
X
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Multi-AZ architecture
• Enables fault-tolerant applications
• AWS regional services designed to
withstand AZ failures
• Leveraged by most AWS regional
services (S3, DynamoDB, ELBs, …)
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a Availability zone b Availability zone c
Application
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a Availability zone b Availability zone c
Application
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cascading Failures
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Better to react without reacting
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a Availability zone b Availability zone c
Application
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
But Why?
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
3 AZ’s is better than 2
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a Availability zone b Availability zone c
Application
Requires 8 Instances
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Region
Availability zone a Availability zone b Availability zone c
Application
Requires 6 Instances
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about auto scaling
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Auto-Scaling
FixedVariable
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Availability zone 1
Auto Scaling group
AWS Region
Availability zone 2
Auto-scaling for self-healing
X
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about decoupling and async
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Process A Process B Process A Process B
Synchronous Asynchronous
Waiting
Working
Continues
get or fetch resultGet result
Decoupling with async pattern
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
API: {DO foo}
PUT JOB: {JobID: 0001, Task: DO foo}
API: {JobID: 0001}
GET JOB: {JobID: 0001, Task: DO foo}
{JobID: 0001, Result: bar}
Cache node
Worker
Instance
Worker
Instance
Queue/Streaming
API
Instance
API
Instance
API
Instance
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Push Notification
User
Worker
Instance
Worker
Instance
API
Instance
API
Instance
Cache node
Fetch results
API
Instance
Queue/Streaming
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Degrade & prioritize traffic
with queues
Worker
Instance
Worker
Instance
API
Instance
API
Instance
API
Instance
HighPriorityQueue
LowPriorityQueue
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about timeouts, backoff &
retries!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Users
App
DB
Conn
Pool
INSERT
INSERT
INSERT
INSERT
What happens if the DB “slows down”?
Timeout client side Timeout backend side ??
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
User 1
App
DB
Conn
Pool
INSERT
Timeout client side = 10s Timeout backend side = default = Infinite
Retry INSERT
Retry INSERT
ERROR: Failed to get connection from pool
Retry
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Set the timeouts!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How else could we have prevented the error?
User 1
DB
Conn
Pool
INSERT
Retry INSERT
Retry INSERT
Retry
ERROR: Failed to get connection from pool
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
User 1
DB
Conn
Pool
INSERT
Timeout client side = 10s Timeout backend side = 10s
Wait 2s before Retry
INSERT
INSERT
Wait 4s before Retry
Wait 8s before Retry
Wait 16s before Retry
Backing off between retries
Releasing connectionsBackoff
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
No jitter With jitter
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
Simple Exponential Backoff is not enough: Add Jitter
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Example: add jitter 0-1000ms
def get_item(self, url, n=1):
MAX_TRIES = 12
try:
res = requests.get(url)
except:
if n > MAX_TRIES:
return None
n += 1
time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0))
return self.get_item(url, n)
else:
return res
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Idempotent operation
No additional effect if it is called more than
once with the same input parameters.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Circuit Breaker
• Wrap a protected function
call in a circuit breaker
object, which monitors for
failures.
• If failures reach a certain
threshold, the circuit
breaker trips.
Producer Circuit Breaker Consumer
Connection
Monitoring
Timeouts
Breaking Circuit
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
https://spring.io/guides/gs/circuit-breaker/
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about health checking!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Auto Scaling group
Service A
Availability zone 1
Auto Scaling group
AWS Region
Service A
Availability zone 2
Service BService B
database Email
Probing for health
Cluster
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shallow health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
yes
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shallow health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
yes
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Deep health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
yes
Are you healthy?
yes
yes
yes
yes
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Deep health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
no
Are you healthy?
no
yes
yes
yes
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Prioritize shallow health checks during
hard times.
Cache.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about load shedding.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cheaply reject excess work
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Be careful when selecting the right
metric
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Don’t be overly optimistic and take on more than you can.
Find an operational metric to reject what you cannot take in.
Favor cached and static content
Prioritize ELB health check (shallow) pings
In an overload situation you have precious resources, do not
let any of it go to waste.
Load Shedding
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Service Degradation & Fallbacks
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about databases!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Database Federation
Users
DB
Products
DB
Instance InstanceInstance
DB Instance
DB instance read
replica
DB Instance
DB instance read
replica
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Database Sharding User ShardID
002345 A
002346 B
002347 C
002348 B
002349 A
CBA
Instance InstanceInstance
DB Instance
DB instance
read replica
DB Instance
DB instance
read replica
DB Instance
DB instance
read replica
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Read / Write separation
DB Instance DB instance read
replica
DB instance read
replica
DB instance read
replica
Instance InstanceInstance
Supports degradation through Read-Only mode
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Transient state does not
belong in the database.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about shuffle sharding.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
X X X X X X XX
♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cascading Failures
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Measure for this: blast radius
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cell-based architecture
XX
♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
XX
♤♡♢ ⚀ ⚁⚂ ⚃♡ ♤ ♧♢ ⚀⚂♧ ⚁⚃♢ ♢
♡ ♧♢
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
Nodes = 8
Shard size = 2
Combinations = 28
Overlap % customers
0 53.6%
1 42.8%
2 3.6%
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
Nodes = 100
Shard size = 5
Combinations = 75 million!
Overlap % customers
0 77%
1 21%
2 1.8%
3 0.06%
4 0.0006%
5 0.0000013%
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Shuffle sharding
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Let’s talk about chaos!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Fire Drills
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
GameDay at Amazon
Creating Resiliency Through Destruction
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Chaos engineering
https://github.com/Netflix/SimianArmy
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
“Chaos Engineering is the discipline of
experimenting on a distributed system
in order to build confidence in the system’s
capability to withstand turbulent conditions in
production.”
http://principlesofchaos.org
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Failure injection
• Start small & build confidence
• Application level
• Host failure
• Resource attacks (CPU, memory, …)
• Network attacks (dependencies, latency, …)
• Region attacks
• “Paul” attack
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Plan for the worst, prepare for the
unexpected.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Thank you!
© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Adrian Hornsby
@adhorn
https://medium.com/@adhorn
https://speakerdeck.com/adhorn/patterns-for-building-resilient-software-systems-2019

More Related Content

What's hot

2019-11-09 DevOpsNG - What I've learned from DevOps
2019-11-09 DevOpsNG - What I've learned from DevOps2019-11-09 DevOpsNG - What I've learned from DevOps
2019-11-09 DevOpsNG - What I've learned from DevOps
Cobus Bernard
 
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Amazon Web Services
 
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Amazon Web Services
 
[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers
Amazon Web Services
 
Pro-Tips-for-Builders-on-AWS
Pro-Tips-for-Builders-on-AWSPro-Tips-for-Builders-on-AWS
Pro-Tips-for-Builders-on-AWS
Amazon Web Services
 
Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...
Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...
Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...
Amazon Web Services
 
Introduction to the AWS Cloud - AWSome Day 2019 - Chicago
Introduction to the AWS Cloud - AWSome Day 2019 - ChicagoIntroduction to the AWS Cloud - AWSome Day 2019 - Chicago
Introduction to the AWS Cloud - AWSome Day 2019 - Chicago
Amazon Web Services
 
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
AWS Summits
 
Architetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS LambdaArchitetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS Lambda
Amazon Web Services
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Summits
 
AWSome Day 2019 - Mexico City
AWSome Day 2019 - Mexico CityAWSome Day 2019 - Mexico City
AWSome Day 2019 - Mexico City
Amazon Web Services
 
Virtual AWSome Day October 2018 - Amazon Web Services
Virtual AWSome Day October 2018 - Amazon Web ServicesVirtual AWSome Day October 2018 - Amazon Web Services
Virtual AWSome Day October 2018 - Amazon Web Services
Amazon Web Services
 
Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019
Amazon Web Services
 
AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019
Amazon Web Services
 
AWS Fargate deep dive - MAD303 - Chicago AWS Summit
AWS Fargate deep dive - MAD303 - Chicago AWS SummitAWS Fargate deep dive - MAD303 - Chicago AWS Summit
AWS Fargate deep dive - MAD303 - Chicago AWS Summit
Amazon Web Services
 
AWSome Day Nairobi 2019
AWSome Day Nairobi 2019AWSome Day Nairobi 2019
AWSome Day Nairobi 2019
Amazon Web Services
 
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Amazon Web Services
 
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Amazon Web Services
 
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
Amazon Web Services
 
The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...
The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...
The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...
Amazon Web Services Korea
 

What's hot (20)

2019-11-09 DevOpsNG - What I've learned from DevOps
2019-11-09 DevOpsNG - What I've learned from DevOps2019-11-09 DevOpsNG - What I've learned from DevOps
2019-11-09 DevOpsNG - What I've learned from DevOps
 
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
 
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
 
[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers[NEW LAUNCH] Introducing AWS Deep Learning Containers
[NEW LAUNCH] Introducing AWS Deep Learning Containers
 
Pro-Tips-for-Builders-on-AWS
Pro-Tips-for-Builders-on-AWSPro-Tips-for-Builders-on-AWS
Pro-Tips-for-Builders-on-AWS
 
Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...
Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...
Introduction to AWS OutIntroduction to AWS Outposts - CMP203 - Chicago AWS Su...
 
Introduction to the AWS Cloud - AWSome Day 2019 - Chicago
Introduction to the AWS Cloud - AWSome Day 2019 - ChicagoIntroduction to the AWS Cloud - AWSome Day 2019 - Chicago
Introduction to the AWS Cloud - AWSome Day 2019 - Chicago
 
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
 
Architetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS LambdaArchitetture serverless e pattern avanzati per AWS Lambda
Architetture serverless e pattern avanzati per AWS Lambda
 
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
AWS Analytics Services - When to use what? | AWS Summit Tel Aviv 2019
 
AWSome Day 2019 - Mexico City
AWSome Day 2019 - Mexico CityAWSome Day 2019 - Mexico City
AWSome Day 2019 - Mexico City
 
Virtual AWSome Day October 2018 - Amazon Web Services
Virtual AWSome Day October 2018 - Amazon Web ServicesVirtual AWSome Day October 2018 - Amazon Web Services
Virtual AWSome Day October 2018 - Amazon Web Services
 
Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019Design with ops in mind | AWS Summit Tel Aviv 2019
Design with ops in mind | AWS Summit Tel Aviv 2019
 
AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019AWSome Day Bethesda - February 2019
AWSome Day Bethesda - February 2019
 
AWS Fargate deep dive - MAD303 - Chicago AWS Summit
AWS Fargate deep dive - MAD303 - Chicago AWS SummitAWS Fargate deep dive - MAD303 - Chicago AWS Summit
AWS Fargate deep dive - MAD303 - Chicago AWS Summit
 
AWSome Day Nairobi 2019
AWSome Day Nairobi 2019AWSome Day Nairobi 2019
AWSome Day Nairobi 2019
 
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
 
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
 
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
 
The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...
The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...
The Next Wave of Retailing, An AWS Perspective - Tom Litchford 월드와이드 리테일 사업 개...
 

Similar to PatternsResiliency_DevDays2019.pdf

"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma..."Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
Provectus
 
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloudAWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
Cobus Bernard
 
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloudAWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
Cobus Bernard
 
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloudAWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
Cobus Bernard
 
Tools for building your Startup on AWS
Tools for building your Startup on AWSTools for building your Startup on AWS
Tools for building your Startup on AWS
Rob De Feo
 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the Cloud
Amazon Web Services
 
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ..."How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
Provectus
 
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-CloudResiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
Amazon Web Services
 
교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...
교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...
교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...
Amazon Web Services Korea
 
DevConZM - Modern Applications Development in the Cloud
DevConZM - Modern Applications Development in the CloudDevConZM - Modern Applications Development in the Cloud
DevConZM - Modern Applications Development in the Cloud
Cobus Bernard
 
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
Amazon Web Services Korea
 
How AWS builds Serverless services using Serverless
How AWS builds Serverless services using ServerlessHow AWS builds Serverless services using Serverless
How AWS builds Serverless services using Serverless
Chris Munns
 
AWS Startup Day Bogotá - Tools for Building Your Startup
AWS Startup Day Bogotá - Tools for Building Your StartupAWS Startup Day Bogotá - Tools for Building Your Startup
AWS Startup Day Bogotá - Tools for Building Your Startup
Amazon Web Services LATAM
 
AWS DevDay Berlin 2019 - Going Global With Serverless
AWS DevDay Berlin 2019 - Going Global With ServerlessAWS DevDay Berlin 2019 - Going Global With Serverless
AWS DevDay Berlin 2019 - Going Global With Serverless
Darko Mesaroš
 
Continuous Delivery on AWS with Zero Downtime
Continuous Delivery on AWS with Zero DowntimeContinuous Delivery on AWS with Zero Downtime
Continuous Delivery on AWS with Zero Downtime
Casey Lee
 
Tools for Building your MVP on AWS
Tools for Building your MVP on AWSTools for Building your MVP on AWS
Tools for Building your MVP on AWS
Amazon Web Services
 
AWS Startup Garage - Building your MVP on AWS
AWS Startup Garage - Building your MVP on AWSAWS Startup Garage - Building your MVP on AWS
AWS Startup Garage - Building your MVP on AWS
Cobus Bernard
 
AWS Startup Day Santiago - Tools For Building Your Startup
AWS Startup Day Santiago - Tools For Building Your StartupAWS Startup Day Santiago - Tools For Building Your Startup
AWS Startup Day Santiago - Tools For Building Your Startup
Amazon Web Services LATAM
 
Amazon and Region Build Engineering
Amazon and Region Build EngineeringAmazon and Region Build Engineering
Amazon and Region Build Engineering
Rich Alberth
 
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
AWS Summits
 

Similar to PatternsResiliency_DevDays2019.pdf (20)

"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma..."Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
"Resiliency and Availability Design Patterns for the Cloud", Sebastien Storma...
 
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloudAWS DevDay Vienna - Resiliency and availability design patterns for the cloud
AWS DevDay Vienna - Resiliency and availability design patterns for the cloud
 
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloudAWS DevDay Cologne - Resiliency and availability design patterns for the cloud
AWS DevDay Cologne - Resiliency and availability design patterns for the cloud
 
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloudAWS DevDay Berlin - Resiliency and availability design patterns for the cloud
AWS DevDay Berlin - Resiliency and availability design patterns for the cloud
 
Tools for building your Startup on AWS
Tools for building your Startup on AWSTools for building your Startup on AWS
Tools for building your Startup on AWS
 
Resiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the CloudResiliency and Availability Design Patterns for the Cloud
Resiliency and Availability Design Patterns for the Cloud
 
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ..."How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
"How to build a global serverless service", Alex Casalboni, AWS Dev Day Kyiv ...
 
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-CloudResiliency-and-Availability-Design-Patterns-for-the-Cloud
Resiliency-and-Availability-Design-Patterns-for-the-Cloud
 
교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...
교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...
교육, 연구 개발자가 직접 전하는 AWS를 선택한 이유 Part.3 - 김재동 교사, IndiSchool (NPO) :: AWS Summi...
 
DevConZM - Modern Applications Development in the Cloud
DevConZM - Modern Applications Development in the CloudDevConZM - Modern Applications Development in the Cloud
DevConZM - Modern Applications Development in the Cloud
 
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
AWS Lambda 내부 동작 방식 및 활용 방법 자세히 살펴 보기 - 김일호 솔루션즈 아키텍트 매니저, AWS :: AWS Summit ...
 
How AWS builds Serverless services using Serverless
How AWS builds Serverless services using ServerlessHow AWS builds Serverless services using Serverless
How AWS builds Serverless services using Serverless
 
AWS Startup Day Bogotá - Tools for Building Your Startup
AWS Startup Day Bogotá - Tools for Building Your StartupAWS Startup Day Bogotá - Tools for Building Your Startup
AWS Startup Day Bogotá - Tools for Building Your Startup
 
AWS DevDay Berlin 2019 - Going Global With Serverless
AWS DevDay Berlin 2019 - Going Global With ServerlessAWS DevDay Berlin 2019 - Going Global With Serverless
AWS DevDay Berlin 2019 - Going Global With Serverless
 
Continuous Delivery on AWS with Zero Downtime
Continuous Delivery on AWS with Zero DowntimeContinuous Delivery on AWS with Zero Downtime
Continuous Delivery on AWS with Zero Downtime
 
Tools for Building your MVP on AWS
Tools for Building your MVP on AWSTools for Building your MVP on AWS
Tools for Building your MVP on AWS
 
AWS Startup Garage - Building your MVP on AWS
AWS Startup Garage - Building your MVP on AWSAWS Startup Garage - Building your MVP on AWS
AWS Startup Garage - Building your MVP on AWS
 
AWS Startup Day Santiago - Tools For Building Your Startup
AWS Startup Day Santiago - Tools For Building Your StartupAWS Startup Day Santiago - Tools For Building Your Startup
AWS Startup Day Santiago - Tools For Building Your Startup
 
Amazon and Region Build Engineering
Amazon and Region Build EngineeringAmazon and Region Build Engineering
Amazon and Region Build Engineering
 
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
Budget management with Cloud Economics | AWS Summit Tel Aviv 2019
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

PatternsResiliency_DevDays2019.pdf

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. N O R D I C S
  • 2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Resiliency and Availability Design Patterns for the Cloud Adrian Hornsby Sr. Technical Evangelist Amazon Web Services
  • 3. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Distributed Systems are hard
  • 4. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Complex systems Amazon Twitter Netflix
  • 5. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Partial failure mode
  • 6. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Resiliency: Ability for a system to handle and eventually recover from unexpected conditions
  • 7. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. How do we build resilient software systems?
  • 8. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. …
  • 9. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. “Quality is not an act, it is a habit” Aristotle, some time around 350BC
  • 10. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. https://aws.amazon.com/wellarchitected
  • 11. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about Multi-AZ
  • 12. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-AZ architecture Region Availability zone a Availability zone b Availability zone c Instances Instances Instances DB Instance DB instance standby
  • 13. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-AZ architecture Region Availability zone a Availability zone b Availability zone c Instances Instances Instances DB Instance DB instance standby X
  • 14. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-AZ architecture Region Availability zone a Availability zone b Availability zone c Instances Instances Instances DB Instance DB instance standby X
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-AZ architecture Region Availability zone a Availability zone b Availability zone c Instances Instances Instances DB Instance X
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Multi-AZ architecture • Enables fault-tolerant applications • AWS regional services designed to withstand AZ failures • Leveraged by most AWS regional services (S3, DynamoDB, ELBs, …) Region Availability zone a Availability zone b Availability zone c Instances Instances Instances DB Instance DB instance standby
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Availability zone b Availability zone c Application
  • 18. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Availability zone b Availability zone c Application
  • 19. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cascading Failures
  • 20. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Better to react without reacting
  • 21. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Availability zone b Availability zone c Application
  • 22. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. But Why?
  • 23. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. 3 AZ’s is better than 2
  • 24. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Availability zone b Availability zone c Application Requires 8 Instances
  • 25. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Region Availability zone a Availability zone b Availability zone c Application Requires 6 Instances
  • 26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about auto scaling
  • 27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Auto-Scaling FixedVariable
  • 28. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability zone 1 Auto Scaling group AWS Region Availability zone 2 Auto-scaling for self-healing X
  • 29. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 30. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about decoupling and async
  • 31. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Process A Process B Process A Process B Synchronous Asynchronous Waiting Working Continues get or fetch resultGet result Decoupling with async pattern
  • 32. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. API: {DO foo} PUT JOB: {JobID: 0001, Task: DO foo} API: {JobID: 0001} GET JOB: {JobID: 0001, Task: DO foo} {JobID: 0001, Result: bar} Cache node Worker Instance Worker Instance Queue/Streaming API Instance API Instance API Instance
  • 33. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Push Notification User Worker Instance Worker Instance API Instance API Instance Cache node Fetch results API Instance Queue/Streaming
  • 34. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Degrade & prioritize traffic with queues Worker Instance Worker Instance API Instance API Instance API Instance HighPriorityQueue LowPriorityQueue
  • 35. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about timeouts, backoff & retries!
  • 36. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Users App DB Conn Pool INSERT INSERT INSERT INSERT What happens if the DB “slows down”? Timeout client side Timeout backend side ??
  • 37. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. User 1 App DB Conn Pool INSERT Timeout client side = 10s Timeout backend side = default = Infinite Retry INSERT Retry INSERT ERROR: Failed to get connection from pool Retry
  • 38. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 40. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Set the timeouts!
  • 41. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. How else could we have prevented the error? User 1 DB Conn Pool INSERT Retry INSERT Retry INSERT Retry ERROR: Failed to get connection from pool
  • 42. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. User 1 DB Conn Pool INSERT Timeout client side = 10s Timeout backend side = 10s Wait 2s before Retry INSERT INSERT Wait 4s before Retry Wait 8s before Retry Wait 16s before Retry Backing off between retries Releasing connectionsBackoff
  • 43. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. No jitter With jitter https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/ Simple Exponential Backoff is not enough: Add Jitter
  • 44. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Example: add jitter 0-1000ms def get_item(self, url, n=1): MAX_TRIES = 12 try: res = requests.get(url) except: if n > MAX_TRIES: return None n += 1 time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0)) return self.get_item(url, n) else: return res
  • 45. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Idempotent operation No additional effect if it is called more than once with the same input parameters.
  • 46. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Circuit Breaker • Wrap a protected function call in a circuit breaker object, which monitors for failures. • If failures reach a certain threshold, the circuit breaker trips. Producer Circuit Breaker Consumer Connection Monitoring Timeouts Breaking Circuit
  • 47. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. https://spring.io/guides/gs/circuit-breaker/
  • 48. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about health checking!
  • 49. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Auto Scaling group Service A Availability zone 1 Auto Scaling group AWS Region Service A Availability zone 2 Service BService B database Email Probing for health Cluster
  • 50. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shallow health check Instance Cache node Email database Cluster Are you healthy? yes
  • 51. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shallow health check Instance Cache node Email database Cluster Are you healthy? yes
  • 52. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Deep health check Instance Cache node Email database Cluster Are you healthy? yes Are you healthy? yes yes yes yes
  • 53. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Deep health check Instance Cache node Email database Cluster Are you healthy? no Are you healthy? no yes yes yes
  • 54. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Prioritize shallow health checks during hard times. Cache.
  • 55. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about load shedding.
  • 56. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 57. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 58. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cheaply reject excess work
  • 59. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 60. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Be careful when selecting the right metric
  • 61. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Don’t be overly optimistic and take on more than you can. Find an operational metric to reject what you cannot take in. Favor cached and static content Prioritize ELB health check (shallow) pings In an overload situation you have precious resources, do not let any of it go to waste. Load Shedding
  • 62. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Service Degradation & Fallbacks
  • 63. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about databases!
  • 64. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Database Federation Users DB Products DB Instance InstanceInstance DB Instance DB instance read replica DB Instance DB instance read replica
  • 65. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Database Sharding User ShardID 002345 A 002346 B 002347 C 002348 B 002349 A CBA Instance InstanceInstance DB Instance DB instance read replica DB Instance DB instance read replica DB Instance DB instance read replica
  • 66. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Read / Write separation DB Instance DB instance read replica DB instance read replica DB instance read replica Instance InstanceInstance Supports degradation through Read-Only mode
  • 67. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Transient state does not belong in the database.
  • 68. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about shuffle sharding.
  • 69. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 70. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. X X X X X X XX ♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
  • 71. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cascading Failures
  • 72. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Measure for this: blast radius
  • 73. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 74. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Cell-based architecture XX ♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
  • 75. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding XX ♤♡♢ ⚀ ⚁⚂ ⚃♡ ♤ ♧♢ ⚀⚂♧ ⚁⚃♢ ♢ ♡ ♧♢
  • 76. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding Nodes = 8 Shard size = 2 Combinations = 28 Overlap % customers 0 53.6% 1 42.8% 2 3.6%
  • 77. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding Nodes = 100 Shard size = 5 Combinations = 75 million! Overlap % customers 0 77% 1 21% 2 1.8% 3 0.06% 4 0.0006% 5 0.0000013%
  • 78. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Shuffle sharding
  • 79. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about chaos!
  • 80. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Fire Drills
  • 81. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. GameDay at Amazon Creating Resiliency Through Destruction https://www.youtube.com/watch?v=zoz0ZjfrQ9s
  • 82. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Chaos engineering https://github.com/Netflix/SimianArmy
  • 83. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” http://principlesofchaos.org
  • 84. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Failure injection • Start small & build confidence • Application level • Host failure • Resource attacks (CPU, memory, …) • Network attacks (dependencies, latency, …) • Region attacks • “Paul” attack
  • 85. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Plan for the worst, prepare for the unexpected.
  • 86. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 87. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 88. Thank you! © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Adrian Hornsby @adhorn https://medium.com/@adhorn https://speakerdeck.com/adhorn/patterns-for-building-resilient-software-systems-2019