© 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Resiliency and Availability Design Patterns for the Cloud
BAR 4
KYIV
11.06.2019
{
  "name": "Sébastien Stormacq",
  "role": "Technical Evangelist",
  "company": "Amazon Web Services",
  "twitter": "@sebsto",
  "github": "sebsto"
}
Can you guess what will happen?
“Failures are a given and everything will eventually fail over time.”
Werner Vogels, CTO, Amazon.com
Distributed Systems
are hard
Complex systems
Amazon Twitter Netflix
Resiliency: the ability of a system to handle, and
eventually recover from, unexpected conditions
Partial failure mode
How do we build resilient software
systems?
People
Application
Network & Data
Infrastructure
Let’s talk about Availability
Availability in parallel
$A = 1 - (1 - A_x)^2$
Part X
Part X
Availability in parallel
Component           | Availability       | Downtime per year
X                   | 99% (2-nines)      | 3 days 15 hours
Two X in parallel   | 99.99% (4-nines)   | 52 minutes
Three X in parallel | 99.9999% (6-nines) | 31 seconds
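The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch that assumes the redundant components fail independently (the formula on the previous slide relies on the same assumption); the function names are illustrative and not part of the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` identical, independent components in parallel.

    The system is down only when every copy is down at the same time.
    """
    return 1.0 - (1.0 - component_availability) ** copies


def downtime_seconds_per_year(availability: float) -> float:
    """Expected unavailable time over one (365-day) year."""
    return (1.0 - availability) * SECONDS_PER_YEAR


for copies in (1, 2, 3):
    a = parallel_availability(0.99, copies)
    print(f"{copies} x 99%: availability={a:.6%}, "
          f"downtime={downtime_seconds_per_year(a):,.0f} s/year")
```

In practice, copies that share a dependency (same rack, same deployment, same bug) do not fail independently, so real-world gains are usually smaller than this formula suggests.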
Component redundancy increases availability
significantly!
Fully-scaled Availability Zone
Highly redundant regional network
AWS Region and availability zones
Region
Availability zone a Availability zone b Availability zone c
data center
data center
data center
1 or more data centers per AZ
2 or more AZs per region (new regions min 3)
data center
data center
data center
data center
data center
data center
Let’s talk about Multi-AZ
Multi-AZ architecture
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
Elastic Load
Balancing (ELB)
Multi-AZ architecture
X
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
Elastic Load
Balancing (ELB)
Multi-AZ architecture
X
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
new master
Elastic Load
Balancing (ELB)
Multi-AZ architecture
• Enables fault-tolerant applications
• AWS regional services designed to
withstand AZ failures
• Leveraged by AWS regional
services such as Amazon S3,
Amazon DynamoDB, Amazon
Aurora, Amazon ELBs, etc.
Region
Availability zone a Availability zone b Availability zone c
Instances Instances Instances
DB Instance DB instance
standby
Elastic Load
Balancing (ELB)
Let’s talk about auto scaling
Auto-Scaling
Fixed Variable
Availability zone 1
Auto Scaling group
AWS Region
Availability zone 2
Auto-scaling for self-healing
Elastic Load
Balancing (ELB)
X
Let’s talk about decoupling and async
Decoupling with async pattern
Synchronous: Process A calls Process B and waits while B is working, then gets the result.
Asynchronous: Process A calls Process B and continues while B is working, then gets or fetches the result.
API: {DO foo}
PUT JOB: {JobID: 0001, Task: DO foo}
API: {JobID: 0001}
GET JOB: {JobID: 0001, Task: DO foo}
{JobID: 0001, Result: bar}
Cache node
Worker
Instance
Worker
Instance
Queue/Streaming
API
Instance
API
Instance
API
Instance
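The message flow on this slide (submit a task, get a job ID back at once, fetch the result later) can be sketched in-process. This is an illustration only: an in-memory `queue.Queue` stands in for the queue/stream, a dict stands in for the cache node, and the function names `api_submit`, `api_fetch` and `worker` are assumptions, not part of the talk.

```python
import queue
import threading
import uuid

job_queue = queue.Queue()   # stand-in for the queue/streaming layer
result_cache = {}           # stand-in for the cache node


def api_submit(task):
    """API instance: enqueue the job and return a job ID immediately."""
    job_id = uuid.uuid4().hex
    job_queue.put((job_id, task))
    return job_id


def api_fetch(job_id):
    """API instance: return the result if a worker has stored it, else None."""
    return result_cache.get(job_id)


def worker():
    """Worker instance: take jobs off the queue and store results in the cache."""
    while True:
        job_id, task = job_queue.get()
        result_cache[job_id] = f"done: {task}"   # placeholder for the real work
        job_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

job_id = api_submit("DO foo")
job_queue.join()             # demo only: wait until the worker has finished
print(api_fetch(job_id))     # done: DO foo
```

The API tier never blocks on the worker, so a slow or failed worker delays results without taking the API down with it.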
Push Notification
User
Worker
Instance
Worker
Instance
API
Instance
API
Instance
Cache node
Fetch results
API
Instance
Queue/Streaming
Degrade & prioritize traffic
with queues
Worker
Instance
Worker
Instance
API
Instance
API
Instance
API
Instance
HighPriorityQueue
LowPriorityQueue
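One way a worker might consume the two queues is to drain high-priority work first and touch low-priority work only when nothing urgent is waiting. The sketch below uses in-memory queues and made-up job names; it is an assumption about how the diagram could be implemented, not the talk's own code.

```python
import queue

high_priority = queue.Queue()
low_priority = queue.Queue()


def next_job():
    """Return the next job, always preferring the high-priority queue."""
    for q in (high_priority, low_priority):
        try:
            return q.get_nowait()
        except queue.Empty:
            continue
    return None   # both queues are empty


low_priority.put("rebuild recommendations")
high_priority.put("checkout order 42")
low_priority.put("send newsletter")

order = []
while (job := next_job()) is not None:
    order.append(job)
print(order)
# ['checkout order 42', 'rebuild recommendations', 'send newsletter']
```

Under overload, the low-priority queue can be left to grow, or be dropped altogether, while the critical path keeps flowing.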
Let’s talk about the failures in
distributed systems
Recommendation Engine
Service
Service
Service
Preserve
at all cost
Preventing failures
Some of the most important things to think about
Recommendation Engine
Service
Service
Service
Preserve
at all cost
Let’s talk about timeouts, backoff &
retries!
Users
App
DB
Conn
Pool
INSERT
INSERT
INSERT
INSERT
What happens if the DB “slows down”?
Timeout client side Timeout backend side ??
User 1
App
DB
Conn
Pool
INSERT
Timeout client side = 10s Timeout backend side = default = Infinite
Retry INSERT
Retry INSERT
ERROR: Failed to get connection from pool
Retry
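The failure on this slide can be reproduced with a toy connection pool. In this sketch (an illustration, not a real driver) connections are plain strings, the pool size of 2 and the 0.1 s acquire timeout are arbitrary, and the slow INSERTs are simulated by simply never releasing the connections.

```python
import queue


class ConnectionPool:
    """Tiny fixed-size pool; connections are plain strings in this sketch."""

    def __init__(self, size):
        self._free = queue.Queue()
        for i in range(size):
            self._free.put(f"conn-{i}")

    def acquire(self, timeout_s):
        """Wait at most `timeout_s` seconds for a free connection."""
        try:
            return self._free.get(timeout=timeout_s)
        except queue.Empty:
            raise TimeoutError("Failed to get connection from pool") from None

    def release(self, conn):
        self._free.put(conn)


pool = ConnectionPool(size=2)

# Two slow INSERTs hold both connections because the backend never times out.
held = [pool.acquire(timeout_s=0.1), pool.acquire(timeout_s=0.1)]

try:
    pool.acquire(timeout_s=0.1)      # the retry finds the pool empty
except TimeoutError as exc:
    print(exc)                       # Failed to get connection from pool

pool.release(held.pop())             # a backend-side timeout would free a connection
print(pool.acquire(timeout_s=0.1))   # conn-1
```

Each client-side retry takes another connection while the original query is still running on the backend, which is how a slowdown turns into pool exhaustion.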
https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.timeout
https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-configuration-properties.html
import requests
import timeout_decorator

@timeout_decorator.timeout(5, timeout_exception=StopIteration)
def timed_get(url):
    return requests.get(url)
https://pypi.org/project/timeout-decorator/
Set the timeouts!
How else could we have prevented the error?
User 1
DB
Conn
Pool
INSERT
Retry INSERT
Retry INSERT
Retry
ERROR: Failed to get connection from pool
User 1
DB
Conn
Pool
INSERT
Timeout client side = 10s Timeout backend side = 10s
Wait 2s before Retry
INSERT
INSERT
Wait 4s before Retry
Wait 8s before Retry
Wait 16s before Retry
Backing off between retries
Releasing connections
Backoff
No jitter With jitter
https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
Simple Exponential Backoff is not enough: Add Jitter
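The "Full Jitter" variant described in the linked AWS Architecture Blog post picks a random delay between zero and the capped exponential value. A minimal sketch follows; the function name and the `base_s` and `cap_s` defaults are illustrative assumptions.

```python
import random


def full_jitter_delay(attempt, base_s=1.0, cap_s=30.0):
    """Delay before retry number `attempt` (0-based), using Full Jitter.

    delay = random value in [0, min(cap, base * 2 ** attempt)]
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return random.uniform(0.0, ceiling)


for attempt in range(5):
    print(attempt, round(full_jitter_delay(attempt), 2))
```

Because every client draws a different delay, retries from many clients spread out over time instead of arriving in synchronized waves.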
Adding Jitter
Example: add jitter 0-1000ms
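import random
import time

import requests

def get_item(self, url, n=1):
    # Method of a client class: retry with exponential backoff plus 0-1000 ms of jitter.
    MAX_TRIES = 12
    try:
        return requests.get(url)
    except requests.RequestException:
        if n > MAX_TRIES:
            return None
        n += 1
        time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0))
        return self.get_item(url, n)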
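import backoff
# on_exception takes a wait generator and an exception type; Exception is a placeholder here.
@backoff.on_exception(backoff.expo, Exception, max_time=60, jitter=backoff.full_jitter)
def poll_for_message(queue):
    return queue.get()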
https://pypi.org/project/backoff/
As of version 1.2, the default jitter function backoff.full_jitter implements the ‘Full Jitter’ algorithm as defined in the
AWS Architecture Blog’s Exponential Backoff And Jitter post.
Idempotent operation
No additional effect if it is called more than
once with the same input parameters.
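Retries are only safe when the operation being retried is idempotent. One common way to get there is a client-supplied request ID that the service uses to recognise duplicates. The sketch below is an illustration under that assumption; `PaymentService` and `request_id` are hypothetical names.

```python
class PaymentService:
    """Toy service that deduplicates requests by a client-supplied ID."""

    def __init__(self):
        self.balance = 0
        self._seen = {}   # request_id -> balance returned for that request

    def deposit(self, request_id, amount):
        if request_id in self._seen:
            # A retry of a request that was already applied: no additional effect.
            return self._seen[request_id]
        self.balance += amount
        self._seen[request_id] = self.balance
        return self.balance


svc = PaymentService()
svc.deposit("req-1", 100)
svc.deposit("req-1", 100)   # retried call with the same input
print(svc.balance)          # 100
```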
Circuit Breaker
• Wrap a protected function
call in a circuit breaker
object, which monitors for
failures.
• If failures reach a certain
threshold, the circuit
breaker trips.
Producer Circuit Breaker Consumer
Connection
Monitoring
Timeouts
Breaking Circuit
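The two bullets above can be sketched as a small wrapper class. This is a simplified illustration, not Hystrix or any library's API: the threshold, the reset timeout and the half-open trial call after the timeout are all assumptions chosen for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None   # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit open, failing fast")
            # Reset timeout elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # trip (or re-trip) the breaker
            raise
        self._failures = 0       # success closes the circuit again
        self._opened_at = None
        return result


def flaky():
    raise ConnectionError("backend down")


breaker = CircuitBreaker(failure_threshold=2, reset_timeout_s=60.0)
for _ in range(3):
    try:
        breaker.call(flaky)
    except Exception as exc:
        print(type(exc).__name__)
# ConnectionError, ConnectionError, then CircuitOpenError
```

Once tripped, the breaker fails fast instead of tying up threads and connections on a dependency that is already struggling.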
https://github.com/Netflix/Hystrix
https://spring.io/guides/gs/circuit-breaker/
Let’s talk about health checking!
Auto Scaling group
Service A
Availability zone 1
Auto Scaling group
AWS Region
Service A
Availability zone 2
Service B Service B
database Email
Probing for health
Cluster
Shallow health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
yes
Deep health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
yes
Are you healthy?
yes
yes
yes
yes
Deep health check
Instance
Cache node
Email
database
Cluster
Are you healthy?
no
Are you healthy?
no
yes
yes
yes
Prioritize shallow health checks during
hard times.
Cache.
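The difference between the two kinds of check can be sketched as two functions. The dependency names mirror the diagrams (database, cache, email); the check callables are stand-ins, and a real service would expose these behind HTTP endpoints.

```python
def shallow_health():
    """Shallow check: the process is up and able to answer."""
    return True


def deep_health(dependency_checks):
    """Deep check: also ask every dependency.

    Returns (healthy, details); a check that raises counts as unhealthy.
    """
    details = {}
    for name, check in dependency_checks.items():
        try:
            details[name] = bool(check())
        except Exception:
            details[name] = False
    return all(details.values()), details


checks = {
    "database": lambda: True,
    "cache": lambda: True,
    "email": lambda: False,   # simulated failing dependency
}
print(shallow_health())       # True
print(deep_health(checks))    # (False, {'database': True, 'cache': True, 'email': False})
```

The example shows why deep checks need care: one failing shared dependency makes every instance report unhealthy at once, even though each instance could still serve most requests.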
Let’s talk about load shedding.
Cheaply reject excess work
Be careful when selecting the right
metric
Load Shedding
Don’t be overly optimistic and take on more work than you can handle.
Find an operational metric that tells you when to reject what you cannot take in.
Favor cached and static content.
Prioritize ELB health check (shallow) pings.
In an overload situation your resources are precious; do not
let any of them go to waste.
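A minimal load shedder can be built around one operational metric. The sketch below uses the number of in-flight requests as that metric, which is an assumption for illustration; the right metric depends on the service. Rejecting is just a counter comparison, so excess work is turned away cheaply.

```python
import threading


class LoadShedder:
    """Admit work only while fewer than `max_in_flight` requests are running."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self._lock = threading.Lock()

    def try_admit(self):
        """Return True and count the request, or False to reject it cheaply."""
        with self._lock:
            if self.in_flight >= self.max_in_flight:
                return False
            self.in_flight += 1
            return True

    def done(self):
        """Call when an admitted request has finished."""
        with self._lock:
            self.in_flight -= 1


shedder = LoadShedder(max_in_flight=2)
print([shedder.try_admit() for _ in range(3)])   # [True, True, False]
shedder.done()
print(shedder.try_admit())                       # True
```

A request handler would answer a rejected request with an immediate error (for example HTTP 503) and would let shallow health check pings bypass the limit.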
Service Degradation & Fallbacks
https://twitter.com/redditstatus/status/1116204502703493120
Let’s talk about shuffle sharding.
X X X X X X XX
♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
Measure for this: blast radius
Blast radius
• How many customers?
• What functionality?
• How many locations?
Cell-based architecture
XX
♤♡♢ ⚀ ⚁ ⚂ ⚃♧♢
Shuffle sharding
XX
♤♡♢ ⚀ ⚁⚂ ⚃♡ ♤ ♧♢ ⚀⚂♧ ⚁⚃♢ ♢
♡ ♧♢
Shuffle sharding
Nodes = 8
Shard size = 2
Combinations = 28
Overlap | % customers
0       | 53.6%
1       | 42.8%
2       | 3.6%
Shuffle sharding
Nodes = 100
Shard size = 5
Combinations = 75 million!
Overlap | % customers
0       | 77%
1       | 21%
2       | 1.8%
3       | 0.06%
4       | 0.0006%
5       | 0.0000013%
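The numbers in both shuffle sharding tables follow from counting combinations. The sketch below assumes each customer's shard is drawn uniformly at random from all possible shards; the function name is illustrative.

```python
from math import comb


def overlap_distribution(nodes, shard_size):
    """Probability that a random shard shares exactly k nodes with a given shard.

    Choose k of the given shard's nodes and the remaining shard_size - k
    from the nodes outside it, divided by the number of possible shards.
    """
    total = comb(nodes, shard_size)
    return {
        k: comb(shard_size, k) * comb(nodes - shard_size, shard_size - k) / total
        for k in range(shard_size + 1)
    }


print(comb(8, 2))      # 28
print(comb(100, 5))    # 75287520
for k, p in overlap_distribution(100, 5).items():
    print(k, f"{p:.7%}")
```

For 8 nodes and shards of 2, the exact values are 15/28, 12/28 and 1/28, which match the first table up to rounding. A customer is fully affected by another customer's failure only when the overlap equals the shard size.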
Shuffle sharding
Let’s talk about chaos!
Fire Drills
GameDay at Amazon
Creating Resiliency Through Destruction
https://www.youtube.com/watch?v=zoz0ZjfrQ9s
Chaos engineering
https://github.com/Netflix/SimianArmy
“Chaos Engineering is the discipline of
experimenting on a distributed system
in order to build confidence in the system’s
capability to withstand turbulent conditions in
production.”
http://principlesofchaos.org
Failure injection
• Start small & build confidence
• Application level
• Host failure
• Resource attacks (CPU, memory, …)
• Network attacks (dependencies, latency, …)
• Region attacks
• “Paul” attack
https://www.gremlin.com
https://github.com/Netflix/SimianArmy
https://chaostoolkit.org
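At the application level, starting small can be as simple as a decorator that adds latency or raises errors in front of a function. The sketch below is a hypothetical helper, not the API of Gremlin, SimianArmy or Chaos Toolkit; `rng` and `sleep` are injectable so the behaviour can be tested deterministically.

```python
import functools
import random
import time


def inject_faults(latency_s=0.0, error_rate=0.0, rng=random.random, sleep=time.sleep):
    """Wrap a function so calls are delayed and fail with probability `error_rate`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if latency_s > 0:
                sleep(latency_s)                      # simulated slow dependency
            if rng() < error_rate:
                raise RuntimeError("injected failure")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@inject_faults(latency_s=0.01, error_rate=1.0)
def get_profile(user_id):
    return {"id": user_id}


try:
    get_profile(7)
except RuntimeError as exc:
    print(exc)   # injected failure
```

In an experiment, the error rate and latency would start low and be raised gradually while watching how callers cope through their timeouts, retries and fallbacks.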
Bananas for Monkeys
How to DDoS yourself
~ wrk -t12 -c400 -d30s http://127.0.0.1/api/health
Adding delay to the network
~ tc qdisc add dev eth0 root netem delay 200ms
https://github.com/Netflix/SimianArmy
A set of scheduled agents that:
• shut down services randomly
• slow down performance
• check conformity
• break an entire region
• integrate with Spinnaker (CI/CD)
Let’s talk about operational resiliency
Operational resilience: value realized by example
1. Scaled to handle a 400% increase in page views (Kurt Geiger)
2. Improved security posture (Capital One)
3. 8,600 transactions per second (McDonald's)
4. Transfer of over 750 TB of data from pipeline inspection machinery (GE)
5. Processing over 75 billion market events daily (FINRA)
6. Critical applications run in multiple AZs and across Regions for robust disaster recovery (Expedia)
7. Supports over 300,000 requests per minute to its API (Easy Taxi)
8. 60% reduced downtime (Trainline)
9. Migration of SAP on Oracle to AWS with zero unplanned downtime across five countries (Kellogg’s)
10. SAP availability boosted to 100% (Macmillan)
Operational Resilience
What is it? The benefit of improving SLAs and reducing unplanned outages
Example: Critical workloads run in multiple AZs and Regions for robust DR (Expedia)
The cost of downtime
Annual Fortune 1000 application downtime costs (IDC): $1.25B to $2.5B
Average cost of a data breach (Ponemon Institute): $3.6M
Cost/hr of a critical application failure (IDC): $500K to $1M
Average cost/hr of downtime (Ponemon Institute): $474K
Average cost per lost or stolen record (Ponemon Institute): $141
Operational resilience: Quantifying cost
Cost Category | % of Total | Definition
Third Parties | 1.3% | The cost of contractors, consultants, auditors and other specialists engaged to help resolve unplanned outages.
Equipment | 1.3% | The cost of new equipment purchases and repairs, including refurbishment.
Ex-post Activities | 1.1% | All after-the-fact incidental costs associated with business disruption and recovery.
Recovery | 2.9% | Activities and associated costs that relate to bringing the organization’s networks and core systems back to a state of readiness.
Detection | 3.6% | Activities associated with the initial discovery and subsequent investigation of the partial or complete outage incident.
IT Productivity | 8.4% | The lost time and related expenses associated with IT personnel downtime.
End-user Productivity | 18.7% | The lost time and related expenses associated with end-user downtime.
Lost Revenue | 28.2% | The total revenue loss from customers and potential customers because of their inability to access core systems during the outage period.
Business disruption | 34.6% | Additional economic loss of the outage, including reputational damages, customer churn and lost business opportunities.
TOTAL | 100.0% |
Operational resilience: Case studies
Migrated to AWS in 6 weeks with no downtime and improved availability to 99.99%+
Migrated all workloads to AWS to reduce downtime by 60%, with annual savings of £1.2M
Rebuilt patient engagement portal on AWS and reduced downtime from 120 to <5 min / month
Using AWS, Travelstart has seized opportunities in emerging markets and has cut operational costs by 43% and downtime by 25%
With its on-premises setup, the availability of its system ran to 98%, but on its cloud infrastructure, this has risen to 99.965%
Three 9’s to five 9’s
“We no longer need to worry about data center, server, or hypervisor security…which allows us to focus our attention on securing our applications.”
And before we go.
DON’T blame people for failure…
“Quality is not an act, it is a habit”
Aristotle, some time around 350BC
https://aws.amazon.com/wellarchitected
https://medium.com/@adhorn
Thank you!

More Related Content

What's hot

Enabling Research Using Cloud Computing
Enabling Research Using Cloud ComputingEnabling Research Using Cloud Computing
Enabling Research Using Cloud ComputingAmazon Web Services
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Amazon Web Services
 
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Amazon Web Services
 
Storage for business-critical applications - STG303 - Santa Clara AWS Summit
Storage for business-critical applications - STG303 - Santa Clara AWS SummitStorage for business-critical applications - STG303 - Santa Clara AWS Summit
Storage for business-critical applications - STG303 - Santa Clara AWS SummitAmazon Web Services
 
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Amazon Web Services
 
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...Amazon Web Services
 
Getting started with robots and AWS RoboMaker - SVC208 - New York AWS Summit
Getting started with robots and AWS RoboMaker - SVC208 - New York AWS SummitGetting started with robots and AWS RoboMaker - SVC208 - New York AWS Summit
Getting started with robots and AWS RoboMaker - SVC208 - New York AWS SummitAmazon Web Services
 
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Amazon Web Services
 
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019AWS Summits
 
AWS SSA Webinar 7 - Getting Started on AWS
AWS SSA Webinar 7 - Getting Started on AWSAWS SSA Webinar 7 - Getting Started on AWS
AWS SSA Webinar 7 - Getting Started on AWSCobus Bernard
 
Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...
Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...
Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...Amazon Web Services
 
Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...
Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...
Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...Amazon Web Services
 
Introduction to the AWS Cloud - AWSome Day 2019 - Toronto
Introduction to the AWS Cloud - AWSome Day 2019 - TorontoIntroduction to the AWS Cloud - AWSome Day 2019 - Toronto
Introduction to the AWS Cloud - AWSome Day 2019 - TorontoAmazon Web Services
 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerArun Gupta
 
Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...
Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...
Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...Amazon Web Services
 
KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019
KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019
KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019Amazon Web Services Korea
 
Increase the value of video using ML and AWS media services - SVC301 - Santa ...
Increase the value of video using ML and AWS media services - SVC301 - Santa ...Increase the value of video using ML and AWS media services - SVC301 - Santa ...
Increase the value of video using ML and AWS media services - SVC301 - Santa ...Amazon Web Services
 
Developing intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS Summit
Developing intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS SummitDeveloping intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS Summit
Developing intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS SummitAmazon Web Services
 
Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...
Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...
Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...Amazon Web Services
 
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...Amazon Web Services
 

What's hot (20)

Enabling Research Using Cloud Computing
Enabling Research Using Cloud ComputingEnabling Research Using Cloud Computing
Enabling Research Using Cloud Computing
 
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
Setting up custom machine learning environments on AWS - AIM204 - Chicago AWS...
 
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
Migliora la disponibilità e le prestazioni delle tue applicazioni con Amazon ...
 
Storage for business-critical applications - STG303 - Santa Clara AWS Summit
Storage for business-critical applications - STG303 - Santa Clara AWS SummitStorage for business-critical applications - STG303 - Santa Clara AWS Summit
Storage for business-critical applications - STG303 - Santa Clara AWS Summit
 
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
Grid computing in the cloud for Financial Services industry - CMP205-I - New ...
 
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
Introduction to EC2 A1 instances, powered by the AWS Graviton processor - CMP...
 
Getting started with robots and AWS RoboMaker - SVC208 - New York AWS Summit
Getting started with robots and AWS RoboMaker - SVC208 - New York AWS SummitGetting started with robots and AWS RoboMaker - SVC208 - New York AWS Summit
Getting started with robots and AWS RoboMaker - SVC208 - New York AWS Summit
 
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
Build intelligent applications quickly with AWS AI services - AIM301 - New Yo...
 
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019Orchestrating containers on AWS  | AWS Summit Tel Aviv 2019
Orchestrating containers on AWS | AWS Summit Tel Aviv 2019
 
AWS SSA Webinar 7 - Getting Started on AWS
AWS SSA Webinar 7 - Getting Started on AWSAWS SSA Webinar 7 - Getting Started on AWS
AWS SSA Webinar 7 - Getting Started on AWS
 
Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...
Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...
Video anomaly detection using Amazon SageMaker, AWS DeepLens, & AWS IoT Green...
 
Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...
Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...
Next generation intelligent data lakes, powered by GraphQL & AWS AppSync - MA...
 
Introduction to the AWS Cloud - AWSome Day 2019 - Toronto
Introduction to the AWS Cloud - AWSome Day 2019 - TorontoIntroduction to the AWS Cloud - AWSome Day 2019 - Toronto
Introduction to the AWS Cloud - AWSome Day 2019 - Toronto
 
Secure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using FirecrackerSecure and Fast microVM for Serverless Computing using Firecracker
Secure and Fast microVM for Serverless Computing using Firecracker
 
Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...
Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...
Get hands-on with AWS DeepRacer and compete in the AWS DeepRacer League - AIM...
 
KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019
KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019
KINX와 함께 하는 AWS Direct Connect 도입 - 남시우 매니저, KINX :: AWS Summit Seoul 2019
 
Increase the value of video using ML and AWS media services - SVC301 - Santa ...
Increase the value of video using ML and AWS media services - SVC301 - Santa ...Increase the value of video using ML and AWS media services - SVC301 - Santa ...
Increase the value of video using ML and AWS media services - SVC301 - Santa ...
 
Developing intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS Summit
Developing intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS SummitDeveloping intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS Summit
Developing intelligent robots with AWS RoboMaker - SVC207 - Atlanta AWS Summit
 
Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...
Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...
Best Friends Animal Society saves puppies (and data) with N2WS & AWS - SVC211...
 
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
[REPEAT] Optimize your workloads with Amazon EC2 & AMD EPYC - DEM01-R - Santa...
 

"Resiliency and Availability Design Patterns for the Cloud", Sebastien Stormacq, AWS Dev Day Kyiv 2019

  • 1. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Resiliency and Availability Design Patterns for the Cloud B A R 4 K Y I V 11.06.2019 { "name": "Sébastien Stormacq", "role": "Technical Evangelist", "company": "Amazon Web Services", "twitter": "@sebsto", "github": "sebsto" }
  • 2. Can you guess what will happen?
  • 3. "Failures are a given and everything will eventually fail over time." Werner Vogels, CTO, Amazon.com
  • 4. Distributed Systems are hard
  • 5. Complex systems: Amazon, Twitter, Netflix
  • 6. Resiliency: Ability for a system to handle and eventually recover from unexpected conditions
  • 7. Partial failure mode
  • 8. How do we build resilient software systems?
  • 9. People; Application; Network & Data; Infrastructure
  • 10. Let’s talk about Availability
  • 11. Availability in parallel: $A = 1 - (1 - A_x)^2$ (two instances of Part X in parallel)
  • 12. Availability in parallel (downtime per year)
        Component            Availability        Downtime
        X                    99% (2-nines)       3 days 15 hours
        Two X in parallel    99.99% (4-nines)    52 minutes
        Three X in parallel  99.9999% (6-nines)  31 seconds
  • 13. Component redundancy increases availability significantly!
  • 15. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Fully-scaled Availability Zone
  • 16. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Highly redundant regional network
  • 17. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. AWS Region and availability zones Region Availability zone a Availability zone b Availability zone c data center data center data center 1 or more data centers per AZ 2 or more AZs per region (new regions min 3) data center data center data center data center data center data center
  • 18. Let’s talk about Multi-AZ
  • 19. Multi-AZ architecture: Elastic Load Balancing (ELB) in front of Instances in Availability zones a, b and c of one Region, with a DB Instance and a DB instance standby
  • 20. Multi-AZ architecture: the same diagram with a failure (X)
  • 22. Multi-AZ architecture: the same diagram with a failure (X); the DB instance standby is now the new master
  • 23. Multi-AZ architecture
      • Enables fault-tolerant applications
      • AWS regional services designed to withstand AZ failures
      • Leveraged by AWS regional services such as Amazon S3, Amazon DynamoDB, Amazon Aurora, Amazon ELBs, etc.
  • 25. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Auto-Scaling FixedVariable
  • 26. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Availability zone 1 Auto Scaling group AWS Region Availability zone 2 Auto-scaling for self-healing Elastic Load Balancing (ELB) X
  • 27. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Let’s talk about decoupling and async
  • 28. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Process A Process B Process A Process B Synchronous Asynchronous Waiting Working Continues get or fetch resultGet result Decoupling with async pattern
  • 29. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. API: {DO foo} PUT JOB: {JobID: 0001, Task: DO foo} API: {JobID: 0001} GET JOB: {JobID: 0001, Task: DO foo} {JobID: 0001, Result: bar} Cache node Worker Instance Worker Instance Queue/Streaming API Instance API Instance API Instance
  • 30. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Push Notification User Worker Instance Worker Instance API Instance API Instance Cache node Fetch results API Instance Queue/Streaming
  • 31. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Degrade & prioritize traffic with queues Worker Instance Worker Instance API Instance API Instance API Instance HighPriorityQueue LowPriorityQueue
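The job flow on slides 29 to 31 can be sketched in-process. This is a toy illustration only: the standard-library `queue.Queue` stands in for the queue/streaming service, a plain dict stands in for the cache node, and the names `submit`, `work_once` and `fetch` are hypothetical.

```python
import itertools
import queue

high_priority = queue.Queue()   # stands in for HighPriorityQueue
low_priority = queue.Queue()    # stands in for LowPriorityQueue
results = {}                    # stands in for the cache node
_job_ids = itertools.count(1)


def submit(task, urgent=False):
    """API side: enqueue the job and return a job ID immediately, without waiting."""
    job_id = f"{next(_job_ids):04d}"
    target = high_priority if urgent else low_priority
    target.put((job_id, task))
    return job_id


def work_once():
    """Worker side: run one job, draining the high-priority queue first.

    Returns False when both queues are empty.
    """
    for q in (high_priority, low_priority):
        try:
            job_id, task = q.get_nowait()
        except queue.Empty:
            continue
        results[job_id] = task()
        return True
    return False


def fetch(job_id):
    """API side: return the result if the job has finished, else None."""
    return results.get(job_id)
```

Under overload, a worker that always serves the high-priority queue first degrades low-priority traffic instead of failing everything, which appears to be the point of slide 31.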
  • 32. Let’s talk about the failures in distributed systems
  • 33. Recommendation Engine Service Service Service Preserve at all cost Preventing failures
  • 34. Some of the most important things to think about Recommendation Engine Service Service Service Preserve at all cost
  • 35. Let’s talk about timeouts, backoff & retries!
  • 36. Users App DB Conn Pool INSERT INSERT INSERT INSERT What happens if the DB “slows down”? Timeout client side Timeout backend side ??
  • 37. User 1 App DB Conn Pool INSERT Timeout client side = 10s Timeout backend side = default = Infinite Retry INSERT Retry INSERT ERROR: Failed to get connection from pool Retry
  • 38. https://docs.microsoft.com/en-us/dotnet/api/system.net.httpwebrequest.timeout
  • 41. https://pypi.org/project/timeout-decorator/
      import requests
      import timeout_decorator

      # Abort the call if it has not returned after 5 seconds
      @timeout_decorator.timeout(5, timeout_exception=StopIteration)
      def timed_get(url):
          return requests.get(url)
  • 42. Set the timeouts!
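As a complement to the decorator above, here is a minimal standard-library sketch of a client-side timeout. It is not from the talk: `fetch_with_timeout` and `start_silent_server` are hypothetical helpers, and the "server" simply accepts connections without ever replying, to imitate a backend that has slowed down.

```python
import socket


def fetch_with_timeout(host, port, timeout_s):
    """Return the first bytes the server sends, or None if nothing arrives in time."""
    try:
        # The timeout applies to the connect and to every later recv on this socket.
        with socket.create_connection((host, port), timeout=timeout_s) as conn:
            return conn.recv(1024)
    except socket.timeout:
        return None


def start_silent_server():
    """Listen on a free loopback port and never answer (simulates a stalled backend)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    return srv


if __name__ == "__main__":
    server = start_silent_server()
    port = server.getsockname()[1]
    print(fetch_with_timeout("127.0.0.1", port, timeout_s=0.5))  # gives up after 0.5s
    server.close()
```

Without the `timeout` argument the `recv` call would block indefinitely, which is the "default = Infinite" situation on slide 37.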
  • 43. How else could we have prevented the error? User 1 DB Conn Pool INSERT Retry INSERT Retry INSERT Retry ERROR: Failed to get connection from pool
  • 44. Backoff: backing off between retries, releasing connections. User 1, DB Conn Pool; Timeout client side = 10s, Timeout backend side = 10s; INSERT, wait 2s before retry, wait 4s before retry, wait 8s before retry, wait 16s before retry
  • 45. Simple Exponential Backoff is not enough: Add Jitter (No jitter vs. With jitter) https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/
  • 46. Adding Jitter
  • 48. Example: add jitter 0-1000ms
      import random
      import time

      import requests

      MAX_TRIES = 12

      def get_item(self, url, n=1):
          try:
              return requests.get(url)
          except requests.exceptions.RequestException:
              # Give up once the retry budget is spent
              if n > MAX_TRIES:
                  return None
              n += 1
              # Exponential backoff plus 0-1000 ms of random jitter
              time.sleep((2 ** n) + (random.randint(0, 1000) / 1000.0))
              return self.get_item(url, n)
  • 49. https://pypi.org/project/backoff/
      @backoff.on_exception(backoff.expo, Exception, jitter=backoff.full_jitter, max_time=60)
      def poll_for_message(queue):
          return queue.get()
      As of version 1.2, the default jitter function backoff.full_jitter implements the ‘Full Jitter’ algorithm as defined in the AWS Architecture Blog’s Exponential Backoff And Jitter post.
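For readers who prefer not to add a dependency, the "Full Jitter" idea from the AWS Architecture Blog post (sleep a random time between zero and the capped exponential delay) can be sketched with the standard library alone. The names `full_jitter_delay` and `retry`, and the default `base` and `cap` values, are assumptions made for this illustration.

```python
import random
import time


def full_jitter_delay(attempt, base=1.0, cap=30.0):
    """Random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def retry(operation, max_tries=5, sleep=time.sleep):
    """Call `operation` until it succeeds or `max_tries` attempts have failed."""
    for attempt in range(max_tries):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_tries - 1:
                raise
            sleep(full_jitter_delay(attempt))
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. The randomness spreads out clients that failed at the same moment, so they do not all retry in lockstep.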
  • 50. Idempotent operation: No additional effect if it is called more than once with the same input parameters.
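Retries are only safe when the retried operation is idempotent. One common way to get there, shown here as a hedged sketch rather than anything prescribed by the talk, is to have the caller send a unique request ID and have the service remember which IDs it has already applied. `PaymentService` and its methods are hypothetical.

```python
class PaymentService:
    """Toy service whose deposit() is idempotent per request ID."""

    def __init__(self):
        self.balance = 0
        self._applied = {}  # request_id -> balance returned for that request

    def deposit(self, request_id, amount):
        # A retried request is recognized by its ID and not applied again.
        if request_id in self._applied:
            return self._applied[request_id]
        self.balance += amount
        self._applied[request_id] = self.balance
        return self.balance
```

A client that times out and retries `deposit("req-1", 10)` gets the same answer as the first call, and the balance changes only once.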
  • 51. Circuit Breaker (diagram: Producer, Circuit Breaker, Consumer; Connection Monitoring, Timeouts, Breaking Circuit)
      • Wrap a protected function call in a circuit breaker object, which monitors for failures.
      • If failures reach a certain threshold, the circuit breaker trips.
  • 52. https://github.com/Netflix/Hystrix
  • 53. https://spring.io/guides/gs/circuit-breaker/
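The two bullets on slide 51 translate into a small wrapper object. The sketch below is a simplified illustration and not how Hystrix or Spring implement it: the class name, the thresholds, and the single trial call after the cool-down are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Open: fail fast instead of piling more load on a broken dependency.
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # trip, or re-trip after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Failing fast while the circuit is open gives the struggling dependency room to recover and keeps the caller's threads and connections from being tied up in doomed requests.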
  • 54. Let’s talk about health checking!
  • 55. Probing for health: Auto Scaling groups running Service A and Service B in Availability zone 1 and Availability zone 2 of an AWS Region, plus database, Email and Cluster
  • 56. Shallow health check: the Instance is asked “Are you healthy?” and answers “yes”; its dependencies (Cache node, Email, database, Cluster) are not probed
  • 58. Deep health check: the Instance is asked “Are you healthy?”, asks each dependency (Cache node, Email, database, Cluster) the same question, gets “yes” from all four and answers “yes”
  • 59. Deep health check: one dependency answers “no” while the other three answer “yes”, so the Instance answers “no”
  • 60. Prioritize shallow health checks during hard times. Cache.
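The difference between the two checks can be sketched as follows. This is an illustration under assumptions: the dependency probes are plain callables returning True or False, and the terse "Cache." on slide 60 is read here as "cache the deep-check result for a short time", which is one interpretation rather than something the slides spell out.

```python
import time


class HealthChecker:
    def __init__(self, dependency_probes, cache_ttl_s=5.0, clock=time.monotonic):
        self._probes = dependency_probes  # name -> callable returning bool
        self._ttl = cache_ttl_s
        self._clock = clock
        self._cached = None               # (timestamp, result) of the last deep check

    def shallow(self):
        """Answers only 'is this process up?' and touches no dependency."""
        return True

    def deep(self):
        """Healthy only if every dependency is healthy; result cached for cache_ttl_s."""
        now = self._clock()
        if self._cached is not None and now - self._cached[0] < self._ttl:
            return self._cached[1]
        result = all(probe() for probe in self._probes.values())
        self._cached = (now, result)
        return result
```

A load balancer polling `deep()` every second would otherwise multiply traffic to the database and cache exactly when they are struggling; the shallow check and the cache both limit that.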
  • 61. Let’s talk about load shedding.
  • 64. Cheaply reject excess work
  • 66. Be careful when selecting the right metric
  • 67. Load Shedding
      • Don’t be overly optimistic and take on more than you can handle.
      • Find an operational metric to reject what you cannot take in.
      • Favor cached and static content.
      • Prioritize ELB health check (shallow) pings.
      • In an overload situation you have precious resources; do not let any of them go to waste.
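A minimal sketch of "cheaply reject excess work", assuming the chosen operational metric is the number of requests currently in flight. The metric, the limit, and the names `LoadShedder` and `handle` are illustrative choices, not recommendations from the talk.

```python
import threading


class LoadShedder:
    """Admits at most `max_in_flight` concurrent requests."""

    def __init__(self, max_in_flight):
        self._max = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            if self._in_flight >= self._max:
                return False
            self._in_flight += 1
            return True

    def release(self):
        with self._lock:
            self._in_flight -= 1


def handle(shedder, request_handler):
    """Run the handler if there is capacity, otherwise reject immediately."""
    if not shedder.try_acquire():
        # Rejecting costs almost nothing compared with doing the work and timing out.
        return 503, "overloaded, retry later"
    try:
        return 200, request_handler()
    finally:
        shedder.release()
```

The rejection path does no I/O and takes a single lock, so the capacity that remains goes to requests that can still finish in time.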
  • 68. Service Degradation & Fallbacks
  • 69. https://twitter.com/redditstatus/status/1116204502703493120
  • 70. Let’s talk about shuffle sharding.
  • 73. Measure for this: blast radius
  • 74. Blast radius
      • How many customers?
      • What functionality?
      • How many locations?
  • 76. Cell-based architecture
  • 77. Shuffle sharding
  • 78. Shuffle sharding: Nodes = 8, Shard size = 2, Combinations = 28
      Overlap | % customers
      0       | 53.6%
      1       | 42.8%
      2       | 3.6%
  • 79. Shuffle sharding: Nodes = 100, Shard size = 5, Combinations = 75 million!
      Overlap | % customers
      0       | 77%
      1       | 21%
      2       | 1.8%
      3       | 0.06%
      4       | 0.0006%
      5       | 0.0000013%
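The combination counts and overlap percentages on slides 78 and 79 follow from counting. A short sketch, assuming every customer is assigned a uniformly random shard of distinct nodes; `overlap_distribution` is a name introduced here for illustration.

```python
from math import comb


def overlap_distribution(nodes, shard_size):
    """Fraction of possible shards sharing exactly k nodes with one given shard."""
    total = comb(nodes, shard_size)
    return {
        k: comb(shard_size, k) * comb(nodes - shard_size, shard_size - k) / total
        for k in range(shard_size + 1)
    }


if __name__ == "__main__":
    for nodes, shard_size in ((8, 2), (100, 5)):
        print(f"nodes={nodes} shard_size={shard_size} "
              f"combinations={comb(nodes, shard_size):,}")
        for overlap, share in overlap_distribution(nodes, shard_size).items():
            print(f"  overlap {overlap}: {share:.7%}")
```

The last row is the one that matters for blast radius: only customers whose shard overlaps completely with a poisoned shard lose all of their nodes.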
  • 80. Shuffle sharding
  • 81. Let’s talk about chaos!
  • 82. Fire Drills
  • 83. GameDay at Amazon Creating Resiliency Through Destruction https://www.youtube.com/watch?v=zoz0ZjfrQ9s
  • 84. Chaos engineering https://github.com/Netflix/SimianArmy
  • 85. “Chaos Engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production.” http://principlesofchaos.org
  • 86. Failure injection
      • Start small & build confidence
      • Application level
      • Host failure
      • Resource attacks (CPU, memory, …)
      • Network attacks (dependencies, latency, …)
      • Region attacks
      • “Paul” attack
      https://www.gremlin.com https://github.com/Netflix/SimianArmy https://chaostoolkit.org
  • 87. Bananas for Monkeys
  • 88. How to DDoS yourself: ~ wrk -t12 -c400 -d30s http://127.0.0.1/api/health
  • 89. Adding delay to the network: ~ tc qdisc add dev eth0 root netem delay 200ms
  • 90. https://github.com/Netflix/SimianArmy Set of scheduled agents:
      • shuts down services randomly
      • slows down performance
      • checks conformity
      • breaks an entire region
      • integrates with Spinnaker (CI/CD)
  • 91. Let’s talk about operational resiliency
  • 92. Operational resilience: Value realized by example
      1. Scaled to handle a 400% increase in page views (Kurt Geiger)
      2. Improved security posture (CapitalOne)
      3. 8600 transactions/second (McDonalds)
      4. Transfer of over 750 TB of data from pipeline inspection machinery (GE)
      5. Processing over 75 billion market events daily (FINRA)
      6. Critical applications run in multiple AZs, x-Regions for robust disaster recovery (Expedia)
      7. Supports over 300,000 requests per minute to its API (Easy Taxi)
      8. 60% reduced downtime (Trainline)
      9. Migration of SAP on Oracle to AWS with zero unplanned downtime across five countries (Kellogg’s)
      10. SAP availability boosted to 100% (MacMillan)
  • 93. Operational Resilience
      • What is it? Benefit of improving SLAs and reducing unplanned outages
      • Example: Critical workloads run in Multiple AZs and Regions for robust DR (Expedia)
  • 94. The cost of downtime
      • Annual Fortune 1000 application downtime costs (IDC): $1.25 to $2.5B
      • Average cost of a data breach (Ponemon Institute): $3.6M
      • Cost/hr of a critical application failure (IDC): $500K to $1M
      • Average cost/hr of downtime (Ponemon Institute): $474K
      • Average cost per lost or stolen record (Ponemon Institute): $141
  • 95. Operational resilience: Quantifying cost
      Cost Category | % of Total | Definition
      Third Parties | 1.3% | The cost of contractors, consultants, auditors and other specialists engaged to help resolve unplanned outages.
      Equipment | 1.3% | The cost of new equipment purchases and repairs, including refurbishment.
      Ex-post Activities | 1.1% | All after-the-fact incidental costs associated with business disruption and recovery.
      Recovery | 2.9% | Activities and associated costs that relate to bringing the organization’s networks and core systems back to a state of readiness.
      Detection | 3.6% | Activities associated with the initial discovery and subsequent investigation of the partial or complete outage incident.
      IT Productivity | 8.4% | The lost time and related expenses associated with IT personnel downtime.
      End-user Productivity | 18.7% | The lost time and related expenses associated with end-user downtime.
      Lost Revenue | 28.2% | The total revenue loss from customers and potential customers because of their inability to access core systems during the outage period.
      Business disruption | 34.6% | Additional economic loss of the outage, including reputational damages, customer churn and lost business opportunities.
      TOTAL | 100.0% |
  • 96. Operational resilience: Case studies
      • Migrated to AWS in 6 weeks with no downtime and improved availability to 99.99%+
      • Migrated all workloads to AWS to reduce downtime by 60% with an annual savings of £1.2M
      • Rebuilt patient engagement portal on AWS and reduced downtime from 120 to <5 min / month
      • Using AWS, Travelstart has seized opportunities in emerging markets and has cut operational costs by 43% and downtime by 25%
      • With its on-premises setup, the availability of its system ran to 98%, but on its cloud infrastructure, this has risen to 99.965%
      • Three 9’s to five 9’s
      • “We no longer need to worry about data center, server, or hypervisor security…which allows us to focus our attention on securing our applications.”
  • 97. And before we go.
  • 98. DON’T blame people for failure…
  • 99. “Quality is not an act, it is a habit” Aristotle, some time around 350BC
  • 100. https://aws.amazon.com/wellarchitected
  • 101. https://medium.com/@adhorn
  • 102. Thank you! { "name": "Sébastien Stormacq", "role": "Technical Evangelist", "company": "Amazon Web Services", "twitter": "@sebsto", "github": "sebsto" }