© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Applying Principles of Chaos
Engineering to Serverless
Yan Cui
Principal Engineer
DAZN
DVC305
Agenda
What is chaos engineering?
New challenges with serverless
Applying latency injection to serverless
Applying error injection to serverless
After the talk
Slides will be shared on Slideshare
Recording will be posted on YouTube within 48 hours
Find the links on https://theburningmonk.com/reinvent2018
What is chaos engineering?
Chaos engineering is the discipline of experimenting on a distributed system
in order to build confidence in the system’s capability
to withstand turbulent conditions in production.
- principlesofchaos.org
Smallpox
Earliest evidence of disease in third century BC Egyptian mummy
Estimated 400K deaths per year in eighteenth century Europe
History of vaccination
First vaccine was developed in
1798 by Edward Jenner
https://en.wikipedia.org/wiki/Edward_Jenner
WHO certified global eradication
in 1980
https://en.wikipedia.org/wiki/Vaccine
Vaccination is the most effective method to prevent infectious diseases
Vaccines stimulate the immune system to recognize and destroy the
disease before contracting it for real
Chaos engineering
Use controlled experiments to inject failures into our system
Helps us learn about our system’s behavior and uncover unknown failure
modes, before they spread like wildfire in production
Lets us build confidence in its ability to withstand turbulent conditions
Chaos engineering is the vaccine to frailties in modern software
Who am I?
Principal engineer at DAZN
AWS Serverless hero
Author of Production-Ready Serverless* course by Manning.
Blogger**, speaker.
* https://bit.ly/production-ready-serverless
** https://theburningmonk.com
About DAZN
Available in seven countries: Austria, Switzerland, Germany,
Japan, Canada, Italy, and the USA
Available on 30+ platforms
Around 1,000,000 concurrent viewers at peak
Chaos engineering has an image problem
Too much emphasis is on breaking things
Easy to conflate the action of injecting failures with the payback
The goal is to learn about the system and build confidence
The goal is not to break things
Chaos in practice
Four steps to start running chaos
experiments yourself
STEP 1. Define “steady state”
What does a normal, working
condition look like?
this is not a
steady state
STEP 2. Hypothesize steady state will
continue in both control group
& the experiment group
In other words, you should have a reasonable degree of
confidence the system would handle the failure before you
proceed with the experiment
Chaos in practice
Explore unknown unknowns away from production
Experiments that graduate to production should be carefully
considered and planned
You should have reasonable confidence in the system before
running experiments in production
Treat production with the care it deserves
The goal is not to break things
If you knew the system would break and you did it anyway,
then it’s not a chaos experiment!
It’s called being irresponsible.
STEP 3. Inject realistic failures
For example, server crash, network
error, hard drive malfunction, and more
Chaos in practice
Netflix’s Simian Army:
https://github.com/Netflix/SimianArmy
Chaos Engineering ebook (O’Reilly): http://oreil.ly/2tZU1Sn
STEP 4. Disprove hypothesis
In other words, look for a difference
in steady state
Chaos in practice
Look for evidence that steady state was impacted by the
injected failure
Address weaknesses before failures happen for real
Containment
Experiments need to be controlled
The goal is not to break things
Ensure everyone knows what you are doing
Don’t surprise your teammates
Run experiments during office hours
Avoid important dates
Make the smallest change necessary to prove or disprove the hypothesis
Have a rollback plan
Stop the experiment right away if things start to go wrong
Don’t start in production
You can learn a lot by running experiments in staging
by Russ Miles @russmiles
source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
New challenges with serverless
chaos monkey kills an
Amazon Elastic Compute Cloud
(Amazon EC2) instance
latency monkey induces
artificial delay in APIs
chaos gorilla kills an AWS
Availability Zone
chaos kong kills an entire
AWS region
Serverless challenges
There are no servers that you can access and kill
There is more inherent chaos and complexity in a
serverless architecture.
Serverless challenges
Smaller units of deployment, but a lot more of them
[Figure: serverful vs. serverless]
Every function needs to be correctly configured and secured
[Figure: managed services in a serverless architecture: Kinesis, SNS, CloudWatch Events, CloudWatch Logs, IoT Core, DynamoDB, S3, SES]
A lot of managed, intermediate services
Each with its own set of failure modes
Unknown failure modes in the infrastructure we don’t control
Often there’s little we can do when an outage occurs in the platform
Common weaknesses
Improperly tuned timeouts
Missing error handling
Missing fallback
Missing regional failover
Latency injection with serverless
STEP 1. Define “steady state”
What does a normal, working
condition look like?
Defining steady state
What metrics do you use?
p95/p99 latencies, error count, backlog size, yield*, harvest**
* percentage of requests completed
** completeness of the returned response
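As a rough illustration of how these metrics could be computed from raw request records, here is a minimal Python sketch. The RequestRecord shape and the steady_state helper are hypothetical and not from the talk. Harvest is approximated here as the share of expected response fields that were actually returned, and the input is assumed to be non-empty.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Hypothetical per-request sample; the field names are assumptions."""
    latency_ms: float      # observed end-to-end latency
    completed: bool        # False if the request failed or timed out
    fields_returned: int   # parts of the response actually served
    fields_expected: int   # parts a complete response would contain


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def steady_state(records):
    """Summarize a non-empty list of RequestRecord into steady-state metrics."""
    completed = [r for r in records if r.completed]
    expected = sum(r.fields_expected for r in completed)
    returned = sum(r.fields_returned for r in completed)
    latencies = [r.latency_ms for r in records]
    return {
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
        "error_count": len(records) - len(completed),
        # yield: percentage of requests completed (as a fraction)
        "yield": len(completed) / len(records),
        # harvest: completeness of the returned responses (as a fraction)
        "harvest": returned / expected if expected else 0.0,
    }
```

Computing the same summary for the control group and the experiment group gives two dictionaries that can be compared directly when looking for a difference in steady state.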
STEP 2. Hypothesize steady state will
continue in both control group
& the experiment group
In other words, you should have a reasonable degree of
confidence the system would handle the failure before you
proceed with the experiment
Serverless considerations
[Figure: API Gateway]
Consider the effect of cold starts
How do they affect your strategy
for handling slow responses?
Request timeouts
Strategy should:
1. Give requests the best chance to succeed
2. Not allow slow responses to time out the calling function
Finding the right timeout value is tricky
Too short: requests not given the best chance to succeed
Too long: risk timing out the calling function
Even more complicated when you have multiple integration points
Approach 1: Split invocation time equally
(for example, 3 requests, 6s function timeout = 2s timeout per request)
Approach 2: Every request is given nearly all the invocation time
(for example, 3 requests, 6s function timeout = 5s timeout per request)
Proposal: set request timeouts dynamically based on
invocation time left
Set timeout based on remaining invocation time
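The original code screenshots for this slide did not survive extraction. A minimal Python sketch of the idea follows, assuming a Lambda handler whose context object exposes get_remaining_time_in_millis(), as the Python runtime's context does. The reserve_ms and floor_ms values, the FakeContext stand-in, and the http_get parameter are assumptions for illustration only.

```python
def request_timeout_ms(remaining_ms, reserve_ms=500, floor_ms=100):
    """Timeout for the next outbound request, given the invocation time left.

    reserve_ms is held back so the function still has time to run its
    recovery steps; floor_ms avoids handing out a zero or negative timeout.
    Both defaults are illustrative assumptions.
    """
    return max(floor_ms, remaining_ms - reserve_ms)


class FakeContext:
    """Stand-in for the Lambda context object, for running the sketch locally."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms


def handler(event, context, http_get):
    # http_get is a hypothetical callable (url, timeout=seconds) supplied by
    # the caller, so the sketch does not depend on a specific HTTP library.
    timeout_ms = request_timeout_ms(context.get_remaining_time_in_millis())
    return http_get(event["url"], timeout=timeout_ms / 1000)
```

With a 6s function timeout and nothing spent yet, the first request would get 5.5s under these assumed defaults, and each later request gets whatever is left at the time it is made.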
Recovery steps
Log the timeout with as much context as possible
The API, timeout value, correlation IDs, request object, and more
Record custom metrics
Use fallbacks
Be mindful when you sacrifice precision for availability
User experience is king
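The recovery steps above could be combined roughly as follows. This is a sketch under stated assumptions, not the speaker's implementation: call_with_fallback, the shape of the log entry, and the plain dict used as a metrics counter are all hypothetical, and a real system would publish the metric to its monitoring service instead.

```python
import json
import logging

logger = logging.getLogger("timeouts")


def call_with_fallback(operation, fallback, *, api, timeout_ms,
                       correlation_id, request, metrics):
    """Run operation(timeout_ms); on timeout, log, count, and fall back.

    operation and fallback are caller-supplied callables. metrics is a
    plain dict used as a counter (an assumption for this sketch).
    """
    try:
        return operation(timeout_ms)
    except TimeoutError:
        # Log the timeout with as much context as possible.
        logger.warning(json.dumps({
            "message": "request timed out",
            "api": api,
            "timeout_ms": timeout_ms,
            "correlation_id": correlation_id,
            "request": request,
        }, default=str))
        # Record a custom metric for the timeout.
        key = f"{api}.timeout"
        metrics[key] = metrics.get(key, 0) + 1
        # Serve a fallback, which may be less precise than the real response.
        return fallback()
```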
STEP 3. Inject realistic failures
For example, server crash, network
error, hard drive malfunction, and more
Where to inject latency?
Hypothesis:
Function has appropriate timeout on its HTTP communications
and can degrade gracefully when these requests time out
Where to inject latency?
Should be applied to third-party services too
DynamoDB, Twilio, Auth0 …
Be mindful of the blast radius of the experiment
The goal is not to break things
[Figure: public-api-a and public-api-b each call internal-api through an HTTP client]
Hypothesis:
All functions have appropriate timeout on their HTTP
communications to this internal API and can degrade
gracefully when requests time out
Where to inject latency?
Large blast radius, can cause cascading failures unintentionally
Priming (psychology):
Priming is a technique whereby exposure to one stimulus
influences a response to a subsequent stimulus, without
conscious guidance or intention
It is a technique in psychology used to train a person's
memory both in positive and negative ways
Use failure injection to program your colleagues into
thinking about failure modes early.
Where to inject latency?
Make X% of all requests slow
in the dev environment
Hypothesis:
The client app has an appropriate timeout on its HTTP
communication with the server and can degrade gracefully
when requests time out
STEP 4. Disprove hypothesis
In other words, look for a difference
in steady state
How to inject latency?
Static weavers (such as PostSharp, AspectJ)
Dynamic proxies
https://theburningmonk.com/2015/04/design-for-latency-issues/
Manually crafted wrapper libraries
Configured in SSM Parameter Store
No injected latency
With injected latency
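The code shown on these slides was lost in extraction. One way a hand-written latency injector might look is sketched below in Python. The config keys (enabled, probability, min_delay_ms, max_delay_ms) are hypothetical, and the config is passed in as a plain dict; loading it from SSM Parameter Store is left out of the sketch.

```python
import random
import time


def inject_latency(config, sleep=time.sleep, rng=random.random):
    """Possibly delay the caller; returns the injected delay in milliseconds.

    config is a dict with hypothetical keys:
      enabled       turn injection on or off
      probability   fraction of calls to slow down (0.0 to 1.0)
      min_delay_ms  lower bound of the injected delay
      max_delay_ms  upper bound of the injected delay
    sleep and rng are injectable so the behaviour can be checked without
    actually waiting.
    """
    if not config.get("enabled", False):
        return 0.0
    if rng() >= config.get("probability", 1.0):
        return 0.0
    low, high = config["min_delay_ms"], config["max_delay_ms"]
    delay_ms = low + rng() * (high - low)
    sleep(delay_ms / 1000)
    return delay_ms
```

Calling inject_latency(config) just before an outbound request makes roughly the configured fraction of requests slow, and flipping enabled to False stops the experiment without a redeploy as long as the config is re-read at runtime.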
Factory wrapper function
(think bluebird’s promisifyAll function)
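A factory wrapper in this spirit could be sketched in Python as below. The with_injected_latency name and its parameters are hypothetical: get_config is any callable that returns the current injection config, and inject is any callable that applies it (for example, one that sleeps). Every public method of the wrapped client then runs the injector before delegating.

```python
import functools


def with_injected_latency(client, get_config, inject):
    """Return a proxy that calls inject(get_config()) before each public method.

    client      the object to wrap, e.g. an HTTP or database client
    get_config  hypothetical callable returning the current injection config
    inject      hypothetical callable that applies the config
    """

    class Proxy:
        def __getattr__(self, name):
            attr = getattr(client, name)
            # Pass through private names and plain attributes untouched.
            if name.startswith("_") or not callable(attr):
                return attr

            @functools.wraps(attr)
            def wrapped(*args, **kwargs):
                inject(get_config())
                return attr(*args, **kwargs)

            return wrapped

    return Proxy()
```

Because the proxy reads the config on every call, the experiment can be switched on or off while the function is running.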
Error injection with serverless
Common errors
HTTP 5XX
Amazon DynamoDB provisioned throughput exceeded
Throttled AWS Lambda invocations
Hypothesis:
Function has appropriate error handling on its HTTP communications
and can degrade gracefully when downstream dependencies fail
Where to inject errors?
Hypothesis:
Function has appropriate error handling on DynamoDB operations and
can degrade gracefully when DynamoDB throughputs are exceeded
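To test a hypothesis like this one, errors can be injected at the application level in the same way as latency. The Python sketch below is illustrative only: ThroughputExceeded is a local stand-in for DynamoDB's ProvisionedThroughputExceededException, table_get is a hypothetical callable wrapping the real read, and the config keys are assumptions.

```python
import random


class ThroughputExceeded(Exception):
    """Local stand-in for DynamoDB's ProvisionedThroughputExceededException."""


def maybe_fail(config, make_error, rng=random.random):
    """Raise make_error() for the configured fraction of calls."""
    if config.get("enabled", False) and rng() < config.get("probability", 0.0):
        raise make_error()


def get_item_with_fallback(table_get, key, config, default, rng=random.random):
    """Read an item, degrading to default when throughput is exceeded.

    table_get is a hypothetical callable (key) -> item. The injected error
    takes the same path as a real one, so the fallback is what gets tested.
    """
    try:
        maybe_fail(config, lambda: ThroughputExceeded("injected"), rng)
        return table_get(key)
    except ThroughputExceeded:
        return default
```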
Induce Lambda throttling by temporarily setting reserved concurrency
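A minimal sketch of doing this with a built-in rollback is shown below. It assumes lambda_client is a boto3 Lambda client (boto3.client("lambda")), whose put_function_concurrency and delete_function_concurrency calls are used here; the throttled helper itself is hypothetical. It also assumes the function had no reserved concurrency before the experiment, since the rollback removes the setting entirely.

```python
from contextlib import contextmanager


@contextmanager
def throttled(lambda_client, function_name, reserved=0):
    """Temporarily cap a function's reserved concurrency, then remove the cap.

    lambda_client is expected to be a boto3 Lambda client. reserved=0
    throttles every invocation. Assumption: the function had no reserved
    concurrency configured beforehand.
    """
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved,
    )
    try:
        yield
    finally:
        # Rollback runs even if the experiment body raises.
        lambda_client.delete_function_concurrency(FunctionName=function_name)
```

Wrapping the experiment in a with throttled(...) block keeps the rollback plan next to the change itself, in line with the containment advice earlier in the talk.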
Recap
Failures are INEVITABLE
The only way to truly know your system’s
resilience against failures is to test it
through CONTROLLED experiments
The goal of chaos engineering is NOT to
actually break production
CONTAINMENT should be front and
centre of your thinking
STEP 1. Define “steady state”
What does a normal, working
condition look like?
STEP 2. Hypothesize steady state will
continue in both control group
& the experiment group
In other words, you should have a reasonable degree of
confidence the system would handle the failure before you
proceed with the experiment
STEP 3. Inject realistic failures
For example, server crash, network
error, hard drive malfunction, and more
STEP 4. Disprove hypothesis
In other words, look for a difference
in steady state
There is more inherent chaos and
complexity in a serverless application
Even without servers, you can still inject
CONTROLLED failures at the application level
Thank you!
Yan Cui
@theburningmonk
https://theburningmonk.com
Related breakouts
Wednesday, Nov 28
SRV425-R - Best Practices for Building Multi-Region, Active-Active Serverless Applications
4:00PM – 5:00PM | Venetian, Level 4, Lando 4305
Wednesday, Nov 28
SRV343-R - Best Practices for Safe Deployments on AWS Lambda and Amazon API Gateway
4:45PM – 5:45PM | MGM, Level 1, South Concourse 105
Thursday, Nov 29
ARC308 - Chaos Engineering and Scalability at Audible.com
1:00PM – 2:00PM | Aria West, Level 3, Ironwood 5
Please complete the session
survey in the mobile app.
Announcing AWS RoboMaker: A New Cloud Robotics Service (ROB201-R) - AWS re:In...
 
The Theory and Math Behind Data Privacy and Security Assurance (SEC301) - AWS...
The Theory and Math Behind Data Privacy and Security Assurance (SEC301) - AWS...The Theory and Math Behind Data Privacy and Security Assurance (SEC301) - AWS...
The Theory and Math Behind Data Privacy and Security Assurance (SEC301) - AWS...
 
Amazon GuardDuty Threat Detection and Remediation
Amazon GuardDuty Threat Detection and RemediationAmazon GuardDuty Threat Detection and Remediation
Amazon GuardDuty Threat Detection and Remediation
 
使用 AWS Step Functions 靈活調度 AWS Lambda (Level:200)
使用 AWS Step Functions 靈活調度 AWS Lambda (Level:200)使用 AWS Step Functions 靈活調度 AWS Lambda (Level:200)
使用 AWS Step Functions 靈活調度 AWS Lambda (Level:200)
 
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
Operational Excellence with Containerized Workloads Using AWS Fargate (CON320...
 

Applying Principles of Chaos Engineering to Serverless (DVC305) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Applying Principles of Chaos Engineering to Serverless. Yan Cui, Principal Engineer, DAZN. DVC305
  • 3. Agenda: What is chaos engineering? New challenges with serverless. Applying latency injection to serverless. Applying error injection to serverless.
  • 4. After the talk: Slides will be shared on SlideShare. Recording will be posted on YouTube within 48 hours. Find the links on https://theburningmonk.com/reinvent2018
  • 5. What is chaos engineering?
  • 6. "Chaos engineering is the discipline of experimenting on a distributed system in order to build confidence in the system’s capability to withstand turbulent conditions in production." - principlesofchaos.org
  • 7. Smallpox: Earliest evidence of the disease is in a third-century BC Egyptian mummy. Estimated 400K deaths per year in eighteenth-century Europe.
  • 8. History of vaccination: The first vaccine was developed in 1798 by Edward Jenner. https://en.wikipedia.org/wiki/Edward_Jenner
  • 9. History of vaccination: WHO certified global eradication of smallpox in 1980. https://en.wikipedia.org/wiki/Edward_Jenner
  • 10. History of vaccination. https://en.wikipedia.org/wiki/Vaccine
  • 11. History of vaccination: Vaccination is the most effective method to prevent infectious diseases.
  • 12. History of vaccination: Vaccines stimulate the immune system to recognize and destroy the disease before the body contracts it for real.
  • 13. Chaos engineering: Use controlled experiments to inject failures into our system.
  • 14. Chaos engineering: Helps us learn about our system’s behavior and uncover unknown failure modes before they manifest like wildfire in production.
  • 15. Chaos engineering: Lets us build confidence in the system’s ability to withstand turbulent conditions.
  • 16. Chaos engineering is the vaccine to frailties in modern software.
  • 17. Who am I? Principal engineer at DAZN. AWS Serverless Hero. Author of the Production-Ready Serverless* course by Manning. Blogger**, speaker. * https://bit.ly/production-ready-serverless ** https://theburningmonk.com
  • 18.
  • 19.
  • 20. About DAZN: Available in seven countries (Austria, Switzerland, Germany, Japan, Canada, Italy, and USA). Available on 30+ platforms.
  • 21. About DAZN: Around 1,000,000 concurrent viewers at peak.
  • 22. Chaos engineering has an image problem
  • 23. Chaos engineering has an image problem: Too much emphasis is on breaking things.
  • 24. Chaos engineering has an image problem: It is easy to conflate the action of injecting failures with the payback.
  • 25. Chaos engineering has an image problem: The goal is to learn about the system and build confidence.
  • 26. Chaos engineering has an image problem: The goal is not to break things.
  • 27. Chaos in practice: Four steps to start running chaos experiments yourself.
  • 28. STEP 1. Define “steady state”: What does normal, working condition look like?
  • 29. This is not a steady state
  • 30. STEP 2. Hypothesize that steady state will continue in both the control group and the experiment group. In other words, you should have a reasonable degree of confidence that the system would handle the failure before you proceed with the experiment.
  • 31. Chaos in practice: Explore unknown unknowns away from production.
  • 32. Chaos in practice: Experiments that graduate to production should be carefully considered and planned. You should have reasonable confidence in the system before running experiments in production.
  • 33. Chaos in practice: Treat production with the care it deserves. The goal is not to break things.
  • 34. Chaos in practice: If you knew the system would break and you did it anyway, then it’s not a chaos experiment! It’s called being irresponsible.
  • 35. STEP 3. Inject realistic failures: For example, server crash, network error, hard drive malfunction, and more.
  • 36. Chaos in practice: Netflix’s Simian Army: https://github.com/Netflix/SimianArmy Chaos Engineering ebook (O’Reilly): http://oreil.ly/2tZU1Sn
  • 37. STEP 4. Disprove hypothesis: In other words, look for a difference in steady state.
  • 38. Chaos in practice: Look for evidence that steady state was impacted by the injected failure.
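One way to look for that evidence is to compare the same steady-state metrics between the control group and the experiment group. The sketch below is only an illustration: the function name, the metric names, and the tolerance values are placeholders, not something taken from the talk.

```python
def steady_state_violations(control, experiment, tolerances):
    """Return the metrics whose experiment value drifts from the control
    value by more than the allowed tolerance (hypothetical helper)."""
    violations = {}
    for metric, allowed in tolerances.items():
        drift = abs(experiment[metric] - control[metric])
        if drift > allowed:
            violations[metric] = drift
    return violations


# Illustrative numbers only.
control = {"p99_ms": 250, "error_count": 3}
experiment = {"p99_ms": 900, "error_count": 4}
tolerances = {"p99_ms": 100, "error_count": 5}
violations = steady_state_violations(control, experiment, tolerances)
```

An empty result means the hypothesis survived this run; any entry points at a weakness worth investigating.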
  • 39. Chaos in practice: Address weaknesses before failures happen for real.
  • 40. Containment: Experiments need to be controlled. The goal is not to break things.
  • 41. Containment: Ensure everyone knows what you are doing. Don’t surprise your teammates.
  • 42. Containment: Run experiments during office hours.
  • 43. Containment: Avoid important dates.
  • 44. Containment: Make the smallest change necessary to prove or disprove the hypothesis.
  • 45. Containment: Have a rollback plan. Stop the experiment right away if things start to go wrong.
  • 46. Containment: Don’t start in production. You can learn a lot by running experiments in staging.
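The "have a rollback plan" and "stop right away" points above could be encoded in the experiment runner itself. The following is a minimal sketch under assumptions: `inject`, `rollback`, and `is_healthy` are hypothetical callables that you would supply, and the polling loop is deliberately simplistic.

```python
import time


def run_experiment(inject, rollback, is_healthy, checks=10, interval_s=0.0):
    """Inject a failure, poll a health check, and always roll back.

    Returns True if every health check passed, False if the experiment
    was stopped early because a check failed.
    """
    inject()
    try:
        for _ in range(checks):
            if not is_healthy():
                return False  # stop the experiment right away
            time.sleep(interval_s)
        return True
    finally:
        rollback()  # undo the injected failure, even if a check raised
```

Putting the rollback in `finally` means the injected failure is removed whether the run completes, is stopped early, or the health check itself throws.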
  • 47. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. by Russ Miles @russmiles source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
  • 48. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. New challenges with serverless
  • 49. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. chaos monkey kills an Amazon Elastic Cloud (Amazon EC2) instance latency monkey induces artificial delay in APIs chaos gorilla kills an AWS Availability Zone chaos kong kills an entire AWS region
  • 50. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 51. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless challenges There are no servers that you can access and kill
  • 52. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 53. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  • 54. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. There is more inherent chaos and complexity in a serverless architecture.
  • 55. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless challenges Smaller units of deployment, but a lot more of them
  • 56. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. serverful serverlessServerless challenges
  • 57. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless challenges Every function needs to be correctly configured and secured
  • 58. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Kinesis ? SNS CloudWatch Events CloudWatch LogsIoT Core DynamoDB S3 SES Serverless challenges
  • 59. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless challenges A lot of managed, intermediate services Each with its own set of failure modes
  • 60. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless challenges Unknown failure modes in the infrastructure we don’t control
  • 61. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Serverless challenges Often there’s little we can do when an outage occurs in the platform
  • 62. Common weaknesses
  • 63. Common weaknesses: Improperly tuned timeouts
  • 64. Common weaknesses: Missing error handling
  • 65. Common weaknesses: Missing fallback
  • 66. Common weaknesses: Missing regional failover
  • 67. Latency injection with serverless
  • 68. STEP 1. Define “steady state”: What does normal, working condition look like?
  • 69. Defining steady state: What metrics do you use?
  • 70. Defining steady state: p95/p99 latencies, error count, backlog size, yield*, harvest** (* percentage of requests completed; ** completeness of the returned response)
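The steady-state metrics listed above can be computed from raw request records. The following is a minimal sketch, not the speaker's code: the record fields (`latency_ms`, `ok`, `fields_returned`, `fields_expected`) are hypothetical, and harvest is approximated as the share of expected response fields that were actually returned.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def steady_state(requests):
    """Summarise request records into the metrics named on the slide.

    Each record is a dict with hypothetical keys:
    latency_ms, ok (bool), fields_returned, fields_expected.
    """
    if not requests:
        raise ValueError("requests must not be empty")
    latencies = [r["latency_ms"] for r in requests]
    completed = [r for r in requests if r["ok"]]
    # Yield: percentage of requests completed.
    yield_pct = 100 * len(completed) / len(requests)
    # Harvest: completeness of the responses that were returned.
    if completed:
        harvest_pct = 100 * sum(
            r["fields_returned"] / r["fields_expected"] for r in completed
        ) / len(completed)
    else:
        harvest_pct = 0.0
    return {
        "p95_latency_ms": percentile(latencies, 95),
        "p99_latency_ms": percentile(latencies, 99),
        "error_count": len(requests) - len(completed),
        "yield_pct": yield_pct,
        "harvest_pct": harvest_pct,
    }
```

Backlog size is left out because it depends on the queue or stream in use rather than on per-request records.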
  • 71. STEP 2. Hypothesize steady state will continue in both the control group & the experiment group. In other words, you should have a reasonable degree of confidence the system would handle the failure before you proceed with the experiment
  • 72. Serverless considerations (diagram label: API Gateway)
  • 73. Serverless considerations: Consider the effect of cold starts. How do they affect your strategy for handling slow responses?
  • 76. Request timeouts: Strategy should: 1. Give requests the best chance to succeed 2. Not allow slow responses to time out the calling function
  • 77. Request timeouts: Finding the right timeout value is tricky
  • 78. Request timeouts: Too short, and requests are not given the best chance to succeed
  • 79. Request timeouts: Too long, and you risk timing out the calling function
  • 80. Request timeouts: Even more complicated when you have multiple integration points
  • 81. Approach 1: Split invocation time equally (for example, 3 requests, 6s function timeout = 2s timeout per request)
  • 82. Approach 2: Every request is given nearly all the invocation time (for example, 3 requests, 6s function timeout = 5s timeout per request)
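The arithmetic behind the two static approaches can be made concrete. This is an illustrative sketch only: the function names are hypothetical, the requests are assumed to run sequentially, and the 1s reserve is inferred from the slide's "6s function timeout = 5s timeout per request" example.

```python
def equal_split_timeout(function_timeout_s, request_count):
    """Approach 1: divide the invocation time equally between requests."""
    return function_timeout_s / request_count


def near_full_timeout(function_timeout_s, reserve_s=1.0):
    """Approach 2: give every request nearly all of the invocation time.

    reserve_s is an assumed safety margin, not a value from the talk.
    """
    return max(function_timeout_s - reserve_s, 0.0)


def worst_case_total(per_request_timeout_s, request_count):
    """Total time spent if every sequential request runs to its timeout."""
    return per_request_timeout_s * request_count


# Approach 1: 2s each, 6s worst case. The function itself never times out,
# but a single slow request is cut off at 2s even when the others were fast.
approach_1 = equal_split_timeout(6, 3)

# Approach 2: 5s each, 15s worst case. A slow request gets a fair chance,
# but two slow requests in a row exceed the 6s function timeout.
approach_2 = near_full_timeout(6)
```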
  • 83. Request timeouts: Proposal: set request timeouts dynamically based on invocation time left
  • 85. Set timeout based on remaining invocation time
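The code shown on these slides is not part of the transcript. A minimal sketch of the idea, assuming a Python Lambda handler, could look like the following. `get_remaining_time_in_millis()` is the method the Lambda Python runtime provides on the context object; `reserve_ms`, `floor_ms`, and `FakeContext` are assumptions introduced for illustration.

```python
def request_timeout_ms(context, reserve_ms=500, floor_ms=50):
    """Derive a request timeout from the invocation time that is left.

    reserve_ms (assumed value) is held back so the function still has
    time to log, record metrics, and return a fallback after a timeout.
    floor_ms (assumed value) avoids issuing requests with a zero or
    negative timeout when very little time remains.
    """
    remaining_ms = context.get_remaining_time_in_millis()
    return max(remaining_ms - reserve_ms, floor_ms)


class FakeContext:
    """Local stand-in for the Lambda context object, for experimentation."""

    def __init__(self, remaining_ms):
        self._remaining_ms = remaining_ms

    def get_remaining_time_in_millis(self):
        return self._remaining_ms
```

Calling `request_timeout_ms` immediately before each outbound request means later requests automatically get a shorter timeout when earlier ones were slow.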
  • 87. Recovery steps: Log the timeout with as much context as possible: the API, timeout value, correlation IDs, request object, and more
  • 88. Recovery steps: Record custom metrics
  • 89. Recovery steps: Use fallbacks
  • 91. Recovery steps: Be mindful when you sacrifice precision for availability. User experience is king
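The three recovery steps (log with context, record a custom metric, use a fallback) can be combined in one small helper. This is a sketch under stated assumptions: `call_with_fallback` and its parameters are hypothetical, the in-memory `metrics` dict stands in for whatever metrics service is in use, and the wrapped call is assumed to raise the built-in `TimeoutError`.

```python
import json
import logging

logger = logging.getLogger("timeouts")

# In-memory stand-in for a custom metrics backend.
metrics = {}


def call_with_fallback(api_name, call, timeout_s, fallback, correlation_id, request):
    """Run call(timeout_s); on timeout, log, count the event, and fall back."""
    try:
        return call(timeout_s)
    except TimeoutError:
        # Step 1: log the timeout with as much context as possible.
        logger.warning(json.dumps({
            "message": "request timed out",
            "api": api_name,
            "timeout_s": timeout_s,
            "correlation_id": correlation_id,
            "request": request,
        }, default=str))
        # Step 2: record a custom metric.
        metric_name = f"{api_name}.timeout"
        metrics[metric_name] = metrics.get(metric_name, 0) + 1
        # Step 3: return a fallback value instead of failing the caller.
        return fallback()
```

Whether a fallback is acceptable depends on the feature: a cached or default value trades precision for availability, which is the trade-off the last recovery slide warns about.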
  • 92. STEP 3. Inject realistic failures: for example, server crash, network error, HD malfunction, and more
  • 93. Where to inject latency?
  • 94. Hypothesis: Function has appropriate timeouts on its HTTP communications and can degrade gracefully when these requests time out
  • 96. Where to inject latency? Should be applied to third-party services too: DynamoDB, Twilio, Auth0 …
  • 98. Where to inject latency? Be mindful of the blast radius of the experiment. The goal is not to break things
  • 99. Where to inject latency? (diagram labels: http client, public-api-a, http client, public-api-b, internal-api)
  • 100. Hypothesis: All functions have appropriate timeouts on their HTTP communications to this internal API and can degrade gracefully when requests time out
  • 103. Where to inject latency? Large blast radius, can cause cascading failures unintentionally
  • 105. Priming (psychology): Priming is a technique whereby exposure to one stimulus influences a response to a subsequent stimulus, without conscious guidance or intention. It is a technique in psychology used to train a person's memory both in positive and negative ways
  • 106. Use failure injection to program your colleagues into thinking about failure modes early.
  • 107. Where to inject latency? Make X% of all requests slow in the dev environment
  • 108. Hypothesis: The client app has appropriate timeouts on its HTTP communication with the server and can degrade gracefully when requests time out
  • 113. STEP 4. Disprove hypothesis: in other words, look for a difference in steady state
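Looking for a difference in steady state amounts to comparing the control group's metrics with the experiment group's. A minimal sketch follows; the function name, the metric names, and the 10% relative tolerance are assumptions rather than anything prescribed by the talk.

```python
def steady_state_deviations(control, experiment, tolerance=0.10):
    """Return the metrics where the experiment group departs from the control group.

    A metric counts as deviating when it differs from the control value
    by more than `tolerance` (relative). The result maps each deviating
    metric name to a (control_value, experiment_value) pair; an empty
    dict means the hypothesis was not disproved.
    """
    deviations = {}
    for name, baseline in control.items():
        observed = experiment[name]
        allowed = abs(baseline) * tolerance
        if abs(observed - baseline) > allowed:
            deviations[name] = (baseline, observed)
    return deviations
```

A fixed relative tolerance is a simplification; metrics with a baseline of zero (such as error count) deviate on any non-zero value, which may or may not be the desired behaviour.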
  • 115. How to inject latency? Static weavers (such as PostSharp, AspectJ), dynamic proxies
  • 116. https://theburningmonk.com/2015/04/design-for-latency-issues/
  • 117. How to inject latency? Manually crafted wrapper libraries
  • 123. Configured in SSM Parameter Store
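The wrapper code on the preceding slides is not in the transcript. As a rough sketch of a manually crafted wrapper, the function below delays a configurable share of calls. The `config` dict stands in for a value that would be loaded from SSM Parameter Store; its key names (`is_enabled`, `probability`, `min_delay_ms`, `max_delay_ms`) are assumptions, as are the `sleep` and `rng` parameters, which exist only so the behaviour can be exercised without real delays.

```python
import random
import time


def with_latency_injection(func, config, sleep=time.sleep, rng=random.random):
    """Wrap func so that a configurable share of calls is delayed first."""

    def wrapper(*args, **kwargs):
        # Only inject when the experiment is switched on, and only for
        # the configured fraction of calls (limits the blast radius).
        if config.get("is_enabled") and rng() < config.get("probability", 0.0):
            low = config.get("min_delay_ms", 0)
            high = config.get("max_delay_ms", low)
            sleep(random.uniform(low, high) / 1000)
        return func(*args, **kwargs)

    return wrapper
```

Because the configuration is read on every call, switching `is_enabled` off stops the experiment without redeploying, provided the dict is refreshed from its source periodically.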
  • 125. No injected latency
  • 127. With injected latency
  • 129. Factory wrapper function (think bluebird’s promisifyAll function)
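A factory wrapper applies the injection to every method of a client in one go, so individual call sites do not change. The sketch below is loosely in that spirit and is not the code from the talk: it returns a proxy object rather than adding methods to the original, and `inject_latency_all`, `delay_s`, and `sleep` are hypothetical names.

```python
import functools
import time


def _delayed(method, delay_s, sleep):
    """Return a version of method that waits delay_s seconds before running."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        sleep(delay_s)
        return method(*args, **kwargs)

    return wrapper


class _Proxy:
    """Empty holder for the wrapped attributes."""


def inject_latency_all(client, delay_s, sleep=time.sleep):
    """Build a proxy exposing every public callable of client with a delay added."""
    proxy = _Proxy()
    for name in dir(client):
        if name.startswith("_"):
            continue  # leave private and dunder attributes alone
        attr = getattr(client, name)
        if callable(attr):
            setattr(proxy, name, _delayed(attr, delay_s, sleep))
        else:
            setattr(proxy, name, attr)
    return proxy
```

Returning a proxy leaves the original client untouched, which keeps the control group's code path free of any injection logic.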
  • 135. Error injection with serverless
  • 136. Common errors: HTTP 5XX; Amazon DynamoDB provisioned throughput exceeded; throttled AWS Lambda invocations
  • 137. Hypothesis: Function has appropriate error handling on its HTTP communications and can degrade gracefully when downstream dependencies fail
  • 138. Where to inject errors?
  • 139. Hypothesis: Function has appropriate error handling on DynamoDB operations and can degrade gracefully when DynamoDB provisioned throughput is exceeded
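Error injection can follow the same wrapper pattern as latency injection: fail a share of calls with the error under test and check that the caller degrades gracefully. This is a sketch only. `SimulatedThroughputExceeded` is a stand-in defined here, not the real exception raised by the AWS SDK, and all function names are hypothetical.

```python
import random


class SimulatedThroughputExceeded(Exception):
    """Stand-in for a DynamoDB provisioned-throughput error (not the SDK's exception)."""


def with_error_injection(func, probability, error_factory, rng=random.random):
    """Wrap func so that a share of calls raises the injected error instead."""

    def wrapper(*args, **kwargs):
        if rng() < probability:
            raise error_factory()
        return func(*args, **kwargs)

    return wrapper


def get_item_or_default(get_item, key, default):
    """Example caller that degrades gracefully when throughput is exceeded."""
    try:
        return get_item(key)
    except SimulatedThroughputExceeded:
        return default
```

The hypothesis holds for this caller if, with injection enabled, it keeps returning a usable value instead of propagating the error.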
  • 141. Where to inject errors? Induce Lambda throttling by temporarily setting reserved concurrency
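Temporarily setting reserved concurrency can be wrapped in a context manager so the limit is always lifted again, even if the experiment fails part-way. The sketch below assumes a boto3 Lambda client is passed in; `put_function_concurrency` and `delete_function_concurrency` are boto3 Lambda client operations, while `throttled` and the function name in the usage note are hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def throttled(lambda_client, function_name, reserved_concurrency):
    """Cap a function's concurrency for the duration of the with-block.

    Caveat: on exit this removes the reserved concurrency setting
    entirely. If the function already had a reserved concurrency value,
    it is NOT restored by this sketch.
    """
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=reserved_concurrency,
    )
    try:
        yield
    finally:
        lambda_client.delete_function_concurrency(FunctionName=function_name)
```

Usage would look roughly like `with throttled(boto3.client("lambda"), "my-function", 1): run_experiment()`, where `my-function` and `run_experiment` are placeholders. A low limit combined with normal traffic should produce throttled invocations for callers to handle.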
  • 142. Recap
  • 144. The only way to truly know your system’s resilience against failures is to test it through CONTROLLED experiments
  • 146. The goal of chaos engineering is NOT to actually break production
  • 147. CONTAINMENT should be front and centre of your thinking
  • 149. STEP 1. Define “steady state”: What does normal, working condition look like?
  • 150. STEP 2. Hypothesize steady state will continue in both the control group & the experiment group. In other words, you should have a reasonable degree of confidence the system would handle the failure before you proceed with the experiment
  • 151. STEP 3. Inject realistic failures: for example, server crash, network error, HD malfunction, and more
  • 152. STEP 4. Disprove hypothesis: in other words, look for a difference in steady state
  • 153. There is more inherent chaos and complexity in a serverless application
  • 154. Even without servers, you can still inject CONTROLLED failures at the application level
  • 159. Thank you! Yan Cui @theburningmonk https://theburningmonk.com
  • 160. Related breakouts: Wednesday, Nov 28: SRV425-R - Best Practices for Building Multi-Region, Active-Active Serverless Applications, 4:00PM – 5:00PM | Venetian, Level 4, Lando 4305; Wednesday, Nov 28: SRV343-R - Best Practices for Safe Deployments on AWS Lambda and Amazon API Gateway, 4:45PM – 5:45PM | MGM, Level 1, South Concourse 105; Thursday, Nov 29: ARC308 - Chaos Engineering and Scalability at Audible.com, 1:00PM – 2:00PM | Aria West, Level 3, Ironwood 5
  • 161. Please complete the session survey in the mobile app.