How to apply chaos engineering to serverless applications

Yan Cui
Yan CuiSpeaker at Self
how to apply chaos engineering to
Serverless applications
Yan Cui, @theburningmonk
Chaos
Engineering?
MUST KILL SERVERS!
RAWR!!
RAWR!!
@theburningmonk theburningmonk.com
“the discipline of experimenting on a system in order to build confidence in the
system’s capability to withstand turbulent conditions in production”
principlesofchaos.org
@theburningmonk theburningmonk.com
microservices death stars circa 2015
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
“the capacity to recover quickly from difficulties; toughness.”
resilience
/rɪˈzɪlɪəns/
noun
@theburningmonk theburningmonk.com
“the capacity to recover quickly from difficulties; toughness.”
resilience
/rɪˈzɪlɪəns/
noun
it’s not about
preventing failures!
everything fails, all the time
@theburningmonk theburningmonk.com
“You don't choose the moment, the moment chooses you!
You only choose how prepared you are when it does.”
Fire Chief Mike Burtch
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
anything that can go wrong, will go wrong.
MURPHY’s LAW
@theburningmonk theburningmonk.com
identify weaknesses before they manifest in system-wide, aberrant behaviors
GOAL
@theburningmonk theburningmonk.com
learn about the system’s behavior by observing it during a controlled experiments
HOW
@theburningmonk theburningmonk.com
learn about the system’s behavior by observing it during a controlled experiments
HOW
game days
failure injection
Yan Cui
http://theburningmonk.com
@theburningmonk
AWS user for 10 years
Yan Cui
http://theburningmonk.com
@theburningmonk
http://bit.ly/yubl-serverless
Yan Cui
http://theburningmonk.com
@theburningmonk
Developer Advocate @
How to apply chaos engineering to serverless applications
Yan Cui
http://theburningmonk.com
@theburningmonk
Independent Consultant
advisetraining delivery
theburningmonk.com/courses
theburningmonk.com/courses
realworldserverless.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
by Russ Miles @russmiles
source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
@theburningmonk theburningmonk.com
by Russ Miles @russmiles
source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
@theburningmonk theburningmonk.com
Shared Responsibility Model
@theburningmonk theburningmonk.com
by Russ Miles @russmiles
source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
@theburningmonk theburningmonk.com
chaos monkey kills an
EC2 instance
latency monkey induces
artificial delay in APIs
chaos gorilla kills an AWS
Availability Zone
chaos kong kills an entire
AWS region
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
there are no servers to kill!
SERVERLESS
@theburningmonk theburningmonk.com
improperly tuned timeouts
@theburningmonk theburningmonk.com
missing error handling
@theburningmonk theburningmonk.com
missing fallbacks
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
STEP 1.
define steady state
i.e. “what does normal look like”
@theburningmonk theburningmonk.com
STEP 2.
hypothesis that steady state continues in control and experimental group
e.g. “the system stays up if a server dies”
@theburningmonk theburningmonk.com
STEP 3.
inject realistic failures
e.g. “slow response from 3rd-party service”
@theburningmonk theburningmonk.com
STEP 4.
try to disprove hypothesis
i.e. “look for difference between control and experimental group”
@theburningmonk theburningmonk.com
latency inject latency to function invocation
@theburningmonk theburningmonk.com
“what if service X has elevated latency?”
@theburningmonk theburningmonk.com
API Gateway Lambda API Gateway Lambda
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
hypothesis: API would timeout and our try-catch
would handle it and return default response
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
result: function times out after 6s
(hypothesis is disproved)
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
API Gateway Lambda API Gateway Lambda
502
200
@theburningmonk theburningmonk.com
API Gateway Lambda API Gateway Lambda
3s timeout
6s timeout
@theburningmonk theburningmonk.com
API Gateway Lambda API Gateway Lambda
max 29s integration
max 15 mins timeout
@theburningmonk theburningmonk.com
and then there’s
cold starts…
@theburningmonk theburningmonk.com
TIL: most HTTP client libraries have default timeout of 60s.
API Gateway has an integration timeout of 29s.
Most Lambda functions default to timeout of 3-6s.
Don’t forget about the cold starts!
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
https://bit.ly/2Wvfort
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
outcome: a more resilient system
@theburningmonk theburningmonk.com
latency
exception
inject latency to function invocation
throws exception
@theburningmonk theburningmonk.com
latency
exception
statuscode
inject latency to function invocation
throws exception
return HTTP status code
@theburningmonk theburningmonk.com
latency
exception
statuscode
diskspace
inject latency to function invocation
throws exception
return HTTP status code
fills up /tmp directory
@theburningmonk theburningmonk.com
latency
exception
statuscode
diskspace
blacklist
inject latency to function invocation
throws exception
return HTTP status code
fills up /tmp directory
looses network connectivity
@theburningmonk theburningmonk.com
“what if DynamoDB has an elevated error rate?”
@theburningmonk theburningmonk.com
API Gateway Lambda DynamoDB
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
hypothesis: the AWS SDK retries would handle it
@theburningmonk theburningmonk.com
result: function times out after 6s
(hypothesis is disproved)
@theburningmonk theburningmonk.com
TIL: the js DynamoDB client defaults to 10 retries
with base delay of 50ms
@theburningmonk theburningmonk.com
TIL: the js DynamoDB client defaults to 10 retries
with base delay of 50ms
delay = Math.random() * (Math.pow(2, retryCount) * base)
this is Marc Brooker’s
fav formula!
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
action: set max retry count + fallback
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
outcome: a more resilient system
@theburningmonk theburningmonk.com
latency
exception
statuscode
diskspace
blacklist
inject latency to function invocation
throws exception
return HTTP status code
fills up /tmp directory
looses network connectivity
everything fails, all the time
@theburningmonk theburningmonk.com
by Russ Miles @russmiles
source https://medium.com/russmiles/chaos-engineering-for-the-business-17b723f26361
@theburningmonk theburningmonk.com
@theburningmonk theburningmonk.com
homeschool.dev/class/production-ready-serverless
https://theburningmonk.com/hire-me
AdviseTraining Delivery
“Fundamentally, Yan has improved our team by increasing our
ability to derive value from AWS and Lambda in particular.”
Nick Blair
Tech Lead
@theburningmonk
theburningmonk.com
github.com/theburningmonk
1 of 90

Recommended

How to win the game of trade-offs by
How to win the game of trade-offsHow to win the game of trade-offs
How to win the game of trade-offsYan Cui
21 views84 slides
How to choose the right messaging service by
How to choose the right messaging serviceHow to choose the right messaging service
How to choose the right messaging serviceYan Cui
135 views118 slides
How to choose the right messaging service for your workload by
How to choose the right messaging service for your workloadHow to choose the right messaging service for your workload
How to choose the right messaging service for your workloadYan Cui
65 views113 slides
Patterns and practices for building resilient serverless applications.pdf by
Patterns and practices for building resilient serverless applications.pdfPatterns and practices for building resilient serverless applications.pdf
Patterns and practices for building resilient serverless applications.pdfYan Cui
170 views137 slides
Lambda and DynamoDB best practices by
Lambda and DynamoDB best practicesLambda and DynamoDB best practices
Lambda and DynamoDB best practicesYan Cui
817 views148 slides
Lessons from running AppSync in prod by
Lessons from running AppSync in prodLessons from running AppSync in prod
Lessons from running AppSync in prodYan Cui
1.1K views102 slides

More Related Content

More from Yan Cui

How serverless changes the cost paradigm by
How serverless changes the cost paradigmHow serverless changes the cost paradigm
How serverless changes the cost paradigmYan Cui
1.1K views145 slides
Why your next serverless project should use AWS AppSync by
Why your next serverless project should use AWS AppSyncWhy your next serverless project should use AWS AppSync
Why your next serverless project should use AWS AppSyncYan Cui
1.3K views178 slides
Build social network in 4 weeks by
Build social network in 4 weeksBuild social network in 4 weeks
Build social network in 4 weeksYan Cui
642 views153 slides
Patterns and practices for building resilient serverless applications by
Patterns and practices for building resilient serverless applicationsPatterns and practices for building resilient serverless applications
Patterns and practices for building resilient serverless applicationsYan Cui
393 views130 slides
How to bring chaos engineering to serverless by
How to bring chaos engineering to serverlessHow to bring chaos engineering to serverless
How to bring chaos engineering to serverlessYan Cui
456 views92 slides
Migrating existing monolith to serverless in 8 steps by
Migrating existing monolith to serverless in 8 stepsMigrating existing monolith to serverless in 8 steps
Migrating existing monolith to serverless in 8 stepsYan Cui
402 views156 slides

More from Yan Cui(20)

How serverless changes the cost paradigm by Yan Cui
How serverless changes the cost paradigmHow serverless changes the cost paradigm
How serverless changes the cost paradigm
Yan Cui1.1K views
Why your next serverless project should use AWS AppSync by Yan Cui
Why your next serverless project should use AWS AppSyncWhy your next serverless project should use AWS AppSync
Why your next serverless project should use AWS AppSync
Yan Cui1.3K views
Build social network in 4 weeks by Yan Cui
Build social network in 4 weeksBuild social network in 4 weeks
Build social network in 4 weeks
Yan Cui642 views
Patterns and practices for building resilient serverless applications by Yan Cui
Patterns and practices for building resilient serverless applicationsPatterns and practices for building resilient serverless applications
Patterns and practices for building resilient serverless applications
Yan Cui393 views
How to bring chaos engineering to serverless by Yan Cui
How to bring chaos engineering to serverlessHow to bring chaos engineering to serverless
How to bring chaos engineering to serverless
Yan Cui456 views
Migrating existing monolith to serverless in 8 steps by Yan Cui
Migrating existing monolith to serverless in 8 stepsMigrating existing monolith to serverless in 8 steps
Migrating existing monolith to serverless in 8 steps
Yan Cui402 views
Building a social network in under 4 weeks with Serverless and GraphQL by Yan Cui
Building a social network in under 4 weeks with Serverless and GraphQLBuilding a social network in under 4 weeks with Serverless and GraphQL
Building a social network in under 4 weeks with Serverless and GraphQL
Yan Cui289 views
FinDev as a business advantage in the post covid19 economy by Yan Cui
FinDev as a business advantage in the post covid19 economyFinDev as a business advantage in the post covid19 economy
FinDev as a business advantage in the post covid19 economy
Yan Cui546 views
How to improve lambda cold starts by Yan Cui
How to improve lambda cold startsHow to improve lambda cold starts
How to improve lambda cold starts
Yan Cui867 views
What can you do with lambda in 2020 by Yan Cui
What can you do with lambda in 2020What can you do with lambda in 2020
What can you do with lambda in 2020
Yan Cui1K views
A chaos experiment a day, keeping the outage away by Yan Cui
A chaos experiment a day, keeping the outage awayA chaos experiment a day, keeping the outage away
A chaos experiment a day, keeping the outage away
Yan Cui385 views
How to debug slow lambda response times by Yan Cui
How to debug slow lambda response timesHow to debug slow lambda response times
How to debug slow lambda response times
Yan Cui317 views
What can you do with lambda in 2020 by Yan Cui
What can you do with lambda in 2020What can you do with lambda in 2020
What can you do with lambda in 2020
Yan Cui679 views
How to ship customer value faster with step functions by Yan Cui
How to ship customer value faster with step functionsHow to ship customer value faster with step functions
How to ship customer value faster with step functions
Yan Cui680 views
Debugging Lambda timeouts by Yan Cui
Debugging Lambda timeoutsDebugging Lambda timeouts
Debugging Lambda timeouts
Yan Cui218 views
Serverless a superpower for frontend developers by Yan Cui
Serverless a superpower for frontend developersServerless a superpower for frontend developers
Serverless a superpower for frontend developers
Yan Cui591 views
Debugging AWS Lambda Performance Issues by Yan Cui
Debugging AWS Lambda Performance  IssuesDebugging AWS Lambda Performance  Issues
Debugging AWS Lambda Performance Issues
Yan Cui347 views
Patterns and Practices for Building Resilient Serverless Applications by Yan Cui
Patterns and Practices for Building Resilient Serverless ApplicationsPatterns and Practices for Building Resilient Serverless Applications
Patterns and Practices for Building Resilient Serverless Applications
Yan Cui272 views
Serverless Security: Defence Against the Dark Arts by Yan Cui
Serverless Security: Defence Against the Dark ArtsServerless Security: Defence Against the Dark Arts
Serverless Security: Defence Against the Dark Arts
Yan Cui291 views
What can you do with lambda in 2020 by Yan Cui
What can you do with lambda in 2020What can you do with lambda in 2020
What can you do with lambda in 2020
Yan Cui482 views

Recently uploaded

Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... by
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...ShapeBlue
178 views15 slides
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...ShapeBlue
120 views17 slides
The Power of Heat Decarbonisation Plans in the Built Environment by
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built EnvironmentIES VE
84 views20 slides
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue by
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueShapeBlue
207 views54 slides
"Running students' code in isolation. The hard way", Yurii Holiuk by
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk Fwdays
36 views34 slides
Qualifying SaaS, IaaS.pptx by
Qualifying SaaS, IaaS.pptxQualifying SaaS, IaaS.pptx
Qualifying SaaS, IaaS.pptxSachin Bhandari
1.1K views8 slides

Recently uploaded(20)

Setting Up Your First CloudStack Environment with Beginners Challenges - MD R... by ShapeBlue
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
Setting Up Your First CloudStack Environment with Beginners Challenges - MD R...
ShapeBlue178 views
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ... by ShapeBlue
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
Import Export Virtual Machine for KVM Hypervisor - Ayush Pandey - University ...
ShapeBlue120 views
The Power of Heat Decarbonisation Plans in the Built Environment by IES VE
The Power of Heat Decarbonisation Plans in the Built EnvironmentThe Power of Heat Decarbonisation Plans in the Built Environment
The Power of Heat Decarbonisation Plans in the Built Environment
IES VE84 views
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue by ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlueVNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
VNF Integration and Support in CloudStack - Wei Zhou - ShapeBlue
ShapeBlue207 views
"Running students' code in isolation. The hard way", Yurii Holiuk by Fwdays
"Running students' code in isolation. The hard way", Yurii Holiuk "Running students' code in isolation. The hard way", Yurii Holiuk
"Running students' code in isolation. The hard way", Yurii Holiuk
Fwdays36 views
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT by ShapeBlue
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBITUpdates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
Updates on the LINSTOR Driver for CloudStack - Rene Peinthor - LINBIT
ShapeBlue208 views
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online by ShapeBlue
KVM Security Groups Under the Hood - Wido den Hollander - Your.OnlineKVM Security Groups Under the Hood - Wido den Hollander - Your.Online
KVM Security Groups Under the Hood - Wido den Hollander - Your.Online
ShapeBlue225 views
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue by ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue224 views
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue by ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue265 views
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023 by BookNet Canada
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023Redefining the book supply chain: A glimpse into the future - Tech Forum 2023
Redefining the book supply chain: A glimpse into the future - Tech Forum 2023
BookNet Canada44 views
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit... by ShapeBlue
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
Transitioning from VMware vCloud to Apache CloudStack: A Path to Profitabilit...
ShapeBlue162 views
State of the Union - Rohit Yadav - Apache CloudStack by ShapeBlue
State of the Union - Rohit Yadav - Apache CloudStackState of the Union - Rohit Yadav - Apache CloudStack
State of the Union - Rohit Yadav - Apache CloudStack
ShapeBlue303 views
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or... by ShapeBlue
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue199 views
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ... by Jasper Oosterveld
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
ESPC 2023 - Protect and Govern your Sensitive Data with Microsoft Purview in ...
The Power of Generative AI in Accelerating No Code Adoption.pdf by Saeed Al Dhaheri
The Power of Generative AI in Accelerating No Code Adoption.pdfThe Power of Generative AI in Accelerating No Code Adoption.pdf
The Power of Generative AI in Accelerating No Code Adoption.pdf
Saeed Al Dhaheri39 views

How to apply chaos engineering to serverless applications