SlideShare a Scribd company logo
1 of 46
Download to read offline
Applying Chaos Engineering to build resilient
serverless applications
Emrah Şamdan
(@emrahsamdan)
4/25/2019
Who am I?
● Developer for 6+ years
● Product guy for 2 years
● VP of Product for Thundra
● Organizing committee
● Serverlessdays İstanbul
On October 11st!
Agenda
● What’s chaos engineering?
● Why chaos testing on serverless?
● Best practices on chaos testing for serverless
● How to apply chaos testing on AWS Lambda
● How to apply silence in a world of chaos
Why chaos engineering?
Unit Tests
● My function is running properly
and meets the expectations.
Integration Tests
● My system is running properly
and meets the expectations.
UI/UX Tests
● It is like a charm!
Why chaos engineering?
Unit Tests
● My function is running properly
and meets the expectations.
Integration Tests
● My system is running properly
and meets the expectations.
UI/UX Tests
● It is like a charm!
Your third party API slows down so badly..
Some part of your system becomes unreachable.
Your cache/DB is down so you can’t load your data.
Chaos Engineering is the discipline of experimenting on a system
in order to build confidence in the system’s capability
to withstand turbulent conditions in production.
http://principlesofchaos.org/
Chaos Engineering is
● Like injecting vaccine to your system to make it more
immune
● To improve your system’s resilience by uncovering
weaknesses.
● Identifying failures before they become outages.
● To understand the steady state of your system and
challenge it.
Chaos Engineering is not
● Breaking down production for purpose.
● For blaming a group of people.
● Surprising your colleagues with partial outages.
● Taking down all the system at the same time.
History of chaos engineering?
2010 2011 2014 2019
Companies applying Chaos Engineering
States of chaos engineering
● Define steady state
● Hypothesis on steady state of the system with the designed failure
● Run your experiment
○ Define blast radius
○ Define halting condition
○ Have a rollback plan!
● Verify & Learn
○ If your system breaks you understood an issue before it causes an outage. Go fix it!
○ If it is resilient, congrats! Now, inject some other failure!
Don’t break on purpose!
● Start experimenting with the first row, the
leftmost cell: Known-knowns.
● Blast radius: The effect will make the
smallest effect.
● Put a stop button somewhere!
● Plan how you learn.
● You don’t need to do it on production for
the first time.
● The most important Let the other people
know! Surprising chaos is not funny. No, at
all!
Chaos examples
● Your system keeps records on the DB.
● DB is returning too slow for 1% of your customers.
Hypothesis: The system won’t experience an outage when DB is hardly
accessible.
Chaos examples
● Your system keeps records on the DB.
● DB is returning too slow for 1% of your customers.
Hypothesis: The system won’t experience an outage when DB is hardly
accessible.
Result: People experiences timeouts while waiting for results.
You never fail!
Chaos when everything is more granular.
SERVERLESS
More Granular Functions
More Granular Functions
More Granular Functions
Every service has its own failure mode
Lots of managed intermediate service which has its own bad-day
characteristics.
Different throttling, different retry mechanisms for different services.
Every function has its own configuration
● Timeouts
● IAM Roles
What would you do when your region is down?
Common weaknesses in serverless
● Nested functions with improper timeouts
Common weaknesses in serverless
● Unhandled errors from upstream services
Common weaknesses in serverless
● Failures in resources
Chaos experiments in serverless
● Inject latency to downstream services
● Inject failure to resources
Injecting latency
● Don’t attack your system.
● You don’t need to do on prod
first.
● There is no point to inject
latency to async calls.
Hypothesis: Entry point Lambda will
degrade gracefully when the
downstream Lambda times out or turns
really late.
Where else to inject?
Inject latency to resources, too.
How to inject latency
Injecting Latency to resources by Yan Cui
How to inject latency with Thundra
Injecting Error
● Connection errors with third party services
● Cache down
● AWS Resource is unreachable
What if we lose the connection to Redis?
Let’s inject error to Redis with Thundra
Common fixes
● Exponential backoff
● Properly tunes timeouts
● Circuit breakers
● Use async communication when possible
Don’t forget! Aim is
● Not to break but to improve
● Not to blame people but to give them room to fix
● Not to surprise your colleagues but to make your system resilient
Thank you !

More Related Content

What's hot

MongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
MongoDB World 2018: Tutorial - MongoDB Meets Chaos MonkeyMongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
MongoDB World 2018: Tutorial - MongoDB Meets Chaos MonkeyMongoDB
 
iOS Scroll Performance
iOS Scroll PerformanceiOS Scroll Performance
iOS Scroll PerformanceKyle Sherman
 
Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...
Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...
Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...LetAgileFly
 
Faster to Master without Disaster
Faster to Master without DisasterFaster to Master without Disaster
Faster to Master without DisasterPat Hermens
 
Kanban presentation
Kanban presentationKanban presentation
Kanban presentationBijo Joseph
 
Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014
Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014
Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014Matthew Cobby
 
Automate Everything! (No stress development/Tallinn)
Automate Everything! (No stress development/Tallinn)Automate Everything! (No stress development/Tallinn)
Automate Everything! (No stress development/Tallinn)Arto Santala
 
DevOps: Building by feature with immutable infrastructure at Serv.sg
DevOps: Building by feature with immutable infrastructure at Serv.sgDevOps: Building by feature with immutable infrastructure at Serv.sg
DevOps: Building by feature with immutable infrastructure at Serv.sgNicolas Mas
 
DevOps: Getting Started with Puppet on Windows
DevOps: Getting Started with Puppet on WindowsDevOps: Getting Started with Puppet on Windows
DevOps: Getting Started with Puppet on WindowsRob Reynolds
 
Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...
Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...
Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...Gáspár Nagy
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth PresentationEric Ries
 
Agile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_pptAgile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_pptHitesh Kumar
 
Developer day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast DeploymentsDeveloper day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast DeploymentsMatthew Cwalinski
 
Kanban - A Crash Course
Kanban - A Crash CourseKanban - A Crash Course
Kanban - A Crash CourseSam McAfee
 
Kanban Basics for Beginners Revised
Kanban Basics for Beginners RevisedKanban Basics for Beginners Revised
Kanban Basics for Beginners RevisedZsolt Fabok
 
Building software by feature with immutable infrastructures on AWS
Building software by feature with immutable infrastructures on AWSBuilding software by feature with immutable infrastructures on AWS
Building software by feature with immutable infrastructures on AWSNicolas Mas
 

What's hot (20)

MongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
MongoDB World 2018: Tutorial - MongoDB Meets Chaos MonkeyMongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
MongoDB World 2018: Tutorial - MongoDB Meets Chaos Monkey
 
iOS Scroll Performance
iOS Scroll PerformanceiOS Scroll Performance
iOS Scroll Performance
 
Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...
Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...
Scrum Gathering 2012 Shanghai_敏捷测试与质量管理分会场演讲话题:getting to done by testing at ...
 
Faster to Master without Disaster
Faster to Master without DisasterFaster to Master without Disaster
Faster to Master without Disaster
 
Kanban Methodology
Kanban MethodologyKanban Methodology
Kanban Methodology
 
Kanban presentation
Kanban presentationKanban presentation
Kanban presentation
 
Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014
Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014
Practical Continuous Deployment - Atlassian - London AUG 18 Feb 2014
 
Automate Everything! (No stress development/Tallinn)
Automate Everything! (No stress development/Tallinn)Automate Everything! (No stress development/Tallinn)
Automate Everything! (No stress development/Tallinn)
 
DevOps: Building by feature with immutable infrastructure at Serv.sg
DevOps: Building by feature with immutable infrastructure at Serv.sgDevOps: Building by feature with immutable infrastructure at Serv.sg
DevOps: Building by feature with immutable infrastructure at Serv.sg
 
Kanban stand-up meetings
Kanban stand-up meetingsKanban stand-up meetings
Kanban stand-up meetings
 
DevOps: Getting Started with Puppet on Windows
DevOps: Getting Started with Puppet on WindowsDevOps: Getting Started with Puppet on Windows
DevOps: Getting Started with Puppet on Windows
 
Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...
Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...
Scaffolding a legacy app with BDD scenarios using SpecFlow/Cucumber (HUSTEF 2...
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
 
Raise the bar! Reloaded
Raise the bar! ReloadedRaise the bar! Reloaded
Raise the bar! Reloaded
 
Agile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_pptAgile_SDLC_Node.js@Paypal_ppt
Agile_SDLC_Node.js@Paypal_ppt
 
Cypress testing
Cypress testingCypress testing
Cypress testing
 
Developer day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast DeploymentsDeveloper day - AWS: Fast Environments = Fast Deployments
Developer day - AWS: Fast Environments = Fast Deployments
 
Kanban - A Crash Course
Kanban - A Crash CourseKanban - A Crash Course
Kanban - A Crash Course
 
Kanban Basics for Beginners Revised
Kanban Basics for Beginners RevisedKanban Basics for Beginners Revised
Kanban Basics for Beginners Revised
 
Building software by feature with immutable infrastructures on AWS
Building software by feature with immutable infrastructures on AWSBuilding software by feature with immutable infrastructures on AWS
Building software by feature with immutable infrastructures on AWS
 

Similar to Applying Chaos Engineering to Build Resilient Serverless Applications

Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...
Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...
Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...PROIDEA
 
Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...
Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...
Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...Thoughtworks
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automationRicardo Bánffy
 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-KnolxKnoldus Inc.
 
Cypress Best Pratices for Test Automation
Cypress Best Pratices for Test AutomationCypress Best Pratices for Test Automation
Cypress Best Pratices for Test AutomationKnoldus Inc.
 
JUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsJUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsPeter Mounce
 
Test-Driven Development (TDD) in Swift
Test-Driven Development (TDD) in SwiftTest-Driven Development (TDD) in Swift
Test-Driven Development (TDD) in SwiftAmey Tavkar
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesAlex Cruise
 
May 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflowMay 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflowAdam Doyle
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos EngineeringYury Roa
 
Automated Fault Tolerance Testing
Automated Fault Tolerance TestingAutomated Fault Tolerance Testing
Automated Fault Tolerance TestingAjay Kumar Vaddadi
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Martin Spier
 
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...dcieslak
 
Super fast end-to-end-tests
Super fast end-to-end-testsSuper fast end-to-end-tests
Super fast end-to-end-testsLars Thorup
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesAshutosh Agarwal
 
Fault tolerance - look, it's simple!
Fault tolerance - look, it's simple!Fault tolerance - look, it's simple!
Fault tolerance - look, it's simple!Izzet Mustafaiev
 
Software Testing
Software TestingSoftware Testing
Software TestingAndrew Wang
 

Similar to Applying Chaos Engineering to Build Resilient Serverless Applications (20)

Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...
Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...
Applying Chaos Engineering to build Resilient Serverless Applications - Emrah...
 
Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...
Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...
Laptop Devops: Putting Modern Infrastructure Automation to Work For Local Dev...
 
Monitoring and automation
Monitoring and automationMonitoring and automation
Monitoring and automation
 
chaos-engineering-Knolx
chaos-engineering-Knolxchaos-engineering-Knolx
chaos-engineering-Knolx
 
Cypress Best Pratices for Test Automation
Cypress Best Pratices for Test AutomationCypress Best Pratices for Test Automation
Cypress Best Pratices for Test Automation
 
JUST EAT: Embracing DevOps
JUST EAT: Embracing DevOpsJUST EAT: Embracing DevOps
JUST EAT: Embracing DevOps
 
Test-Driven Development (TDD) in Swift
Test-Driven Development (TDD) in SwiftTest-Driven Development (TDD) in Swift
Test-Driven Development (TDD) in Swift
 
Tdd in swift
Tdd in swiftTdd in swift
Tdd in swift
 
DevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 SlidesDevOps Days Vancouver 2014 Slides
DevOps Days Vancouver 2014 Slides
 
May 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflowMay 2021 Spark Testing ... or how to farm reputation on StackOverflow
May 2021 Spark Testing ... or how to farm reputation on StackOverflow
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos Engineering
 
Automated Fault Tolerance Testing
Automated Fault Tolerance TestingAutomated Fault Tolerance Testing
Automated Fault Tolerance Testing
 
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)Ensuring Performance in a Fast-Paced Environment (CMG 2014)
Ensuring Performance in a Fast-Paced Environment (CMG 2014)
 
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
RandomTest - Random Software Integration Tests That Just Work for C/C++, Java...
 
HowTo DR
HowTo DRHowTo DR
HowTo DR
 
Fast end-to-end-tests
Fast end-to-end-testsFast end-to-end-tests
Fast end-to-end-tests
 
Super fast end-to-end-tests
Super fast end-to-end-testsSuper fast end-to-end-tests
Super fast end-to-end-tests
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
Fault tolerance - look, it's simple!
Fault tolerance - look, it's simple!Fault tolerance - look, it's simple!
Fault tolerance - look, it's simple!
 
Software Testing
Software TestingSoftware Testing
Software Testing
 

Recently uploaded

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 

Recently uploaded (20)

Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 

Applying Chaos Engineering to Build Resilient Serverless Applications

  • 1. Applying Chaos Engineering to build resilient serverless applications Emrah Şamdan (@emrahsamdan) 4/25/2019
  • 2. Who am I? ● Developer for 6+ years ● Product guy for 2 years ● VP of Product for Thundra ● Organizing committee ● Serverlessdays İstanbul On October 11st!
  • 3. Agenda ● What’s chaos engineering? ● Why chaos testing on serverless? ● Best practices on chaos testing for serverless ● How to apply chaos testing on AWS Lambda ● How to apply silence in a world of chaos
  • 4. Why chaos engineering? Unit Tests ● My function is running properly and meets the expectations. Integration Tests ● My system is running properly and meets the expectations. UI/UX Tests ● It is like a charm!
  • 5. Why chaos engineering? Unit Tests ● My function is running properly and meets the expectations. Integration Tests ● My system is running properly and meets the expectations. UI/UX Tests ● It is like a charm!
  • 6.
  • 7.
  • 8. Your third party API slows down so badly..
  • 9. Some part of your system becomes unreachable.
  • 10. Your cache/DB is down so you can’t load your data.
  • 11. Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system’s capability to withstand turbulent conditions in production. http://principlesofchaos.org/
  • 12. Chaos Engineering is ● Like injecting vaccine to your system to make it more immune ● To improve your system’s resilience by uncovering weaknesses. ● Identifying failures before they become outages. ● To understand the steady state of your system and challenge it.
  • 13. Chaos Engineering is not ● Breaking down production for purpose. ● For blaming a group of people. ● Surprising your colleagues with partial outages. ● Taking down all the system at the same time.
  • 14.
  • 15. History of chaos engineering? 2010 2011 2014 2019
  • 17. States of chaos engineering ● Define steady state ● Hypothesis on steady state of the system with the designed failure ● Run your experiment ○ Define blast radius ○ Define halting condition ○ Have a rollback plan! ● Verify & Learn ○ If your system breaks you understood an issue before it causes an outage. Go fix it! ○ If it is resilient, congrats! Now, inject some other failure!
  • 18. Don’t break on purpose! ● Start experimenting with the first row, the leftmost cell: Known-knowns. ● Blast radius: The effect will make the smallest effect. ● Put a stop button somewhere! ● Plan how you learn. ● You don’t need to do it on production for the first time. ● The most important Let the other people know! Surprising chaos is not funny. No, at all!
  • 19. Chaos examples ● Your system keeps records on the DB. ● DB is returning too slow for 1% of your customers. Hypothesis: The system won’t experience an outage when DB is hardly accessible.
  • 20. Chaos examples ● Your system keeps records on the DB. ● DB is returning too slow for 1% of your customers. Hypothesis: The system won’t experience an outage when DB is hardly accessible. Result: People experiences timeouts while waiting for results.
  • 21.
  • 23. Chaos when everything is more granular. SERVERLESS
  • 27. Every service has its own failure mode Lots of managed intermediate service which has its own bad-day characteristics. Different throttling, different retry mechanisms for different services.
  • 28. Every function has its own configuration ● Timeouts ● IAM Roles
  • 29.
  • 30. What would you do when your region is down?
  • 31.
  • 32. Common weaknesses in serverless ● Nested functions with improper timeouts
  • 33. Common weaknesses in serverless ● Unhandled errors from upstream services
  • 34. Common weaknesses in serverless ● Failures in resources
  • 35. Chaos experiments in serverless ● Inject latency to downstream services ● Inject failure to resources
  • 36. Injecting latency ● Don’t attack your system. ● You don’t need to do on prod first. ● There is no point to inject latency to async calls. Hypothesis: Entry point Lambda will degrade gracefully when the downstream Lambda times out or turns really late.
  • 37. Where else to inject? Inject latency to resources, too.
  • 38. How to inject latency
  • 39. Injecting Latency to resources by Yan Cui
  • 40. How to inject latency with Thundra
  • 41. Injecting Error ● Connection errors with third party services ● Cache down ● AWS Resource is unreachable
  • 42. What if we lose the connection to Redis?
  • 43. Let’s inject error to Redis with Thundra
  • 44. Common fixes ● Exponential backoff ● Properly tunes timeouts ● Circuit breakers ● Use async communication when possible
  • 45. Don’t forget! Aim is ● Not to break but to improve ● Not to blame people but to give them room to fix ● Not to surprise your colleagues but to make your system resilient