CHAOS ENGINEERING – OR LET'S SHAKE THE TREE
Jimmy Dahlqvist, Knowit
"Failures are a given and
everything will eventually
fail over time."
Werner Vogels
CTO – Amazon.com
TRIBUTE
• Nora Jones
• Adrian Cockcroft
• Adrian Hornsby
History and background
2004 Amazon – Jesse Robbins, "Master of Disaster"
2010 Netflix – Greg Orzell, Chaos Monkey
2012 Netflix – open-sources the Simian Army
2016 Gremlin Inc. is founded
2017 Netflix – Chaos Engineering book, chaos toolset
2018 The concept spreads; ChaosConf starts
" By running experiments on a
regular basis that simulate a
Regional outage, we were able to
identify any systemic weaknesses
early and fix them. "
Netflix Blog
Chaos Engineering is the discipline of
experimenting on a system
in order to build confidence in the system’s
capability to withstand turbulent conditions
in production.
Unit testing
Input -> Component X -> Output
Integration testing
Input -> Component A -> Output / Input -> Component B -> Output
Distributed System
Input -> Distributed System -> Output
Distributed System
Input -> Distributed System -> Output (corrupt?)
" Chaos doesn't cause problems.
It reveals them. "
Nora Jones
Slack - Head of chaos engineering
and human factors
Before practicing chaos
Socialize
Start small
Use an opt-in model, not an opt-out one.
Only include services that want to be chaosed.
Start with a success!
Don't start in production.
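
The opt-in rule above could be sketched as a simple allow-list filter. This is a minimal illustration and not something shown in the talk: the `ServiceConfig` type, the `chaos_opt_in` flag and the service names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceConfig:
    """Hypothetical per-service settings; the field names are assumptions."""
    name: str
    environment: str            # e.g. "staging" or "production"
    chaos_opt_in: bool = False  # opt-in: nothing is targeted by default


def eligible_targets(services, allowed_environments=("staging",)):
    """Return names of services that opted in and run in an allowed environment."""
    return [
        s.name
        for s in services
        if s.chaos_opt_in and s.environment in allowed_environments
    ]


services = [
    ServiceConfig("checkout", "production", chaos_opt_in=True),
    ServiceConfig("search", "staging", chaos_opt_in=True),
    ServiceConfig("billing", "staging"),  # never opted in
]
print(eligible_targets(services))  # ['search']
```

Because the flag defaults to False, a newly added service is left alone until its owners explicitly switch it on, and production stays excluded until it is added to `allowed_environments`.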
Steady state
Define the steady state. Build a hypothesis about the
steady state. What does our system look like when it's
behaving normally?
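
A steady-state hypothesis can be written down as a check that a metric stays close to its normal level. The sketch below is an assumption-laden illustration: the 5 % tolerance, the function name `steady_state_holds` and the sample values are invented for the example.

```python
from statistics import mean


def steady_state_holds(baseline, observed, tolerance=0.05):
    """Hypothesis check: the observed mean stays within `tolerance`
    (as a fraction) of the baseline mean."""
    baseline_mean = mean(baseline)
    if baseline_mean == 0:
        raise ValueError("baseline mean must be non-zero")
    deviation = abs(mean(observed) - baseline_mean) / abs(baseline_mean)
    return deviation <= tolerance


# Hypothetical samples of a business metric, e.g. requests per second.
baseline = [100.0, 102.0, 98.0, 100.0]
during_experiment = [97.0, 99.0, 98.0, 98.0]
print(steady_state_holds(baseline, during_experiment))  # True
```

If the check fails while a fault is being injected, the hypothesis is disproved and the experiment has revealed a weakness worth fixing.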
Monitoring
Understand your key business metrics and KPIs.
Netflix's key business metric is SPS (stream starts per second).
First experiment
Graceful restarts and degradations
Design your next experiments
" You have to know the past to
understand the present. "
Carl Sagan
Move to production
Don't forget about your customers!
Don't destroy the customer experience!
Make sure you can abort!
Only run during business hours.
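
The production rules above (run only during business hours, always be able to abort) could be sketched as a small experiment runner. Everything here is hypothetical: the 09:00 to 17:00 weekday window, the `run_experiment` function and the step names are assumptions made for illustration.

```python
from datetime import datetime, time


def in_business_hours(now, start=time(9, 0), end=time(17, 0)):
    """True on Monday to Friday between start and end (assumed office hours)."""
    return now.weekday() < 5 and start <= now.time() < end


def run_experiment(steps, is_healthy, rollback, now):
    """Run steps in order and abort at the first failed health check.

    Returns "skipped", "aborted" or "completed".
    """
    if not in_business_hours(now):
        return "skipped"
    for step in steps:
        step()
        if not is_healthy():
            rollback()  # undo the injected fault before stopping
            return "aborted"
    return "completed"


log = []
result = run_experiment(
    steps=[lambda: log.append("inject latency"),
           lambda: log.append("kill instance")],
    is_healthy=lambda: False,  # simulate a failing health check
    rollback=lambda: log.append("rollback"),
    now=datetime(2024, 5, 14, 10, 0),  # a Tuesday morning
)
print(result, log)  # aborted ['inject latency', 'rollback']
```

Running the sketch with a deliberately failing health check, as above, is one way to rehearse that the abort path really stops the remaining steps.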
Automate everything
Run often
Automatic safeguards
Percentage of traffic
Netflix Chaos Automation Platform (ChAP)
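
Limiting an experiment to a percentage of traffic and adding an automatic safeguard could look roughly like the sketch below. It is not ChAP's actual implementation: the hashing scheme, the function names and the 1 % error-rate margin are assumptions chosen for illustration.

```python
import hashlib


def in_experiment(request_id, percent):
    """Deterministically place roughly `percent` % of request IDs in the experiment."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # 0..9999
    return bucket < percent * 100


def should_abort(control_errors, experiment_errors, requests_per_group,
                 max_extra_error_rate=0.01):
    """Safeguard: abort when the experiment group's error rate exceeds
    the control group's by more than the allowed margin."""
    control_rate = control_errors / requests_per_group
    experiment_rate = experiment_errors / requests_per_group
    return experiment_rate - control_rate > max_extra_error_rate


# Hypothetical numbers: 1000 requests in each group.
print(should_abort(control_errors=5, experiment_errors=30, requests_per_group=1000))  # True
print(should_abort(control_errors=5, experiment_errors=10, requests_per_group=1000))  # False
```

Hashing the request ID keeps a given request in the same group every time it is evaluated, and comparing against a control group of equal size separates the effect of the injected fault from ordinary background errors.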
Change of mindset
From "What happens IF this fails?" to
"What happens WHEN this fails?"
Lessons learnt
Takeaways
Everyone can be doing Chaos Engineering
Chaos Engineering is a learning opportunity
Be conscious about customers, involve business
Thanks!

Editor's Notes

  1. Do run it during business hours, when everyone is at the office. Monitor! Monitor! Monitor! At the first sign of a problem, abort the experiment. Make sure you can abort, and validate beforehand that the abort works: when you hit the abort button, the experiment must actually stop and not just keep running!