Chaos Engineering practices
Reliability as a Discipline
Friday, 2 July 2021
fullstaq.com
Doing things with SRE and Observability
Arnold van Wijnbergen
a.vanwijnbergen@fullstaq.com
linkedin.com/in/IlovIT
Agenda
About myself
Why do we need Chaos Engineering
Historical facts
Chaos Engineering as a Discipline
Key Takeaways
About myself
● 20+ years of experience in the IT industry
● Doing some DevOps for the last 8 years
● Active in Chaos Engineering since 2015
● Love everything related to Observability
● Supporter of Cloud Native
● Currently working as Product Owner @ Albert Heijn
● Responsible for
○ Observability
○ Monitoring
○ Logging
○ Alerting
○ Performance testing
○ Chaos engineering
○ Some Elastic and some Cilium
Why do we need Chaos Engineering?
Not for restarting nodes or hosts, or for destroying our services
Learn from failure and mitigate risks by breaking things on purpose
Experimenting with failures [in Production] in order to reveal weaknesses and build confidence in the system's resilience.
Chaos Engineering is here to prevent Chaos from happening
See it as Disaster Recovery Testing on Steroids
Historical facts
● Jesse Robbins, Amazon's "Master of Disaster", started Gamedays at Amazon.
● Netflix introduced the term Chaos (2010) to fill the gap in proper resilience testing in the Cloud.
● In 2011 the Simian Army was born, best known for Chaos Monkey.
● In 2016 the Principles of Chaos became publicly available.
● In 2018 the first ChaosConf was organised by Gremlin (Kolton Andrus).
● Since 2020 Chaos Engineering has been part of the Well-Architected frameworks. See this as the start of major adoption.
Timeline: 2010, 2011, 2016, 2018, 2020, 2021 (see https://principlesofchaos.org/)
Common reality with Distributed Systems
Just your ordinary Grocery store
Reliability becomes a product feature
Innovation
Reliability
SREs love Chaos Engineering
It requires more than just tools
Observability
SLO/SLI
Game days
Analysis
Evaluation
CI/CD
Testing
Chaos Tools
But how do we start?
● Start by organising an event such as a Game day with the product team and other relevant teams, such as Incident Command.
● Ensure that the goals and scope of the experiments are set and agreed upon.
● Ensure that you have enough time to create the hypotheses.
● Run the experiment!
● Evaluate the results and record the evidence.
In fact, we are building a discipline
Game days explained
● Creating a culture of experimentation.
● Repeatable exercises to learn from failures.
● A well-known method AWS uses to validate the resilience of its services and its major-incident process.
● Building collaborative trust.
Goals made easy by SRE
● As a team, set your Objectives.
● Use techniques such as SLOs/SLIs.
● Prepare your Observability systems with the appropriate Indicators to measure.
● Use these to validate the expected system behaviour.
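To make "use SLOs/SLIs to validate expected behaviour" concrete, here is a minimal Python sketch that computes an availability SLI as good requests divided by total requests and compares it with an SLO objective and its error budget. The class and function names and the example numbers are hypothetical; in practice the request counts would come from your Observability system.

```python
from dataclasses import dataclass


@dataclass
class SloResult:
    sli: float               # measured ratio of good requests
    objective: float         # SLO target ratio, e.g. 0.999
    budget_remaining: float  # share of the error budget left; negative once overspent

    @property
    def met(self) -> bool:
        return self.sli >= self.objective


def check_availability_slo(good: int, total: int, objective: float) -> SloResult:
    """Availability SLI = good / total, compared against the SLO objective."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0.0 < objective < 1.0:
        raise ValueError("objective must be between 0 and 1 to leave an error budget")
    sli = good / total
    allowed_bad = (1.0 - objective) * total   # error budget, expressed in requests
    actual_bad = total - good
    remaining = 1.0 - actual_bad / allowed_bad
    return SloResult(sli, objective, remaining)


if __name__ == "__main__":
    # Hypothetical numbers: 100,000 requests, 40 failures, 99.9% objective.
    result = check_availability_slo(good=99_960, total=100_000, objective=0.999)
    print(f"SLI={result.sli:.3%} met={result.met} budget left={result.budget_remaining:.0%}")
```

Running the same check before, during and after an experiment shows whether the injected failure pushed the system outside its objective.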
A simple flow that makes the scientific part work
Hypothesis → Experiment → Deviation → Evidence
A simple flow that makes the scientific part work (with supporting practices)
Diagram: Hypothesis → Experiment → Deviation → Evidence, annotated with SLO/SLI, Game days, Chaos Tools, Observability, Testing, Analysis, Evaluation and CI/CD
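The Hypothesis → Experiment → Deviation → Evidence flow can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a chaos tool's API: `measure`, `inject_fault` and `remove_fault` are hypothetical callables standing in for your Observability query and your chaos tool, and a single metric with a fixed tolerance stands in for the steady state.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class Evidence:
    hypothesis: str
    baseline: float
    observed: float
    deviation: float
    tolerance: float
    hypothesis_held: bool
    recorded_at: float


def run_experiment(hypothesis: str,
                   measure: Callable[[], float],
                   inject_fault: Callable[[], None],
                   remove_fault: Callable[[], None],
                   tolerance: float) -> Evidence:
    """One pass through Hypothesis -> Experiment -> Deviation -> Evidence."""
    baseline = measure()              # steady state before the fault
    inject_fault()                    # experiment: introduce the failure
    try:
        observed = measure()          # a real run would measure over a time window
    finally:
        remove_fault()                # always roll back, even if measuring fails
    deviation = abs(observed - baseline)
    return Evidence(hypothesis, baseline, observed, deviation, tolerance,
                    hypothesis_held=deviation <= tolerance,
                    recorded_at=time.time())


if __name__ == "__main__":
    # Toy system: the success rate drops slightly while the fault is active.
    state = {"fault": False}

    def success_rate() -> float:
        return 0.995 if state["fault"] else 0.999

    evidence = run_experiment(
        "Success rate stays within 0.5 percentage points when one replica is lost",
        measure=success_rate,
        inject_fault=lambda: state.update(fault=True),
        remove_fault=lambda: state.update(fault=False),
        tolerance=0.005,
    )
    print(json.dumps(asdict(evidence), indent=2))   # store this as the evidence
```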
Building a hypothesis
● Build a hypothesis around the steady state.
● The steady state is when customers are happy.
● Describe potential or real-world outages that can happen (or have happened) due to infrastructure, application or connectivity failures, and their hard-to-predict cascading effects.
● Get agreement on the blast radius (scope).
● Describe the fault-injection implementation needed to carry out the experiment.
● Choose a strategy for executing the fault-injection experiment. Start small.
Fault types: Latency, Routing failures, Unavailability, Connectivity failures, Saturation, Data corruption
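A hypothesis can be written down as a small, reviewable structure before any fault is injected. The sketch below is one possible shape, assuming hypothetical field names, a default environment of "staging" and an arbitrary 5% cap on the first blast radius; the fault types mirror the list of fault types, and the production check reflects the advice never to run a first experiment in Production.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FaultType(Enum):
    """Fault categories listed on the slide."""
    LATENCY = "latency"
    ROUTING_FAILURE = "routing failure"
    UNAVAILABILITY = "unavailability"
    CONNECTIVITY_FAILURE = "connectivity failure"
    SATURATION = "saturation"
    DATA_CORRUPTION = "data corruption"


@dataclass(frozen=True)
class Hypothesis:
    steady_state: str          # what "customers are happy" means, ideally expressed as an SLO
    fault: FaultType           # the failure to inject
    target: str                # the service or component in scope
    blast_radius: float        # fraction of traffic or instances affected, in (0, 1]
    environment: str = "staging"
    first_run: bool = True

    def validate(self, max_initial_radius: float = 0.05) -> List[str]:
        """Return the problems found; an empty list means the plan passes these checks."""
        problems: List[str] = []
        if not self.steady_state.strip():
            problems.append("describe the steady state")
        if not 0.0 < self.blast_radius <= 1.0:
            problems.append("blast_radius must be in (0, 1]")
        elif self.first_run and self.blast_radius > max_initial_radius:
            problems.append(f"start small: keep the first blast radius at or below {max_initial_radius:.0%}")
        if self.first_run and self.environment.lower() == "production":
            problems.append("do not run the first experiment in production")
        return problems
```

For example, `Hypothesis("p95 checkout latency < 300 ms", FaultType.LATENCY, "payment-service", 0.01).validate()` passes all checks; the service name and latency target are made up for illustration. Keeping the plan as data makes it easy to review during the Game day and to attach to the recorded evidence.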
Execute the experiment
● Ensure everybody is well informed.
● Ensure that the Observability tools are set up and ready.
● Record evidence!
Start learning from failure!
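Chaos tools normally inject faults at the infrastructure or network level, but the mechanics of a controlled run can be illustrated in-process. The sketch below assumes a hypothetical `LatencyInjector` class (not part of any library) that delays a configurable fraction of calls, offers an `abort()` kill switch, and counts calls and injected faults so they can be recorded as evidence.

```python
import functools
import random
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class LatencyInjector:
    """Hypothetical helper: delays a fraction (the blast radius) of calls to a function."""

    def __init__(self, delay_s: float, blast_radius: float, seed: Optional[int] = None):
        if not 0.0 <= blast_radius <= 1.0:
            raise ValueError("blast_radius must be between 0 and 1")
        self.delay_s = delay_s
        self.blast_radius = blast_radius
        self._rng = random.Random(seed)      # seeded for reproducible runs
        self._active = threading.Event()     # kill switch: abort() stops injection immediately
        self._lock = threading.Lock()        # keeps counters consistent across threads
        self.calls = 0                       # evidence: total calls observed
        self.injected = 0                    # evidence: calls that were delayed

    def start(self) -> None:
        self._active.set()

    def abort(self) -> None:
        self._active.clear()

    def wrap(self, func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            with self._lock:
                self.calls += 1
                hit = self._active.is_set() and self._rng.random() < self.blast_radius
                if hit:
                    self.injected += 1
            if hit:
                time.sleep(self.delay_s)     # the injected fault: extra latency
            return func(*args, **kwargs)
        return wrapper
```

For example, decorating a hypothetical `get_price()` function with `@injector.wrap` and calling `injector.start()` on an injector created with `blast_radius=0.1` delays roughly 10% of calls; `injector.abort()` ends the experiment, and the `calls` and `injected` counters go into the evidence record.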
Learn from Failure results
● Always ensure experiment results are recorded, kept available for historical comparison and analysed. This is your evidence!
● Look for deviations from the steady state, as revealed by your Observability system.
● Ensure that the whole experiment is evaluated, preferably using a post-mortem analysis.
● Extract improvements!
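Spotting deviations from the steady state usually means comparing an experiment window with a baseline window. This minimal sketch assumes hypothetical thresholds (an error-rate increase of more than 0.1 percentage points, or a p95 latency increase of more than 50 ms) and a hypothetical `Window` structure filled from your Observability system; it returns findings to carry into the post-mortem.

```python
import statistics
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    """Metrics for one observation window, e.g. exported from the Observability system."""
    latencies_ms: List[float]   # needs at least two samples for statistics.quantiles()
    errors: int
    requests: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    @property
    def p95_ms(self) -> float:
        # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
        return statistics.quantiles(self.latencies_ms, n=20)[-1]


def find_deviations(baseline: Window, experiment: Window,
                    max_error_rate_increase: float = 0.001,
                    max_p95_increase_ms: float = 50.0) -> List[str]:
    """Compare an experiment window with the steady-state baseline."""
    findings: List[str] = []
    error_delta = experiment.error_rate - baseline.error_rate
    if error_delta > max_error_rate_increase:
        findings.append(f"error rate rose by {error_delta:.2%}")
    p95_delta = experiment.p95_ms - baseline.p95_ms
    if p95_delta > max_p95_increase_ms:
        findings.append(f"p95 latency rose by {p95_delta:.0f} ms")
    return findings     # feed these into the post-mortem analysis
```

An empty list means no deviation beyond the agreed thresholds was found; each finding is a candidate improvement to discuss in the evaluation.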
Key takeaways
● Chaos Engineering is not only about tools.
● Include multiple disciplines and involve other teams to make the experiment a success.
● Use SRE principles to make experiments easier to implement and analyse.
● Never run your first experiment in Production.
● Once you are confident, automate experiments so that they gather evidence periodically.
● Experiments can feed into Progressive Delivery.
● Resilience is the ability to recover, but never forget the user experience.
QUESTIONS?

More Related Content

What's hot

Robert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software HealthRobert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software HealthAnna Royzman
 
Fire alarms vs. Fire hoses: Keeping up with Dependencies
Fire alarms vs. Fire hoses: Keeping up with DependenciesFire alarms vs. Fire hoses: Keeping up with Dependencies
Fire alarms vs. Fire hoses: Keeping up with DependenciesWhiteSource
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
Get testing bottlenecks out of your pipelines
Get testing bottlenecks out of your pipelinesGet testing bottlenecks out of your pipelines
Get testing bottlenecks out of your pipelineslisacrispin
 
Defect root cause analysis, Андрей Титаренко
Defect root cause analysis, Андрей ТитаренкоDefect root cause analysis, Андрей Титаренко
Defect root cause analysis, Андрей ТитаренкоSigma Software
 
DevOps drivein - Mind the Gap
DevOps drivein - Mind the GapDevOps drivein - Mind the Gap
DevOps drivein - Mind the GapSerena Software
 
Changing rules 2_moretestsfindmoreissues_slideshare
Changing rules 2_moretestsfindmoreissues_slideshareChanging rules 2_moretestsfindmoreissues_slideshare
Changing rules 2_moretestsfindmoreissues_slideshareSOASTA
 
The Whole Team Approach to Quality in Continuous Delivery
The Whole Team Approach to Quality in Continuous DeliveryThe Whole Team Approach to Quality in Continuous Delivery
The Whole Team Approach to Quality in Continuous Deliverylisacrispin
 
PHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix It
PHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix ItPHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix It
PHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix ItMatt Toigo
 
Move test planning before implementation
Move test planning before implementationMove test planning before implementation
Move test planning before implementationTed Cheng
 
The Perfect Neos Project Setup
The Perfect Neos Project SetupThe Perfect Neos Project Setup
The Perfect Neos Project SetupKarsten Dambekalns
 
Do you even need to automate the GUI?
Do you even need to automate the GUI? Do you even need to automate the GUI?
Do you even need to automate the GUI? Matt Heusser
 
Testing Metrics - Making your tests visible
Testing Metrics - Making your tests visibleTesting Metrics - Making your tests visible
Testing Metrics - Making your tests visibleAlper Mermer
 
Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesYelp Engineering
 
Ansible for Enterprise
Ansible for EnterpriseAnsible for Enterprise
Ansible for EnterpriseAnsible
 
Security as Code: A DevSecOps Approach
Security as Code: A DevSecOps ApproachSecurity as Code: A DevSecOps Approach
Security as Code: A DevSecOps ApproachVMware Tanzu
 

What's hot (20)

Robert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software HealthRobert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software Health
 
Test Driven Development
Test Driven DevelopmentTest Driven Development
Test Driven Development
 
Fire alarms vs. Fire hoses: Keeping up with Dependencies
Fire alarms vs. Fire hoses: Keeping up with DependenciesFire alarms vs. Fire hoses: Keeping up with Dependencies
Fire alarms vs. Fire hoses: Keeping up with Dependencies
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
Unit testing for project managers
Unit testing for project managersUnit testing for project managers
Unit testing for project managers
 
Get testing bottlenecks out of your pipelines
Get testing bottlenecks out of your pipelinesGet testing bottlenecks out of your pipelines
Get testing bottlenecks out of your pipelines
 
Defect root cause analysis, Андрей Титаренко
Defect root cause analysis, Андрей ТитаренкоDefect root cause analysis, Андрей Титаренко
Defect root cause analysis, Андрей Титаренко
 
DevOps drivein - Mind the Gap
DevOps drivein - Mind the GapDevOps drivein - Mind the Gap
DevOps drivein - Mind the Gap
 
Changing rules 2_moretestsfindmoreissues_slideshare
Changing rules 2_moretestsfindmoreissues_slideshareChanging rules 2_moretestsfindmoreissues_slideshare
Changing rules 2_moretestsfindmoreissues_slideshare
 
The Whole Team Approach to Quality in Continuous Delivery
The Whole Team Approach to Quality in Continuous DeliveryThe Whole Team Approach to Quality in Continuous Delivery
The Whole Team Approach to Quality in Continuous Delivery
 
PHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix It
PHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix ItPHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix It
PHP World DC 2015 - What Can Go Wrong with Agile Development and How to Fix It
 
Please don't test your product - Agile Testing
Please don't test your product - Agile TestingPlease don't test your product - Agile Testing
Please don't test your product - Agile Testing
 
Move test planning before implementation
Move test planning before implementationMove test planning before implementation
Move test planning before implementation
 
The Perfect Neos Project Setup
The Perfect Neos Project SetupThe Perfect Neos Project Setup
The Perfect Neos Project Setup
 
Do you even need to automate the GUI?
Do you even need to automate the GUI? Do you even need to automate the GUI?
Do you even need to automate the GUI?
 
Agile testing
Agile testingAgile testing
Agile testing
 
Testing Metrics - Making your tests visible
Testing Metrics - Making your tests visibleTesting Metrics - Making your tests visible
Testing Metrics - Making your tests visible
 
Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of Services
 
Ansible for Enterprise
Ansible for EnterpriseAnsible for Enterprise
Ansible for Enterprise
 
Security as Code: A DevSecOps Approach
Security as Code: A DevSecOps ApproachSecurity as Code: A DevSecOps Approach
Security as Code: A DevSecOps Approach
 

Similar to Reliability as a Discipline

Embracing Disruption: Adding a Bit of Chaos to Help You Grow
Embracing Disruption: Adding a Bit of Chaos to Help You GrowEmbracing Disruption: Adding a Bit of Chaos to Help You Grow
Embracing Disruption: Adding a Bit of Chaos to Help You GrowPaul Balogh
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsSOASTA
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsSOASTA
 
DevOps in Practice: When does "Practice" Become "Doing"?
DevOps in Practice: When does "Practice" Become "Doing"?DevOps in Practice: When does "Practice" Become "Doing"?
DevOps in Practice: When does "Practice" Become "Doing"?Michael Elder
 
SplunkLive! London 2015 - DevOps Breakout
SplunkLive! London 2015 - DevOps BreakoutSplunkLive! London 2015 - DevOps Breakout
SplunkLive! London 2015 - DevOps BreakoutSplunk
 
Estimating test effort part 1 of 2
Estimating test effort part 1 of 2Estimating test effort part 1 of 2
Estimating test effort part 1 of 2Ian McDonald
 
verification_planning_systemverilog_uvm_2020
verification_planning_systemverilog_uvm_2020verification_planning_systemverilog_uvm_2020
verification_planning_systemverilog_uvm_2020Sameh El-Ashry
 
San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...
San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...
San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...Andrey Falko
 
Observability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptxObservability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptxMagnus Johansson
 
DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015Yuval Yeret
 
Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...
Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...
Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...Agile Greece
 
Why Your Selenium Tests are so Dang Brittle, and What to Do About It
Why Your Selenium Tests are so Dang Brittle, and What to Do About ItWhy Your Selenium Tests are so Dang Brittle, and What to Do About It
Why Your Selenium Tests are so Dang Brittle, and What to Do About ItJay Aho
 
Pay pal paypal continuous performance as a self-service with fully-automated...
Pay pal  paypal continuous performance as a self-service with fully-automated...Pay pal  paypal continuous performance as a self-service with fully-automated...
Pay pal paypal continuous performance as a self-service with fully-automated...Dynatrace
 
Always Be Deploying. How to make R great for machine learning in (not only) E...
Always Be Deploying. How to make R great for machine learning in (not only) E...Always Be Deploying. How to make R great for machine learning in (not only) E...
Always Be Deploying. How to make R great for machine learning in (not only) E...Wit Jakuczun
 
Behaviour Driven Development: Oltre i limiti del possibile
Behaviour Driven Development: Oltre i limiti del possibileBehaviour Driven Development: Oltre i limiti del possibile
Behaviour Driven Development: Oltre i limiti del possibileIosif Itkin
 
Agile Java Testing With Open Source Frameworks
Agile Java Testing With Open Source FrameworksAgile Java Testing With Open Source Frameworks
Agile Java Testing With Open Source FrameworksViraf Karai
 
Appium, Test-Driven Development, and Continuous Integration
Appium, Test-Driven Development, and Continuous IntegrationAppium, Test-Driven Development, and Continuous Integration
Appium, Test-Driven Development, and Continuous IntegrationTechWell
 
224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case Study224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case StudyESEM 2014
 
How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...
How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...
How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...Jennifer Finney
 

Similar to Reliability as a Discipline (20)

Embracing Disruption: Adding a Bit of Chaos to Help You Grow
Embracing Disruption: Adding a Bit of Chaos to Help You GrowEmbracing Disruption: Adding a Bit of Chaos to Help You Grow
Embracing Disruption: Adding a Bit of Chaos to Help You Grow
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and Jenkins
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and Jenkins
 
DevOps in Practice: When does "Practice" Become "Doing"?
DevOps in Practice: When does "Practice" Become "Doing"?DevOps in Practice: When does "Practice" Become "Doing"?
DevOps in Practice: When does "Practice" Become "Doing"?
 
SplunkLive! London 2015 - DevOps Breakout
SplunkLive! London 2015 - DevOps BreakoutSplunkLive! London 2015 - DevOps Breakout
SplunkLive! London 2015 - DevOps Breakout
 
Estimating test effort part 1 of 2
Estimating test effort part 1 of 2Estimating test effort part 1 of 2
Estimating test effort part 1 of 2
 
verification_planning_systemverilog_uvm_2020
verification_planning_systemverilog_uvm_2020verification_planning_systemverilog_uvm_2020
verification_planning_systemverilog_uvm_2020
 
Tce automation-d4
Tce automation-d4Tce automation-d4
Tce automation-d4
 
San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...
San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...
San Francisco Jenkins Area Meetup October 2016: Self-service secure test and ...
 
Observability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptxObservability - Stockholm Splunk UG Jan 19 2023.pptx
Observability - Stockholm Splunk UG Jan 19 2023.pptx
 
DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015DevOps/Flow workshop for agile india 2015
DevOps/Flow workshop for agile india 2015
 
Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...
Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...
Ralph Jocham, effective agile - Scaled Scrum at Swiss Postal Services | Agile...
 
Why Your Selenium Tests are so Dang Brittle, and What to Do About It
Why Your Selenium Tests are so Dang Brittle, and What to Do About ItWhy Your Selenium Tests are so Dang Brittle, and What to Do About It
Why Your Selenium Tests are so Dang Brittle, and What to Do About It
 
Pay pal paypal continuous performance as a self-service with fully-automated...
Pay pal  paypal continuous performance as a self-service with fully-automated...Pay pal  paypal continuous performance as a self-service with fully-automated...
Pay pal paypal continuous performance as a self-service with fully-automated...
 
Always Be Deploying. How to make R great for machine learning in (not only) E...
Always Be Deploying. How to make R great for machine learning in (not only) E...Always Be Deploying. How to make R great for machine learning in (not only) E...
Always Be Deploying. How to make R great for machine learning in (not only) E...
 
Behaviour Driven Development: Oltre i limiti del possibile
Behaviour Driven Development: Oltre i limiti del possibileBehaviour Driven Development: Oltre i limiti del possibile
Behaviour Driven Development: Oltre i limiti del possibile
 
Agile Java Testing With Open Source Frameworks
Agile Java Testing With Open Source FrameworksAgile Java Testing With Open Source Frameworks
Agile Java Testing With Open Source Frameworks
 
Appium, Test-Driven Development, and Continuous Integration
Appium, Test-Driven Development, and Continuous IntegrationAppium, Test-Driven Development, and Continuous Integration
Appium, Test-Driven Development, and Continuous Integration
 
224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case Study224 - Factors Impacting Rapid Releases: An Industrial Case Study
224 - Factors Impacting Rapid Releases: An Industrial Case Study
 
How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...
How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...
How To Introduce Cloud Based Load Testing to Your Jenkins Continuous Delivery...
 

More from Arnold Van Wijnbergen

Contributing Today: Chaos Engineering mini demo Litmus Chaos
Contributing Today: Chaos Engineering mini demo Litmus ChaosContributing Today: Chaos Engineering mini demo Litmus Chaos
Contributing Today: Chaos Engineering mini demo Litmus ChaosArnold Van Wijnbergen
 
Kong Ingress Controller - Fullstaq Show N Tell
Kong Ingress Controller - Fullstaq Show N TellKong Ingress Controller - Fullstaq Show N Tell
Kong Ingress Controller - Fullstaq Show N TellArnold Van Wijnbergen
 
Why Tooling (Only) Isn’t The Answer
Why Tooling (Only) Isn’t The AnswerWhy Tooling (Only) Isn’t The Answer
Why Tooling (Only) Isn’t The AnswerArnold Van Wijnbergen
 
Life of an event - A never ending tool chain
Life of an event - A never ending tool chainLife of an event - A never ending tool chain
Life of an event - A never ending tool chainArnold Van Wijnbergen
 

More from Arnold Van Wijnbergen (6)

Security Analytics with OpenSearch
Security Analytics with OpenSearchSecurity Analytics with OpenSearch
Security Analytics with OpenSearch
 
Contributing Today: Chaos Engineering mini demo Litmus Chaos
Contributing Today: Chaos Engineering mini demo Litmus ChaosContributing Today: Chaos Engineering mini demo Litmus Chaos
Contributing Today: Chaos Engineering mini demo Litmus Chaos
 
Kong Ingress Controller - Fullstaq Show N Tell
Kong Ingress Controller - Fullstaq Show N TellKong Ingress Controller - Fullstaq Show N Tell
Kong Ingress Controller - Fullstaq Show N Tell
 
Why Tooling (Only) Isn’t The Answer
Why Tooling (Only) Isn’t The AnswerWhy Tooling (Only) Isn’t The Answer
Why Tooling (Only) Isn’t The Answer
 
DevOpsDays Amsterdam 2016 workshop
DevOpsDays Amsterdam 2016 workshopDevOpsDays Amsterdam 2016 workshop
DevOpsDays Amsterdam 2016 workshop
 
Life of an event - A never ending tool chain
Life of an event - A never ending tool chainLife of an event - A never ending tool chain
Life of an event - A never ending tool chain
 

Recently uploaded

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 

Recently uploaded (20)

办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Reliability as a Discipline

  • 1. Chaos Engineering practices: Reliability as a Discipline. Friday, 2 July 2021
  • 2. fullstaq.com Doing things with SRE and Observability. Arnold van Wijnbergen, a.vanwijnbergen@fullstaq.com, linkedin.com/in/IlovIT
  • 3. Agenda ● About myself ● Why do we need Chaos Engineering ● Historical facts ● Chaos Engineering as a Discipline ● Key Takeaways
  • 4. About myself ● 20+ years of experience in the IT industry ● Last 8 years doing some DevOps ● Active with Chaos Engineering since 2015 ● Love everything around Observability ● Supporter of Cloud Native ● Currently working as Product Owner @ Albert Heijn ● Responsible for ○ Observability ○ Monitoring ○ Logging ○ Alerting ○ Performance testing ○ Chaos engineering ○ Some Elastic and some Cilium
  • 5. Why do we need Chaos Engineering?
  • 6. Not for restarting nodes or hosts, or destroying our services
  • 7. Learn from failure and mitigate risks by breaking things on purpose
  • 8. Experimenting with failures [in Production] to reveal weaknesses and build confidence in the system's resilience.
  • 9. Chaos Engineering is here to prevent Chaos from happening
  • 10. See it as Disaster Recovery Testing on Steroids
  • 11. Historical facts ● Jesse Robbins, the "Master of Disaster", started Game days at Amazon. ● Netflix introduced the term Chaos (2010) to fill the gap in proper resilience testing in the cloud. ● In 2011 the Simian Army was born, best known for Chaos Monkey. ● In 2016 the Principles of Chaos were made publicly available. ● In 2018 the first ChaosConf was organised by Gremlin (Kolton Andrus). ● Since 2020 Chaos Engineering has been part of the Well-Architected frameworks. See this as the start of major adoption. (Timeline: 2010, 2011, 2016, 2018, 2020, 2021; https://principlesofchaos.org/)
  • 12. Common reality with Distributed Systems: just your ordinary grocery store
  • 13. Reliability becomes a product feature (figure: Innovation, Reliability)
  • 15. It requires more than just tools (figure: Observability, SLO/SLI, Game days, Analysis, Evaluation, CI/CD, Testing, Chaos Tools)
  • 16. But how do we start? ● Start by organising an event such as a Game day with the product team and other relevant teams, such as Incident Command. ● Ensure that the goals and scope of the experiments are set and agreed. ● Ensure that you have enough time to create the hypotheses. ● Run the experiment! ● Evaluate the results and record the evidence. In fact, we are building a discipline.
  • 17. Game days explained ● Creating a culture of experimentation. ● Repeatable exercises to learn from failures. ● A well-known method AWS uses to validate the resilience of its services and its major-incident process. ● Building collaborative trust.
  • 18. Goals made easy by SRE ● As a team, set your objectives. ● Use techniques such as SLOs/SLIs. ● Prepare your Observability systems with the appropriate indicators to measure. ● Use these to validate your expected system behaviour.
  • 19. A simple flow that makes the scientific part work: Hypothesis → Experiment → Deviation → Evidence
  • 20. The same flow with its supporting practices: Hypothesis → Experiment → Deviation → Evidence (figure: SLO/SLI, Game days, Chaos Tools, Observability, Testing, Analysis, Evaluation, CI/CD)
  • 21. Building a hypothesis ● Build a hypothesis around the steady state. ● The steady state is when customers are happy. ● Describe potential or real-world outages that can happen (or have happened) due to infrastructure, application or connectivity failures, and their hard-to-predict cascading effects. ● Get agreement on the blast radius (scope). ● Describe the fault-injection implementation needed to carry out the experiment. ● Choose a strategy for executing the fault-injection experiment. Start small. (Fault types: latency, routing failures, unavailability, connectivity failures, saturation, data corruption)
  • 22. Execute the experiment ● Ensure everybody is well informed. ● Ensure that the Observability tools are ready. ● Record evidence! Start learning from failure!
  • 23. Learn from failure results ● Always ensure experiment results are recorded, (historically) available and analysed. This is your evidence! ● Look for deviations from the steady state, which your Observability system can reveal. ● Ensure that the whole experiment is evaluated, preferably using a post-mortem analysis. ● Extract improvements!
  • 24. Key takeaways ● Chaos Engineering is not only about tools. ● Include multiple disciplines and involve other teams to make the experiment a success. ● Use SRE principles to make experiments easier to implement and analyse. ● Never run your first experiment in production. ● Once you are confident, automate experiments so that they gather evidence periodically. ● Experiments can feed into Progressive Delivery. ● Resilience is the ability to recover, but never forget the user experience.
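The slides describe a scientific loop: hypothesis, experiment, deviation, evidence. They also say to agree on a blast radius and start small. The minimal Python sketch below illustrates that loop. It is not taken from the talk and does not use any real chaos tool. The service is simulated, and the following are all hypothetical: the names `Hypothesis`, `Evidence`, `call_service`, `run_experiment`, `escalate` and `evidence_json`, the 99% SLO, the failure rates and the blast-radius steps. In the sketch, the steady state is an SLI (the ratio of successful requests) checked against an SLO. The blast radius is the fraction of traffic that sees the injected fault. The escalation stops at the first deviation from the steady state.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import List, Sequence


@dataclass
class Hypothesis:
    # Steady state expressed as an SLO on a success-ratio SLI (hypothetical target).
    description: str
    slo: float  # e.g. 0.99 -> at least 99% of requests must succeed


@dataclass
class Evidence:
    # One recorded experiment run: the evidence to analyse afterwards.
    hypothesis: str
    blast_radius: float
    requests: int
    successes: int
    sli: float
    deviation: float  # SLI minus SLO; negative means the steady state was lost
    hypothesis_held: bool


def call_service(rng: random.Random, inject_fault: bool) -> bool:
    """Simulated service call with hypothetical failure rates.

    Baseline: 0.1% of requests fail. An injected fault (for example an
    unavailable dependency) adds 20 percentage points of failures.
    """
    failure_rate = 0.001 + (0.2 if inject_fault else 0.0)
    return rng.random() >= failure_rate


def run_experiment(hyp: Hypothesis, blast_radius: float,
                   requests: int = 10_000, seed: int = 42) -> Evidence:
    if not 0.0 < blast_radius <= 1.0:
        raise ValueError("blast_radius must be in (0, 1]")
    rng = random.Random(seed)
    successes = 0
    for _ in range(requests):
        # Only the agreed fraction of traffic (the blast radius) sees the fault.
        in_scope = rng.random() < blast_radius
        if call_service(rng, inject_fault=in_scope):
            successes += 1
    sli = successes / requests
    return Evidence(
        hypothesis=hyp.description,
        blast_radius=blast_radius,
        requests=requests,
        successes=successes,
        sli=sli,
        deviation=sli - hyp.slo,
        hypothesis_held=sli >= hyp.slo,
    )


def escalate(hyp: Hypothesis,
             radii: Sequence[float] = (0.01, 0.1, 0.5)) -> List[Evidence]:
    """Start small, widen the blast radius step by step, stop on a deviation."""
    log: List[Evidence] = []
    for radius in radii:
        evidence = run_experiment(hyp, radius)
        log.append(evidence)
        if not evidence.hypothesis_held:
            break  # steady state lost: abort, then analyse and extract improvements
    return log


def evidence_json(log: List[Evidence]) -> str:
    # Serialise the evidence so it stays (historically) available for analysis.
    return json.dumps([asdict(e) for e in log], indent=2)


if __name__ == "__main__":
    hyp = Hypothesis("Checkout stays >= 99% successful when a dependency fails", 0.99)
    print(evidence_json(escalate(hyp)))
```

With these made-up parameters, the 1% step keeps the SLI above the SLO. The 10% step breaches it, so the escalation stops and the 50% step never runs. In a real setup, a chaos tool would inject the fault and the Observability system would supply the SLI. The recorded evidence would then feed the post-mortem analysis.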