Making Disaster Routine
Anticipating and Practicing Failures Using Active Monitoring and Chaos Engineering
Peter Varhol and Gerie Owen
Peter Varhol
• International speaker and writer
• Graduate degrees in Math, CS, Psychology
• Technology communicator
• Former university professor, tech journalist
• Cat owner and distance runner
• peter@petervarhol.com
Gerie Owen
• QA Evangelist, test manager
• Subject matter expert on testing for TechTarget’s SearchSoftwareQuality.com
• International and domestic conference presenter
• Marathon runner & running coach
• gowen@qualitestgroup.com
Agenda
• DevOps and disaster
• Preparing for disaster
• Principles of chaos
• Monitoring for disaster
• Getting back on your feet
• Conclusions
What is DevOps?
• Containerized development, rapid iteration with real-time performance insights, intelligent feedback, diagnostic services, an integrated DevOps pipeline, and deployment to the cloud
• My God!
• In layman’s terms:
• We automatically integrate and build every time there is a valid check-in
• We run automated tests at all stages, including production
• We send the app to production when it has been integrated and tested
• Automation makes it all work like a Swiss watch
What is a Disaster?
• A serious disruption, occurring over a relatively short time, that causes losses and impacts exceeding the ability of the affected community or society to cope using its own resources.
• Disruption
• Short timeframe
• Exceeds the ability to cope
What is a Disaster in DevOps?
• Consistency becomes uncertain
• Automated workflow breaks down
• Build fails; smoke tests are blocked
• Server farm goes offline
• Application won’t start again
• Showstopper bug in production
• Anything that disrupts consistency
Preparing for Disaster
• We don’t react well when things go wrong
• Disbelief
• Uncertainty
• Panic
• How can we prepare for the unknown?
We Can Learn from Aircrews
• US Airways Flight 1549
• Sullenberger and Skiles had never flown together before that trip
• Yet worked from established procedures
• Practiced for hundreds of hours
• Immediately turned to checklists
• About three and a half minutes after the bird strike, they were in the Hudson
• You have to practice this
We Can Learn from Aircrews
• Indecision and panic are killers
• Checklists drive decision-making by focusing on essentials
• Courses of action are defined quickly
• Practice makes disasters just another day at the office
• Clear and structured communication is essential
We Can Also Practice Disaster
• Chaos engineering
• Failure scenarios
• Application health monitoring
Chaos Engineering
• Distributed systems at scale
• Experiments to uncover systemic weaknesses
• Define normal behavior
• Set your null and alternative hypotheses
• Introduce variables that reflect real-world events
• servers crash
• hard drives malfunction
• network connections are lost
• Try to disprove the null hypothesis
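The experiment loop above (define normal behavior, inject a real-world failure, try to disprove the null hypothesis) can be sketched as a toy simulation. This is not a real chaos tool: `ToyCluster`, its random-routing-with-retry behavior, and the 99 percent success-rate SLO are hypothetical stand-ins chosen for illustration. The null hypothesis is that crashing one replica leaves the success rate at or above the SLO.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for a replicated service (not a real system)."""
    replicas: int
    retries: int = 1                        # extra attempts after a failed call
    down: set = field(default_factory=set)  # indices of crashed replicas

    def crash_random_replica(self, rng: random.Random) -> int:
        """Failure injection: mark one healthy replica as crashed."""
        healthy = [r for r in range(self.replicas) if r not in self.down]
        victim = rng.choice(healthy)
        self.down.add(victim)
        return victim

    def handle_request(self, rng: random.Random) -> bool:
        """Route to a random replica; on failure, retry with a fresh random pick
        (which may land on the same crashed replica again)."""
        for _ in range(1 + self.retries):
            if rng.randrange(self.replicas) not in self.down:
                return True
        return False


def success_rate(cluster: ToyCluster, rng: random.Random, n: int = 10_000) -> float:
    return sum(cluster.handle_request(rng) for _ in range(n)) / n


def run_experiment(replicas: int = 4, retries: int = 1, slo: float = 0.99, seed: int = 42):
    """Returns (steady_state, under_failure, null_hypothesis_survives)."""
    rng = random.Random(seed)
    cluster = ToyCluster(replicas, retries)
    steady_state = success_rate(cluster, rng)   # 1. measure normal behavior
    cluster.crash_random_replica(rng)           # 2. inject a real-world event
    under_failure = success_rate(cluster, rng)  # 3. observe the system again
    # 4. Null hypothesis: the failure does not push us below the SLO.
    return steady_state, under_failure, under_failure >= slo


if __name__ == "__main__":
    print(run_experiment())
    print(run_experiment(retries=3))
```

With four replicas and one retry, roughly 1 request in 16 fails after the crash, so the null hypothesis is disproved. That is the kind of systemic weakness the experiment is meant to expose. Raising `retries` to 3 cuts the failure rate to about 1 in 256, and re-running the same experiment checks the fix.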
Chaos Engineering
• Practice in production
• Vary real-world events
• Yes, there could be customer impact
• It is incumbent upon the chaos engineer to minimize customer impact
• But learning how failures affect real users is the point of the experiment
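One common way to keep customer impact small while still experimenting in production is to limit the experiment's "blast radius" to a small, stable slice of users. A minimal sketch, assuming users have string IDs; `in_blast_radius` and the experiment name are hypothetical, and hash-based bucketing is only one possible technique:

```python
import hashlib


def in_blast_radius(user_id: str, experiment: str, percent: float = 1.0) -> bool:
    """Deterministically place roughly `percent` % of users into the experiment.

    Hashing the experiment name together with the user ID means a given user
    always lands in the same bucket for a given experiment, while different
    experiments will generally pick different users.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % 10_000
    return bucket < percent * 100  # 10,000 buckets, so 1 % == 100 buckets


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(100_000)]
    affected = sum(in_blast_radius(u, "kill-one-replica") for u in users)
    print(f"{affected} of {len(users)} users in the experiment")
```

Starting at 1 percent (or less) and widening only after the steady state holds is one way to "ease into" chaos engineering.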
Chaos Monkey
• Grew into the Simian Army
• Developed by Netflix
• Causes breakdowns in their production environment
• Now consists of a variety of tools
• It’s all about resiliency
• Can our application survive?
Practice Failure Scenarios
• Each team member contributes one or more scenarios
• The more unlikely, the better
• Write up the scenarios
• Only the team leader sees them beforehand
• They can be failures you have actually experienced, or thought exercises
Practice Failure Scenarios
• Describe the scenarios to the team
• “Load is remaining constant but performance is gradually deteriorating. We’re starting to get 404 and related errors. The server farm seems to be operating correctly; it’s an application issue. Pings are slowing down, but not drastically.”
• How do we diagnose and address?
We Don’t Need Another Hero
• Heroes use superhuman efforts to fix a disaster
• In doing so, they break with team conventions
• Teams function better together
• If a team has a hero:
• the team may not try as hard in the future
• the hero is not replicable
• the hero can’t solve every problem
Monitor Application Health in Production
• Ping just doesn’t cut it anymore
• Availability and performance data
• Synthetic testing
• Health over time
• Track trends in performance, page paint times, database calls
• Anything else that reveals health trends
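Tracking health over time is what catches the "gradually deteriorating, pings slowing but not drastically" kind of failure that a single ping or snapshot misses. A minimal sketch, assuming response times arrive one sample at a time; `LatencyTrend`, the window size, and the 1 ms-per-sample threshold are hypothetical values to tune for a real system:

```python
from collections import deque


class LatencyTrend:
    """Hypothetical tracker: keeps the last `window` response times (ms) and
    reports their least-squares slope in milliseconds per sample."""

    def __init__(self, window: int = 60):
        self.samples = deque(maxlen=window)  # oldest samples fall off automatically

    def add(self, latency_ms: float) -> None:
        self.samples.append(latency_ms)

    def slope(self) -> float:
        n = len(self.samples)
        if n < 2:
            return 0.0  # not enough data to speak of a trend
        xs = range(n)
        mean_x = (n - 1) / 2
        mean_y = sum(self.samples) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, self.samples))
        den = sum((x - mean_x) ** 2 for x in xs)
        return num / den

    def degrading(self, max_slope_ms: float = 1.0) -> bool:
        """True when latency is climbing faster than `max_slope_ms` per sample."""
        return self.slope() > max_slope_ms


if __name__ == "__main__":
    trend = LatencyTrend(window=10)
    for i in range(10):
        trend.add(200 + 3 * i)  # slow, steady creep: no single sample looks alarming
    print(trend.slope(), trend.degrading())
```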
Directions for Monitoring
• Watermarks for action
• E.g., 25 percent of pages take longer than 2 seconds to load
• AI for prediction
• Based on similar results in the past, the application is likely to fail in six hours
• Analytics for trends
• A combination of six measures indicates unhealthy trends
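The first direction, a watermark for action, reduces to a simple rule. A minimal sketch of the slide's example (act when 25 percent of page loads take longer than 2 seconds); the function name and treating exactly 25 percent as a breach are assumptions:

```python
from typing import Sequence


def watermark_breached(load_times_s: Sequence[float],
                       threshold_s: float = 2.0,
                       max_fraction: float = 0.25) -> bool:
    """True when at least `max_fraction` of page loads took longer than `threshold_s`."""
    if not load_times_s:
        return False  # no data is not the same as healthy; alert on missing data separately
    slow = sum(1 for t in load_times_s if t > threshold_s)
    return slow / len(load_times_s) >= max_fraction
```

Evaluated over a sliding window (for example, the last five minutes of synthetic or real page loads), a breach becomes the trigger for the team's checklist rather than for improvisation.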
The Power of Checklists
• Checklists are part of our daily lives
• They
• relieve the cognitive load of remembering to-dos
• organize complicated decision-making
• reduce risk in complicated activities by ensuring that critical tasks are not overlooked
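Keeping a checklist as data makes the "critical tasks are not overlooked" property checkable instead of a matter of memory. A minimal sketch; `Checklist`, `Item`, and the example incident items are hypothetical and not taken from any particular tool:

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    text: str
    critical: bool = False
    done: bool = False


@dataclass
class Checklist:
    name: str
    items: list = field(default_factory=list)

    def check(self, text: str) -> None:
        """Mark an item done; an unknown item is an error, not a silent no-op."""
        for item in self.items:
            if item.text == text:
                item.done = True
                return
        raise KeyError(f"{text!r} is not on checklist {self.name!r}")

    def outstanding(self, critical_only: bool = False) -> list:
        """Texts of items not yet done (optionally only the critical ones)."""
        return [i.text for i in self.items
                if not i.done and (i.critical or not critical_only)]

    def can_sign_off(self) -> bool:
        """Sign-off is blocked until every critical item is done."""
        return not self.outstanding(critical_only=True)


if __name__ == "__main__":
    incident = Checklist("Production outage", [
        Item("Name an incident lead", critical=True),
        Item("Freeze deployments", critical=True),
        Item("Post status update to stakeholders"),
    ])
    incident.check("Name an incident lead")
    print(incident.can_sign_off(), incident.outstanding())
```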
Using Checklists in DevOps
• Checklists can be used to:
• Replace Test Cases
• Supplement Test Cases
• Verify Entry and Exit Criteria
• Sanity Testing
• Ambiguity Reviews
• Dev Estimates
Types of Checklists
• Project Set Up
• Application Specific Regression
• Process-type specific
• Website Graphics
• Browser Dependencies
• Usability checks
What Does Thinking Of Failure Accomplish?
• Failure doesn’t come as a surprise
• Without preparation, it surprises us all too often
• We have procedures to deal with failure
• We have practice dealing with failure
• Failure is just another day at the office
A Final Lesson
• You are not alone
Conclusions
• Things will go wrong
• Don’t yell or panic
• Practice non-conforming situations regularly
• Make up unlikely scenarios; chances are they will happen
• Structured practices and communications may make work boring, but they help when things start going wrong
• Ease into chaos engineering for resiliency
• Use your experiences to create checklists
Making disaster routine

More Related Content

What's hot

Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
Continuous Integration Is for Everyone—Especially DevOps
Continuous Integration Is for Everyone—Especially DevOpsContinuous Integration Is for Everyone—Especially DevOps
Continuous Integration Is for Everyone—Especially DevOpsTechWell
 
Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous DeploymentBrian Henerey
 
Robert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software HealthRobert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software HealthAnna Royzman
 
DevOps Picc12 Management Talk
DevOps Picc12 Management TalkDevOps Picc12 Management Talk
DevOps Picc12 Management TalkMichael Rembetsy
 
Quality at Speed - Penny Wyatt
Quality at Speed - Penny WyattQuality at Speed - Penny Wyatt
Quality at Speed - Penny WyattAtlassian
 
Testing in a Continuous World
Testing in a Continuous WorldTesting in a Continuous World
Testing in a Continuous WorldLisi Hocke
 
The Business Case for DevOps - Justifying the Journey
The Business Case for DevOps - Justifying the JourneyThe Business Case for DevOps - Justifying the Journey
The Business Case for DevOps - Justifying the JourneyXebiaLabs
 
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioSLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioDevOpsDays Tel Aviv
 
Site Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa
Site Reliability Engineering (SRE) - Tech Talk by Keet SugathadasaSite Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa
Site Reliability Engineering (SRE) - Tech Talk by Keet SugathadasaKeet Sugathadasa
 
SDET approach for Agile Testing
SDET approach for Agile TestingSDET approach for Agile Testing
SDET approach for Agile TestingGopikrishna Kannan
 
Nf final chef-lisa-metrics-2015-ss
Nf final chef-lisa-metrics-2015-ssNf final chef-lisa-metrics-2015-ss
Nf final chef-lisa-metrics-2015-ssNicole Forsgren
 
Anatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsAnatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsRandy Shoup
 
TestDriven Development, Why How and Smells
TestDriven Development, Why How and SmellsTestDriven Development, Why How and Smells
TestDriven Development, Why How and SmellsProwareness
 
Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesYelp Engineering
 
Moving QA from Reactive to Proactive with qTest
Moving QA from Reactive to Proactive  with qTestMoving QA from Reactive to Proactive  with qTest
Moving QA from Reactive to Proactive with qTestQASymphony
 
Soft Skills You Need Are Not Always Taught in Class
Soft Skills You Need Are Not Always Taught in ClassSoft Skills You Need Are Not Always Taught in Class
Soft Skills You Need Are Not Always Taught in ClassTechWell
 
DevOPs Transformation Workshop
DevOPs Transformation WorkshopDevOPs Transformation Workshop
DevOPs Transformation WorkshopJules Pierre-Louis
 

What's hot (20)

Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
Continuous Integration Is for Everyone—Especially DevOps
Continuous Integration Is for Everyone—Especially DevOpsContinuous Integration Is for Everyone—Especially DevOps
Continuous Integration Is for Everyone—Especially DevOps
 
Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous Deployment
 
Robert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software HealthRobert and Anne Sabourin: Gauging Software Health
Robert and Anne Sabourin: Gauging Software Health
 
DevOps Picc12 Management Talk
DevOps Picc12 Management TalkDevOps Picc12 Management Talk
DevOps Picc12 Management Talk
 
Quality at Speed - Penny Wyatt
Quality at Speed - Penny WyattQuality at Speed - Penny Wyatt
Quality at Speed - Penny Wyatt
 
Testing in a Continuous World
Testing in a Continuous WorldTesting in a Continuous World
Testing in a Continuous World
 
The Business Case for DevOps - Justifying the Journey
The Business Case for DevOps - Justifying the JourneyThe Business Case for DevOps - Justifying the Journey
The Business Case for DevOps - Justifying the Journey
 
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioSLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
 
Site Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa
Site Reliability Engineering (SRE) - Tech Talk by Keet SugathadasaSite Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa
Site Reliability Engineering (SRE) - Tech Talk by Keet Sugathadasa
 
SDET approach for Agile Testing
SDET approach for Agile TestingSDET approach for Agile Testing
SDET approach for Agile Testing
 
Nf final chef-lisa-metrics-2015-ss
Nf final chef-lisa-metrics-2015-ssNf final chef-lisa-metrics-2015-ss
Nf final chef-lisa-metrics-2015-ss
 
NYC MeetUp 10.9
NYC MeetUp 10.9NYC MeetUp 10.9
NYC MeetUp 10.9
 
Anatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and LessonsAnatomy of Three Incidents -- Commonalities and Lessons
Anatomy of Three Incidents -- Commonalities and Lessons
 
TestDriven Development, Why How and Smells
TestDriven Development, Why How and SmellsTestDriven Development, Why How and Smells
TestDriven Development, Why How and Smells
 
Microservices Summit - The Human Side of Services
Microservices Summit - The Human Side of ServicesMicroservices Summit - The Human Side of Services
Microservices Summit - The Human Side of Services
 
Moving QA from Reactive to Proactive with qTest
Moving QA from Reactive to Proactive  with qTestMoving QA from Reactive to Proactive  with qTest
Moving QA from Reactive to Proactive with qTest
 
DevOps: Hype or Hope
DevOps: Hype or HopeDevOps: Hype or Hope
DevOps: Hype or Hope
 
Soft Skills You Need Are Not Always Taught in Class
Soft Skills You Need Are Not Always Taught in ClassSoft Skills You Need Are Not Always Taught in Class
Soft Skills You Need Are Not Always Taught in Class
 
DevOPs Transformation Workshop
DevOPs Transformation WorkshopDevOPs Transformation Workshop
DevOPs Transformation Workshop
 

Similar to Making disaster routine

Tester Challenges in Agile ?
Tester Challenges in Agile ?Tester Challenges in Agile ?
Tester Challenges in Agile ?alind tiwari
 
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​Eggplant
 
Agile Acceptance testing with Fitnesse
Agile Acceptance testing with FitnesseAgile Acceptance testing with Fitnesse
Agile Acceptance testing with FitnesseClareMcLennan
 
Using Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesUsing Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesPeter Varhol
 
Adopting Agile
Adopting AgileAdopting Agile
Adopting AgileCoverity
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run itSkyscanner
 
How do we fix testing
How do we fix testingHow do we fix testing
How do we fix testingPeter Varhol
 
Java DevOps at Enterprise Scale
Java DevOps at Enterprise ScaleJava DevOps at Enterprise Scale
Java DevOps at Enterprise ScaleRyan McGuinness
 
Will The Test Leaders Stand Up?
Will The Test Leaders Stand Up?Will The Test Leaders Stand Up?
Will The Test Leaders Stand Up?Paul Gerrard
 
Mastering Complex Application Deployments
Mastering Complex Application DeploymentsMastering Complex Application Deployments
Mastering Complex Application DeploymentsIBM UrbanCode Products
 
Scaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON TutorialScaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON Tutorialduleepa
 
Agile Transformation: People, Process and Tools to Make Your Transformation S...
Agile Transformation: People, Process and Tools to Make Your Transformation S...Agile Transformation: People, Process and Tools to Make Your Transformation S...
Agile Transformation: People, Process and Tools to Make Your Transformation S...QASymphony
 

Similar to Making disaster routine (20)

Tester Challenges in Agile ?
Tester Challenges in Agile ?Tester Challenges in Agile ?
Tester Challenges in Agile ?
 
Agile process
Agile processAgile process
Agile process
 
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
Strategy vs. Tactical Testing: Actions for Today, Plans for Tomorrow​
 
Agile Acceptance testing with Fitnesse
Agile Acceptance testing with FitnesseAgile Acceptance testing with Fitnesse
Agile Acceptance testing with Fitnesse
 
Using Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps PracticesUsing Machine Learning to Optimize DevOps Practices
Using Machine Learning to Optimize DevOps Practices
 
Software Testing
Software TestingSoftware Testing
Software Testing
 
Adopting Agile
Adopting AgileAdopting Agile
Adopting Agile
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run it
 
Fundamental of testing
Fundamental of testingFundamental of testing
Fundamental of testing
 
Unit 1.pptx
Unit 1.pptxUnit 1.pptx
Unit 1.pptx
 
How do we fix testing
How do we fix testingHow do we fix testing
How do we fix testing
 
Software testing - An Overview
Software testing - An OverviewSoftware testing - An Overview
Software testing - An Overview
 
Java DevOps at Enterprise Scale
Java DevOps at Enterprise ScaleJava DevOps at Enterprise Scale
Java DevOps at Enterprise Scale
 
Will The Test Leaders Stand Up?
Will The Test Leaders Stand Up?Will The Test Leaders Stand Up?
Will The Test Leaders Stand Up?
 
Mastering Complex Application Deployments
Mastering Complex Application DeploymentsMastering Complex Application Deployments
Mastering Complex Application Deployments
 
Scaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON TutorialScaling a Web Site - OSCON Tutorial
Scaling a Web Site - OSCON Tutorial
 
Invite the tester to the party
Invite the tester to the partyInvite the tester to the party
Invite the tester to the party
 
Working Effectively with PeopleSoft Support
Working Effectively with PeopleSoft SupportWorking Effectively with PeopleSoft Support
Working Effectively with PeopleSoft Support
 
Istqb foundation level day 1
Istqb foundation level   day 1Istqb foundation level   day 1
Istqb foundation level day 1
 
Agile Transformation: People, Process and Tools to Make Your Transformation S...
Agile Transformation: People, Process and Tools to Make Your Transformation S...Agile Transformation: People, Process and Tools to Make Your Transformation S...
Agile Transformation: People, Process and Tools to Make Your Transformation S...
 

More from Peter Varhol

Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesPeter Varhol
 
DevOps and the Impostor Syndrome
DevOps and the Impostor SyndromeDevOps and the Impostor Syndrome
DevOps and the Impostor SyndromePeter Varhol
 
Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesPeter Varhol
 
162 the technologist of the future
162   the technologist of the future162   the technologist of the future
162 the technologist of the futurePeter Varhol
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causationPeter Varhol
 
Digital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisDigital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisPeter Varhol
 
Testing for cognitive bias in ai systems
Testing for cognitive bias in ai systemsTesting for cognitive bias in ai systems
Testing for cognitive bias in ai systemsPeter Varhol
 
What Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsWhat Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsPeter Varhol
 
Identifying and measuring testing debt
Identifying and measuring testing debtIdentifying and measuring testing debt
Identifying and measuring testing debtPeter Varhol
 
What aircrews can teach devops teams ignite
What aircrews can teach devops teams igniteWhat aircrews can teach devops teams ignite
What aircrews can teach devops teams ignitePeter Varhol
 
Talking to people lightning
Talking to people lightningTalking to people lightning
Talking to people lightningPeter Varhol
 
Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Peter Varhol
 
Qa test managed_code_varhol
Qa test managed_code_varholQa test managed_code_varhol
Qa test managed_code_varholPeter Varhol
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatracePeter Varhol
 
Talking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolTalking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolPeter Varhol
 
Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Peter Varhol
 

More from Peter Varhol (16)

Not fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational valuesNot fair! testing AI bias and organizational values
Not fair! testing AI bias and organizational values
 
DevOps and the Impostor Syndrome
DevOps and the Impostor SyndromeDevOps and the Impostor Syndrome
DevOps and the Impostor Syndrome
 
Not fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational valuesNot fair! testing ai bias and organizational values
Not fair! testing ai bias and organizational values
 
162 the technologist of the future
162   the technologist of the future162   the technologist of the future
162 the technologist of the future
 
Correlation does not mean causation
Correlation does not mean causationCorrelation does not mean causation
Correlation does not mean causation
 
Digital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolisDigital transformation through devops dod indianapolis
Digital transformation through devops dod indianapolis
 
Testing for cognitive bias in ai systems
Testing for cognitive bias in ai systemsTesting for cognitive bias in ai systems
Testing for cognitive bias in ai systems
 
What Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing TeamsWhat Aircrews Can Teach Testing Teams
What Aircrews Can Teach Testing Teams
 
Identifying and measuring testing debt
Identifying and measuring testing debtIdentifying and measuring testing debt
Identifying and measuring testing debt
 
What aircrews can teach devops teams ignite
What aircrews can teach devops teams igniteWhat aircrews can teach devops teams ignite
What aircrews can teach devops teams ignite
 
Talking to people lightning
Talking to people lightningTalking to people lightning
Talking to people lightning
 
Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011Varhol oracle database_firewall_oct2011
Varhol oracle database_firewall_oct2011
 
Qa test managed_code_varhol
Qa test managed_code_varholQa test managed_code_varhol
Qa test managed_code_varhol
 
Testing a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatraceTesting a movingtarget_quest_dynatrace
Testing a movingtarget_quest_dynatrace
 
Talking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps toolTalking to people: the forgotten DevOps tool
Talking to people: the forgotten DevOps tool
 
Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012Moneyball peter varhol_starwest2012
Moneyball peter varhol_starwest2012
 

Recently uploaded

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Recently uploaded (20)

Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf

Making disaster routine

  • 1. Making Disaster Routine Anticipating and Practicing Failures Using Active Monitoring and Chaos Engineering Peter Varhol and Gerie Owen
  • 2. About me • International speaker and writer • Graduate degrees in Math, CS, Psychology • Technology communicator • Former university professor, tech journalist • Cat owner and distance runner • peter@petervarhol.com
  • 3. Gerie Owen • QA Evangelist, test manager • Subject matter expert on testing for TechTarget’s SearchSoftwareQuality.com • International and domestic conference presenter • Marathon runner & running coach • gowen@qualitestgroup.com
  • 4. Agenda • DevOps and disaster • Preparing for disaster • Principles of chaos • Monitoring for disaster • Getting back on your feet • Conclusions
  • 5. What is DevOps? • Containerized development, rapid iteration with real-time performance insights, intelligent feedback, diagnostic services, an integrated DevOps pipeline, and deployment to the cloud • My God! • In layman’s terms: • We automatically integrate and build every time there is a valid check-in • We run automated tests at all stages, including production • We send the app to production when it has been integrated and tested • Automation makes it all work like a Swiss watch
  • 6. What is a Disaster? • A serious disruption, occurring over a relatively short time, that causes losses and impacts exceeding the ability of the affected community or society to cope using its own resources • Disruption • Short timeframe • Exceeds the ability to cope
  • 7. What is a Disaster? • Consistency becomes uncertain • Automated workflow breaks down • Build fails; smoke tests are blocked • Server farm goes offline • Application won’t start again • Showstopper bug in production • Anything that disrupts consistency
  • 8. Preparing for Disaster • We don’t react well when things go wrong • Disbelief • Uncertainty • Panic • How can we prepare for the unknown?
  • 9. We Can Learn from Aircrews • US Airways Flight 1549 • Sullenberger and Skiles had never met before that day • Yet worked from established procedures • Practiced for hundreds of hours • Immediately turned to checklists • About three and a half minutes after the bird strike, they were in the Hudson • You have to practice this
  • 10. We Can Learn From Aircrews • Indecision and panic are killers • Checklists drive decision-making by focusing on essentials • Courses of action are defined fast • Practice makes disasters just another day in the office • Clear and structured communications are essential
  • 11. We Can Also Practice Disaster • Chaos engineering • Failure scenarios • Application health monitoring
  • 12. Chaos Engineering • Distributed systems at scale • Experiments to uncover systemic weaknesses • Defining normal behavior • Set your null and alternative hypotheses • Introduce variables that reflect real-world events • servers crash • hard drives malfunction • network connections are lost • Try to disprove the null hypothesis
  • 13. Chaos Engineering • Practice in production • Vary real world events • Yes, there could be customer impact • It is incumbent upon the chaos engineer to minimize customer impact • But that is the point of the experiment
  • 14. Chaos Monkey • Developed by Netflix • Causes breakdowns in Netflix’s production environment by randomly terminating instances • Now part of the Simian Army, a suite consisting of a variety of tools • It’s all about resiliency • Can our application survive?
  • 15. Practice Failure Scenarios • Each team member contributes one or more scenarios • The more unlikely, the better • Write up the scenarios • Only the team leader sees them beforehand • They can be real failures experienced or thought exercises
  • 16. Practice Failure Scenarios • Describe the scenarios to the team • “Load is remaining constant but performance is gradually deteriorating. We’re starting to get 404 and related errors. The server farm seems to be operating correctly; it’s an application issue. Pings are slowing down, but not drastically.” • How do we diagnose and address?
  • 17. We Don’t Need Another Hero • Heroes use superhuman efforts to fix a disaster • In doing so, they break with team conventions • Teams function better together • If a team has a hero: • the team may not try as hard in the future • the hero is not replicable • the hero can’t solve every problem
  • 18. Monitor Application Health in Production • Ping just doesn’t cut it any more • Availability and performance data • Synthetic testing • Health over time • Track trends of performance, page painting, database calls • Whatever might give you health trends
  • 19. Directions for Monitoring • Watermarks for action • E.g., 25 percent of pages take longer than 2 seconds to load • AI for prediction • Based on similar results in the past, the application is likely to fail in six hours • Analytics for trends • A combination of six measures indicates unhealthy trends
  • 20. The Power of Checklists • Checklists are part of our daily lives • They • relieve the cognitive load of remembering to-dos • organize complicated decision-making • reduce risk in complicated activities by ensuring that critical tasks are not overlooked
  • 22. Using Checklists in DevOps • Checklists can be used to: • Replace Test Cases • Supplement Test Cases • Verify Entry and Exit Criteria • Sanity Testing • Ambiguity Reviews • Dev Estimates
  • 23. Types of Checklists • Project Set Up • Application Specific Regression • Process-type specific • Website Graphics • Browser Dependencies • Usability checks
  • 24. What Does Thinking Of Failure Accomplish? • Failure doesn’t come as a surprise • Without preparation, it does so all too often • We have procedures to deal with failure • We have practice dealing with failure • Failure is just another day at the office
  • 25. A Final Lesson • You are not alone
  • 26. Conclusions • Things will go wrong • Don’t yell or panic • Practice non-conforming situations regularly • Make up unlikely scenarios; chances are they will happen • Structured practices and communications may make work boring, but they help when things start going wrong • Ease into chaos engineering for resiliency • Use your experiences to create checklists
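Slides 12 and 13 describe a chaos experiment: define normal behavior, state hypotheses, inject a real-world fault, and try to disprove the null hypothesis while keeping customer impact small. The Python sketch below shows one way to score such an experiment. It assumes the steady-state metric is the request success rate, that a small hash-selected fraction of users forms the experimental group, and that the tolerance and abort thresholds (1 percentage point and 90 percent) are illustrative values rather than recommendations from the talk. All function and field names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def in_experiment(user_id: str, fraction: float = 0.01) -> bool:
    """Deterministically place a small fraction of users in the experimental group.

    Hashing the (hypothetical) user ID keeps each user in the same group for the
    whole experiment; a small fraction limits customer impact (the blast radius).
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # integer in [0, 2**64)
    return bucket < fraction * 2**64


def success_rate(outcomes: list[bool]) -> float:
    """Fraction of requests that succeeded (True = success)."""
    if not outcomes:
        raise ValueError("no requests observed")
    return sum(outcomes) / len(outcomes)


@dataclass
class ExperimentResult:
    control_rate: float
    experiment_rate: float
    should_abort: bool    # the fault is hurting customers more than we accept
    null_rejected: bool   # steady state did NOT hold under the injected fault


def evaluate(control: list[bool], experiment: list[bool],
             tolerance: float = 0.01, abort_below: float = 0.90) -> ExperimentResult:
    """Compare the control group with the group that received the injected fault."""
    control_rate = success_rate(control)
    experiment_rate = success_rate(experiment)
    return ExperimentResult(
        control_rate=control_rate,
        experiment_rate=experiment_rate,
        should_abort=experiment_rate < abort_below,
        null_rejected=(control_rate - experiment_rate) > tolerance,
    )


if __name__ == "__main__":
    control = [True] * 99 + [False]          # 99% success, no fault injected
    experiment = [True] * 95 + [False] * 5   # 95% success while a server is "crashed"
    result = evaluate(control, experiment)
    print(result.null_rejected, result.should_abort)  # True False
```

Here the 4-point drop in success rate disproves the null hypothesis that normal behavior survives the fault, which is the weakness the experiment was looking for, while the experimental group stays above the abort threshold.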
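Slide 14 mentions Netflix’s Chaos Monkey. The sketch below is not Chaos Monkey’s code or configuration; it only simulates the core idea of randomly terminating instances in production. The `Instance` model, the one-victim-per-group rule, and the opt-out flag are assumptions for illustration, and "termination" just flips a flag where a real tool would call a cloud provider’s API.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    group: str               # e.g. a cluster or auto-scaling group name (hypothetical)
    opted_out: bool = False  # instances the team has excluded from chaos
    running: bool = True


def pick_victims(fleet: list[Instance], rng: random.Random) -> list[Instance]:
    """Pick at most one eligible (running, not opted-out) instance per group."""
    eligible: dict[str, list[Instance]] = {}
    for inst in fleet:
        if inst.running and not inst.opted_out:
            eligible.setdefault(inst.group, []).append(inst)
    # Sort group names so the choice is reproducible for a given seed.
    return [rng.choice(eligible[group]) for group in sorted(eligible)]


def unleash(fleet: list[Instance], rng: random.Random) -> list[Instance]:
    """Simulate termination; a real tool would call the provider's API here."""
    victims = pick_victims(fleet, rng)
    for victim in victims:
        victim.running = False
    return victims


if __name__ == "__main__":
    fleet = [
        Instance("api-1", "api"), Instance("api-2", "api"), Instance("api-3", "api"),
        Instance("web-1", "web"), Instance("web-2", "web", opted_out=True),
        Instance("batch-1", "batch", opted_out=True),
    ]
    for victim in unleash(fleet, random.Random(42)):
        print(f"terminated {victim.instance_id} in group {victim.group}")
```

Passing in a seeded `random.Random` keeps a practice run repeatable, which helps when the team reviews afterward how the application survived.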
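Slide 19 gives a watermark example: act when 25 percent of pages take longer than 2 seconds to load. A minimal sketch of that check, assuming page load times are already collected in seconds (for example by synthetic tests) and that reaching the watermark exactly counts as a breach; the function names are hypothetical:

```python
def slow_fraction(load_times_s: list[float], threshold_s: float = 2.0) -> float:
    """Fraction of page loads that took longer than threshold_s seconds."""
    if not load_times_s:
        return 0.0
    slow = sum(1 for t in load_times_s if t > threshold_s)
    return slow / len(load_times_s)


def breaches_watermark(load_times_s: list[float], threshold_s: float = 2.0,
                       watermark: float = 0.25) -> bool:
    """True when at least `watermark` of the page loads exceed `threshold_s`."""
    return slow_fraction(load_times_s, threshold_s) >= watermark


if __name__ == "__main__":
    recent = [0.8, 1.2, 2.4, 1.9, 3.1, 0.9, 1.1, 2.2]  # seconds, illustrative only
    print(f"{slow_fraction(recent):.1%} slow; act now: {breaches_watermark(recent)}")
```

A fixed watermark like this is easy to put on a checklist: when it fires, the team already knows which procedure to start.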
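Slide 18 calls for tracking health over time, and slide 19 mentions predicting that "the application is likely to fail in six hours." The talk attributes that prediction to AI; the sketch below swaps in a much simpler technique, an ordinary least-squares line fitted to timestamped samples of a health metric and extrapolated to a watermark. The hourly sampling is an assumption, and the 0.25 default mirrors the 25 percent watermark on slide 19.

```python
from typing import Optional


def hours_until_breach(samples: list[tuple[float, float]],
                       watermark: float = 0.25) -> Optional[float]:
    """Estimate hours after the latest sample until a fitted line reaches `watermark`.

    samples: (hour, metric) pairs, e.g. the hourly fraction of slow page loads.
    Returns 0.0 if the fitted value is already at or past the watermark, and
    None if there is too little data or the trend is flat or improving.
    """
    if len(samples) < 2:
        return None
    xs = [hour for hour, _ in samples]
    ys = [value for _, value in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # every sample has the same timestamp; no trend to fit
    # Least-squares slope and intercept of metric versus hour.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    last_hour = max(xs)
    if intercept + slope * last_hour >= watermark:
        return 0.0
    if slope <= 0:
        return None
    return (watermark - intercept) / slope - last_hour


if __name__ == "__main__":
    hourly = [(0, 0.05), (1, 0.07), (2, 0.09), (3, 0.11)]  # illustrative data
    print(hours_until_breach(hourly))
```

A straight line is a crude model; it is only meant to show how "health trends" can turn into an early warning that gives the team time to work through a checklist before the watermark is crossed.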
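Slides 20 and 22 describe checklists that keep critical tasks from being overlooked and that verify entry and exit criteria. One way to encode an exit-criteria checklist, assuming items are identified by their description and some are marked critical; the items in `RELEASE_EXIT` are hypothetical examples, not a list from the talk:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChecklistItem:
    description: str
    critical: bool = False  # critical items must be done before we exit


def review(checklist: list[ChecklistItem], done: set[str]) -> tuple[list[str], list[str]]:
    """Return (missing critical items, missing non-critical items) in checklist order."""
    missing = [item for item in checklist if item.description not in done]
    critical = [item.description for item in missing if item.critical]
    other = [item.description for item in missing if not item.critical]
    return critical, other


def exit_criteria_met(checklist: list[ChecklistItem], done: set[str]) -> bool:
    """Exit is allowed only when no critical item is missing."""
    critical, _ = review(checklist, done)
    return not critical


RELEASE_EXIT = [
    ChecklistItem("Smoke tests pass", critical=True),
    ChecklistItem("No open showstopper bugs", critical=True),
    ChecklistItem("Rollback procedure verified", critical=True),
    ChecklistItem("Release notes updated"),
]


if __name__ == "__main__":
    done = {"Smoke tests pass", "No open showstopper bugs"}
    critical, other = review(RELEASE_EXIT, done)
    print("blocked by:", critical, "| follow up:", other)
```

Keeping the checklist as data rather than in someone’s head is the point: anyone on the team can run it, so no hero is needed to remember the critical steps.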