SlideShare a Scribd company logo
1 of 20
Download to read offline
Root Cause Analysis
Fact and Fiction
Dustin Collins @dustinmm80
CyberArk Conjur
Origins of Root Cause Analysis
- First implemented in 1958 in Toyota manufacturing plants
(5 Whys)
- Has since been adopted and tailored to many other
industries
Why do RCA?
To better understand the underlying causes of problems so
we can address them and prevent them happening again.
When to do RCA?
- When issues happen more than once
- When an outage affects many users
- When a system is not functioning as designed
5 Whys
- Start with a problem statement
- Ask why
- Repeat until root cause is found
Issues with 5 Whys
Incorrect or leading problem statement can point to the wrong
issue.
Not very useful in complex situations, where you can’t answer
why in the moment.
Issues with 5 Whys
It’s not repeatable
- Different people may get different results
- Same people at a different time may get different results
Issues with 5 Whys
Linear thinking leads teams to drive towards one root cause.
There is usually more than one root cause.
Human error is not a valid root cause.
Biases
Hindsight bias
- “Should have known better”.
Outcome bias
- “Since the outcome was bad, the plan was bad”.
What changed?
Cynefin
Conceptual framework created by Dave Snowden to organize
intellectual capital at IBM.
Uses quadrants to organize problems by complexity and
suggests a course of action.
Cynefin
quadrant
steps to take
causality
knowledge
Cynefin
EntropyProgress
Simple/Obvious
Sense Categorize Respond
Alert: Disk almost full Disk full Prune Docker images
Internal webservice is not responding
Complicated
Sense Analyze Respond
Failing pipelines ran at
same time as other builds
Concurrent pipeline
collisions cause node
resource contention
Update pipelines to splay
across nodes
Increasing CI pipeline failures, happens only during the day
Complex
Probe Sense Respond
- Investigate logs
- Inspect exponential
backoff code
- Run load tests
- View existing alerts
- Analyze performance from
load test
Add backpressure throttling
to discovery service
Discovery service became overloaded, triggering cascading failure in
other services
Chaotic
Act Sense Respond
Move instances to another
AZ
Still unable to connect Cannot connect in new AZ
either
Unable to connect to instances in AWS us-east-1a. No AWS service
warnings.
Disorder
Reduce Analyze Iterate
What do we know for sure? What do we agree on? Move to a quadrant,
continue
Stems from a lack of agreement on the problem
Takeaways
- Be aware of the limitations of the RCA techniques you use
- Emergent behavior arises from complexity and increased rate of change
- Consider trying Cynefin to help you approach complex problems
Thank you!
Dustin Collins @dustinmm80

More Related Content

What's hot

Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous DeploymentBrian Henerey
 
How effective feedback can improve your software
How effective feedback can improve your softwareHow effective feedback can improve your software
How effective feedback can improve your softwareSven Peters
 
Community building lessons from Ansible
Community building lessons from AnsibleCommunity building lessons from Ansible
Community building lessons from AnsibleGreg DeKoenigsberg
 
Reliable tests with selenium web driver
Reliable tests with selenium web driverReliable tests with selenium web driver
Reliable tests with selenium web driverPawelPabich
 
DevOps vs The Enterprise
DevOps vs The EnterpriseDevOps vs The Enterprise
DevOps vs The EnterpriseCloudCheckr
 
Silver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security SolutionsSilver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security SolutionsSeniorStoryteller
 
Agile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detectionAgile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detectionMichael Palotas
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
OSGi Best and Worst Practices
OSGi Best and Worst PracticesOSGi Best and Worst Practices
OSGi Best and Worst PracticesChris Aniszczyk
 
JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsBert Jan Schrijver
 
DevOps - Automating Legacy
DevOps - Automating LegacyDevOps - Automating Legacy
DevOps - Automating LegacyDavid Tank
 
Breathing the breath of the monster combining agile and context-driven
Breathing the breath of the monster   combining agile and context-drivenBreathing the breath of the monster   combining agile and context-driven
Breathing the breath of the monster combining agile and context-drivenIlari Henrik Aegerter
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run itSkyscanner
 
A Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and AgileA Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and AgileIlari Henrik Aegerter
 
Agile Software Development for Non-Developers
Agile Software Development for Non-DevelopersAgile Software Development for Non-Developers
Agile Software Development for Non-Developershamvocke
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos EngineeringYury Roa
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureAtlassian
 

What's hot (20)

Continuous Deployment
Continuous DeploymentContinuous Deployment
Continuous Deployment
 
How effective feedback can improve your software
How effective feedback can improve your softwareHow effective feedback can improve your software
How effective feedback can improve your software
 
Community building lessons from Ansible
Community building lessons from AnsibleCommunity building lessons from Ansible
Community building lessons from Ansible
 
Reliable tests with selenium web driver
Reliable tests with selenium web driverReliable tests with selenium web driver
Reliable tests with selenium web driver
 
DevOps vs The Enterprise
DevOps vs The EnterpriseDevOps vs The Enterprise
DevOps vs The Enterprise
 
Silver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security SolutionsSilver Lining for Miles: DevOps for Building Security Solutions
Silver Lining for Miles: DevOps for Building Security Solutions
 
Agile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detectionAgile bodensee - Agile Testing: Bug prevention vs. bug detection
Agile bodensee - Agile Testing: Bug prevention vs. bug detection
 
Introduction to Chaos Engineering
Introduction to Chaos EngineeringIntroduction to Chaos Engineering
Introduction to Chaos Engineering
 
Shift left-testing
Shift left-testingShift left-testing
Shift left-testing
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
OSGi Best and Worst Practices
OSGi Best and Worst PracticesOSGi Best and Worst Practices
OSGi Best and Worst Practices
 
JavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systemsJavaLand 2022 - Debugging distributed systems
JavaLand 2022 - Debugging distributed systems
 
DevOps - Automating Legacy
DevOps - Automating LegacyDevOps - Automating Legacy
DevOps - Automating Legacy
 
Breathing the breath of the monster combining agile and context-driven
Breathing the breath of the monster   combining agile and context-drivenBreathing the breath of the monster   combining agile and context-driven
Breathing the breath of the monster combining agile and context-driven
 
You build it, you run it
You build it, you run itYou build it, you run it
You build it, you run it
 
A Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and AgileA Happy Marriage between Context-Driven and Agile
A Happy Marriage between Context-Driven and Agile
 
Agile Software Development for Non-Developers
Agile Software Development for Non-DevelopersAgile Software Development for Non-Developers
Agile Software Development for Non-Developers
 
Chaos Engineering
Chaos EngineeringChaos Engineering
Chaos Engineering
 
How to Build a Healthy On-Call Culture
How to Build a Healthy On-Call CultureHow to Build a Healthy On-Call Culture
How to Build a Healthy On-Call Culture
 
Delhi first draft
Delhi first draftDelhi first draft
Delhi first draft
 

Similar to Root Cause Analysis: Fact and Fiction

Testing within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris GollopTesting within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris GollopJAXLondon2014
 
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.02014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0Joakim Lindbom
 
30 days or less: New Features to Production
30 days or less: New Features to Production30 days or less: New Features to Production
30 days or less: New Features to ProductionKarthik Gaekwad
 
Selenium Users Anonymous
Selenium Users AnonymousSelenium Users Anonymous
Selenium Users AnonymousDave Haeffner
 
Scaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without FearScaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without FearTechWell
 
Building an automated database deployment pipeline
Building an automated database deployment pipelineBuilding an automated database deployment pipeline
Building an automated database deployment pipelineRed Gate Software
 
Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)ClubHack
 
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Citrix
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosCharity Majors
 
Green Screen ci at Travis Perkins
Green Screen ci at Travis PerkinsGreen Screen ci at Travis Perkins
Green Screen ci at Travis PerkinsBrian Leach
 
Using XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technologyUsing XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technologyniksilver
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroPaul Boos
 
Moving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your CodeMoving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your CodeXebiaLabs
 
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...Joakim Lindbom
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedTyler Treat
 
DevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on KubernetesDevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on KubernetesDavid Hsu
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-outmralexjuarez
 
How to test a Mainframe Application
How to test a Mainframe ApplicationHow to test a Mainframe Application
How to test a Mainframe ApplicationMichael Erichsen
 
DevOps Transition Strategies
DevOps Transition StrategiesDevOps Transition Strategies
DevOps Transition StrategiesAlec Lazarescu
 

Similar to Root Cause Analysis: Fact and Fiction (20)

Testing within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris GollopTesting within an Agile Environment - Beyza Sakir and Chris Gollop
Testing within an Agile Environment - Beyza Sakir and Chris Gollop
 
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.02014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
2014-10 DevOps NFi - Why it's a good idea to deploy 10 times per day v1.0
 
Kanban 101
Kanban 101Kanban 101
Kanban 101
 
30 days or less: New Features to Production
30 days or less: New Features to Production30 days or less: New Features to Production
30 days or less: New Features to Production
 
Selenium Users Anonymous
Selenium Users AnonymousSelenium Users Anonymous
Selenium Users Anonymous
 
Scaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without FearScaling Your Tests: Continued Change Without Fear
Scaling Your Tests: Continued Change Without Fear
 
Building an automated database deployment pipeline
Building an automated database deployment pipelineBuilding an automated database deployment pipeline
Building an automated database deployment pipeline
 
Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)Mere Paas Teensy Hai (Nikhil Mittal)
Mere Paas Teensy Hai (Nikhil Mittal)
 
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
Synergy 2015 Session Slides: SYN408 XenDesktop 7.6 Architecture - Dealing Wit...
 
Chaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just ChaosChaos Engineering Without Observability ... Is Just Chaos
Chaos Engineering Without Observability ... Is Just Chaos
 
Green Screen ci at Travis Perkins
Green Screen ci at Travis PerkinsGreen Screen ci at Travis Perkins
Green Screen ci at Travis Perkins
 
Using XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technologyUsing XP practices on 1960s green screen technology
Using XP practices on 1960s green screen technology
 
DevOps - Boldly Go for Distro
DevOps - Boldly Go for DistroDevOps - Boldly Go for Distro
DevOps - Boldly Go for Distro
 
Moving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your CodeMoving to Continuous Delivery Without Breaking Your Code
Moving to Continuous Delivery Without Breaking Your Code
 
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
2015 10 dev ops n-fi - why it's a good idea to deploy 10 times per day v1.0 -...
 
The Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going DistributedThe Economics of Scale: Promises and Perils of Going Distributed
The Economics of Scale: Promises and Perils of Going Distributed
 
DevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on KubernetesDevOps - Chaos Engineering on Kubernetes
DevOps - Chaos Engineering on Kubernetes
 
Ohio 2012-help-sysad-out
Ohio 2012-help-sysad-outOhio 2012-help-sysad-out
Ohio 2012-help-sysad-out
 
How to test a Mainframe Application
How to test a Mainframe ApplicationHow to test a Mainframe Application
How to test a Mainframe Application
 
DevOps Transition Strategies
DevOps Transition StrategiesDevOps Transition Strategies
DevOps Transition Strategies
 

Recently uploaded

Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMKumar Satyam
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Recently uploaded (20)

Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 

Root Cause Analysis: Fact and Fiction

  • 1. Root Cause Analysis Fact and Fiction Dustin Collins @dustinmm80 CyberArk Conjur
  • 2. Origins of Root Cause Analysis - First implemented in 1958 in Toyota manufacturing plants (5 Whys) - Has since been adopted and tailored to many other industries
  • 3. Why do RCA? To better understand the underlying causes of problems so we can address them and prevent them happening again.
  • 4. When to do RCA? - When issues happen more than once - When an outage affects many users - When a system is not functioning as designed
  • 5. 5 Whys - Start with a problem statement - Ask why - Repeat until root cause is found
  • 6. Issues with 5 Whys Incorrect or leading problem statement can point to the wrong issue. Not very useful in complex situations, where you can’t answer why in the moment.
  • 7. Issues with 5 Whys It’s not repeatable - Different people may get different results - Same people at a different time may get different results
  • 8. Issues with 5 Whys Linear thinking leads teams to drive towards one root cause. There is usually more than one root cause. Human error is not a valid root cause.
  • 9. Biases Hindsight bias - “Should have known better”. Outcome bias - “Since the outcome was bad, the plan was bad”.
  • 11. Cynefin Conceptual framework created by Dave Snowden to organize intellectual capital at IBM. Uses quadrants to organize problems by complexity and suggests a course of action.
  • 14. Simple/Obvious Sense Categorize Respond Alert: Disk almost full Disk full Prune Docker images Internal webservice is not responding
  • 15. Complicated Sense Analyze Respond Failing pipelines ran at same time as other builds Concurrent pipeline collisions cause node resource contention Update pipelines to splay across nodes Increasing CI pipeline failures, happens only during the day
  • 16. Complex Probe Sense Respond - Investigate logs - Inspect exponential backoff code - Run load tests - View existing alerts - Analyze performance from load test Add backpressure throttling to discovery service Discovery service became overloaded, triggering cascading failure in other services
  • 17. Chaotic Act Sense Respond Move instances to another AZ Still unable to connect Cannot connect in new AZ either Unable to connect to instances in AWS us-east-1a. No AWS service warnings.
  • 18. Disorder Reduce Analyze Iterate What do we know for sure? What do we agree on? Move to a quadrant, continue Stems from a lack of agreement on the problem
  • 19. Takeaways - Be aware of the limitations of the RCA techniques you use - Emergent behavior arises from complexity and increased rate of change - Consider trying Cynefin to help you approach complex problems