The Troubleshooting Chart 
James Wing 
Seattle DevOps Meetup 
September 30, 2014
Agenda 
• The Troubleshooting Chart 
• Process 
• Mindset 
• Chart Components 
– Success 
– Timeline 
• Communication
Data Feeds
[Chart: Successful Feeds (0 to 100K) vs. Time, with “Break” and “Fix” events marked.]
The Troubleshooting Chart
[Chart: Success Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Now it’s Busted”, “Working Again”. Events: “Break”, “Fix”.]
Typical Process 
① “Try it again” 
② Reboot, restart, refresh, etc. 
③ Look for obvious errors 
④ Search knowledge base, Google 
⑤ Check recent release notes 
⑥ Draw a chart!
Example: Late Reports
Customers: “Reports are not available until late in the day”
Developers: “Logs show ETL processing finished at 9 AM. No errors.”
Customers: “It’s 2 PM and still not ready”
Aaahh!
Superstition
Superstition vs. Denial
Customer
• Expectations of how it should work
• Experience of how it does or does not work
Developer
• Superstition!
• No repro steps?
• Which component (dev)?
• What is the impact/priority?
Late Reports
[Chart: Success Rate (0% to 100%) vs. Time, where success is “Ready by 10 AM?”. Annotations: “Break?”, “Now it’s Busted”.]
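One way to turn the "Ready by 10 AM?" question into chartable data is to score each day as a success or a failure against the deadline. The following is a minimal sketch, not the speaker's actual method: the `ready_times` values and the `met_deadline` helper are hypothetical, and only the 10 AM deadline comes from the slide.

```python
from datetime import datetime, time

# Hypothetical data: when each day's reports actually became available.
ready_times = [
    datetime(2014, 9, 22, 9, 40),
    datetime(2014, 9, 23, 9, 55),
    datetime(2014, 9, 24, 13, 10),
    datetime(2014, 9, 25, 14, 5),
]

DEADLINE = time(10, 0)  # "Ready by 10 AM", taken from the slide


def met_deadline(ready_at: datetime, deadline: time = DEADLINE) -> bool:
    """A day is a success only if the reports were ready by the deadline."""
    return ready_at.time() <= deadline


# One success/failure point per day, ready to plot against time.
daily_success = {t.date().isoformat(): met_deadline(t) for t in ready_times}
compliance = sum(daily_success.values()) / len(daily_success)

for day, ok in daily_success.items():
    print(day, "100%" if ok else "0%")
print(f"Overall compliance: {compliance:.0%}")
```

Scoring the customer's expectation rather than the ETL log entry is the point: a job that finished at 9 AM can still miss the deadline the customer cares about.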
Belief
Success Rate
[Chart: Success Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Now it’s Busted”, “Working Again”. Events: “Break”, “Fix”.]
Why Success Rate? 
• Customers complain about lack of success 
– Not errors 
– Not system problems 
• Triaging the complaint means 
– Understanding what is not succeeding now 
– Historical context of success 
– Identifying what a fix must do
Partial vs. Complete Failure
[Chart: Success Rate (0% to 100%) vs. Time.]
Find 100% of Failure
System Status: 99% success, all customers, all features
Incident Scope: 100% failure, Customers X, Y, Feature Z
[Two charts, one per view: Success Rate (0% to 100%) vs. Time.]
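Narrowing a system-wide success rate down to the slice that fails completely can be sketched as a group-by over attempt records. This is an illustration under assumptions: the record layout, the `success_rate_by` helper, customers "A" and "B", and the "reports" feature are hypothetical; only Customers X, Y and Feature Z come from the slide.

```python
from collections import defaultdict

# Hypothetical attempt records: (customer, feature, succeeded).
attempts = [
    ("A", "reports", True), ("A", "Z", True),
    ("B", "reports", True), ("B", "Z", True),
    ("X", "reports", True), ("X", "Z", False),
    ("Y", "reports", True), ("Y", "Z", False),
    ("A", "reports", True), ("B", "Z", True),
]


def success_rate_by(records, key):
    """Group records with key(customer, feature) and return rate per group."""
    totals = defaultdict(lambda: [0, 0])  # group -> [successes, attempts]
    for customer, feature, ok in records:
        group = key(customer, feature)
        totals[group][0] += ok
        totals[group][1] += 1
    return {group: s / n for group, (s, n) in totals.items()}


# System-wide view: looks mostly healthy.
overall = sum(ok for _, _, ok in attempts) / len(attempts)

# Incident view: slices where nothing succeeds at all.
by_slice = success_rate_by(attempts, lambda c, f: (c, f))
fully_failing = sorted(s for s, rate in by_slice.items() if rate == 0.0)

print(f"Overall: {overall:.0%}")
print("100% failure:", fully_failing)
```

The same helper can group by customer alone or feature alone, for example `success_rate_by(attempts, lambda c, f: f)`, when the failing slice is not yet known.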
The Error Chart
[Chart: Error Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Now it’s Busted”, “Working Again”. Events: “Break”, “Fix”.]
Success vs. Errors
[Two charts vs. Time: Error Rate (0% to 100%) and Success Rate (0% to 100%). Each shows phases “Used to Work”, “Now it’s Busted”, “Working Again” and events “Break”, “Fix”.]
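The error chart is only the success chart inverted when both come from the same data. A small arithmetic sketch shows how they can diverge when success is counted from an operational store and errors from a log; all values here are hypothetical.

```python
# Hypothetical: 10 attempts recorded in an operational data store.
outcomes = [True, True, True, True, True, True, False, False, False, False]

# Hypothetical: the error log captured only two of the four failures
# (the others failed silently, with no log entry).
logged_errors = ["timeout talking to partner", "disk full"]

success_rate = sum(outcomes) / len(outcomes)
error_rate = len(logged_errors) / len(outcomes)
implied_success = 1 - error_rate  # what the error chart alone would suggest

print(f"Measured success rate:    {success_rate:.0%}")
print(f"Success implied by errors: {implied_success:.0%}")
```

The gap between the two numbers is the set of failures that never produced an error.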
Error Data 
Good 
• Errors (might) indicate root cause 
• By Geeks, for Geeks 
Bad 
• Errors != Absence of Success 
• Poor data quality 
• No impact 
Ugly 
• Signal-to-Noise Ratio 
• Technical artifacts
Success Data Sources 
• Ideal sources 
– Transaction or Operational data stores 
– Custom data collection 
– Logs? 
• Ideal form 
– 1 record per attempt 
– Date, time, duration 
– Outcome as success or failure 
– Attributes for customer, feature, process step, etc.
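The ideal form above maps naturally onto one record type per attempt. The sketch below is one possible shape, not a prescribed schema: the `Attempt` class, its field names, the sample values, and the `success_rate_per_day` helper are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Attempt:
    """One record per attempt (hypothetical field names)."""
    started_at: datetime   # date and time
    duration: timedelta
    succeeded: bool        # outcome as success or failure
    customer: str
    feature: str
    step: str              # process step


def success_rate_per_day(attempts):
    """Bucket attempts by calendar day and return {day: success rate}."""
    counts = defaultdict(lambda: [0, 0])  # day -> [successes, attempts]
    for a in attempts:
        day = a.started_at.date()
        counts[day][0] += a.succeeded
        counts[day][1] += 1
    return {day: s / n for day, (s, n) in sorted(counts.items())}


# Hypothetical sample data.
attempts = [
    Attempt(datetime(2014, 9, 1, 8), timedelta(seconds=30), True, "X", "Z", "upload"),
    Attempt(datetime(2014, 9, 1, 9), timedelta(seconds=45), True, "Y", "Z", "upload"),
    Attempt(datetime(2014, 9, 2, 8), timedelta(seconds=30), True, "X", "Z", "upload"),
    Attempt(datetime(2014, 9, 2, 9), timedelta(seconds=90), False, "Y", "Z", "upload"),
]

rates = success_rate_per_day(attempts)
for day, rate in rates.items():
    print(day, f"{rate:.0%}")
```

Keeping the customer, feature, and step attributes on every record is what makes it possible to slice the same data later without collecting it again.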
Timeline
[Chart: Success Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Now it’s Busted”, “Working Again”. Events: “Break”, “Fix”.]
You Are Here
[Chart: Success Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Now it’s Busted”, “Working Again”. Markers: “Break”, “?”, “Complaint”, “Fix”.]
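When the complaint arrives, the break time is usually unknown. Given a success-rate series, a first estimate of when it broke can be sketched as the start of the last sustained drop. This is an assumption-laden illustration: the hourly values, the 0.9 threshold, and the `find_break` helper are hypothetical, and the series is assumed to end before any fix.

```python
from datetime import datetime

# Hypothetical hourly success rates: (start of hour, success rate).
hourly = [
    (datetime(2014, 9, 29, 21), 1.00),
    (datetime(2014, 9, 29, 22), 0.99),
    (datetime(2014, 9, 29, 23), 0.40),
    (datetime(2014, 9, 30, 0), 0.35),
    (datetime(2014, 9, 30, 1), 0.38),
]


def find_break(series, threshold=0.9):
    """Return the start of the first period where the rate falls below
    `threshold` and stays below it through the end of the series.
    Returns None if the series ends at or above the threshold."""
    break_at = None
    for when, rate in series:
        if rate < threshold:
            if break_at is None:
                break_at = when
        else:
            break_at = None  # recovered, so the earlier dip was not the break
    return break_at


print("Estimated break:", find_break(hourly))
```

A narrow time window is what turns "who broke something?" into a question that can be matched against releases, maintenance, or partner changes.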
Example: FTP 
Problem 
• Integration solution using customized FTP 
• Intermittent failure reported, “Sometimes it works!”
Data 
• FTP logs 
• Application exceptions
FTP – Initial Fix
[Chart: Success Rate (0% to 100%) vs. Time. Annotations: “Fix”, “Working Again!”, “Working…Better?”.]
FTP – Starting Over
[Chart: Success Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Still Busted”.]
FTP – More Fixes
[Chart: Success Rate (0% to 100%) vs. Time. Phases: “Used to Work”, “Still Busted”, “Working Again!”. Events: “Fix 1”, “Fix 2”, “Fix 3”, “Fix 4”.]
Communication 
• Charts make you look smart 
• Confirm issue with customer 
• Help other departments, customers 
• Supervise troubleshooting
Blame
Wrapping Up 
• Troubleshooting = Success Rate + Timeline 
• Success not the same as absence of Errors 
• Superstition and Belief 
• Charts make you look smart 
• Verify fixes

Editor's Notes

  1. Why and how to use data in troubleshooting DevOps – Metrics, Collaboration, Feedback James Monitoring Startup Mercent
  2. Troubleshooting process Using data Mindset Badmouth developers
  3. History Spoiled by data Generalized form
  4. Apply to other issues Success Rate vs. Time Activity and Product, important part of a nutritious process Visualize over calculate (not a math test) Structure troubleshooting Understand and communicate impact
  5. Break/fix issue No guidance for structured path to root cause Escalate? Easy vs. Hard based on these steps Hard = Intermittent, partial, degraded perf, cross-system
  6. Customer says reports not ready Dev says processing complete, has log entry to prove it Recrimination, personal attacks
  7. Temptation to ridicule users User observations without insight to internals Indicator that you need a chart
  8. Lessons – Denial result of Superstition More than just ridicule, barriers to getting Dev help Some denial seems reasonable Hard to get help from Developers
  9. How to chart this? % of compliance with required 10 AM No history, we didn’t measure it SHOCK – users were right? Used to work?
  10. Mindset - How can I find issues Devs cannot? Cannot find problems you KNOW aren’t there At end of issue, seems crazy you didn’t notice problems Devs will fix when they believe in problem CSI vs. X-Files Believe first - build chart for now, go through motions Success Rate…
  11. Axis detail…Success Rate Apply belief in initial stages of investigation Past and present success rate
  12. Customers’ perception of success Will ignore errors if they can Make errors go away Fix must do what?
  13. Unless it’s BAD, hard to see issue Good news, the rest of your system works fine
  14. Impact assessment Future fix verification Understand scope of the issue Promote urgency / prevent panic
  15. Why not errors?
  16. Aren’t these the same charts, just inverted? Maybe… Yes, but only if they are based on the same underlying data set. No, if they are based on different data: for example, success data from an operational or transactional store and error data from log files.
  17. Cultural bias towards geeky error logs Devs don’t do Excel Do invest in error data quality, helps in quick steps above
  18. Where is success stored? “Order Table” HARD to know what success really is Order feed succeeds, but doesn’t contain “all” of the orders
  19. Timeline as key to troubleshooting ‘puzzle’ Start with much less information
  20. Success investigation -> Used to work Why complain now? Error appearance and disappearance Find EXACTLY when it broke – Release, maintenance, partner something Match to possible explanations Who broke something? Vs. Who broke the X feature last night around 11:15 PM? Consistent explanation at END, chart does not stop at fix
  21. Found several problems over 3 months 1 of 3 web servers configured wrong New customer setup improvements Several bug fixes
  22. Classic mistake - announcing ‘fix’ before proof
  23. Time to start over Not sure about “Used to Work” Find 100%? Confusing… Small data set is hard – New customers, debugging, different needs Scripted file upload every 15 minutes – data quality
  24. Many fixes – more like process improvement Never too late to start Always verify fixes with chart
  25. Overestimation of smart Demonstrate control I feel your pain Click tracking example
  26. Supposed to say… Inevitable, sort of. Not you-blame-them, get confession Need root cause, extract confession Developers perfectionist Junior programmers, no points Retrospectives?
  27. Act smart? Superstition as trigger, Belief as prereq. to fix