What We Can Learn from Big Bugs that Got Away 
Ken Johnston, Group Manager, Office Internet Platforms & Operations
EuroSTAR 2010
I Want to know more about YOU 
•Who wandered in here by accident 
•Who is at EuroSTAR for the first time
•How long have you been in Software Testing 
•Have you ever missed a bug 
•Have you ever heard…
“HOW COULD YOU MISS THAT BUG!!!”
Def. –Rolling around in something disgusting
Ken’s 
Big 
Bug 
Story
It all began one dark and stormy night!
Session Overview 
•About you, me and setting the tone 
•Bug Wallowing 1 –A self reflective journey 
•Bug Wallowing 2 –Group Therapy 
•Root Cause Analysis 101 
▫Sentinel Events 
▫Pattern Analysis 
▫Formal RCA program overview 
•Bug Wallowing 3 
•Five Whys 
•Bug Wallow 4 
•Fishbone 
•Bug Wallowing 5 
•Crafting a good bug story 
Learning Objectives 
1.Be armed to deal with the question, “How did test miss this bug.” 
2.Learn a little about formal RCA and the use of the 5 Whys and Fishbone tools 
3.Have a number of highly instructive bug stories from within your organization that you can take home
Def. –Roll in something: to lie down and roll around in something
“HOW COULD YOU MISS THAT BUG!!!”
Time for some “Group Bug” Therapy
Repeat After Me 
•I did not design the bug. 
•I did not code the bug. 
•I found crashing bugs, data corruption bugs, fit and finish bugs. 
•I found hundreds of bugs.
Repeat After Me 
•So what if I missed a bug. 
•I didn’t write the bug in the first place.
Activity Share your Bug Story 
•Take the next 10 minutes 
•Groups of 2 or 3 
•Think of a bug that got away 
•Minimum One Bug story each 
•Questions to ask 
▫How long after ship did you see this 
▫How big was the impact 
▫How did it get missed 
▫What did you change because of this bug
That’s Time
Time to Share 
•Next 5 minutes or so 
•Did you have any Ah Ha moments?
Why do we Wallow in Bugs that got away? 
•Take 3-5 minutes to discuss in your groups
That’s time
Time to Share 
•What did you come up with? 
•Why do we wallow? 
•Why do we RCA bugs? 
•My List 
▫To learn from mistakes 
▫To systematically identify areas for improvement 
▫To prevent repetition of mistakes 
▫Bugs are stories and organizations are driven by the stories they tell
First we need a common baseline to work from
Root Cause Analysis 300 Level 
•Two approaches to RCA 
▫Sentinel Event 
▫Pattern Analysis 
•Formal RCA Program 
▫Data Collection 
▫Data Analysis and Assessment 
▫Corrective Actions 
•The Pit and the Pendulum 
▫Risks of RCA 
▫Benefits of RCA 
Based upon Ch. 11 
PDF available to EuroSTAR 
attendees 
http://defectprevention.org
RCA –Sentinel Event Bugs 
•How do you know it’s a Sentinel Event Bug? 
•If you make the front page of the http://wsj.com 
•Production Outage 
▫I have a lot of these stories 
•Security vulnerabilities 
•The last bug taken before ship 
▫“How could we have missed this!” 
•Any big bug that got away 
•Nothing to do with the X-Men
RCA Pattern Analysis 
•Pattern Analysis requires a lot of bugs 
•Pattern Analysis can be done over time 
•Pattern Analysis is best served within a formal RCA Program. 
▫Cut some of the slides from this presentation 
▫The full set of slides can be found in the appendix on the EuroSTAR conference website
Phases of an RCA Program 
1.Event Identification 
2.Data Collection 
3.Data Analysis and Assessment 
4.Corrective Action 
5.Inform and Apply 
6.Follow-up, measurement and reporting
Event Identification
Data Collection 
Data Analysis and Assessment 
Corrective Actions 
Inform and Apply 
Follow up, Measurement, and Reporting 
Phase 2: Data Collection Exercise
•Data Channels 5 Minute Discussion in Groups 
▫What are the sources of data in my organization 
▫Which are practical 
▫Which are the most costly to implement 
▫Which are most likely to yield results 
▫Do you have time to implement these 
That’s time
Phase 2: Data Collection Time to Share 
•What sources did you come up with?
Phase 2: Data Collection (Sources of Data)
•Defect and Test Case Management tracking system 
•Source code repository and Test code coverage data 
•Voice of the Customer 
▫Product support and Customer or marketing data 
▫Individual surveys and interviews 
•Findings from previous RCA Studies 
•Crash data through Windows Error Reporting 
•Services have tickets and data center telemetry 
▫Heuristic Data of live site now vs. historic 
More about WER @ https://winqual.microsoft.com/
Phase 2: Data Collection (Tracking System)
•Prepare a list of Sentinel Events 
•Gather and Prepare the Preliminary Data 
•Route Single Event through Process 
•Create an RCA Tracking Database 
Data Elements of RCA Tracking System
•Event or Study ID, Title & Dates 
•Related Defect links 
•Failure areas and Source Code 
•Timeline of events before and after (vital for services) 
•Team Contacts and Owners 
•RCA Analysts and Contacts 
•Expert Groups and Contacts 
•Cause of defect and corrective action 
•Survey Data and Results on effectiveness of corrective action
•Log Events in RCA system 
•Analyze events 
•NOTE: Meta Data better suited for lists, documents and shares
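As a rough illustration of the data elements above, here is a minimal sketch of one record in an RCA tracking database. The class name `RcaEvent`, the field names, and the `is_analyzed` rule are all assumptions made for this example, not part of any real tracking system; survey data is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class RcaEvent:
    """One row in a hypothetical RCA tracking database."""
    event_id: str
    title: str
    opened: date
    defect_links: List[str] = field(default_factory=list)   # related defects
    failure_areas: List[str] = field(default_factory=list)  # failure areas and source code
    timeline: List[str] = field(default_factory=list)       # events before and after
    owners: List[str] = field(default_factory=list)         # team contacts and owners
    analysts: List[str] = field(default_factory=list)       # RCA analysts and contacts
    experts: List[str] = field(default_factory=list)        # expert groups and contacts
    cause: Optional[str] = None                              # cause of defect
    corrective_action: Optional[str] = None

    def is_analyzed(self) -> bool:
        """Assumed rule: analyzed once a cause and a corrective action are both logged."""
        return self.cause is not None and self.corrective_action is not None
```

A record starts with only an ID, title, and date; the cause and corrective action are filled in as the event moves through the process.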
Phase III: Data Analysis and Assessment (the Five Whys and the Fish Bone)
Good article from ASQ – http://www.asq.org/learn-about-quality/cause-analysis-tools/overview/fishbone.html
Phase III: Data Analysis and Assessment (the Five Whys)
•Brief History - http://en.wikipedia.org/wiki/5_Whys
▫Developed by Sakichi Toyoda
▫First used in Toyota Motor Corporation 
▫Common tool within Kaizen, Lean Manufacturing & Six Sigma 
•What is it 
▫Simply put -ask why 5 times to get to the root cause of a problem 
•Fun Example from - http://startuplessonslearned.blogspot.com/2008/11/five-whys.html
▫Why was the website down? The CPU utilization on all our front-end servers went to 100%
▫Why did the CPU usage spike? A new bit of code contained an infinite loop!
▫Why did that code get written? So-and-so made a mistake
▫Why did his mistake get checked in? He didn't write a unit test for the feature
▫Why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD
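The website example above can be written down as a simple chain of question and answer pairs. This is only a sketch: the names `WhyStep`, `root_cause`, and `render` are invented for illustration, and treating the last answer as the root cause is an assumption that the group still has to sanity-check.

```python
from typing import List, Tuple

# Each step pairs a "why" question with the answer the team gave.
WhyStep = Tuple[str, str]

five_whys: List[WhyStep] = [
    ("Why was the website down?", "CPU utilization on all front-end servers went to 100%"),
    ("Why did the CPU usage spike?", "A new bit of code contained an infinite loop"),
    ("Why did that code get written?", "So-and-so made a mistake"),
    ("Why did the mistake get checked in?", "No unit test was written for the feature"),
    ("Why was there no unit test?", "A new employee was not properly trained in TDD"),
]


def root_cause(chain: List[WhyStep]) -> str:
    """Return the answer to the last 'why', treated here as the candidate root cause."""
    if not chain:
        raise ValueError("need at least one why")
    return chain[-1][1]


def render(chain: List[WhyStep]) -> str:
    """Format the chain as an indented outline, one level deeper per why."""
    return "\n".join(f"{'  ' * i}{q} -> {a}" for i, (q, a) in enumerate(chain))
```

Printing `render(five_whys)` gives a five-line outline that can be pasted into a bug or RCA record.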
Def. –indulge in something excessively: to take pleasure or be immersed in something in a self-indulgent way
Insert Bug Story Videos
Five Whys Exercise 
•Take 5-10 minutes 
•Use one of these bugs or one of your own 
•Try the five whys and see if you can find a root cause
That’s time 
One does not worry about grace or dignity
Time to Share 
•Time for about 2 examples 
•What about the 5 Whys worked for you 
•Where did it fall short?
Phase III: Data Analysis and Assessment (the Five Whys)
•Criticism of five whys
▫Not reproducible across individuals
▫Investigators tend to stop at symptoms rather than the root cause
▫Relies upon the investigator’s knowledge
Phase III: Data Analysis and Assessment (Fishbone Diagram)
•Brief History - http://en.wikipedia.org/wiki/Ishikawa_diagram
▫Developed by Kaoru Ishikawa in the 1960s
▫One of the 7 basic quality management tools
•Can use with 5 Whys
▫Put each why off the first tree point
▫Ask why for each one of these issues
▫Keep going until you find one or more root causes
•Some industries have common causes mapped to the fishbone
▫Original 4 Ms – Machine, Method, Material, Manpower
▫The 8 Ps (Used in Service Industry) – People, Process, Policies, Procedures, Price, Promotion, Place/Plant, Product
▫Ken’s List – People & Training, Tools, Inspection and supervision, Pressure or Stress, Process & Accountability, Recognition & Awareness
Fishbone diagram, effect: Brownout across 3 largest datacenters
Branches: People & Training, Tools, Inspection & Supervision, Pressure or Stress, Process & Accountability, Recognition & Awareness
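A fishbone can be kept as a plain mapping from branch to causes while the group works through it. The sketch below uses the six branches from Ken's list; the causes under each branch are made-up placeholders, not findings from the actual brownout, and the function names are hypothetical.

```python
from typing import Dict, List

# Branch names follow "Ken's List"; the causes are invented placeholders.
fishbone: Dict[str, List[str]] = {
    "People & Training": ["(hypothetical) operator new to the deployment tool"],
    "Tools": ["(hypothetical) tool allowed multi-datacenter deployment"],
    "Inspection & Supervision": [],
    "Pressure or Stress": [],
    "Process & Accountability": ["(hypothetical) no deployment checklist"],
    "Recognition & Awareness": [],
}


def unexplored_branches(diagram: Dict[str, List[str]]) -> List[str]:
    """Branches with no causes yet: prompts for the next round of 'why?'."""
    return [branch for branch, causes in diagram.items() if not causes]


def render(effect: str, diagram: Dict[str, List[str]]) -> str:
    """Print the effect first, then each branch with its causes indented below it."""
    lines = [f"Effect: {effect}"]
    for branch, causes in diagram.items():
        lines.append(f"  {branch}")
        lines.extend(f"    - {cause}" for cause in causes)
    return "\n".join(lines)
```

Listing the empty branches is a cheap way to check that the group has not skipped a category before moving on to corrective actions.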
•Deployment tool changes 
▫Warn but do not prevent multi-DC deployments 
▫Automatically generate rollback script 
▫Cross service monitors will cancel and roll back a bad deployment automatically 
•Process changes 
▫Deployment code review 
▫Deployment checklist 
▫Audits and Fire drills 
Audited all alerts, escalation aliases and contact #s 
Fire drill email and phone 
•New Tools 
▫Per-Alert fault injection 
•Recognition 
▫SWAT DRI team for most senior DRIs
Fishbone Exercise 
•Take 5-10 minutes 
•Have a handout for you 
•Use the same bug from the five whys exercise
That’s time
Time to Share 
•Time to share 
▫Who did the same bug as the five whys? 
▫Who did a different bug? 
•What about the fishbone worked for you? 
•Where did it fall short?
Phase III: Data Analysis and Assessment (the Fishbone)
•Criticism of Fishbone 
▫Requires a lot of experts for each branch 
▫Cumbersome
Phase V: Inform and Apply 
•Host a Management Review 
▫Managers will like RCA more than bugs 
▫You are eliminating a problem not just finding it 
•Implementation is a project, treat it that way 
▫Assign Owners 
▫Build and Maintain Schedule 
▫Create a Feedback Loop 
▫Establish a Monthly Status Report 
▫Track and correct the corrective action
Phase VI: Follow-up, Measurement, and Reporting 
•More than Just 
•Six Sigma type approaches 
•Longitudinal Analysis 
▫Draws from Longitudinal Data Analysis - http://gseacademic.harvard.edu/alda/ 
▫Study Over Time 
•Develop failure types and risk areas/components 
•Inspect similar products/areas for baseline 
•Gather and inspect process data 
•Examine Data for Trends 
•Report out 
Def. –have huge amount of something: to have an ample or excessive supply of something
RCA Pit and Pendulum
Risks of Root Cause Analysis 
•Begins with inadequate data 
•Go after too much data too early 
•Draws incorrect conclusion or makes invalid recommendations 
▫Anyone experience this before 
•Focus on the wrong set of defects 
•Ends at the wrong level –too early or late 
•Investment is not always predictable 
▫Can be high cost with low ROI 
•Over focus on data can detract from the story
Benefits of Structured RCA Study 
•Can start as small pilots 
•Uses an identical process regardless of type, age or scope of defect 
•Avoids repeat failures 
•Can be the shortest path to determining and correcting causes of failure 
•Lowers Maintenance Costs 
•Builds a culture of 
▫Accountability 
▫Continuous Improvement
Achieve Balance 
•Full-blown RCA with large pattern analysis rarely meets ROI goals.
•Limit the scope 
▫Few Data Sources 
▫Beware of the RCA Tax 
•Focus on Sentinel Events 
▫Provides opportunity for clear, visible wins
▫If it’s a bug that got away you’ll be doing a Post Mortem anyway 
▫Sentinel events provide an opportunity to change the dialogue
I’ve had enough
Telling a Tall Tale
So why a focus on Bugs that got away 
•Bugs that got away are Sentinel Events 
•They are great stories 
▫There is never an end to bugs 
•Bug Stories are Organizational Knowledge 
•Tribal Knowledge drives organizations 
•Stories are powerful change enablers
Stories Work! Biographies 
Allegories
Gloves on the Boardroom Table 
•The Heart of Change 
▫Requires an emotional component 
▫What is more emotional than “How could test miss this bug!” 
•Not all change stories involve yelling 
•Visual and tactile help too 
▫Handout of “Gloves on the boardroom table” 
▫john@SAGEKotter.com 
“I love your idea. And you have my permission.”
Organizational Development 
•I worked in Engineering Excellence 
▫We were Performance Improvement organization 
▫Enterprise Change Management 
•Let me bring in some OD concepts
Knowledge Management (KM) 
comprises a range of practices used in an organization to identify, create, represent, distribute and enable adoption of insights and experiences.
Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizational processes or practice.
http://en.wikipedia.org/wiki/Knowledge_management
What are Organizations Made of? 
PEOPLE
What do people do? 
Talk about stuff
Tribal Knowledge 
Institutional memory is a collective set of facts, concepts, experiences and know-how held by a group of people.
http://en.wikipedia.org/wiki/Institutional_memory
Organizational Storytelling 
The study of organizational storytelling, sometimes called “Narrative Knowledge,” attempts to recount events in the form of a story within the context of an organization
http://en.wikipedia.org/wiki/Organizational_Storytelling
So, what is a bug story? 
that should…
be part of the Organizational Narrative Knowledge
Springboard Story 
•Very simple, very quick, very brief 
▫Think elevator ride 
•Non-threatening 
•Enables listener to visualize 
•Catalyzes understanding 
•Spark new stories in the mind 
•Do not transfer large amounts of information
Story Telling Tips 
•Brains are not computers
▫Brain Movies –“The brain assembles perceptions by the simultaneous interaction of whole concepts, whole images.” 
•The Central Movie –a country or organization 
▫Universal Principles –freedom, democracy, constitutional government 
▫Long-term goals –education, “life, liberty, pursuit of happiness” 
▫Operating methods –free markets, due process, federal and state governments 
•Capture the Audience 
▫“One time there was this bug we missed…” 
•3D Story Telling pp. 85-87
▫Details (facts, information)
▫Dialogue (characters)
▫Drama (a bug that got away?)
Brain Movies, The Central Movie, and 3D Story Telling from “The Leader’s Voice”
Our Last Exercise! 
•Your own bug story in 10 minutes 
▫Take 10 minutes outlining your story 
▫Goal is a 1-2 minute story 
Think short and tight 
•Remember to 
▫Hook the audience 
▫3D Storytelling –Details, Dialogue, Drama 
▫RCA –what change do you want to convey?
My Bug Story -Template 
•Title 
•The Hook 
•Details –Who, what, when, product/project 
•Dialogue –Yelling, Crying, Funny? 
•Drama –What is the tension? Anyone Fired? 
•What were the Root Causes 
•What did you change and why?
That’s lunch time
Time to Share 
•3 volunteers to come up and tell their bug story
Resources 
•“The Leader’s Guide to Storytelling” by Steve Denning 
▫Resources –http://www.stevedenning.com/launchgifts.html 
▫Audio Interview -The knowledge-based organization: Using stories to embody and transfer knowledge 
http://www.storytellingwithchildren.com/2008/01/12/steve-denning-the-knowledge-based-organization/
•“The Leader’s Voice” by Crossland & Clarke
▫http://roncrossland.com/ 
•Defect Prevention Chapter 11 RCA 
▫http://defectprevention.org 
•“The Heart of Change” by Dr. John P. Kotter
▫Gloves story can be found on pages 11-12 http://www.linkageinc.com/pdfs/disl/KotterPG.pdf
http://www.hwtsam.com
http://blogs.msdn.com/kenj
http://twitter.com/rkjohnston
Chapter 14 (Software + Services Testing) from “How We Test Software at Microsoft” provided on conference CD courtesy of Microsoft Press 
Ken Johnston – Microsoft STARWest 2009 Tutorial TJ
What We Can Learn from Big Bugs that Got Away
Appendix 
•What follows are a series of slides to teach RCA. 
•Some of the slides are integrated in this tutorial on Bugs that Got Away but not all.
First we need a common baseline to work from
Root Cause Analysis 300 Level 
•Two approaches to RCA 
▫Sentinel Event 
▫Pattern Analysis 
•Formal RCA Program 
▫When to do an RCA Study 
▫Staffing for Success 
▫Phases of an RCA Study 
•The Pit and the Pendulum 
▫Risks of RCA 
▫Benefits of RCA 
Based upon Ch. 11 
http://defectprevention.org
RCA Sentinel Event 
A sentinel event is defined by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) as any unanticipated event in a healthcare setting resulting in death or serious physical or psychological injury to a person or persons,
http://en.wikipedia.org/wiki/Sentinel_event
RCA –The Sentinel Event of Bugs 
•Home Page of http://wsj.com 
•Production Outage 
▫I have a lot of these stories 
•Security vulnerabilities 
•The last bug taken before ship 
•“How could we have missed this!” 
•Big Bugs that Got Away
RCA –Office 14 Sentinel Bug Process 
•Why SharePoint as the repository 
▫Attachments 
▫Collaborating 
▫Workflow 
▫Reporting Dash 
▫Wiki 
▫Exchange contacts 
▫Offline 
•Simple Light Weight Approach 
•Focus on recall class bugs from O14 Beta 1 
▫Will need the answers anyway to get through triage 
▫Usually logged in the bug but not easy to find or learn from 
▫No consistent process across teams 
•Develop a common template in Word 
•Track on a SharePoint site with some meta data
Office 14 Root Cause “Template” 
•Tenets/Best Practices 
•History/Summary 
•Bugs 
▫Bug number(s) 
▫Bug description 
•Root Cause Questions 
▫Would this get found in our Test Focus/Pass for this area? 
▫When did it get broken? 
▫Was ownership confused? 
▫Would we have assumed that another team would have also seen it? 
▫Would it have been reasonable to assume that the fix that caused the regression would have broken this? 
▫Would a code review have likely identified the issue? 
▫Was there a partner team(s) involved? 
▫Were there multiple PRs involved? 
▫Was the feature "Hot" coming into the close of the milestone? 
•Engineering Recommendations: 
▫Recommendation(s)/Owners 
▫1. 
▫2. 
▫3.
O14 Example Beta1 End Game 
•Word: Japanese Indented Bullets when saved lose their indents 
▫Repro: 
Set Japanese to be your primary editing language 
Create a bulleted list with indents 
Save/Close/Re-open 
Result: indents are gone 
Expect: no loss of indents 
▫Happens with all docs created with that setting in 12 and 14
O14 Example RCA Recommendations 
•Engineering Recommendation: 
▫Automate this case and use the code change to inform other automation needed for this area (lists, styles, paragraph props)
▫Ensure that ICTs dogfood the product
▫Make new push for testers to use international settings more frequently, with an eye on Beta2 languages and risks associated with each language equivalence class – we’ll most likely drive a Mini-pass on all our features with this setting for Beta2
▫Add this area to testing executed during regression checks on all style-related fixes.
RCA Sentinel Bug Approach 
•Big Bugs that got away are Sentinel Events 
•One bug is indicative of other risk
•The more big bugs the more patterns 
•Nothing to do with X-Men
Formal RCA Program (Sentinel Events and Pattern Analysis)
•Started at any time during SDLC 
•Often launched after a single expensive bug 
▫Security vulnerabilities 
▫Production Outage 
I have a lot of these stories 
•Can be Resource Intensive -so be deliberate
Staffing for Success –RCA Study Analyst 
•A Single Analyst or a Team 
▫Could be you after today 
•Senior with wide range of development process knowledge 
•Component Level and System Level analysis 
•Work with all types -Development, Testing, Program Management, Operations, Support 
▫May include marketing and field personnel 
•Skills 
▫Defect and low-level code analysis 
▫Efficiency Diagnosis 
▫RCA Analysis and event understanding
▫Algorithm and metric development 
▫Data analysis and presentation
Phases of an RCA Program 
1.Event Identification 
2.Data Collection 
3.Data Analysis and Assessment 
4.Corrective Action 
5.Inform and Apply 
6.Follow-up, measurement and reporting
Phase I: Event Identification 
•The Sentinel Event 
▫Bug that got away and customer found 
▫Does not need to be a defect 
▫One or multiple 
•Often too many bugs to pick from 
▫For an RCA program first establish criteria for a sentinel event 
Phase I: Event Identification (Sentinel Event Criteria) 
•Not all bugs will yield a true “root” cause 
•Focus on most severe/undesirable event 
▫“I remember this one bug…” 
•Risk based assessment criteria 
▫Severity 
▫Risk of recurrence 
▫Cost –actual and opportunity 
Sentinel Event Data Channel Loop
Identify Sentinel Event Criteria
Identify Data Channels
Route Single Event through Process
Prepare Data & Map Fields (defect tracking system query)
Log Event in RCA Tracking Database
Event to Analyze
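One way to turn the risk-based criteria above (severity, risk of recurrence, cost) into a repeatable filter is a simple score. Everything in this sketch is an assumption for illustration: the 1 to 5 scales, the multiplicative formula, the threshold of 40, and the names `Candidate`, `risk_score`, and `sentinel_events`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    title: str
    severity: int         # 1 (minor) .. 5 (critical); scale is an assumption
    recurrence_risk: int  # 1 (unlikely) .. 5 (near certain)
    cost: int             # 1 (low) .. 5 (high), actual plus opportunity cost


def risk_score(c: Candidate) -> int:
    """Multiplicative score, so one low factor pulls the total down."""
    return c.severity * c.recurrence_risk * c.cost


def sentinel_events(candidates: List[Candidate], threshold: int = 40) -> List[Candidate]:
    """Return candidates at or above the threshold, highest score first."""
    picked = [c for c in candidates if risk_score(c) >= threshold]
    return sorted(picked, key=risk_score, reverse=True)
```

The point of a score like this is less the number itself than agreeing on the criteria before the bugs start arriving.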
Phase I: Event Identification (Data Channel – Sources of Data)
•Defect and Test Case Management tracking system 
•Source code repository and Test code coverage data 
•Voice of the Customer 
▫Product support and Customer or marketing data 
▫Individual surveys and interviews 
•Findings from previous RCA Studies 
•Crash data through Windows Error Reporting 
•Services have tickets and data center telemetry 
▫Client and Cloud testing session tomorrow 
More about WER @ https://winqual.microsoft.com/
Phase I: Event Identification (Tracking System)
•Prepare a list of Sentinel Events 
•Gather and Prepare the Preliminary Data 
•Route Single Event through Process 
•Create an RCA Tracking Database 
Data Elements of RCA Tracking System
•Event or Study ID, Title & Dates 
•Related Defect links 
•Failure areas and Source Code 
•Timeline of events before and after (vital for services) 
•Team Contacts and Owners 
•RCA Analysts and Contacts 
•Expert Groups and Contacts 
•Cause of defect and corrective action 
•Survey Data and Results on effectiveness of corrective action
•Log Events in RCA system 
•Analyze events 
•NOTE: Meta Data better suited for lists, documents and shares
Phase II: Data Collection 
•Use Common Sense and Trust Gut Feel 
▫“Hey did you hear about the bug…” 
▫“I heard BillG was doing a demo when…”
•Use a survey to gather additional data 
▫Was this noticed and ignored 
▫Is this a common error type 
▫Could this have been prevented 
•Gather common data on several sentinel events
Phase II: Data Collection 
•Windows Customized (Visual Studio Team System) 
▫Part of Defect Tracking System 
▫Connect to source code 
▫Attachments 
▫Collaborating 
▫Workflow 
•Windows Vista ran a full RCA program 
•Windows 7 moved to ezRCA 
▫Cut many of the other data sources 
▫Focus on meta data around bugs
Windows “ezRCA” Approach 
Windows ezRCA Program
The Goal
Reduce Defects Throughout the Product Cycle
The Questions
•What type of defect?
•What phase was the defect introduced?
•What was the extent of the fix?
•How long did it take to fix the defect?
The Source
•Product Studio Extension (Per Bug Report)
Leverage Points
•Distributed Workflow
•Quick and Easy Data Collection
•Aggregate Analysis and Trend Charts
•Subcomponent-Level Data Also Available
•Focus on Individual Improvement
Windows EZ RCA Diagnosis 
As is / New
•Diagnosis is currently required for all bugs and defaults to NA 
•This field should only be activated if the bug is resolved “Fixed” or “Won’t Fix” 
•There should be no default value 
•Change/combine Hardware & No HW to Hardware Issue 
NOTE: Items in RED are new or changed
Assignment Error 
Build Error 
Concurrency Error 
Data Checking Error 
Data Corruption 
Doc Error 
Environment Error 
Error Handling Problem 
Hardware Issue 
Ignored Failure 
Incorrect Program State 
Interface Error 
Missing Method/Function 
Logic Error 
Not Applicable 
Other 
Resource Issue 
Simple Coding Error 
System Error 
User Misunderstanding
Windows ezRCA Values
•Initial classification of root causes 
•Root cause helps us identify the nature of the kinds of mistakes we are making 
•This will be a required field for Developers when resolving a bug that is ‘Fixed’ or ‘Won’t Fix’ 
•This will be a single-select dropdown list and developers will be expected to select the item that is most applicable 
•This field is not intended to replace deep RCA studies and more information will likely be required based on analysis of this data 
•For gathering further information, use the Prevention Tab, Test Follow-up Tab, and Bug Analysis Tabs in Product Studio or Soapbox (NOTE: Much of this will be consolidated in the future)
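The field rules on the two slides above (single-select, no default, required only when a bug is resolved "Fixed" or "Won't Fix") can be sketched as a small validation check. This is an illustrative reading of the slides, not the actual Product Studio behavior; `validate_diagnosis` is a hypothetical name, and treating the field as empty for every other resolution is an assumption.

```python
from typing import Optional

# Values copied from the diagnosis list on the slide.
DIAGNOSIS_VALUES = (
    "Assignment Error", "Build Error", "Concurrency Error", "Data Checking Error",
    "Data Corruption", "Doc Error", "Environment Error", "Error Handling Problem",
    "Hardware Issue", "Ignored Failure", "Incorrect Program State", "Interface Error",
    "Missing Method/Function", "Logic Error", "Not Applicable", "Other",
    "Resource Issue", "Simple Coding Error", "System Error", "User Misunderstanding",
)

# Resolutions that activate the field, per the slide.
REQUIRES_DIAGNOSIS = {"Fixed", "Won't Fix"}


def validate_diagnosis(resolution: str, diagnosis: Optional[str]) -> bool:
    """Required and single-valued for Fixed / Won't Fix; assumed empty otherwise."""
    if resolution in REQUIRES_DIAGNOSIS:
        return diagnosis in DIAGNOSIS_VALUES
    return diagnosis is None
```

Because there is no default value, a resolved bug with an empty diagnosis fails the check instead of silently landing in "Not Applicable".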
Windows Additional RCA data 
•Symptom and Prevention categorization 
•Link to more info 
•Anonymous submission
ezRCA Pivot Points
ezRCA 
•Data on Lots of Bugs 
•Few Questions & Answers 
•Quick, Easy 
•Fully Distributed 
Traditional RCA 
•Data on Select Fixed Bugs 
•Detailed Analysis of Defect 
•Multiple-Data Sources 
•Significant Investment 
•Can be Resource-Limited
Phase II: Data Collection Keys to Success 
•For Sentinel Events open template is fine 
•For ezRCA, extend the bug tracking system with ez data collection
▫Keep system light weight 
▫Limit required fields 
▫Provide opportunity to expand within bug 
•For Formal RCA will need multiple data sources and extensible schema 
•Recommend you start with Sentinel Events and progress to a formal program
Keep going with formal RCA 
•Some tools you can use with Sentinel Events and ezRCA 
•What good tester doesn’t make you wallow in the details?
Phase III: Data Analysis and Assessment 
•Analysis Performed by 
▫RCA Team 
▫Research Team 
▫Related experts
•Log all outputs in RCA System
•Be judicious with experts’ time
Phase III: Data Analysis and Assessment (the Five Whys and the Fish Bone)
Good article from ASQ – 
http://www.asq.org/learn-about-quality/cause-analysis-tools/overview/fishbone.html
Phase III: Data Analysis and Assessment (the Five Whys)
•Brief History - http://en.wikipedia.org/wiki/5_Whys
▫Developed by Sakichi Toyoda
▫First used in Toyota (Kaizen), Six Sigma tool 
•What is it 
▫Simply put -ask why 5 times to get to the root cause of a problem 
•Fun Example from -http://startuplessonslearned.blogspot.com/2008/11/five-whys.html 
▫why was the website down? The CPU utilization on all our front-end servers went to 100% 
▫why did the CPU usage spike? A new bit of code contained an infinite loop! 
▫why did that code get written? So-and-so made a mistake 
▫why did his mistake get checked in? He didn't write a unit test for the feature 
▫why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD 
•Criticism of five whys 
▫Not reproducible across individuals 
▫Investigators tend to stop at symptoms rather than the root cause
▫Relies upon the investigator’s knowledge
Phase III: Data Analysis and Assessment (Fishbone Diagram)
•Brief History - http://en.wikipedia.org/wiki/Ishikawa_diagram
▫Developed by Kaoru Ishikawa in the 1960s
▫One of the 7 basic quality management tools
•Can use with 5 Whys
▫Put each why off the first tree point
▫Ask why for each one of these issues
▫Keep going until you find one or more root causes
•Some industries have common causes mapped to the fishbone
▫Original 4 Ms – Machine, Method, Material, Manpower
▫The 8 Ps (Used in Service Industry) – People, Process, Policies, Procedures, Price, Promotion, Place/Plant, Product
▫Ken’s List – People, Process, Tools, Accountability, Training, Recognition and awareness, Inspection and supervision, Pressure or Stress
Trending Per-Subcomponent 
•Trends Matter 
▫Uptick Warrants More Investigation? 
▫Perform a Traditional RCA for That Set of Events 
•Profile 
▫The State of the Code 
▫Personal Improvements 
▫Identify Key Events 
Last 5 Weeks
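The "uptick warrants more investigation" idea can be sketched as a check on weekly bug counts per subcomponent. The rule used here (latest week above 1.5 times the average of the earlier weeks) and the names `has_uptick` and `flag_subcomponents` are assumptions chosen for illustration; a real program would tune the window and threshold to its own data.

```python
from typing import Dict, List


def has_uptick(weekly_counts: List[int], factor: float = 1.5) -> bool:
    """True if the latest week exceeds `factor` times the mean of the earlier weeks."""
    if len(weekly_counts) < 2:
        return False
    *earlier, latest = weekly_counts
    baseline = sum(earlier) / len(earlier)
    return latest > factor * baseline


def flag_subcomponents(trends: Dict[str, List[int]], factor: float = 1.5) -> List[str]:
    """Names of subcomponents whose latest week shows an uptick, in sorted order."""
    return sorted(name for name, counts in trends.items() if has_uptick(counts, factor))
```

A flagged subcomponent is only a prompt: the slide's next step is to perform a traditional RCA for that set of events.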
Analysis is not yet at solutions 
•Five Whys and Fishbone Diagram help get to root causes 
•Data and trending can provide timely alerts and catches regressions 
•Root causes are then analyzed for corrective actions
Phase III: Analysis is not the solution (Fishbone Diagram)
•Five Whys and Fishbone Diagram are tools to get to root causes 
•Data and trending of bugs can provide timely alerts and catches regressions 
•Root causes are then analyzed for corrective actions
Phase IV: Corrective Actions 
•Identify Trends and Group Them into Corrective Themes 
▫May be solutions related to Fishbone Diagram mapping buckets 
•Meet with the experts again 
▫Remember my warning not to burn out your experts 
•Determine Prioritization Factors and Costing for Corrective Actions 
▫Consider Return on Investment (ROI) 
Should have captured direct cost and opportunity cost during Data Collection
▫Speed to implement 
▫Likelihood of solution being highly effective 
▫Simplicity of solution 
▫Is the solution automatable or process driven
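The prioritization factors above (ROI, speed to implement, likelihood of being effective, simplicity) can be combined into a rough ranking. This is a sketch under stated assumptions: the 1 to 5 scoring scale, the weights, and the names `CorrectiveAction`, `priority`, and `rank` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CorrectiveAction:
    name: str
    roi: int            # each factor scored 1 (poor) .. 5 (good); scale is an assumption
    speed: int          # speed to implement
    effectiveness: int  # likelihood of the solution being highly effective
    simplicity: int


# Hypothetical weights; a real program would agree on these with its experts.
WEIGHTS = {"roi": 0.4, "speed": 0.2, "effectiveness": 0.3, "simplicity": 0.1}


def priority(action: CorrectiveAction) -> float:
    """Weighted sum of the four factor scores."""
    return sum(weight * getattr(action, factor) for factor, weight in WEIGHTS.items())


def rank(actions: List[CorrectiveAction]) -> List[CorrectiveAction]:
    """Highest-priority corrective action first."""
    return sorted(actions, key=priority, reverse=True)
```

With these weights a quick, simple fix such as a checklist can outrank a high-ROI but slow tool rewrite, which is the kind of trade-off the review with experts should make explicit.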
Bug Wallow #3: Our Corrective Actions 
•Email and Provisioning used Production Data 
•Both sanitized the data 
•Both impacted production 
•What did we change? 
▫Stress Tests have no Internet Access 
▫Sanitized Date Diff feature
Phase V: Inform and Apply 
•Host a Management Review 
▫Managers will like RCA more than bugs 
▫You are eliminating a problem not just finding it 
•Implementation is a project, treat it that way 
▫Assign Owners 
▫Build and Maintain Schedule 
▫Create a Feedback Loop 
▫Establish a Monthly Status Report 
▫Track and correct the corrective action 
Phase VI: Follow-up, Measurement, and Reporting 
•More than Just 
•Six Sigma type approaches 
•Longitudinal Analysis 
▫Draws from Longitudinal Data Analysis - http://gseacademic.harvard.edu/alda/ 
▫Study Over Time 
•Develop failure types and risk areas/components 
•Inspect similar products/areas for baseline 
•Gather and inspect process data 
•Examine Data for Trends 
•Report out 
Flatonium2007 
•Need to insert video 
•20 new machines added to the data center 
•5 machines put into production early 
•Machines needed to be Nuked-N-Paved (NNP) 
•Oops
RCA Pit and Pendulum
Risks of Root Cause Analysis 
•Begins with inadequate data 
•Go after too much data too early 
•Draws incorrect conclusion or makes invalid recommendations 
▫Has anyone experienced this before? 
•Focus on the wrong set of defects 
•Ends at the wrong level – too early or too late 
•Investment is not always predictable 
▫Can be high cost with low ROI 
•Over focus on data can detract from the story
Benefits of Structured RCA Study 
•Can start as small pilots 
•Uses an identical process regardless of type, age or scope of defect 
•Avoids repeat failures 
•Can be the shortest path to determining and correcting causes of failure 
•Lowers Maintenance Costs 
•Builds a culture of 
▫Accountability 
▫Continuous Improvement
I’ve had enough

Ken Johnston - Big Bugs That Got Away - EuroSTAR 2010

  • 1. What We Can Learn from Big Bugs that Got Away. Ken Johnston, Group Manager, Office Internet Platforms & Operations, EuroSTAR 2010
  • 2. I Want to know more about YOU •Who wandered in here by accident •Who is at EuroSTAR for the first time •How long have you been in Software Testing •Have you ever missed a bug •Have you ever heard…
  • 3. “HOW COULD YOU MISS THAT BUG!!!”
  • 4. Def. –Rolling around in something disgusting
  • 6. It all began one dark and stormy night!
  • 7.
  • 8. Session Overview •About you, me and setting the tone •Bug Wallowing 1 –A self reflective journey •Bug Wallowing 2 –Group Therapy •Root Cause Analysis 101 ▫Sentinel Events ▫Pattern Analysis ▫Formal RCA program overview •Bug Wallowing 3 •Five Whys •Bug Wallowing 4 •Fishbone •Bug Wallowing 5 •Crafting a good bug story
  • 9. Learning Objectives 1. Be armed to deal with the question, “How did test miss this bug?” 2. Learn a little about formal RCA and the use of the 5 Whys and Fishbone tools 3. Have a number of highly instructive bug stories from within your organization that you can take home
  • 10. Def. –Roll in something: to lie down and roll around in something
  • 11. “HOW COULD YOU MISS THAT BUG!!!”
  • 12. Time for some “Group Bug” Therapy
  • 13. Repeat After Me •I did not design the bug. •I did not code the bug. •I found crashing bugs, data corruption bugs, fit and finish bugs. •I found hundreds of bugs.
  • 14. Repeat After Me •So what if I missed a bug. •I didn’t write the bug in the first place.
  • 15. Activity Share your Bug Story •Take the next 10 minutes •Groups of 2 or 3 •Think of a bug that got away •Minimum One Bug story each •Questions to ask ▫How long after ship did you see this ▫How big was the impact ▫How did it get missed ▫What did you change because of this bug
  • 17. Time to Share •Next 5 minutes or so •Did you have any Ah Ha moments?
  • 18. Why do we Wallow in Bugs that got away? •Take 3-5 minutes to discuss in your groups
  • 20. Time to Share •What did you come up with? •Why do we wallow? •Why do we RCA bugs? •My List ▫To learn from mistakes ▫To systematically identify areas for improvement ▫To prevent repetition of mistakes ▫Bugs are stories and organizations are driven by the stories they tell
  • 21. First we need a common baseline to work from
  • 22. Root Cause Analysis 300 Level •Two approaches to RCA ▫Sentinel Event ▫Pattern Analysis •Formal RCA Program ▫Data Collection ▫Data Analysis and Assessment ▫Corrective Actions •The Pit and the Pendulum ▫Risks of RCA ▫Benefits of RCA Based upon Ch. 11 PDF available to EuroSTAR attendees http://defectprevention.org
  • 23. RCA –Sentinel Event Bugs •How do you know it’s a Sentinel Event Bug? •If you make the front page of the http://wsj.com •Production Outage ▫I have a lot of these stories •Security vulnerabilities •The last bug taken before ship ▫“How could we have missed this!” •Any big bug that got away •Nothing to do with the X-Men
  • 24. RCA Pattern Analysis •Pattern Analysis requires a lot of bugs •Pattern Analysis can be done over time •Pattern Analysis is best served within a formal RCA Program. ▫Cut some of the slides from this presentation ▫The full set of slides can be found in the appendix on the EuroSTAR conference website
  • 25. Phases of an RCA Program 1. Event Identification 2. Data Collection 3. Data Analysis and Assessment 4. Corrective Action 5. Inform and Apply 6. Follow-up, measurement and reporting
  • 26. Phase 2: Data Collection Exercise •Data Channels 5 Minute Discussion in Groups ▫What are the sources of data in my organization ▫Which are practical ▫Which are the most costly to implement ▫Which are most likely to yield results ▫Do you have time to implement these
  • 28. Phase 2: Data Collection Time to Share •What sources did you come up with?
  • 29. Phase 2: Data Collection(Sources of Data) •Defect and Test Case Management tracking system •Source code repository and Test code coverage data •Voice of the Customer ▫Product support and Customer or marketing data ▫Individual surveys and interviews •Findings from previous RCA Studies •Crash data through Windows Error Reporting •Services have tickets and data center telemetry ▫Heuristic Data of live site now vs. historic More about WER @ https://winqual.microsoft.com/
  • 30. Phase 2: Data Collection (Tracking System) •Prepare a list of Sentinel Events •Gather and Prepare the Preliminary Data •Route Single Event through Process •Create an RCA Tracking Database. Data Elements of RCA Tracking System •Event or Study ID, Title & Dates •Related Defect links •Failure areas and Source Code •Timeline of events before and after (vital for services) •Team Contacts and Owners •RCA Analysts and Contacts •Expert Groups and Contacts •Cause of defect and corrective action •Survey Data and Results on effectiveness of corrective action •Log Events in RCA system •Analyze events •NOTE: Meta Data better suited for lists, documents and shares
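The data elements listed on this slide could be modelled roughly as a single record type. This is only a minimal sketch: the class name `RcaEvent`, every field name, and the sample values are assumptions made for illustration and do not come from the deck or from any real tracking tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RcaEvent:
    """One entry in a hypothetical RCA tracking database."""
    event_id: str                       # Event or Study ID
    title: str
    defect_links: List[str] = field(default_factory=list)   # related defects
    failure_areas: List[str] = field(default_factory=list)  # failure areas / source code
    timeline: List[str] = field(default_factory=list)       # events before and after
    owners: List[str] = field(default_factory=list)         # team contacts and owners
    analysts: List[str] = field(default_factory=list)       # RCA analysts
    cause: Optional[str] = None
    corrective_action: Optional[str] = None

    def is_analyzed(self) -> bool:
        # An event counts as analyzed once both a cause and a
        # corrective action have been recorded.
        return self.cause is not None and self.corrective_action is not None


# Made-up sample entry, purely illustrative.
event = RcaEvent(event_id="RCA-001", title="Example sentinel event")
event.timeline.append("deployment started")
```

Keeping cause and corrective action optional lets an event be logged first and analyzed later, which matches the order of the bullets above.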
  • 31. Phase III: Data Analysis and Assessment (the Five Whys and the Fish Bone) Good article from ASQ – http://www.asq.org/learn-about-quality/cause-analysis-tools/overview/fishbone.html
  • 32. Phase III: Data Analysis and Assessment (the Five Whys) •Brief History - http://en.wikipedia.org/wiki/5_Whys ▫Developed by Sakichi Toyoda ▫First used in Toyota Motor Corporation ▫Common tool within Kaizen, Lean Manufacturing & Six Sigma •What is it ▫Simply put - ask why 5 times to get to the root cause of a problem •Fun Example from - http://startuplessonslearned.blogspot.com/2008/11/five-whys.html ▫why was the website down? The CPU utilization on all our front-end servers went to 100% ▫why did the CPU usage spike? A new bit of code contained an infinite loop! ▫why did that code get written? So-and-so made a mistake ▫why did his mistake get checked in? He didn't write a unit test for the feature ▫why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD
  • 33. Def. –indulge in something excessively: to take pleasure or be immersed in something in a self-indulgent way
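The Five Whys example on this slide can be written down as a simple question-and-answer chain. The sketch below is one possible representation, not a standard tool: the function name `five_whys` is an assumption, and treating the last answer as the candidate root cause is a simplification.

```python
from typing import List, Tuple


def five_whys(chain: List[Tuple[str, str]]) -> str:
    """Print a why/answer chain and return the last answer
    as the candidate root cause."""
    if not chain:
        raise ValueError("need at least one why")
    for depth, (question, answer) in enumerate(chain, start=1):
        indent = "  " * (depth - 1)
        print(f"{indent}Why #{depth}: {question} -> {answer}")
    return chain[-1][1]


# The chain quoted on the slide.
chain = [
    ("Why was the website down?",
     "The CPU utilization on all our front-end servers went to 100%"),
    ("Why did the CPU usage spike?",
     "A new bit of code contained an infinite loop"),
    ("Why did that code get written?",
     "So-and-so made a mistake"),
    ("Why did his mistake get checked in?",
     "He didn't write a unit test for the feature"),
    ("Why didn't he write a unit test?",
     "He's a new employee, and he was not properly trained in TDD"),
]
root = five_whys(chain)
```

Writing the chain down makes it easy to see where a different investigator might have stopped earlier, for example at "So-and-so made a mistake".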
  • 35. Five Whys Exercise •Take 5-10 minutes •Use one of these bugs or one of your own •Try the five whys and see if you can find a root cause
  • 36. That’s time One does not worry about grace or dignity
  • 37. Time to Share •Time for about 2 examples •What about the 5 Whys worked for you •Where did it fall short?
  • 38. Phase III: Data Analysis and Assessment (the Five Whys) •Criticism of five whys ▫Not reproducible across individuals ▫Shown that investigators tend to stop at symptoms rather than root cause ▫Relies upon the investigator’s knowledge
  • 39. Phase III: Data Analysis and Assessment (Fishbone Diagram) •Brief History - http://en.wikipedia.org/wiki/Ishikawa_diagram ▫Developed by Kaoru Ishikawa in the 1960s ▫One of the 7 basic quality management tools •Can use with 5 whys ▫Put each why off the first tree point ▫Ask why for each one of these issues ▫Keep going until you find one or more root causes •Some industries have common causes mapped to the fishbone ▫Original 4 Ms – Machine, Method, Material, Man power ▫The 8 Ps (Used in Service Industry) – People, Process, Policies, Procedures, Price, Promotion, Place/Plant, Product ▫Ken’s List – People & Training, Tools, Inspection and supervision, Pressure or Stress, Process & Accountability, Recognition & Awareness
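A fishbone diagram is essentially an effect with a fixed set of cause branches, so it can be sketched as a mapping from branch name to a list of causes. In the sketch below the class `Fishbone` and its methods are assumptions; the branch names are taken from Ken's List on this slide, and the single cause added is a placeholder, not a finding from the talk.

```python
from typing import Dict, List

KENS_LIST = [
    "People & Training",
    "Tools",
    "Inspection & Supervision",
    "Pressure or Stress",
    "Process & Accountability",
    "Recognition & Awareness",
]


class Fishbone:
    """An effect plus named branches, each holding candidate causes."""

    def __init__(self, effect: str, branches: List[str]) -> None:
        self.effect = effect
        self.causes: Dict[str, List[str]] = {b: [] for b in branches}

    def add_cause(self, branch: str, cause: str) -> None:
        if branch not in self.causes:
            raise KeyError(f"unknown branch: {branch}")
        self.causes[branch].append(cause)

    def empty_branches(self) -> List[str]:
        # Branches nobody has examined yet; a prompt to keep asking why.
        return [b for b, c in self.causes.items() if not c]


fb = Fishbone("Example production outage", KENS_LIST)
fb.add_cause("Tools", "placeholder cause for illustration")
```

Listing the empty branches is one way to check that each category was at least considered before settling on a root cause.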
  • 40. Pressure or Stress Recognition & Awareness Process & Accountability Tools Inspection & Supervision People & Training Brownout across 3 largest datacenters
  • 41. •Deployment tool changes ▫Warn but do not prevent multi-DC deployments ▫Automatically generate rollback script ▫Cross service monitors will cancel and roll back a bad deployment automatically •Process changes ▫Deployment code review ▫Deployment checklist ▫Audits and Fire drills Audited all alerts, escalation aliases and contact #s Fire drill email and phone •New Tools ▫Per-Alert fault injection •Recognition ▫SWAT DRI team for most senior DRIs
  • 42. Fishbone Exercise •Take 5-10 minutes •Have a handout for you •Use the same bug from the five whys exercise
  • 44. Time to Share •Time to share ▫Who did the same bug as the five whys? ▫Who did a different bug? •What about the fishbone worked for you? •Where did it fall short?
  • 45. Phase III: Data Analysis and Assessment(the Fishbone) •Criticism of Fishbone ▫Requires a lot of experts for each branch ▫Cumbersome
  • 46. Phase V: Inform and Apply •Host a Management Review ▫Managers will like RCA more than bugs ▫You are eliminating a problem not just finding it •Implementation is a project, treat it that way ▫Assign Owners ▫Build and Maintain Schedule ▫Create a Feedback Loop ▫Establish a Monthly Status Report ▫Track and correct the corrective action
  • 47. Phase VI: Follow-up, Measurement, and Reporting •More than Just •Six Sigma type approaches •Longitudinal Analysis ▫Draws from Longitudinal Data Analysis - http://gseacademic.harvard.edu/alda/ ▫Study Over Time •Develop failure types and risk areas/components •Inspect similar products/areas for baseline •Gather and inspect process data •Examine Data for Trends •Report out
  • 48. Def. –have huge amount of something: to have an ample or excessive supply of something
  • 49. RCA Pit and Pendulum
  • 50. Risks of Root Cause Analysis •Begins with inadequate data •Go after too much data too early •Draws incorrect conclusion or makes invalid recommendations ▫Anyone experience this before •Focus on the wrong set of defects •Ends at the wrong level –too early or late •Investment is not always predictable ▫Can be high cost with low ROI •Over focus on data can detract from the story
  • 51. Benefits of Structured RCA Study •Can start as small pilots •Uses an identical process regardless of type, age or scope of defect •Avoids repeat failures •Can be the shortest path to determining and correcting causes of failure •Lowers Maintenance Costs •Builds a culture of ▫Accountability ▫Continuous Improvement
  • 52. Achieve Balance •Full-blown RCA with large pattern analysis rarely meets ROI goals. •Limit the scope ▫Few Data Sources ▫Beware of the RCA Tax •Focus on Sentinel Events ▫Provides opportunity for clear visible wins ▫If it’s a bug that got away you’ll be doing a Post Mortem anyway ▫Sentinel events provide an opportunity to change the dialogue
  • 55. So why a focus on Bugs that got away •Bugs that got away are Sentinel Events •They are great stories ▫There is never an end to bugs •Bug Stories are Organizational Knowledge •Tribal Knowledge drives organizations •Stories are powerful change enablers
  • 57. Gloves on the Boardroom Table •The Heart of Change ▫Requires an emotional component ▫What is more emotional than “How could test miss this bug!” •Not all change stories involve yelling •Visual and tactile help too ▫Handout of “Gloves on the boardroom table” ▫john@SAGEKotter.com “I love your idea. And you have my permission.”
  • 58. Organizational Development •I worked in Engineering Excellence ▫We were Performance Improvement organization ▫Enterprise Change Management •Let me bring in some OD concepts
  • 59. Knowledge Management (KM) comprises a range of practices used in an organization to identify, create, represent, distribute and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizational processes or practice. http://en.wikipedia.org/wiki/Knowledge_management
  • 60. What are Organizations Made of? PEOPLE
  • 61. What do people do? Talk about stuff
  • 62. Tribal Knowledge. Institutional memory is a collective set of facts, concepts, experiences and know-how held by a group of people. http://en.wikipedia.org/wiki/Institutional_memory
  • 63. Organizational Storytelling. The study of organizational storytelling, sometimes called “Narrative Knowledge,” attempts to recount events in the form of a story within the context of an organization. http://en.wikipedia.org/wiki/Organizational_Storytelling
  • 64. So, what is a bug story that should… be part of the Organizational Narrative Knowledge?
  • 65. Springboard Story •Very simple, very quick, very brief ▫Think elevator ride •Non-threatening •Enables listener to visualize •Catalyzes understanding •Spark new stories in the mind •Do not transfer large amounts of information
  • 66. Story Telling Tips •Brains are not computers ▫Brain Movies – “The brain assembles perceptions by the simultaneous interaction of whole concepts, whole images.” •The Central Movie – a country or organization ▫Universal Principles – freedom, democracy, constitutional government ▫Long-term goals – education, “life, liberty, pursuit of happiness” ▫Operating methods – free markets, due process, federal and state governments •Capture the Audience ▫“One time there was this bug we missed…” •3D Story Telling pg 85-87 ▫Details (facts, information) ▫Dialogue (characters) ▫Drama (a bug that got away?) Brain Movies, The Central Movie, and 3D Story Telling from “The Leader’s Voice”
  • 67. Our Last Exercise! •Your own bug story in 10 minutes ▫Take 10 minutes outlining your story ▫Goal is a 1-2 minute story Think short and tight •Remember to ▫Hook the audience ▫3D Storytelling –Details, Dialogue, Drama ▫RCA –what change do you want to convey?
  • 68. My Bug Story -Template •Title •The Hook •Details –Who, what, when, product/project •Dialogue –Yelling, Crying, Funny? •Drama –What is the tension? Anyone Fired? •What were the Root Causes •What did you change and why?
  • 70. Time to Share •3 volunteers to come up and tell their bug story
  • 71. Resources •“The Leader’s Guide to Storytelling” by Steve Denning ▫Resources – http://www.stevedenning.com/launchgifts.html ▫Audio Interview - The knowledge-based organization: Using stories to embody and transfer knowledge http://www.storytellingwithchildren.com/2008/01/12/steve-denning-the-knowledge-based-organization/ •“The Leader’s Voice” by Crossland & Clark ▫http://roncrossland.com/ •Defect Prevention Chapter 11 RCA ▫http://defectprevention.org •“The Heart of Change” by Dr. John P. Kotter ▫Gloves story can be found on pages 11-12 http://www.linkageinc.com/pdfs/disl/KotterPG.pdf
  • 72. http://www.hwtsam.com http://blogs.msdn.com/kenj http://twitter.com/rkjohnston Chapter 14 (Software + Services Testing) from “How We Test Software at Microsoft” provided on conference CD courtesy of Microsoft Press. Ken Johnston – Microsoft. STARWest 2009 Tutorial TJ: What We Can Learn from Big Bugs that Got Away
  • 73. Appendix •What follows are a series of slides to teach RCA. •Some of the slides are integrated in this tutorial on Bugs that Got Away but not all.
  • 74. First we need a common baseline to work from
  • 75. Root Cause Analysis 300 Level •Two approaches to RCA ▫Sentinel Event ▫Pattern Analysis •Formal RCA Program ▫When to do an RCA Study ▫Staffing for Success ▫Phases of an RCA Study •The Pit and the Pendulum ▫Risks of RCA ▫Benefits of RCA Based upon Ch. 11 http://defectprevention.org
  • 76. RCA Sentinel Event. A sentinel event is defined by the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) as any unanticipated event in a healthcare setting resulting in death or serious physical or psychological injury to a person or persons. http://en.wikipedia.org/wiki/Sentinel_event
  • 77. RCA –The Sentinel Event of Bugs •Home Page of http://wsj.com •Production Outage ▫I have a lot of these stories •Security vulnerabilities •The last bug taken before ship •“How could we have missed this!” •Big Bugs that Got Away
  • 78. RCA –Office 14 Sentinel Bug Process •Why SharePoint as the repository ▫Attachments ▫Collaborating ▫Workflow ▫Reporting Dash ▫Wiki ▫Exchange contacts ▫Offline •Simple Light Weight Approach •Focus on recall class bugs from O14 Beta 1 ▫Will need the answers anyway to get through triage ▫Usually logged in the bug but not easy to find or learn from ▫No consistent process across teams •Develop a common template in Word •Track on a SharePoint site with some meta data
  • 79. Office 14 Root Cause “Template” •Tenets/Best Practices •History/Summary •Bugs ▫Bug number(s) ▫Bug description •Root Cause Questions ▫Would this get found in our Test Focus/Pass for this area? ▫When did it get broken? ▫Was ownership confused? ▫Would we have assumed that another team would have also seen it? ▫Would it have been reasonable to assume that the fix that caused the regression would have broken this? ▫Would a code review have likely identified the issue? ▫Was there a partner team(s) involved? ▫Were there multiple PRs involved? ▫Was the feature "Hot" coming into the close of the milestone? •Engineering Recommendations: ▫Recommendation(s)/Owners ▫1. ▫2. ▫3.
  • 80. O14 Example Beta1 End Game •Word: Japanese Indented Bullets when saved lose their indents ▫Repro: Set Japanese to be your primary editing language Create a bulleted list with indents Save/Close/Re-open Result: indents are gone Expect: no loss of indents ▫Happens with all docs created with that setting in 12 and 14
  • 81. O14 Example RCA Recommendations •Engineering Recommendation: ▫Automate this case and use the code change to inform other automation needed for this area (lists, styles, paragraph props) ▫Ensure that ICTs dogfood the product ▫Make new push for testers to use international settings more frequently, with an eye on Beta2 languages and risks associated with each language equivalence class – we’ll most likely drive a Mini-pass on all our features with this setting for Beta2 ▫Add this area to testing executed during regression checks on all style-related fixes.
  • 82. RCA Sentinel Bug Approach •Big Bugs that got away are Sentinel Events •One bug is indicative of other risk •The more big bugs the more patterns •Nothing to do with X-Men
  • 83. Formal RCA Program(Sentinel Events and Pattern Analysis) •Started at any time during SDLC •Often launched after a single expensive bug ▫Security vulnerabilities ▫Production Outage I have a lot of these stories •Can be Resource Intensive -so be deliberate
  • 84. Staffing for Success –RCA Study Analyst •A Single Analyst or a Team ▫Could be you after today •Senior with wide range of development process knowledge •Component Level and System Level analysis •Work with all types -Development, Testing, Program Management, Operations, Support ▫May include marketing and field personnel •Skills ▫Defect and low-level code analysis ▫Efficiency Diagnosis ▫RCA Analysis and even understanding ▫Algorithm and metric development ▫Data analysis and presentation
  • 85. Phases of an RCA Program 1. Event Identification 2. Data Collection 3. Data Analysis and Assessment 4. Corrective Action 5. Inform and Apply 6. Follow-up, measurement and reporting
  • 86. Phase I: Event Identification •The Sentinel Event ▫Bug that got away and customer found ▫Does not need to be a defect ▫One or multiple •Often too many bugs to pick from ▫For an RCA program first establish criteria for a sentinel event
  • 87. Phase I: Event Identification (Sentinel Event Criteria) •Not all bugs will yield a true “root” cause •Focus on most severe/undesirable event ▫“I remember this one bug…” •Risk based assessment criteria ▫Severity ▫Risk of recurrence ▫Cost – actual and opportunity (Flowchart labels: Identify Sentinel Event Criteria; Identify Data Channels; Route Single Event through Process; Prepare Data & Map Fields (defect tracking system query); Log Event in RCA Tracking Database; Event to Analyze; Sentinel Event; Data Channel Loop)
  • 88. Phase I: Event Identification (Data Channel – Sources of Data) •Defect and Test Case Management tracking system •Source code repository and Test code coverage data •Voice of the Customer ▫Product support and Customer or marketing data ▫Individual surveys and interviews •Findings from previous RCA Studies •Crash data through Windows Error Reporting •Services have tickets and data center telemetry ▫Client and Cloud testing session tomorrow More about WER @ https://winqual.microsoft.com/
  • 89. Phase I: Event Identification (Tracking System) •Prepare a list of Sentinel Events •Gather and Prepare the Preliminary Data •Route Single Event through Process •Create an RCA Tracking Database. Data Elements of RCA Tracking System •Event or Study ID, Title & Dates •Related Defect links •Failure areas and Source Code •Timeline of events before and after (vital for services) •Team Contacts and Owners •RCA Analysts and Contacts •Expert Groups and Contacts •Cause of defect and corrective action •Survey Data and Results on effectiveness of corrective action •Log Events in RCA system •Analyze events •NOTE: Meta Data better suited for lists, documents and shares
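The risk-based criteria on this slide (severity, risk of recurrence, cost) could be turned into a simple screening score. Everything in the sketch below is an assumption made for illustration: the 1 to 5 scales, the unweighted sum, the threshold, and the two sample candidates. The deck does not prescribe a formula.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    bug_id: str
    severity: int         # 1 (low) to 5 (high)
    recurrence_risk: int  # 1 (unlikely) to 5 (likely)
    cost: int             # 1 to 5, actual plus opportunity cost


def score(c: Candidate) -> int:
    # Assumed scoring rule: a plain unweighted sum of the three criteria.
    return c.severity + c.recurrence_risk + c.cost


def pick_sentinels(candidates: List[Candidate], threshold: int = 12) -> List[str]:
    """Return IDs of candidates whose score meets the threshold,
    highest score first."""
    hits = [c for c in candidates if score(c) >= threshold]
    hits.sort(key=score, reverse=True)
    return [c.bug_id for c in hits]


# Made-up candidates.
candidates = [
    Candidate("BUG-A", severity=5, recurrence_risk=4, cost=5),
    Candidate("BUG-B", severity=2, recurrence_risk=1, cost=2),
]
sentinels = pick_sentinels(candidates)
```

A team would normally agree the scales and threshold before scoring any bugs, so the criteria are established first as the previous slide recommends.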
  • 90. Phase II: Data Collection •Use Common Sense and Trust Gut Feel ▫“Hey did you hear about the bug…” ▫“I heard BillG was doing a demo when…” •Use a survey to gather additional data ▫Was this noticed and ignored ▫Is this a common error type ▫Could this have been prevented •Gather common data on several sentinel events
  • 91. Phase II: Data Collection •Windows Customized (Visual Studio Team System) ▫Part of Defect Tracking System ▫Connect to source code ▫Attachments ▫Collaborating ▫Workflow. Windows ezRCA Program. The Goal: Reduce Defects Throughout the Product Cycle. The Questions •What type of defect? •What phase was the defect introduced? •What was the extent of the fix? •How long did it take to fix the defect? The Source •Product Studio Extension (Per Bug Report). Leverage Points •Distributed Workflow •Quick and Easy Data Collection •Aggregate Analysis and Trend Charts •Subcomponent-Level Data Also Available •Focus on Individual Improvement •Windows Vista ran a full RCA program •Windows 7 moved to ezRCA ▫Cut many of the other data sources ▫Focus on meta data around bugs
  • 92. Windows “ezRCA” Approach. Windows ezRCA Program. The Goal: Reduce Defects Throughout the Product Cycle. The Questions •What type of defect? •What phase was the defect introduced? •What was the extent of the fix? •How long did it take to fix the defect? The Source •Product Studio Extension (Per Bug Report). Leverage Points •Distributed Workflow •Quick and Easy Data Collection •Aggregate Analysis and Trend Charts •Subcomponent-Level Data Also Available •Focus on Individual Improvement
  • 93. Windows EZ RCA Diagnosis. As is / New •Diagnosis is currently required for all bugs and defaults to NA •This field should only be activated if the bug is resolved “Fixed” or “Won’t Fix” •There should be no default value •Change/combine Hardware & No HW to Hardware Issue. NOTE: Items in RED are new or changed. Diagnosis values: Assignment Error, Build Error, Concurrency Error, Data Checking Error, Data Corruption, Doc Error, Environment Error, Error Handling Problem, Hardware Issue, Ignored Failure, Incorrect Program State, Interface Error, Missing Method/Function, Logic Error, Not Applicable, Other, Resource Issue, Simple Coding Error, System Error, User Misunderstanding
  • 94. Windows ezRCA Values •Initial classification of root causes •Root cause helps us identify the nature of the kinds of mistakes we are making •This will be a required field for Developers when resolving a bug that is ‘Fixed’ or ‘Won’t Fix’ •This will be a single-select dropdown list and developers will be expected to select the item that is most applicable •This field is not intended to replace deep RCA studies and more information will likely be required based on analysis of this data •For gathering further information, use the Prevention Tab, Test Follow-up Tab, and Bug Analysis Tabs in Product Studio or Soapbox (NOTE: Much of this will be consolidated in the future)
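The field rules described on slides 93 and 94 (single-select, no default, required only when a bug is resolved Fixed or Won't Fix) can be expressed as a small validation check. This is a sketch under assumptions: the enum holds only a subset of the diagnosis values from slide 93, the function name is invented, and rejecting a diagnosis on other resolutions is one reading of "should only be activated". It does not reflect how Product Studio actually implements the field.

```python
from enum import Enum
from typing import Optional


class Diagnosis(Enum):
    # Subset of the diagnosis values listed on slide 93.
    LOGIC_ERROR = "Logic Error"
    SIMPLE_CODING_ERROR = "Simple Coding Error"
    DATA_CORRUPTION = "Data Corruption"
    HARDWARE_ISSUE = "Hardware Issue"
    NOT_APPLICABLE = "Not Applicable"


REQUIRES_DIAGNOSIS = {"Fixed", "Won't Fix"}


def diagnosis_is_valid(resolution: str, diagnosis: Optional[Diagnosis]) -> bool:
    """Check the diagnosis field against the bug's resolution.

    Fixed / Won't Fix: exactly one diagnosis must be chosen (no default).
    Any other resolution: the field is inactive, so it must be empty.
    """
    if resolution in REQUIRES_DIAGNOSIS:
        return diagnosis is not None
    return diagnosis is None
```

For example, `diagnosis_is_valid("Fixed", None)` is rejected, which is what removes the old habit of leaving the field at a default of NA.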
  • 95. Windows Additional RCA data •Symptom and Prevention categorization •Link to more info •Anonymous submission
  • 96. ezRCA Pivot Points. ezRCA: •Data on Lots of Bugs •Few Questions & Answers •Quick, Easy •Fully Distributed. Traditional RCA: •Data on Select Fixed Bugs •Detailed Analysis of Defect •Multiple-Data Sources •Significant Investment •Can be Resource-Limited
  • 97. Phase II: Data Collection Keys to Success •For Sentinel Events open template is fine •For ezRCA extend bug tracking system with ezDataCollection ▫Keep system light weight ▫Limit required fields ▫Provide opportunity to expand within bug •For Formal RCA will need multiple data sources and extensible schema •Recommend you start with Sentinel Events and progress to a formal program
  • 98. Keep going with formal RCA •Some tools you can use with Sentinel Events and ezRCA •What good tester doesn’t make you wallow in the details?
  • 99. Phase III: Data Analysis and Assessment •Analysis Performed by ▫RCA Team ▫Research Team ▫Related experts •Log all outputs in RCA System •Be judicious with experts’ time
  • 100. Phase III: Data Analysis and Assessment (the Five Whys and the Fish Bone) Good article from ASQ – http://www.asq.org/learn-about-quality/cause-analysis-tools/overview/fishbone.html
  • 101. Phase III: Data Analysis and Assessment (the Five Whys) •Brief History - http://en.wikipedia.org/wiki/5_Whys ▫Developed by Sakichi Toyoda ▫First used in Toyota (Kaizen), Six Sigma tool •What is it ▫Simply put - ask why 5 times to get to the root cause of a problem •Fun Example from - http://startuplessonslearned.blogspot.com/2008/11/five-whys.html ▫why was the website down? The CPU utilization on all our front-end servers went to 100% ▫why did the CPU usage spike? A new bit of code contained an infinite loop! ▫why did that code get written? So-and-so made a mistake ▫why did his mistake get checked in? He didn't write a unit test for the feature ▫why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD •Criticism of five whys ▫Not reproducible across individuals ▫Shown that investigators tend to stop at symptoms rather than root cause ▫Relies upon the investigator’s knowledge
  • 102. Phase III: Data Analysis and Assessment (the Five Whys) •Brief History - http://en.wikipedia.org/wiki/5_Whys ▫Developed by Sakichi Toyoda ▫First used in Toyota Motor Corporation ▫Common tool within Kaizen, Lean Manufacturing & Six Sigma •What is it ▫Simply put - ask why 5 times to get to the root cause of a problem •Fun Example from - http://startuplessonslearned.blogspot.com/2008/11/five-whys.html ▫why was the website down? The CPU utilization on all our front-end servers went to 100% ▫why did the CPU usage spike? A new bit of code contained an infinite loop! ▫why did that code get written? So-and-so made a mistake ▫why did his mistake get checked in? He didn't write a unit test for the feature ▫why didn't he write a unit test? He's a new employee, and he was not properly trained in TDD
  • 103. Phase III: Data Analysis and Assessment (Fishbone Diagram) •Brief History - http://en.wikipedia.org/wiki/Ishikawa_diagram ▫Developed by Kaoru Ishikawa in the 1960s ▫One of the 7 basic quality management tools •Can use with 5 Whys ▫Put each why off the first tree point ▫Ask why for each one of these issues ▫Keep going until you find one or more root causes •Some industries have common causes mapped to the fishbone ▫Original 4 Ms – Machine, Method, Material, Man power ▫The 8 Ps (Used in Service Industry) – People, Process, Policies, Procedures, Price, Promotion, Place/Plant, Product ▫Ken’s List – People, Process, Tools, Accountability, Training, Recognition and awareness, Inspection and supervision, Pressure or Stress
  • 104. Trending Per-Subcomponent •Trends Matter ▫Uptick Warrants More Investigation? ▫Perform a Traditional RCA for That Set of Events •Profile ▫The State of the Code ▫Personal Improvements ▫Identify Key Events Last 5 Weeks
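The "uptick warrants more investigation" idea on this slide can be sketched as a comparison between the last five weeks and the earlier baseline for each subcomponent. The function name, the 1.5x factor, and the sample counts are all assumptions chosen for illustration; the deck only says that trends over the last 5 weeks matter.

```python
from statistics import mean
from typing import Dict, List


def find_upticks(weekly_counts: Dict[str, List[int]],
                 window: int = 5,
                 factor: float = 1.5) -> List[str]:
    """Flag subcomponents whose average weekly bug count over the last
    `window` weeks exceeds `factor` times their earlier average."""
    flagged = []
    for component, counts in weekly_counts.items():
        if len(counts) <= window:
            continue  # not enough history to form a baseline
        baseline = mean(counts[:-window])
        recent = mean(counts[-window:])
        # Components with a zero baseline are skipped in this sketch.
        if baseline > 0 and recent > factor * baseline:
            flagged.append(component)
    return flagged


# Made-up weekly bug counts per subcomponent, oldest week first.
history = {
    "parser": [2, 2, 2, 2, 2, 4, 4, 4, 4, 4],
    "ui":     [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
}
flagged = find_upticks(history)
```

A flagged subcomponent is only a prompt: as the slide says, the follow-up is a traditional RCA for that set of events.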
  • 105. Analysis is not yet at solutions •Five Whys and Fishbone Diagram help get to root causes •Data and trending can provide timely alerts and catches regressions •Root causes are then analyzed for corrective actions
  • 106. Phase III: Analysis is not the solution (Fishbone Diagram) •Five Whys and Fishbone Diagram are tools to get to root causes •Data and trending of bugs can provide timely alerts and catch regressions •Root causes are then analyzed for corrective actions
  • 107. Phase IV: Corrective Actions •Identify Trends and Group Them into Corrective Themes ▫May be solutions related to Fishbone Diagram mapping buckets •Meet with the experts again ▫Remember my warning not to burn out your experts •Determine Prioritization Factors and Costing for Corrective Actions ▫Consider Return on Investment (ROI): should have captured direct cost and opportunity cost during Data Collection ▫Speed to implement ▫Likelihood of solution being highly effective ▫Simplicity of solution ▫Is the solution automatable or process driven
  • 108. Bug Wallow #3: Our Corrective Actions •Email and Provisioning used Production Data •Both sanitized the data •Both impacted production •What did we change? ▫Stress Tests have no Internet Access ▫Sanitized Date Diff feature
  • 109. Phase V: Inform and Apply •Host a Management Review ▫Managers will like RCA more than bugs ▫You are eliminating a problem not just finding it •Implementation is a project, treat it that way ▫Assign Owners ▫Build and Maintain Schedule ▫Create a Feedback Loop ▫Establish a Monthly Status Report ▫Track and correct the corrective action
  • 110. Phase VI: Follow-up, Measurement, and Reporting •More than Just •Six Sigma type approaches •Longitudinal Analysis ▫Draws from Longitudinal Data Analysis - http://gseacademic.harvard.edu/alda/ ▫Study Over Time •Develop failure types and risk areas/components •Inspect similar products/areas for baseline •Gather and inspect process data •Examine Data for Trends •Report out
  • 111. Flatonium2007 •Need to insert video •20 new machines added to the data center •5 machines put into production early •Machines needed to be Nuked-N-Paved (NNP) •Oops
  • 112. RCA Pit and Pendulum
  • 113. Risks of Root Cause Analysis •Begins with inadequate data •Goes after too much data too early •Draws incorrect conclusions or makes invalid recommendations ▫Has anyone experienced this before? •Focuses on the wrong set of defects •Ends at the wrong level – too early or too late •Investment is not always predictable ▫Can be high cost with low ROI •Overfocus on data can detract from the story
  • 114. Benefits of Structured RCA Study •Can start as small pilots •Uses an identical process regardless of type, age or scope of defect •Avoids repeat failures •Can be the shortest path to determining and correcting causes of failure •Lowers Maintenance Costs •Builds a culture of ▫Accountability ▫Continuous Improvement