Lea Dit 2010 Td Presentation Au Email[1]



  2. Some of our recent clients...
     Thinking Dimensions International: operating KEPNERandFOURIE RCA company initiatives for the last 23 years
     Specialise in RCA for IT, Telecoms & Manufacturing
     • Barclays IT
     • Macquarie ITG
     • Unisys
     • Woolworths IT
     • Capita UK
     • SITA Global
     • BT Financial
     • McDonalds IT
  3. AGENDA
     “Most incident investigators ask the wrong questions, so don’t change your people, change the questions they are asking”
     • Introduction
     • Intro Client Case
     • Stakeholder commitment
     • Managing Information
     • Quality of Information
     • Investigation support
     • Process demonstration
     • Client outcomes
     • Questions & answers
  4. Investigation Info
     “It takes a company without a formal and effective Root Cause Analysis culture, up to 3 days to restore service incidents, but up to 25 days to find the root cause”
     KEPNERandFOURIE 2010
  5. Client Case situation
     International Australian Investment Bank’s IT Division, 2007-2010
     • Lack of stakeholder commitment
     • Poor management of information
     • Working with poor quality information
     • Poor incident investigation support
  6. Client situation - results
     • Reduced downtime of critical systems by at least 60%
     • Virtually eliminated recurring incidents
     • Level of escalations dropped > 50%
     • Visible improvement of productivity
     “The key to success is to be insistent about specificity – the more specific you are the better your chances to solve the incident.”
     KEPNERandFOURIE
  7. How did they do it?
     Decided to follow four strategies to improve the management & quality of Incident Investigation information:
     • Improve stakeholder involvement & commitment
     • Improve management of information
     • Improve quality of information, thus decreasing incident investigation cycles
     • Improve support for incident investigations
  8. Strategy 1: Improve stakeholder commitment
     Specific challenges
     • Lack of cross-silo collaboration
     • Poor stakeholder buy-in
     • Reluctant contributions from subject matter experts (SMEs)
     Client actions
     • Introduced a formal division-wide Root Cause Analysis (RCA) system
     • Provided common processes in troubleshooting and solution finding
     • Introduced stakeholder/info source analysis
     • Provided an easy way for SMEs to contribute meaningfully
  9. Best in class
     Stakeholder Commitment
     • Resolution time to repair a critical outage: 3 hrs vs 45 hours
     • 71% improvement in mean-time-to-repair of critical business applications vs 11% decline
     • 98% availability of critical business applications vs 82% availability
     Source: Aberdeen Group, Boston, Feb 2010, J DeBarros & G Patil
  10. Best in class with RCA
      Stakeholder Commitment
      • 69% of Best in Class companies implemented RCA over the last 2 years, with a 50% improvement in productivity and a 19% improvement in profitability. 28% indicated they will do RCA in the next year.
      • 19% of Average-rated companies implemented RCA, with a 12% improvement in productivity. Only 19% are planning to do RCA in the next 12 months.
      • The Laggards did not do any RCA and saw a 9% drop in productivity. Nearly 30% plan to implement RCA.
  11. Client case situation
  12. Common process
      • Everybody uses the same process for finding causes and solutions
      • The process determines which questions to ask at each step for each type of incident investigation approach
      • Designed for minimalistic information combined with a good focus to provide quick answers
      Step 1: Identify Problem Situation
      Step 2: Gather Incident Information
      Step 3: Analyse Incident Information
      Step 4: Determine Conclusion
  13. Stakeholder analysis
      • What do you know?
      • What don’t you know?
      • Who has the information?
      • How will you obtain the missing information?
      Stakeholder groups: Decision makers, Implementers, Influencers
  14. Strategy 1: Improve stakeholder commitment
      SPECIFIC RESULTS ACHIEVED
      • An incident is first attempted in natural teams; if it is not resolved, Management gives permission to call in the appropriate SMEs
      • Management sanctions incident investigation meetings, because they know these will provide results
      • Staff achieve more in less time and are not averse to attending Incident Investigation meetings
      • Management promotes the use of the formal RCA processes
      “If a team could not solve a problem, the person with the information was not invited!”
      Chuck Kepner
  18. Strategy 2: Improve management of information
      Specific challenges
      • Inappropriate use of information sources
      • Either too much or too little information
      • High level of escalations
      • Duplication of efforts
      Client actions
      • Introduced “rules of engagement”
      • Introduced a framework of “levels of troubleshooting” to align with PM’s severity levels
      • Taught staff to trust the processes to deliver the correct answers, using templates with questions
      • Introduced the “minimalistic” principle
  19. Rules of engagement
      • TOP: Commitment to training of key staff and facilitators. Publicise the rules of engagement.
      • MIDDLE: Commitment to declare a situation as an unresolved incident. Instructs direct reports to do an RCA exercise to resolve the incident.
      • WORKFORCE: Allow IT professionals 2-8 hours to resolve a problem. If it is not resolved, they are allowed to escalate the incident and apply the RCA process.
  20. Levels of troubleshooting
      • SEV 3: Thinking on Your Feet. “Checklist” problem solving using appropriate checklists. Leadership allows the IT professional to resolve an incident within 8 hours. If this does not happen, the incident is escalated.
      • SEV 2: Intuitive Analysis. Leadership instructs and allows the natural team to perform an intuitive RCA on the incident. If it is not resolved, the team escalates the incident.
      • SEV 1: Investigative Analysis. In-house trained RCA facilitators have the permission of Leadership to assemble a cross-silo team to formally investigate the incident with the appropriate RCA tools, to systematically arrive at the TRUE & ROOT causes for a problem situation.
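
The three levels above can be read as a simple escalation rule. The following is a minimal sketch of that reading, not the client's actual tooling: the names `LEVELS`, `SEV3_TIME_LIMIT_HOURS` and `next_level` are hypothetical, and the rule that an unresolved SEV 2 incident moves straight to SEV 1 once the team's intuitive RCA is finished is an assumption, since the slide states no time limit for SEV 2.

```python
# Severity levels as named on the slide; identifiers are hypothetical.
LEVELS = {
    3: "Thinking on Your Feet (checklist problem solving)",
    2: "Intuitive Analysis (natural team, intuitive RCA)",
    1: "Investigative Analysis (facilitated cross-silo RCA)",
}

# The slide gives 8 hours for SEV 3 only.
SEV3_TIME_LIMIT_HOURS = 8


def next_level(level: int, resolved: bool, hours_spent: float) -> int:
    """Return the severity level at which the incident should be handled next.

    Assumption: for SEV 2 this is called once the natural team has finished
    its intuitive RCA attempt, so an unresolved incident escalates at once.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown severity level: {level}")
    if resolved or level == 1:
        # Resolved incidents stay put; SEV 1 is the top of the ladder.
        return level
    if level == 3 and hours_spent < SEV3_TIME_LIMIT_HOURS:
        # The IT professional still has time left at SEV 3.
        return 3
    # Unresolved: move one step up (a lower number is more severe).
    return level - 1


if __name__ == "__main__":
    for args in [(3, False, 2), (3, False, 8), (2, False, 1), (1, False, 100)]:
        new_level = next_level(*args)
        print(args, "->", f"SEV {new_level}:", LEVELS[new_level])
```

Keeping the rule in one small function makes the "rules of engagement" explicit: anyone can check when an incident is allowed to move up a level.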
  21. “Minimalistic principle”
      “Too much information can cause confusion. The key is to get all the relevant information onto one page and that is normally substantially less than gathering ‘all’ the Information.”
      Innovation – the FreeZone thinking experience, by Kepner & Fourie
      • Only need to analyse the information that is relevant to the incident
      • Worked questions within a customised “factor analysis” framework
      • Get a quick factual “snapshot” of the characteristics of the incident, then use SME experience and gut feel to explain the snapshot
      • Test SME inputs against logic of snapshot
  22. Example of templates with questions
  23. Strategy 2: Improve management of information
      SPECIFIC RESULTS ACHIEVED
      • Staff knew exactly when to apply a formal RCA process, when to involve a facilitator and when to call on a cross-functional SME
      • Gave IT professionals the confidence that they were working through a problem situation systematically and comprehensively
      • Developed a “no-nonsense” incident investigation culture: you ask a question; you either have the answer or you need to go and get it
      “Every incident has multiple entry points. To be successful in solving the incident you need to find the correct entry point.”
      Matt Fourie
  26. Strategy 3: Improve quality of information
      Specific challenges
      • Wasted time and effort having to do too many replications
      • Mostly dealing with raw data instead of information
      • Long investigation cycle times
      • High levels of recurring incidents
      Client actions
      • Introduced a set of interrogative questions to convert raw data into meaningful information
      • Created a “deductive” reasoning culture to arrive at answers quickly and effectively
      • Tested possible causes on paper to eliminate 90% of replication time, effort and money
  27. Incident statement - sample
  28. Snapshot info for causes
      • OBJECT: What object and which other object(s) not?
      • FAULT: What fault and which other typical faults not?
      • USERS: Who has the problem and who does not?
      • WHERE: Where are these users and where could they have been but are not?
      • TIMING: When did it happen first time and when not?
      • PATTERN: What is the pattern of faults and what could it have been but is not?
      • CYCLE: In which cycle does the problem occur and in which cycle does it not occur?
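
The snapshot questions above pair what the problem is with what it is not on each dimension, which is what makes testing a possible cause "on paper" possible. Below is a minimal sketch of that idea, not the CauseWise tool itself: `SnapshotLine` and `unexplained` are hypothetical names, the incident data is invented for illustration, and the True/False judgements are assumed to come from SMEs rather than from the code.

```python
from dataclasses import dataclass

# The seven dimensions listed on the slide.
DIMENSIONS = ("OBJECT", "FAULT", "USERS", "WHERE", "TIMING", "PATTERN", "CYCLE")


@dataclass(frozen=True)
class SnapshotLine:
    """One snapshot row: what the problem IS and what it IS NOT."""
    dimension: str
    is_: str
    is_not: str


def unexplained(snapshot, explains):
    """Return the dimensions a possible cause fails to explain.

    `explains` maps a dimension to True when an SME judges that the cause
    accounts for both the IS and the IS NOT side of that row.
    """
    return [row.dimension for row in snapshot
            if not explains.get(row.dimension, False)]


# Hypothetical incident data, for illustration only.
snapshot = [
    SnapshotLine("OBJECT", "payments server", "reporting server"),
    SnapshotLine("USERS", "branch staff", "head-office staff"),
    SnapshotLine("TIMING", "first seen Monday morning", "not seen last week"),
]

# Hypothetical SME judgements for two possible causes.
candidate_causes = {
    "weekend patch on payments server":
        {"OBJECT": True, "USERS": False, "TIMING": True},
    "branch network change":
        {"OBJECT": True, "USERS": True, "TIMING": True},
}

if __name__ == "__main__":
    for cause, judgement in candidate_causes.items():
        print(cause, "-> unexplained:", unexplained(snapshot, judgement))
```

A cause that leaves any row unexplained can be set aside before anyone tries to replicate the fault, which is where the saving in replication effort would come from.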
  29. CauseWise sample
  30. Snapshot info for Solutions
      Four Question Drill
      • What are the results you want to achieve with this solution?
      • What are the existing problems you would like to remove with this solution?
      • What are the potential risks you would like to avoid with this solution?
      • What money and time do you have or do you need to preserve? What are the restrictions out of your control?
  31. SolutionWise Demo
  36. Reducing cycle times
      [Diagram: “Server slow”, with five items marked X]
  37. Strategy 3: Improve quality of information
      SPECIFIC RESULTS ACHIEVED
      • Incident root cause found first time, every time
      • Meetings became more productive
      • The RCA method always gave all stakeholders a better, common understanding of the problem situation
      • Recurring incidents were virtually eliminated
      • Cycle times for incident investigations reduced drastically
      I keep six honest serving-men
      (They taught me all I knew);
      Their names are What and Why and When
      And How and Where and Who.
      I send them over land and sea,
      I send them east and west;
      But after they have worked for me,
      I give them all a rest.
      Rudyard Kipling
  42. Strategy 4: Improve support for incident investigations
      Specific challenges
      • Did not know “Who, What, How and When”
      • No “Go To” person to help with effective investigations
      Client actions
      • Trained in-house professional RCA investigators
      • Established “rules of engagement” for facilitators
      • Publicised successes
      • Recognition by Management
  43. Training in-house facilitators
      • Advice to the Incident Owner on who to invite to the RCA meeting to improve the chances of a quick success (Stakeholders & Info Sources)
      • How to prepare a team for an effective RCA meeting
      • Exceptional investigation facilitation skills (the art of asking the right questions and verifying the answers for authenticity)
      • RCA process skills to enable the facilitator to lead any team at any level in investigations
      “One of the main reasons for incident investigation failure is ‘analysis paralysis’ – having to work with too much information”
      Infrastructure Manager, Airline Software Platforms
  44. Strategy 4: Improve support for incident investigations
      SPECIFIC RESULTS ACHIEVED
      • Facilitators established a forum for themselves, meeting once a month to discuss lessons learned and share successes
      • Facilitators are now also used to help solve vendor issues affecting application performance
      • Facilitators started to feed results into an agreed knowledge database, and encouraged informal uses of RCA to be recorded as well
      • Increased division awareness of how well they are doing with application performance issues
      “It is always a good strategy to stand a few steps back and look at the incident from a different angle”
      Unknown
  48. Application Performance results
      • M-T-T-R went from weeks to a couple of hours
      • Improvement in M-T-T-R practices by nearly 50%
      • Availability of critical systems went from 77% to 94%
      [Chart: M-T-T-R, weeks vs hours]
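
To put the availability figures above in terms of hours, the sketch below converts an availability percentage into downtime over a period. It is illustrative arithmetic only: the function names are hypothetical, and the 30-day, 24x7 service window (720 hours) is an assumption, since the slide does not say how availability was measured.

```python
def downtime_hours(availability_pct: float, period_hours: float) -> float:
    """Hours of downtime implied by an availability percentage over a period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (100 - availability_pct) / 100 * period_hours


def downtime_reduction_pct(before_pct: float, after_pct: float) -> float:
    """Relative reduction in downtime when availability rises from before to after."""
    if before_pct >= 100:
        raise ValueError("no downtime to reduce at 100% availability")
    before = 100 - before_pct
    after = 100 - after_pct
    return (before - after) / before * 100


if __name__ == "__main__":
    PERIOD_HOURS = 30 * 24  # assumed 30-day month of round-the-clock service
    print(round(downtime_hours(77, PERIOD_HOURS), 1))
    print(round(downtime_hours(94, PERIOD_HOURS), 1))
    print(round(downtime_reduction_pct(77, 94), 1))
```

Under that assumption, moving from 77% to 94% availability takes monthly downtime from about 166 hours to about 43 hours, a reduction of roughly 74%.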
  49. Improvement in escalations
      • Escalation of severity 3 to severity 2 reduced by nearly 24%
      • Escalation of severity 2 to severity 1 reduced by 76%
      • Recurring incidents reduced by 35%
  50. Lessons learned
      • Most of the recurring incidents and problems are caused by “out of date procedures” and lack of proper documentation
      • RCA is a “mental orientation” which people have to be trained in; it “does not come with experience”
      • IT professionals need a “thinking approach” that can be applied in most situations
      • Make the Rules of Engagement a standing order
      • Encourage use in all incident investigation meetings: ask for the paperwork/evidence
      • Sponsor continuous RCA training
      • Send regular email communications to publicise successes
  51. Thank you for your time!
      If you have any further questions regarding Minor or Major Investigations, or how to acquire the in-house skills to drastically improve your metrics, please do not hesitate to speak to us after this session or contact Andrew at andrew@thinkingdimensions.com.au
  52. ITIL Centric Processes