Problem management in Information Technology Infrastructure Library

Published in: Technology

  1. PROBLEM MANAGEMENT. From: ITIL. The key to managing IT services. BY FARSHAD BADIE, farshadbadie@gmail.com, COMPUTER SCIENCE M.Sc., UNIVERSITY OF DEBRECEN
  2. “PROBLEM MANAGEMENT” Definition: Problem Management is the process responsible for managing the lifecycle of all problems. Goal: to minimize the adverse impact on the business of incidents and problems that are caused by errors within the IT infrastructure, and to prevent recurrence of incidents related to these errors. “Proactive Problem Management is concerned with identifying and solving
  3. PROBLEM & KNOWN ERROR CONCEPTS • A problem is the unknown underlying cause of one or more incidents. • A known error is a problem that has been successfully diagnosed and for which a Work-around has been identified.
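
The distinction on this slide can be sketched as a small data structure. This is only an illustration: the `ProblemRecord` class, its field names and the sample identifiers are hypothetical and are not taken from ITIL or from the slides. It assumes a problem becomes a known error once both a root cause and a work-around are recorded.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    """Hypothetical record; field names are illustrative, not from ITIL."""
    problem_id: str
    incident_ids: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None   # unknown until diagnosis succeeds
    workaround: Optional[str] = None   # temporary fix for the Service Desk

    def is_known_error(self) -> bool:
        # Known error = diagnosed root cause AND an identified work-around.
        return self.root_cause is not None and self.workaround is not None


problem = ProblemRecord("PRB-001", incident_ids=["INC-101", "INC-107"])
status_before = problem.is_known_error()   # cause still unknown

problem.root_cause = "Expired certificate on the mail relay"
problem.workaround = "Route outgoing mail through the backup relay"
status_after = problem.is_known_error()    # diagnosed, work-around recorded

print(status_before, status_after)
```

Keeping the incident identifiers on the record is one way to preserve the link between Incident records and Problem/error records.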
  4. Scope Of Problem Management: Problem Control; Error Control; Proactive Problem Management
  5. INPUTS TO THE PROBLEM MANAGEMENT PROCESS: Incident details from Incident Management; Configuration details from the Configuration Management Database; Any defined Work-around (from Incident Management)
  6. OUTPUTS OF THE PROBLEM MANAGEMENT PROCESS: Known Errors; A Request For Change (RFC); An updated Problem record; A closed Problem record for a resolved Problem; Response from Incident matching to Problems and Known Errors; Management information
  7. The relation between Problem Management and Incident Management: The Problem Management process is intended to reduce both the number and the severity of Incidents and Problems affecting the business. A Problem is a condition often identified as a result of multiple Incidents that exhibit common symptoms. Problems can also be identified from a single significant Incident, indicative of a single error, for which the
  8. Problem Management and Incident Management: The main goal of PM is the detection of the underlying causes of an Incident and their subsequent resolution and prevention. In many situations this goal can be in direct conflict with the goals of IcM, where the aim is to restore the service to the customer as quickly as possible.
  9. PROBLEM CONTROL: This process is concerned with handling problems in an efficient and effective way. The aim is to identify the root cause. Activities: Problem identification & recording; Problem classification; Problem investigation & diagnosis
  10. ERROR CONTROL: It covers the processes involved in progressing Known Errors until they are eliminated by the successful implementation of a Change (under the control of the Change Management process). The objective of error control is to be aware of errors, to monitor them and to eliminate them when feasible and cost-justifiable. ACTIVITIES RECOGNISED IN ERROR CONTROL: Error identification & recording; Error assessment; Recording error resolution; Error closure; Monitoring Problem & error resolution progress
  11. PROACTIVE PROBLEM MANAGEMENT: Covers the activities aimed at identifying and resolving Problems before Incidents occur: Trend Analysis; Targeting Support Action; Providing Information to the Organization
  12. PROBLEM MANAGEMENT BENEFITS: Improved IT service quality; Incident volume reduction; Permanent solutions; Improved organisational learning; Better first-time fix rate at the Service Desk
  13. PLANNING AND IMPLEMENTATION: TIMING & PLANNING. Good PM relies to a great extent on an implemented and efficient IcM process. If resources are scarce, it is advisable to concentrate in the first instance on the implementation of Problem and error control (reactive Problem Management). Smaller organizations can introduce reactive Problem Management by focusing daily on the ‘top ten’ Incidents of the previous day.
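
A daily ‘top ten’ review could start from a simple frequency count. The sketch below is a minimal illustration under stated assumptions: incidents are plain dictionaries with hypothetical `category` and `opened` keys, and "top" is taken to mean the most frequent categories, which the slide does not specify.

```python
from collections import Counter
from datetime import date


def top_incident_categories(incidents, day, n=10):
    """Return the n most frequent incident categories opened on `day`."""
    counts = Counter(
        inc["category"] for inc in incidents if inc["opened"] == day
    )
    return counts.most_common(n)


yesterday = date(2024, 3, 4)  # fixed date so the example is reproducible
incidents = [
    {"category": "email", "opened": yesterday},
    {"category": "vpn", "opened": yesterday},
    {"category": "email", "opened": yesterday},
    {"category": "printer", "opened": yesterday},
    {"category": "email", "opened": yesterday},
    {"category": "vpn", "opened": yesterday},
    {"category": "email", "opened": date(2024, 3, 3)},  # older, ignored
]

top = top_incident_categories(incidents, yesterday, n=2)
print(top)
```

In practice the records would come from the Incident Management tool, which is why an efficient IcM process is a precondition.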
  14. PLANNING AND IMPLEMENTATION: KEY SUCCESS FACTORS. Effective automated registration of Incidents, with effective classification, is fundamental to the success of PM. Setting achievable objectives and making use of the problem-solving talents of existing staff is a key activity. In view of the potentially conflicting interests of IcM and PM, good cooperation between the two processes is essential.
  15. PLANNING AND IMPLEMENTATION: RISKS. Absence of a good Incident control process, and hence the absence of detailed historical data on Incidents. Failure to link Incident records with Problem/error records. Lack of management commitment. The undermining of the Service Desk role. Failure to set aside time to build and maintain the knowledge base, which will restrict the delivery of benefits. The inability to determine accurately the
  16. PROBLEM CONTROL ACTIVITIES: Problem identification & recording; Problem classification; Problem investigation & diagnosis
  17. PROBLEM IDENTIFICATION: Problem identification means actually seeing the problem before trying to solve it. It is the first step in solving a problem. The process of Problem identification includes the basic classification of problems. Data on affected CIs should be accurately appended to this basic classification data. Ideally, these CIs are the lowest level of item capable of discrete amendment.
  18. PROBLEM CLASSIFICATION: When a problem is identified, the amount of effort required to detect and recover the failing CIs has to be determined. It is therefore important to be aware of the impact of the problem on existing service levels. This process is known as Classification. The steps involved in Problem classification are similar to the steps in classifying Incidents; they are to determine: category – impact – urgency – priority
  19. PROBLEM CLASSIFICATION • Problems are categorised into related groups or domains (e.g. hardware, software, support software). • Identification of a new Problem should be followed by an objective analysis of its impact. • Urgency is the extent to which resolution of a Problem or error can bear delay; it should not be confused with priority. • Priority indicates the relative order in which a series of items – be they Incidents, Problems,
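
The four classification attributes can be illustrated with a short sketch. The slides do not say how priority is derived; the code assumes a common convention in which priority is looked up from impact and urgency on a three-level scale. The `LEVELS` scale, the 1 to 5 priority range and the function names are assumptions made for illustration only.

```python
# Assumed three-level scale shared by impact and urgency.
LEVELS = ("high", "medium", "low")


def priority(impact, urgency):
    """Combine impact and urgency into a priority from 1 (highest) to 5."""
    if impact not in LEVELS or urgency not in LEVELS:
        raise ValueError("impact and urgency must be one of %s" % (LEVELS,))
    # Position in LEVELS is 0 for "high", so high/high yields priority 1.
    return LEVELS.index(impact) + LEVELS.index(urgency) + 1


def classify(category, impact, urgency):
    """Return the four classification attributes as one record."""
    return {
        "category": category,
        "impact": impact,
        "urgency": urgency,
        "priority": priority(impact, urgency),
    }


record = classify("software", impact="high", urgency="medium")
print(record)
```

Keeping urgency and priority as separate fields reflects the point that the two should not be confused: urgency is an input, priority is the resulting ordering.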
  20. PROBLEM INVESTIGATION & DIAGNOSIS: The process of Problem investigation is similar to that of Incident investigation, but the primary objective of each process is significantly different. Incident Management’s aim is rapid restoration of service, whereas Problem Management’s aim is
  21. ERROR CONTROL ACTIVITIES
  22. ERROR IDENTIFICATION & RECORDING: An error is identified when a faulty CI (a CI that causes, or may be likely to cause, Incidents) is detected. ERROR CONTROL IN THE SOFTWARE ENVIRONMENT
  23. ERROR RESOLUTION RECORDING: The resolution process for each Known Error should be recorded in the Problem Management system. This data is then available for Incident matching. ERROR CLOSURE: Following successful implementation of changes to resolve errors, the relevant Known Error record(s) are closed, together with any associated Incident or Problem records. PROBLEM/ERROR RESOLUTION MONITORING: Problem Management should monitor the continuing impact of Problems and Known
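
Incident matching against recorded Known Errors might look like the following sketch. The matching rule used here (same CI, and every recorded symptom keyword appears in the incident description) is an assumption chosen for simplicity; the `KnownError` class and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class KnownError:
    """Hypothetical Known Error record held in the PM system."""
    error_id: str
    ci: str                       # affected configuration item
    symptoms: FrozenSet[str]      # lowercase keywords describing the error
    workaround: str


def match_incident(ci: str, description: str,
                   known_errors: List[KnownError]) -> Optional[KnownError]:
    """Return the first Known Error on the same CI whose symptoms all appear."""
    words = set(description.lower().split())
    for ke in known_errors:
        if ke.ci == ci and ke.symptoms <= words:
            return ke
    return None


known_errors = [
    KnownError("KE-1", "mail-server", frozenset({"timeout", "send"}),
               "Use the webmail client until the relay is patched"),
]

hit = match_incident("mail-server", "Mail client shows timeout on send",
                     known_errors)
miss = match_incident("mail-server", "Mailbox quota exceeded", known_errors)
print(hit.error_id if hit else None, miss)
```

A match lets the Service Desk apply the recorded work-around immediately instead of starting a new diagnosis.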
  24. PROACTIVE PROBLEM MANAGEMENT: The activities described so far in Problem and error control are mainly reactive. Proactive Problem Management activities are concerned with identifying and resolving Problems and Known Errors before
  25. The main activities within Proactive PM processes are Trend Analysis and the Targeting of preventive action. Trend Analysis: • Is the practice of collecting information and attempting to spot a pattern, or trend, in the information. • Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past. • In fact it uses historical … Targeting of preventive action: • Trend analysis can lead to the identification of general problem areas needing more support attention. • Trend analysis can lead to the identification of faults in the IT infrastructure, which can then be analyzed and
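
A very small form of trend analysis is to compare incident counts per CI between two periods. The sketch below is an illustration only: the two-period comparison, the `min_increase` threshold and the CI names are assumptions, not a method prescribed by the slides.

```python
from collections import Counter


def rising_cis(previous, current, min_increase=2):
    """Return CIs whose incident count grew by at least `min_increase`.

    `previous` and `current` are lists of CI names, one entry per incident,
    for two consecutive reporting periods.
    """
    prev_counts = Counter(previous)
    cur_counts = Counter(current)
    # A Counter returns 0 for missing keys, so new CIs are handled as well.
    return sorted(
        ci for ci in cur_counts
        if cur_counts[ci] - prev_counts[ci] >= min_increase
    )


last_week = ["mail-server", "vpn-gateway"]
this_week = ["mail-server"] * 3 + ["vpn-gateway"] + ["print-server"] * 2

candidates = rising_cis(last_week, this_week)
print(candidates)
```

The flagged CIs are only candidates for preventive action; whether they point to a real fault still has to be investigated.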
  26. MAJOR PROBLEM REVIEWS: What was done right? What was done wrong? What could be done better next time? How to prevent the Problem from happening again?
  27. PROBLEM/ERROR CONTROL REPORTING: The number of RFCs raised and their impact on the availability and reliability of the services covered. The amount of time worked on investigations and diagnoses per organizational unit. The number and impact of Incidents occurring before the root Problem is closed or a Known Error is confirmed. The plans for resolution of open Problems with regard to resources. A short description of actions to be
  28. Roles within Problem Management >>Problem Manager<<: Developing & maintaining the Problem control process; Developing & maintaining the Problem/error control system; Recommending improvements to error control; Reviewing the efficiency of the Problem control process; Reviewing the efficiency of proactive Problem Management activities; Monitoring the effectiveness of error control; Managing Problem support staff; Allocating resources for the support effort
  29. Roles within Problem Management >>Problem Support<<: Identifying the root causes; Identifying trends & potential Problem sources; Identifying major Incidents; Preventing the replication of Problems across multiple systems; Assisting with the handling of major Incidents; Investigating Problems; Raising RFCs to clear errors; Monitoring progress on the resolution of Known Errors
  30. Kepner & Tregoe Analysis: Defining the Problem; Describing the Problem with regard to identity, location, time and size; Establishing possible causes; Testing the most probable cause; Verifying the true cause
