Processes in Service Operation Event Management
Event Management
• GOAL
  – To detect Events, make sense of them, and determine the appropriate control action to be provided by Event Management
• OBJECTIVE
  – To provide the entry point for the execution of many Service Operation processes and activities
• SCOPE
  – Any aspect of Service Management that needs to be controlled and which can be automated
Concept: Event
An Event is a change of state that is significant for the management of a Configuration Item or service. The term is also used to mean an alert or notification.
Events typically require IT Operations personnel to take action, and often lead to Incidents being logged.
For example:
• Action needed
• Backup in progress
• Status: in maintenance
• Status: unavailable
• Status: processing
• Status: available
• Unauthorized access
• Service degraded
Concept of Alert
An Alert is a warning that:
• A threshold has been reached
• Something has changed
• A failure has occurred
Alerts are often created and managed by system management tools and are managed by the Event Management process.
The purpose of an Alert is to ensure that the person with the skills appropriate to deal with the event is notified.
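The "threshold has been reached" case can be sketched in a few lines of Python. This is only an illustration: the `Event` fields, the `THRESHOLDS` table, and the metric names are assumptions made for the example and do not come from any particular system management tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    ci: str        # Configuration Item whose state changed (hypothetical field)
    metric: str    # what was measured, e.g. "disk_used_pct" (hypothetical name)
    value: float   # observed value


# Assumed thresholds; a real tool would load these from its own configuration.
THRESHOLDS = {"disk_used_pct": 90.0, "cpu_used_pct": 95.0}


def alert_for(event: Event) -> Optional[str]:
    """Return an alert message if the event reaches its threshold, else None."""
    limit = THRESHOLDS.get(event.metric)
    if limit is not None and event.value >= limit:
        return f"ALERT {event.ci}: {event.metric}={event.value} reached threshold {limit}"
    return None
```

An event for a metric with no configured threshold produces no alert, which is one simple form of filtering.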
Key Metrics
• No. of events by:
  – Category
  – Significance
• No. and % of events:
  – That required human intervention, and whether this was performed
  – That resulted in Incidents or Changes
  – Caused by existing Problems or Known Errors
  – Compared with the number of Incidents
• No. and % of:
  – Repeated or duplicated events
  – Events indicating performance issues
  – Events indicating potential availability issues
  – Each type of event per platform or application
Implementation Challenges
• Correct level of filtering
• Obtaining funding
• Rolling out necessary monitoring agents
• Acquiring necessary skills
Service Operation Processes Incident Management
Incident Management
• GOAL
  – To restore normal service operation as quickly as possible and minimize the adverse impact on business operations
• OBJECTIVE
  – To ensure that the best possible levels of service quality and availability are maintained
• SCOPE
  – Incident Management includes any Event which disrupts, or which could disrupt, a service. This includes Events which are communicated directly by users, either through the Service Desk or through an interface from Event Management to Incident Management tools
Basic Concepts
• Timescales
• Incident model
• Major Incidents
NOTE: People sometimes use loose terminology and/or confuse a Major Incident with a Problem. In reality, an Incident remains an Incident forever. It may grow in impact or priority to become a Major Incident, but an Incident never ‘becomes a Problem’. A Problem is the unknown cause of one or more Incidents and always remains a separate entity.
Key Metrics
• Total number of Incidents
  – Breakdown at each stage
• Mean elapsed time to achieve Incident resolution or circumvention, broken down by impact code
• Percentage of Incidents handled within agreed response time
• Incident response time
• Average cost per Incident
• Number and percentage of:
  – Major Incidents, backlog, and incorrectly assigned or categorized Incidents
  – Incidents resolved remotely, without the need for a visit
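Two of these metrics, mean resolution time by impact code and percentage handled within the agreed response time, can be computed as in the following Python sketch. The record layout, the sample values, and the 15-minute target are all assumptions made for illustration, not real data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (impact_code, minutes_to_respond, minutes_to_resolve)
incidents = [
    ("high", 5, 60),
    ("high", 20, 120),
    ("low", 10, 30),
    ("low", 45, 90),
]

AGREED_RESPONSE_MINUTES = 15  # assumed target, normally taken from the SLA


def mean_resolution_by_impact(records):
    """Mean elapsed minutes to resolution, broken down by impact code."""
    by_impact = defaultdict(list)
    for impact, _respond, resolve in records:
        by_impact[impact].append(resolve)
    return {impact: mean(times) for impact, times in by_impact.items()}


def pct_within_response_target(records, target):
    """Percentage of incidents responded to within the agreed time."""
    if not records:
        return 0.0
    within = sum(1 for _impact, respond, _resolve in records if respond <= target)
    return 100.0 * within / len(records)
```

With the sample records, two of the four incidents were responded to within 15 minutes, so the second function returns 50.0.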
Implementation Challenges
• The ability to detect Incidents as early as possible
• Convincing all staff (technical teams as well as users) that all Incidents must be logged
• Availability of information about Problems and Known Errors
• Integration into the:
  – Configuration Management System (CMS)
  – Service Level Management (SLM) process
  – Service Knowledge Management System (SKMS)
Service Operation Processes Request Fulfillment
Request Fulfillment
• GOAL
  – To deal with Service Requests from the users/customers
• OBJECTIVE
  – To provide a channel for users to request and receive standard services for which a predefined approval and qualification process exists
  – To provide information to users and customers about the availability of services and the procedure for obtaining them
  – To source and deliver the components of requested standard services (e.g. licenses and software media)
  – To assist with general information, complaints or comments
• SCOPE
  – Each organization will need to decide and document which requests it will handle through the Request Fulfillment process and which others will have to go through the more formal Change Management process
Concept of the Service Request
A Service Request is a request from a user for information, advice, a standard change or access to an IT service.
For example:
• To reset a password
• To provide standard IT services for a user
Service Requests are usually handled by a Service Desk and do not require an RFC to be submitted.
Concept of the Request Model
The Request Model is a way of predefining the steps that should be taken to handle a process (in this case a process for dealing with a particular type of request) in an agreed way.
Support tools can be used to manage the required process. This will ensure that standard requests are handled in a predefined path and within predefined timescales.
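A Request Model can be pictured as an ordered list of predefined steps, each with its own timescale. The Python sketch below is one possible representation; the `Step` type, the password-reset steps, and the hour values are hypothetical and only serve to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    max_hours: int  # predefined timescale for this step (assumed unit: hours)


# Hypothetical model for a password reset request.
PASSWORD_RESET_MODEL = (
    Step("Verify user identity", 1),
    Step("Reset password", 1),
    Step("Confirm with user and close", 2),
)


def total_timescale(model) -> int:
    """Total predefined time allowed for the whole request."""
    return sum(step.max_hours for step in model)


def next_step(model, completed: int):
    """Return the next step on the predefined path, or None when all are done."""
    return model[completed] if completed < len(model) else None
```

Because the model is an immutable tuple, every request of the same type follows the same path, which is the point of predefining it.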
Key Metrics
• Average cost
• Backlog
• Requests that met SLA
• Requests that did not meet SLA
• Satisfaction surveys
Implementation Challenges
• Clearly defining and documenting the type of requests that will be handled within the Request Fulfillment process (and those that will either go through the Service Desk and be handled as Incidents or need to go through formal Change Management), so that all parties are absolutely clear on the scope
• Establishing self-help front-end capabilities that allow the users to interface successfully with the Request Fulfillment process
Service Operation Processes Problem Management
Problem Management
• GOAL
  – To diagnose the root cause of Incidents, to determine the resolution to those Problems and to implement resolutions through appropriate control procedures
• OBJECTIVE
  – Primarily to prevent Problems and resulting Incidents, to eliminate recurring Incidents and to minimize the impact of Incidents that cannot be prevented
• SCOPE
  – The management of the lifecycle of all Problems
Problem Management
Problem Management is the process responsible for managing the lifecycle of all Problems.
Problem Management consists of two major processes:
1. Reactive Problem Management is generally executed as part of Service Operation and is, therefore, covered in the Service Operation book
2. Proactive Problem Management is initiated in Service Operation, but is generally driven as part of Continual Service Improvement
Problem
The unknown cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created, and the Problem Management process is responsible for further investigation.
Problem Investigation & Diagnosis
• Chronological Analysis
• Pain Value Analysis
• Kepner and Tregoe
• Brainstorming
• Ishikawa Diagrams
• Pareto Analysis
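Of the techniques listed, Pareto Analysis lends itself to a short worked example: rank causes by how often they occur and keep the few that account for most Incidents. The Python sketch below assumes a plain list of cause labels and an 80% cutoff; both the function name and the cutoff are illustrative choices, not prescribed values.

```python
from collections import Counter


def pareto(causes, cutoff=80.0):
    """Return the most frequent causes that together reach `cutoff` percent."""
    counts = Counter(causes)
    total = sum(counts.values())
    selected, cumulative = [], 0
    for cause, count in counts.most_common():
        # Stop once the causes already selected cover the cutoff share.
        if 100.0 * cumulative / total >= cutoff:
            break
        selected.append(cause)
        cumulative += count
    return selected


# Hypothetical cause labels taken from ten closed Incidents.
sample = ["network"] * 5 + ["disk"] * 3 + ["config"] + ["power"]
```

For the sample above, "network" and "disk" together cover 8 of 10 Incidents, so `pareto(sample)` returns those two causes.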
Workaround
• A technique which reduces or eliminates the impact of an Incident or Problem for which a full resolution is not yet available
• For example:
  – Restarting a failed Configuration Item
  – Rerouting workload
• Workarounds for Incidents that do not have associated Problem Records are documented in the Incident Record
Known Error
A Problem that has a documented Root Cause and a Workaround.
Problem + Root Cause + Workaround = Known Error
Known Errors are created and managed throughout their lifecycle by Problem Management. Known Errors may also be identified by development or suppliers.
Known Error Database (KEDB)
• A database containing all Known Error records
• The purpose is to store previous knowledge of Incidents and Problems, and how they were overcome, to allow quicker diagnosis and resolution if they recur
• This database is created by Problem Management and used by Incident and Problem Management
• The Known Error Database is part of the Service Knowledge Management System
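How Incident Management might consult the KEDB can be sketched as a simple lookup. The Python example below is a toy model: the `KnownError` fields, the two sample records, and the substring matching are all assumptions made for illustration, and real KEDB tools search far more flexibly.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    error_id: str     # hypothetical identifier scheme
    symptom: str      # lowercase text used to match incoming Incidents
    root_cause: str
    workaround: str


# Hypothetical KEDB contents, invented for this example.
KEDB = [
    KnownError("KE-001", "print spooler stopped",
               "memory leak in driver", "restart the spooler service"),
    KnownError("KE-002", "vpn login timeout",
               "expired gateway certificate", "connect via backup gateway"),
]


def find_known_errors(incident_description, kedb=KEDB):
    """Return Known Errors whose symptom text appears in the description."""
    text = incident_description.lower()
    return [ke for ke in kedb if ke.symptom in text]
```

A match gives the Service Desk the documented workaround immediately, which is the quicker diagnosis and resolution the KEDB is meant to enable.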
Concept of the Problem Model
A Problem Model is a way of predefining the steps that should be taken to handle a process (in this case a process for dealing with a particular type of Problem) in an agreed way.
Support tools can then be used to manage the required process. This will ensure that ‘standard’ Problems are handled in a predefined path and within predefined timescales.
Key Metrics
• Total Problems recorded
• % of Problems resolved within SLA
• # or % of Problems that exceed resolution targets
• Aged Problems
• Average cost per Problem
• # of Major Problems identified
• # of Major Problem reviews conducted
• Known Errors added to KEDB
Implementation Challenges
• The establishment of an effective Incident Management process and tools
• Formal interfaces and common practices between the Incident and Problem Management processes
• Links between Incident and Problem Management tools
• The ability to relate Incident and Problem Management Records
• Second- and third-line staff need to have a good working relationship with first-line staff
• Ensuring that business impact is well understood by staff undertaking investigation of Problems
• Ensuring that Problem Management is able to use all Knowledge and Configuration Management resources available
Access Management
• GOAL
  – To execute the policies and actions defined in Security and Availability Management
• OBJECTIVE
  – To provide the rights for users to be able to use a service or group of services
• SCOPE
  – Access Management ensures that users are given the rights to use the service, but it does not ensure this access is available at all agreed times
Concepts
• Access
• Identity
• Rights (also called privileges)
• Services or service groups
• Directory services
Key Metrics
• Number of:
  – Requests for access (Service Request, RFC, etc.)
  – Incidents requiring a reset of access rights
  – Incidents caused by incorrect access settings
• Instances of access granted:
  – By service, user, department, etc.
Implementation Challenges
• Provision of a database of all users and the rights that they have been granted
• The ability to:
  – Verify the identity of a user
  – Verify the identity of the approving person or body
  – Verify that a user qualifies for access to a specific service
  – Link multiple access rights to an individual user
  – Determine the status of the user at any time
  – Manage changes to a user’s access requirements
  – Restrict the access rights of unauthorized users
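Several of these abilities (a store of users and granted rights, multiple rights per user, user status, changes to rights, denying unauthorized users) can be illustrated with a small in-memory registry. The Python sketch below is a simplification under stated assumptions: the `AccessRegistry` class and its method names are hypothetical, and identity verification and approval are left out entirely.

```python
class AccessRegistry:
    """Minimal in-memory store of users and the rights they have been granted."""

    def __init__(self):
        self._rights = {}   # user -> set of service names
        self._active = {}   # user -> bool, the current status of the user

    def add_user(self, user, active=True):
        self._rights.setdefault(user, set())
        self._active[user] = active

    def grant(self, user, service):
        if user not in self._rights:
            raise KeyError(f"unknown user: {user}")
        self._rights[user].add(service)

    def revoke(self, user, service):
        # Removing a right the user does not hold is a no-op.
        self._rights.get(user, set()).discard(service)

    def set_active(self, user, active):
        if user not in self._active:
            raise KeyError(f"unknown user: {user}")
        self._active[user] = active

    def rights_of(self, user):
        return frozenset(self._rights.get(user, set()))

    def can_access(self, user, service):
        # Unknown or inactive users are denied regardless of stored rights.
        return self._active.get(user, False) and service in self._rights.get(user, set())
```

Keeping status separate from rights means a user can be suspended without losing the record of what had been granted.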