Major Incident Draft Template
 


    • Major Incident Draft (MID) Template

      Methodology
      There is a close relationship between problem management and the major incident process.

      An incident is any event that is not part of the standard operation of a service and that causes an interruption or a reduction in the quality of that service. Incidents are recorded in a standardized system which is used for documenting and tracking outages and disruptions.

      A Major Incident is an unplanned or temporary interruption of service with severe negative consequences. Examples are outages of core infrastructure equipment or services that affect a significant customer base, such as the isolation of a company site. Any equipment or service outage that does not meet the criteria necessary to qualify as a Major Incident is either a Moderate, Minor or Normal Incident. Major incident reports are escalated to the problem manager for quality assurance.

      Incident Pyramid
      The scale of incidents follows an Incident Pyramid: most incidents are Normal, and the levels escalate up to a single Major Incident at the top. The major incident process consists of the following components:
      • Impact assessment: classification, outage analysis and risk management.
      • Measurement using the Incident User Metric (IUM).
      • Grading the resources involved in a major incident.

      Major incidents can be seen in the context of a quadrant diagram. In this quadrant, major incidents are either failures or misses, and both result from undesired outcomes. The difference between a miss and a failure is that a miss is associated with desired activity while a failure is not. Success means having both desired activity and desired outcomes. In the context of problem management, a lucky break (a desired outcome associated with undesired activity) is not optimal and should be investigated.

      Iceberg
      Incidents are the portion of problem management activity that forms the tip of an iceberg. The major incident process deals with the visible portion of the iceberg, while in the greater field of problem management a large number of non-visible issues are lurking.
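      The quadrant described above can be read as a lookup on two yes/no questions: was the activity desired, and was the outcome desired? The following is a minimal sketch of that reading; the function name `quadrant` and the lower-case labels are illustrative and not part of the template.

```python
def quadrant(desired_activity: bool, desired_outcome: bool) -> str:
    """Map the two quadrant axes to the label used in the methodology text."""
    if desired_outcome:
        # Desired outcome: success if the activity was also desired,
        # otherwise a lucky break that should still be investigated.
        return "success" if desired_activity else "lucky break"
    # Undesired outcome: these two cells are the major incidents.
    return "miss" if desired_activity else "failure"


if __name__ == "__main__":
    for activity in (True, False):
        for outcome in (True, False):
            print(activity, outcome, quadrant(activity, outcome))
```

      Only the "miss" and "failure" cells correspond to major incidents; the "lucky break" cell is not an incident but is still flagged for investigation.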
      • Notification (issued within 6 working hours of trigger)
      • Preliminary (issued within 6 working hours of workaround)
      • Final (issued within 6 working hours of normal business operations resuming)
      Details
      Description
      • Service desk / Risk logging
      • Trigger (who requested the report/notification)
      Service affected
      • Data networks
      • AD
      • Messaging
      • Security
      • Payments
      • Operations
      • Voice
      • Service desk
      • Hosting
      • Monitoring
      • Intranet
      • Printing
      • Documents
      • Third party
      • Ecommerce
      • Extranet
      • Backups
      • Storage
      Identification (please clearly describe the incident and its symptoms – immediate and visual causes)
      Business impact (please describe clearly the undesired outcome)
      Conditions (please describe the environment – business or IT – conditions that caused or were present during the incident)
      Resolution
      • Initial (describe the workaround)
      • Final (describe the steps taken to resume normal operations)
      Proximate cause analysis
      • Change
      • Component Failure
      • Capacity
      • Configuration
      • Carrier
      • Service Provider
      • Environmental
      • Bug
      • Hardware
      • Vendor
      • Process
      Identification
      Asset
      • Owner (who does the asset belong to)
      • Physical component or system (CI name in CMDB)
      • Value (in $): <$x>
      Resources
      • Headcount affected (application or service)
      • Please supply the names of the following contacts involved in the incident: 1st line escalation, 2nd line escalation, 3rd line escalation
      Region including head count (mark with an X)
      • Country
      • Location
      Business unit including head count (mark with an X)
      Please try to supply an accurate count of the number of users involved.
      • Business unit
      Execution
      Timelines (dates and times): the expanded incident lifecycle
      Time when incident started (actual – something has happened to a CI or a risk event has occurred)
      Time when incident was detected (incident is detected either by monitoring tools, IT personnel or, worst case, the user/customer)
      Time of diagnosis (underlying cause – we know what happened)
      Time of repair (process to fix failure started or corrective action initiated)
      Time of recovery (component recovered – the CI is back in production – business ready to be resumed)
      Time of restoration (normal operations resume – the service is back in production)
      Time of workaround (Service is back in production with workaround)
      Time of escalation (to problem management team)
      Time period service was unavailable (SLA measure)
      Time period service was degraded (SLA measure)
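      The lifecycle timestamps above lend themselves to simple interval arithmetic. The sketch below derives the elapsed time between consecutive stages; the function name, the interval names and the sample timestamps are all hypothetical, and treating "start to restoration" as the total elapsed time is an assumption rather than the template's SLA definition.

```python
from datetime import datetime, timedelta


def lifecycle_intervals(started, detected, diagnosed, repair_started,
                        recovered, restored):
    """Return the elapsed time between consecutive lifecycle timestamps."""
    return {
        "time_to_detect": detected - started,
        "time_to_diagnose": diagnosed - detected,
        "time_to_start_repair": repair_started - diagnosed,
        "time_to_recover": recovered - repair_started,
        "time_to_restore": restored - recovered,
        # Assumption: total elapsed time runs from start to restoration.
        "total_elapsed": restored - started,
    }


# Hypothetical timestamps, for illustration only.
intervals = lifecycle_intervals(
    started=datetime(2024, 1, 15, 8, 0),
    detected=datetime(2024, 1, 15, 8, 12),
    diagnosed=datetime(2024, 1, 15, 9, 0),
    repair_started=datetime(2024, 1, 15, 9, 10),
    recovered=datetime(2024, 1, 15, 10, 0),
    restored=datetime(2024, 1, 15, 10, 30),
)

if __name__ == "__main__":
    for name, delta in intervals.items():
        print(f"{name}: {delta}")
```

      A long gap between start and detection is usually the first thing worth examining, since the template treats detection by the user or customer as the worst case.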
      Measurement
      Function: please select the most appropriate
      Cause: please select the most appropriate
      Type (mark with an X): to calculate the IUM, please select the single type that best describes the incident

      Outage                            Usage          Multiplier
      Scrutiny by management            Profit         3             Yes / No*
      Effect on productivity*           Staff          10            Yes* / No
      Impact on company’s image         Share price    5             Yes / No*
      Direct financial impact           Assets                       Yes / No*
      Liability or vulnerability        Nominal        1             Yes / No*
      Limited to internal IT process    Budget         2             Yes / No*

      Incident User Metric: Cost of Downtime Analysis
      The metric is based on outage type multiplier * outage time in minutes * percentage of users affected. The calculation is also weighted for degradation at 60% and for non-business hours at 50%. The default outage type is “Effect on productivity.”
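      As a rough illustration of the IUM arithmetic, the sketch below multiplies the outage type multiplier by the outage minutes and the fraction of users affected. How the 60% and 50% weights are applied is not spelled out in the template, so applying them as plain multiplicative factors is an assumption, as are the function and parameter names. "Direct financial impact" is left out because its multiplier is not given.

```python
# Multipliers as listed in the Type table; "Direct financial impact" is
# omitted because the template does not give its multiplier.
MULTIPLIERS = {
    "Scrutiny by management": 3,
    "Effect on productivity": 10,
    "Impact on company's image": 5,
    "Liability or vulnerability": 1,
    "Limited to internal IT process": 2,
}

DEGRADATION_WEIGHT = 0.60         # service degraded rather than unavailable
NON_BUSINESS_HOURS_WEIGHT = 0.50  # incident outside business hours


def incident_user_metric(outage_minutes, users_affected_pct,
                         outage_type="Effect on productivity",
                         degraded=False, business_hours=True):
    """Compute the IUM, with weights applied as factors (an assumption)."""
    score = MULTIPLIERS[outage_type] * outage_minutes * (users_affected_pct / 100)
    if degraded:
        score *= DEGRADATION_WEIGHT
    if not business_hours:
        score *= NON_BUSINESS_HOURS_WEIGHT
    return score


if __name__ == "__main__":
    # 90-minute full outage in business hours affecting 25% of users.
    print(incident_user_metric(90, 25))
    # Same incident, but degraded only and outside business hours.
    print(incident_user_metric(90, 25, degraded=True, business_hours=False))
```

      The first call uses the default outage type, so it evaluates 10 * 90 * 0.25; the second scales that result by both weights.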
      Classification
      • Scope (mark with an X). Dashboard designation = S
      (4) More than 50% of customers affected
      (3) More than 25% of customers affected
      (2) Less than 25% of customers affected*
      (1) Less than 1% of users affected
      (0) Single IT customer affected

      Credibility (mark with an X). Dashboard designation = CR
      (4) Areas outside the company will be affected negatively
      (3) Company affected negatively
      (2) Multiple business units affected negatively
      (1) Single business unit affected negatively
      (0) No credibility issue*

      Operations (mark with an X). Dashboard designation = OP
      (4) Interferes with core business functions
      (3) Interferes with business activities*
      (2) Significant interference with completion of work
      (1) Some interference with normal completion of work
      (0) No work interference

      Urgency (mark with an X). Dashboard designation = U
      (4) Underway and could not be stopped
      (3) Caused by unscheduled change or maintenance
      (2) Incident caused by a change
      (1) Incident caused by scheduled maintenance
      (0) Completion time not important*

      Prioritization (mark with an X). Dashboard designation = P
      Reviewing the scope, credibility, operations and urgency, please classify the priority of the incident.
      (4) Critical - An immediate and sustained effort using all available resources until resolved. On-call procedures activated, vendor support invoked.
      (3) High - Technicians respond immediately, assess the situation, and may interrupt other staff working low or medium priority jobs for assistance.
      (2) Medium - Respond using standard procedures and operating within normal supervisory management structures.
      (1) Low - Respond using standard operating procedures as time allows.*
      (0) No prioritization

      ROC calculation
      S    CR    OP    U    P    TOTAL / 20    %
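      The ROC row above can be filled in mechanically once the five ratings are chosen. The sketch below assumes that TOTAL is the plain sum of the 0 to 4 ratings and that the percentage is that sum over the maximum of 20; the function name is hypothetical, and the sample values are the asterisked entries from the scales, read here as the template's defaults.

```python
def roc_percentage(ratings, max_per_rating=4):
    """Sum the ratings and express the total as a percentage of the maximum.

    `ratings` maps a dashboard designation (e.g. "S") to its 0-4 rating.
    Returns (total, maximum, percentage).
    """
    for name, value in ratings.items():
        if not 0 <= value <= max_per_rating:
            raise ValueError(f"{name} must be between 0 and {max_per_rating}")
    total = sum(ratings.values())
    maximum = max_per_rating * len(ratings)
    return total, maximum, 100 * total / maximum


# Asterisked entries from the classification scales (assumed defaults).
classification = {"S": 2, "CR": 0, "OP": 3, "U": 0, "P": 1}
total, maximum, pct = roc_percentage(classification)

if __name__ == "__main__":
    print(f"TOTAL {total} / {maximum} = {pct:.0f}%")
```

      Because the maximum is derived from the number of ratings supplied, the same helper would also cover the two-rating (/ 8) and three-rating (/ 12) ROC rows used later in the template.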
      Outage analysis
      Service period outage classification (mark with an X). Dashboard designation = P
      (4) Critical - App, server, link (network or voice) unavailable for greater than 4 hours or degraded for greater than 1 day - negative business delivery for more than 1 month
      (3) Major - App, server, link (network or voice) unavailable for greater than 1 hour or degraded for greater than 4 hours - negative business delivery for more than 1 week
      (2) Moderate - App, server, link (network or voice) unavailable for greater than 30 minutes or degraded for greater than 1 hour - negative business delivery for more than 1 day
      (1) Minor - App, server, link (network or voice) unavailable for greater than 5 minutes or degraded for greater than 30 minutes - negative business delivery for more than 1 hour
      (0) Low* - App, server, link (network or voice) unavailable for less than 5 minutes or degraded for less than 30 minutes - negative business delivery for less than 1 hour

      Service consequence outage classification (mark with an X). Dashboard designation = C
      (4) Critical - Financial loss which puts a business unit in a critical position - greater than $10m or substantial loss of credibility or litigation or prosecution or fatality or disability.
      (3) Major - Financial loss which severely impacts the profitability of a business unit - greater than $1m or serious loss of credibility or sanction or impairment.
      (2) Moderate - Financial loss which impacts the profitability of the business unit - greater than $100k or embarrassment or reported to regulator or hospitalization.
      (1) Minor - Financial loss with a visible impact on profitability but no real effect - greater than $10k or some embarrassment or rule or process breaches or medical treatment.
      (0) Low* - Financial loss with no real effect - less than R50k or irritating or no legal or regulatory issue or no medical treatment.

      ROC analysis
      P    C    Total / 8    %
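      The service period scale above is a set of duration thresholds, so the rating can be looked up from the unavailable and degraded times. The sketch below covers only the duration part of each level (not the "negative business delivery" clause); the function name is hypothetical, "1 day" is assumed to mean 24 hours, and values that sit exactly on a boundary, which the scale leaves undefined, fall into the lower rating.

```python
# (rating, unavailable-minutes threshold, degraded-minutes threshold),
# ordered from most to least severe. "1 day" is taken as 1440 minutes.
PERIOD_THRESHOLDS = [
    (4, 240, 1440),  # Critical: > 4 hours unavailable or > 1 day degraded
    (3, 60, 240),    # Major: > 1 hour unavailable or > 4 hours degraded
    (2, 30, 60),     # Moderate: > 30 minutes unavailable or > 1 hour degraded
    (1, 5, 30),      # Minor: > 5 minutes unavailable or > 30 minutes degraded
]


def period_rating(unavailable_min=0, degraded_min=0):
    """Return the 0-4 service period rating for the given durations."""
    for rating, unavailable_limit, degraded_limit in PERIOD_THRESHOLDS:
        if unavailable_min > unavailable_limit or degraded_min > degraded_limit:
            return rating
    return 0  # Low


if __name__ == "__main__":
    print(period_rating(unavailable_min=90))  # 90 minutes unavailable
    print(period_rating(degraded_min=45))     # 45 minutes degraded
```

      Checking the most severe level first means an incident that meets several levels is always given the highest one it qualifies for.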
      Risk management
      Risk impact (mark with an X). Dashboard designation = I
      Evaluate the data and information that is directly affected by the incident, taking into account the involvement of the people, process, products and partners.

      “At Risk” issues
      Confidentiality (information is accessible only to those authorized)
      • Secure
      • Confidential
      • Restricted*
      • Public
      Integrity (safeguarding the accuracy and completeness of information)
      • Very high
      • High
      • Moderate*
      • Low
      Availability (authorised users have access to information when required)
      • Mandatory
      • Very high
      • High
      • Moderate*
      • Low
      Rating: taking into account the above, please rate the risk impact
      (4) Critical
      (3) Major
      (2) Moderate
      (1) Low
      (0) None

      Risk vulnerability (rate as either low, moderate, high or major). Dashboard designation = V
      Rate the vulnerability, in the following categories, of the information or data that is affected by the incident.
      • Loss
      • Error
      • Failure*
      Rating: taking into account the above, please rate the risk vulnerability
      (4) Critical
      (3) Major
      (2) Moderate
      (1) Low
      (0) None

      Countermeasures. Dashboard designation = CM
      What measures are in place to mitigate any risks identified with the information or data affected by the incident?
      Rating: taking into account the above, please rate the risk countermeasures
      (4) Critical
      (3) Major
      (2) Moderate
      (1) Low
      (0) None

      ROC analysis
      I    V    CM    TOTAL / 12    %
      Closure
      Escalations (mark with an X)
      • Further root cause analysis required: Yes / No*
      • Escalated to Operational Risk: Yes / No*
      • Escalated to CTO/CIO (logged as a problem at Service Desk): Yes / No*
      • Escalated to Infrastructure Manager: Yes / No*
      • No escalation: Yes* / No