Major Incident Management
Andrew Vermes
Time for some fun!
• Why are major incidents so hard?
• Straightforward ways to share information
• A live, interactive simulation
Caught in the headlights
• It’s easy to panic
• A calm atmosphere helps
Keep questions helpful
When will it be fixed?
Is everything under control?
Who’s responsible for this?
Did you change anything?
Why did we let this happen?
Too much vital information is hidden
[Diagram: a typical case lifecycle. Case Opened → Some Emails & Calls → Some more Emails → Magic Happens → Case Closed]
Sometimes the magic is “Spoke to Mike & Steve”.
Most case documentation does not include the actual root cause of a problem; therefore, knowledge reuse is minimised.
Processes must be second nature
Emergency services depend on:
• Clear processes
• Checklists
• Repeated training
Limited time to think means responses need to be fast and automatic.
Four things need effective control:
• What’s happening?
• How will we restore service?
• What risks are there?
• What investigation is needed?
[Diagram: Major Incidents at the centre, surrounded by four workstreams: Understanding, Restoring, Investigating, Preventing]
Separate the workstreams
Four key questions:
What’s happening?
What will we do about it?
What risks are there?
How do we find the cause?
The goal is always effective action.
• SITUATION APPRAISAL: to sort out priority actions
• DECISION ANALYSIS: to select the right fix or workaround
• POTENTIAL PROBLEM ANALYSIS: to manage risks
• PROBLEM ANALYSIS: to find the true cause
Keep your working information visible
• Separate into 4 areas
Keep information updated
• Ensure your dashboard is updated in real time
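The four-area dashboard can be modelled as a small data structure. The sketch below is a hypothetical illustration, not a tool from the deck: the class and method names are invented, and the area names are copied from the KT Incident Dashboard slides used in the simulation. Each entry is time-stamped, and every change refreshes a `last_updated` field, which is one simple way to make a stale dashboard obvious.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Dict, List, Optional

# Area names copied from the KT Incident Dashboard slides: one area per workstream.
AREAS = (
    "Incident Overview & Action Plan",
    "Problem Investigation - Finding Cause",
    "Decisions to be made",
    "Risk Management",
)


@dataclass
class Entry:
    text: str
    added_at: time  # time of day the entry was posted


@dataclass
class IncidentDashboard:
    summary: str
    # Every area starts empty; the keys are fixed so nothing lands outside the four areas.
    areas: Dict[str, List[Entry]] = field(
        default_factory=lambda: {name: [] for name in AREAS}
    )
    last_updated: Optional[time] = None

    def add(self, area: str, text: str, at: time) -> None:
        """Post an entry to one of the four areas and refresh the timestamp."""
        if area not in self.areas:
            raise KeyError(f"unknown dashboard area: {area!r}")
        self.areas[area].append(Entry(text, at))
        self.last_updated = at

    def render(self) -> str:
        """Plain-text view, so the whole team sees the same picture."""
        stamp = self.last_updated.strftime("%H:%M") if self.last_updated is not None else "never"
        lines = [f"{self.summary} (last updated {stamp})"]
        for name, entries in self.areas.items():
            lines.append(name)
            lines.extend(f"  [{e.added_at.strftime('%H:%M')}] {e.text}" for e in entries)
        return "\n".join(lines)


board = IncidentDashboard("Brick-sorting robot TX72-6 problems today")
board.add("Incident Overview & Action Plan", "Sorting system broken down", time(11, 15))
board.add("Decisions to be made", "Goal: restore fast", time(11, 15))
print(board.render())
```

Rejecting unknown area names is a deliberate choice: it keeps every piece of information in one of the four workstreams rather than in a fifth, unwatched pile.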
Major Incident Simulation
Please go to:
http://mindstormrobotics.com/
You will find information about the incident there.
SITUATION, August 30
Tom Lewis, production manager at Stockholm Brick Company, has contacted you because one of their brick-sorting robots, the TX72-6, has a problem.
You repaired the conveyor belt yesterday, but now the sorting system has broken down again. The breakdown costs the company roughly 1,000 dollars per minute. Tom requests your assistance to solve the problem immediately.
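At roughly 1,000 dollars per minute, the running cost of the outage is easy to track at each check point. A minimal sketch, assuming the cost accrues linearly at the quoted rate from the moment the sorting system broke down (the function name is hypothetical):

```python
# Rate quoted in the scenario: roughly 1,000 dollars per minute of breakdown.
COST_PER_MINUTE = 1_000


def downtime_cost(minutes: float, rate_per_minute: float = COST_PER_MINUTE) -> float:
    """Cost of the outage so far, assuming a flat per-minute rate."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * rate_per_minute


# Check points matching the 10, 30 and 60 minute dashboard snapshots.
for elapsed in (10, 30, 60):
    print(f"{elapsed:>3} min: ${downtime_cost(elapsed):,.0f}")
```

Running it prints $10,000, $30,000 and $60,000 for the three check points, which makes the case for restoring service before chasing the root cause very concrete.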
Teamwork is key
In teams:
• Review your information
• Complete the Major Incident Dashboard
• Decide what to do next
• Take the necessary actions and update the dashboard
Updating your dashboard
• Keep information visible to your team
Check point
• How much has your team spent so far?
• What progress has been made in incident recovery?
• Which moves did you make that added value?
Wrap up
Start with a clear incident statement (what process, what symptom)
Look early for similar but unaffected CIs (configuration items) for comparison
The solution is…
First 10 minutes
KT INCIDENT DASHBOARD (time: August 30, 11:15)
Incident Summary: Brick-sorting robot TX72-6 problems today

Incident Overview & Action Plan
Customer Issues, Priorities, Impact (What, When, Where):
• Brick-sorting robot TX72-6 problems
• Sorting system broken down
• Conveyor repaired yesterday
• Impact to client = $1,000/minute
Actions Needed (Due, Done, Who): none listed

Problem Investigation - Finding Cause
Diagnostic Data we have, Possible Causes, Next Investigation step (Due, Done, Who): none listed

Decisions to be made
Goals fix must meet:
• Restore fast
• Certainty of fix
• Avoid causing other incidents
Possible fixes, Best Fit? (Due, Who): none listed

Risk Management
Risks and Opportunities list, Preventive Actions, Contingent Actions (Due, Done, Who): none listed
After 30 minutes
KT INCIDENT DASHBOARD (time: August 30)
Incident Summary: Brick-sorting robot sorting incorrectly; possible causes determined.

Incident Overview & Action Plan
Customer Issues, Priorities, Impact (What, When, Where):
• Brick-sorting robot TX72-6 problems
• Sorting system broken down
• Conveyor repaired yesterday
• Impact to client = $1,000/minute
Actions Needed (Due, Done, Who): none listed

Problem Investigation - Finding Cause
Diagnostic Data we have:
• The chute assembly presses long against the touch sensor
• IS NOT: badly sorting bricks when initially moving right
• Happens during reset
• Started August 30
Possible Causes:
• Application update yesterday
• Touch sensor software update
• Database slow to respond
Next Investigation step (Due, Done, Who): none listed

Decisions to be made
Goals fix must meet:
• Restore fast
• Certainty of fix
• Avoid causing other incidents
Possible fixes:
• Reload database
• Replace touch sensor
• Repair conveyor (again)
• Replace control unit
• Replace Motor B
Best Fit? (Due, Who): none selected

Risk Management
Risks and Opportunities list, Preventive Actions, Contingent Actions (Due, Done, Who): none listed
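The 30-minute dashboard pairs diagnostic data, including an IS NOT observation, with three possible causes. A standard problem-analysis step is to test each possible cause against that specification: a cause that cannot explain both what the problem IS and what it IS NOT is unlikely to be the true cause. The sketch below assumes the team's judgement can be written down as the set of specification facts each cause would explain; the function name and the "cause A" / "cause B" placeholders are hypothetical, and their judgements are invented purely to show the mechanics.

```python
from typing import Dict, FrozenSet, List


def surviving_causes(
    is_facts: FrozenSet[str],
    is_not_facts: FrozenSet[str],
    explains: Dict[str, FrozenSet[str]],
) -> List[str]:
    """Return the causes that account for every IS and every IS NOT fact.

    explains[cause] holds the specification facts the team judges the cause
    would account for: IS facts it would produce, and IS NOT facts whose
    absence it can explain.
    """
    required = is_facts | is_not_facts
    # A cause survives only if all required facts are among those it explains.
    return [cause for cause, facts in explains.items() if required <= facts]


# Facts copied from the 30-minute dashboard.
spec_is = frozenset({
    "chute assembly presses long against the touch sensor",
    "happens during reset",
})
spec_is_not = frozenset({"badly sorting bricks when initially moving right"})

# Placeholder causes with invented judgements, only to show the mechanics.
explains = {
    "cause A": spec_is | spec_is_not,
    "cause B": spec_is,  # cannot explain the IS NOT fact, so it is eliminated
}
print(surviving_causes(spec_is, spec_is_not, explains))  # ['cause A']
```

This is also why the wrap-up stresses looking early for similar but unaffected CIs: without IS NOT facts, almost any plausible cause passes the test.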
After 60 minutes
KT INCIDENT DASHBOARD (time: August 30)
Incident Summary: Brick-sorting robot TX72-6 now sorting correctly

Incident Overview & Action Plan
Customer Issues, Priorities, Impact (What, When, Where):
• Brick-sorting robot TX72-6 problems
• Sorting system broken down
• Conveyor repaired yesterday
• Impact to client = $1,000/minute
Actions Needed (Due, Done, Who): none listed

Problem Investigation - Finding Cause
Diagnostic Data we have:
• The chute assembly presses long against the touch sensor
• IS NOT: badly sorting bricks when initially moving right
• Happens during reset
• Started August 30
Possible Causes and Next Investigation step:
• Application update yesterday: check log files for evidence
• Touch sensor software update: check log files for evidence
• Database slow to respond: contact DB team

Decisions to be made
Goals fix must meet:
• Restore fast
• Certainty of fix
• Avoid causing other incidents
Possible fixes (Best Fit?):
• Reload database
• Replace touch sensor: X
• Repair conveyor (again): temp
• Replace control unit
• Replace Motor B

Risk Management
Risk: Replacing touch sensor
Preventive Actions:
• Review work instructions
• Check part upfront for DOA (dead on arrival)
Contingent Actions:
• Contact MR support
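The "Goals fix must meet", "Possible fixes" and "Best Fit?" columns amount to a small decision matrix. One common way to pick a best fit, sketched here as an assumption rather than as the deck's own method, is to weight each goal, score each fix against every goal, and compare weighted totals. The function names are hypothetical, and the weights and scores are invented for illustration; they are not data from the exercise.

```python
from typing import Dict

Scores = Dict[str, Dict[str, float]]


def weighted_total(weights: Dict[str, float], fix_scores: Dict[str, float]) -> float:
    """Sum of weight * score over every goal; every goal must be scored."""
    missing = weights.keys() - fix_scores.keys()
    if missing:
        raise ValueError(f"fix is not scored against {sorted(missing)}")
    return sum(weights[goal] * fix_scores[goal] for goal in weights)


def best_fit(weights: Dict[str, float], scores: Scores) -> str:
    """Return the possible fix with the highest weighted total (first one wins ties)."""
    if not scores:
        raise ValueError("no possible fixes to compare")
    return max(scores, key=lambda fix: weighted_total(weights, scores[fix]))


# Goals from the dashboard; the weights and scores are invented for illustration.
weights = {"Restore fast": 10, "Certainty of fix": 8, "Avoid causing other incidents": 6}
scores = {
    "Replace touch sensor": {"Restore fast": 8, "Certainty of fix": 9, "Avoid causing other incidents": 7},
    "Repair conveyor (again)": {"Restore fast": 9, "Certainty of fix": 3, "Avoid causing other incidents": 6},
}
print(best_fit(weights, scores))
```

A full Kepner-Tregoe Decision Analysis also screens options against mandatory objectives and weighs the risks of the preferred choice; this sketch covers only the weighted comparison, which is why the dashboard pairs "Best Fit?" with a separate Risk Management area.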
Takeaways for Major Incident Managers (and for everyone):
1. Avoid holding people on calls for many hours
2. Keep all information visible in real time
3. Require specific information from participants
4. Run regular simulations to keep skills high
5. Set clear update times and stick to them
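Committing to fixed update times is easy to mechanise. A minimal sketch, with hypothetical function names and an assumed 30-minute interval (the deck does not specify one), starting from the 11:15 timestamp on the first dashboard; `strptime` fills in a default date, which is irrelevant here.

```python
from datetime import datetime, timedelta
from typing import List


def update_schedule(start: datetime, interval: timedelta, count: int) -> List[datetime]:
    """The next `count` committed update times after `start`, evenly spaced."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    return [start + interval * i for i in range(1, count + 1)]


def next_update(now: datetime, start: datetime, interval: timedelta) -> datetime:
    """First scheduled update strictly after `now`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now < start:
        raise ValueError("now is before the schedule starts")
    periods = (now - start) // interval  # whole intervals already elapsed
    return start + interval * (periods + 1)


start = datetime.strptime("11:15", "%H:%M")  # time from the first dashboard
every = timedelta(minutes=30)                # assumed interval
print([t.strftime("%H:%M") for t in update_schedule(start, every, 3)])
print(next_update(start + timedelta(minutes=40), start, every).strftime("%H:%M"))
```

Publishing the whole schedule up front, rather than announcing each update as it happens, lets participants leave the call and rejoin at the next agreed time.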
Leaders in Problem Solving
twitter.com @KepnerTregoe
facebook.com/KepnerTregoe
linkedin.com/company/kepner-tregoe
Andrew Vermes
Senior Consultant
+44 (0) 7973506628
avermes@kepner-tregoe.com
Elena Simperl
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
Bhaskar Mitra
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Search and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical FuturesSearch and Society: Reimagining Information Access for Radical Futures
Search and Society: Reimagining Information Access for Radical Futures
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 

Andrew Vermes: Major Incident Management

  • 2. Time for some fun! • Why are major incidents so hard? • Straightforward ways to share information • A live, interactive simulation
  • 3. Caught in headlights • It’s easy to panic • Calm atmosphere helps
  • 4. Keep questions helpful When will it be fixed? Is everything under control? Who’s responsible for this? Did you change anything? Why did we let this happen?
  • 5. Too much vital information is hidden: Case Opened → Some Emails & Calls → Magic Happens → Some more Emails → Case Closed. Sometimes the magic is “Spoke to Mike & Steve”. Most case documentation does not include the actual root cause of a problem; therefore, knowledge reuse is minimised.
  • 6. Processes must be second nature. Emergency services depend on: • Clear processes • Checklists • Repeated training. With limited time to think, responses need to be fast and automatic.
  • 7. Four things need effective control: • What’s happening? • How will we restore service? • What risks are there? • What investigation is needed? Major Incidents: Understanding, Investigating, Restoring, Preventing
  • 8. Separate the workstreams. Four key questions: What’s happening? What will we do about it? What risks are there? How do we find the cause? The goal is always effective action. SITUATION APPRAISAL: to sort out priority actions. DECISION ANALYSIS: to select the right fix or workaround. POTENTIAL PROBLEM ANALYSIS: to manage risks. PROBLEM ANALYSIS: to find the true cause.
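Decision Analysis weighs the possible fixes against the goals the fix must meet (the dashboard slides list restore fast, certainty of fix, and avoid causing other incidents). The sketch below is a simplified weighted-scoring version only: it leaves out the MUST/WANT screening of the full KT method, and the weights, ratings, option names, and the function `rank_options` are all hypothetical.

```python
def rank_options(
    weights: dict[str, int], ratings: dict[str, dict[str, int]]
) -> list[tuple[str, int]]:
    """Rank options by the sum of weight * rating over every goal, best first."""
    totals = {
        option: sum(weight * scores[goal] for goal, weight in weights.items())
        for option, scores in ratings.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weights (how much each goal matters) and
# ratings (how well each option meets that goal, 1-10).
weights = {"Restore fast": 10, "Certainty of fix": 8, "Avoid causing other incidents": 6}
ratings = {
    "Option A": {"Restore fast": 9, "Certainty of fix": 5, "Avoid causing other incidents": 7},
    "Option B": {"Restore fast": 6, "Certainty of fix": 9, "Avoid causing other incidents": 8},
}
print(rank_options(weights, ratings))  # [('Option B', 180), ('Option A', 172)]
```

The fastest option does not automatically win: a high weight on certainty of fix can outrank raw speed, which is the point of making the goals explicit before choosing.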
  • 9. Keep your working information visible • Separate into 4 areas
  • 10. Keep information updated • Ensure your dashboard is updated in real time
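A four-area dashboard that is updated in real time can be modelled as a small data structure. This is an illustrative sketch: the class names `Entry` and `IncidentDashboard`, the `add`/`render` methods, and the year in the example are assumptions; only the four area names are taken from the dashboard slides.

```python
from dataclasses import dataclass, field
from datetime import datetime

# The four areas, named as on the KT incident dashboard slides.
AREAS = (
    "Incident Overview & Action Plan",
    "Problem Investigation - Finding Cause",
    "Decisions to be made",
    "Risk Management",
)


@dataclass
class Entry:
    """One line on the dashboard (hypothetical structure)."""
    text: str
    who: str = ""
    due: str = ""
    done: bool = False


@dataclass
class IncidentDashboard:
    """Hypothetical dashboard: a summary, a last-updated time, and four areas."""
    summary: str
    updated_at: datetime
    areas: dict = field(default_factory=lambda: {name: [] for name in AREAS})

    def add(self, area: str, entry: Entry, now: datetime) -> None:
        # Only the four known areas are allowed, so nothing lands outside them.
        if area not in self.areas:
            raise KeyError(f"unknown area: {area}")
        self.areas[area].append(entry)
        # Every change refreshes the timestamp that everyone can see.
        self.updated_at = now

    def render(self) -> str:
        lines = [f"{self.summary} (updated {self.updated_at:%H:%M})"]
        for name in AREAS:
            lines.append(f"[{name}]")
            for e in self.areas[name]:
                status = "done" if e.done else "open"
                owner = f", {e.who}" if e.who else ""
                lines.append(f"  - {e.text} ({status}{owner})")
        return "\n".join(lines)


# Example values from the simulation; the year 2024 is an arbitrary assumption.
board = IncidentDashboard(
    "Brick sorting robot TX72-6 problems today", datetime(2024, 8, 30, 11, 15)
)
board.add(
    "Incident Overview & Action Plan",
    Entry("Impact to client = $1,000/minute"),
    now=datetime(2024, 8, 30, 11, 20),
)
print(board.render())
```

Keeping a single `updated_at` on the whole board makes a stale dashboard obvious at a glance.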
  • 11. Major Incident Simulation Please go to: http://mindstormrobotics.com/ You will find information about the incident there SITUATION AUGUST 30 Tom Lewis, production manager at Stockholm Brick Company, has contacted you because one of their brick sorting robots, the TX72-6, has a problem. You repaired the conveyor belt yesterday, but now the sorting system has broken down again. The breakdown costs the company roughly 1,000 dollars per minute. Tom requests your assistance to solve the problem immediately.
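At roughly $1,000 per minute, the cost of the outage grows quickly. A minimal sketch of that arithmetic, assuming a flat rate and counting downtime only (no repair or parts costs); the function name `downtime_cost` is hypothetical, and the checkpoints mirror the 10, 30 and 60 minute dashboard snapshots.

```python
COST_PER_MINUTE = 1_000  # dollars, approximate rate from the simulation brief


def downtime_cost(minutes: int, rate: int = COST_PER_MINUTE) -> int:
    """Return the approximate cost of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * rate


for checkpoint in (10, 30, 60):
    # Prints $10,000, $30,000 and $60,000 for the three checkpoints.
    print(f"After {checkpoint} minutes: ${downtime_cost(checkpoint):,}")
```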
  • 12. Teamwork is key. In teams: • Review your information • Complete the Major Incident Dashboard • Decide what to do next • Take the necessary actions and update the dashboard
  • 13. Updating your dashboard • Keep information visible to your team
  • 14. Check point • How much has your team spent so far? • What progress has been made in incident recovery? • Which moves did you make that added value?
  • 15. Wrap up: • Start with a clear incident statement (what process, what symptom) • Look early for similar but unaffected CIs for comparison • The solution is…
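Looking early for a similar but unaffected CI is the IS / IS NOT comparison used in KT Problem Analysis: attributes that differ between the two point towards possible causes. A minimal sketch, assuming each CI can be described as a flat dictionary of attributes; the function name `distinctions` and the attribute values in the example are hypothetical, not data from the simulation.

```python
def distinctions(affected: dict, unaffected: dict) -> dict:
    """Return attributes whose values differ between an affected CI and a
    similar but unaffected CI, as (affected_value, unaffected_value) pairs."""
    keys = affected.keys() | unaffected.keys()
    return {
        k: (affected.get(k), unaffected.get(k))
        for k in sorted(keys)
        if affected.get(k) != unaffected.get(k)
    }


# Hypothetical CIs: same model, different recent history.
affected = {"model": "TX72-6", "touch_sensor_sw": "v2", "conveyor": "repaired"}
unaffected = {"model": "TX72-6", "touch_sensor_sw": "v1", "conveyor": "original"}
print(distinctions(affected, unaffected))
# {'conveyor': ('repaired', 'original'), 'touch_sensor_sw': ('v2', 'v1')}
```

Shared attributes (here `model`) drop out, leaving only the distinctions worth testing as possible causes.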
  • 16. First 10 minutes: KT Incident Dashboard (time: August 30, 11:15)
    Incident Summary: Brick sorting robot TX72-6 problems today
    Incident Overview & Action Plan (Customer Issues, Priorities, Impact: What / When / Where; Actions Needed, Due, Done, Who): brick sorting robot TX72-6 problems; sorting system broken down; conveyor repaired yesterday; impact to client = $1,000/minute
    Problem Investigation - Finding Cause (Diagnostic Data we have; Possible Causes; Next Investigation step, Due, Done, Who): no entries yet
    Decisions to be made (Goals fix must meet; Possible fixes; Best Fit?, Due, Who): goals: restore fast; certainty of fix; avoid causing other incidents
    Risk Management (Risks and Opportunities list; Preventive Actions; Contingent Actions, Due, Done, Who): no entries yet
  • 17. After 30 minutes: KT Incident Dashboard (time: August 30)
    Incident Summary: Brick sorting robot sorting incorrectly; possible causes determined.
    Incident Overview & Action Plan: brick sorting robot TX72-6 problems; sorting system broken down; conveyor repaired yesterday; impact to client = $1,000/minute
    Problem Investigation - Finding Cause: diagnostic data: the chute assembly presses long against the touch sensor; IS NOT badly sorting bricks when initially moving right; happens during reset; started August 30. Possible causes: application update yesterday; touch sensor software update; database slow to respond
    Decisions to be made: goals: restore fast; certainty of fix; avoid causing other incidents. Possible fixes: reload database; replace touch sensor; repair conveyor (again); replace control unit; replace Motor B
    Risk Management: no entries yet
  • 18. After 60 minutes: KT Incident Dashboard (time: August 30)
    Incident Summary: Brick sorting robot TX72-6 now sorting correctly
    Incident Overview & Action Plan: brick sorting robot TX72-6 problems; sorting system broken down; conveyor repaired yesterday; impact to client = $1,000/minute
    Problem Investigation - Finding Cause: diagnostic data: the chute assembly presses long against the touch sensor; IS NOT badly sorting bricks when initially moving right; happens during reset; started August 30. Possible causes and next investigation steps: application update yesterday (check log files for evidence); touch sensor software update (check log files for evidence); database slow to respond (contact DB team)
    Decisions to be made: goals: restore fast; certainty of fix; avoid causing other incidents. Possible fixes: reload database; replace touch sensor (marked best fit); repair conveyor (again) (marked temp); replace control unit; replace Motor B
    Risk Management: risk: replacing touch sensor. Preventive actions: review work instructions; check part upfront for DOA. Contingent action: contact MR support
  • 19. Takeaways for Major Incident Managers (and for everyone): 1. Avoid holding people on calls for many hours 2. Keep all information visible in real time 3. Require specific information from participants 4. Run regular simulations to keep skills high 5. Set clear update times and stick to them
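Setting clear update times means agreeing a fixed schedule in advance rather than updating ad hoc. A minimal sketch, assuming evenly spaced updates; the 30-minute interval, the year, and the function name `update_times` are hypothetical, while the 11:15 start time comes from the first dashboard.

```python
from datetime import datetime, timedelta


def update_times(start: datetime, interval_minutes: int, count: int) -> list[datetime]:
    """Return `count` fixed update times spaced `interval_minutes` apart after `start`."""
    if interval_minutes <= 0 or count < 0:
        raise ValueError("interval must be positive and count non-negative")
    step = timedelta(minutes=interval_minutes)
    return [start + step * i for i in range(1, count + 1)]


start = datetime(2024, 8, 30, 11, 15)  # year is an arbitrary assumption
for t in update_times(start, 30, 3):
    print(t.strftime("%H:%M"))  # 11:45, then 12:15, then 12:45
```

Publishing the whole list up front lets participants leave the call and rejoin at a known time, which also supports takeaway 1.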
  • 20. Leaders in Problem Solving twitter.com @KepnerTregoe facebook.com/KepnerTregoe linkedin.com/company/kepner-tregoe Andrew Vermes Senior Consultant +44 (0) 7973506628 avermes@kepner-tregoe.com

Editor's Notes

  1. KT discovered early on that effective problem solvers focused on one thing at a time. Understanding what’s happening and its impact is a necessary prelude to the other four colours. In some cases we can go directly to a solution; in others we need to step back and understand the risk before we take action. If there are choices about the fix, we need to consider the goals and environment in a quick Decision Analysis. Sometimes it’s unwise to move forward if we have no idea about the cause: we might make things worse, so Problem Analysis needs to be done. This session is about speeding up that path to root cause, so now, a challenge for you.
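The note above describes a routing rule that follows Situation Appraisal: go straight to a fix when possible, and otherwise add Potential Problem Analysis for risk, Decision Analysis for a choice of fixes, and Problem Analysis when the cause is unknown. This is a hypothetical sketch of that rule; the `Appraisal` fields and the `next_steps` function are assumptions about what a team might record, not part of the KT material.

```python
from dataclasses import dataclass


@dataclass
class Appraisal:
    """Hypothetical flags a team records during Situation Appraisal."""
    cause_known: bool
    fix_options: int
    fix_is_risky: bool


def next_steps(a: Appraisal) -> list[str]:
    """Suggest which follow-on analyses to run after Situation Appraisal."""
    steps = []
    if not a.cause_known:
        steps.append("Problem Analysis")  # acting blind might make things worse
    if a.fix_options > 1:
        steps.append("Decision Analysis")  # choose the right fix or workaround
    if a.fix_is_risky:
        steps.append("Potential Problem Analysis")  # manage risks before acting
    # No flags raised: go directly to the solution.
    return steps or ["Implement the known fix"]


print(next_steps(Appraisal(cause_known=True, fix_options=1, fix_is_risky=False)))
# ['Implement the known fix']
```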