What To Do When It All Goes So Wrong

As IT professionals, we will inevitably see situations where everything goes wrong. Sometimes we are lucky and it only means diminished functionality or a slow system. Other times our organization is temporarily out of business. Regardless of the scope of the issue, how we react directly affects how quickly things return to normal. This session covers how to communicate issues: what to say, who to say it to, and when to say it. Part of managing communication is getting everyone into a room and making them talk, so time will be spent on designing an effective war room. The session also covers how setting out to prove that an issue is ours gets us to a root cause more quickly.

Transcript

  • 1. What To Do When It All Goes So Wrong
    David Levy
    AdventuresInSql.com
    SQL Saturday #67 Chicago
  • 2. More than 11 years in IT
    SQL Server DBA for over 3 years
    Previous Life as Developer
    Blogger
    http://adventuresinsql.com
    Syndicated on SQLServerCentral.com
    Syndicated on SQLServerPedia.com
    @dave_levy on Twitter
    About Me
  • 3. Peak Time of Peak Sales Day
    Typical Hourly Sales $100K/HR
    Order Entry Screen is Locked Up
    Users report Slowness Initially
    Now the “Sales Center” Application is Just “Clocking”
    EMERGENCY!
  • 4. Let Everyone Know There is a Problem
    Prevent Duplicated Efforts
    Allows Others to Speak Up
    Recent Changes
    Related Issues
    Communicate
    http://www.freedigitalphotos.net/images/view_photog.php?photogid=1983
  • 5. Send Up a Flare
    Send to an IT Only Distribution Group
    Keep the Subject Line General
    Provide Broad Overview Including:
    Systems Impacted
    Major Symptoms Including Error Messages
    Number of People Impacted
    Any Location Specific Information
    Communicate
  • 6. What Resources Do You Need?
    Subject Matter Experts
    Specialized Equipment
    Communicate
  • 7. Never Assign Blame
    Only State Facts
    Communicate
  • 8. To: IT Emergencies
    Subject: Sales Center Issues
    Sales Center Users are reporting that the Order Entry screen has quit responding. We are currently investigating the issue with the Sales Center Development Team. We will provide updates as we know more.
    Communicate
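
The flare described in slides 5 through 8 can be sketched as a small template that accepts only facts. This is a minimal illustration, not the presenter's tooling: the function name `build_flare`, the address `it-emergencies@example.com`, and the sample counts and locations are all assumptions made for the example.

```python
from email.message import EmailMessage


def build_flare(system, symptoms, people_impacted, locations, working_with):
    """Compose a facts-only first notice for an IT-only distribution group."""
    msg = EmailMessage()
    msg["To"] = "it-emergencies@example.com"  # hypothetical IT-only list
    msg["Subject"] = f"{system} Issues"       # general subject, no guess at the cause
    lines = [
        f"Systems impacted: {system}",
        f"Symptoms: {'; '.join(symptoms)}",
        f"People impacted: {people_impacted}",
        f"Locations: {', '.join(locations)}",
        f"Currently investigating with: {working_with}",
        "We will provide updates as we know more.",
    ]
    msg.set_content("\n".join(lines))
    return msg


# Sample values are invented for illustration.
flare = build_flare(
    system="Sales Center",
    symptoms=["Order Entry screen has quit responding"],
    people_impacted=40,
    locations=["All call centers"],
    working_with="Sales Center Development Team",
)
print(flare)
```

The template has no field for a suspected cause or a responsible team, which keeps the message in line with "Never Assign Blame, Only State Facts".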
  • 9.
  • 10. What Are the Symptoms?
    What Locations are Involved?
    Collect
  • 11. What Systems are Involved?
    SQL Server
    AS400
    Mainframe
    Web Farm
    Major Network Components like Load Balancers
    Collect
  • 12. What Has Changed?
    Look at Change Control Calendar
    Talk to Primary On-Calls for Related Systems
    Collect
  • 13. Anything in the Logs?
    Windows Logs
    Application Specific Logs
    Custom Exception Handling Systems
    Collect
  • 14. What are Performance Indicators Showing?
    Perfmon
    SQL Wait Stats
    Third-party tools
    Collect
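
One way to read SQL wait stats during an incident is to compare two snapshots, because the counters in `sys.dm_os_wait_stats` accumulate until they are reset. The sketch below assumes the snapshots have already been pulled into dictionaries of wait type to `wait_time_ms`; the function name `top_waits` and all numbers are invented for illustration.

```python
def top_waits(before, after, limit=3):
    """Return the wait types whose cumulative wait time grew most between snapshots."""
    deltas = {wait: after[wait] - before.get(wait, 0) for wait in after}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(wait, delta) for wait, delta in ranked[:limit] if delta > 0]


# Made-up snapshots taken a few minutes apart (wait_time_ms per wait type).
before = {"PAGEIOLATCH_SH": 1_200_000, "LCK_M_X": 50_000, "CXPACKET": 900_000}
after = {"PAGEIOLATCH_SH": 1_250_000, "LCK_M_X": 650_000, "CXPACKET": 905_000}

for wait, delta in top_waits(before, after):
    print(f"{wait}: +{delta} ms")
```

In this invented sample the lock wait grew far more than the others during the interval, which would point the investigation toward blocking rather than disk.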
  • 15. Analyze Collected Information
    Are There Any Obvious Signs of Trouble?
    Can the Problem be Linked to a Change?
    Can Any Patterns be Identified?
    Process
  • 16. Prove It Is Your Issue
    Shows Humility
    Shows Respect for Everyone Else’s Time
    Avoid Appearing Arrogant
    Process
  • 17. Prove It Is Your Issue
    Construct Tests to Prove Theories in Order of Likelihood Until Problem Proven or Theories Exhausted
    Faster than arguing about what it is not
    How can you know it is not your issue?
    Process
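
The loop in slide 17 (test theories in order of likelihood until one is proven or all are exhausted) can be sketched as follows. The function name, the likelihood scores, and the sample theories are hypothetical; each test here is a stand-in for whatever real check the team would run.

```python
def prove_it_is_ours(theories):
    """Run tests from most to least likely; stop at the first theory proven.

    theories: list of (name, likelihood, test) where test() returns True
    when the theory is confirmed.
    Returns (proven_name_or_None, names_tried_in_order).
    """
    tried = []
    for name, _likelihood, test in sorted(theories, key=lambda t: t[1], reverse=True):
        tried.append(name)
        if test():
            return name, tried
    return None, tried


# Invented theories; the lambdas stand in for real diagnostic checks.
theories = [
    ("Missing index on order table", 0.3, lambda: False),
    ("Blocking from order upload process", 0.6, lambda: True),
    ("Tempdb out of space", 0.1, lambda: False),
]
proven, tried = prove_it_is_ours(theories)
print(proven, tried)
```

If the function returns `None`, every theory on your side has been tested and ruled out, which is a far stronger position than having argued that the problem must be somewhere else.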
  • 18. List Potential Actions
    Rank by effort, confidence, level of risk
    Develop action plans for best options and re-rank
    Each potential action should have a rollback plan
    Process
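
The ranking in slide 18 can be sketched with a simple score. The slide names the three criteria but not how to weigh them, so the 1 to 5 scales and the formula `confidence - effort - risk` are assumptions chosen only to make the example concrete, as are the sample actions.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    effort: int       # 1 (trivial) to 5 (large), assumed scale
    confidence: int   # 1 (a guess) to 5 (near certain), assumed scale
    risk: int         # 1 (safe) to 5 (dangerous), assumed scale
    rollback: str = ""


def rank(actions):
    """Order actions best first; refuse any action without a rollback plan."""
    missing = [a.name for a in actions if not a.rollback]
    if missing:
        raise ValueError(f"No rollback plan for: {', '.join(missing)}")
    return sorted(actions, key=lambda a: a.confidence - a.effort - a.risk, reverse=True)


# Invented options for illustration.
options = [
    Action("Restart application servers", effort=2, confidence=2, risk=3,
           rollback="None needed; servers return to prior state"),
    Action("Back out weekend patch", effort=3, confidence=4, risk=2,
           rollback="Reapply patch from change ticket"),
    Action("Add index to order table", effort=1, confidence=3, risk=2,
           rollback="Drop the index"),
]
for action in rank(options):
    print(action.name)
```

Re-ranking after action plans are written is just a second call to `rank` with updated scores.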
  • 19. Define Measures
    What will indicate things have gotten better?
    Adding this index will reduce Disk IO by 10 million reads per second
    The execution time of query x will drop from 6 minutes to 50 milliseconds
    Process
  • 20. Define Measures
    What will indicate things have gotten worse?
    Disk IO may go up
    The execution time of query x may go up
    Adding this index may slow inserts from the order upload process
    Process
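
Measures like the ones in slides 19 and 20 are easiest to judge when they are written down as numbers before the change. A minimal sketch, assuming a metric where lower is better (such as execution time); the class name `Measure` and the "inconclusive" middle band are illustrative choices, and the figures reuse the query x example from the slide.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    baseline: float  # value observed before the change
    target: float    # value that would indicate things have gotten better

    def judge(self, observed):
        """Classify a post-change reading for a lower-is-better metric."""
        if observed <= self.target:
            return "better"
        if observed > self.baseline:
            return "worse"
        return "inconclusive"


# Query x: 6 minutes (360,000 ms) before the change, 50 ms expected after.
query_x = Measure("query x execution time (ms)", baseline=360_000, target=50)
print(query_x.judge(45))
print(query_x.judge(400_000))
```

A "worse" reading is the signal to run the rollback plan; an "inconclusive" one means going back to collect more data.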
  • 21. Communicate Your Intentions
    Make the Change
    Follow a written plan
    Make a single change
    A single person should make the change
    Document any additional steps taken
    Start Over by Collecting More Data
    Respond
  • 22. Signs You Need to Convene A War Room
    Having Trouble Finding Anything Wrong
    30 Minutes Without Progress
    An Issue Appears to Span Multiple Systems
    Having Difficulty Getting People Engaged
    The War Room
  • 23. Get Everyone in a Room
    No Changes Made Outside the Room
    No Heroes
    Watch out for people doing a lot of typing
    Avoid changes that take more than a few minutes
    Have a Call-In Number for Remote Coworkers
    The War Room
  • 24. Have a Technology Kit
    Old Switch
    Patch Cords
    Mice + Mouse Pads
    Power Strips
    The War Room
  • 25. Monitor Your Guest List
    1-2 Representatives From Each Team
    Try to Keep Management Out
    Watch for Disruptive People
    The War Room
  • 26. To: IT Emergencies
    Subject: Sales Center Issues
    We are convening a war room for the Sales Center issue. Everyone working on the issue, please meet in the North Conference Room. Remote/WFH coworkers should dial into the conference bridge at 888-888-1234, participant code: 1234.
    Communicate
  • 27.
  • 28. White Board the Issue
    Every System Gets Own Column
    Write All Facts on White Board
    Closed Items Get Crossed Out Not Erased
    Include a Resolution for Each Closed Item
    The War Room
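
The whiteboard rules in slide 28 map onto a small data structure, which can help when capturing the board's contents at close-out. This is only a sketch: the class `Whiteboard`, its method names, and the sample facts are hypothetical.

```python
from collections import defaultdict


class Whiteboard:
    """One column per system; closed items are crossed out, never erased."""

    def __init__(self):
        self.columns = defaultdict(list)

    def add(self, system, fact):
        self.columns[system].append({"fact": fact, "closed": False, "resolution": None})

    def close(self, system, fact, resolution):
        if not resolution:
            raise ValueError("Every closed item needs a resolution")
        for item in self.columns[system]:
            if item["fact"] == fact:
                item["closed"] = True
                item["resolution"] = resolution
                return
        raise KeyError(fact)

    def render(self):
        lines = []
        for system, items in self.columns.items():
            lines.append(system)
            for item in items:
                mark = "[x]" if item["closed"] else "[ ]"
                suffix = f" -> {item['resolution']}" if item["closed"] else ""
                lines.append(f"  {mark} {item['fact']}{suffix}")
        return "\n".join(lines)


# Invented facts for illustration.
board = Whiteboard()
board.add("SQL Server", "Blocking seen on order table")
board.add("Web Farm", "Error rate normal")
board.close("Web Farm", "Error rate normal", "Ruled out by load balancer logs")
print(board.render())
```

Because `close` only flags an item, the full history, including what was ruled out and why, is still there when the board is captured at the end.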
  • 29. Share the Floor
    Likely Issue Owner Has the Lead
    Make Sure Everyone is Heard
    Contributing Often Involves Staying Out of the Way
    Don’t Be Afraid to Fade Back and Run The Whiteboard
    The War Room
  • 30. Never Call “Not-It” and Leave
    Not Helpful
    You May be Wrong
    Appears Arrogant
    The War Room
  • 31. Keep an Eye On Time
    Provide Regular Updates to Management
    Bring in Food Around Meal Times
    Raises Spirits
    Brings in More People to Help
    The War Room
  • 32. To: IT Emergencies
    Subject: Sales Center Issues Update
    The Sales Center war room is still going. We are currently looking into a driver issue with IBM. All necessary resources have been engaged.
    Communicate
  • 33. Keep People in Reserve
    Each Team Should Divide up the Day
    Rotate People In and Out
    Send Someone Home Early to Come in Early
    The War Room
  • 34. Closing Out
    Communicate Resolution
    Capture Contents of Whiteboard
    Clean Up Room
    The War Room
  • 35. To: IT Emergencies
    Subject: Sales Center Issues Resolved
    The Sales Center issue has been resolved. The issue was caused by a patch that was applied over the weekend. Now that it has been backed out everything has returned to normal.
    Communicate
  • 36. Questions?