Improve Your IT Disaster Recovery Plan, and Your Ability to Recover From Disaster


4 June 2012 | ID: G00234709
Kevin Knox

Many organizations have inconsistent IT disaster recovery plans that vary in quality, scope and detail. We help disaster recovery and business continuity planners improve their IT disaster recovery plans, and their ability to recover from disaster, by outlining best practices for key problems.

Overview

Explore related content: "SMB Context: Improve Your IT Disaster Recovery Plan, and Your Ability to Recover From Disaster." (17 September 2012)

Key Challenges

Minor discrepancies, omissions and oversights in an organization's disaster recovery plan can have a major impact on the time required to recover from a disaster and on the associated business impact.

While most organizations claim to have some form of IT disaster recovery plan in place, there are wide-ranging differences in quality, scope and level of detail from one plan to another.

Respondents to the 2011 Gartner Risk Management Disciplines Survey were asked which types of disasters their organizations planned for. IT outage was ranked highest among the 13 categories, with 66% of respondents stating that they plan for IT outages.

Recommendations

Organizations should focus their disaster recovery plans specifically on the recovery of IT services, and should clearly define the intended use and scope of the plan as a critical first step.

Two to three senior executives in the organization should be authorized to make a disaster declaration, and only after specific criteria have been met to qualify the event as a disaster.

Organizations should include the details of ongoing recovery operations and failback processes and procedures as highlighted sections in the disaster recovery plan.
Analysis

IT organizations spend considerable time and money developing and managing IT disaster recovery plans they hope will reduce downtime and minimize the business impact when a disaster arises. Although most large organizations claim to have some form of IT disaster recovery plan in place, the numerous plan reviews Gartner performs each year reveal significant differences in quality, scope and level of detail from one plan to another.

Disaster recovery plans should be specific enough to address the individual recovery requirements, technologies and processes of an organization. Although no two plans are exactly alike, there are certain issues all organizations should consider, and missteps to avoid, when developing their plans. Having a focused, detailed and well-organized disaster recovery plan can mean the difference between smooth recovery operations and chaos during a disaster. This research looks at common mistakes organizations make within their IT disaster recovery plans, and provides recommendations for improvement.

Define the Scope of the Plan

A common mistake organizations make when developing disaster recovery plans is not limiting their scope exclusively to the recovery of IT services. For example, some organizations include general business continuity requirements, which typically fall outside the purview of IT. Although IT service recovery is a key part of overall business continuity, each department should have its own plan, coordinated at a high level but managed and owned separately.

Organizations should focus disaster recovery plans specifically on the recovery of IT services, and should clearly define the intended use and scope of the plan as a critical first step. This includes developing a concise statement about what's included and what's not, who the intended audience is and how the document should be used. The scope also should identify the specific locations, businesses, companies and functions covered by the recovery plan.

Note: Business continuity management (BCM) ensures business resilience before, during and after an operational disruption.
BCM includes supplier management, crisis management, emergency management, IT disaster recovery management (IT DRM), business recovery, contingency planning and preparedness.

Identify Key Terminology

Most disaster recovery plans reviewed by Gartner fail to include a formal glossary of key terminology and language. Because most recovery plans must address a wide variety of individuals with varying levels of knowledge, drawn from multiple internal and external organizations, an advanced understanding of the language or terminology cannot be assumed.

A well-defined and easily accessible glossary of key terms and phrases should be included in all disaster recovery plans. Establishing a common language and terminology early in the recovery document, including industry-specific terms, recovery terminology, commonly used acronyms, location and facility names, and abbreviations, helps minimize misinterpretations and potential mistakes.

Make the Plan Easy to Use

Although it may seem a basic point, one constant among good disaster recovery plans is that they are well-organized, easily navigated and easy to use. Organizations often structure their recovery plans as novels instead of reference documents. Disaster recovery plans are rarely read from front to back, and are most likely to be used during a crisis, not as leisure reading beforehand.

To improve effectiveness and ease of use, organizations should separate their disaster recovery plans into multiple, stand-alone sections or subdocuments. For example, a recovery planning section covers items such as methodologies, management and program goals, while a recovery operations section focuses on recovery processes and procedures. Target each section to the specific audience or individual role, and format and organize the plan for the targeted user and by content (see Table 1).

Table 1. Recovery Planning and Recovery Operations: Document Differences

Item                 | Recovery Planning       | Recovery Operations
Target               | IT leaders              | IT operations
Formatting           | Paragraphs and sections | Bulleted lists
Order                | Varied                  | Sequential
Writing              | Detailed                | Straightforward and concise
Indexed              | Not important           | Highly important
Knowledge assumption | High                    | Low

Source: Gartner (June 2012)

Reference Roles, Not Individuals' Names

Having an accurate and up-to-date recovery plan is critical for success. Unfortunately, it is not uncommon for recovery plans to be out of date. Organizations typically do not update their plans frequently enough to keep pace with personnel changes among the individuals who are assigned recovery responsibilities. This opens the door for tasks to be assigned to people who are no longer in the required role, have left the company or have changed their contact information.

Avoid using individuals' names and contact information in the recovery document; use roles and job titles instead. References to roles and job titles can be indexed against an appendix of individual names and contact information. This way, only the appendix needs to be updated on a regular basis, and this can be done automatically via standard HR reports.

Address Ongoing Recovery and Failback, as Well as Failover

Most disaster recovery plans Gartner reviews focus almost exclusively on failover processes and procedures. These plans usually lack adequate detail, if they address the topic at all, on what should happen in operations after a disaster failover occurs, or on re-establishing production operations via failback. Ongoing recovery operations and failback procedures are almost as important as failover, and should be covered in detail in all disaster recovery plans.
Organizations should ensure that disaster postmortem processes are established to understand the root cause of the disaster and how it impacted IT, and to assess recovery performance.

Consider the Types of Disasters to Plan For

What types of disasters should organizations plan for? Two common approaches to answering this question are:

  • One size fits all, where all types of disaster scenarios are treated the same
  • Individual subplans to address a wide array of potential disaster scenarios

While there is no right answer, many recovery plans we review are overly general or too comprehensive and complex. Organizations should plan for disaster scenarios based on their ability to manage and benefit from including the various scenarios. Scenarios based on criteria such as notification time (e.g., a tornado warning is in effect starting tomorrow at 12 noon), type of disaster and potential business impact should be established only if material differences exist in the way the type of disaster is managed. Organizations should avoid planning for disasters that are highly unlikely to occur (e.g., a blizzard in the Caribbean).

Figure 1 shows 2011 Gartner Risk Management Disciplines Survey respondents' answers to the question, "What disaster scenarios does your organization plan for in its business continuity management efforts?"

Figure 1. Common Disasters Organizations Plan for in BCM Efforts (N = 159)
Source: Gartner (June 2012)

Maintain Version and Configuration Control

Maintaining consistency between production and recovery environments remains one of the biggest disaster recovery testing and exercising challenges organizations face. While configuration and asset management tools can help, few organizations use them, or any other tools, as part of ongoing disaster recovery plan updates.

Establish formal processes, either through management tools and libraries or manually, to ensure that all hardware and software references in a disaster recovery plan are up to date and represent actual production and recovery configurations. Specific version and patch-level details should be included for all hardware, software and OSs, and these should be updated on a regular basis. For example, it is insufficient to state "Windows 2000" in the recovery plan for a server running Windows 2000 Advanced Server Service Pack 4.

Codify What Constitutes a Disaster

Defining what qualifies as a disaster and how it is declared are key considerations that most recovery plans do not cover in adequate detail or focus. Yet this is especially important, given the cost and potential level of disruption associated with declaring a disaster. Organizations must ensure that processes and safeguards are established and documented within the disaster recovery plan to protect against mistaken declarations.

Two to three senior executives should be authorized to declare a disaster, and this should occur only after specific criteria have been met to qualify the event as a disaster. Similar processes and criteria should be established to declare the end of a disaster, and to initiate failback procedures.

Include Testing in the Disaster Recovery Plan

Disaster recovery testing is challenging and expensive, but it is a critical component of disaster recovery preparedness. Given the time and money spent on disaster recovery testing, it is surprising we don't see it called out more regularly or covered in enough detail within disaster recovery plans.

Testing should be a highlighted section of all disaster recovery plans, and should include specific details, such as when tests are scheduled throughout the year, what types of tests are planned, which applications or business functions will be tested, and what testing processes and procedures should be followed.
Besides physical recovery testing, organizations should establish a regular "paper test" schedule of when major reviews and walk-throughs of the recovery plan occur (see "Best Practices for Planning and Managing Disaster Recovery Testing").

Consider the Communication Infrastructure

The communication infrastructure is a top recovery priority for many organizations. However, because it is not necessarily seen as an application or a business service, it is not always called out or prioritized appropriately within disaster recovery plans.

The communication infrastructure should be considered a high-priority recovery function, and treated similarly to other mission-critical business services. This is especially important when business continuity functions, such as an emergency response system, depend on the availability of the communication infrastructure to operate. Primary and alternative communication methods should also be established and documented for executing the recovery plan itself.

Recommended Reading

Some documents may not be available as part of your current Gartner subscription.

"Best Practices for Planning and Managing Disaster Recovery Testing"
"Ten Best Practices for Creating and Maintaining Effective Business Continuity Management Plans"
"Define, Develop and Verify Plans for Application Availability and Recoverability"
"Recent IT Outages Beg the Question: Who's Minding the Data?"
"New Evaluation Criteria and Provider Capabilities Are Changing Disaster Recovery Sourcing"

Evidence

This research is the result of more than 40 disaster recovery document reviews and analyses, as well as direct discussions with Gartner clients regarding the creation and management of disaster recovery documents and plans.

© 2012 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.