Office of the University Auditor
Old Dominion University
Glenn R. Wilson, IT Audit Manager
BUSINESS CONTINUITY – DISASTER RECOVERY
Planning and Methodology Guide
For Developing Departmental
Business Continuity – Recovery Programs
Oct 2005 Rev B
Plans are nothing… Planning is everything…
- Dwight D. Eisenhower
TABLE OF CONTENTS
I. Introduction
II. Overview: Business Continuity – Recovery Program
III. Business Analysis
IV. Business Analysis Activities
V. Plan Initiation Activities
VI. Scope of Plan
VII. Planning Assumptions
VIII. Scaling Levels of Disaster
IX. Collecting Data and Information
X. Planning Objectives
XI. Setting Realistic Goals
XII. Determining Critical Needs
XIII. Priority of Processing
XIV. Basic Items of Focus
XV. Know The Cost Of Downtime
XVI. How Much Risk Is Acceptable?
XVII. Events to Consider
XVIII. Information Technology Considerations, Classification of Data
XIX. Records Retention
XX. Insurance Considerations
XXI. Strategy and Plan Development
XXII. Response Teams
XXIII. List of Deliverables
XXIV. The Plan Document
XXV. Document Distribution
XXVI. Maintenance and Support
XXVII. Testing and Training
XXVIII. Critical Success Factors
XXIX. Best Practices
XXX. Stepping Through The Creation Of The Plan
XXXI. Summary and Conclusion
A Business Continuity Glossary
B Online Sources of Additional Information
I Introduction
The goal of business continuity and disaster recovery is to mitigate financial, operational, and business
impacts to a business unit and to ensure its survivability under various scenarios. It assures that core
processes will either be continued or effectively and efficiently restored in accordance with the business
mission, which directly assures the overall success of the organization.
Owing largely to increased reliance on information technologies (IT), Contingency Management has
been supplanted by the concept of Business Continuity, which focuses on the resiliency of people,
processes, workspace, systems, safety, communication and new planning scenarios — loss of life, lack
of decision makers, interruption of transportation, building evacuation, loss of physical assets and
workspace, lack of communications, crisis command centers, terrorism, bio-terrorism etc.
Traditionally, data centers or offices of computing services alone have borne the responsibility for
providing contingency planning. Frequently, this has led to the development of recovery plans to restore
computer resources in a manner that is not fully responsive to the needs of the University and/or its
customers supported by those resources. Contingency planning is a business issue rather than strictly
an IT issue. Long-term operations outages often result in impacts of catastrophic proportions. The
development of a viable continuity and recovery strategy for the University must be a product of the
collective planning of not only the University’s data center, communications and operations centers, but
also the users and customers of those services who directly support the success of the University, and
management personnel who determine acceptable levels of risk and bear responsibility for protecting the
This guide is meant to facilitate the planning, development and maintenance of a comprehensive
Business Continuity – Disaster Recovery Plan (BCRP) at the departmental level. The collection of these
plans, anchored to OCCS’ own IT-asset-focused DRP program, will serve to ensure the success of the
University. When used herein, “Agency” refers to the individual College, Office or Department, “Unit”
refers to the various areas comprising the University and “University” refers to Old Dominion
University.
Note that Old Dominion, as an Agency, is now responsible for reporting to the Governor's office.
Participation in this BCRP program is essential to the University's success.
II Overview: Business Continuity – Recovery Program
For most agencies, services to their customers would effectively cease if the core processes supported by
key systems, resources and assets were inaccessible or unavailable for an unacceptable period of time.
Each agency should establish risk management and disaster recovery planning processes for identifying,
assessing, and responding to the risks associated with loss of ability to execute its core processes. The
University’s requirements for continuity and recovery planning should be addressed through
University-wide efforts to develop and maintain a Business Continuity Recovery Program.
The Business Continuity Recovery Plan is one of the main deliverables of this program. Other
constituents of a comprehensive program include the Risk Assessment Analysis (RAA), Business
Impact Analysis (BIA), Testing, Training and Support.
Risk: Assessment And Analysis
Risk may be defined as the potential for loss, whether man-made or natural. Potential is measured in probability
of occurrence, which may vary with activities, geographic region, locale, world events or seasonal
change. The Risk Assessment Analysis considers a range of possible disasters and applies a risk factor
to each type. Each area is analyzed to determine the degree of risk associated with the various types of
disasters, such as natural, technical, human and societal threats. Based on the core functions and
processes necessary to continue business operations, assets are identified, categorized as being either
critical, essential or administrative in nature and prioritized in order of their contributory importance to
their respective functions and processes. Existing controls that serve to reduce the exposure of these are
delineated. The cost for each control, the level of risk and the degree to which they effectively mitigate
against known risks are analyzed, resulting in recommendations for new measures/controls or updates to
Planning steps for the risk assessment/analysis should include:
• Evaluation of exposures and existing controls, identification of potential improvements and
creation of new procedures or measures to mitigate against exposure thereby reducing residual
risk to an acceptable level.
• Evaluation of mission-critical business applications.
• Systems, business applications, datasets, networks, user processes, dependencies between
functions, departments, systems and applications and reliance on relationships with other
agencies or external services.
A proper Risk Assessment Analysis will contain a projected timeline for implementing recommended measures
or list the constraints and additional resources that would need to be acquired to accomplish such.
For continuity and recovery planning purposes an asset is defined as anything of material value or
usefulness. This definition may be refined to “anything of material value or usefulness that is required
for the major functional and business activities of the University”, including staff, equipment, facilities,
IT resources, office furniture etc. The Risk Assessment is an analysis that carefully considers the degree
of value or usefulness of a given asset and categorizes it as Critical, Essential or Administrative. Critical
assets have the greatest potential business impact, and measures against their loss are implemented
commensurate with the nature and degree of their exposure to loss or damage.
Critical Asset: Definition
An asset is critical to the degree that a core process depends upon it for its function and output. These
may include people, computers, datasets, paper documents, specialized equipment, or other resources
which are not commodity items. They may or may not be readily replaceable, but it is certain that in
their absence the process would not continue and the University’s [not necessarily the University’s]
ability to fulfill its business mission would cease or be significantly degraded. The critical classification
may be extended to include those items which do not affect day-to-day operations and yet are of critical
importance due to the impact of their loss, which may include a damaged reputation resulting in
decreased revenue, as might be the case with a very service-oriented mission.
Practically speaking, a critical asset is something a department cannot do without.
The Business Impact Assessment
The BIA delineates the business, operational and financial impacts and other consequences of potential
losses and discontinuities. These most often involve revenue, productivity, public image/trust,
regulatory compliance and contractual obligations, and represent the adverse realization of the risks
previously identified. The BIA identifies the operational (qualitative) and financial (quantitative) impact of a
disrupted or inaccessible core process on an agency’s ability to conduct its critical business processes.
This analysis forms the basis for the formulation of viable continuity and recovery strategies that will be
activated when necessary to restore operations within the required time frames.
Where appropriate and warranted, a business impact analysis, operational impact analysis, and financial
impact analysis are developed for each core business process. The actual procedures to restore the core
process in part or in whole are written into the deliverable of the Business Continuity and Recovery
Plan. The procedures, when activated, should reasonably assure achievement of recovery objectives.
Practically speaking, a BIA is the process of determining how damaged operations would be over time
if the assets were not available.
In summation, the Business Continuity Recovery Plan is the actual set of detailed procedures that
would be followed or deferred to under sudden or impending conditions that threaten or damage the
ability of the University to carry on one or more of its core functions. It includes details about core
processes, inputs, outputs, information flow, personnel roles and responsibilities, contingency
arrangements, emergency response, communication and coordination. The Risk Assessment and
Business Impact Analyses would be attached as supporting documents to the plan.
III The Business Analysis
The Business Analysis identifies and describes critical, essential and administrative core processes, and
the high-level resources that support these functions. This analysis enables us to confirm the managers’
description of their operations and highlight functional dependencies and single points of failure. The
analysis may be in the form of a separate attachment or included within the body of the Business
Continuity and Recovery Plan, but it is essential to the overall success of the plan.
Most agencies are structured along functional boundaries (accounting, information technology) and the
core processes within those units (payroll, accounting). In reality, however, an agency’s business is
conducted through one or more business processes. A business process describes a set of recurring
activities (a flow of information and/or materials) that produce an output: something of value for the end
user or customer. A process usually contains multiple functions. The most straightforward approach is
to analyze the University in terms of its core processes. It is critical to understand the relationships
between those core processes and the end user or customer’s level of expectations in order to analyze the
impact of an interruption of a given function.
For each core process, define the Maximum Acceptable Outage (MAO) [1]: the point at which resource(s)
and functions must be restored. Describe the qualitative and quantitative impact for an outage of each
core function assigned an MAO. Decide whether that level of impact is acceptable or if the MAO needs to
be adjusted. The MAO is based on the University’s mission, not on whether there are current resources
and/or procedures to achieve it. If there is a gap between the MAO and the projected outcome, a gap
analysis with a plan and timeline to close it should be presented.
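The gap check described above can be sketched in a few lines. This is only an illustration: the class and function names, the "Course registration" process and all hour figures are hypothetical and not taken from any University system.

```python
from dataclasses import dataclass


@dataclass
class CoreProcess:
    name: str
    mao_hours: float                 # Maximum Acceptable Outage, set from the mission
    projected_recovery_hours: float  # what current resources are expected to achieve


def mao_gap_hours(process: CoreProcess) -> float:
    """Hours by which the projected recovery overshoots the MAO (0 if no gap)."""
    return max(0.0, process.projected_recovery_hours - process.mao_hours)


# Hypothetical figures for illustration only.
processes = [
    CoreProcess("Payroll", mao_hours=72, projected_recovery_hours=48),
    CoreProcess("Course registration", mao_hours=24, projected_recovery_hours=96),
]

# Processes that need a gap analysis with a plan and timeline to close the gap.
needs_gap_analysis = [p.name for p in processes if mao_gap_hours(p) > 0]
```

Any process that appears in `needs_gap_analysis` is one whose MAO cannot be met with current resources and procedures.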
Core business processes should be divided into one of three categories:
• Critical
• Essential
• Supportive or Administrative
Refer to Section XIII Priority of Processing for detailed definitions of the above categories.
IV Business Analysis Activities
The high level approach to the Business Analysis consists of gathering information about core
processes, documenting business flows, identifying customers, and gaining confirmation of the
information. The first step of the analysis is to identify the core business processes performed by the
University, and understand the high level flow of information, materials, and services through these core
processes. The specific approach to understanding these core processes and business flows is:
1. Review relevant documentation (critical success factors, strategic plans, budgets, performance
measurements, IT plans, division goals, organizational charts) to build an understanding of
organization purpose and structure.
2. Conduct interviews with agency leadership members to collect information on their “first-hand”
perspectives on how the University operates. It is important to note that these interviews will
serve as data-gathering opportunities for all steps of the Business Impact Analysis.
3. Compile the results of your interviews in the form of business flows. These flows should
describe each core process and the flow of information, services, or goods into and out of the
process and include considerations for the end user or customer.
4. Develop descriptions of support functions. Some functions within the University may perform
important roles, which contribute indirectly to the University’s ability to implement its assigned
programs. These are classified as support or administrative functions.
5. Confirm understanding of the University, its core processes, and its business flows with
appropriate management through review of the descriptions of the core processes performed.
6. Document in a stepwise manner or flowchart the Business Priorities and Drivers => Business
Processes => Applications and Infrastructure => Operations and Management.
[1] Other risk analysis literature may identify the Maximum Acceptable Outage (MAO) as the Maximum
Allowable Downtime, or MAD.
V Plan Initiation Activities
First, create a steering committee whose members represent key owners and managers of the critical
and essential operations of the University. Identify the individual(s) who will be responsible for the
development and implementation of the University’s Business Continuity Recovery Plan (the Project
Team). The authority to make high-level decisions on behalf of the University should be defined and
bestowed upon these persons. It is preferable to designate one or more of the University’s top-level
managers for this responsibility. Hold a kickoff meeting to present to all staff the goal and importance
of participating in the creation of the plan.
Why Business Continuity Recovery Planning Is Important:
• Minimizing potential economic loss
• Decreasing potential exposures
• Reducing the probability of occurrence
• Reducing disruptions to operations
• Ensuring agency stability and survivability
• Providing for an orderly recovery
• Minimizing insurance premiums
• Reducing reliance on certain key individuals
• Protecting the assets of the University
• Ensuring the safety of personnel and visitors
• Minimizing decision-making during a disastrous event
• Minimizing both legal and regulatory liability
The Steering Committee should work closely with the Project Team to:
• Review operations
• Identify business functions within the University
• Document the level of information, goods and services that relate to the business functions
• Determine the users served by each business function
• Identify legal and regulatory requirements
• Review special and unusual requirements
• Identify the maximum acceptable outage period
• Determine the consequences of not processing
• Determine critical equipment requirements
• Document dependencies
• Analyze work flow
• Evaluate the security of vital records and data
• Analyze record retention policies and procedures
• Determine the recovery timeframe required for each process
• Identify maximum acceptable lengths of service interruption
• Assess and document current file backup and recovery capabilities
• Determine and document calculated assumptions
Agencies must also complete the following operational activities to ensure a comprehensive Business
Continuity Program exists:
• Completion of an emergency response plan, information technology risk survey, and a security
analysis with plans to close any identified gaps.
• Identification of new or modified operating procedures to increase continuity.
• Review and modification of data backup and off-site storage procedures.
• New or modified restoration procedures.
• Development of alternate procedures for use during a disaster.
• Negotiating and implementing contracts and other provisions as needed.
• The development of alternate facilities and equipment.
• Developing step by step recovery scripts, which guide an employee through the procedures
necessary to recover a given service, resource or system.
• Standards, forms, and guidelines for standard procurement procedures, available from the
University’s procurement group or state procurement office.
• Procedures to recreate or recapture information that may be lost during disaster (records, recent
transactions, work in progress)
• Detailed team definition and procedures including responsibilities and time line oriented task
• Organizational information (organizational charts, job descriptions, etc.).
VI Scope of Plan
Although most continuity and disaster recovery plans are weighted towards information technology, a
comprehensive plan will also include areas of operation outside data processing. The plan should have a
broad scope if it is to effectively address the many events and scenarios that could affect the University.
A “worst case scenario” should be the bottom line basis for developing the plan. The worst-case
scenario is the near or total destruction of the main or primary facility. When this is used as a baseline
premise, less critical and detrimental situations can be handled by using only the needed portions of the
plan, with only minor (if any) modifications required.
The scope will be determined by the University’s business drivers and priorities, its size and degree of
infrastructure, and the level of detail addressed by the procedures developed for business and asset
continuity and recovery. These plans should be clearly and concisely written, but with sufficient detail
that personnel can implement them accurately with little additional guidance in an emergency situation.
Core policies and procedures found in Business Continuity and Recovery Plans include:
1. Business continuity policies
2. Emergency response procedures
3. Emergency evacuation procedures
4. Damage impact assessment procedures
5. Disaster declaration and escalation procedures
6. Command center activation procedures
7. Personnel notification procedures
8. Resumption of normal operations
9. Physical and security assessments
Other, more agency-specific policies and procedures may include:
2. Media management
3. Employee crisis management
4. Vendor communications management
5. Client communications management
6. Salvage operations procedures
7. Travel and lodging coordination
8. Recovery expense control and reporting
9. Plan exercise project management
10. Plan maintenance management
VII Planning Assumptions
Every viable disaster recovery plan includes assumptions. The assumptions limit the circumstances that the
plan addresses and provide a foundation to support its procedures. Limits may be imposed upon the
magnitude of disaster. Premises may be stated declaring known dependencies and expected levels of
services provided by others, such as reliance on the Office Of Computing Services to restore network
availability within an acceptable time frame.
Raising and exploring the following questions helps to identify assumptions.
• What equipment/facilities have been destroyed?
• What is the timing of the disruption?
• What records, files and materials were protected from destruction?
• What resources are available following the disaster:
o Hot site/alternate site?
Typical planning assumptions found in Business Continuity Recovery Plans include:
• The main facility of the organization has been destroyed.
• Staff is available to perform critical functions defined within the plan.
• Staff can be notified and can report to the backup site(s) to perform critical processing, recovery
and restoration activities.
• Off-site storage facilities and materials survive the event.
• The disaster recovery plan is maintained with regard to training, testing and updating.
• Subsets of the overall plan can be used to recover from minor interruptions.
• An alternate facility is available or can be secured as necessary.
• An adequate supply of critical forms and supplies is stored off-site, either at an alternate facility
or off-site storage, or are readily available from an external source.
• A backup site is available for processing the organization’s work.
• Cell phones, pagers, email and other auxiliary forms of communication will be available.
• Surface transportation in the local area is possible.
• Vendors will perform according to their general commitments to support the organization in a
This list of assumptions is not all-inclusive, but is intended to provoke thought in the
beginning stage of planning. The assumptions themselves dictate the actual plan’s procedures and hence
should be carefully reviewed by the Steering Committee for relevance and validity.
VIII Scaling Levels of Disaster
It is sometimes advantageous to define levels of disaster for given scenarios, for which a standard set of
response procedures can be written. The sub-procedures can then be referenced or called from within
other top-level procedures using the designated level of disaster.
Level 0- No interruption in operations.
Level 1- Operations can be resumed within eight hours.
Level 2- Operations can be resumed within 8-48 hours. Users may
need to implement manual or alternate processing procedures.
Level 3- Operations cannot be restored for over 48 hours. All functions
and personnel are to be moved to alternate site(s). Users need to
implement manual processing.
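The time-based scale above can be expressed as a small lookup so that top-level procedures can reference a level by number. This is a minimal sketch: the function name is hypothetical, and assigning an outage of exactly eight hours to Level 1 is an assumption, since the scale itself leaves that boundary open.

```python
def disaster_level(expected_outage_hours: float) -> int:
    """Map an estimated outage duration to the time-based disaster level (0-3)."""
    if expected_outage_hours < 0:
        raise ValueError("outage duration cannot be negative")
    if expected_outage_hours == 0:
        return 0  # no interruption in operations
    if expected_outage_hours <= 8:
        return 1  # resumed within eight hours (boundary choice is an assumption)
    if expected_outage_hours <= 48:
        return 2  # 8-48 hours; manual or alternate processing may be needed
    return 3      # over 48 hours; move to alternate site(s)
```

A procedure index keyed by the returned level (for example, a dictionary from level to the list of sub-procedures) would then let one top-level procedure call the right response set.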
Alternate disaster scale:
Level 0- The disaster can be handled by the personnel of the organization alone.
Level 1- The disaster will require some outside intervention for recovery such as police, fire, or
other professional services.
Level 2- The disaster will require assistance from multiple external organizations.
IX Collecting Data and Information
Collecting accurate data is vital to developing a successful plan. This point cannot be overstated.
Effective data collection involves using questionnaires and conducting interviews with key
personnel and managers, as well as reviewing existing policies, operating manuals and procedures.
Preprinted forms and questionnaires are particularly useful and efficient. Comprehensive data collection
should include the following:
• Asset and Equipment Inventory
• Personnel Roster
• Notification Checklist
• Master Communications List
• Vendor and External Agency Contact List
• Computer Hardware Inventory
• Computer Software Inventory
• Documentation Inventory
• Forms Inventory
• Office Supply Inventory
• Insurance Policies Inventory
• Office Equipment Inventory
• Records and Data Inventory
• Offsite Storage Inventory
• Telecommunications Inventory
• Important Telephone Numbers
• Business Process Questionnaires
• Service Level Agreement Questionnaires
• Regulatory Compliance Questionnaires
• Security Questionnaires
X Planning Objectives
The primary objective of Continuity Recovery planning is to enable the University, at a minimum, to
survive a disaster and, preferably, to continue normal business operations. In order to survive, the University
must assure that critical operations can resume/continue normal processing before the Maximum
Acceptable Outage is reached. [Reference Appendix A Glossary of Terms] The plan must establish clear
lines of authority and prioritize work efforts. The key objectives of the Continuity and Recovery plan are to:
• Ensure the safety and well being of personnel.
• Continue core business processes and functions.
• Minimize the duration of disruption to operations.
• Minimize immediate damage and losses.
• Establish chain of command and responsibility.
• Facilitate effective coordination and communication.
• Assure restoration of normal operations.
Statistically, the probability of a major disaster is low, but the consequences of an occurrence may be
catastrophic, both in terms of operational impact and agency reputation. Management should assign
ongoing responsibility for continuity recovery planning to personnel with top-level responsibility or those
possessing intimate knowledge of a process or service.
XI Setting Realistic Goals
The goal of any Business Continuity and Recovery Plan is implicit in its title: Continuity and
Recovery. For Old Dominion, a realistic goal is to maintain the "integrity of a semester", meaning that if
some sort of disaster occurs the University should avoid stopping a semester, or declaring it a loss.
The objective of Continuity is little or no disruption to a process. Recovery may be appropriately
divided into two objectives. The Recovery Time Objective (RTO) is the period of time within which
systems, applications, or functions must be recovered after disruption to ensure the viability of business
operations. RTOs are often used as the basis for the development of recovery strategies, and as a
determinant as to whether or not to implement the recovery strategies during a disaster situation. The
Maximum Allowable Downtime or Maximum Acceptable Outage (MAO) is the elapsed time interval
between the disruption’s commencement and the achievement of the RTO. The Recovery Point Objective
(RPO) is the point in time to which systems and data must be recovered after a disruption in order to
ensure the viability of business operations. This includes restoration of manually processed hard copies
and application datasets. RPOs are often used as the basis for the development of backup strategies, and
as a determinant of the amount of data that may need to be recreated after the systems or functions have
been recovered pursuant to preserving the viability of business operations.
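As a worked illustration of the two recovery objectives, the sketch below checks a single recovery against an RTO and an RPO. It assumes the RPO is expressed as the maximum age of the last good backup at the moment of disruption, which is one common way to make the "point in time" measurable; the function name and all timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_report(last_backup: datetime, disruption: datetime,
                    restored: datetime, rto: timedelta, rpo: timedelta) -> dict:
    """Compare actual downtime and data-loss window with the stated objectives."""
    downtime = restored - disruption             # time without the service
    data_loss_window = disruption - last_backup  # work that must be recreated
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }


# Hypothetical event: nightly backup at 02:00, failure at 14:00,
# service restored at 08:00 the next morning.
report = recovery_report(
    last_backup=datetime(2005, 10, 3, 2, 0),
    disruption=datetime(2005, 10, 3, 14, 0),
    restored=datetime(2005, 10, 4, 8, 0),
    rto=timedelta(hours=24),
    rpo=timedelta(hours=24),
)
```

Here the downtime is 18 hours and twelve hours of transactions would need to be recreated, so both objectives are met; tightening either objective below those values would indicate that the backup schedule or the recovery strategy needs to change.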
XII Determining Critical Needs
To determine the critical needs of the University, each unit should document all the functions performed
within their area. An analysis over a period of two weeks to one month can indicate the principal
functions performed inside and outside the University, and assist in identifying the necessary data
requirements for the University to conduct its daily operations.
Some useful diagnostic type questions that may be raised towards this end are:
• How long could the department function without the certain existing equipment and normal
• What, if any, state reporting requirements for education must be met (medically related education
may experience this issue) for integrity?
• What are the high priority tasks including critical manual functions and processes in the department?
• How often are these tasks performed, e.g., daily, weekly, monthly, or on a semester boundary, etc.?
• What staffing, equipment, forms and supplies would be necessary to perform the high priority tasks?
• How would the critical equipment, forms and supplies be replaced in a disaster situation?
• Does any of the above information require long lead times for replacement?
• What reference manuals and operating procedure manuals are used in the department?
How would these be replaced in the event of a disaster?
• Should any forms, supplies, equipment, procedure manuals or reference manuals from the
department be stored in an off-site location?
• Identify the storage and security of original documents. How would this information be replaced in
the event of a disaster? Should any of this information be in a more protected location?
• What are the current microcomputer backup procedures? Have the backups been restored? Should
any critical backup copies be stored off-site?
• What would the temporary operating procedures be in the event of a disaster?
• How would other agencies be affected by an interruption in the department?
• How might the University be affected by disruption within other agencies during the same event?
• What effect would a disaster at the data center have on the University?
• What outside services/vendors are relied on for normal operation?
• Would a disaster in the University jeopardize any legal requirements for reporting?
• Are job descriptions available and current for the department?
• Are department personnel cross-trained in accordance with the plan?
• Who will be responsible for maintaining the University’s plan?
• Are there other concerns related to planning for Continuity and Recovery?
• What obstacles must be overcome to develop a viable working plan?
Critical needs can be ascertained by first using questionnaires, then correlating the consistency and
range of the responses.
XIII Priority of Processing
Once the critical needs have been documented, priorities within the University’s units should be
established for the overall recovery of the University. In the University setting, the activities of each unit
may be prioritized as:
• Critical
o A disruption of this function exceeding one day would seriously impair the operation of
the University. The impact is direct and immediate upon the University and its customers.
• Essential
o A disruption of this function exceeding one week would seriously impair the operation of
the University and would likely damage the integrity of a "semester".
• Supportive or Administrative
o The function is related to internal administration but its extended absence would not
seriously impair the operation of the University.
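The three priority tiers above can be written as a simple rule. This is a minimal sketch, assuming each function's owner can estimate the shortest disruption that would seriously impair the University. The tier names follow the critical, essential and administrative classification used earlier in this guide; the function name, its parameter, and the treatment of functions that tolerate more than a week as Administrative are assumptions.

```python
from typing import Optional


def processing_priority(impairing_outage_days: Optional[float]) -> str:
    """Return the priority tier for a function.

    impairing_outage_days is the shortest disruption (in days) that would
    seriously impair the University, or None if even an extended absence
    would not.
    """
    if impairing_outage_days is None:
        return "Administrative"
    if impairing_outage_days <= 1:
        return "Critical"       # impact is direct and immediate
    if impairing_outage_days <= 7:
        return "Essential"      # threatens the integrity of a semester
    return "Administrative"     # assumption: tolerates more than a week
```

Applying the rule to every function documented in Section XII yields an ordered recovery list: Critical functions first, then Essential, then Administrative.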
XIV Basic Items of Focus
Regardless of a unit’s level of criticality to the University or the complexity of its operations,
there are common basic items its plan should accommodate.
• Crisis management plan — ensuring the safety of employees, continuity of decision making, and
view from outside world. Includes personnel communication tree and facilities diagrams and
• Asset list and external dependency contact information
• Secure data storage and retrieval, both on and off site.
• Prioritization of resources for most critical business processes based on the
business impact analysis.
• Detailed process flow descriptions and procedures to recover them under all scenarios considered
in accordance with the plan’s scope, assumptions and premises.
• Work-at-home programs or alternate plan for workspace recovery
• Contingency planning — mitigating the risks of external events
• Calling trees - lists of people who call other people and keep communications flowing.
XV Know The Cost Of Downtime
The true cost of downtime is the sum of expenditures and losses that result from a given adverse
event. The total often exceeds that which can be immediately calculated. The monetary cost is a function
of lost productivity, lost revenue and the expenses of recovery. Other costs which expose themselves during
and after events include decreased financial performance, damaged reputation, loss of personnel,
penalties for noncompliance, etc. All of these must be taken into account for a valid analysis of
cost-effective mitigation measures.
The cost of downtime is a rather complex issue that deserves a considerable portion of the planning
stage. It is the basis for establishing acceptable levels of risk. Depending on the
University’s activities, it may not be sufficient to merely consider the endpoint summation of the costs
of downtime for the basis of accepted risk. While an accounts receivable department will eventually
reach its objective to post and deposit all outstanding payments, the time needed to reach this recovery
point may exceed the amount of time that the business’ cash reserves will suffice, leading to a secondary
crisis and potential losses. Additional costs incurred by other agencies or customers dependent upon your
services should also be considered for a thorough downtime cost analysis.
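The arithmetic behind a downtime cost estimate can be sketched as follows. This is a minimal illustration under stated assumptions: the cost components mirror those named above (lost revenue, lost productivity, recovery expenses, other costs), while the function names, parameters and the simple linear model are hypothetical and would need to be adapted to a department's actual figures.

```python
def downtime_cost(hours_down, lost_revenue_per_hour, idle_staff,
                  hourly_wage, recovery_expenses, other_costs=0.0):
    """Estimate the direct monetary cost of an outage (simple linear model)."""
    lost_revenue = hours_down * lost_revenue_per_hour
    lost_productivity = hours_down * idle_staff * hourly_wage
    return lost_revenue + lost_productivity + recovery_expenses + other_costs


def days_of_cash_cover(cash_reserves, daily_outflow):
    """Number of days the cash reserves can fund normal outflows."""
    if daily_outflow <= 0:
        raise ValueError("daily_outflow must be positive")
    return cash_reserves / daily_outflow


def causes_secondary_crisis(recovery_days, cash_reserves, daily_outflow):
    """True when recovery takes longer than the cash reserves will last."""
    return recovery_days > days_of_cash_cover(cash_reserves, daily_outflow)
```

The second check reflects the accounts receivable example: even if all payments are eventually posted, a recovery that outlasts the available cash reserves creates a secondary crisis that the endpoint total alone does not reveal.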
XVI How Much Risk Is Acceptable?
The theoretical goal of the BCRP is ‘zero’ downtime and ‘instantaneous’ recovery. Due to limitations of
time, resources and funding these are not practically achievable. There is always residual risk that will
lead to outages of quantifiable duration. Officers and upper management generally bear the
responsibility of determining acceptable levels of residual risk.
The decision is rather complex and approaches a non-linear system of probabilities of occurrence and
outcome. Events of interruption are often not isolated and the number of permutations is great. Under
the threat of a hurricane, will the power also be interrupted? Will the facility’s roof be breached? If so,
will important paper documents or computer systems be damaged? Will evacuation be necessary? How
many employees will decide to leave early to be with family? A resilient plan takes into account all
possibilities and probabilities of success and failure of any preceding assumptions and preventive
measures. Decisions are based on outcome probabilities and the corresponding cost of downtime.
Risks may be enumerated in order of probability and their potential impact.
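One common way to enumerate risks by probability and impact is to rank them by expected loss (probability multiplied by cost of downtime). The sketch below assumes that approach; the Risk class, its field names and any figures supplied to it are hypothetical and serve only to illustrate the ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    event: str
    annual_probability: float  # assumed estimate, between 0 and 1
    downtime_cost: float       # assumed cost if the event occurs

    @property
    def expected_annual_loss(self):
        return self.annual_probability * self.downtime_cost


def rank_risks(risks):
    """Return risks ordered from highest to lowest expected annual loss."""
    return sorted(risks, key=lambda r: r.expected_annual_loss, reverse=True)
```

A ranking of this kind treats each event in isolation. Because interruptions are often linked, as in the hurricane example, the probabilities supplied should account for related events occurring together.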
XVII Events To Consider
The potential threats to any given agency are so numerous that any viable plan must consider only the
most probable or most catastrophic events that can still be reasonably mitigated. Hence, a nuclear attack
or major east coast earthquake can be eliminated. The following is a comprehensive list of both natural
and man-made disasters that threaten the University and can reasonably and effectively be planned for.
• Flooding, Tornadoes, Severe Thunderstorms, Hurricanes, Winter Storms.
Man-Made or resulting from the above
• Fires with and without equipment loss and/or facility damage, loss of vital documents, data, software,
loss of power, loss of communications, loss of environment controls (heat, A/C), internal/external
vandalism or theft, loss of equipment, loss of personnel, breach of physical security, loss of facilities
access, sabotage, bomb or terrorist threat or similar event, cyber event such as virus or hacker,
construction including exposure such as asbestos.
The following events have been excluded on the basis of extremely low probability of occurrence or are
beyond the scope of the authority of individual agencies.
• Chemical or nearby industrial accident, civil disobedience, employee strike, hostage or terrorist
situation, war and military conflicts including nuclear attacks, plane or aircraft crashes, earthquakes.
Refer to Appendix A for detailed information about these events, from the perspective of Business
Continuity and Recovery.
XVIII Information Technology Considerations
Information technology dependencies exist extensively throughout the University due to its distributed
computing environment. Distributed computing involves multiple computers, located remotely from one
another, each with a role in a computation problem or in information processing. In the typical transaction
using a tiered model, user interface processing is done at the desktop computer at the user's location,
business processing is done in a remote computer, and database access and processing is done in another
computer that provides centralized access for many business processes. The process depends upon the
ability of each tier to fulfill its role.
Agency process dependencies are often described in Business Continuity and Recovery literature as
vertical and IT dependencies as horizontally corresponding to them.
[Figure: IT Service Delivery. Labels: Business Priorities (HIGH, MEDIUM); Departmental Applications]
The bulk of the responsibility for managing and maintaining the campus distributed computing
environment is borne by OCCS. Through extensive commitment of resources and aggressive planning,
OCCS can continue or rapidly recover data and computing services after most events, including
extensive damage to the data center facilities. This high degree of certainty and availability should
definitely be one of the assumptions in the University’s plan. However, residual risk is never zero, even
for OCCS, and therefore some plan or procedures should be in place in the event that OCCS services are
unavailable or only partly available, such as loss of email functionality or limited Banner access. This action
will surely augment the resiliency of the University.
Information Technology statistics are helpful when setting priorities for allocating resources and
developing preventive strategies.
Leading Causes of Computer Downtime- Contingency Planning Research, Livingston, NJ
Power Outage …………………………31%
Fire and Bombing………………………9%
Leading Causes of Data Loss- Ontrack Data Recovery Inc.
Hardware or System Malfunction…………. 44%
Human Error………………………………… 32%
Software Application Malfunction…………. 4%
Natural Disaster……………………………. 3%
In considering your on- and off-site data backup strategies, it may be advantageous to classify data as a
function of the Recovery Time Objective for the process or function(s) it supports.
Data Classification [2] and Recovery Time Objective (RTO)
• Product or Service Related Data: required to support the core products and services of the business unit, e.g. Order Entry, Inventory, Shipping. RTO: usually less than 24 hours.
• Business Support Data: required to run the business, e.g. Financials, Payables, Data Warehouse. RTO: usually less than 48 hours.
• Data required to support the business, e.g. fixed asset accounting. RTO: usually 72 hours or longer.
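The classification above can be sketched as a lookup on the Recovery Time Objective of the process the data supports. This is a minimal illustration under stated assumptions: the guide gives thresholds of less than 24 hours, less than 48 hours, and 72 hours or longer, so the sketch assumes that anything at or beyond 48 hours falls into the third class. The label used for that third class is a placeholder, and the function name is hypothetical.

```python
PRODUCT_OR_SERVICE = "Product or Service Related Data"
BUSINESS_SUPPORT = "Business Support Data"
# Placeholder label (assumption): the third class is described but not named here.
OTHER_SUPPORTING = "Other supporting data"


def classify_by_rto(rto_hours):
    """Map a process's RTO (in hours) to a data classification."""
    if rto_hours < 0:
        raise ValueError("rto_hours must not be negative")
    if rto_hours < 24:
        return PRODUCT_OR_SERVICE
    if rto_hours < 48:
        return BUSINESS_SUPPORT
    # Assumption: RTOs from 48 to 72 hours are grouped with the longest class.
    return OTHER_SUPPORTING
```

A department could apply such a lookup to each data set in its inventory to decide which backups need the fastest retrieval from off-site storage.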
XIX Records Retention
Note that the University has established policy on Data Classification, number 3512.
An organized, systematic approach to records management is an important part of a comprehensive
disaster recovery plan. In fact, the state legislature has regulated records retention and the
commonwealth librarian has created a set of schedules for records retention. Effective records
management can free up administrative resources and work to optimize processes dependent upon them
before the occurrence of an event.
• Reduced storage costs.
• Expedited customer service.
• Efficient process flow
• Federal and state regulatory compliance.
Records are not only retained as proof of transactions, but also to verify compliance with legal and
regulatory requirements. These records are used for independent examination and verification of
business practices. Federal and State requirements for records retention must be analyzed by each
agency individually before a retention and management policy can be created. In addition, the
University should employ specific handling and salvage techniques and procedures for its employed
Commonly used media types include:
• Compact Disc
XX Insurance Considerations
Adequate insurance coverage is a key consideration when developing a business recovery plan and
performing a risk analysis. Having a disaster plan and testing it regularly may, in itself, lower insurance
rates. Good planning does reduce risks and address many concerns of the underwriter, in addition to
affecting the cost or availability of the insurance.
To assist in planning for emergency funding, most insurance agencies specializing in business
interruption coverage can provide the organization with an estimate of anticipated business interruption
costs. Many organizations that have experienced a disaster indicate that their costs were significantly
higher than expected in sustaining temporary operations during recovery.
To provide adequate proof of loss to an insurance company, the University should thoroughly document its
assets. Asset inventories become extremely important as the adjustment process takes place. Photos of
specialized or expensive equipment can prove valuable in expediting reimbursement for losses.
Types of insurance coverage to be considered may include: computer hardware replacement, extra
expense coverage, business interruption coverage, valuable paper and records coverage, errors and
omissions coverage, fidelity coverage, media transportation coverage.
With a good handle on the costs of downtime, costs to replace assets and increased operating costs
incurred during recovery, management can make reasonable decisions as to the type and amount of
insurance to carry and to what extent the University should self-insure against certain losses.
[University Note: It is recommended that a copy of or pertinent portions of your plan be submitted to the
appropriate Office of the Vice President for review of insurance coverage and strategies]
XXI Strategy and Plan Development
Methodology used to develop Business Continuity and Recovery Plans should emphasize the following:
• Defining recovery requirements from the perspective of business functions.
• Documenting the impact of an extended loss to operations and key business functions.
• Focusing appropriately on disaster prevention and impact minimization, as well as orderly
• Selecting project teams that ensure the proper balance required for plan development.
• Designing a continuity plan that is understandable, easy to use and easy to maintain.
• Developing methods to integrate continuity and recovery planning into ongoing business planning
and system development life cycles to sustain plan viability over time.
The successful and cost-effective completion of such a project requires the close cooperation of
management from all areas ranging from strictly IT areas to dedicated business areas supported by
information systems. The kickoff meeting should include the department leadership members and those
who have firsthand knowledge of core processes. Describe the project’s goals and its importance to the
ongoing continuity of the University. Answer any questions and clearly define the roles and
responsibilities of each member. Obtain commitment from appropriate management to support and
participate in the effort. Senior personnel from all units must be very much involved throughout the
project for the planning process to be successful.
Cooperation and awareness… Your keys to success!
The objectives are to identify alternatives for specific continuity requirements, evaluate those
alternatives, and recommend a business continuity and recovery strategy for management’s approval.
The strategy development builds upon the MAOs identified for each core process in the BIA by
defining the specific resources necessary for the performance of that process, and setting a
recommended strategy for the recovery of those resources in an outage. These strategies are
documented and compiled into a comprehensive plan for the University. This is a critical decision-
making step in the development of a Business Continuity Program, because the general strategy
provides the specific guidelines by which the program will be implemented.
The plan development builds upon the strategies selected for each agency’s core business processes.
The following four phases must be specifically addressed within the plan:
• Response - the reactions to an incident or emergency in order to assess the level of containment
and control activities required.
• Resumption - the process of planning for and/or implementing the recovery of critical business
operations immediately following an interruption or disaster.
• Recovery - the process of planning for and/or implementing recovery of less time-sensitive
business operations and processes after critical business functions have resumed.
• Restoration - the process of planning for and/or implementing full-scale business operations,
which allow the organization to return to a normal service level.
Selecting alternative strategies is a very basic activity when developing the plan.
The following should be considered:
• Alternate procedures for carrying out the process either to its output stage (completion) or to an
intermediate stage that may be easier to recover from or provide some customer service as
opposed to shutting down the function completely.
• Manual processing abilities and related costs
• Suspending the function for some period of time
• Mitigation of insurance (replace rather than try to salvage)
• Outsourcing and vendor services: temporary personnel agencies, cellular phone rental etc.
• Process redesign
• Single points of failure
• Ability to recreate information
• Data backup versus real time replication
• Business cycles, seasonal or otherwise.
• Work schedule modification to maximize resource use
• Internal resource capabilities
• The option to do nothing
XXII Response Teams
The structure of the contingency organization need not be the same as the existing organization chart.
The team approach is used in developing a plan as well as responding to an event. The purpose of
creating teams is to assign specific responsibilities for a smooth recovery. Each team must have the
authority to carry out the procedures contained in their section of the plan. Within each team, a leader
and alternate should be designated. These persons provide the necessary leadership and discretion in
carrying out responsibilities at the time of disaster. For smaller agencies a team may consist of one
individual and furthermore that individual may head or be a member of other teams concurrently.
The following is a comprehensive list of potential teams. The scope and complexity of your plan will
determine the type of teams that should be created.
• Emergency Response Team
• Management Team
• Damage Assessment Team
• Safety and Security Team
• Facilities Management Team
• Administrative Support Team
• Logistics Support Team
• User Support Team
• Department Recovery Team
• Computer Backup Team
• Off-Site Storage Team
• Software Recovery Team
• Communications Team
• Production Team
• Computer Restoration Team
• Human Relations Team
• Public Relations/Customer Service Team
• Business Recovery Team
• Departmental Recovery Team
Note: Various combinations of the above teams may be used, resulting in the creation of fewer teams
each with broader responsibilities. It is imperative that all team members both accept their role and
responsibilities and are fully qualified and trained to perform them.
XXIII List of Deliverables
The basic list of deliverables for a Business Continuity Recovery Program includes the following:
• Detailed Project Schedule
• Risk Assessment and Business Impact Analyses
• Data Collection Forms and Questionnaires
• General Recovery Strategies and Approach Report
• Recovery Team Structure, Roles and Responsibilities
• Business Continuity Plan (including all detailed recovery plans, policies and procedures)
• Test Exercises Schedules and Procedures
• Maintenance, Support and Training Policies and Procedures
XXIV The Plan Document
Use a common format in preparing the actual detailed procedures and documenting other information.
This will help assure consistency and conformity throughout the plan and facilitate ongoing
maintenance. Standardization is especially important if several people write the procedures.
• Basic Requirements
The University name, address, and primary and secondary contact information for the Business
Continuity plan must be identified.
Revision status should be clearly stated on the cover page at minimum, but may also be displayed in
the footer of each page.
Supporting documentation that should be inserted into or attached to the master copies of the plan
document includes logs of Review and Update, Personnel Training, Test Exercise Procedures and
Provide inventory lists of all assets necessary to support the University’s operations and those items
required to carry out the plan. This might not only include office equipment, computers and software
but also supplies such as paper forms.
• Listing of Potential Sections
-- Executive Summary
-- Purpose and Scope
-- General Assumptions
-- Statement of Compliance
-- Types of Events Considered
-- Business Process Analysis
-- Business Continuity Agency Structure
-- Recovery Strategies
-- Reporting and Communications Structure
-- Responsibilities of the Recovery Teams
-- Emergency Response Procedures
-- Damage Assessment and Restoration Procedures
-- Media Notification and Public Relations Control During a Disaster
-- Employee Notification, Information and Communication Systems
-- Recovery Team(s) Procedures
-- Team Assignments With Designated Backups
-- Procedures For Establishing a Command and Control Center.
-- Contingency Administrative Procedures
-- Offsite Storage and Retrieval Procedures
-- Emergency Funding and Accounting Procedures
-- Identification of Extra Expenses During a Disaster
-- Computer Network and Communications Configurations
-- Critical Applications and Data Sets
-- Application and Process Priorities
-- Technical Recovery Procedures
-- Restoration Procedures
-- External and Inter-Agency Dependencies
-- Inventories: personnel, storage, skills, teams, vendors, hardware, software, data-com, documents,
forms, equipment, office supplies, records, critical inventories.
-- Internal Communication Plan
-- External Party and Customer Notification
-- Employee Assistance
-- Implementation Procedures
-- Disaster Declaration Policies
-- Special Security Procedures
-- Procedures and Policies For Plan Review and Update
-- Maintenance documents including revision logs, training logs, exercise plans and test results
An organized Business Continuity Recovery Plan can be followed step by step by all internal and external
personnel and result in achievement of the expected continuity and recovery objectives under a given scenario.
• Writing Methods
-- Procedures should be clearly written.
-- Be specific. Write the plan with the assumption that it will be implemented and carried out
by personnel completely unfamiliar with the plan and the details of the University’s operations.
-- Use short, direct sentences, and keep them simple. Long sentences can be overwhelming.
-- Use topic sentences to start each paragraph.
-- Use short paragraphs. Long paragraphs can diminish the reader’s comprehension.
-- Present one idea at a time and in logical sequence.
-- Try to avoid technical jargon even if explained elsewhere in the document.
-- With the exception of the personnel contact list, use position titles (rather than names of
individuals) to reduce maintenance and revision requirements.
-- Avoid gender specific nouns and pronouns that may lead to unnecessary revision.
-- Develop uniformity in procedures to simplify the training process and minimize exceptions to
conditions and actions.
-- Identify actions that occur in parallel and those that must occur sequentially.
-- Use descriptive verbs. Examples of descriptive verbs are:
Acquire, Count, Log, Activate, Create, Move, Advise, Declare, Pay, Answer, Deliver, Print, Assist,
Enter, Record, Back Up, Explain, Replace, Balance, File, Report, Compare, Inform, Review,
Compile, List, Store, Contact, Locate, Type
XXV Document Distribution
A master copy of the plan should be stored under control of the recovery coordinator. At least one off-
site backup copy either in print or readily available electronic format is recommended.
The often sensitive and proprietary details of the University’s operations contained within a plan
sometimes pose security issues, particularly where extensive controls and segregation of duties exist.
At a minimum, each copy in any format should be monitored and tracked using a distribution log, which
should be attached to all master copies. Personnel who do not play a top-level key role in the plan may
be given an abbreviated plan which includes only that which supports their assigned role and
responsibilities. Abbreviated versions should be clearly designated as such and tracked along with all
other copies. Mark all plan copies as Sensitive and Confidential. A strict statement of use is appropriate
XXVI Maintenance and Support
A plan that is not regularly reviewed and updated will not remain viable through changes to people,
process and technology. A change management program with regard to review, education, dissemination
and testing must be implemented.
Major considerations in this process include:
University and Agency changes
Continuity and Recovery requirements
XXVII Testing and Training
As a critical factor to its success, the plan should be tested and evaluated on a regular basis, at least
annually. Procedures to test the plan should be documented. Only testing will provide the University
with the assurance that the policies and procedures will work to achieve necessary objectives.
• Determining the feasibility and compatibility of backup facilities and procedures.
• Identifying areas in the plan that are in need of modification.
• Providing training to the team managers and team members.
• Demonstrating the ability of the University to meet its continuity recovery objectives.
• Providing motivation for maintaining and updating the Business Continuity Plan.
[Testing is also good training]
There are five (5) main types of BCRP testing.
• Structured walk-through
• Active simulation
• Full interruption
A structured walk-through is the best place to start testing a BCRP. It is usually
done in a conference room by people who are familiar with the plan but did not actually write it.
Written procedures for the structured walk-through test should include:
• Test scenario
• Description of event
• Test assumptions
• Test constraints
• Time, day, month that the disaster was reported
• Method of discovery of the event
• Immediate damage assessment
• Specific forms and reports to be used from the plan
• Specific Teams involved and other participants
• Have a moderator not directly participating in the test log the event
• Document the results and findings in the Continuity Recovery Plan
XXVIII Critical Success Factors
Many factors contribute to producing a successful BCRP. Some examples are listed below.
• Know your risks and determine acceptable residual levels
• Validate all planning assumptions
• Consider a spectrum of likely and catastrophic events.
• Gain a detailed understanding of core processes
• Create detailed procedures and sub-procedures
• Effectively train and instill awareness in all personnel
• Match personnel qualifications closely with their team and role
• Test for compliance with Recovery Time and Point Objectives
XXIX Best Practices
1. Establish a service-level classification scheme for availability and business continuity and define
standard, repeatable development, infrastructure and operations architectures to meet them.
2. If comprehensive testing is not practical, perform walk-through testing and ensure that external
dependencies are addressed.
3. Continuity Coordinator and key personnel review the business continuity program at least
annually or as required in response to changes within the University.
4. Business Continuity spans the entire agency, organization or department.
5. Business continuity planning is a continuous process within the organization.
6. Strategies are in place based upon the impact the loss of a business process would have on the
University, department or organization.
7. A program to exercise the plan exists and is implemented. Results are analyzed and reviewed for
compliance and to determine if plan modifications are required.
8. A quality assurance program is in place. Appropriate triggers to update the plan are included in
the University’s change management policies and procedures.
9. Project teams and timelines are established to implement all recommended measures to mitigate
risks as defined within the Risk Assessment Analysis.
10. Establish or arrange for off-site storage of backups and copies of critical files and data.
XXX Stepping Through The Creation Of The Plan
1. Obtain Top Management Commitment
Top management must support and be involved in the development of the disaster recovery planning
process. Management should be responsible for coordinating the plan and ensuring its continued
viability. Adequate time and resources must be committed to the development of an effective plan.
Resources should include both financial considerations and the effort of all personnel involved.
2. Establish a planning committee
A planning or steering committee should be appointed to oversee the development and implementation
of the plan. Representatives from all functional areas of the organization should be included. The
committee will define the scope and objectives of the plan.
3. Perform a risk assessment
The planning or steering committee should prepare a risk analysis and business impact analysis that
includes a range of possible events, including natural, technical and man-made threats. Each functional
area is to be analyzed to determine the consequences and impact of both likely [power failure] and
catastrophic [tornado] events. Evaluate the safety of critical documents and vital records. Fire poses one
of the greatest threats. Intentional human destruction or sabotage, however, should also be considered.
The plan must provide for the “worst case” situation: destruction of the main facilities. Impacts and
consequences resulting from loss of information and services should be addressed. Cost effective risk
mitigation planning is also the committee’s responsibility.
4. Establish priorities for core processes and functions of the University’s operation
The critical requirements of each area within the University should be carefully and thoroughly
• Functional operations
• Key personnel
• Information and data
• Processing systems
• Customer service
• Vital records
• Policies and procedures
Processing and operations should be analyzed to determine the maximum amount of time that the
department and organization can operate without each critical system. Critical needs are defined as the
necessary procedures and equipment required to continue operations should an area, main facility, or
key resources or any combination of these be destroyed or become unavailable.
A method of determining the critical needs of a department is to document all the functions performed
by each area. Once the primary functions have been identified, the operations and processes should be
ranked in order of priority: Critical, Essential, or Administrative (supportive).
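The ranking step described above can be sketched as a sort on the priority class, with the maximum tolerable downtime as a tie-breaker within each class. The three class names come from this guide; the function name, the tuple layout and the tie-breaking rule are assumptions made for illustration.

```python
# Priority classes in rank order, as named in this guide.
PRIORITY_ORDER = {"Critical": 0, "Essential": 1, "Administrative": 2}


def rank_functions(functions):
    """Order (name, priority, max_downtime_hours) tuples for recovery.

    Functions are sorted by priority class first, then by the shortest
    maximum tolerable downtime within each class (an assumed tie-breaker).
    """
    for name, priority, _ in functions:
        if priority not in PRIORITY_ORDER:
            raise ValueError(f"unknown priority {priority!r} for {name!r}")
    return sorted(functions, key=lambda f: (PRIORITY_ORDER[f[1]], f[2]))
```

The resulting order gives a starting point for the priority of processing; management judgment may still override it where dependencies between functions require a different sequence.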
5. Determine Recovery Strategies
The most practical and cost effective alternatives for processing in case of a disaster need to be
researched and evaluated.
Alternatives depend upon the evaluation of a given function and may include:
• Relocation To Backup Site
(A “warm” site will already have suitable equipment and operating environment)
• Reciprocal agreements or vendor service level agreements
• Manual processing with specific follow up “return to normal” restoration procedures.
• Home or remote processing (Facility is inaccessible but the computer equipment is fully operational)
Written agreements with vendors or other agencies for the specific recovery alternatives selected should
be prepared. Be sure to consider:
• Cost of contingency arrangement
• Special security procedures
• Notification of systems changes
• Required hours of operation
• Specific hardware and other equipment required for processing
• Personnel requirements-possible temp staff to accelerate recovery
• Circumstances constituting an emergency
6. Perform Data Collection
Recommended data gathering materials and documentation includes:
• Backup position listing
• Critical telephone numbers (work, cell, home, pager)
• Communications Inventory including work and an alternate email address
• Distribution Log
• Records inventory
• Equipment inventory
• Forms inventory
• Insurance policies in effect
• Computer hardware/software inventory
• Office equipment inventory
• Master call list/communication plan
• Master vendor and external agency contact list
• Notification checklist
• Office supply inventory
• Off-site storage location inventory
• Software and data files backup/retention schedules
• Temporary location specifications, potential or existing backup sites
It is advantageous to develop standardized forms to facilitate the data gathering process.
7. Organize and document a written plan
An outline is very useful to guide the development of the detailed procedures.
• Helps to organize the detailed procedures
• Identifies all major steps before the writing begins
• Identifies redundant procedures that only need to be written once and defines sub-procedures
The planning committee should review and approve the proposed plan.
The plan should be thoroughly developed, including all detailed procedures to be used before, during
and after a disaster. It may not be practical to develop detailed procedures until backup alternatives have
been defined. Procedures should include methods for maintaining and updating the plan to reflect any
significant internal, external or systems changes and, just as important, allow for a regular review of the plan
by key personnel within the organization.
The disaster recovery plan is best structured using a team approach. Specific roles and responsibilities
should be assigned to the appropriate team for each functional area of the company. General team
categories include administrative functions, facilities, logistics, user support, computer backup,
restoration and other important areas in the organization.
The Management Team is especially important because it coordinates and accomplishes the actual
continuity-recovery process. The Damage Assessment Team should first assess the disaster, after which
the team or the Continuity Coordinator activates the recovery plan and contacts other team leaders.
The Management Team also documents the efforts and recovery processes during the event.
Management Team members should sit on the Planning Committee to assist in final decisions, setting
priorities, policies and procedures.
8. Develop testing criteria and procedures
It is essential that the plan be thoroughly tested and evaluated on a regular basis (at least annually).
Procedures to test the plan should be documented. The tests will provide the organization with the
assurance that all necessary steps are included in the plan. Other reasons for testing include:
• Determining the feasibility and compatibility of backup facilities and alternate processing methods
• Identifying areas in the plan that require clarification or modification
• Providing training to all staff and personnel
• Demonstrating the ability of the University to meet the anticipated recovery objective in time and
• Providing motivation for maintaining and updating the Business Continuity Recovery Program
9. Test the Plan
After testing procedures have been completed, test the plan initially by conducting a structured walk-
through test. The test will provide additional information regarding any further steps that may need to be
included, changes in procedures that are not effective, and other appropriate adjustments. It is
recommended that initial testing of the plan should be done in sections, and during off peak business
hours to minimize disruptions to the overall operations of the University.
10. Approve the plan
Once the plan has been written and tested, it must be approved by all top level management. It is top
management’s ultimate responsibility that the University has a current, documented and tested plan.
Additional responsibilities include:
• Reviewing and approving the plan at least annually, and documenting such reviews in writing
• Ensuring that the plan is compatible with those of the University and other University Agencies.
XXXI Summary and Conclusion
Continuity and recovery planning traditionally has information technology roots, but involves more than
off-site storage or backup processing. Agencies need to develop written, comprehensive continuity
recovery plans that address all the critical operations and functions of their business. The plan
should include documented and tested procedures, which, if followed, will either ensure the ongoing
availability of critical resources and continuity of operations or the efficient and timely recovery of such.
Since the probability of occurrence for any given event is highly uncertain, the plan is not dissimilar to
liability insurance; it represents an ongoing investment in return for a certain level of protection from
financial disaster. In fact, the plan is better protection, because insurance alone may not compensate
for the incalculable loss of business during the interruption or the long-term losses due to damage of
Effective documentation and procedures are extremely important in a continuity recovery plan.
Considerable effort and time are necessary to develop a working plan. Barring sweeping agency
changes, a well-organized plan requires relatively little maintenance and, with proper testing and training,
provides the type of core stability that cannot be matched by external arrangements or contracts alone.
APPENDIX A- Business Continuity Glossary
Provided by the Disaster Recovery Journal
ACTIVATION: The implementation of business continuity capabilities, procedures, activities, and
plans in response to an emergency or disaster declaration; the execution of the recovery plan.
ALERT: Notification that a potential disaster situation exists or has occurred; direction for recipient to
stand by for possible activation of disaster recovery plan.
ALTERNATE SITE: An alternate operating location to be used by business functions when the primary
facilities are inaccessible. 1) Another location, computer center or work area designated for recovery. 2)
Location, other than the main facility, that can be used to conduct business functions. 3) A location,
other than the normal facility, used to process data and/or conduct critical business functions in the event
of a disaster. SIMILAR TERMS: Alternate Processing Facility, Alternate Office Facility, Alternate
Communication Facility, Backup Location, Recovery Site.
ALTERNATE WORK AREA: Office recovery environment complete with necessary
office infrastructure (desk, telephone, workstation, and associated hardware, communications, etc.); also
referred to as Work Space or Alternative work site.
APPLICATION RECOVERY: The component of Disaster Recovery that deals specifically with the
restoration of business system software and data, after the processing platform has been restored or
replaced. SIMILAR TERMS: Business System Recovery.
BACKUP GENERATOR: An independent source of power, usually fueled by diesel or natural gas.
BUSINESS CONTINUITY PLANNING (BCP): Process of developing advance arrangements and
procedures that enable an organization to respond to an event in such a manner that critical business
functions continue with planned levels of interruption or essential change. SIMILAR TERMS:
Contingency Planning, Disaster Recovery Planning.
BUSINESS CONTINUITY PROGRAM: An ongoing program supported and funded by executive staff
to ensure business continuity requirements are assessed, resources are allocated, and recovery and
continuity strategies and procedures are completed and tested.
BUSINESS CONTINUITY STEERING COMMITTEE: A committee of decision makers, business
owners, technology experts and continuity professionals, tasked with making strategic recovery and
continuity planning decisions for the organization.
BUSINESS IMPACT ANALYSIS (BIA): The process of analyzing all business functions and the
effect that a specific disaster may have upon them. 1) Determining the type or scope of difficulty caused
to an organization should a potential event identified by the risk analysis actually occur. The BIA should
quantify, where possible, the loss impact from both a business interruption (number of days) and a
financial standpoint. SIMILAR TERMS: Business Exposure Assessment, Risk Analysis
BUSINESS INTERRUPTION: Any event, whether anticipated (e.g., public service strike) or
unanticipated (e.g., blackout) which disrupts the normal course of business operations at an organization
BUSINESS INTERRUPTION COSTS: The costs or lost revenue associated with an interruption in
normal business operations.
BUSINESS INTERRUPTION INSURANCE: Insurance coverage for disaster related expenses that
may be incurred until operations are fully recovered after a disaster.
BUSINESS RECOVERY COORDINATOR: An individual or group designated to coordinate or
control designated recovery processes or testing. SIMILAR TERMS: Disaster Recovery Coordinator
BUSINESS RECOVERY TIMELINE: The chronological sequence of recovery activities, or critical
path, that must be followed to resume an acceptable level of operations following a business
interruption. This timeline may range from minutes to weeks, depending upon the recovery requirements
BUSINESS RESUMPTION PLANNING (BRP): The operations piece of business continuity planning.
1) A specific segment of the overall recovery process focusing on those items between the recovered
environment and the actual processing of business in recovery mode. SIMILAR TERMS: Business
Continuity Planning, Disaster Recovery Planning
BUSINESS RESUMPTION PLANNING: An all-encompassing "umbrella" term covering both disaster
recovery planning and business resumption planning. 1) Process of developing advance arrangements
and procedures that enable an organization to respond to an event that lasts for an unacceptable period of
time. The process typically addresses all activities from the event to performing its critical business
functions after an interruption and may include steps indicating how to return home. 2) Frequently used
to refer to a business department recovery rather than technology elements. SIMILAR TERMS:
Disaster Recovery Planning, Business Resumption Planning
BUSINESS RECOVERY TEAM: A group of individuals responsible for maintaining the business
recovery procedures and coordinating the recovery of business functions and processes. SIMILAR
TERMS: Disaster Recovery Team
BUSINESS UNIT RECOVERY: The component of Disaster Recovery which deals specifically with
the relocation of a key function or department in the event of a disaster, including personnel, essential
records, equipment, supplies, work space, communication facilities, work station computer processing
capability, fax, copy machines, mail services, etc. SIMILAR TERMS: Work Group Recovery.
CALL TREE: A document that graphically depicts the calling responsibilities and the calling order used
to contact management, employees, customers, vendors, and other key contacts in the event of an
emergency, disaster, or severe outage situation.
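The calling responsibilities and calling order that a call tree captures can be sketched in a few lines of Python. This is only an illustration: the role names and the tree layout below are hypothetical, and a breadth-first walk is one reasonable way to derive a calling order, not a method prescribed by this guide.

```python
from collections import deque

# Hypothetical call tree: each key is responsible for calling the roles listed.
CALL_TREE = {
    "Continuity Coordinator": ["Management Team Leader", "IT Team Leader"],
    "Management Team Leader": ["Facilities Contact", "Media Contact"],
    "IT Team Leader": ["Network Administrator"],
}


def calling_order(tree, root):
    """Return (caller, callee) pairs in breadth-first calling order."""
    order = []
    queue = deque([root])
    seen = {root}  # guards against someone being listed under two callers
    while queue:
        caller = queue.popleft()
        for callee in tree.get(caller, []):
            if callee in seen:
                continue
            seen.add(callee)
            order.append((caller, callee))
            queue.append(callee)
    return order


calls = calling_order(CALL_TREE, "Continuity Coordinator")
for caller, callee in calls:
    print(f"{caller} calls {callee}")
```

Keeping the tree as plain data makes it easy to review and update as part of routine plan maintenance.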
CERTIFIED BUSINESS CONTINUITY PROFESSIONAL (CBCP): The Disaster Recovery Institute
International (DRI International), a not-for-profit corporation, certifies CBCPs and promotes credibility
and professionalism in the business continuity industry. Also offers MBCP (Master Business Continuity
Professional) and ABCP (Associate Business Continuity Professional).
CHECKLIST EXERCISE: A method used to exercise a completed disaster recovery plan. This type of
exercise is used to determine if the information such as phone numbers, manuals, equipment, etc. in the
plan is accurate and current.
COLD SITE: An alternate facility that already has in place the environmental infrastructure required to
recover critical business functions or information systems, but does not have any pre-installed computer
hardware, telecommunications equipment, communication lines, etc. These must be provisioned at time
of disaster. SIMILAR TERMS: Shell Site; Backup Site; Recovery Site; Alternate Site
COMMUNICATIONS RECOVERY: The component of Disaster Recovery which deals with the
restoration or rerouting of an organization's telecommunication network, or its components, in the event
of loss. SIMILAR TERMS: Telecommunications Recovery, Data Communications Recovery
COMPUTER RECOVERY TEAM: A group of individuals responsible for assessing damage to the
original system, processing data in the interim, and setting up the new system.
CONSORTIUM AGREEMENT: An agreement made by a group of organizations to share processing
facilities and/or office facilities, if one member of the group suffers a disaster. SIMILAR TERMS:
COMMAND CENTER: Facility separate from the main facility and equipped with adequate
communications equipment from which initial recovery efforts are manned and media-business
communications are maintained. The management team uses this facility temporarily to begin
coordinating the recovery process and its use continues until the alternate sites are functional.
CONTACT LIST: A list of team members and/or key players to be contacted including their backups.
The list will include the necessary contact information (e.g., home phone, pager, cell) and in most
cases be considered confidential.
CONTINGENCY PLANNING: Process of developing advance arrangements and procedures that
enable an organization to respond to an event that could occur by chance or unforeseen circumstances.
CONTINGENCY PLAN: A plan used by an organization or business unit to respond to a specific
systems failure or disruption of operations. A contingency plan may use any number of resources
including workaround procedures, an alternate work area, a reciprocal agreement, or replacement
CONTINUITY OF OPERATIONS PLAN (COOP): A COOP provides guidance on the system
restoration for emergencies, disasters, mobilization, and for maintaining a state of readiness to provide
the necessary level of information processing support commensurate with the mission
requirements/priorities identified by the respective functional proponent. This term traditionally is used
by the Federal Government and its supporting agencies to describe activities otherwise known as
Disaster Recovery, Business Continuity, Business Resumption, or Contingency Planning.
CRATE & SHIP: A strategy for providing alternate processing capability in a disaster, via contractual
arrangements with an equipment supplier, to ship replacement hardware within a specified time period.
SIMILAR TERMS: Guaranteed Replacement, Drop Ship, Quick Ship.
CRISIS: A critical event, which, if not handled in an appropriate manner, may dramatically impact an
organization's profitability, reputation, or ability to operate.
CRISIS MANAGEMENT: The overall coordination of an organization's response to a crisis, in an
effective, timely manner, with the goal of avoiding or minimizing damage to the organization's
profitability, reputation, or ability to operate.
CRISIS MANAGEMENT TEAM: A crisis management team will consist of key executives as well as
key role players (e.g., media representative, legal counsel, facilities manager, disaster recovery
coordinator) and the appropriate business owners of critical organization functions.
CRISIS SIMULATION: The process of testing an organization's ability to respond to a crisis in a
coordinated, timely, and effective manner, by simulating the occurrence of a specific crisis.
CRITICAL FUNCTIONS: Business activities or information that could not be interrupted or
unavailable for several business days without significantly jeopardizing operation of the organization.
CRITICAL INFRASTRUCTURE: Systems whose incapacity or destruction would have a debilitating
impact on the economic security of an organization, community, nation, etc.
CRITICAL RECORDS: Records or documents that, if damaged or destroyed, would cause considerable
inconvenience and/or require replacement or recreation at considerable expense.
DAMAGE ASSESSMENT: The process of assessing damage, following a disaster, to computer
hardware, vital records, office facilities, etc. and determining what can be salvaged or restored and what
must be replaced.
DATA BACKUPS: The backup of system, application, program and/or production files to media that
can be stored onsite and/or offsite. Data backups can be used to restore corrupted or lost data or to
recover entire systems and databases in the event of a disaster. Data backups should be considered
confidential and should be kept secure from physical damage and theft.
DATA BACKUP STRATEGIES: Those actions and backup processes determined by an organization
to be necessary to meet its data recovery and restoration objectives. Data backup strategies will
determine the timeframes, technologies, media and offsite storage of the backups, and will ensure that
recovery point and time objectives can be met.
DATA CENTER RECOVERY: The component of Disaster Recovery which deals with the restoration,
at an alternate location, of data center services and computer processing capabilities. SIMILAR
TERMS: Mainframe Recovery, Technology Recovery.
DATA RECOVERY: The restoration of computer files from backup media to restore programs and
production data to the state that existed at the time of the last safe backup.
DATABASE REPLICATION: The partial or full duplication of data from a source database to one or
more destination databases. Replication may use any of a number of methodologies including mirroring
or shadowing, and may be performed synchronously, asynchronously, or point-in-time depending on the
technologies used, recovery point requirements, distance and connectivity to the source database, etc.
Replication can, if performed remotely, function as a backup for disasters and other major outages.
SIMILAR TERMS: File Shadowing, Disk Mirroring.
DISK MIRRORING: Disk mirroring is the duplication of data on separate disks in real time to ensure
its continuous availability, currency and accuracy. Disk mirroring can function as a disaster recovery
solution by performing the mirroring remotely. True mirroring will enable a zero recovery point
objective. Depending on the technologies used, mirroring can be performed synchronously,
asynchronously, semi-synchronously, or point-in-time. SIMILAR TERMS: File Shadowing, Data
DECLARATION: A formal announcement by pre-authorized personnel that a disaster or severe outage
is predicted or has occurred and that triggers pre-arranged mitigating actions (e.g. a move to an alternate
DECLARATION FEE: A one-time fee, charged by an Alternate Facility provider, to a customer who
declares a disaster. NOTE: Some recovery vendors apply the declaration fee against the first few days
of recovery. 1) An initial fee or charge for implementing the terms of a recovery agreement or contract.
SIMILAR TERMS: Notification Fee.
DESK CHECK: One method of testing a specific component of a plan. Typically, the owner or author of
the component reviews it for accuracy and completeness and signs off.
DISASTER: A sudden, unplanned calamitous event causing great damage or loss. 1) Any event that
creates an inability on an organization's part to provide critical business functions for some
predetermined period of time. 2) In the business environment, any event that creates an inability on an
organization’s part to provide the critical business functions for some predetermined period of time. 3)
The period when company management decides to divert from normal production responses and
exercises its disaster recovery plan. Typically signifies the beginning of a move from a primary to an
alternate location. SIMILAR TERMS: Business Interruption; Outage; Catastrophe.
DISASTER RECOVERY: Activities and programs designed to return the entity to an acceptable
condition. 1) The ability to respond to an interruption in services by implementing a disaster recovery
plan to restore an organization's critical business functions.
DISASTER RECOVERY OR BUSINESS CONTINUITY COORDINATOR: The Disaster Recovery
Coordinator may be responsible for overall recovery of an organization or unit(s). SIMILAR TERMS:
Business Recovery Coordinator.
DISASTER RECOVERY INSTITUTE INTERNATIONAL (DRI INTERNATIONAL): A not-for-
profit organization that offers certification and educational offerings for business continuity
DISASTER RECOVERY PLAN: The document that defines the resources, actions, tasks and data
required to manage the business recovery process in the event of a business interruption. The plan is
designed to assist in restoring the business process within the stated disaster recovery goals.
DISASTER RECOVERY PLANNING: The technological aspect of business continuity planning. The
advance planning and preparations that are necessary to minimize loss and ensure continuity of the
critical business functions of an organization in the event of disaster. SIMILAR TERMS: Contingency
Planning; Business Resumption Planning; Corporate Contingency Planning; Business Interruption
Planning; Disaster Preparedness.
DISASTER RECOVERY SOFTWARE: An application program developed to assist an organization in
writing a comprehensive disaster recovery plan.
DISASTER RECOVERY TEAMS (Business Recovery Teams): A structured group of teams ready to
take control of the recovery operations if a disaster should occur.
ELECTRONIC VAULTING: Electronically forwarding backup data to an offsite server or storage
facility. Vaulting eliminates the need for tape shipment and therefore significantly shortens the time
required to move the data offsite.
EMERGENCY: A sudden, unexpected event requiring immediate action due to potential threat to
health and safety, the environment, or property.
EMERGENCY PREPAREDNESS: The discipline that ensures an organization's or community's
readiness to respond to an emergency in a coordinated, timely, and effective manner.
EMERGENCY PROCEDURES: A plan of action to commence immediately to prevent the loss of life
and minimize injury and property damage.
EMERGENCY OPERATIONS CENTER (EOC): A site from which response teams/officials
(municipal, county, state and federal) exercise direction and control in an emergency or disaster.
ENVIRONMENT RESTORATION: Recreation of the critical business operations in an alternate
location, including people, equipment and communications capability.
EXECUTIVE / MANAGEMENT SUCCESSION: A predetermined plan for ensuring the continuity of
authority, decision-making, and communication in the event that key members of senior management
suddenly become incapacitated, or in the event that a crisis occurs while key members of senior
management are unavailable.
EXERCISE: An activity that is performed for the purpose of training and conditioning team members,
and improving their performance. Types of exercises include: Table Top Exercise, Simulation Exercise,
Operational Exercise, and Mock
FILE SHADOWING: The asynchronous duplication of the production database on separate media to
ensure data availability, currency and accuracy. File shadowing can be used as a disaster recovery
solution if performed remotely, to improve both the recovery time and recovery point objectives.
SIMILAR TERMS: Data Replication, Journaling, Disk Mirroring.
FINANCIAL IMPACT: An operating expense that continues following an interruption or disaster,
which as a result of the event cannot be offset by income and directly affects the financial position of the
FORWARD RECOVERY: The process of recovering a database to the point of failure by applying
active journal or log data to the current backup files of the database.
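As a minimal sketch of forward recovery, the Python below restores a backup and then replays journal entries recorded after it. The journal format (sequence number, operation, key, value), the account names, and the figures are all hypothetical, chosen only to show the replay step; real database products keep their own log formats.

```python
def forward_recover(backup, backup_seq, journal):
    """Rebuild current state by replaying journal entries newer than the backup.

    backup     -- dict holding the data as of the last safe backup
    backup_seq -- sequence number of the last journal entry the backup includes
    journal    -- list of (seq, op, key, value) tuples (hypothetical format)
    """
    state = dict(backup)  # work on a copy so the backup itself is untouched
    for seq, op, key, value in sorted(journal, key=lambda entry: entry[0]):
        if seq <= backup_seq:
            continue  # already reflected in the backup
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


# Illustrative data only.
backup = {"acct-1": 100, "acct-2": 50}
journal = [
    (1, "set", "acct-1", 100),
    (2, "set", "acct-2", 50),
    (3, "set", "acct-1", 75),
    (4, "delete", "acct-2", None),
    (5, "set", "acct-3", 20),
]

recovered = forward_recover(backup, backup_seq=2, journal=journal)
print(recovered)
```

Entries 1 and 2 are skipped because the backup already contains them; entries 3 through 5 carry the database forward to the point of failure.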
HAZARD OR THREAT IDENTIFICATION: The process of identifying situations or conditions that
have the potential to cause injury to people, damage to property, or damage to the environment.
HIGH AVAILABILITY: Systems or applications requiring a very high level of reliability and
availability. High availability systems typically operate 24x7 and usually require built-in
redundancy to minimize the risk of downtime due to hardware and/or telecommunication
HIGH-RISK AREAS: Heavily populated areas, particularly susceptible to high-intensity earthquakes,
floods, tsunamis, or other disasters, for which emergency response may be necessary in the event of a
HOTSITE: An alternate facility that already has in place the computer, telecommunications, and
environmental infrastructure required to recover critical business functions or information systems.
HUMAN THREATS: Possible disruptions in operations resulting from human actions (e.g., disgruntled
employee, terrorism, blackmail, job actions, riots).
INCIDENT COMMAND SYSTEM (ICS): Combination of facilities, equipment, personnel, procedures,
and communications operating within a common organizational structure with responsibility for
management of assigned resources to effectively direct and control the response to an incident. Intended
to expand as the situation requires larger resources, without requiring a new, reorganized command structure.
INCIDENT MANAGER: Commands the local EOC, reporting up to senior management on the recovery
progress. Has the authority to invoke the local recovery plan.
INCIDENT RESPONSE: The response of an organization to a disaster or other significant event that
may significantly impact the organization, its people, or its ability to function productively. An incident
response may include evacuation of a facility, initiating a disaster recovery plan, performing damage
assessment, and any other measures necessary to bring an organization to a more stable status.
INTEGRATED TEST: A test conducted on multiple components of a plan, in conjunction with each
other, typically under simulated operating conditions.
INTERIM SITE: A temporary location used to continue performing business functions after vacating a
recovery site and before the original or new home site can be occupied. Move to an interim site may be
necessary if ongoing stay at the recovery site is not feasible for the period of time needed or if the
recovery site is located far from the normal business site that was impacted by the disaster. An interim
site move is planned and scheduled in advance to minimize disruption of business processes; equal care
must be given to transferring critical functions from the interim site back to the normal business site.
INTERNAL HOTSITE: A fully equipped alternate processing site owned and operated by the
JOURNALING: The process of logging changes or updates to a database since the last full backup.
Journals can be used to recover previous versions of a file before updates were made, or to facilitate
disaster recovery, if performed remotely, by applying changes to the last safe backup. SIMILAR
TERMS: File Shadowing, Data Replication, Disk Mirroring.
LAN RECOVERY: The component of business continuity that deals specifically with the replacement
of LAN equipment and the restoration of essential data and software in the event of a disaster.
SIMILAR TERM: Client/Server Recovery.
LINE REROUTING: A short-term change in the routing of telephone traffic, which can be planned and
recurring, or a reaction to an outage situation. Many regional telephone companies offer service that
allows a computer center to quickly reroute a network of dedicated lines to a backup site.
LOSS REDUCTION: The technique of instituting mechanisms to lessen the exposure to a particular
risk. Loss reduction involves planning for, and reacting to, an event to limit its impact. Examples of
loss reduction include sprinkler systems, insurance policies, and evacuation procedures.
LOST TRANSACTION RECOVERY: Recovery of data (paper within the work area and/or system
entries) destroyed or lost at the time of the disaster or interruption. Paper documents may need to be
requested or re-acquired from original sources. Data for system entries may need to be recreated or
MISSION-CRITICAL APPLICATION: An application that is essential to the organization’s ability to
perform necessary business functions. Loss of the mission-critical application would have a negative
impact on the business, as well as legal or regulatory impacts.
MOBILE RECOVERY: A mobilized resource purchased or contracted for the purpose of business
recovery. The mobile recovery center might include: computers, workstations, telephone, electrical
MOCK DISASTER: One method of exercising teams in which participants are challenged to determine
the actions they would take in the event of a specific disaster scenario. Mock disasters usually involve
all, or most, of the applicable teams. Under the guidance of exercise coordinators, the teams walk
through the actions they would take per their plans, or simulate performance of these actions. Teams
may be at a single exercise location, or at multiple locations, with communication between teams
simulating actual ‘disaster mode’ communications. A mock disaster will typically operate on a
compressed timeframe representing many hours, or even days.
NATURAL THREATS: Events caused by nature that have the potential to impact an organization.
NETWORK OUTAGE: An interruption in system availability resulting from a communication failure
affecting a network of computer terminals, processors, and/or workstations.
OFF-SITE STORAGE: Alternate facility, other than the primary production site, where duplicated vital
records and documentation may be stored for use during disaster recovery.
OPERATIONAL EXERCISE: One method of exercising teams in which participants perform some or
all of the actions they would take in the event of plan activation. Operational exercises, which may
involve one or more teams, are typically performed under actual operating conditions at the designated
alternate location, using the specific recovery configuration that would be available in a disaster.
OPERATIONAL IMPACT ANALYSIS: Determines the impact of the loss of an operational or
technological resource. The loss of a system, network or other critical resource may affect a number of
OPERATIONAL TEST: A test conducted on one or more components of a plan under actual operating
PLAN ADMINISTRATOR: The individual responsible for documenting recovery activities and
tracking recovery progress.
PEER REVIEW: One method of testing a specific component of a plan. Typically, the component is
reviewed for accuracy and completeness by personnel (other than the owner or author) with appropriate
technical or business knowledge.
PLAN MAINTENANCE PROCEDURES: Maintenance procedures outline the process for the review
and update of business continuity plans.
RECIPROCAL AGREEMENT: Agreement between two organizations (or two internal business
groups) with basically the same equipment/same environment that allows each one to recover at each
RECOVERY: Process of planning for and/or implementing expanded operations to address less time-
sensitive business operations immediately following an interruption or disaster. 1) The start of the actual
process or function that uses the restored technology and location.
RECOVERY PERIOD: The time period between a disaster and a return to normal functions, during
which the disaster recovery plan is employed.
RECOVERY SERVICES CONTRACT: A contract with an external organization guaranteeing the
provision of specified equipment, facilities, or services, usually within a specified time period, in the
event of a business interruption. A typical contract will specify a monthly subscription fee, a declaration
fee, usage costs, method and amount of testing, termination options, penalties and liabilities, etc.
RECOVERY STRATEGY: An approach by an organization that will ensure its recovery and continuity
in the face of a disaster or other major outage. Plans and methodologies are determined by the
organization's strategy. There may be more than one methodology or solution for an organization's
strategy. Examples of methodologies and solutions include contracting for Hotsite or Coldsite, building
an internal Hotsite or Coldsite, identifying an Alternate Work Area, a Consortium or Reciprocal
Agreement, contracting for Mobile Recovery or Crate and Ship, and many others.
RECOVERY POINT OBJECTIVE (RPO): The point in time to which systems and data must be
recovered after an outage (e.g., end of previous day's processing). RPOs are often used as the basis for
the development of backup strategies, and as a determinant of the amount of data that may need to be
recreated after the systems or functions have been recovered.
RECOVERY TIME OBJECTIVE (RTO): The period of time within which systems, applications, or
functions must be recovered after an outage (e.g. one business day). RTOs are often used as the basis
for the development of recovery strategies, and as a determinant as to whether or not to implement the
recovery strategies during a disaster situation. SIMILAR TERMS: Maximum Allowable Downtime.
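The two objectives can be checked against a test or an actual outage with simple date arithmetic. The Python sketch below assumes the RPO is expressed as the maximum tolerable age of the last good backup and the RTO as the maximum tolerable downtime; the function name, the timestamps, and the 24-hour and 8-hour limits are made up for illustration.

```python
from datetime import datetime, timedelta


def check_objectives(last_backup, outage_start, restored_at, rpo, rto):
    """Compare one outage against hypothetical RPO and RTO limits."""
    data_loss_window = outage_start - last_backup  # work that may need to be recreated
    downtime = restored_at - outage_start          # time the function was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Illustrative timestamps only.
result = check_objectives(
    last_backup=datetime(2005, 10, 11, 23, 0),
    outage_start=datetime(2005, 10, 12, 14, 0),
    restored_at=datetime(2005, 10, 13, 10, 0),
    rpo=timedelta(hours=24),
    rto=timedelta(hours=8),
)
print(result)
```

In this made-up case the backup is recent enough to satisfy the RPO, but the downtime exceeds the RTO, which would point toward revisiting the recovery strategy rather than the backup schedule.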
RESPONSE: The reaction to an incident or emergency to assess the damage or impact and to ascertain
the level of containment and control activity required. In addition to addressing matters of life safety and
evacuation, Response also addresses the policies, procedures and actions to be followed in the event of
an emergency. 1) The step or stage that immediately follows a disaster event where actions begin as a
result of the event having occurred. SIMILAR TERMS: Emergency Response, Disaster Response,
Immediate Response, and Damage Assessment.
RESTORATION: Process of planning for and/or implementing procedures for the repair or relocation
of the primary site and its contents, and for the restoration of normal operations at the primary site.
RESUMPTION: The process of planning for and/or implementing the restarting of defined business
operations following a disaster, usually beginning with the most critical or time-sensitive functions and
continuing along a planned sequence to address all identified areas required by the business. 1) The step
or stage after the impacted infrastructure, data, communications and environment have been successfully
re-established at an alternate location.
RISK: Potential for exposure to loss. Risks, either man-made or natural, are constant. The potential is
usually measured by its probability in years.
RISK ASSESSMENT / ANALYSIS: Process of identifying the risks to an organization, assessing the
critical functions necessary for an organization to continue business operations, defining the controls in
place to reduce organization exposure and evaluating the cost for such controls. Risk analysis often
involves an evaluation of the probabilities of a particular event.
RISK MITIGATION: Implementation of measures to deter specific threats to the continuity of business
operations, and/or respond to any occurrence of such threats in a timely and appropriate manner.
SALVAGE & RESTORATION: The process of reclaiming or refurbishing computer hardware, vital
records, office facilities, etc. following a disaster.
SIMULATION EXERCISE: One method of exercising teams in which participants perform some or all
of the actions they would take in the event of plan activation. Simulation exercises, which may involve
one or more teams, are performed under conditions that at least partially simulate ‘disaster mode’. They
may or may not be performed at the designated alternate location, and typically use only a partial
STANDALONE TEST: A test conducted on a specific component of a plan, in isolation from other
components, typically under simulated operating conditions.
STRUCTURED WALKTHROUGH: One method of testing a specific component of a plan. Typically,
a team member makes a detailed presentation of the component to other team members (and possibly
non-members) for their critique and evaluation.
SUBSCRIPTION: Contract commitment that provides an organization with the right to utilize a vendor
recovery facility for processing capability in the event of a disaster declaration.
SYSTEM DOWNTIME: A planned or unplanned interruption in system availability.
TABLE TOP EXERCISE: One method of exercising teams in which participants review and discuss
the actions they would take per their plans, but do not perform any of these actions. The exercise can be
conducted with a single team, or multiple teams, typically under the guidance of exercise facilitators.
TEST: An activity that is performed to evaluate the effectiveness or capabilities of a plan relative to
specified objectives or measurement criteria. Types of tests include: Desk Check, Peer Review,
Structured Walkthrough, Standalone Test, Integrated Test, and Operational Test.
TEST PLAN: A document designed to periodically exercise specific action tasks and procedures to
ensure viability in a real disaster or severe outage situation.
UNINTERRUPTIBLE POWER SUPPLY (UPS): A backup supply that provides continuous power to
critical equipment in the event that commercial power is lost.
VITAL RECORD: A record that must be preserved and available for retrieval if needed.
WARM SITE: An alternate processing site equipped with some hardware, communications
interfaces, and electrical and environmental conditioning, which is capable of providing backup
only after additional provisioning, software, or customization is performed.
WORKAROUND PROCEDURES: Interim procedures that may be used by a business unit to enable it
to continue to perform its critical functions during temporary unavailability of specific application
systems, electronic or hard copy data, voice or data communication systems, specialized equipment,
office facilities, personnel, or external services. SIMILAR TERMS: Interim Contingencies.
APPENDIX B- Online Sources of Additional Information