A detailed overview of the business continuity / disaster recovery planning process. Gives numerous tips for effective execution of plan development. Emphasizes development of a true recovery capability through exercises that reveal weaknesses in the plan or technology and so lead to improvements.
The other parts of organizational resiliency are disaster recovery (part of business continuity), crisis management (protection of personnel and other assets, plus crisis communications with stakeholders and the press), and emergency management (protection of people during the immediate stage of a crisis, communication with fire, police, etc.).
The previous lesson guided you through making a risk assessment. This lesson helps you to identify mission-critical business functions, their supporting IT services, and create meaningful metrics to measure acceptable service downtime and data loss.
Building a Business Continuity Capability
A Detailed Overview of Business Continuity Planning
Rod Davis, CRISC, CBCP
Formal education: BS Electrical Engineering
26 years in SIL International, 13 years in Mali, West Africa
Information Technology, Contingency Planning, Data Recovery, and Business Continuity
IT Certifications: Security+, Network+, A+, MCSA
CRISC - Certified in Risk and Information Systems Control
CBCP - Certified Business Continuity Professional, DRII.org
Identify what/who they are in your organization.
Champion/facilitate information sharing.
Create forum for shared decision making.
Disaster Recovery is a subset of Business Continuity
Business Continuity Planning helps achieve Organizational Resilience
The Business Continuity Planning Cycle
If a natural disaster struck a data center rendering critical IT
If a terrorist attack targeted an overseas regional center?
If a pandemic threatened global operations for your mission?
The occurrence of some events could cause a temporary disruption of mission-critical services.
Some scenarios could actually result in long-term loss of
The ‘unthinkable’ might include disruption or shutdown of programs that these services and capacity support.
Disaster – an event, which causes the loss of an essential service, or part of it, for a length of time which imperils mission achievement.
― Andrew Hiles, Business Continuity: Best Practices
Disaster – An event that compromises an organization’s ability to provide critical functions, processes, or services for some unacceptable period of time.
― Disaster Recovery Journal
Disaster Recovery Planning: The activities associated with the continuing availability and restoration of the IT infrastructure.
― BCI Dictionary of BC Management Terms
Organizations that experience major data loss without disaster recovery:
• Close within two years*
* Cummings, Haag, & McCubbrey (2005). Management Information Systems for the Information Age.
Business Continuity Planning is the process of developing prior arrangements and procedures that enable an organization to respond to an event in such a manner that critical business functions can continue within planned levels of disruption. The end result of the planning process is the BC Plan.
― BCI Dictionary of BC Management Terms
A resilient organization is one that is able to
achieve its core objectives in the face of
Planning is ...
“It does not do to leave a live dragon out of your
calculations, if you live near him.”
― Gandalf in ‘The Hobbit’, by J.R.R. Tolkien
Two questions to ask …
• Is he alive?
• Does he live near you?
The Threat Equation
Determine Relevant Threats
Threats, Vulnerability, and Probability
Risk = Threat + Vulnerability + Probability
• Winter storm
• Fire (accidental or
• Data theft or loss
• Terrorist attack
• Workplace violence
• Civil unrest
• Coup d'état
• Civil war
• Chemical or
• Power grid failure
• Petroleum supply
• Food or water
• Public utility failure
(water, sewer, etc.)
• Heating/Cooling system
failure (affects IT &
• Public transport
• Determine the most relevant threats in your location, e.g.:
• High frequency of electrical storms = high probability of a lightning strike
• Lack of lightning suppression = high vulnerability to a lightning strike
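The threat equation can be turned into a simple scoring sheet. The sketch below assumes the additive form Risk = Threat + Vulnerability + Probability suggested by the slide, and a hypothetical 1-5 rating for each factor; the class and function names are illustrative, and the ratings in the example are made up (a site with frequent electrical storms and no lightning suppression), not real assessment data.

```python
from dataclasses import dataclass


@dataclass
class ThreatRating:
    """Ratings for one threat at one location (hypothetical 1-5 scale)."""
    name: str
    threat: int         # severity of the threat event itself
    vulnerability: int  # how exposed we are (e.g. no lightning suppression)
    probability: int    # how likely it is here (e.g. frequent storms)

    def risk(self) -> int:
        # Additive form of the slide's equation (an assumption):
        # Risk = Threat + Vulnerability + Probability
        for value in (self.threat, self.vulnerability, self.probability):
            if not 1 <= value <= 5:
                raise ValueError(f"{self.name}: ratings must be 1-5")
        return self.threat + self.vulnerability + self.probability


def most_relevant(ratings: list[ThreatRating], top_n: int = 3) -> list[ThreatRating]:
    """Return the top_n threats by risk score, highest first."""
    return sorted(ratings, key=lambda r: r.risk(), reverse=True)[:top_n]


if __name__ == "__main__":
    # Illustrative ratings only; replace with your own risk assessment.
    site = [
        ThreatRating("Lightning strike", threat=4, vulnerability=5, probability=5),
        ThreatRating("Winter storm", threat=3, vulnerability=2, probability=1),
        ThreatRating("Power grid failure", threat=4, vulnerability=3, probability=4),
        ThreatRating("Civil unrest", threat=4, vulnerability=2, probability=2),
    ]
    for r in most_relevant(site):
        print(f"{r.name}: {r.risk()}")
```

Many risk frameworks multiply the factors instead of adding them; if your organization uses that model, only `risk()` needs to change.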
What is Business Impact Analysis?
Impact Rating System
Recovery Point Objective
Recovery Time Objective
The process of analyzing business functions and the effect that a business disruption might have upon them.
― Business Continuity Institute
A process used to identify and prioritize:
Critical business functions and processes
Essential IT services and data
Required staff and equipment
Identify mission-critical business functions.
Which ones require the highest level of risk mitigation?
Determine impact of disruptions over time.
Establish recovery priorities in case of disruption.
RPO – Recovery Point Objective: point of last data backup
RTO – Recovery Time Objective: systems fully recovered
• RPO – Recovery Point Objective
• The maximum data loss that an organization will tolerate. Data and
systems must be restored to this point after a disruption.
• RTO – Recovery Time Objective
• The maximum period of time that an organization accepts for recovery of
business functions, systems, and processes.
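To make the two definitions concrete, the sketch below measures one incident (or one recovery exercise) against a system's RPO and RTO: data loss is the time between the last good backup and the disruption, and downtime is the time from the disruption until the system is fully recovered. The function name, the timeline, and the 24-hour RPO / 12-hour RTO values are hypothetical examples, not recommended targets.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class IncidentResult(NamedTuple):
    data_loss: timedelta  # work lost since the last good backup
    downtime: timedelta   # time until systems were fully recovered
    rpo_met: bool
    rto_met: bool


def check_objectives(
    last_backup: datetime,
    disruption: datetime,
    recovered: datetime,
    rpo: timedelta,
    rto: timedelta,
) -> IncidentResult:
    """Measure one incident (or exercise) against a system's RPO and RTO."""
    if not last_backup <= disruption <= recovered:
        raise ValueError("expected last_backup <= disruption <= recovered")
    data_loss = disruption - last_backup  # compared with the RPO
    downtime = recovered - disruption     # compared with the RTO
    return IncidentResult(data_loss, downtime, data_loss <= rpo, downtime <= rto)


if __name__ == "__main__":
    # Illustrative timeline: nightly backup at 01:00, failure at 15:00,
    # system fully recovered at 09:00 the next morning.
    result = check_objectives(
        last_backup=datetime(2024, 3, 4, 1, 0),
        disruption=datetime(2024, 3, 4, 15, 0),
        recovered=datetime(2024, 3, 5, 9, 0),
        rpo=timedelta(hours=24),
        rto=timedelta(hours=12),
    )
    print(result)
```

In this example, 14 hours of data loss stays within the 24-hour RPO, but 18 hours of downtime misses the 12-hour RTO, which is exactly the kind of gap an exercise should reveal.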
List of mission-critical business functions and their IT services
Recovery Time Objectives (RTOs) for these priorities
Recovery Point Objectives (RPOs) for IT assets
Recovery priorities … What do you recover first?
• Mission-critical business functions are those whose
sustained failure could severely impair the business or lead
to its imminent failure.
• Examples of disruptions to business functions/processes:
o Inability to meet employee payroll
o Unable to process critical bank transfers
o Critical financial data is corrupted
• Mission-critical refers to any network, system or
application whose sustained failure would severely disrupt
• Examples of disruptions to technology functions include:
o Fire in a server room (destroying critical data)
o Storm causing sustained power & Internet outage
o Ransomware encrypts critical files on a data server
Department Managers should decide - They understand which processes and services are most critical to their department’s successful operation.
The Directors should decide - They know what things are critical to the
accomplishment of the overall corporate mission.
Do the IT Admins decide? Often without clear direction from
leadership, by default they decide which data and services are critical to
Point – Leadership should decide what is mission-critical, and IT staff should
implement their decisions.
*Recovery times shown are arbitrary and will vary greatly depending
on the type of business.
• Mission-Critical: 12 - 48 hours
• Highest priority for rapid recoverability
• Vital: 3 - 5 days
• Essential to operations but not as critical
• Important: 1 - 4 weeks
• Long-term absence has eventual impact.
• Minor: Months
• Absence causes minimal impact.
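Once each business function has an agreed RTO, the tiers above can be applied mechanically. This sketch assumes each tier is bounded by the upper end of its example range (48 hours, 5 days, 4 weeks), which also closes the small gaps between the listed ranges; as the slide notes, these times are arbitrary, and the function names in the example are hypothetical.

```python
from datetime import timedelta

# Upper bounds taken from the example tiers; adjust for your organization.
TIERS = [
    (timedelta(hours=48), "Mission-Critical"),
    (timedelta(days=5), "Vital"),
    (timedelta(weeks=4), "Important"),
]


def recovery_tier(rto: timedelta) -> str:
    """Map a function's RTO to the first tier whose upper bound covers it."""
    for upper_bound, tier in TIERS:
        if rto <= upper_bound:
            return tier
    return "Minor"  # anything longer than four weeks


if __name__ == "__main__":
    # Hypothetical business functions and their agreed RTOs.
    functions = {
        "Payroll": timedelta(hours=24),
        "Bank transfers": timedelta(hours=36),
        "Email archive": timedelta(weeks=2),
        "Photo library": timedelta(days=90),
    }
    # Shortest RTO first = recover first.
    for name, rto in sorted(functions.items(), key=lambda item: item[1]):
        print(f"{name}: {recovery_tier(rto)}")
```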
Identify low-hanging fruit
Group related functions together
Deciding what to protect
Use the Risk Assessment with your Recovery Priorities to identify risk mitigation that will produce the greatest positive impact for the least cost.
Example: Risk Assessment discovers that both an IT
system and its only backup device are stored in the same
room. The Recovery Priorities mark this as a mission-critical system.
Solution: Move the backup device to another building on
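One way to combine the two documents is to score each candidate mitigation by the risk reduction it buys (from the Risk Assessment), weighted by the recovery priority of the system it protects, per unit of cost. The weights, the scoring formula, and the numbers below are hypothetical assumptions for illustration, not a standard method; the point is that a cheap fix to a mission-critical system, like moving the backup device, rises to the top.

```python
from dataclasses import dataclass

# Hypothetical weights: higher recovery priority => larger weight.
PRIORITY_WEIGHT = {"Mission-Critical": 4, "Vital": 3, "Important": 2, "Minor": 1}


@dataclass
class Mitigation:
    description: str
    system_tier: str      # recovery priority of the protected system
    risk_reduction: int   # drop in risk score, from the Risk Assessment
    cost: float           # estimated cost (any consistent unit)

    def value(self) -> float:
        """Impact per unit cost: (risk reduction x priority weight) / cost."""
        if self.cost <= 0:
            raise ValueError("cost must be positive")
        return self.risk_reduction * PRIORITY_WEIGHT[self.system_tier] / self.cost


def rank(mitigations: list[Mitigation]) -> list[Mitigation]:
    """Best value first."""
    return sorted(mitigations, key=Mitigation.value, reverse=True)


if __name__ == "__main__":
    # Illustrative options and figures only.
    options = [
        Mitigation("Move backup device to another building", "Mission-Critical", 6, 200),
        Mitigation("Install lightning protection", "Mission-Critical", 5, 5000),
        Mitigation("Add second archive server", "Important", 3, 3000),
    ]
    for m in rank(options):
        print(f"{m.value():.3f}  {m.description}")
```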
Don’t attempt to identify every single business function/process, and
every single server, etc.
Instead, identify RELATED business functions and GROUP them together
into LOGICAL SYSTEMS.
Example: FINANCE SYSTEM - All the component parts of that system
have to be working for that system to function …
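Grouping components into logical systems also makes dependencies easy to reason about: a logical system is up only if every one of its components is up. The inventory below is a hypothetical example (only "Finance System" comes from the slide); the component names and helper functions are illustrative.

```python
# Hypothetical inventory: each logical system lists the components it needs.
LOGICAL_SYSTEMS: dict[str, set[str]] = {
    "Finance System": {"accounting-db", "accounting-app", "file-server", "internet-link"},
    "Email": {"mail-service", "internet-link"},
}


def system_status(down_components: set[str]) -> dict[str, bool]:
    """A logical system is up only if none of its components is down."""
    return {
        name: components.isdisjoint(down_components)
        for name, components in LOGICAL_SYSTEMS.items()
    }


def affected_by(component: str) -> list[str]:
    """Which logical systems fail if this one component fails?"""
    return sorted(name for name, parts in LOGICAL_SYSTEMS.items() if component in parts)


if __name__ == "__main__":
    print(system_status({"file-server"}))
    print(affected_by("internet-link"))
```

Here a failed file server takes down the Finance System but not Email, while a lost Internet link affects both, which is a hint that the link deserves risk mitigation.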
Your Business Impact Analysis identifies mission-critical functions
and ranks which should be recovered first.
Choose five systems which you think are among the most mission-critical.
Choose one from that list of five systems … Develop System
Documentation and Recovery Procedures for that one system.
Apply lessons learned from this example to your remaining systems.
Data Storage and Backup Systems
For in-house hosted systems, do both local backup and cloud-based backup.
Examples of vendors for cloud-based backup:
Crashplan (Business/Enterprise), Carbonite.
For cloud-based systems (e.g., Google Drive), use cloud-to-cloud backup.
Examples of vendors for cloud-to-cloud backup:
Recovery Operations Leader
Response and Recovery Teams
Guidelines for Writing Recovery Documentation
Templates for System Documentation and Recovery Procedures
• Mission critical systems identified and prioritized
• RPOs, RTOs established
• Vital Records, Databases, IT Services
• Designated Roles and Responsibilities
• Contact Information for Teams
• Recovery Procedures for Mission-Critical Functions,
• Business Owners test/certify recovered systems.
• Plan Activation: Transition Point from Emergency
Response to Plan Activation
• Declaration: Disruptive Event to Disaster
Business Continuity Theory
Has authority to declare an emergency or disaster
Can direct folks to stay home or move to alternate work sites
Allows departmental staff and IT Department to
focus on recovery effort
Primary focus is on initial response to event
Secures IT assets
Gives preliminary recovery time estimates
• Recover business function
• Activity of business owner
• Recover IT systems
• Activity of IT
• Departmental staff
validates functionality of
Focus is on
Documentation should be developed by the system owner,
i.e., the one who by default manages that system.
In other words, don’t assign a specific individual to write all
the documentation; task the people responsible for those
systems to write it.
The person writing the recovery procedure should write it
with the following assumptions:
The person performing the recovery is not normally
responsible for this service.
The person performing the recovery has sufficient
staff roles and
Do not attempt to design for the worst-case scenario.
Initially focus your efforts on recovery from smaller scale events, e.g.,
Finance Systems has failed, needs to be rebuilt and tested.
Grow your planning efforts to handle more disastrous events, e.g.,
A fire has destroyed your data center, all in-house hosted systems
are down and need to be rebuilt.
Exercise your staff and your plan to develop a true recovery capability!
• The goal of testing your disaster recovery plan is not to find out if it
works, but to determine where it fails.
• A planned test should never, never cause a business interruption!
o Don’t lose your data in the process of testing your data recovery plan!
o Don’t shut down a mission-critical service as a result of the test.
• Testing the disaster recovery plan reveals weaknesses and
also trains staff.
• As you execute the test, weaknesses are revealed.
• The staff evaluates the results of the test, and this helps staff
to ‘own’ the plan.
• Training staff helps test your plan
• As you describe the plan to your staff, they may notice
inconsistencies or weaknesses.
• Document Review – Validate the disaster recovery plan via stakeholders’
review of the recovery documentation.
• Table-Top Testing – Simple walkthrough of the plan in a safe environment,
e.g., conference room.
• Advanced Table-top Exercise – Directed simulation of activating the disaster
recovery plan against a specific business disruption scenario.
• Component Test - Evaluation of a single threat event impacting a single
• Comprehensive Simulation - An exercise to evaluate overall recovery
capability in a high-stress environment.
Proverbs 21:5 Good planning and hard work lead to
prosperity, but hasty shortcuts lead to poverty.
“You can't plow a field simply by turning it over in your
mind.” ― Gordon B. Hinckley
Business Continuity Planning focuses on recovery of the broader business; Disaster
Recovery Planning focuses on recovery of its IT Infrastructure.
Both are closely related and use a nearly identical process, and realistically you cannot fully
consider one without the other.
Use the Risk Assessment with your Recovery Priorities to identify high-ROI risk mitigation.
Identify RELATED business functions and GROUP them together into LOGICAL SYSTEMS
Pick the top five business functions and/or IT systems, then pick one and complete the BC/DR planning process for that one.
Then apply lessons learned to the other four, and eventually to the remaining systems.
Business Continuity Institute - TheBCI.org
Six ‘Good Practice Guidelines’
Disaster Recovery Institute International - DRII.org
Ten professional practices
The International Consortium for Organizational Resilience
Disaster Recovery Journal – drj.com
Dictionary of Business Continuity Management Terms:
Business Continuity Institute - BCI
International Glossary for Resiliency
maintained by DRI International
Business Continuity Glossary by DRJ
Disaster Recovery Journal
ISO 22301:2012 - Societal security
This has emerged as the predominant ‘gold standard’.
NIST Special Publication 800-34 Rev. 1 - Contingency Planning Guide for Federal Information Systems