Information Technology Disaster Recovery Guide - ABC Bank (redacted)

Information Technology
Disaster Recovery Guide

Presented to

ABC Bank

April 11, 2014

(REDACTED)


Table of Contents
Revision History
Current Situation
Current IT Infrastructure Overview
Statement of Intent
Mission Statement
Objectives
Assumptions
Testing this Disaster Recovery Plan
Maintaining this Disaster Recovery Plan
Disaster Risks & Prevention
Disaster Preparation
IT Disaster Recovery Plan Overview
Disaster Recovery Team
Activating the Disaster Recovery Plan
Damage Assessment & Equipment Salvage
Backup Process Overview
Backup Process – IT Core Services
Backup Process – Line of Business Applications
Backup Process – IT Support Services
Restore Procedures – IT Core Services
Restore Procedures – Business Application Services
Restore Procedures – SQL Server Agent
Restore Procedures – Symantec Backup Exec System Recovery
Restore Procedures – Nimble Snapshot
Restore Procedures – vSphere Data Protection
Restore Procedures – Symantec Backup Exec
Observations & Recommendations
About the Author
Appendix A – IT Management & Staff Contact List
Appendix B – Vendor Contact List
Appendix C – Line of Business (LOB) Application Owners
Appendix D – Application Inventory
Appendix E – Server & Network Hardware Inventory
Appendix F – Network Topology Maps
Appendix G – Network Device Configuration
Appendix H – Disaster Recovery Site Details


Revision History
Date Version Editor & Description
04/04/2014 1.0 Stephen White, initial draft
04/11/2014 1.1 Incorporate content changes & feedback from David Park
04/11/2014 1.1R Redacted version, IP addresses & hostnames have been changed or removed


Current Situation
ABC Bank (ABC) is a 25‐year‐old financial institution headquartered in Los Angeles, with 12 retail branches located
throughout California, New York, and New Jersey.  It has approximately 100 employees in the corporate office plus
an additional 200 employees in the 12 branches.
SMCI has been tasked with developing an Information Technology Disaster Recovery (DR) Plan, to be used as a guidebook in
the event of a disaster that renders all or part of the IT infrastructure inoperable.  Findings detailed in this document are
based upon interviews with the ABC CIO and IT Manager, as well as a cursory review of the ABC IT infrastructure and
organizational requirements.
This DR plan, while part of a larger Business Continuity Plan, is limited in scope to those processes and procedures
necessary to recover the IT infrastructure and is designed with the assumption that failover to the DR site is necessary
due to a disaster affecting the primary data center.


Current IT Infrastructure Overview
ABC maintains its primary data center in a Savvis Colocation (colo) facility in El Segundo.  The facility houses 9 physical
servers running VMware, which host 29 production virtual servers running a combination of Windows Server 2003 and 2008.
For core storage in the VMware environment, ABC utilizes a 24TB (raw) Nimble CS240 SAN (Storage Area Network).
The virtual servers run a variety of services, including Active Directory, Microsoft SQL Server, Microsoft System Center,
Sophos (Endpoint Protection), Websense (Web Filtering), SolarWinds (Network Monitoring), McAfee (hard drive
encryption), and other IT and banking‐specific applications.  There are also 8 additional physical servers for Cisco voice
(VoIP), Microsoft Exchange Server 2007, Microsoft Exchange Outlook Web Access, Microsoft Windows Storage Server
2008 R2 (Network Attached Storage), Patriot Officer, SWIFTNet, Symantec Backup Exec (backup system), and VMware
vCenter (for managing the VMware environment).
ABC currently maintains several physical servers in a small data center in the corporate office, including a Cisco voice
server (VoIP), an Active Directory domain controller, security and video surveillance servers, and an 18TB NAS (Network
Attached Storage) server running Microsoft Windows Storage Server 2008 R2.  The Cisco voice server replicates with its
peer in the colo data center, including both configuration and voicemail data.  The NAS server replicates in near
real time with its peer in the primary data center using Microsoft Distributed File System (DFS), a feature of Windows
Server.
ABC also maintains a disaster recovery (DR) site at a Savvis Colocation facility in Tukwila, Washington, which houses
3 physical servers running VMware.  For core storage in the DR site VMware environment, ABC utilizes a 24TB (raw)
Nimble CS240 SAN.  The Nimble SANs in the primary data center and DR site automatically replicate twice daily (5am
and 5pm) across the WAN link.
For end‐user computing, there are approximately 275 laptop and desktop machines, running a mix of Windows XP and
Windows 7, which are used by administrative and branch staff.
In each of the 12 ABC branches, there is a single physical server running Microsoft Windows Server 2008 R2 for Active
Directory, File, Print, and DNS services, plus anywhere between 4 and 40 laptop/desktop machines, depending on the
size of the branch, running Windows 7 for end‐user computing.
Branches communicate with each other and with the corporate office, primary data center, and DR site via a Sprint MPLS
network, over connections of 1.5 to 10Mbps depending on the size of the branch.
The corporate office maintains several separate and dedicated network connections, including a 10Mbps connection to
the Sprint MPLS network for branch communications, a 100Mbps connection through Cogent for internet services, a
500Mbps connection to the data center through Time Warner Cable, and a dedicated 50Mbps connection through Time
Warner Cable for the isolated wireless LAN.
The data center maintains several separate and dedicated network connections, including a 100Mbps connection to the
Savvis backbone (for connection to the DR site), dedicated T‐1 circuits through AT&T for Fiserv Core and Fundtech, a
50Mbps connection to the Sprint MPLS network for branch communications, a 100Mbps connection through Sprint for
internet services, plus dedicated VPN circuits for Fiserv EFT, Fundtech, FRB, and SWIFTNet.
The DR site maintains several separate and dedicated network connections, including a 100Mbps connection to the
Savvis backbone (for connection to the colo data center), a 3Mbps connection to the Sprint MPLS network for branch
communications, a 1.5Mbps connection through Sprint for internet services, plus dedicated VPN circuits for Fiserv Core,
Fiserv EFT, and Fundtech.


Statement of Intent
This document outlines procedures for recovering critical IT platforms and for disaster recovery (DR) of the Bank’s
primary data center.  The DR plan focuses on systems and infrastructure and supports the Bank’s broader Business
Continuity Plan (BCP).
Because every system outage is different, it is impractical to plan for every situation; therefore, where appropriate,
general disaster recovery strategies and procedures are presented.  In the event of a disaster, these processes and
procedures may be modified to ensure the safety of personnel, systems, and data.  Not all recovery procedures will be
necessary for every outage or disaster situation.
The overall purpose of this plan is to ensure information system uptime and data confidentiality, integrity, and
availability, and to support overall business continuity.
Mission Statement
IT management has approved the following Mission Statement:
• The company shall develop a comprehensive IT disaster recovery plan
• The Bank’s Business Impact Analysis (BIA) shall be used to drive the requirements for the disaster recovery
plan
• The disaster recovery plan should cover all essential and critical infrastructure elements, systems, and
networks, in accordance with key business activities
• The disaster recovery plan should be periodically tested in a simulated environment to ensure that it can be
effectively implemented in a disaster situation and that management and staff understand how it is to
be executed
• All staff must be made aware of the disaster recovery plan and their respective roles
• The disaster recovery plan is to be kept up to date to take into account changing personnel, infrastructure,
and applications
Objectives
The principal objective of the disaster recovery plan is to develop, test, and document a well‐structured and easily
understood set of procedures, which will help the company recover as quickly and effectively as possible from an
unforeseen outage or emergency that interrupts information systems and business operations.  Additional objectives
include:
• Protect the Confidentiality, Integrity, and Availability of ABC data and the associated IT infrastructure
• Present an orderly course of action for restoring critical IT infrastructure and applications within 72 hours
of the initiation of the plan (Recovery Time Objective)
• Ensure no more than 24 hours of data loss (Recovery Point Objective)
• Provide information concerning key ABC personnel, vendors, and other parties required to effectively
implement this plan, and the specific computing expertise required
• Identify all IT equipment, software, procedures, and other items necessary for the recovery
• Detail all major components of the ABC IT infrastructure
• Detail the sequence of recovery
If some or all of the ABC IT staff are unavailable, IT vendors familiar with Microsoft/VMware platforms and typical
Cisco/MPLS network architecture can use this document to restore the IT infrastructure.  Note that many of the
critical banking functions have been outsourced to third‐party service providers (e.g., Fiserv, Fundtech), so in
some cases the assistance of these vendors may be necessary to support recovery operations.


Assumptions
For purposes of development of this plan, the following assumptions are made:
• The primary colo data center in El Segundo is partially or completely destroyed, or one or more critical line of
business (LOB) applications are unusable
• The alternate processing site (DR site) is up and available to handle temporary data processing
• All necessary utilities, data, and communication circuits are available as documented and tested within the
plan
• Critical records and/or backup media are stored off‐site and have survived the disaster or disruption
• Critical IT recovery team members, or appropriately skilled IT vendor personnel, are available to perform the
procedures as defined within the plan
• Other departments within ABC have developed, implemented, and validated their own internal disaster
recovery plans
• Business partners, service providers, vendors, and other external organizations perform according to their
general commitments and/or service level agreements to support ABC in recovering from the
disaster
• Business partners, service providers, vendors, and other external organizations have implemented and
validated their own internal disaster recovery plans


Testing this Disaster Recovery Plan
The FFIEC IT Examination Handbook for Business Continuity Planning has an entire section on testing, located here:
http://ithandbook.ffiec.gov/it-booklets/business-continuity-planning/risk-monitoring-and-testing/principles-of-the-business-continuity-testing-program/testing-policy.aspx
In summary, it requires organizations to develop a disaster recovery test plan, and it presents guidelines for this plan
and strategies for its implementation.  The test plan should cover internal systems as well as external service
providers and hosted systems.
The objective of this testing program is to ensure that the disaster recovery plan is accurate, complete, and viable under
adverse conditions.  While comprehensive tests require greater investments of time, resources, and coordination, this
level of testing will more accurately depict a true disaster and will assist management in assessing the organization’s
actual capacity to execute the DR plan in the event of a true disaster.  Comprehensive testing of critical functions and
applications will also allow management to identify potential problems or gaps in the DR plan.
Using guidance from the FFIEC documentation and financial industry best practices, ABC should develop a
comprehensive disaster recovery test plan and execute it at least annually, with more frequent testing following any
significant changes in the IT organization or infrastructure.
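The scheduling rule above (test at least annually, and sooner after significant changes) can be tracked mechanically. The Python sketch below computes the next test due date. The 365‐day interval approximates "annually"; the 30‐day post‐change window and the list of change dates are hypothetical assumptions, since this plan does not define how soon after a significant change a test must occur.

```python
from datetime import date, timedelta

# "At least annually" (from this plan), approximated as 365 days.
ANNUAL_INTERVAL = timedelta(days=365)
# Hypothetical policy parameter, not defined in this plan: retest within
# 30 days of a significant change to the IT organization or infrastructure.
POST_CHANGE_WINDOW = timedelta(days=30)


def next_test_due(last_test: date, significant_changes: list[date]) -> date:
    """Return the date by which the next DR test should be performed."""
    due = last_test + ANNUAL_INTERVAL
    for change in significant_changes:
        # Only changes made after the last test call for a new test.
        if change > last_test:
            due = min(due, change + POST_CHANGE_WINDOW)
    return due


print(next_test_due(date(2014, 4, 11), []))                  # 2015-04-11
print(next_test_due(date(2014, 4, 11), [date(2014, 6, 1)]))  # 2014-07-01
```

Taking the earlier of the two deadlines means a significant change can only pull the next test forward, never push it past the annual date.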


Maintaining this Disaster Recovery Plan
Having a disaster recovery plan is critical, but the plan will rapidly become obsolete unless a workable procedure for
maintaining it is also developed and implemented.
ABC IT personnel should update this document as personnel change, devices are added to or removed from the
environment, devices are upgraded or their configurations change, or applications are added to or removed from the
corporate applications portfolio.
Basic Maintenance
This plan should be reviewed twice each year by both ABC IT and appropriate management and operations personnel.
In addition, the plan should be tested on a regular basis and any discovered faults corrected.  The designated Disaster
Recovery Plan coordinator has the responsibility of maintaining the plan and its related documents.
Given the constantly changing IT environment, this disaster recovery plan will inevitably become outdated and
unusable unless someone keeps it up to date.  Changes that will likely affect the plan fall into several
categories:
• Hardware changes
• Software changes
• Facility changes
• Procedural changes
• Personnel changes
As changes occur in any of the areas mentioned above, appropriate ABC personnel should determine if changes to the
plan are necessary.
Changes Requiring Plan Maintenance
The following are some of the types of changes that may require revisions to the disaster recovery plan.  Any change
that could affect whether the plan can be used to successfully restore the operations of ABC computer and
network systems should be reflected in the plan.
• Additions, deletions, or upgrades to hardware platforms
• Changes to hardware system configuration
• Additions, deletions, or upgrades to system software
• Changes to software applications affected by the plan
• Changes to facilities that affect IT infrastructure
• Changes that affect the availability/usability of the DR site location
• Changes to personnel identified by name in the plan
• Changes to off‐site backup procedures, tape storage location, etc.
• Changes to application backups
• Changes to vendor lists maintained for acquisition and support purposes


Disaster Risks & Prevention
As important as it is to have a disaster recovery plan, planning and taking other measures to prevent a disaster, or to
mitigate its effects beforehand, is even more important.  This portion of the Disaster Recovery Plan reviews the various
threats that can lead to a disaster, where the vulnerabilities lie, and steps that should be taken to minimize risk.  The
threats covered here are both natural and human‐created.
1. Earthquake
The USGS database shows that there is a 67% chance of a major earthquake, magnitude 6.7 or greater, in the Los
Angeles area within the next 30 years.  The largest earthquake in recent memory within 30 miles of Los Angeles was the
magnitude 6.7 Northridge earthquake of 1994, which caused significant damage to facilities and infrastructure
as far as 85 miles away.
Given the location and proximity of the corporate offices, primary data center, and southern California branches,
an earthquake has the potential to be the most disruptive event for this DR plan, and could damage or
destroy the corporate offices, data center, and nearby branch locations.  Restoring computing and networking
facilities to their pre‐disaster state following a significant earthquake could be very difficult and require an
extended period of time due to the need for wide‐scale building and infrastructure repairs.
The colo site building is designed to withstand a magnitude 7.0 earthquake with minimal damage.  Through this
design, as well as by choosing a DR site in a different geographical area, ABC has effectively mitigated about as much
of the risk from this threat as possible.
2. Computer Crime
Computer crime is becoming more of a threat as systems become more complex and access becomes more widely distributed.
While computer crime does not typically damage hardware, it may be more insidious, and it may often come from within
the organization.  For example, a disgruntled employee could plant or spread viruses or malware, delete or
disseminate sensitive data, or otherwise sabotage production computing systems and/or data.
All servers and workstations should have adequate security to protect against unauthorized access.  All systems should
be protected by complex/strong passwords, especially those systems that host and/or have access to sensitive data.  All
users should be required to change their passwords on a regular basis, e.g., every 90 days.  All servers and perimeter
security devices (firewalls) should have auditing enabled to log unauthorized attempts to access servers and data, and
ABC IT administrators should review these logs frequently.
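As an illustration of the 90‐day password rotation guidance, the Python sketch below flags accounts whose password is older than 90 days. The account names and dates are hypothetical sample data; in practice the last‐changed dates would come from the directory (for example, the Active Directory pwdLastSet attribute), and the domain password policy's "Maximum password age" setting can enforce the interval directly. The sketch is an audit aid, not a replacement for that policy.

```python
from datetime import date, timedelta

MAX_PASSWORD_AGE = timedelta(days=90)  # example interval given in this plan


def overdue_accounts(last_changed: dict[str, date], today: date) -> list[str]:
    """Return, sorted by name, accounts whose password is older than 90 days."""
    return sorted(
        name
        for name, changed in last_changed.items()
        if today - changed > MAX_PASSWORD_AGE
    )


# Hypothetical sample data: account name -> date the password was last changed.
accounts = {
    "jdoe": date(2014, 1, 2),
    "svc_backup": date(2013, 10, 15),
    "asmith": date(2014, 3, 20),
}
print(overdue_accounts(accounts, date(2014, 4, 11)))  # ['jdoe', 'svc_backup']
```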
Core servers and engineering machines should be backed up on a regular basis.  Those backups should be stored on
magnetic tape, and the tapes taken offsite at least weekly.  Standards should also be established for the number of
backup cycles to retain and the length of their retention.  Industry best practices suggest that at least
5 weekly backups and 12 monthly backups be retained in an offsite location.
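The 5‐weekly / 12‐monthly retention guidance above is a form of grandfather‐father‐son tape rotation. The Python sketch below selects which weekly full‐backup tapes to keep offsite. It assumes, as a hypothetical convention not stated in this plan, that the first full backup of each calendar month serves as that month's "monthly" tape; the backup dates are sample data.

```python
from datetime import date, timedelta

WEEKLY_TO_KEEP = 5    # from the best-practice guidance above
MONTHLY_TO_KEEP = 12  # from the best-practice guidance above


def tapes_to_retain(full_backups: list[date]) -> set[date]:
    """Select which weekly full-backup tapes to keep offsite.

    Hypothetical convention: the first full backup of each calendar month
    is designated that month's "monthly" tape.
    """
    ordered = sorted(full_backups)
    keep = set(ordered[-WEEKLY_TO_KEEP:])  # most recent weekly fulls

    first_of_month: dict[tuple[int, int], date] = {}
    for d in ordered:
        first_of_month.setdefault((d.year, d.month), d)
    monthly = sorted(first_of_month.values())
    keep.update(monthly[-MONTHLY_TO_KEEP:])  # most recent monthly tapes
    return keep


# Sample data: 60 weekly full backups, one per week.
fulls = [date(2013, 2, 1) + timedelta(weeks=i) for i in range(60)]
retained = tapes_to_retain(fulls)
print(len(retained), "tapes retained;", len(fulls) - len(retained), "may be recycled")
```

A tape can satisfy both rules at once (the most recent monthly tape is usually also one of the last five weeklies), which is why the result is a set rather than a simple count of 17.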
Continue to improve security functions on all platforms.  Strictly enforce policies and procedures when violations are
detected.  Regularly reinforce with users the importance of keeping their passwords secret, and instruct users on how to
choose complex/strong passwords that are difficult to guess.
Constantly improve network security.  For wireless LAN (WLAN) infrastructure, choose complex/strong passwords for
WLAN access and “shared secrets”, and consider using machine‐based certificates or RADIUS for WLAN authentication.
ABC has implemented extensive, multiple, layered security mechanisms and a number of discrete protection,
management, and monitoring systems that enforce and monitor perimeter, web, email, application, and endpoint
security and activity.  Through implementation and utilization of these systems, plus active enforcement of policies
related to information security and regulatory compliance, ABC has effectively mitigated most of the risk from this
threat category.


3. Physical or Virtual Server Failure
Server failure is always a risk, even in a virtualized server environment.  The physical server hardware could fail, causing
the physical or virtual server(s) and application(s) located thereon to become unavailable.  The physical or virtual server
operating system, database engine, or application could fail and become unavailable.  Lastly, an application’s data could
become corrupted, causing the application to fail and become unavailable.
As detailed in a later section, in select cases individual databases can be recovered from SQL Server Agent (SQLAGENT)
backups, physical servers can be recovered using Symantec Backup Exec System Recovery (BESR), and virtual servers can
be recovered from vSphere Data Protection (VDP) backups.  In addition, the VMware virtual server environment is
largely self‐managing, in that the VMware management server can automatically restart or migrate VMs away from failed
or failing hardware, minimizing VM downtime caused by the failure of a host.  By using these capabilities of
VMware, plus the multiple levels and types of backups, ABC has effectively mitigated most of the risk from this threat
category.
4. Network Failure
Network failure is always a risk, even in a colocated IT environment with redundant network links.
ABC has built out a highly capable wide area network using multiple redundant network links from separate network
providers, with mostly automated failover capabilities, ensuring that a failure of a single provider’s network will not
significantly impact ABC operations.  However, there is one area of risk that remains, specifically the Sprint MPLS
network.  While MPLS switching failures are rare, without access to a backup network, an MPLS failure could
significantly impact the organization.
ABC should consider deploying a backup MPLS network with an alternate provider to be used in the event of a failure of
the Sprint MPLS network.  As this would be simply a backup to the primary network transport, sites could be connected
with minimal bandwidth, potentially even a burstable‐type circuit such as frame relay, or this redundant service could be
used in a network load‐balanced topology to provide both fault tolerance and improved network connectivity to the
branch locations.
5. Conventional Threats
Conventional threats include those such as fire, flood, power outage, theft, sabotage, and the like.
By choosing to house critical IT infrastructure in Tier 3 data centers (see ANSI/TIA‐942, Telecommunications Infrastructure
Standard for Data Centers), which are highly secure environments with stringent access controls, multiple
redundant distribution paths for networking and power, and a site infrastructure with contracted availability of at least
99.982%, ABC has effectively mitigated most of the risk from this threat category.
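To put the 99.982% availability figure in concrete terms, the Python sketch below converts an availability percentage into the maximum downtime it allows per year. It assumes a 365‐day year and treats the figure as applying continuously over that year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def max_annual_downtime_minutes(availability_pct: float) -> float:
    """Convert an availability percentage into allowable downtime per year (minutes)."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * HOURS_PER_YEAR * 60


print(round(max_annual_downtime_minutes(99.982), 1))  # 94.6
```

In other words, 99.982% availability still permits roughly 95 minutes (about 1.6 hours) of site downtime per year.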


Disaster Preparation
1. Disaster Recovery Planning
The first and most important step is to have a well‐developed and up‐to‐date Disaster Recovery Plan that can be
used in response to a disaster.  The effectiveness of this plan, however, depends on ABC IT management and staff
reviewing and updating it, and practicing for contingencies, as changes to the IT infrastructure, personnel, and
application portfolio occur.
2. Recovery Facility
If the primary data center is partially or completely destroyed in a disaster, repair or rebuilding of that facility could take
an extended period of time.  In the interim, it would be necessary to fail over to the DR site in Tukwila.  This document
details procedures necessary to bring up all or part of the infrastructure at the DR site.
3. Replacement Equipment
Once completed, with input from ABC IT personnel, this document will contain a complete inventory of all servers and
network devices, including any software, that must be eventually restored at the primary data center following a
disaster.  The inevitable changes that occur in the systems over time require that this plan be periodically updated to
reflect the most current inventory.
Where possible, agreements should be made with vendors to supply replacement equipment on an emergency basis.
To avoid problems and delays in the recovery, every attempt should be made to replicate current system configurations;
however, there will likely be cases where components are not available or the delivery timeframe is unacceptably long.
The recovery team should have the expertise and resources to work through these problems as they are recognized.
Although some changes to the procedures documented in this plan may then be required, using different models of
equipment, or equipment from different vendors, may help expedite the recovery process.
4. Backups
New hardware can be purchased, new buildings can be built, and new employees can be hired.  However, the data that
was stored in the pre‐disaster IT infrastructure cannot be replaced or recreated; it must be restored from a copy that
was not affected by the disaster.
Disaster Recovery Site Data Replication
ABC has decided to employ a geographically separated DR site for all business critical IT infrastructure as well as other
redundant measures to protect specific information systems and infrastructures.
By utilizing a virtual server infrastructure and redundant Storage Area Network (SAN) storage devices, with automated
site‐to‐site data replication and a partially automated failover mechanism, ABC has effectively mitigated much of the risk
associated with a disaster at the corporate offices or primary data center.
This option could ideally provide a 12‐hour Recovery Point Objective (RPO), since the SANs replicate every 12 hours, or
24 hours in the worst case (e.g., if a replication cycle fails), and a Recovery Time Objective (RTO) of 24 – 72 hours.
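The RPO figures above follow from the twice‐daily (5am and 5pm) SAN replication schedule described earlier in this plan. The Python sketch below computes the largest data‐loss window for a daily replication schedule, optionally assuming that some consecutive runs fail. It is a simplification: it treats each replication as completing instantly (real WAN transfer time would widen the window), and it assumes the plan's "worst case" corresponds to one missed cycle.

```python
RPO_OBJECTIVE_HOURS = 24  # Recovery Point Objective stated in this plan


def worst_case_rpo_hours(replication_hours: list[int], missed: int = 0) -> int:
    """Largest possible data-loss window, in hours, for a daily replication schedule.

    replication_hours: hours of the day (0-23) at which SAN replication starts.
    missed: number of consecutive replication runs assumed to fail.
    Simplifying assumption: each run completes instantly, so WAN transfer
    time (which would widen the window) is ignored.
    """
    times = sorted(replication_hours)
    n = len(times)
    # Hours from each run to the next, wrapping around midnight; a single
    # daily run is 24 hours from its own next occurrence.
    gaps = [((times[(i + 1) % n] - times[i]) % 24) or 24 for i in range(n)]
    # If `missed` runs in a row fail, the window spans missed + 1 gaps.
    span = missed + 1
    return max(sum(gaps[(i + k) % n] for k in range(span)) for i in range(n))


schedule = [5, 17]  # 5am and 5pm, per this plan
print(worst_case_rpo_hours(schedule))            # 12
print(worst_case_rpo_hours(schedule, missed=1))  # 24
print(worst_case_rpo_hours(schedule, missed=1) <= RPO_OBJECTIVE_HOURS)  # True
```

A second consecutive failure would push the window to 36 hours, beyond the stated 24‐hour RPO, which is one reason the tape backups described below matter.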
Redundant Backup Processes
ABC currently performs weekly full and nightly incremental backups to tape, and rotates tapes three times per
week to a secure, climate‐controlled offsite storage facility.  This process acts as a backup to the SAN replication process
and ensures the capability for a point‐in‐time restore in the event of an issue with the SAN or corruption of the virtual
machine data.  This matters because, in the unlikely event of data corruption, the corruption could be replicated to
the DR site, rendering both the primary data center and DR site virtual machine(s) unusable.


ABC is also performing virtual machine backups, using a separate process, further ensuring point‐in‐time recovery
capability as well as providing for an extremely fast Return to Production (RTP) in the event of a virtual machine failure.
A complete and current set of backup and tape rotation processes is detailed in the Backup Process Overview section of
this document.
Disaster Recovery Lock Boxes
To ensure that a current copy of this plan is available when a disaster occurs, procedures should be established to store
a copy of this plan, including passwords, other recovery procedures, software, documentation, etc. at multiple offsite
locations.
At a minimum, three fireproof lock boxes should be purchased.  The contents of these lock boxes must remain identical;
when this DR plan, or any other specified content is updated, ALL of the lock boxes must be concurrently updated.
These lock boxes should be distributed as follows:
• DataLOK storage facility
• Head of IT office
• IT Manager office

As they shall contain passwords and other sensitive information, these lock boxes are to remain locked at all times.
Physical keys to the lock boxes should only be held by designated personnel within the organization, including:
• Head of IT – Howie Schechtman
• IT Manager – David Park
• IT Supervising Analyst – Shawn Kim

At a minimum, these lock boxes should contain:
• A complete, current printed copy of this disaster recovery plan, including all appendices & attachments
• A complete, current electronic copy of this disaster recovery plan, including all appendices & attachments,
on CD or DVD
• A sealed envelope containing at least the root, administrator, network device, and Directory Services Restore
Mode passwords, plus any other application or database passwords required to complete a full restore

  – If the seal is broken on any envelope containing these passwords, ALL of the passwords contained
within ALL lock box envelopes should be changed to prevent potential unauthorized access

• Microsoft Windows Server 2012 DVD
• Symantec Backup Exec 12.5 CD
• Symantec Backup Exec System Recovery CD
• VMware vSphere & ESXi 5.5 Installation CD
• Installation media for any other current or future software not listed above
• License keys, installation keys, serial numbers, or other codes for each of the OS or software products listed
• Any required keys or access cards for the primary data center and DR sites


IT Disaster Recovery Plan Overview
This plan uses a general yet prescriptive approach to recover from an outage or disaster that destroys or severely
damages some or all of the IT infrastructure at the ABC primary data center.
1. Mobilize Personnel
Immediately following the disaster, a planned sequence of events should begin.  Key personnel should be immediately
notified and the designated recovery teams should be mobilized to implement this plan.  Personnel are listed in the plan
under Appendix A – IT Management & Staff Contact List.  However, this plan should remain usable even if one or more key
IT and administrative personnel are unavailable.
2. Begin Damage Assessment & Salvage Operations at Disaster Site
Early efforts should be targeted at protecting and preserving any surviving computing equipment.  Any salvageable
equipment should be identified and moved to a clean, dry environment away from the disaster site where a complete
system assessment can be performed.
At the same time, appropriate IT personnel and, if necessary, emergency services personnel should survey the disaster
scene to estimate the amount of time required to put the facility, utilities, and systems back into working order.
3. Determine Activation of Disaster Recovery Plan
Based on damage assessment, a decision should then be made whether to activate the disaster recovery plan.  If the
corporate office was also affected by, for example, a regional disaster, a decision to temporarily relocate corporate
office employees to a branch or other temporary work location should also be made.
4. Activate Disaster Recovery Site
Since failover to the DR site is not entirely automated, once the DR plan is activated, IT personnel should immediately
begin the process of bringing the DR site online, making any changes to the infrastructure as necessary to support the
movement of data processing activities to the DR site.
Depending upon the severity of the disaster, and the specific systems affected, it may become necessary for recovery
personnel to deviate from the plan; this point underscores the necessity to keep this plan up to date.
5. Restore Data from Backup
While SAN replication should keep the DR site systems mostly in sync with those in the primary data center, some of the
physical servers rely on alternate backup methods.  Depending on the severity of the disaster, and the specific systems
affected, it may become necessary to restore servers from one of the backups.  Individual application owners may need
to be involved at this point; appropriate administrative or operations personnel who are subject matter experts (SMEs)
for their specific applications should be assigned to each affected application to ensure the application and data are
restored properly.
6. Move Back to Restored Permanent Facility
When the utilities and infrastructure have been repaired and the primary data center is again ready for occupancy,
systems should be failed‐back to the primary data center.  The logistics of this process are out of scope for this plan and
should be detailed in a separate document.


Disaster Recovery Team
To function efficiently and to allow independent tasks to proceed simultaneously, the recovery process will be handled
by several separate teams.  This plan calls for six teams that work together, each of which is assigned a specific portion
of the recovery effort.
The Disaster Recovery Teams are:
 Recovery Management Team
 Damage Assessment & Salvage Team
 Technology Recovery Team
 Applications Recovery Team
 Operations Support Team
 Administrative Support Team
Disaster Recovery Team Responsibilities
As the recovery process gets underway, it is imperative that each of the recovery teams remain in close communication
and strive to work together to complete the recovery as quickly as possible.  The following section provides a brief
description of the responsibilities for each team.
1. Recovery Management Team
The Recovery Management Team is responsible for coordination of the entire recovery project and is composed of
the following personnel:
 Recovery Manager
 Technical Coordinator
 Operations Support Coordinator
 Administrative Support Coordinator
The Recovery Manager is the leader of the Recovery Management Team and the overall recovery effort and has the
final authority regarding decisions during the recovery process.  Each of the remaining individuals will be the leader
of a specialized team, or teams, which will each address a portion of the recovery tasks.  As the recovery process
gets underway, there will likely be areas of overlap between teams and thus close communication will be required.
The Recovery Management Team will have regular meetings to provide for adequate communication between team
coordinators.
Each coordinator should schedule a meeting for members of his team well in advance of their first planned activities.
A first meeting agenda should include:
 Reviewing the current status of the recovery operation
 Emphasizing what the team's responsibilities are
 Making sure that members are aware of any changes to the original disaster recovery plan
 Assigning tasks to individual team members
 Establishing time and location for future team meetings
2. Damage Assessment & Salvage Team
The Damage Assessment & Salvage Team will be led by the Technical Coordinator or his designate.  This team will be
responsible for:
 Providing a detailed damage assessment of the disaster site, including utilities and infrastructure
 Providing an inventory of both salvageable and non‐salvageable equipment
 Managing the equipment salvage operations
Based on this assessment the Administrative Support Team can begin the process of acquiring replacement
equipment necessary to rebuild the IT infrastructure and systems once the disaster site is repaired and operational.
3. Technology Recovery Team
The Technology Recovery Team will be led by the Technical Coordinator.  This team will be responsible for:
 Determining which specific infrastructures and systems are affected by the disaster
 Reviewing the recovery steps documented in this plan, making changes as necessary to address any specific
circumstances, and communicating these changes to the Recovery Manager
 Communicating with the Recovery Manager regarding the estimated scope and severity of the disaster
 Communicating with the Recovery Manager regarding the estimated time to return to production (RTP)
 Determining which hardware, software, and supplies will be needed to start the recovery process
 Bringing the disaster recovery site online
 Making any other necessary changes in the IT infrastructure to support RTP
4. Applications Recovery Team
The Applications Recovery Team will be led by the Technical Coordinator or his designate.  This is an optional team
to be assembled in the event that business application owners and/or subject matter experts (SMEs) need to be
involved in the recovery process.  Those considered SMEs for their specific applications should be assigned for each
affected application to ensure the application and data are restored properly.
5. Operations Support Team
The Operations Support Team will be led by the Operations Support Coordinator.  This team will be responsible for:
 Providing basic Help Desk services, including phone support and recovery status information, to end‐users
 Assisting the Damage Assessment & Salvage, Technology, and Applications Recovery teams, as required
6. Administrative Support Team
The Administrative Support Team will be led by the Administrative Support Coordinator.  This team will provide
administrative support to the other recovery teams as well as support to employees.  One of the most important
functions this team can provide is to handle the burden of administrative details so that IT staff can focus on the
recovery efforts.
Some of the anticipated team tasks include:
 Providing support for executing acquisition paperwork
 Assisting the Recovery Manager and individual team coordinators with determining availability of staff to
help in the recovery efforts
 Providing support to track time and expenses related to the disaster
 Coordinating food and sleeping arrangements for recovery staff, as required
 Providing or coordinating transportation and delivery services, as required
 Assisting in contracting with outside vendors, such as IT consulting services, for the installation or recovery of
computing or infrastructure‐related systems


Activating the Disaster Recovery Plan
1. Appointment of Recovery Manager
The first critical step is to appoint the Recovery Manager.  The person most appropriate for this position is ABC Head of
IT.  If the Head of IT is unavailable, the appointment should be made by the Head of Operations, Chief Information
Security Officer (CISO), or any available senior executive or member of the ABC Board of Directors.  This person must
have at least some exposure to Information Technology and must have signature authority for any expenditures
incurred during the recovery process.
2. Assemble Recovery Team
One of the Recovery Manager's first duties is to contact and assemble all available members of the recovery team.  The
Recovery Manager should also produce a list of any extra personnel, outside of the core recovery team, who can provide
additional assistance in the recovery process, if required.
3. Damage Assessment & Equipment Salvage
The Recovery Manager should immediately engage the Technical Coordinator so that he can begin the damage
assessment and the process of identifying and retrieving any salvageable electronic equipment.
4. Establish the Recovery Control Center
The Recovery Control Center is the location from which the disaster recovery process is coordinated.  The Recovery
Manager should designate where the Recovery Control Center is to be established.  The Recovery Control Center would
most likely be at a surviving branch location, or other suitable location near the disaster site.
5. Activating the Disaster Recovery Plan
The Recovery Manager sets the plan into motion.  Early steps to take are as follows:
a. The Recovery Manager should retrieve the Disaster Recovery Lock Box located at DataLOK, or from one of the
other two locations described in the previous section, and open it to obtain an up‐to‐date copy of the Disaster
Recovery Plan.  This plan is available in printed form as well as on CD or DVD.  Copies of the plan should be made and handed
out at the first meeting of the Recovery Management Team.
b. The Recovery Manager should appoint the remaining members of the Recovery Management Team.  This should
be done in consultation with surviving members of the IT staff, and with management approval.
c. The Recovery Manager should call a meeting of the Recovery Management Team at the Recovery Control Center
or a designated alternate site.  The following agenda is suggested for this meeting:
 Each member of the team is to review the status of their respective areas of responsibility
 The Recovery Manager briefly reviews the Disaster Recovery Plan with the team
 Any adjustments to the Disaster Recovery Plan to accommodate special circumstances are to be discussed
and agreed upon
 Each member of the team is to review the makeup of their respective recovery teams; if individuals key to
one of the recovery teams are unavailable, the Recovery Manager is to assist in locating others who have the
skills and experience necessary, including contracting with IT vendors or other appropriate personnel
 The next meeting of the Recovery Management Team is scheduled; it is suggested that the team meet at
least once each day for the first week of the recovery process
d. The Administrative Support Coordinator should begin locating and/or acquiring basic office equipment for the recovery
control center, including:
 Office desks and chairs
 Telephones
 Laptop or desktop computers
 Printer
 Copier
 Fax machine


Damage Assessment & Equipment Salvage
This section contains information on procedures to be executed immediately following a disaster to gauge the scope
of the disaster and to preserve and protect any surviving IT resources at the disaster site.
1. Damage Assessment
This damage assessment is a preliminary one intended to establish an estimate of the extent of damage to the data
center facility.  The primary goals of this process are:
 Quickly determine if ABC can continue IT operations at the primary data center, or if failover to the DR site
will be required
 Determine the extent of damage to the primary data center facility, including the building structure, and
security, electrical, data, and cooling systems
 If possible, estimate the amount of time required to repair the facility and its affected systems
 Determine the extent of damage to ABC servers and network equipment and what equipment, if any, can be
salvaged
2. Equipment Salvage
As soon as practical, all salvageable equipment and supplies need to be moved to a secure location.  Transportation
should be coordinated through the Administrative Support Coordinator to move the equipment to the Recovery Control Center,
or to another designated area, until equipment can be thoroughly inspected and tested.

Recovery personnel should take great care when moving equipment to avoid further damage.
3. Inventory
As soon as possible, a complete inventory of all salvageable equipment should be created, along with estimates of when
the equipment will be ready for use again, in case repairs or refurbishment are required.  This inventory list, along
with a separate list of equipment that was destroyed, should be delivered to the Technical and Administrative Support
Coordinators so procurement of replacement equipment can begin.


Backup Process Overview
ABC utilizes a combination of multiple, separate backup systems to back up the IT infrastructure, for both disaster
recovery and point‐in‐time recovery.
1. SQL Server Agent
SQL Server Agent (SQLAGENT) is a component of Microsoft SQL Server, Standard or Enterprise editions, which runs as a
discrete Windows service and can be used by administrators to automate the execution of various support functions,
such as backups, restores, reporting, log shipping, etc.
SQLAGENT can also be used to provide alerting in response to specific events, such as SQL database engine failure,
performance issues, or when system resources (e.g. CPU, RAM) exceed a specific threshold.
ABC utilizes SQLAGENT to perform database‐level backups of several critical SQL Server machines, specifically:
 SQL1
 SQL2
 WEBSENSE
 PROFITS – located at the Rowland Heights branch
SQLAGENT backs up the SQL Server database files and logs to a dedicated share on BACKUP2, then flushes the
transaction logs.  A FULL backup is performed weekly at 11:00pm on Friday and a DIFFERENTIAL backup is performed
nightly at 11:00pm Monday through Thursday.
SQL Server database files on BACKUP2 are then backed up to tape, as detailed in the Symantec Backup Exec (BUE)
section below.
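Because SQL Server differential backups are cumulative since the most recent FULL backup, the schedule above means a database restore needs at most two files: the latest FULL and the newest DIFFERENTIAL taken after it. The Python sketch below works out that chain for a given failure time. It assumes every job starts at 11:00pm on schedule and treats each backup as complete at its start time (a simplification); it ignores transaction‐log backups, which this plan does not describe. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Schedule as described in this plan (assumed to run on time every week):
# FULL on Friday, DIFFERENTIAL Monday-Thursday, all starting at 11:00pm.
# datetime.weekday(): Monday=0 ... Friday=4 ... Sunday=6.
FULL_DAY = 4
DIFF_DAYS = {0, 1, 2, 3}
START_HOUR = 23


def backup_type(day: datetime) -> Optional[str]:
    """Return the backup type scheduled on the given calendar day, if any."""
    if day.weekday() == FULL_DAY:
        return "FULL"
    if day.weekday() in DIFF_DAYS:
        return "DIFFERENTIAL"
    return None


def restore_chain(failure: datetime) -> List[Tuple[str, datetime]]:
    """List the backups needed to restore to the latest point before `failure`.

    Differentials are cumulative since the last FULL, so only the most recent
    FULL and the newest DIFFERENTIAL taken after it are required.
    """
    latest_diff = None
    run = failure.replace(hour=START_HOUR, minute=0, second=0, microsecond=0)
    for _ in range(8):  # a FULL occurs at least once in any 8 consecutive days
        kind = backup_type(run)
        if run <= failure:
            if kind == "DIFFERENTIAL" and latest_diff is None:
                latest_diff = run  # newest differential before the failure
            elif kind == "FULL":
                chain = [("FULL", run)]
                if latest_diff is not None:
                    chain.append(("DIFFERENTIAL", latest_diff))
                return chain
        run -= timedelta(days=1)
    raise ValueError("no FULL backup found in the previous week")
```

For example, a failure on the morning of Thursday, April 10, 2014 would call for the Friday, April 4 FULL followed by the Wednesday, April 9 DIFFERENTIAL.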
2. Symantec Backup Exec System Recovery (BESR)
Backup Exec System Recovery (BESR) is an image‐based backup tool that allows for rapid recovery of a failed system
with a machine snapshot, essentially a replica of the entire system, taken at a specific point in time.
ABC utilizes BESR to backup the following physical servers:
 All (11) branch servers, excluding the corporate office branch
 All (7) servers located in the primary data center, excluding the VMware hosts:
 BACKUP2
 EXCHWEST
 NASCOLO
 OFFICER
 OWA
 VCENTER
 SERVER
 REPORTPC – located in the Treasury department at headquarters
BESR runs at 11:00pm Monday – Friday and takes a snapshot of the system, then copies that snapshot to a dedicated file
share on BACKUP2.
BESR data on BACKUP2 is then backed up to tape, as detailed in the Symantec Backup Exec (BUE) section below.
3. Nimble Replication
ABC utilizes a pair of Nimble CS240 SANs for storage for the VMware host servers, one located at the primary data
center, the other at the DR site.  VMware is scheduled to perform snapshots of all of the server virtual machines (VMs)
at 5:00am and 5:00pm daily; this process takes approximately 30 minutes and, once it completes, triggers the Nimble
SANs to automatically replicate any block‐level changes from the primary data center to the DR site.  The replication
process takes approximately 1‐3 hours, depending upon the amount of data that has changed since the last snapshot.
The Nimble SAN retains approximately 14 days of snapshots on the SAN located at the primary data center and
approximately 45 days of snapshots on the SAN located at the DR site.  The Nimble SANs provide storage for the
following (29) VM guest machines:
 CANONAPP
 CANONOCR
 CANONSQL
 DC1
 DC2
 DEPCON
 FTP
 GLBA
 HELPDESK
 INTRANET
 IS
 MCAFEE
 MPS
 NESSUS
 PRINTSVR
 RDP
 SCENTER
 SOLARW
 SOPHOS
 SQL1
 SQL2
 TACACS
 TSGW
 TSOFT
 WDS
 WEBSENSE
 WEBTEST
 WSUS
 PROLOGUE
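The snapshot and replication timings above bound how far behind the DR site can be. The sketch below assumes the 30‐minute snapshot plus the worst case of 3 hours' replication, so a snapshot is treated as usable at the DR site 3.5 hours after it starts; it returns the newest such snapshot for a given failure time and converts the retention periods into approximate snapshot counts (two per day). These are derived estimates, not figures reported by the Nimble SANs, and the names are hypothetical.

```python
from datetime import datetime, timedelta

SNAPSHOT_HOURS = (5, 17)                     # 5:00am and 5:00pm, per this plan
SNAPSHOT_DURATION = timedelta(minutes=30)    # approximate snapshot time
WORST_CASE_REPLICATION = timedelta(hours=3)  # upper end of the 1-3 hour range


def last_replicated_snapshot(failure: datetime) -> datetime:
    """Start time of the newest snapshot assumed fully replicated to the DR site."""
    lag = SNAPSHOT_DURATION + WORST_CASE_REPLICATION
    midnight = failure.replace(hour=0, minute=0, second=0, microsecond=0)
    # Today's and yesterday's snapshots always include at least one finished copy.
    candidates = [midnight - timedelta(days=d) + timedelta(hours=h)
                  for d in (0, 1) for h in SNAPSHOT_HOURS]
    return max(c for c in candidates if c + lag <= failure)


def retained_snapshots(days: int) -> int:
    """Approximate snapshot count for a retention window of `days` days."""
    return days * len(SNAPSHOT_HOURS)
```

Under these assumptions the worst case is a failure just before 8:30pm, when the DR site would still hold only the 5:00am snapshot, roughly 15.5 hours of potential data loss. Retention works out to about 28 snapshots at the primary data center and about 90 at the DR site.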
4. vSphere Data Protection (VDP)
VMware vSphere Data Protection (VDP) is a backup and recovery solution that provides agentless, disk‐based backup of
virtual machines, regardless of their power state.  As with BESR, VDP takes a quiesced snapshot, essentially a replica of
the entire system taken at a specific point in time, and copies this to external, deduplicated storage.
Enterprise data is highly redundant, with identical files and data stored within and across systems.  VDP utilizes a
patented deduplication technology to eliminate redundancy at both the file and block level.
VDP also uses Changed Block Tracking (CBT), a VMware feature that enables VDP to back up only the disk blocks that have
changed since the last backup.  This greatly reduces the backup time and size of a given VM image and makes it possible to
process a large number of VMs within a limited backup window.
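The two ideas above can be illustrated with a toy model: split a disk image into fixed‐size blocks, send only the blocks that changed since the previous backup, and store each unique block once under its SHA‐256 hash. This is not VDP's patented deduplication algorithm; the 4 KiB block size and all names are assumptions, and unlike real CBT (where the hypervisor records which blocks were written) the sketch finds changes by comparing hashes.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def split_blocks(image: bytes) -> List[bytes]:
    """Split a disk image into fixed-size blocks (the last may be shorter)."""
    return [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]


class DedupStore:
    """Toy content-addressed store: each unique block is kept only once."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}

    def put(self, digest: str, block: bytes) -> None:
        self.blocks.setdefault(digest, block)


def incremental_backup(store: DedupStore, image: bytes,
                       previous: Optional[List[str]]) -> Tuple[List[str], int]:
    """Back up `image`; return its block-hash manifest and changed-block count.

    Real CBT receives the list of changed blocks from the hypervisor. This toy
    finds changes by comparing each block's hash with the previous manifest.
    """
    manifest: List[str] = []
    changed = 0
    for index, block in enumerate(split_blocks(image)):
        digest = hashlib.sha256(block).hexdigest()
        unchanged = (previous is not None and index < len(previous)
                     and previous[index] == digest)
        if not unchanged:
            store.put(digest, block)  # send changed block; duplicates kept once
            changed += 1
        manifest.append(digest)
    return manifest, changed


def restore(store: DedupStore, manifest: List[str]) -> bytes:
    """Rebuild a full image from its manifest of block hashes."""
    return b"".join(store.blocks[d] for d in manifest)
```

Backing up an image of four blocks, three of them identical, stores only two unique blocks; after one block is rewritten, the next backup copies a single block.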
ABC utilizes VDP to perform backups of the critical VM guest machines in two separate processes; the first process runs
at 8:00pm nightly and backs up approximately half of the VMs to the Nimble SAN located at the primary data center, the
second process runs at 11:00pm nightly and backs up the remaining VMs to an HP SAN, also located at the primary data center.
VDP backups are retained for 66 weeks to allow for extended point‐in‐time restoration capability.
5. Symantec Backup Exec (BUE)
Symantec Backup Exec (BUE) is an agent‐based system for performing server backup and recovery.  Unlike BESR, BUE
performs file‐based backup and recoveries and in the ABC environment, these backups are ultimately written to tape.
BUE is installed on the BACKUP2 server located at the primary data center.  Because of backup‐window constraints,
BUE first backs up target devices to the backup server's direct‐attached (internal) storage, and then a subsequent
process writes this data to a direct‐attached tape backup unit (TBU).  This TBU is manufactured by HP/Quantum and is
equipped with a 24‐tape magazine cartridge, an auto‐loader, and three LTO‐4 tape drives.
BUE is used to backup the following:
 BESR images on BACKUP2
 SQLAGENT database backups on BACKUP2
 EXCHWEST
 NASCOLO
 OFFICER
All backup tapes are encrypted using 128‐bit AES encryption.


6. Backup Schedule
ABC performs monthly FULL backups starting 6:00pm on the last Friday of each month.  The first backup phase is to disk
(on BACKUP2).  This FULL backup set is approximately 14TB in size and takes approximately 3 days to complete.
ABC also performs nightly INCREMENTAL backups, starting at 11:00pm Monday – Friday.  Because the daily rate of
change is fairly consistent, the INCREMENTAL backups typically complete in 3‐10 hours.
Once the backup‐to‐disk has completed, the second backup phase, disk‐to‐tape, begins.  Data is written to tape in 3
separate streams, one per tape drive.  In the case of the monthly FULL backup, this process completes in approximately
7 days, requires 18‐22 tapes, and the backup tape set is ready for off‐site transport the following Tuesday.  In the case of
the nightly INCREMENTAL backup, this process completes in approximately 24 hours, requires 1 tape, and is available for
offsite transport within 48 hours after the backup was started.
The ABC backup administrator (Phi Phong) travels to the data center three times per week to pick up backup tapes and
bring them to the corporate office.
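Two figures in the schedule above can be sanity‐checked with a little arithmetic: the date the monthly FULL starts, and the minimum tape count. The sketch assumes an LTO‐4 native (uncompressed) capacity of 800 GB per cartridge and decimal units, so a 14TB set needs at least 14,000 / 800 = 17.5, rounded up to 18 tapes, before compression or overhead; this is consistent with the 18‐22 tapes reported. The function names are hypothetical.

```python
import calendar
import math
from datetime import date

LTO4_NATIVE_GB = 800  # LTO-4 native (uncompressed) capacity per cartridge


def last_friday(year: int, month: int) -> date:
    """Date of the last Friday of the month, when the monthly FULL starts."""
    days_in_month = calendar.monthrange(year, month)[1]
    last_day = date(year, month, days_in_month)
    # Step back from the month's last day to the most recent Friday.
    offset = (last_day.weekday() - calendar.FRIDAY) % 7
    return date(year, month, days_in_month - offset)


def min_tapes(backup_tb: float, capacity_gb: int = LTO4_NATIVE_GB) -> int:
    """Minimum cartridges for a backup set, ignoring compression and overhead."""
    return math.ceil(backup_tb * 1000 / capacity_gb)
```

For example, last_friday(2014, 4) returns April 25, 2014, and min_tapes(14) returns 18.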
7. Backup Tape Rotation & Retention Process
ABC utilizes the services of DataLOK for offsite tape transport and storage.
DataLOK comes to the corporate office three times per week, typically Monday, Wednesday, and Friday, to pick up the
most recent backup tape set(s) and to drop off any tape set(s) that are returning from rotation.
The current backup tape retention policy is for daily INCREMENTAL tape sets to remain offsite for 2 weeks and monthly
FULL tape sets to remain offsite for 24 months.
The ABC backup administrator maintains a log of all tapes that are stored offsite at DataLOK and this log is updated
anytime tapes are moved on or off‐site.
A catalog of the content of the backup tapes is maintained in BUE, and BUE provides proactive notification to support
the tape rotation process.
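The retention policy above translates into a due‐back date for each tape set. The sketch assumes retention is counted from the day DataLOK picks up a set and that the set returns on the first Monday, Wednesday, or Friday visit on or after the retention period ends; calendar‐month arithmetic clamps to the end of shorter months. These assumptions and the function names are illustrative, not DataLOK's actual process.

```python
import calendar
from datetime import date, timedelta

PICKUP_DAYS = {0, 2, 4}  # Monday, Wednesday, Friday (typical DataLOK visits)


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping to the last day of the target month."""
    index = d.month - 1 + months
    year, month = d.year + index // 12, index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_visit(d: date) -> date:
    """First DataLOK visit day on or after `d`."""
    while d.weekday() not in PICKUP_DAYS:
        d += timedelta(days=1)
    return d


def return_date(sent_offsite: date, kind: str) -> date:
    """Date a tape set is due back under the retention policy above."""
    if kind == "INCREMENTAL":
        due = sent_offsite + timedelta(weeks=2)  # 2 weeks offsite
    elif kind == "FULL":
        due = add_months(sent_offsite, 24)       # 24 months offsite
    else:
        raise ValueError(f"unknown tape set type: {kind}")
    return next_visit(due)
```

A log built on these dates could be compared against the BUE catalog to catch tape sets that are overdue for return.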


Backup Process – IT Core Services
Current data backup processes for specific IT core services are detailed below:
1. Network Device Configuration
ABC utilizes software, specifically SolarWinds Engineer’s Toolset (SET), to automatically back up and compare firewall and
switch configurations any time there is a configuration change.  SET also emails these configurations to Joseph Kim, who
maintains a copy in his email archive.
2. Active Directory Domain Services
ABC utilizes a flat, single‐forest, single‐domain Active Directory (AD) structure.  ABC maintains 15 AD domain controllers
(DCs): DC1 and DC2 are virtual machines located at the primary data center, DC3 is a physical machine located at the
corporate offices, and DC4 is a virtual machine located at the DR site; the remaining 11 DCs are servers running in each
of the branches.  The FSMO (Flexible Single Master Operation) roles are shared among the first three “primary” DCs.
Active Directory is not backed up per se; however, the AD environment at ABC is quite fault‐tolerant.  AD data is
automatically replicated between all domain controllers through AD’s built in replication mechanism; this process occurs
in near real‐time.  In the event of a failure of one of the DCs, users can authenticate against any other DC in the forest,
assuming there is network connectivity.
In the event of a failure of one of the “primary” DCs (those holding the FSMO roles), an administrator would simply
“seize” the missing roles onto any one of the remaining DCs, a manual but very simple process.
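The seizure decision above reduces to a simple rule: any role held by a failed DC moves to a surviving DC. The sketch below only plans the seizure; the work itself is done with Windows tools such as ntdsutil or the PowerShell cmdlet Move-ADDirectoryServerOperationMasterRole with -Force. The role‐to‐DC mapping and the preference order in the example are hypothetical, since this plan says only that the roles are shared among DC1, DC2, and DC3.

```python
from typing import Dict, List, Set

# The five FSMO roles, named as the ActiveDirectory PowerShell module names them.
FSMO_ROLES = ("SchemaMaster", "DomainNamingMaster", "PDCEmulator",
              "RIDMaster", "InfrastructureMaster")


def plan_seizure(holders: Dict[str, str], failed: Set[str],
                 preference: List[str]) -> Dict[str, str]:
    """Map each role held by a failed DC to the first surviving DC in `preference`."""
    survivors = [dc for dc in preference if dc not in failed]
    if not survivors:
        raise RuntimeError("no surviving domain controller available")
    target = survivors[0]
    return {role: target for role in FSMO_ROLES if holders[role] in failed}
```

With DC1 and DC2 both lost in a primary data center outage, the roles they held in a hypothetical mapping (Schema and Domain Naming on DC1, PDC Emulator and RID on DC2) would all be seized onto DC3.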
3. File Services
For user data storage, ABC utilizes an 18TB NAS (Network Attached Storage) server, running Microsoft Windows Storage
Server 2008 R2, located in the corporate offices (NASHQ).  The NAS server replicates in near real‐time with its peer, an
identical NAS server located in the primary data center (NASCOLO), using Microsoft Distributed File System (DFS).  DFS, a
feature of Windows Server, utilizes “remote differential compression” to transmit only block‐level changes and data
compression to reduce network traffic between DFS replicas.
The NAS provides file services for nearly all departments at the corporate offices plus some of the branches.  There is
currently approximately 7.5TB of data in total.
ABC maintains a server in each of the branches; all run Microsoft Windows Server 2008 R2 Standard and provide AD DC,
file, print, and DNS services.  User data storage ranges from 50GB (Milpitas) to 500GB (Flushing) and is
backed up nightly to the backup server (BACKUP2) via BESR.
4. E‐Mail
ABC utilizes Microsoft Exchange Server 2007 Standard and maintains 2 separate mailbox servers; EXCHWEST is a physical
server located at the primary data center that serves the west coast corporate office and branches, and EXCHEAST is a
physical server located at the Flushing branch that serves the east coast branches.
ABC utilizes Proofpoint, a cloud‐based service, for SPAM/virus protection, encryption, archiving, and other security‐
related functions.  In the event the Exchange server is offline or otherwise unreachable, Proofpoint will provide backup
queuing (store and forward) which will retain all received messages and then forward them to the Exchange server
when it comes back online, ensuring no data loss.
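The store‐and‐forward behavior described above can be modeled as a first‐in, first‐out queue that fills while the Exchange server is unreachable and drains, in arrival order, when it returns. This toy model is an assumption for illustration only and does not reflect how Proofpoint implements queuing or any limits it places on queue retention.

```python
from collections import deque
from typing import Deque, List


class StoreAndForward:
    """Toy model of backup queuing: hold mail while the mail server is down."""

    def __init__(self) -> None:
        self.queue: Deque[str] = deque()
        self.delivered: List[str] = []
        self.server_online = True

    def receive(self, message: str) -> None:
        """Deliver directly if possible, otherwise queue the message."""
        if self.server_online and not self.queue:
            self.delivered.append(message)
        else:
            self.queue.append(message)  # retained until the server is reachable

    def server_down(self) -> None:
        self.server_online = False

    def server_up(self) -> None:
        """Mark the server reachable and forward queued mail in arrival order."""
        self.server_online = True
        while self.queue:
            self.delivered.append(self.queue.popleft())
```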
In the event of a disaster where Exchange Server recovery will be delayed, Proofpoint can also temporarily host a virtual
Exchange Server environment which would be accessible by ABC personnel through Outlook Web Access (OWA), a web‐
based interface.
EXCHWEST is backed up nightly via BESR to BACKUP2; EXCHEAST is backed up nightly via BESR to a LaCie NAS device in
the Flushing branch.


ABC has also recently deployed an Exchange DR server in the Tukwila DR site (EXCHDR) and uses Exchange Server
Standby Continuous Replication (SCR) to replicate data between EXCHWEST and EXCHDR.  In the event of a site or server
failure involving the primary Exchange server, the Exchange DR server can quickly be brought online and begin hosting
corporate email services with little loss of data.
5. Telecommunications
ABC utilizes a pair of Cisco UCS (Unified Computing System) BE6000 servers for VOIP services for the entire organization;
the Publisher server is located at the corporate offices, and the Subscriber server is located at the primary data center.  The
Cisco UCS devices are not backed up, however the UCS server at the corporate offices replicates with its peer at the
primary data center in near real‐time, including replication of both configuration and voicemail data.
All Cisco VOIP phones connect to the Subscriber.  In the event of malfunction of the Subscriber, the Publisher would
become active and begin handling VOIP services; this failover happens automatically in the event the Subscriber fails.
In the event of a failure of both UCS servers, or the Sprint MPLS network, branches are equipped with at least 2 analog
lines and the Cisco IP phones and routers support the use of “Survivable Remote Site Telephony” (SRST) which allows for
basic VOIP services and automatic failover to the analog lines.
Additionally, there are 3 PRIs (Primary Rate Interface) which provide digital access to the Public Switched Telephone
Network (PSTN); these are located in the primary data center, the Cupertino branch, and the Flushing branch.  In the
event a PRI fails, connection to the PSTN will automatically failover to the next available PRI, in the order listed above.
Given this level of redundancy, and the SRST backup capability, no specific recovery procedures are necessary.
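The failover rules above can be summarized as ordered fallbacks. The sketch below encodes the documented PRI order and, from a branch's point of view, which system handles calls; the function names and the simplification that any MPLS outage puts a branch on SRST are assumptions for illustration.

```python
from typing import Optional, Set

# PRI failover order as listed in this plan.
PRI_ORDER = ["primary data center", "Cupertino", "Flushing"]


def active_pri(available: Set[str]) -> Optional[str]:
    """Return the first PRI, in the documented order, that is still in service."""
    for site in PRI_ORDER:
        if site in available:
            return site
    return None


def call_control(subscriber_up: bool, publisher_up: bool,
                 mpls_up: bool = True) -> str:
    """Which system serves a branch's phones under the failover chain above."""
    if not mpls_up:
        return "SRST"  # branch router provides basic service and analog failover
    if subscriber_up:
        return "Subscriber"
    if publisher_up:
        return "Publisher"  # automatic failover when the Subscriber fails
    return "SRST"
```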


Backup Process – Line of Business Applications
1. Fiserv
Fiserv has its own Business Continuity Plan to provide for prompt response to a disaster.  They maintain two redundant
data centers, a primary site in Iowa and a backup site in Florida; site‐to‐site data replication occurs in real‐time.
ABC connects to Fiserv Core through a dedicated AT&T 3 Mbps circuit at the primary data center, and to Fiserv EFT
through a dedicated VPN connection over the 50 Mbps Sprint internet connection.  There is also a backup VPN
connection for Fiserv Core in the event the AT&T circuit fails; failover to the VPN connection happens automatically.
There is also a backup Fiserv router at the Tukwila DR site in the event of a failure at the primary data center.  Bringing
this backup router online is a manual process, which requires assistance from Fiserv technical personnel.
In the event of a failure of the MPLS network, all communications between the remote branches, corporate offices, and
the primary data center would be down so those 11 branches would be temporarily unable to perform any transactions.
However, unless there are other internet connectivity issues, the branch collocated at the corporate office would still be
able to perform transactions.
2. Fundtech
Fundtech has its own Business Continuity Plan to provide for prompt response to a disaster.  Fundtech maintains two
redundant data centers, a primary site in Atlanta, Georgia and a backup site in San Leandro, Northern California; site‐to‐
site replication happens in real‐time.
ABC connects to Fundtech through a dedicated AT&T 1.5Mbps circuit at the primary data center.  In the event of a
circuit failure, the connection will automatically failover to a VPN connection over the 50Mbps Sprint internet
connection.
In the event Fundtech is completely inaccessible, ABC also utilizes Fedline Advantage for wire transfers and maintains a
backup router in the San Gabriel branch, which connects to Fedline Advantage over a dedicated MegaPath 1.5Mbps
internet connection.
This system is set up to transmit wires; however, incoming wires are received at the Fundtech terminal.  In order for
Fedline Advantage to receive wires, ABC must contact the Federal Reserve to change the endpoint of the wires.  Only the IS
department and the wire department are authorized to change the recipient of the wires; the IT department is unable to
perform this process.
3. SWIFT
SWIFT has its own Business Continuity Plan to provide for prompt response to a disaster.  SWIFT maintains two
redundant sites in the United States and Belgium; site‐to‐site replication happens in real‐time.
The SWIFT application is installed on SERVER, a physical machine located in the primary data center.  A backup server,
SWIFTDR, is located in the San Gabriel branch, and in the event of a failure at the primary data center, the SWIFT client
can be manually reconfigured, under “Instance Configuration”, to point to the backup server.
ABC connects to SWIFT through a dedicated AT&T 1.5Mbps circuit at the primary data center.  In the event of a circuit
failure, the connection will automatically failover to a VPN connection over the 50Mbps Sprint internet connection.
4. Prologue
The Prologue application is installed on a virtual machine (PROLOGUESVR) located in the primary data center, and is
backed up via both Nimble replication and VDP.  In the event of a server failure, the server can be restored from either a
Nimble or VDP snapshot.


5. Paragon
The Paragon application is installed on PROLOGUESVR and would use the same restore procedure as listed above.
6. Patriot Officer
The Patriot Officer application is installed on a physical machine (OFFICER) located in the primary data center, and is
backed up nightly via both BESR and BUE.  In the event of a server failure, the server can be restored from a BESR image
or from a BUE tape backup.
7. SQL Application Servers
Individual SQL databases on SQL1, SQL2, WEBSENSE, and PROFITS are backed up nightly via SQLAGENT.  SQL1, SQL2, and
WEBSENSE are also backed up via both Nimble Replication and VDP.  In the event of a database failure in an application,
the SQL database for that application can be restored from the SQLAGENT database backup.  In the event of a server
failure, the server can be restored from either a Nimble or VDP snapshot.
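The restore options listed in this section can be kept as a small lookup table, which also makes gaps visible. The sketch below records only what this section states (Paragon shares PROLOGUESVR, so it is not listed separately); the table layout and names are hypothetical. An empty result, such as for a server‐level failure of PROFITS, means no server‐level backup is described in this section, not necessarily that none exists elsewhere in the plan.

```python
from typing import Dict, List, Set

# Backup methods per server, as described in this section of the plan.
BACKUP_METHODS: Dict[str, Set[str]] = {
    "PROLOGUESVR": {"Nimble", "VDP"},
    "OFFICER": {"BESR", "BUE"},
    "SQL1": {"SQLAGENT", "Nimble", "VDP"},
    "SQL2": {"SQLAGENT", "Nimble", "VDP"},
    "WEBSENSE": {"SQLAGENT", "Nimble", "VDP"},
    "PROFITS": {"SQLAGENT"},
}

# Methods this plan uses to rebuild a whole server versus a single database.
SERVER_LEVEL = {"Nimble", "VDP", "BESR", "BUE"}
DATABASE_LEVEL = {"SQLAGENT"}


def restore_options(server: str, failure: str) -> List[str]:
    """Return the documented restore sources for a 'server' or 'database' failure."""
    if failure == "server":
        wanted = SERVER_LEVEL
    elif failure == "database":
        wanted = DATABASE_LEVEL
    else:
        raise ValueError("failure must be 'server' or 'database'")
    return sorted(BACKUP_METHODS[server] & wanted)
```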


Backup Process – IT Support Services
1. Backup Server
Symantec Backup Exec 12.5 is installed on a physical machine (BACKUP2) located in the primary data center, and is
backed up via BESR.  Like the other BESR backups, BESR images are copied to the backup server and then written to tape.
Since recovering from tape would require a functional backup server, the backup server image is also copied to an
attached USB hard drive.  In the event of a physical server failure, the server can be restored using a BESR recovery CD
and the BESR image on the attached USB drive.
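Because restoring from tape requires a working BACKUP2, restore work has a dependency order. The sketch below expresses that order as a graph and sorts it with Python's standard graphlib module (Python 3.9 or later); only the BACKUP2 and BUE dependencies come from this plan, and the node names are hypothetical labels. graphlib raises CycleError if the dependencies ever become circular.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Each key lists what must be available before it can be restored.
# The BACKUP2 and BUE entries follow this plan; the tape-restore targets are
# the servers BUE backs up. Extend with real dependencies as needed.
DEPENDENCIES: Dict[str, Set[str]] = {
    "BACKUP2": {"BESR recovery CD", "BESR image on USB drive"},
    "BUE tape restores": {"BACKUP2"},
    "EXCHWEST (from tape)": {"BUE tape restores"},
    "NASCOLO (from tape)": {"BUE tape restores"},
    "OFFICER (from tape)": {"BUE tape restores"},
}


def restore_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every item follows everything it depends on."""
    return list(TopologicalSorter(deps).static_order())
```

Every valid order starts with the BESR recovery CD and USB image, then BACKUP2, then BUE tape restores, and only then the servers restored from tape.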
2. System Center
Microsoft System Center Configuration Manager 2012 is installed on a virtual machine (SCENTER) located in the primary
data center, and is backed up via both Nimble replication and VDP.  In the event of a server failure, the server can be
restored from either a Nimble or VDP snapshot.
3. Sophos
Sophos is installed on a virtual machine (SOPHOS) located in the primary data center, and is backed up via both Nimble
replication and VDP.  In the event of a server failure, the server can be restored from either a Nimble or VDP snapshot.
4. McAfee
McAfee Data Protection is installed on a virtual machine (MCAFEE) located in the primary data center, and is backed up
via both Nimble replication and VDP.  In the event of a server failure, the server can be restored from either a Nimble or
VDP snapshot.
5. SolarWinds
SolarWinds Network Performance Monitor, NetFlow Traffic Analyzer, and Server & Application Monitor are installed on
a virtual machine (SOLARW) located in the primary data center, which is backed up via both Nimble replication and VDP.
In the event of a server or application failure, the server can be restored from either a Nimble or VDP snapshot.
6. Helpdesk
ManageEngine Service Desk Plus 8.2 is installed on a virtual machine (HELPDESK) located in the primary data center, and
is backed up via both Nimble replication and VDP.  In the event of a server failure, the server can be restored from either
a Nimble or VDP snapshot.


Restore Procedures – IT Core Services
A moderate level of expertise in Windows Server, Active Directory, and Microsoft Exchange systems administration is
required. If qualified ABC IT personnel are unavailable, a qualified IT vendor should be contracted to perform the
required work.
1. Power
All IT disaster recovery efforts will require power. The corporate offices, primary data center, and each branch site are
equipped with one or more Uninterruptible Power Supply (UPS) units with enough battery capacity to provide short-term
AC power for 15 to 90 minutes; however, full recovery will not be possible without working utilities. ABC’s branches
and departments should develop offline procedures for each location to be implemented in the event of an extended
power outage.
2. Network Connectivity
All ABC branches have a 3G wireless modem as a backup to the Sprint MPLS circuit. In the event of a branch WAN
circuit failure, execute the following steps to bring up the 3G wireless connection:
a. Check whether the Tunnel2000 interface is up; if it is, the 172.20.5.x network (primary colo data
center) is not visible to the branch
b. To force the modem to dial, ping the Cellular interface; this triggers the 3G modem to activate and connect to
the Sprint MPLS network, and the tunnel should then come up automatically
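The two checks above can be scripted against captured router output. The following is a minimal Python sketch, assuming an operator has saved the output of the Cisco IOS "show ip interface brief" command from the branch router as text. The helper names are hypothetical, and the sketch only reads that output; it does not log into the router or dial the modem.

```python
# Minimal sketch (hypothetical helpers, not part of the DR guide): interpret
# "show ip interface brief" output captured from a branch router.


def parse_interface_brief(output: str) -> dict:
    """Map interface name -> (status, protocol) from "show ip interface brief"."""
    interfaces = {}
    for line in output.splitlines():
        parts = line.split()
        # Skip blank/short lines and the header row.
        if len(parts) < 6 or parts[0] == "Interface":
            continue
        name, protocol = parts[0], parts[-1]
        # Status can be two words, e.g. "administratively down".
        status = " ".join(parts[4:-1])
        interfaces[name] = (status, protocol)
    return interfaces


def tunnel_up(output: str, tunnel: str = "Tunnel2000") -> bool:
    """Step a: True only if the tunnel's status and line protocol are both "up"."""
    status, protocol = parse_interface_brief(output).get(tunnel, ("missing", "missing"))
    return status == "up" and protocol == "up"


def cellular_interfaces(output: str) -> list:
    """Step b: interfaces whose names start with "Cellular" (naming is an assumption)."""
    return [name for name in parse_interface_brief(output) if name.startswith("Cellular")]
```

For example, tunnel_up(output) returning True corresponds to the condition described in step a, and cellular_interfaces(output) lists the interfaces whose addresses would be pinged in step b.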
The Savvis DR site in Tukwila, Washington also has a Sprint WAN router that can be activated to join the MPLS WAN.
To bring up the Tukwila DR site WAN router, execute the following steps:
a. Contact Sprint and request activation of the Tukwila, Washington Gi0/1 interface
b. Configure LAN IP addressing to use the 172.20.5.x subnet
c. Check to confirm MPLS routers can see the new 172.20.5.x network
d. Configure the router to redistribute static routes to BGP
e. Configure the router to advertise the default route (0.0.0.0)
f. For testing purposes, set the router’s LAN to use the 172.20.50.x subnet so it will not conflict with the primary
data center LAN; to connect individual servers, use NAT to translate from 172.20.50.x to
172.20.5.x hosts
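Step f relies on a one-to-one mapping between test and production addresses. The Python sketch below computes that mapping for the individual servers being connected. It assumes both subnets are /24 networks and that each server keeps the same host number (for example, 172.20.50.72 maps to 172.20.5.72). It only plans the translations and does not configure NAT on the router.

```python
import ipaddress

# Subnets from step f; the /24 prefix length is an assumption.
TEST_NET = ipaddress.ip_network("172.20.50.0/24")  # DR router LAN during testing
PROD_NET = ipaddress.ip_network("172.20.5.0/24")   # primary data center addressing


def translate(ip: str, src=TEST_NET, dst=PROD_NET) -> str:
    """Return the address in dst with the same host offset as ip has in src."""
    if src.prefixlen != dst.prefixlen:
        raise ValueError("source and destination subnets must be the same size")
    addr = ipaddress.ip_address(ip)
    if addr not in src:
        raise ValueError(f"{ip} is not in {src}")
    offset = int(addr) - int(src.network_address)
    return str(dst.network_address + offset)


def nat_table(ips, src=TEST_NET, dst=PROD_NET) -> dict:
    """One-to-one translation table for the individual servers being connected."""
    return {ip: translate(ip, src, dst) for ip in ips}
```

Keeping the host number identical on both sides makes each translation easy to verify by eye when the NAT entries are entered on the router.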
3. Active Directory Domain Services
Unlike Active Directory user data, which is replicated between all AD domain controllers, each FSMO (Flexible Single Master
Operations) role can reside on only one server at a time. The roles are currently held by the following AD domain controllers (DCs):
DC1
• PDC Emulator
• Schema Master
DC2
• Domain Naming Master
DC3
• RID Master
• Infrastructure Master


In the event of failure of one or more of the DCs, it may be necessary to “seize” the missing roles onto DC4 or one of the
other surviving servers. To do this, execute the following steps:
a. Log on to DC4 (or another surviving DC) with an account that is a member of the Enterprise Admins group
b. Click Start | Run, type “ntdsutil” in the Open field, then click [OK]
c. Type “roles” then press [ENTER]; this takes you to the fsmo maintenance prompt
d. Type “connections” then press [ENTER]; this takes you to the server connections prompt
e. Type “connect to server DC4” (or the name of the DC you logged on to) then press [ENTER]
f. At the server connections prompt, type “q” then press [ENTER]; this takes you back to the fsmo
maintenance prompt
g. Type “seize <role>” then press [ENTER], where <role> is the FSMO role you want to seize: schema master,
naming master, RID master, infrastructure master, or PDC (type “?” at the fsmo maintenance prompt to list the exact
commands for your Windows Server version); repeat the seize <role> command for each role held by a failed DC.
If the current role holder is still online, use “transfer <role>” instead, which moves the role gracefully
h. Once done seizing roles, at the fsmo maintenance prompt, type “q” then press [ENTER] to return to
the ntdsutil prompt, then type “q” and press [ENTER] to quit
Once FSMO roles have been seized from a failed server, that server must not be reintroduced into the AD domain as-is;
it must be manually rebuilt and promoted to a DC again, after which the FSMO roles can be manually transferred
back to the rebuilt domain controller
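When several DCs fail at once, it is easy to miss a role. The Python sketch below is a hypothetical helper, not part of any Microsoft tooling. It derives the roles to seize from the FSMO table above and produces the ntdsutil command sequence from steps c through h. It assumes the role keywords shown (schema master, naming master, RID master, infrastructure master, PDC); confirm them with “?” at the fsmo maintenance prompt before use.

```python
# FSMO role holders from the table above, written as the role keywords
# ntdsutil is assumed to accept after "seize" (verify with "?" first).
FSMO_HOLDERS = {
    "DC1": ["PDC", "schema master"],
    "DC2": ["naming master"],
    "DC3": ["RID master", "infrastructure master"],
}


def roles_to_seize(failed_dcs) -> list:
    """Roles held by any failed DC, in the order they appear in the table."""
    failed = set(failed_dcs)
    return [role for dc, roles in FSMO_HOLDERS.items() if dc in failed for role in roles]


def ntdsutil_commands(target_dc: str, failed_dcs) -> list:
    """The ntdsutil command sequence from steps c through h."""
    commands = ["roles", "connections", f"connect to server {target_dc}", "q"]
    commands += [f"seize {role}" for role in roles_to_seize(failed_dcs)]
    commands += ["q", "q"]
    return commands


def command_line(target_dc: str, failed_dcs) -> str:
    """Same sequence as one line; ntdsutil runs quoted arguments as commands."""
    return "ntdsutil " + " ".join(f'"{c}"' for c in ntdsutil_commands(target_dc, failed_dcs))
```

For example, command_line("DC4", ["DC1", "DC3"]) returns a single ntdsutil invocation that seizes the PDC, schema master, RID master, and infrastructure master roles onto DC4. Each seize may still display a confirmation dialog.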
4. File Services
In the event of a regional disaster that causes a failure of the NAS systems in both the primary data center (El Segundo)
and the corporate office, there would be no file services available for the corporate office branch. Surviving branch
locations would still have file services available on their local branch servers, but none of these locations has a server
with enough storage capacity to restore the data from the NAS systems, which is where the bulk of the corporate data is
stored.
As documented in the Observations & Recommendations section, ABC should consider deploying a third NAS
device at the DR site and then enabling DFS replication among all three NAS devices.
5. E-Mail
In the event of a disaster where recovery of the Exchange Server will be delayed, Proofpoint can quickly bring up a
virtual Exchange Server environment, which ABC personnel can access through Outlook Web Access (OWA),
a web-based interface. Proofpoint requires the encryption key to bring up this environment; the key is available for
download from a secure Proofpoint server.
To bring up the temporary virtual Exchange Server environment, contact Proofpoint and provide them with the
encryption key; once the virtual environment is online, Proofpoint will provide the web site address and logon
credentials. Once email services are temporarily hosted by Proofpoint, ABC IT staff can determine the appropriate
next step: recover the physical server or fail over to the Exchange DR server.
To recover the physical Exchange Server (EXCHWEST) using BESR, use the procedure detailed in the section
Restore Procedures – Symantec Backup Exec System Recovery.
If it will be an extended period of time before the primary Exchange Server can come back online, and ABC prefers
to host email internally rather than use the Proofpoint-provided OWA interface, ABC can decide to fail over to the
Exchange DR server.
The failover process assumes the Exchange DR server has been set up and prepared according to the documentation on
TechNet, available here: http://technet.microsoft.com/en-us/library/bb676502(v=exchg.80).aspx.
The process to fail over to the Exchange DR server, which in Microsoft parlance is the “SCR target” (Standby
Continuous Replication target), is documented here:
http://technet.microsoft.com/en-us/library/bb738132(v=exchg.80).aspx.
6. Telecommunications
In the event of a regional disaster that causes a failure of the Cisco UCS servers in both the primary data center (El
Segundo) and the corporate office, there would be no telecommunications services available for the corporate office
and branches, except for the very basic capabilities provided by SRST (Survivable Remote Site Telephony).
As documented in the Observations & Recommendations section, ABC should consider relocating the corporate office
Cisco UCS server to the DR site.


Restore Procedures – Business Application Services
1. Fiserv
There is a backup Fiserv router at the Tukwila DR site in the event of a failure at the primary data center. Bringing this
backup router online is a manual process that requires assistance from Fiserv technical personnel.
To bring this router online:
a. Call Fiserv and ask for their networking support department
b. Connect an internet-accessible laptop to the router via the Cisco (blue) console cable; a Savvis engineer will
perform this step
c. The Fiserv technician will connect remotely to the laptop and configure the router
d. IT staff should advertise the appropriate routes (via the routing protocol) over the MPLS network so that data is
routed correctly
2. Fundtech
In order for Fedline Advantage to receive wires, ABC must contact the Federal Reserve to change the endpoint of the wires.
Only the IS department and the wire department are authorized to change the recipient of the wires; the IT department
cannot redirect wires.
If you wish to manually route all traffic through the Fedline Advantage backup router in the San Gabriel
branch, execute the following procedure:
a. Log into the primary data center router (172.20.5.1) and delete the route to 170.209.0.0 via 172.20.5.249; this
will allow the route advertised from San Gabriel to propagate
b. To revert, re-enter the route
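Because step b requires re-entering exactly the route that was deleted, it helps to capture the route line before removing it. The Python sketch below is a hypothetical helper that assumes the route is an IOS static route of the form "ip route <prefix> <mask> <next-hop>" and that the router's running configuration has been saved as text. The guide does not state the mask, so the sketch reads it from the configuration rather than assuming one.

```python
import re

# "ip route <prefix> <mask> <next-hop> [...]" as it appears in an IOS running config.
ROUTE_RE = re.compile(r"^ip route (\S+) (\S+) (\S+)")


def find_static_route(running_config: str, prefix: str, next_hop: str):
    """Return the 'ip route' line for prefix via next_hop, or None if absent."""
    for line in running_config.splitlines():
        line = line.strip()
        match = ROUTE_RE.match(line)
        if match and match.group(1) == prefix and match.group(3) == next_hop:
            return line
    return None


def fedline_route_commands(running_config: str,
                           prefix: str = "170.209.0.0",
                           next_hop: str = "172.20.5.249") -> dict:
    """Config-mode commands for step a (failover) and step b (revert)."""
    route = find_static_route(running_config, prefix, next_hop)
    if route is None:
        raise LookupError(f"no static route to {prefix} via {next_hop} found")
    return {
        "failover": ["configure terminal", f"no {route}", "end"],
        "revert": ["configure terminal", route, "end"],
    }
```

Given a running configuration containing a line such as “ip route 170.209.0.0 <mask> 172.20.5.249”, fedline_route_commands returns the “no ip route …” command for step a and the original line, unchanged, for step b.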
3. SWIFT
In the event of a failure on SERVER, the SWIFT client can be reconfigured to use SWIFTDR. IT personnel or a
knowledgeable end-user can do this by changing the SWIFT client’s “Instance Configuration” to point to the backup
server, SWIFTDR (172.20.20.8).
Once SERVER is back online, change the SWIFT client’s “Instance Configuration” to point back to the primary
server, SERVER (172.20.5.72).


Restore Procedures – SQL Server Agent
ABC uses SQL Server Agent (SQLAGENT) to back up a number of SQL Server machines and databases; SQL Server Agent
simply automates the execution of a series of T-SQL scripts that perform the database backups.
The specific machines that can be recovered using SQLAGENT backups include the following:
• SQL1
• SQL2
• WEBSENSE
• ALLPROFITS – located at the Rowland Heights branch
The following is a generic T-SQL restore script for a database using the full recovery model; detailed information and
supporting documentation regarding this process is available on TechNet:
http://technet.microsoft.com/en-us/library/ms175510.aspx.
This example restores a full database backup and a differential database backup of the MyAdvWorks database from the
backup device MyAdvWorks_1, followed by two transaction log backups from the devices MyAdvWorks_log1 and
MyAdvWorks_log2:
-- Assume the database is lost at this point. First restore the full
-- database backup (the first backup set on MyAdvWorks_1).
-- NORECOVERY leaves the database in the restoring state so that
-- subsequent restore operations can proceed.
RESTORE DATABASE MyAdvWorks
   FROM MyAdvWorks_1
   WITH FILE = 1,
   NORECOVERY;
GO
-- Now restore the differential database backup, the second backup set on
-- the MyAdvWorks_1 backup device.
RESTORE DATABASE MyAdvWorks
   FROM MyAdvWorks_1
   WITH FILE = 2,
   NORECOVERY;
GO
-- Now restore, in order, each transaction log backup created after
-- the differential database backup. Use NORECOVERY for every log
-- backup except the last; RECOVERY on the last one brings the
-- database online.
RESTORE LOG MyAdvWorks
   FROM MyAdvWorks_log1
   WITH NORECOVERY;
GO
RESTORE LOG MyAdvWorks
   FROM MyAdvWorks_log2
   WITH RECOVERY;
GO
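The RESTORE sequence above follows a fixed pattern: full backup, optional differential, then log backups oldest first, with NORECOVERY on every step except the last. The Python sketch below generates that pattern for other databases. It only builds the T-SQL text and does not connect to SQL Server. The function name and parameters are hypothetical. It assumes logical backup device names as in the example; backups written to disk files would need FROM DISK = '...' instead.

```python
def restore_script(db, full_device, diff_device=None, diff_file=2, log_devices=()):
    """Build (but do not run) a T-SQL restore sequence for a full-recovery database.

    Order: full backup, optional differential, then log backups oldest first.
    Every step uses NORECOVERY except the final one, which uses RECOVERY.
    """
    # Each step: (statement, backup device, extra WITH options)
    steps = [("RESTORE DATABASE", full_device, ["FILE = 1"])]
    if diff_device:
        steps.append(("RESTORE DATABASE", diff_device, [f"FILE = {diff_file}"]))
    steps += [("RESTORE LOG", device, []) for device in log_devices]

    batches = []
    for i, (statement, device, options) in enumerate(steps):
        is_last = i == len(steps) - 1
        # NORECOVERY keeps the database restoring so later backups can be applied;
        # RECOVERY on the last step brings the database online.
        options = options + ["RECOVERY" if is_last else "NORECOVERY"]
        batches.append(f"{statement} {db}\n   FROM {device}\n   WITH {', '.join(options)};\nGO")
    return "\n".join(batches)
```

For example, restore_script("MyAdvWorks", "MyAdvWorks_1", "MyAdvWorks_1", 2, ["MyAdvWorks_log1", "MyAdvWorks_log2"]) produces the same four restore steps as the example above, with FILE = 1 stated explicitly on the first.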


Restore Procedures – Symantec Backup Exec System Recovery
ABC uses Symantec Backup Exec System Recovery (BESR) to back up physical servers.
The specific servers that can be recovered using BESR include the following:
• All (11) branch servers, excluding the corporate office branch
• All (7) servers located in the primary data center, excluding the VMware hosts:
   o BACKUP2
   o EXCHWEST
   o NASCOLO
   o OFFICER
   o OWA
   o VCENTER
   o SERVER
• REPORTPC – located in the Treasury department at headquarters
Depending on the nature and timing of the server failure, BESR can be used to perform a server recovery.
To recover the physical server, execute the following procedure:
a. Build the replacement machine and configure RAID storage volumes as appropriate for the application
b. Boot from the BESR Recovery CD
c. Start networking services and configure the IP address for the primary network adapter; once configured, test
connectivity to the LAN
d. Connect to BACKUP2, browse to the BESR Images folder for the server, then select the latest or desired server
image; this will begin the recovery process
e. If you are restoring BACKUP2 itself, browse the attached USB drive for the BESR server image instead
f. Once the recovery process is complete, remove the BESR Recovery CD from the CD-ROM drive, then reboot the
server
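Step d asks for the latest or desired image in the server’s BESR Images folder on BACKUP2. If that folder is reachable from a working machine, the Python sketch below picks the newest image file by modification time, or the newest one no later than a chosen time. The .v2i extension and any folder path are assumptions; check them against the actual BESR Images folder. Incremental recovery points, if in use, are not considered here.

```python
from pathlib import Path


def latest_image(folder, pattern="*.v2i", before=None):
    """Return the newest image file in folder, optionally no newer than `before`.

    `pattern` assumes BESR recovery points use the .v2i extension; adjust it to
    match the BESR Images folder. `before` is a POSIX timestamp used to pick a
    "desired" image instead of the latest one.
    """
    candidates = [p for p in Path(folder).glob(pattern) if p.is_file()]
    if before is not None:
        candidates = [p for p in candidates if p.stat().st_mtime <= before]
    if not candidates:
        raise FileNotFoundError(f"no images matching {pattern} in {folder}")
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, latest_image(r"\\BACKUP2\BESR Images\EXCHWEST"), using a hypothetical share path, would return the most recent image file for EXCHWEST.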


Restore Procedures – Nimble Snapshot
ABC uses Nimble snapshots to back up virtual machines; ABC retains approximately 14 days of snapshots on the SAN
located at the primary data center and approximately 45 days of snapshots on the SAN located at the DR site.
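The two retention windows above determine which SAN can still supply a given restore point. The Python sketch below makes that check explicit using the approximate figures quoted (14 and 45 days). Treat results near the boundaries with caution, since both figures are approximate; the function name is hypothetical.

```python
from datetime import date

# Approximate snapshot retention quoted above, in days.
NIMBLE_RETENTION_DAYS = {
    "primary data center SAN": 14,
    "DR site SAN": 45,
}


def sans_holding(restore_point: date, today: date) -> list:
    """SANs expected to still hold a snapshot from restore_point."""
    age = (today - restore_point).days
    if age < 0:
        raise ValueError("restore point is in the future")
    return [san for san, days in NIMBLE_RETENTION_DAYS.items() if age <= days]
```

A restore point older than about 45 days will not be on either SAN; the vSphere Data Protection backups are the fallback in that case.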
The specific servers that can be recovered using a Nimble snapshot include the following:
• CANONAPP
• CANONOCR
• CANONSQL
• DC1
• DC2
• DEPCON
• FTP
• GLBA
• HELPDESK
• INTRANET
• IS
• MCAFEE
• MPS
• NESSUS
• PRINTSVR
• RDP
• SCENTER
• SOLARW
• SOPHOS
• SQL1
• SQL2
• TACACS
• TSGW
• TSOFT
• WDS
• WEBSENSE
• WEBTEST
• WSUS
• PROLOGUE
To recover a virtual machine from a Nimble snapshot, execute the following procedure:
a. From a web browser, access the DR site Nimble Web Client at https://172.20.50.29/
b. On the logon page, enter an administrator password then click [Log In]
c. Go to Manage | Volumes
d. Select VMSAN1 or VMSAN2, as appropriate, then go to the Snapshot tab
e. Select the latest, or desired, snapshot from the list of available snapshots
f. Click [Clone]
g. Go to Home then Manage | Volumes
h. Select the cloned volume then click [Online]
i. Select Edit Volume then go to the Access tab
j. Click [Add], select Unrestricted Access, then click [OK]
k. Launch the VMware vSphere Client and log on to DRVCENTER
l. Select the VMware host server you wish to restore onto
m. Go to the Configuration tab, select Storage, then click Rescan All…
n. Click Add Storage…, select Disk/LUN, then click [Next]
o. When prompted, choose Assign New Signature then click [OK]
p. Browse the new datastore, right-click the *.VMX file, then select Add to Inventory
q. The new VM will appear in the inventory in a powered-off state

When snapshots are restored in the DR environment, they will be configured on the VMNET5 VLAN. At the
Tukwila DR site, VMNET5 is isolated and should be used for testing only. To bring the restored server into the
production DR environment, set the VLAN to VMNET50. Do not perform a test restore onto the production DR
network; once the test machine is powered up, it will cause IP address and hostname conflicts.


Restore Procedures – vSphere Data Protection
ABC uses vSphere Data Protection (VDP) to back up virtual machines; VDP backups are retained for 66 weeks to allow for
extended point-in-time restoration capability.
The specific servers that can be recovered using a VDP backup include the following:
• CANONAPP
• CANONOCR
• CANONSQL
• DC1
• DC2
• DEPCON
• FTP
• GLBA
• HELPDESK
• INTRANET
• IS
• MCAFEE
• MPS
• NESSUS
• PRINTSVR
• RDP
• SCENTER
• SOLARW
• SOPHOS
• SQL1
• SQL2
• TACACS
• TSGW
• TSOFT
• WDS
• WEBSENSE
• WEBTEST
• WSUS
• PROLOGUE
To recover a virtual machine from a VDP backup, execute the following procedure:
a. From a web browser, access the vSphere Web Client at https://drvcenter:9443/vsphere-client/
b. On the Credentials page, enter an administrator username and password then click [Login]
c. In the vSphere Web Client, select vSphere Data Protection
d. On the Welcome to vSphere Data Protection page, select the appropriate VDP appliance then click [Connect]
e. Click the Restore tab then click the [Restore] button; this brings up the Restore Virtual Machines wizard
f. On the Select Backup page, specify the source from which to restore then click [Next]
g. If the VM has more than one backup point, deselect all except the point you want to restore
h. On the Set to Restore page, confirm that the client and backup restore point are correct
i. Select Restore to Original Location or, to restore to an alternate location for testing or other purposes, uncheck
the Restore to Original Location check box and specify the alternate Destination and Datastore
j. Click [Next] to confirm the selected options
k. On the Ready to Complete page, review the configuration then click [Finish] to begin the restore process
l. You can monitor the restore progress in the Recent Tasks pane

When backups are restored in the DR environment, they will be configured on the VMNET5 VLAN. At the Tukwila
DR site, VMNET5 is isolated and should be used for testing only. To bring the restored server into the production
DR environment, set the VLAN to VMNET50. Do not perform a test restore onto the production DR network;
once the test machine is powered up, it will cause IP address and hostname conflicts.


Restore Procedures – Symantec Backup Exec
ABC uses Symantec Backup Exec (BUE) to back up data to tape, including BESR and SQLAGENT backup data, as well as
file system data from several physical servers. Backups are performed nightly and backup tapes are stored offsite at a
secure DataLOK facility. The current backup tape retention policy is for daily INCREMENTAL tape sets to remain offsite
for 2 weeks and monthly FULL tape sets to remain offsite for 24 months.
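Before requesting tapes from DataLOK, it can help to check which tape sets are still offsite for the date of interest. The Python sketch below applies the retention policy above (2 weeks for daily INCREMENTAL sets, 24 months for monthly FULL sets). It approximates months by calendar month and ignores the day of the month, so dates near the 24-month boundary should be checked against the physical tape log. The function names are hypothetical.

```python
from datetime import date, timedelta

INCREMENTAL_OFFSITE = timedelta(weeks=2)  # daily INCREMENTAL tape sets
FULL_OFFSITE_MONTHS = 24                  # monthly FULL tape sets


def months_between(earlier: date, later: date) -> int:
    """Whole calendar months from earlier to later (day of month ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def tape_sets_to_request(backup_date: date, today: date) -> list:
    """Tape set types whose offsite retention still covers backup_date."""
    if backup_date > today:
        raise ValueError("backup date is in the future")
    sets = []
    if today - backup_date <= INCREMENTAL_OFFSITE:
        sets.append("daily INCREMENTAL")
    if months_between(backup_date, today) <= FULL_OFFSITE_MONTHS:
        sets.append("monthly FULL")
    return sets
```

An empty result means neither tape set type should still be offsite under the current policy, so the desired data is unlikely to be recoverable from DataLOK.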
Backup and file system data that can be recovered from BUE tape backups include the following:
• BESR images on BACKUP2
• SQLAGENT database backups on BACKUP2
• EXCHWEST
• NASCOLO
• OFFICER
The following is a high-level procedure that can be used to recover data from a BUE tape backup; detailed information
and supporting documentation regarding this process can be found in the Backup Exec 12.5 for Windows Servers
Administrator's Guide, available here:
http://kbdownload.symantec.com/resources/sites/BUSINESS/content/live/DOCUMENTATION/2000/DOC2284/en_US/BE12.5-AdminGuide-308400.pdf
a. Data recovery using this procedure first requires retrieval of the appropriate tapes from DataLOK. The tapes needed
can be determined by searching the backup catalog in the BUE console or, if the desired data set is beyond the BUE
backup catalog retention range, by manually searching the physical tape log for the collection of tapes that
should contain the desired restore data.
b. Once tapes are retrieved and loaded into the tape backup unit (TBU), all tapes should be cataloged.  The catalog
process retrieves from each tape a media ID (label), date/time of the backup, type of backup, and specific
directories/files that were backed up.
c. Once the backup tapes have been cataloged, the catalog can be searched and specific files and/or folders can be
selected for restore.
d. Data can be restored to its original location, overwriting any data present, or data can be restored to an
alternate location.
