Module 4 disaster recovery student slides ver 1.0

12/13/2011
1
Module 4
BC & DR Course [Student Slides], Copyright © 2011 Secrivacy.com, Part of Al-Taysir Consulting
 DR Planning Methodology
 Assess risk to IT and technology-related
components within the IT environment
 Power Recovery
 Telecommunications Recovery
 Data Storage Recovery

 A fault-tolerant system is a computer system or component designed so
that, if a component fails, a backup component or procedure can
immediately take its place with no loss of service. Fault tolerance can be
provided in software, embedded in hardware, or provided by some
combination of the two.
 In the software implementation, the operating system provides an
interface that allows a programmer to "checkpoint" critical data at pre-
determined points within a transaction. In the hardware
implementation (for example, with Stratus and its VOS operating
system), the programmer does not need to be aware of the fault-
tolerant capabilities of the machine.
 At a hardware level, fault tolerance is achieved by duplexing each
hardware component. Disks are mirrored. Multiple processors are "lock-
stepped" together and their outputs are compared for correctness.
When an anomaly occurs, the faulty component is determined and
taken out of service, but the machine continues to function as usual.

 Failover is a backup operational mode in which the functions of a system component (such as a processor,
server, network, or database) are assumed by secondary system components when the primary component
becomes unavailable through either failure or scheduled downtime. Used to make systems more
fault-tolerant, failover is typically an integral part of mission-critical systems that must be constantly
available. The procedure involves automatically offloading tasks to a standby system component so that
the transition is as seamless as possible to the end user. Failover can apply to any aspect of a system:
within a personal computer, for example, failover might be a mechanism to protect against a failed
processor; within a network, failover can apply to any network component or system of components,
such as a connection path, storage device, or Web server.
 Originally, stored data was connected to servers in very basic configurations: either point-to-point or
cross-coupled. In such an environment, the failure (or even maintenance) of a single server frequently
made data access impossible for a large number of users until the server was back online. More recent
developments, such as the storage area network (SAN), make any-to-any connectivity possible among
servers and data storage systems. In general, storage networks use many paths between the server and
the storage system, each path consisting of a complete set of all the components involved. A path fails if
any individual component in it fails, so multiple connection paths, each with redundant components, are
used to help ensure that the connection is still viable even if one (or more) paths fail. The capacity for
automatic failover means that normal functions can be maintained despite the inevitable interruptions
caused by equipment problems.
 High availability describes a system or component that is continuously operational for a desirably long
length of time. Availability can be measured relative to "100% operational" or "never failing." A widely held
but difficult-to-achieve standard of availability for a system or product is known as "five 9s" (99.999
percent) availability.
 Since a computer system or a network consists of many parts in which all parts usually need to be present
in order for the whole to be operational, much planning for high availability centers
around backup and failover processing and data storage and access.
 For storage, a redundant array of independent disks (RAID) is one approach. A more recent approach is
the storage area network (SAN).
 Some availability experts emphasize that, for any system to be highly available, the parts of a system
should be well-designed and thoroughly tested before they are used. For example, a new application
program that has not been thoroughly tested is likely to become a frequent point-of-breakdown in a
production system.
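The automatic failover across redundant paths described above can be sketched in a few lines. This is a minimal illustration, not a real multipathing driver: it assumes each path is represented by a callable that either completes the write or raises ConnectionError. The names send_with_failover, AllPathsFailed, primary_path and secondary_path are hypothetical.

```python
from typing import Callable, List, Sequence


class AllPathsFailed(Exception):
    """Raised when every redundant path has failed (hypothetical error type)."""


def send_with_failover(paths: Sequence[Callable[[bytes], str]], payload: bytes) -> str:
    """Try each path in priority order and fail over to the next one on error."""
    errors: List[Exception] = []
    for path in paths:
        try:
            return path(payload)  # the first healthy path completes the write
        except ConnectionError as exc:
            errors.append(exc)  # record the failed path, then fail over
    raise AllPathsFailed(f"all {len(errors)} path(s) failed: {errors}")


# Hypothetical paths for illustration: the primary path is down.
def primary_path(data: bytes) -> str:
    raise ConnectionError("primary path down")


def secondary_path(data: bytes) -> str:
    return f"wrote {len(data)} bytes via secondary path"


if __name__ == "__main__":
    print(send_with_failover([primary_path, secondary_path], b"block-0042"))
```

The caller never sees the failed primary path. It only gets an error when no redundant path is left, which is the behaviour the slide attributes to automatic failover.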

 In a computer system, a cluster is a group of
servers and other resources that act like a single
system and enable high availability and, in some
cases, load balancing and parallel processing.
 A cold backup, also called an offline backup, is a
database backup taken while the database is offline and
thus not accessible for updating. This is the safest way
to back up because it avoids the risk of copying data
that may be in the process of being updated. However, a
cold backup involves downtime because users cannot use
the database while it is being backed up.
 When system downtime must be minimized, a hot backup
can provide an alternative to the cold backup. A hot
backup can be done even as users access the database,
but some method must be used to ensure that data being
updated is noted and can be copied when the update is
complete.
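The slide leaves open which method ensures that "data being updated is noted and can be copied when the update is complete." One common approach is change tracking: record every block written while the copy runs, then re-copy those blocks in a short final pass. The toy volume below illustrates that idea under stated assumptions. HotBackupVolume and its methods are hypothetical and do not represent any specific database's backup API.

```python
import threading
from typing import Callable, List, Optional


class HotBackupVolume:
    """Toy block volume that stays writable while a hot backup runs (hypothetical)."""

    def __init__(self, blocks: List[str]):
        self.blocks = list(blocks)
        self.dirty = set()        # indexes of blocks updated during the backup
        self.tracking = False
        self.lock = threading.Lock()

    def write(self, index: int, value: str) -> None:
        with self.lock:
            self.blocks[index] = value
            if self.tracking:
                self.dirty.add(index)  # note the update so it can be re-copied

    def hot_backup(self, on_block_copied: Optional[Callable[[int], None]] = None) -> List[str]:
        with self.lock:
            self.tracking = True
            self.dirty.clear()
        copy: List[str] = []
        for i in range(len(self.blocks)):
            with self.lock:
                copy.append(self.blocks[i])
            if on_block_copied:
                on_block_copied(i)  # hook that lets the demo simulate a user write
        # Final pass under the lock: re-copy blocks that changed during the first pass.
        with self.lock:
            for i in self.dirty:
                copy[i] = self.blocks[i]
            self.tracking = False
        return copy


if __name__ == "__main__":
    vol = HotBackupVolume(["a", "b", "c"])

    def user_writes(i: int) -> None:
        if i == 2:  # block 0 was already copied when this update arrives
            vol.write(0, "A")

    print(vol.hot_backup(on_block_copied=user_writes))  # the copy includes the update
```

Users can keep writing throughout the copy. Writes pause only briefly while the final pass holds the lock.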

 Cold server
 Backup server whose purpose is solely to be there in case
the main server is lost. The cold server is basically turned
on once to have software installed and configured, and
then is turned off until needed.
 Warm server
 Backup server that is turned on periodically to receive
updates from the server being backed up. Warm servers
are often used for replication and mirroring.
 Hot server
 Backup server that receives regular updates and is
standing by ready (on hot standby) to take over
immediately in the event of a failover.
 Replication is the process of making a replica (a copy) of
something. A replication is a copy.
 On the Internet, a Web site that has been
replicated in its entirety and put on another site
is called a mirror site.

 Resilience
 The ability of an organization to absorb the impact of a business interruption, and
continue to provide a minimum acceptable level of service.
 Resilient
 The processes and procedures required to maintain or recover critical services such as
“remote access” or “end-user support” during a business interruption.
 Response
 The reaction to an incident or emergency to assess the damage or impact and to
ascertain the level of containment and control activity required. In addition to
addressing matters of life safety and evacuation, Response also addresses the
policies, procedures and actions to be followed in the event of an emergency.
 Restoration
 Process of planning for and/or implementing procedures for the repair of hardware,
relocation of the primary site and its contents, and returning to normal operations at
the permanent operational location.
 Resumption
 The process of planning for and/or implementing the restarting of defined business
processes and operations following a disaster. This process commonly addresses the
most critical business functions within BIA specified timeframes.
 Or "five 9s”, refers to a desired percentage of availability of a given
computer system. Such a system would probably have what some refer
to as high availability.
 99.999 availability works out to 5.39 minutes of total downtime -
planned or unplanned - in a given year.
 In one view, there are three approaches to 99.999 (or even 100) percent
availability:
 Special systems that are designed for high availability, such as those from
Tandem (Compaq) and Stratus, in which components are duplicated so that
a backup component is always available. These tend to be expensive,
requiring redundancy of components in which the redundant component is
seldom used.
 Shared component systems that are built so that one active system can back
up another active system if one of them fails.
 Clustering, a variation of the second approach, in which matched
components do not necessarily have to be duplicates of each other.
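The downtime figure and the value of duplicated components can both be checked with simple arithmetic. In this sketch, downtime per year is (1 − A) × 525,600 minutes for a 365-day year, which gives roughly 5.26 minutes for 99.999 percent. Components that must all be present multiply their availabilities. A duplicated component fails only if every copy fails. These formulas assume independent failures and instantaneous, perfect failover, which real systems only approximate. The function names and the 99.9 percent component figures are illustrative assumptions.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def annual_downtime_minutes(availability: float) -> float:
    """Total downtime per year (planned + unplanned) allowed by an availability fraction."""
    return (1 - availability) * MINUTES_PER_YEAR


def series(*parts: float) -> float:
    """Every part must be up for the whole to work: A = A1 * A2 * ... * An."""
    return prod(parts)


def redundant(*replicas: float) -> float:
    """Service survives while any replica is up: A = 1 - (1 - A1)(1 - A2)...(1 - An)."""
    return 1 - prod(1 - a for a in replicas)


if __name__ == "__main__":
    for a in (0.99, 0.999, 0.9999, 0.99999):
        print(f"{a:.3%} -> {annual_downtime_minutes(a):8.2f} minutes/year")
    # Hypothetical figures: server, network and storage each 99.9% available.
    chain = series(0.999, 0.999, 0.999)
    print(f"single chain:     {chain:.5%}")
    print(f"duplicated chain: {redundant(chain, chain):.5%}")
```

Chaining three 99.9 percent parts lowers availability below any single part. Duplicating the whole chain raises it well above, which is why the special-purpose and shared-component designs above rely on duplication.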

 Determine your needs:
 SMBs should take time to decide what critical information should be secured and
protected. Customer, financial and business information, trade secrets and critical
documents should be prioritized. In addition, SMBs should monitor industry reports
that help to identify and prevent threats that SMBs face.
 Engage trusted advisors:
 With limited time, budget and employees, SMBs can look to a solution provider to
help create plans, implement automated protection solutions and monitor for trends
and threats. They can also educate employees on retrieving information from
backups when needed and suggest offsite storage facilities to protect critical data.
 Automate where you can:
 Automating the backup process ensures that it is not overlooked. SMBs can reduce
the costs of downtime by implementing automated tools that minimize human
involvement and address other weaknesses in disaster recovery plans.
 Test annually:
 Recovering data is the worst time to learn that critical files were not backed up as
planned. Disaster recovery testing is invaluable and SMBs should seek to improve
the success of testing by evaluating and implementing testing methods that are non-
disruptive.

 Tornadoes, Hurricanes, Strong winds
 Floods, Snowstorms
 Earthquakes, Electrical storms
 Fires, Subsidence and Landslides
 Freezing Conditions
 Contamination and Environmental
Hazards
 Epidemic
 Organized and/or Deliberate
Disruption
 Act of terrorism, Act of Sabotage
 Act of war
 Theft, Arson, Disgruntled employee.
 Labor Disputes/Industrial Action
 Loss of Utilities and Services
 Electrical power failure
 Loss of gas supply, Loss of water
supply
 Petroleum and oil shortage
 Communications services breakdown
 Loss of drainage / waste removal
 Equipment or System Failure,
Internal power failure, Air
conditioning failure
 Production line failure, Cooling plant
failure
 Equipment failure (excluding IT
hardware)
 Serious Information Security
Incidents
 Cyber crime, Loss of records or data
 Disclosure of sensitive information
 IT system failure
 Mergers and acquisitions
 Negative publicity
 Legal problems

 Full disaster:
 This follows the total destruction of all operations and production
systems. You will need all offsite resources to recover, and the recovery
must take place offsite because the workplace has been destroyed.
 Partial disaster:
 Certain portions of the operational and production systems have
been destroyed or impeded. This will result in a partial recovery
taking place. This recovery may not necessitate an offsite recovery.
A partial disaster could be as limited as one non-critical machine going
down and being restored from backup or an online networked source.
 Minimal disaster:
 Only small, non-critical portions of the operational environment have
been harmed. Virus outbreaks and file deletion may cause these
small disasters, which can usually be recovered from easily using
undelete software or backup tape restores.

 Storing important data off-site and including information on
how to access those data in the disaster recovery plan
 Maintaining hard copies of important data (including the
disaster recovery plan)
 Maintaining current information regarding contacts and system
resources
 Employing malware removal programs
 Utilizing and frequently examining devices such as UPS, fire and
smoke sensors and alarms, and antitheft systems
 Updating compliance assessments (such as for Sarbanes-Oxley)
whenever changes are made to the IT infrastructure
 Documenting all preventive measures in the disaster recovery
plan
 Maintaining backup servers at various locations
 Conducting training sessions

 Factors to consider when developing the plans
 Pre-disaster readiness
 Evacuation procedures
 Escalation procedures
 Circumstances under which a disaster should be declared
 Identification of plan responsibilities
 Identification of contact information
 Recovery option explanations
 Identification of resources for recovery and continued
operation of the organization
 Application of the reconstitution phase

 Identify and assess the risks:
 Identify and list serious incidents that can affect the normal
operations of the organization. Prioritize the list according to severity
level.
 Prioritize business processes:
 In case resources are limited, use those resources efficiently to
provide an effective disaster recovery plan. Prioritize the business
processes that are:
 Most essential to the organization’s mission
 Most and least needed during a disaster
 Prioritize technology services:
 After determining critical business processes, map the processes to
the technology components that make those processes possible. This
information is useful in identifying critical technology environment
components and prioritizing each component accordingly.
 Define recovery strategy:
 The recovery time objective (RTO) is the acceptable amount of time for returning the
services or information availability to an organization after a disaster occurs. The
recovery point objective (RPO) is the amount of data loss that can be considered
acceptable when a disaster occurs. Implementing RTO and RPO for each critical
information system helps in developing a plan that reflects the priorities of the
organization. They also function as tools to check the success of the chosen
strategies when a disaster recovery plan is tested.
 Secure facilities:
 Having efficient technology facilities is useful when a disaster strikes. The technology
facilities should be constructed so that they are secured and protected from any
disruptive events. There are many inexpensive facility tools that can be used to
minimize disaster situations, including electrical surge protectors, power
conditioning units, and fire suppression systems.
 Identify alternate sites:
 Maintain an alternate site for temporarily relocating required systems and resources.
This site should be able to function when the primary location cannot. The alternate
site should contain accurate and current technology environment documentation.
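The "Define recovery strategy" step uses RTO and RPO as yardsticks for candidate strategies. That check can be sketched as follows. The simplifying assumption is that worst-case data loss equals one full interval between backups or replication points, since the disaster may strike just before the next copy. The sketch ignores extra delays such as shipping tapes offsite. RecoveryOption, meets_objectives and the example figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    name: str
    backup_interval_h: float  # hours between backups or replication points
    restore_time_h: float     # estimated hours to bring the service back


def worst_case_data_loss_h(option: RecoveryOption) -> float:
    # A disaster just before the next backup loses a full interval of updates.
    return option.backup_interval_h


def meets_objectives(option: RecoveryOption, rpo_h: float, rto_h: float) -> bool:
    """True if both the data-loss (RPO) and downtime (RTO) targets are met."""
    return worst_case_data_loss_h(option) <= rpo_h and option.restore_time_h <= rto_h


if __name__ == "__main__":
    # Hypothetical objectives for one critical system: RPO 4 h, RTO 8 h.
    options = [
        RecoveryOption("nightly tape to offsite vault", 24, 48),
        RecoveryOption("hourly replication to warm site", 1, 6),
    ]
    for opt in options:
        verdict = "meets" if meets_objectives(opt, rpo_h=4, rto_h=8) else "misses"
        print(f"{opt.name}: {verdict} RPO/RTO")
```

Running the same check during a plan test with measured restore times, rather than estimates, is what lets RTO and RPO serve as tools for judging whether the chosen strategies succeeded.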

 Use redundancy and failover
 An extensive variety of technology solutions can be used for
maintaining application and data continuity when a disaster occurs.
Combining these technologies with a strategy of geographically
dispersed technology resources will help in protecting data.
 Document the plan
 The disaster recovery plan should be documented in sequential
milestones to allow the organization to return to normal operations.
The first milestone should document the process of dealing with the
immediate aftermath of the disaster. This includes notifying key
employees, emergency services, and others who are required to
respond to the disaster. The plan should then include resuming the
operation of the technology services based on the business process
priorities. The roles and responsibilities of all individuals should be
described in the plan. The disaster recovery team should be able to
access the plan even if the primary site is unavailable. Store multiple
copies of the plan off-site.
 Test the plan
 Perform regular tests on the disaster recovery plan in
order to ensure its effectiveness. The tests should be
performed on the complete plan process. A useful testing
technique is to develop a test scenario based on a disaster
situation. After completing the testing, review the results
with the team members to determine any possible
improvements and update the plan accordingly.
 Update the recovery plan
 To make the disaster recovery plan effective, keep it up to
date and applicable to current technology and business
processes. Changes made to the plan should be
communicated to all personnel affected by those changes.

 Establish Disaster Recovery team.
 A disaster recovery team should be led by a team leader who will
drive the project, with authority over the process and influence
with the company's management. This helps get the plan ratified and
ensures that the policy is accepted.
 Document DR team and participants' contact details.
 This step is important because, when a disaster occurs, everyone
involved in the recovery needs to be contacted. This contact list should
include suppliers, IT professionals, support staff, team leaders, senior
members of staff, and all respective DR personnel.
 Establish a DR pack.
 A Disaster Recovery pack needs to be established with all of the
essential documentation and respective contact details of each DR
member.
 Formulate a plan.
 Establish what needs to be available.
 Start with the physical network cabling and connectivity, how the
computers that you recover will communicate, and the infrastructure
and switching fabric used:
▪ Computers
▪ Network
▪ Hardware
▪ Connectivity
▪ Mail
▪ Web
▪ Remote access
▪ VPN
▪ Dial-in
 Do not forget the non-IT infrastructure that overlaps with IT's area of
responsibility. Items such as telephone lines and access controls can
become a disaster on their own if not adequately planned and provided for.
This is why it is vital to involve the whole business and get participation
from all relevant parties to ensure that the plan is complete.

 Ensure that reserve personnel are available.
 If the organization's staff are harmed, alternate staff need to be
brought in to resolve the issues associated with the disaster. For this
reason, it is important to have detailed and up-to-date documentation
so that generic staff can restore complex customized systems. Some
systems are highly dynamic and very challenging to document and
restore, but remember that moon landings and space science have been
documented: if a system is behaving as intended, there is no reason
its documentation cannot be complete. Even so, there is no substitute
for experience, so the reserve staff should be carefully selected.
 Make sure procedure documentation is updated.
 A document that is not kept up to date can be considered useless, as it
may be missing information required to restore the environment. The DR
team leader should be responsible for ensuring that the DRP document, any
annexes, and any other pertinent information are updated. There is no room
for compromise here: this is high risk, and if this process fails, your DRP
is compromised.
 Make sure configuration management documents are updated.
 Configuration documentation is a fallback in case a change causes a disaster. This is
why the change control form is filled out before the change is made and then signed off
after the change is made, and why changes are first implemented on a test system. When
updating documentation, using track changes, saving a new version each time, or using a
version control system may help.
 Ensure a copy of all documentation is offsite and updated.
 All documentation must be sent offsite once it has been updated. When a disaster
happens it is highly likely that the documents will also be destroyed. It is therefore
important to ensure offsite storage of all of your DRP documentation. Ensure that
the documents are stored in a safe place that ensures the integrity and confidentiality of
these documents, as they are likely to contain sensitive information.
 Ensure that all information is protected and stored in a secure location.
 Take a daily backup of all data to tape and store it offsite in a safe environment.
On a weekly basis, update the documentation and upload it to the offsite storage or
send it physically to the remote location. Note that a remote location means a
building that is not just across the road. History has taught us that towers across
the road can also be destroyed.

 Concepts
 Circle of disaster
 What data must be replicated?
 RPO and RTO
 Application throughput and response time
 Inter-site link latency
 Synchronous replication
 Asynchronous replication
 Inter-site link and bandwidth
 High availability and NSPOF inter-site links

 The distance between the primary/local and
secondary/remote sites forms the radius of a circle
 While replicating data to another disk system within
your data center provides a certain level of
protection, it does nothing to protect against an
earthquake, hurricane, fire, or large-scale power
outage that takes out the data center.
 To adequately protect data from these types of
regional disasters, organizations must replicate data
to a secondary site that is geographically distant
enough that any natural or man-made disaster
cannot take out both sites at the same time.
 The nature of the business,
regulations, and competitive
advantages dictate the size
of the circle of disaster.
 Some high-end multi-
national financial
institutions ensure that their
local and remote sites are
on two separate continents
resulting in a circle of
disaster that spans the
entire globe.
 So, distance between the
local and remote site is
critical to reducing risk.

 In an ideal world, organizations would choose to
protect every bit of corporate data. However, such
an all-encompassing data protection strategy would be
prohibitively expensive.
 Therefore, businesses typically classify their
corporate data and prioritize their applications
based on their importance to the business, from
“business critical” to “not important.”
 Business-critical data should be replicated to protect
the business from disasters or accidental data loss.
 Data replication solutions can move data over extreme distances. However,
light in fiber optic cable takes about 5 microseconds to travel each kilometer,
which causes inherent delays, called latency.
 At extreme distances, latency is the limiting factor in replication
performance, regardless of bandwidth. In real life, the quality of the
inter-site link also affects latency, as do the number of hubs, switches,
routers, and firewalls on the network route. Hence, it is essential to test
the inter-site link provided by the Telco vendor to see what the "real"
latency is, and to specify it clearly in the service level agreement with
your Telco vendor.
 There is a level of inter-site link latency beyond which application
performance becomes unacceptable (in synchronous replication) or the
RPO is not met (in asynchronous replication). This point must be
determined individually for each customer's application environment
and business needs.
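The 5 microseconds per kilometer figure sets a physical floor on replication latency. The sketch below turns that figure into a round-trip time for a single replicated write and its acknowledgment. It assumes one round trip per write, although some storage protocols need more than one. The route_km and equipment_ms parameters are illustrative assumptions. The first stands for cable route length, which is usually longer than straight-line distance. The second is an allowance for hubs, switches, routers and firewalls that must be measured on the real link.

```python
FIBER_DELAY_US_PER_KM = 5.0  # one-way propagation delay in fiber, as quoted on the slide


def round_trip_ms(route_km: float, equipment_ms: float = 0.0) -> float:
    """Minimum round trip for one replicated write and its acknowledgment.

    route_km: cable route length (usually longer than the straight-line distance).
    equipment_ms: hypothetical allowance for network devices on the path.
    """
    propagation_ms = 2 * route_km * FIBER_DELAY_US_PER_KM / 1000
    return propagation_ms + equipment_ms


if __name__ == "__main__":
    for km in (10, 100, 1_000, 10_000):
        print(f"{km:>6} km: at least {round_trip_ms(km):6.1f} ms per round trip (fiber only)")
```

No amount of extra bandwidth reduces these numbers, which is why distance, not link size, eventually limits synchronous replication.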

 An RPO of zero signifies that synchronous replication is needed.
Synchronous replication offers the highest level of data
protection: the disks in the local and remote disk arrays are
identical and concurrent at all times. Data is written
simultaneously to the mirrored cache of the local disk array
and to the remote disk array, in real time, before the host
application's I/Os are completed, ensuring the highest
possible data consistency.
 The local disk array therefore waits for acknowledgment that the
replication write has reached the cache of the remote disk
array before sending acknowledgment to the host. The bandwidth
and latency of the inter-site link play an important role in the
application's response time with synchronous replication. In
general, synchronous replication (and shorter RPOs) requires
higher bandwidth.
 Asynchronous replication provides a faster I/O response time
to servers by sending a write completion back to the host
application as soon as the write is received in cache. RPO is
usually greater than zero with asynchronous replication: the
local disk array acknowledges the write as complete to the host
as soon as the write is in the local cache of the storage array.
Data at the remote site will be consistent with the data at the
primary site, but it will lag behind it.
 The bandwidth and latency of the inter-site link have little
effect on application response time with asynchronous
replication, but they significantly affect the RPO of the data
being replicated. From a business perspective, the RPO defines
the tolerance for data loss. The lowest-cost solution provides
just enough bandwidth to meet this objective.
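The difference between the two modes can be modeled with a couple of formulas. In this deliberately simplified sketch, a synchronous write is acknowledged only after the slower of the local cache write and the remote round trip plus remote cache write. An asynchronous write is acknowledged after the local cache write alone. For asynchronous mode, any write rate above link capacity piles up as data not yet at the remote site, which is the RPO exposure. The function names and all numbers are hypothetical.

```python
def host_response_ms(local_write_ms: float, round_trip_ms: float,
                     remote_write_ms: float, synchronous: bool) -> float:
    """Time until the host application receives the write acknowledgment."""
    if synchronous:
        # Local and remote writes start together, but the host is not
        # acknowledged until the remote cache has confirmed over the link.
        return max(local_write_ms, round_trip_ms + remote_write_ms)
    # Asynchronous: acknowledged as soon as the write is in local cache.
    return local_write_ms


def async_backlog_mb(write_rate_mb_s: float, link_mb_s: float, burst_s: float) -> float:
    """Data not yet at the remote site after a write burst that exceeds the link."""
    return max(0.0, (write_rate_mb_s - link_mb_s) * burst_s)


if __name__ == "__main__":
    # Hypothetical figures: 0.5 ms cache writes, 10 ms round trip (about 1,000 km of fiber).
    for sync in (True, False):
        mode = "synchronous " if sync else "asynchronous"
        print(mode, host_response_ms(0.5, 10.0, 0.5, sync), "ms")
    # A 60 s burst of 20 MB/s writes over a 12 MB/s link leaves data queued.
    print("async backlog:", async_backlog_mb(20, 12, 60), "MB")
```

Raising link bandwidth shrinks the asynchronous backlog and therefore the RPO, but it does nothing for the synchronous response time, which is dominated by the round trip.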

 The location of the remote or secondary site determines the
inter-site link technologies that meet your distance and
performance requirements. Bandwidth is the amount of data
that can be transferred from one computer to another in a
certain amount of time. The following factors affect the
bandwidth you need:
 Distance
 Fibre Channel over IP (FCIP) supports the longest distances. You can replicate
data across continents using FCIP inter-site link technology.
 RPO
 With a short RPO, new data must be copied over to the remote disk array
quickly; hence, more bandwidth must be reserved for peak loading.
Applications that require an RPO of zero (no data loss after a site failure) must
replicate application data synchronously to meet this goal. Note that
synchronous replication mode requires more bandwidth than asynchronous
mode, as shown in Figure 3.
 Performance
 When you cannot adjust the distance between sites, you may be able to
increase performance by increasing bandwidth. Data moves through
inter-site links at the speed of light in the medium, whatever the bandwidth.
However, each bit occupies the wire for longer on a low-bandwidth link,
so a given amount of data takes more time to unload from a low-bandwidth
link than from a high-bandwidth link. The same advantage applies to
loading data onto a high-bandwidth link compared to a low-bandwidth link.
 Peak loads
 With synchronous replication, the inter-site link must accommodate the
peak write rate and throughput of your applications. For asynchronous
replication, the inter-site link must be sized to meet the RPO of your
business. Asynchronous replication usually needs less bandwidth, but keep
in mind that it can never achieve an RPO of zero.
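The serialization argument in the Performance bullet can be made concrete. Sending a block takes serialization time, which is size divided by bandwidth, plus propagation time, which depends only on distance. This sketch reuses the slide's 5 microseconds per kilometer figure. The 64 KiB block size and 100 km route are hypothetical, and protocol overhead and acknowledgments are ignored.

```python
def transfer_ms(block_bytes: int, link_mbit_s: float, route_km: float) -> float:
    """One-way time for a block: serialization onto the link plus propagation."""
    serialization_ms = block_bytes * 8 / (link_mbit_s * 1e6) * 1000  # shrinks with bandwidth
    propagation_ms = route_km * 5.0 / 1000  # 5 us/km in fiber, independent of bandwidth
    return serialization_ms + propagation_ms


if __name__ == "__main__":
    block = 64 * 1024  # hypothetical 64 KiB replication write
    for mbit in (10, 100, 1000):
        print(f"{mbit:>5} Mbit/s over 100 km: {transfer_ms(block, mbit, 100):7.3f} ms")
```

At 10 Mbit/s, serialization dominates, so added bandwidth helps a lot. At 1 Gbit/s, the fixed propagation delay is already about half the total, so further bandwidth gives diminishing returns.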

 Data change rate
 An enterprise should determine the rate of change for the logical volumes being
replicated, because it plays a critical role in determining the bandwidth required for
long-distance replication. For example, if the source volume has an average change rate
of 1 MB/s (8 Mbit/s) during peak times, the WAN will need to accommodate at least 1 MB/s.
Additional bandwidth should also be considered to accommodate unusually heavy loads and
growth. However, if the data being replicated is highly compressible, the compression
capabilities of the FCIP SAN extension hardware can be taken into account to provision
less bandwidth and thus save money on recurring bandwidth expenses.
 Data normalization
 A frequently overlooked aspect of link sizing is normalization time: the time it
takes to bring the remote site up to date with the primary initially, or to
resynchronize the primary after a site disaster. Normalization matters during
bandwidth sizing for several reasons:
 During the normalization period, availability is at risk until the remote site has a complete and up-
to-date copy of the data.
 Normalization can consume large portions of the available bandwidth causing degradation of
other applications using the common interconnect.
 It can take a long time for the initial copy to complete if the size of the primary data is large.
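The two sizing concerns above reduce to short calculations, sketched here with illustrative parameters. The first converts peak change rate from MB/s to Mbit/s (×8), adds growth headroom, and divides by an assumed compression ratio. The second is normalization time, which is data size divided by the share of the link the initial copy may use without starving other applications. The 30 percent headroom, 2:1 compression ratio, 2 TB data set and 50 percent link share are hypothetical values, not recommendations.

```python
def required_link_mbit_s(peak_change_mb_s: float, headroom: float = 0.3,
                         compression_ratio: float = 1.0) -> float:
    """Minimum link size for a peak change rate.

    MB/s is converted to Mbit/s (x8), growth headroom is added, and an
    assumed compression ratio (2.0 = data halves in size) is applied.
    """
    return peak_change_mb_s * 8 * (1 + headroom) / compression_ratio


def normalization_hours(data_gb: float, link_mbit_s: float, share: float = 1.0) -> float:
    """Hours for the initial full copy (or a full resync) of data_gb gigabytes.

    share is the fraction of the link the copy may use, so that other
    applications on the common interconnect are not starved.
    """
    bits = data_gb * 8e9  # decimal gigabytes to bits
    return bits / (link_mbit_s * 1e6 * share) / 3600


if __name__ == "__main__":
    print(f"{required_link_mbit_s(1.0):.1f} Mbit/s for 1 MB/s peak with 30% headroom")
    print(f"{required_link_mbit_s(1.0, compression_ratio=2.0):.1f} Mbit/s with 2:1 compression")
    # Hypothetical: 2 TB initial copy over a 100 Mbit/s link, half of it reserved for the copy.
    print(f"{normalization_hours(2000, 100, share=0.5):.1f} hours to normalize")
```

A link sized only for the steady-state change rate can leave the remote copy incomplete, and availability at risk, for days during normalization. That is why normalization time belongs in the sizing exercise.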

 The business criticality of the data being replicated
dictates how available the remote data replication solution
must be. In an active/active data center arrangement,
where applications run on servers in two or more sites,
a highly available system with considerable redundancy
might be required, and bidirectional disaster tolerance is
usually needed. Synchronous replication is typically
deployed with an active/active system to meet the cluster's
requirements. Such an organization usually cannot afford
the risk of a down backup system, so the expense of a
high-availability replication system is required. In the
case of asynchronous replication, the availability
requirements may not be as stringent.

 A combination of preventive, detective and
corrective measures
 Selection depends upon:
 The criticality of the business process and the
applications supporting the processes
 Cost
 Time required to recover
 Security
 Depending on the risk level identified, recovery strategies may include
developing several types of offsite backup facilities:
 Redundant Array of Inexpensive Disks (RAID)
 Cold sites - Basic environment
 Warm sites - Partially equipped but lacking processing power
 Hot sites - Fully equipped facility
 Duplicate (redundant) information processing facility
 Mobile sites
 Reciprocal agreement
▪ Contract with hot, warm or cold site
▪ Procuring alternative hardware facilities
▪ Vendor or third-party
▪ Off-the-shelf
▪ Credit agreement or emergency credit cards

 Provisions for use of third-party sites should
cover:
 Configurations
 Disaster
 Speed of availability
 Subscribers per site and area
 Preference
 Insurance
 Audit
 Reliability

 Obtain continuing senior management support
 Make sure the plan reflects the importance of
communications to the business
 Define hardware, software and facility
requirements for business applications needs
 Identify the amount of time the business can
survive without communications
 Make sure primary servers and related systems
have backups (e.g. redundant CPUs, spare parts)
both on-site and at alternate locations

 Test the spare components regularly to ensure they
will work when needed
 Make sure primary facilities and network systems
have backups available if primary network paths are
disabled
 Test contingency plan elements regularly; test the
entire plan at least once a year
 Document plan elements; establish plan updating
procedures and follow them regularly
 Train and retrain contingency plan members
 Never assume a network is 100% safe and secure.
 Use products from known manufacturers that offer
warranties and emergency recovery options
 Contact other users of the same products for their
experience
 Insist on service level agreements (SLAs) to ensure an
acceptable level of vendor performance, especially in case
of a network disruption or system failure
 Use installation and maintenance sources whose skills and
performance are well known, professional and dependable
 Install duplicate, or redundant, processing elements
where appropriate, to ensure uninterrupted processing
 Install backup power supplies, diversified and non-
overlapping cable routes
 Use quality parts and supplies, cables, connectors, etc.

 Install and test equipment according to manufacturer
specifications
 Provide proper environment for equipment, e.g. raised floors,
proper temperature/humidity range, and sufficient power
 Provide proper equipment security to prevent damage, theft
or vandalism
 Follow building and construction codes
 Follow electric codes for wiring and electrical systems
 Invest in spare components, terminals, circuit boards; store these
in protected areas both on- and off-site
 Regularly install spare components in production systems to
ensure they work correctly; record the date each component was
last used
 Conduct regular tests of system performance, following
manufacturer’s recommended test and maintenance procedures.
 Maintain backup copies of all critical software:
operating systems, applications, utility programs,
databases
 Have multiple backup storage resources available,
both on-site and off-site
 Keep special databases as current as possible; make
sure backup copies are no more than one to three
days old, unless more recent updates are available
 Use proven software products for major systems,
rather than untested items
 Get references on vendor and product performance
from customers

 Insist on service level agreements (SLAs) to ensure an
acceptable level of vendor performance, especially in case of
software failure
 Analyze software performance regularly; coordinate this with
vendor and/or distributor support
 Update software documentation regularly as changes come
online; update contingency plans as well
 Install software patches as soon as they are received;
implement a patch management capability
 Make sure backup copies of primary applications are the same
release level, or generic, as operating versions
 Make sure vendors have emergency backup copies of system
software and special programs available
 Make sure software can be used by the technical staff as well
as vendors.
 Identify and pursue (if appropriate) local access
alternative routing options
 Identify and pursue alternative routing options from
customer site to long distance operators
 Use multiple long-distance carriers if cost effective
 Use multiple local access providers and Internet
service providers (ISPs) if cost effective
 Identify carrier network routing paths; look for
possible overlapping transmission paths across
multiple carriers that could represent disaster risk
points or single points of failure

 Mix transmission facilities, e.g. T1/E1 with SSL/VPN, to obtain best
overall price/performance
 Mix switched access and Internet-based services with dedicated circuits
to obtain hybrid configurations, spreading risk more evenly
 Use alternative transmission services, e.g. cellular, radio paging, two-way
radio, microwave and satellite, where needed
 Deal with carriers who are committed to supporting customer
contingency plans; check with other users for their experiences and
input
 Insist on service level agreements (SLAs) to ensure an acceptable level
of carrier performance, especially in case of a network disruption
 Deal with carriers who have circuit assurance plans, a demonstrated
commitment to network survivability, and who have demonstrated a
desire to work with users
 Make sure physical layout of facility is conducive to
rapid movement of materials
 Make sure the facility uses fire-resistant construction
 Facility should have fire detection, suppression and
alarm connections to the local fire department
and/or other suitable incident response firms
 Facility should have moisture detection where
appropriate
 Facility should have proper temperature/humidity
monitoring and control
 Cables should be raised off true floor to avoid
damage from minor leaks

 Facility should have regular cleaning of roof and floor voids
 Security and access control systems should be available and linked to
local police department or other appropriate organization
 Ensure availability of backup power for user systems, security, fire and
environmental systems
 Ensure convenient and rapid access to records within required recovery
time frames, e.g. storage facility is open on weekends, holidays, etc.
 Ensure the facility is not located in hazardous geographic or
infrastructure areas, e.g. those prone to periodic flooding, earthquakes,
power fluctuations, etc.
 Ensure availability of bonded transportation services
 Ensure the storage firm can support media types other than magnetic
media, such as printed matter, CDs and DVDs.
 Use incoming routing service arrangements from local and
long-distance operators
 Ensure that call centre systems, e.g. automatic call
distributors (ACDs) and interactive voice response (IVR),
have redundant components, backup power and backup
copies of the system database
 If your firm has more than one call centre, configure
network services to easily route incoming calls from a
disabled system to working call centres
 Ensure availability of alternate call centre staff (e.g.
using temporary placement firms) in an emergency

 Arrange for call centre staff to work at home if access to
call centre is denied
 Investigate carrier-based ACD call routing services that
can supplement premises-based ACD systems; these
can be configured to match existing call routing vectors,
skills-based call routing and other call centre parameters
 Arrange for rerouting of incoming calls to call centre
staff working at home
 If using computer telephony integration (CTI) as part of
the call centre, ensure that CTI hardware and software
are backed up, and emergency copies are stored in a
secure location
 Document system and application configurations
 Standardize hardware, software and peripherals
 Provide guidance on backing up data
 Use uninterruptible power supplies
 Ensure interoperability among components
 Coordinate with security policies and controls
 Back up data and store offsite
 Back up applications and store offsite
 Use alternate hard drives
 Image disks
 Implement redundancy in critical system
components

 Document system and application configurations
 Standardize hardware, software, and peripherals
 Ensure interoperability among components
 Back up data and store offsite
 Back up applications and store offsite
 Implement redundancy in critical system components
 Implement fault tolerance in critical system components
 Replicate data
 Implement storage solutions
 Document web site
 Code, program, and document web site properly
 Consider contingencies of supporting
infrastructure
 Implement load balancing
 Coordinate with incident response procedures

 Document LAN
 Coordinate with vendors
 Identify single points of failure; implement
redundancy in critical components
 Monitor LAN
 Integrate remote access and WLAN
 Document WAN
 Identify single points of failure
 Install redundancy in critical components
 Institute SLAs

 Back up data and store offsite
 Document system
 Implement redundancy and fault tolerance in critical
system components
 Consider hot site or reciprocal agreement
 Institute vendor SLAs
 Replicate data
 Implement storage solutions
 In addition to the operational strategies being discussed,
some further strategies for building a robust
communication environment are recommended:
 Protect all aspects of your communications infrastructure, not
just the network or hardware/software elements; be sure to
include security, HVAC, lighting, alarms and environmental
control
 Build your infrastructure according to industry standards
 Consider mutual aid arrangements in which you can utilize
network resources from other companies in emergencies
 Establish emergency arrangements with all equipment
vendors, network service providers and other key suppliers.

 Networking and communications contingency planning
and disaster recovery products are generally designed to
provide the following:
 Alternative sources of power
 Alternative communications paths
 Fire and smoke suppression
 Backup for critical computer/communications applications
 Testing and diagnostics of critical network elements
 Recovery of phone systems by redirecting service to secure
alternate locations
 Rapid replacement of failed or damaged hardware
components
 Rapid repair or replacement of damaged transmission circuits.
 Offsite library controls
 Security and control of offsite facilities
 Media and documentation backup
 Periodic backup procedures
 Frequency of rotation
 Types of media and documentation rotated
 Record keeping for offsite storage
 Business continuity management best practices
 Backup Schemes: Full, Incremental and Differential
 Methods of rotation
 Record keeping of offsite storage

 The plan should contain key information about the
organization's insurance.
 The IS processing insurance policy is usually a
multiperil policy designed to provide various types
of IS coverage. It should be constructed in modules
so it can be adapted to the insured's particular IS
environment.
 Most insurance covers only financial losses based on
the historical level of performance and not the
existing level of performance.
 Also, insurance does not compensate for loss of
image/goodwill.
 IS equipment and facilities
 Provides coverage for physical damage to the IPF and owned equipment. (Insurance of leased
equipment should be obtained when the lessee is responsible for hazard coverage.) The IS auditor is
cautioned to review these policies, since many obligate insurance vendors to replace non-restorable
equipment only with "like kind and quality," not necessarily with new equipment from the
same vendor as the damaged equipment.
 Media (software) reconstruction
 Covers damage to IS media that is the property of the insured and for which the insured may be
liable. Insurance is available for on-premises, off-premises or in transit situations and covers the
actual reproduction cost of the property. Considerations in determining the amount of coverage
needed are programming costs to reproduce the media damaged, backup expenses and physical
replacement of media devices such as tapes, cartridges and disks.
 Extra expense
 Designed to cover the extra costs of continuing operations following damage or destruction at the IPF.
The amount of extra-expense insurance needed is based on the availability and cost of backup
facilities and operations. Extra expense can also cover the loss of net profits caused by computer
media damage. This provides reimbursement for monetary losses resulting from suspension of
operations due to the physical loss of equipment or media. An example of a situation requiring this
type of coverage is if the information processing facilities were on the sixth floor and the first five
floors were burned out. In this case, operations would be interrupted even though the IPF remained
unaffected.

 Business interruption
 Covers the loss of profit due to the disruption of the activity of
the company caused by any malfunction of the IS organization
 Valuable papers and records
 Covers the actual cash value of papers and records (not defined
as media) on the insured's premises against direct physical loss
or damage
 Errors and omissions
 Provides legal liability protection in the event that the
professional practitioner commits an act, error or omission that
results in financial loss to a client. This insurance was originally
designed for service bureaus but it is now available from
several insurance companies for protecting systems analysts,
software designers, programmers, consultants and other IS
personnel.
 Fidelity coverage
 Usually takes the form of bankers' blanket bonds, excess fidelity
insurance and commercial blanket bonds, and covers loss from
dishonest or fraudulent acts by employees. This type of
coverage is prevalent in financial institutions operating their
own IPF.
 Media transportation
 Provides coverage for potential loss or damage to media in
transit to off-premises IPFs. Transit coverage wording in the
policy usually specifies that all documents must be filmed or
otherwise copied. When the policy does not state specifically
that data be filmed prior to being transported and the work is
not filmed, management should obtain from the insurance
carrier a letter that specifically describes the carrier's position
and coverage in the event data are destroyed.

 Redundancy
 Alternative Routing
 Diverse Routing
 Long Haul Network Diversity
 Last Mile Circuit Protection
 Voice Recovery

 Involves a variety of solutions, including:
 Providing extra capacity with a plan to use the surplus capacity
should the normal primary transmission capability not be available. In
the case of a LAN, a second cable could be installed through an
alternate route for use in the event the primary cable is damaged.
 Providing multiple paths between routers
 Dynamic routing protocols, such as Open Shortest Path First (OSPF)
and Enhanced Interior Gateway Routing Protocol (EIGRP)
 Providing failover devices to avoid single points of failure in
routers, switches, firewalls, etc.
 Saving the configuration files of network devices, such as routers
and switches, so they can be restored if a device fails. For example,
organizations should utilize Trivial File Transfer Protocol (TFTP)
servers. Most network devices support TFTP for saving and retrieving
configuration information.
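Multiple router paths only help if routing reacts when a link fails; link-state protocols such as OSPF compute routes with Dijkstra's shortest-path-first (SPF) algorithm. The sketch below shows only that core idea, not OSPF itself (no link-state advertisements, areas or timers); the four-router topology, link costs and the `shortest_path` helper are hypothetical.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical topology: router -> {neighbour: link cost}
Graph = Dict[str, Dict[str, int]]

def shortest_path(graph: Graph, src: str, dst: str,
                  failed: Optional[Set[frozenset]] = None) -> Optional[List[str]]:
    """Dijkstra shortest-path-first search that ignores failed links."""
    failed = failed or set()
    dist: Dict[str, float] = {src: 0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            # Walk predecessors back to the source to rebuild the path
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            if frozenset((node, nbr)) in failed:
                continue  # link is down, so it cannot be used
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return None  # destination unreachable

# A cheap primary path via R2 and a costlier backup path via R3
topology: Graph = {
    "R1": {"R2": 1, "R3": 5},
    "R2": {"R1": 1, "R4": 1},
    "R3": {"R1": 5, "R4": 5},
    "R4": {"R2": 1, "R3": 5},
}
print(shortest_path(topology, "R1", "R4"))                              # ['R1', 'R2', 'R4']
print(shortest_path(topology, "R1", "R4", {frozenset(("R2", "R4"))}))   # ['R1', 'R3', 'R4']
```

When the R2-R4 link fails, traffic moves to the R3 path automatically; if every path to R4 is down, the function returns `None`, which is exactly the single point of failure redundancy is meant to remove.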
 The method of routing information via an alternate medium such as
copper cable or fiber optics. This involves use of different networks,
circuits or end points should the normal network be unavailable. Most
local carriers are deploying counter-rotating, fiber-optic rings. These
rings have fiber-optic cables that transmit information in two different
directions and in separate cable sheaths for increased protection.
 Currently, these rings connect through one central switching office.
 However, future expansion of the rings may incorporate a second
central office in the circuit. Some carriers are offering alternate routes
to different points of presence or alternate central offices. Other
examples include a dial-up circuit as an alternative to dedicated circuits;
cellular phone and microwave communication as alternatives to land
circuits; and couriers as an alternative to electronic transmissions.

 The method of routing traffic through split cable facilities or duplicate cable
facilities.
 This can be accomplished with different and/or duplicate cable sheaths. If
different cable sheaths are used, the cable may be in the same conduit and,
therefore, subject to the same interruptions as the cable it is backing up.
 The communication service subscriber can duplicate the facilities by having
alternate routes, although the entrance to and from the customer premises
may be in the same conduit. The subscriber can obtain diverse routing and
alternate routing from the local carrier, including dual entrance facilities.
However, acquiring this type of access is time-consuming and costly. Most
carriers provide facilities for alternate and diverse routing, although the
majority of services are transmitted over terrestrial media. These cable
facilities are usually located in the ground or basement. Ground-based facilities
are at great risk due to the aging infrastructures of cities. In addition, cable-
based facilities usually share room with mechanical and electrical systems that
can impose great risks due to human error and disastrous events.
 Many vendors of recovery facilities have
provided diverse long-distance network
availability, utilizing T1 circuits among the major
long distance carriers.
 This ensures long-distance access should any
single carrier experience a network failure.
 Several of the major carriers now have installed
automatic rerouting software and redundant
lines that provide instantaneous recovery should
a break in their lines occur.

 Many recovery facilities provide a redundant
combination of local carrier T1s or E1s,
microwave, and/or coaxial cable access to the
local communications loop.
 This enables the facility to have access during a
local carrier communication disaster.
 Alternate local carrier routing also is utilized.

 With many service, financial and retail industries
dependent on voice communication, redundant
cabling and Voice over Internet Protocol (VoIP)
are common approaches to voice recovery.

 Network Attached Storage (NAS)
 Direct Attached Storage (DAS)
 Storage Area Networks (SAN)
 Secure storage area networks with iSCSI
 NAS is a dedicated, hard disk-based storage
technology. It is attached directly to a
computer network, providing data access
to network clients using a client-server design.

 Security
 A properly implemented NAS system offers a level of data security. Most NAS implementations
are based on the Linux OS, making them less vulnerable to viruses and other malware when
compared to Windows-based systems.
 Power consumption
 NAS systems are energy efficient. In the case of a power failure, NAS can shut down the hard
disk drives and remain idle. The power utilization of NAS, depending on how many hard disk
drives are included, is about 5 W to 20 W.
 Network access
 Network storage restricts unwanted or unauthorized network communications to the Internet.
It is possible to set up a home page using NAS, providing a Web server with DDNS (Dynamic
DNS). In addition to regular Web content, it can be used to access cameras as a remote
surveillance server.
 Larger storage capacity
 NAS was originally designed to offer larger storage capacity than existing storage media.
However, with increasing growth in the storage market, NAS capacity looks smaller than it used
to.
 NAS hardware platform
 A typical NAS unit includes SATA-II drive slots, USB 2.0 high-speed host ports and Gigabit
Ethernet. NAS looks similar to a regular PC without display and input devices. NAS typically
uses a RISC-based embedded computer or an x86 PC.

 NAS Disk-As-Disk
 A NAS disk-as-disk target is a disk array that stores the disk behind a NAS head,
creating a single shared volume.
 This type of system is easier to maintain than traditional disk arrays.
 A disk-as-disk target provides an inexpensive method for backing up the disk and
provides additional benefits when used with traditional backup systems. Disk-as-disk
systems require a SAN or NAS unit. SAN units are more powerful but are also more
difficult to maintain and share; they will be discussed later in this chapter.
 Scalable NAS
 Scalable NAS is a storage system that accommodates file-based content that is
always growing and must always be available. Advantages of scalable NAS include:
▪ Scalability: Scalable systems provide more computing power and storage capacity when necessary.
▪ Manageability: Newly added content and devices can be managed efficiently.
▪ Affordability: Scalable systems have reduced cost of ownership and reduced administrative
expenses due to their ease of management.
 A direct attached storage (DAS) system is directly attached
to a single host computer or server. When it is attached to
a server, network workstations must access that server to
connect to the storage device.
 DAS can use one of many types of drives, including ATA,
SATA, SCSI, SAS, and Fibre Channel.
 The main alternative is a storage area network (SAN).
 DAS is an inexpensive storage system for small- and
medium-sized businesses. Small organizations use DAS for
file transfer and e-mail, while larger organizations are
more likely to use DAS in mixed storage environments such
as those that also use NAS and SAN. Organizations that
begin with DAS but later switch to networked solutions
can use DAS to store less critical data.

 High-speed sub-network used to transfer large amounts of
data
 SAN is connected to several data storage devices
containing disks for data storage.
 SAN supports data storage, data recovery, and data
duplication for enterprise networks via high-end servers,
multiple disk arrays, and Fibre Channel interconnection
technology.
 Provides an interface between storage devices that
enables systems to access data backups as if they were
available locally.
 SAN architecture consists of links from the storage system
to the user, servers, and network equipment.

 Storage consolidation reduces cost
 Storage or server can be easily added without
interruption
 Data are backed up and restored quickly
 High-performance interface, over 100 Mbps
 Supports server clusters of eight or more servers
acting as a single reliable system
 Disaster tolerance
 Reduced cost of ownership

 SAN is a dedicated network that is connected via multiple
gigabit Fibre Channel switches and host bus adapters,
while NAS is directly connected to a network via TCP/IP
using Ethernet CAT 5 cables.
 Because NAS runs on a TCP/IP network, it is subject to
latency and broadcast storms. It is in contention with
other users and network devices for bandwidth. SAN does
not have this issue.
 SAN is highly secure thanks to its zoning and logical unit
numbers, while NAS is not very secure, typically using
access control lists for security.
 While NAS does support RAID levels, two different levels
cannot be mixed in the same device.

 Internet Small Computer System Interface (iSCSI) is an IP-
based storage networking standard for linking data storage
facilities.
 Used to handle data storage over long distances. Because
it is IP-based, data can be transmitted over the Internet,
LANs, or WANs.
 The protocol architecture follows the SCSI client-server
model, which was originally designed for devices in close
proximity connected by SCSI buses.
 The main function of iSCSI is encapsulation and reliable
delivery of data.
 This protocol provides a method for encapsulating SCSI
commands over an IP network and operates on top of
TCP/IP.

 Host adapter–based security
 Security measures for the Fibre Channel host bus adapter
can be implemented at the driver level.
 Switch zoning
 In a switch-based Fibre Channel SAN, switch zoning refers
to the masking of all the nodes connected to the switch.
 Storage-controller mapping
 Some storage subsystems accomplish their “Logical Unit
Numbers” (LUN) masking in their storage by mapping all
host adapters against LUNs in the storage system.
 Software measures
 SAN security can be implemented using software tools to
control access to data and maintain its reliability.
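Storage-controller mapping (LUN masking) is essentially a lookup table from host adapter to the LUNs it may see. A minimal sketch, assuming adapters are identified by their World Wide Port Name (WWPN); the table entries and the `visible_luns`/`may_access` helpers are hypothetical and do not represent any vendor's management interface.

```python
from typing import Dict, Set

# Hypothetical masking table: host bus adapter WWPN -> LUNs it is allowed to see
LUN_MASKING: Dict[str, Set[int]] = {
    "10:00:00:00:c9:aa:bb:01": {0, 1},  # example: database server HBA
    "10:00:00:00:c9:aa:bb:02": {2},     # example: file server HBA
}

def visible_luns(wwpn: str) -> Set[int]:
    """LUNs presented to a host adapter; unknown adapters see none (default deny)."""
    return set(LUN_MASKING.get(wwpn, set()))

def may_access(wwpn: str, lun: int) -> bool:
    """True only if the masking table maps this adapter to this LUN."""
    return lun in visible_luns(wwpn)

print(may_access("10:00:00:00:c9:aa:bb:01", 1))  # True
print(may_access("10:00:00:00:c9:aa:bb:02", 0))  # False: LUN 0 is masked from this HBA
```

The default-deny choice matters for security: an adapter that is not in the table is presented with no storage at all.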
 The storage area network may be vulnerable to risk
because it stores and transfers critical data. There are
different levels of threats faced by the SAN:
 Level one: These types of threats are unintentional and
common in workplaces. They may result in downtime and loss
of revenue. These threats can be prevented by administrators.
 Level two: These types of threats are simple malicious attacks
using existing equipment and easily obtained information.
Preventive measures used for level-one threats are also used
for these types of threats.
 Level three: These types of threats are large-scale attacks,
coming from skilled attackers using uncommon equipment.
Level-three attacks are difficult to prevent.

PROS
 Better disk utilization
 Fast and extensive disaster
recovery
 Better availability for
applications
 Faster backup of large
amounts of data
CONS
 Installing an effective SAN is
expensive
 Increased administration cost
 Impractical for use with a
single application
 Requires a fast WAN
connection, which may be
costly

 Data can be lost due to any number of reasons, including:
 Hardware failure
 Theft
 Data corruption
 Malicious attack
 Power outage
 Fire
 Flooding
 Virus or worm
 Human error
 Backups can be used to access older versions of files, in case changes to
files cause undesired effects. Backups can also reduce the amount of IT
resources required to maintain an application, if the organization
decides to archive older records and only work with current ones.
 Back up often and wisely
 The most effective approach is to back up all data on a daily basis, but this can be costly
and time-consuming. For the average business, between 2% and 5% of data changes daily,
so backing up just those changes can save significant time.
 Prioritize data for disaster recovery
 An organization should prioritize each system and its related data, including e-mail,
telephones, databases, file servers, and Web servers. Typically, systems are
prioritized into three categories: redundant (required immediately), highly available
(minutes to hours), and backed up (four hours to days).
 Archive important data for the long term
 Depending on federal and state regulations, data must be retained for between
seven and 17 years. Older data should be stored in a separate physical storage
location.
 Some businesses will choose a full-service company that picks up, stores, and
delivers the data when it is needed.
 Store data cost-effectively
 Most small-to-midsize businesses do not have available IT resources to set up and
manage a storage solution. These businesses may wish to purchase an integrated
solution. The up-front cost may be a bit more, but in the long run, the time, money,
and effort spent on a custom solution will be far greater.
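The 2% to 5% daily change figure translates directly into backup volume. A rough sketch, assuming a hypothetical 1,000 GB data set, one full backup per week, and six daily incrementals that each capture exactly that day's changes (real change sets overlap and vary):

```python
def daily_change_gb(total_gb: float, change_rate: float) -> float:
    """Approximate size of one day's changes, i.e. of a daily incremental backup."""
    return total_gb * change_rate

def weekly_volume_gb(total_gb: float, change_rate: float, daily_fulls: bool) -> float:
    """Data written per week: seven full backups, or one full plus six incrementals."""
    if daily_fulls:
        return 7 * total_gb
    return total_gb + 6 * daily_change_gb(total_gb, change_rate)

TOTAL_GB = 1000  # hypothetical data set size
for rate in (0.02, 0.05):
    print(f"{rate:.0%} daily change: "
          f"{daily_change_gb(TOTAL_GB, rate):.0f} GB per incremental, "
          f"{weekly_volume_gb(TOTAL_GB, rate, daily_fulls=False):.0f} GB per week "
          f"vs {weekly_volume_gb(TOTAL_GB, rate, daily_fulls=True):.0f} GB with daily fulls")
```

Under these assumptions a week of backups writes about 1,120 GB (at 2%) or 1,300 GB (at 5%) instead of 7,000 GB.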

 While developing a good backup strategy, the organization must first
determine what data backup platform (hardware) is best for protecting
the data. Tape backup systems are least expensive and are commonly
used to back up large amounts of data. This technology only requires
user intervention to physically change tapes, and some systems rotate
tapes automatically, leaving little chance of human error.
 Organizations must next determine how much data must be protected
and then choose a backup device suited for that amount. DDS (Digital
Data Storage) tape backup systems are normally used in small
organizations with less than 10 GB of data to protect. These systems are
also best suited for small/home offices because they are economical
and considered reliable. On the other hand, DLT (Digital Linear Tape)
systems are best suited for larger businesses because of their high
data storage capacity. These systems include automatic rotation
mechanisms to spread data across multiple high-capacity tapes.
 Full
 Stores a copy of all data to a tape backup, regardless of whether the
data have been modified since the last backup was performed. This
changes the archive property bit of the file from 1 to 0, indicating
that the file has been backed up.
 Differential
 Backs up every file on the drive that has been added or changed
since the last full backup.
 This does not change the archive property bit of the file from 1 to 0,
indicating that the file has changed since the last full backup. Thus,
the file will be backed up each time a differential backup is
performed.
 Compared to an incremental backup, it takes more time to run each
differential backup and requires more space for each one, but a full
restore operation requires only the last full backup and the last
differential backup.

 Incremental
 Backs up files that have been modified since the last
backup operation.
 This changes the archive property bit of the file from 1
to 0, indicating that the file has now been backed up
and will not be backed up on the next incremental
backup.
 Compared to a differential backup, it takes less time to
run each incremental backup and requires less space
for each backup operation, but would require more
time to restore the system, as the last full backup and
each sequential incremental backup must be used.
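The archive-bit rules for full, differential and incremental backups, and the resulting restore chains, can be simulated in a few lines. A toy sketch, assuming Windows-style archive-bit semantics as described above; the in-memory `Volume` class and the `restore_chain` helper are hypothetical, and `restore_chain` assumes each cycle uses only one scheme (differentials or incrementals, not both).

```python
from typing import Dict, List

class Volume:
    """Toy file set in which each file carries an archive bit."""

    def __init__(self) -> None:
        self.archive: Dict[str, bool] = {}  # True = bit is 1 (changed since cleared)

    def write(self, name: str) -> None:
        self.archive[name] = True  # adding or changing a file sets the bit to 1

    def full(self) -> List[str]:
        files = sorted(self.archive)  # every file, regardless of its bit
        for name in files:
            self.archive[name] = False  # 1 -> 0
        return files

    def differential(self) -> List[str]:
        # Files whose bit is 1 (changed since the last full backup in a pure
        # differential scheme); the bits are left at 1
        return sorted(name for name, bit in self.archive.items() if bit)

    def incremental(self) -> List[str]:
        files = sorted(name for name, bit in self.archive.items() if bit)
        for name in files:
            self.archive[name] = False  # 1 -> 0, so the next incremental skips them
        return files

def restore_chain(history: List[str]) -> List[int]:
    """Indices (oldest first) of the backups needed for a full restore.

    Assumes each cycle uses one scheme: a full followed only by differentials
    or only by incrementals.
    """
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    later = list(range(last_full + 1, len(history)))
    if later and history[later[-1]] == "differential":
        return [last_full, later[-1]]  # last full + last differential
    return [last_full] + later         # last full + every incremental since
```

For example, after a full backup of files `a` and `b`, editing `c` and then `d` makes successive differentials return `['c']` and then `['c', 'd']`, while successive incrementals would return `['c']` and then `['d']`. Restoring therefore needs two backups in the differential scheme but the full backup plus every incremental in the other.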
 The final step is to develop a rotation scheme and
decide where to store the backup media. Many
organizations store the physical backup media within
about three feet of the server. This means that,
should a disaster strike the server, it is likely to affect
the backups as well. The best course of action is to
place the physical media off-site, but still close
enough to access it when needed.
 Many organizations forget to secure backup data,
which makes backups a common target for attackers.
 Off-site storage must be sufficiently secure.
Organizations must always encrypt backup tapes.
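The slides do not prescribe a particular rotation scheme. One widely used option is grandfather-father-son (GFS), named here as an example rather than as the course's method. In this sketch the calendar convention and the `gfs_tier` helper are assumptions: daily "son" media Monday to Thursday, weekly "father" media each Friday, and a monthly "grandfather" on the last Friday of the month.

```python
from datetime import date, timedelta
from typing import Optional

def gfs_tier(day: date) -> Optional[str]:
    """Classify a day under an assumed grandfather-father-son rotation."""
    if day.weekday() >= 5:           # Saturday or Sunday: no backup in this sketch
        return None
    if day.weekday() == 4:           # Friday: weekly "father" backup...
        next_friday = day + timedelta(days=7)
        if next_friday.month != day.month:
            return "grandfather"     # ...unless it is the month's last Friday
        return "father"
    return "son"                     # Monday to Thursday: daily backup

print(gfs_tier(date(2011, 12, 13)))  # son (a Tuesday)
print(gfs_tier(date(2011, 12, 30)))  # grandfather (last Friday of December 2011)
```

Retention is a policy decision; a common pattern (again an assumption) reuses son media each week and father media each month, and keeps grandfather media off-site for a year or longer.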

 A successful backup strategy should meet the
following criteria:
 Off-site backup
 Scheduled backup
 Daily notifications
 Sufficient space
 Data availability at all times
 Adequate security
 Guarantee from provider
 Tested regularly

 Fault tolerance is achieved by mirroring or parity.
Mirroring is 100% duplication of the data on two
drives (RAID 1).
 Parity is calculated from the data on two drives and the result
is stored on a third drive (RAID 5, which in practice rotates the
parity blocks across all drives in the array). After a failed drive
is replaced, the RAID controller automatically rebuilds the lost
data from the other two drives.
 RAID systems may have a spare drive (hot spare)
ready and waiting to be the replacement for a drive
that fails.
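Parity rebuilds work because of XOR: the parity block is the XOR of the data blocks, and XORing the surviving blocks with the parity yields the missing one. A minimal sketch for the three-drive case; the block contents are made-up examples, and a real controller works on fixed-size stripes rather than short byte strings.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

drive1 = b"ACCT"
drive2 = b"2011"
parity = xor_blocks(drive1, drive2)   # stored on the third drive

# Drive 2 fails: XOR of the surviving data block and the parity rebuilds it
rebuilt = xor_blocks(drive1, parity)
print(rebuilt)  # b'2011'
```

The same operation also shows the limit noted for parity protection: with two of the three blocks gone, there is nothing left to XOR against.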

 Disk mirroring involves creating an exact bit-by-bit copy of all
data on a physical disk drive. The mirrored disks are stored off-
site and kept synchronized. This way, if the primary disk fails,
important data can be accessed from the other disk. Disk
mirroring can be done in two ways:
 Synchronous mirroring: The mirrored disk is updated on every write request
before the write completes, which can affect application performance.
 Asynchronous mirroring: Multiple changes to the primary disks are
reflected in the secondary mirrored disk at predetermined intervals,
which does not require an uninterrupted high-bandwidth connection.
 Disk mirroring has a few drawbacks. If a file is deleted from the
primary disk, it is also deleted from the secondary disk. Also,
any effects from viruses or data theft will be synchronized.
Establishing a disk mirroring infrastructure may require
additional resources and continuous maintenance.
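The difference between synchronous and asynchronous mirroring, and the drawback that deletions are mirrored too, can be shown with a toy primary/secondary pair. A sketch only, assuming a hypothetical in-memory `MirroredDisk` class; `replicate()` stands in for the predetermined interval of asynchronous mode.

```python
from typing import Dict, List, Optional, Tuple

class MirroredDisk:
    """Toy primary/secondary pair for synchronous vs asynchronous mirroring."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: Dict[int, bytes] = {}
        self.secondary: Dict[int, bytes] = {}
        # Changes not yet applied to the mirror (asynchronous mode); None = deletion
        self.pending: List[Tuple[int, Optional[bytes]]] = []

    def _apply(self, block: int, data: Optional[bytes]) -> None:
        if data is None:
            self.secondary.pop(block, None)
        else:
            self.secondary[block] = data

    def _mirror(self, block: int, data: Optional[bytes]) -> None:
        if self.synchronous:
            self._apply(block, data)            # mirror updated before the call returns
        else:
            self.pending.append((block, data))  # deferred to the next interval

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        self._mirror(block, data)

    def delete(self, block: int) -> None:
        self.primary.pop(block, None)
        self._mirror(block, None)               # deletions reach the mirror as well

    def replicate(self) -> None:
        """Run at each predetermined interval in asynchronous mode."""
        for block, data in self.pending:
            self._apply(block, data)
        self.pending.clear()

sync = MirroredDisk(synchronous=True)
sync.write(0, b"report")
sync.delete(0)
print(sync.secondary)  # {}: the deletion was mirrored immediately
```

In asynchronous mode the secondary lags until `replicate()` runs, which is why it tolerates a slower link but can lose the changes made since the last interval.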
 A storage snapshot contains a set of reference
markers that point to data stored on a disk drive, on
a tape, or in a storage area network (SAN). It
streamlines access to stored data and hastens the
data recovery process.
 There are two main types of snapshots:
 Copy-on-write snapshot: Before stored data are modified or
overwritten, the original blocks are copied to the snapshot, so it
records only the changes made since it was taken
 Split-mirror snapshot: Physically clones a storage entity at
a regular interval, allowing offline access and making it
simple to recover data
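A copy-on-write snapshot starts empty and preserves a block's original contents only the first time that block is overwritten. A toy sketch, assuming a hypothetical in-memory `CowVolume` class; real volume managers and storage arrays differ in detail.

```python
from typing import Dict, List, Optional

class CowVolume:
    """Toy block volume with copy-on-write snapshots."""

    def __init__(self, blocks: Dict[int, bytes]) -> None:
        self.blocks: Dict[int, bytes] = dict(blocks)
        # Each snapshot holds only the original contents of blocks changed since it was taken
        self.snapshots: List[Dict[int, Optional[bytes]]] = []

    def snapshot(self) -> int:
        self.snapshots.append({})  # taking a snapshot copies no data
        return len(self.snapshots) - 1

    def write(self, block: int, data: bytes) -> None:
        for snap in self.snapshots:
            if block not in snap:
                # First overwrite since this snapshot: preserve the original first
                snap[block] = self.blocks.get(block)
        self.blocks[block] = data

    def read_snapshot(self, snap_id: int, block: int) -> Optional[bytes]:
        snap = self.snapshots[snap_id]
        # Changed blocks come from the snapshot, unchanged ones from the live volume
        return snap[block] if block in snap else self.blocks.get(block)

vol = CowVolume({0: b"old", 1: b"keep"})
snap_id = vol.snapshot()
vol.write(0, b"new")
print(vol.read_snapshot(snap_id, 0), vol.blocks[0])  # b'old' b'new'
```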

 Continuous data protection (CDP), also known as continuous
backup or real-time backup, involves backing up data by
automatically saving a copy of every change made to those data.
 This creates an electronic record of storage
snapshots, with one storage snapshot for every
instant that data modification occurs, allowing
the administrator to restore data to any point in
time.
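Because CDP records every change, any earlier state can be rebuilt by replaying the change journal up to the chosen moment. A minimal sketch, assuming a hypothetical in-memory `CdpJournal` whose entries arrive in time order; a value of `None` records a deletion.

```python
from typing import Dict, List, Optional, Tuple

class CdpJournal:
    """Toy continuous data protection: every change is journaled with its time."""

    def __init__(self) -> None:
        # (timestamp, key, value); value None records a deletion
        self.log: List[Tuple[float, str, Optional[str]]] = []

    def record(self, t: float, key: str, value: Optional[str]) -> None:
        """Append one change; callers must supply non-decreasing timestamps."""
        self.log.append((t, key, value))

    def restore(self, point_in_time: float) -> Dict[str, str]:
        """Rebuild the data as they were at point_in_time by replaying the journal."""
        state: Dict[str, str] = {}
        for t, key, value in self.log:
            if t > point_in_time:
                break  # later entries are newer, so replay stops here
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
        return state

journal = CdpJournal()
journal.record(1.0, "balance", "100")
journal.record(2.0, "balance", "250")
journal.record(3.0, "balance", None)   # accidental deletion
print(journal.restore(2.5))            # {'balance': '250'}: state just before the deletion
```

Unlike a mirror, which would have propagated the deletion, the journal still lets the administrator step back to any moment before it.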
 Parity protection involves creating a parity disk
from all the available disks in the array. If any disk
in the array fails, the parity disk can be used to
recover the data from the failed disk. Parity
protection is a low-cost, low-maintenance alternative
to mirroring, but if two
drives fail simultaneously, then the data will be
lost completely. Also, any threat that affects one
disk could also affect the parity disk.

 The following are two backup schedules:
 Intraday data protection:
 In an intraday data protection system, data are backed up several
times during the day. The data can be copied onto the same disk or
onto a remote disk.
 Backup strategies using intraday data protection include:
▪ Snapshots
▪ Application dumps (where application data are backed up every few hours)
▪ Continuous data protection
 Weekend and nightly backups:
 There are three types of weekend and nightly backups, as
previously mentioned:
▪ Full backups
▪ Differential backups
▪ Incremental backups
 Some types of removable backup media are:
 CD-ROM
 DVD-ROM
 Tape drive
 USB drive
 External hard drive
 One of the main disadvantages of using removable
backup media is that it requires the user to perform
data backup and take the media off-site, which may
increase the risk of data loss. Removable backup
media has limited storage capacity, and there is a
risk of media damage.

 There are several risks in the backup and retrieval process. For
instance, data backed up at a remote location can be a target
for thieves and attackers, so the data should be encrypted.
Other risks include the following:
 Storage media may be stolen from the delivery vehicle.
 Storage media on the return trip from the centralized storage site
may be delivered to the wrong customer.
 The tape system could be destroyed.
 The following steps can help manage these risks:
 Carefully scrutinize contracts with the off-site backup provider.
 Use locked containers to transport tapes.
 Encrypt all data prior to writing to backup tapes, or selectively
encrypt only sensitive data.
 Include backup procedures in the corporate strategy.
 Increased complexity and burden
 Limited capabilities of conventional solutions
 Time requirements
 Reliability
 Size of data
 Expensive new technologies
 Lack of a simple disaster recovery process
 Maintenance

 Keep the backup plan as simple as possible.
 Establish reliability through automation to
minimize the potential for human error.
 Implement immediate, granular recovery
options.
 Allow users to securely recover individual files.
 Make disaster recovery plans flexible.
 When purchasing a third-party data backup solution,
organizations should keep the following considerations in
mind:
 Does it meet the organization’s recovery objectives, including RTO
and RPO?
 How easy and reliable is data restoration?
 Does it store data off-site in case of a disaster?
 Does it comply with the organization’s existing disaster recovery
plan?
 Are the data secure and encrypted?
 What is the labor and maintenance requirement?
 When will the data be backed up?
 How much does the solution cost, including labor, maintenance, and
support?
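The first question, whether a solution meets the organization's RTO and RPO, can be answered with simple arithmetic. The following sketch makes two assumptions. The worst-case data loss is the interval between backups plus the time needed to get each copy off-site. The worst-case downtime is dominated by the measured restore time. The function names and example figures are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         offsite_delay: timedelta = timedelta(0)) -> timedelta:
    """Longest window of changes that could be lost.

    Data written just after one backup stays unprotected until the next
    backup has run and its copy has reached safe (off-site) storage.
    """
    return backup_interval + offsite_delay


def meets_objectives(backup_interval: timedelta, offsite_delay: timedelta,
                     restore_time: timedelta, rpo: timedelta,
                     rto: timedelta) -> tuple[bool, bool]:
    """Return (meets_rpo, meets_rto) for a candidate backup solution."""
    loss = worst_case_data_loss(backup_interval, offsite_delay)
    return loss <= rpo, restore_time <= rto


if __name__ == "__main__":
    # Hypothetical: nightly backups shipped off-site within 1 hour,
    # a tested restore taking 6 hours, against a 4-hour RPO and 8-hour RTO.
    print(meets_objectives(timedelta(hours=24), timedelta(hours=1),
                           timedelta(hours=6), timedelta(hours=4),
                           timedelta(hours=8)))
```

The example prints `(False, True)`. Nightly backups satisfy the 8-hour RTO but not the 4-hour RPO, so the organization would need an intraday protection scheme.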

 It is important to regularly test the backup and
recovery solution, perhaps as often as every
month.
 Off-site backup involves storing backup data at a
separate, secure location, so that if any disaster
occurs at the primary site, the backup data will
remain safe. Data can either be physically moved
to the secondary site using storage media, such
as DVDs or backup tapes, or it can be transmitted
using a network such as the Internet.
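One way to test that a restore, or a copy sent off-site over a network, is complete and intact is to compare cryptographic checksums of the original and restored files. The following minimal sketch uses Python's standard `hashlib`. The directory layout and the function names are assumptions made for illustration and do not belong to any particular backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Return relative paths that are missing or differ after a test restore."""
    problems = []
    for src in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append(f"missing: {rel}")
        elif sha256_of(src) != sha256_of(restored):
            problems.append(f"mismatch: {rel}")
    return problems
```

An empty result means every source file was restored byte for byte. Running this check as part of each scheduled restore test turns "the backup seems fine" into a concrete pass or fail result.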

 Protects data in the event of a disaster
 Automatically performs the backup operation
 Encrypts data
 Eliminates the necessity of tapes
 Protects data from damages such as hardware
failure, database corruption, and natural disasters
 Can be cheaper than tape alternatives
 Convenient
 Dependable
 Efficient
 Data can be accessed from other remote systems
 File Transfer Protocol (FTP) is typically used to reliably
transfer large files over the Internet.
 Data can be accessed on an FTP server from a backup
program, a special FTP client, or a standard Web browser.
Private FTP servers require authentication, which provides
a layer of security.
 Advantages of using FTP servers for backup include the
following:
 Users can view the files stored on the FTP server any time using
any FTP client or Web browser.
 Mobile users can back up data from anywhere in the world with
an Internet connection.
 FTP backup is less expensive than a specialized remote backup
service.

 The main disadvantage of this method is data
security. FTP is not a secure protocol: it sends
usernames, passwords, and data in cleartext, and
anyone who discovers a legitimate user’s username
and password can easily access files. Data must
therefore be encrypted before it is transferred to the
FTP server, or else it could be intercepted and viewed.
 One option is to store data in a password-
protected Zip archive that uses strong encryption,
such as AES or Blowfish, rather than the weak legacy
Zip password protection. The data can then be
extracted using any Zip client that supports the
chosen encryption method.
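Where the server supports it, FTP over TLS (FTPS) encrypts both the login and the data channel. This addresses the credential-interception problem described above. It does not replace encrypting the backup itself if the server operator is not trusted. The following minimal sketch uses Python's standard `ftplib.FTP_TLS` and assumes the server supports explicit FTPS. The host name, credentials, and file names are placeholders.

```python
import ftplib
from pathlib import Path


def upload_backup(host: str, user: str, password: str,
                  local_file: Path, remote_name: str) -> str:
    """Upload one (ideally already encrypted) backup archive over explicit FTPS.

    Returns the server's reply to the STOR command.
    """
    with ftplib.FTP_TLS(host, timeout=30) as ftps:
        # login() issues AUTH TLS first, so credentials are not sent in cleartext.
        ftps.login(user, password)
        # Switch the data channel to TLS as well; otherwise file contents travel unencrypted.
        ftps.prot_p()
        with local_file.open("rb") as handle:
            return ftps.storbinary(f"STOR {remote_name}", handle)
```

For example, you might call `upload_backup("ftp.example.com", "backup_user", password, Path("nightly.zip"), "nightly.zip")`, where every argument is a placeholder.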

 When selecting an offsite storage facility and vendor, the
following criteria should be considered:
 Geographic area: distance from the organization and the probability
of the storage site being affected by the same disaster as the
organization
 Accessibility: length of time necessary to retrieve the data from
storage and the storage facility’s operating hours
 Security: security capabilities of the storage facility and employee
confidentiality, which must meet the data’s sensitivity and security
requirements
 Environment: structural and environmental conditions of the storage
facility (e.g., temperature, humidity, fire prevention, and power
management controls)
 Cost: cost of shipping, operational fees, and disaster
response/recovery services.
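For the geographic criterion, a planner with coordinates for the primary site and each candidate facility can compute the great-circle distance using the haversine formula. Sites that are too close can then be screened out. This sketch rests on assumptions. The coordinates and the 80 km minimum separation are hypothetical, since the document does not prescribe a threshold. Distance alone also does not capture shared risks such as the same flood plain or power grid.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def far_enough(primary: tuple[float, float], candidate: tuple[float, float],
               min_km: float = 80.0) -> bool:
    """Hypothetical screening rule: keep candidates at least min_km from the primary site."""
    return haversine_km(*primary, *candidate) >= min_km
```

Candidates that pass this screen would still need to be compared on accessibility, security, environment, and cost.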

 Fully Mirrored Disaster Recovery Site
 This approach entails maintaining a fully mirrored, or duplicated, site
that enables immediate switching between the live site and the backup
site. This is usually the most costly option. *****
 Switchable Hot Site
 This approach involves establishing a commercial agreement with a
service provider that guarantees an identical site, including
communications, so that all IT operations can be switched over to the
hot site within a predetermined time period, typically less than two
hours. This plan's timelines and commitments must be predefined,
understood by all staff, and adhered to. I recently witnessed a
disaster in which power was cut to an entire city and no switchover
timeline had been defined. Luckily, the CIO saved the day by discussing
the situation with the security professional and defining a cutoff time
for switching to the remote DR site. ****

 Hot site
 This approach requires an agreement with a service provider who will maintain a similar site to
which the organization can move its IT operations within a predetermined time period, usually less than
eight hours or one working day. ***
 Cold site
 This approach requires the configuration of a disaster site. Once a disaster is declared, the service
provider allows the affected organization's staff to occupy the standby site, which has equipment and
software prepared to deliver the minimum configuration the organization needs. The cold site scenario
usually enables the organization to be operational within two days. **
 Relocate and restore
 This approach requires identifying an appropriate relocation site where hardware and peripherals can
be installed, software reinstalled, and backed-up applications and data restored once a disaster
condition has been invoked. This strategy is not suitable for large businesses or for businesses with a
high IT/IS dependency. *
 No strategy
 This is the most costly approach to disaster recovery and the one with the most risk. The business
makes no effort toward disaster recovery: no plan, no backups, no off-site backups, no documentation,
and so on. All of this points to neglect. No * rating!
 Mobile Recovery
 A mobilized resource purchased or contracted for the purpose
of business recovery. The mobile recovery center might include:
computers, workstations, telephone, electrical power, etc.
 Mobile Standby Trailer
 A transportable operating environment, often a large trailer,
that can be configured to specific recovery needs such as office
facilities, call centers, data centers, etc. This can be contracted
to be delivered and set up at a suitable site at short notice.
 Mobilization
 The activation of the recovery organization in response to a
disaster declaration.
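You can turn the typical switchover times quoted for each site strategy into a first-pass screen against a recovery time objective. This minimal sketch makes several assumptions. It treats the quoted times (immediate, under two hours, under eight hours, about two days) as upper bounds. It orders strategies by the star rating above, where fewer stars means a less elaborate and usually cheaper option. Relocate-and-restore is omitted because the slides give no time for it. The names are hypothetical.

```python
from datetime import timedelta

# (strategy, star rating from the slides, typical maximum time to be operational)
STRATEGIES = [
    ("Cold site", 2, timedelta(days=2)),
    ("Hot site", 3, timedelta(hours=8)),
    ("Switchable hot site", 4, timedelta(hours=2)),
    ("Fully mirrored DR site", 5, timedelta(0)),
]


def candidate_strategies(rto: timedelta) -> list[str]:
    """Strategies whose typical recovery time fits within the RTO, least elaborate first."""
    fitting = [(stars, name) for name, stars, recovery in STRATEGIES
               if recovery <= rto]
    return [name for stars, name in sorted(fitting)]
```

For a 4-hour RTO, `candidate_strategies(timedelta(hours=4))` returns the switchable hot site followed by the fully mirrored site. The first entry is the least elaborate option that still fits the objective.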

 When searching for a suitable alternate site, organizations
need to make sure they have the necessary resources to
provide continuity of operations. These resources may
include the following:
 Office space
 Identify access to the office space.
 Identify how much workspace each staff member requires.
 Identify room or space that can be allocated for meetings.
 Identify the space for storing folders, papers, or equipment.
 Identify facilities for storing cash, checkbooks, and other
valuable items.
 Office equipment
 Ensure that there are services to enable normal business functions,
such as phone lines, Internet access, and fax machines.
 If possible, assign phones with a direct-dial facility.
 Identify the fax facilities.
 Identify requirements for standard office supplies, such as paper,
pens, and copying facilities.
 Computer equipment
 Identify the computer equipment required by the organization.
 Specify the number of computers that can run the necessary
applications. Confirm who will set up these applications.
 Discuss access to the LAN and how the organization will assign logons
and passwords.
 Identify the number and types of printers that will be made available.

 To be adequately prepared for a disaster, it is
essential to determine and assign responsibilities.
 The following are steps an organization should take:
 Include the role names and the responsibilities of each
role in the disaster recovery plan.
 The person in charge should make policy decisions.
 The person in charge should make critical IT-related
decisions only after consulting with IT personnel.
 Offer disaster recovery training for key (or all) staff
members.
 Set up a clear chain of command.

 Operations Recovery Director
 Contingency Planning Coordinator
 Operations Recovery Manager and Teams
 Facility Recovery Team
 Network Recovery Team
 Platform Recovery Team
 Application Recovery Team
 Damage Assessment and Salvage Team
 Physical Security
 Communications Team
 Hardware Installation Teams
 IT Operations Teams
 IT Technical Teams
 Administration Teams
 Emergency Management Team
 Pre-disaster
 Approving the final DRP and procedures
 Maintaining the DRP and procedures
 Conducting DR training
 Authorizing the periodic testing of the DRP
 Post-disaster
 Declaring the occurrence of a disaster
 Deciding which strategy to implement if more than one strategy
exists
 Authorizing the travel and housing arrangements for team members
 Managing and monitoring the overall recovery process
 Providing updates on the status of disaster recovery efforts to
senior and user management
 Coordinating media and press releases

 Pre-disaster
 Developing, maintaining, and updating the DRP
 Appointing recovery personnel
 Assigning parts of the DRP to the individual recovery teams and their members
 Coordinating plan testing
 Training disaster recovery team members on plan implementation
 Post-disaster
 Obtaining the required approvals to activate the disaster recovery plan and the recovery teams
 Informing all the recovery team leaders or alternates about the disaster declaration
 Determining the degree of outage due to the disaster
 Coordinating and summarizing the damage reports from all teams
 Informing the organization’s directors of the disaster’s severity
 Conducting briefings with all recovery teams
 Coordinating all recovery teams
 Requesting remote data backup, documentation, and required resources from the IT technical team
 Authorizing purchases and expenditures for required resources
 Reporting the recovery effort status to the Operations Recovery Director
 Coordinating media press releases
 Pre-disaster
 Preparing the alternate site with hardware and
supplies
 Creating a complete layout and recovery procedure for
the alternate site
 Post-disaster
 Repairing and rebuilding the primary site

 Pre-disaster
 Installing networking equipment at the alternate site
 Post-disaster
 Providing network connections at the alternate site
 Restoring network connections at the primary site
 Pre-disaster
 Maintaining lists of equipment needed in the
restoration process
 Post-disaster
 Installing hardware equipment
 Restoring data and systems from remote backups

 Pre-disaster
 Testing applications for vulnerabilities
 Post-disaster
 Restoring the database
 Addressing specific application-related issues
 Pre-disaster
 Understanding the DR roles and responsibilities
 Working closely with disaster recovery teams to minimize the occurrence of a disaster in the data
center
 Training employees to be well prepared in the case of emergencies
 Participating in the DRP testing as needed
 Post-disaster
 Determining damage and accessibility to the organization’s resources
 Determining the level of the damage to the data center in the organization
 Assessing the need for physical security
 Estimating the recovery time according to the damage assessment
 Identifying the hardware and other equipment that can be repaired
 Explaining to the disaster recovery team the extent of damages, estimated recovery time, physical
safety, and repairable equipment
 Maintaining a repairable hardware and equipment log
 Coordinating with vendors and suppliers to restore, repair, or replace equipment
 Coordinating the transportation of salvaged equipment to a recovery site, if necessary
 Providing support to clean up the data center after a disaster

 Pre-disaster
 Understanding the DR roles and responsibilities
 Working closely with the DR team to ensure the physical safety of the
existing systems and resources
 Training employees
 Becoming familiar with emergency contact numbers
 Participating in DRP testing as needed
 Maintaining the list of members allowed to enter the disaster site
and recovery site
 Post-disaster
 Assessing damage at the disaster site
 Securing the data center against unauthorized access
 Scheduling security for transporting files, reports, and equipment
 Providing assistance for investigations of the damaged site
 Pre-disaster
 Understanding DR roles and responsibilities
 Working closely with the DR team to ensure the physical safety of existing
systems and resources
 Participating in disaster recovery plan testing as required
 Establishing and maintaining the communications equipment at the
alternate site
 Post-disaster
 Assessing communication equipment requirements by coordinating with
other teams
 Retrieving the communication configuration from off-site storage units
 Planning, coordinating, and installing communication equipment at the
alternate site
 Planning, coordinating, and installing network cabling at the alternate site

 Pre-disaster
 Understanding the disaster recovery roles and responsibilities
 Coordinating with the DR team to minimize the impact of a disaster in the data center
 Maintaining the current system and LAN configuration in off-site storage
 Post-disaster
 Verifying hardware requirements at the alternate location
 Inspecting the alternate location for the required physical space
 Notifying the alternate site of the impending occupancy
 Interfacing with the IT technical and operation teams about the space configuration of the alternate
location
 Coordinating the transportation of repairable equipment to the alternate location
 Informing the administration team regarding the need for equipment repair and new equipment
 Ensuring the installation of temporary terminals connecting to the alternate location mainframe
 Planning and installing the hardware at the alternate location
 Planning, transporting, and installing hardware at the permanent location, when available
 Setting up and operating a sign-in/sign-out method for all resources at the alternate location
 Pre-disaster
 Understanding disaster recovery roles and responsibilities
 Coordinating with the DR team to ensure the physical safety of existing systems and
resources
 Training employees to be well prepared in case of emergencies
 Ensuring complete backups are performed according to the schedule
 Ensuring backups are sent to the remote location according to the schedule
 Post-disaster
 Supporting the IT technical team as needed
 Sending and receiving off-site storage containers
 Ensuring backup tapes are sent to off-site storage
 Maintaining a sign-in/sign-out method for all resources at the alternate location
 Checking the alternate site’s floor configuration to aid in the communication team’s
installation plans
 Checking the security of the alternate location and its LAN
 Coordinating the transfer of systems, resources, and people to the alternate location

 Pre-disaster
 Understanding the disaster recovery roles and responsibilities
 Working closely with disaster recovery teams to minimize the
occurrence of a disaster in the data center
 Post-disaster
 Restoring system resources from the backup media
 Initializing new tapes as required in the DR process
 Conducting backups at the remote location
 Testing and verifying operating systems
 Modifying the LAN configuration to connect with the alternate
location’s configuration
 Pre-disaster
 Understanding disaster recovery roles and responsibilities
 Ensuring the maintenance of the required business interruption insurance
 Ensuring the adequate availability of emergency funds throughout the DR process
 Assessing the alternative communications required if telephone services become
unavailable
 Post-disaster
 Preparing, coordinating, and obtaining proper approvals for all procurement
requests
 Maintaining logs of all procurements in process and scheduled deliveries
 Processing the payment requests for all invoices related to the recovery procedure
 Arranging travel and lodging for the recovery teams
 Providing alternative communications for recovery team members if normal
telephone service is not available
 Performing provisional clerical and managerial duties as needed by the DR teams
