Disaster Recovery: Develop Efficient Critique for an Emergency

Disaster Recovery: Understanding Key Principles

Disaster recovery, business resumption and emergency preparedness are three integrated
components of a business continuity management solution. Some organizations focus on only one
component or on part of the solution. However, it is possible to have a perfectly functioning
redundant data center and still be unable to continue business. It is therefore important that
all three components of business continuity management listed below are tightly coupled and
integrated to ensure successful business continuity.

• Emergency preparedness
• Business resumption
• Disaster recovery

Disaster recovery addresses recovery of critical IT infrastructure such as hardware, software,
telecom, network and data for bringing up the mission critical applications to support the
business.

Business resumption involves the recovery of critical business functions and processes that
relate to or support the delivery of core products/services to customers. It focuses on
products/services, non-IT employees, vital records and other stakeholders involved in supporting
critical business functions.

Emergency preparedness is designed to enable an effective response to an event. It focuses on
stabilizing the situation. Generally this is coordinated by a team responsible for the health &
safety of the employees. The scope of emergency preparedness includes utility failures,
non-availability of employees (for example, during a pandemic) and non-availability of facilities
due to bomb threats, etc. It also covers transportation and coordination with external agencies.

Disaster Recovery: Defining Project Scope

How to Stop Scope Creep

One of the most important steps in disaster recovery project management is defining the
disaster scope. It helps the project team maintain control of the project. Clearly
understanding what is included in a project is essential to its success.

Disaster scope should include the nature of the impact and the timeline. For example, it may be
something like, “If the primary data center is down following a major incident, how can the
critical business processes and applications be sustained for 3-4 weeks until the primary
data center is brought back up?”

A clear disaster scope definition helps set expectations and iron out scope issues early.

Another challenge is the ongoing upgrade and development changes in applications. These changes
and upgrades can delay completion of the disaster recovery project. The disaster recovery
project team should therefore agree on a cutoff date with the application team and communicate
that the disaster recovery environment will match production as of that date. Any required
changes or upgrades should first be implemented in production before the cutoff date. The
disaster recovery environment will then be built to match production as of that date and will
not have any new or additional features. If changes are made in production after the cutoff
date, the application team has to communicate them and implement the same changes in the
disaster recovery environment as well. This helps to keep any technical refresh or upgrade
project from merging with the DR project.
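
The cutoff rule above can be tracked with a very small script. The following is a minimal sketch in Python, assuming a change log is available in some form; the `Change` record, its field names and the sample IDs and dates are hypothetical and only illustrate the idea of listing production changes made after the cutoff that the disaster recovery environment still lacks.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Change:
    """One production change record (hypothetical fields, for illustration)."""
    change_id: str
    applied_to_production: date
    applied_to_dr: bool


def pending_dr_changes(changes, cutoff):
    """Return changes made in production after the cutoff that DR still lacks."""
    return [
        c for c in changes
        if c.applied_to_production > cutoff and not c.applied_to_dr
    ]


# Illustrative data only; the IDs and dates are made up.
cutoff = date(2024, 3, 1)
changes = [
    # Before the cutoff: already part of the DR baseline.
    Change("CHG-001", date(2024, 2, 20), applied_to_dr=False),
    # After the cutoff and already replicated to DR.
    Change("CHG-002", date(2024, 3, 10), applied_to_dr=True),
    # After the cutoff and still missing in DR.
    Change("CHG-003", date(2024, 3, 15), applied_to_dr=False),
]

for change in pending_dr_changes(changes, cutoff):
    print(f"{change.change_id} must also be applied to the DR environment")
```

Changes dated on or before the cutoff are treated as part of the baseline, which matches the idea that the disaster recovery environment mirrors production as of that date.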

Also, disaster recovery must not be confused with high availability. High availability (HA)
addresses local system or architecture failures and is generally a solution under the same
roof. Disaster recovery, by contrast, allows an application to be recovered at an alternate site
away from the production site when a higher-level failure or disaster strikes.

Hardware Resource Planning

How to Handle Hardware Planning

Another major challenge in resource planning is end of life (EOL) hardware. This is common if
the application is older and has been in use for more than 10-15 years. If the production
environment runs on EOL hardware, new equivalent hardware will have to be bought. The issue
with new hardware is that it may not support older operating system (OS) versions. For example,
a new M5000 server procured for disaster recovery may support only Solaris 10 whereas the
production environment is still running on Solaris 9. It may then be necessary to upgrade the
production environment to Solaris 10 or procure hardware that is compatible with the older OS
version.

Effective BIA in Disaster Recovery

How to Perform an Effective BIA

Sometimes a business impact analysis (BIA) has to be conducted in a very short time, and it may
not be possible to conduct a detailed BIA for every application in the organization. In those
cases, it is suggested to first hold a discussion with the senior management of each business
division to identify the candidates for a detailed BIA. The audit team and IT support team
should also be involved in the discussion and in shortlisting the applications that the
respective business units consider very critical. A BIA questionnaire can then be sent to the
respective application owners to determine the impact, the recovery time objective (RTO), the
recovery point objective (RPO) and dependencies. A common challenge in conducting a BIA is
explaining RTO and RPO to the business users and ensuring that impacts are not duplicated or
overlapping among upstream and downstream applications. Several awareness sessions and meetings
may be needed to clarify all doubts. Each BIA should also be signed off by the respective
business finance team to avoid errors in estimating the impact.

Lesson Learned: Ensure that the BIA is clearly understood by the person filling it out, and
validate the financial impact with respective finance teams to avoid any errors in impact
estimation.
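
Once the questionnaires come back, the responses can be sanity-checked before sign-off. The sketch below, in Python, assumes each response has been reduced to a few fields; the `BiaResponse` record, its field names and the sample applications and figures are hypothetical. It shows two simple checks: flagging an application whose upstream dependency has a longer RTO than the application itself, and ordering applications by shortest RTO and then by highest hourly impact.

```python
from dataclasses import dataclass, field


@dataclass
class BiaResponse:
    """One application's answers from a BIA questionnaire (hypothetical fields)."""
    name: str
    rto_hours: float        # recovery time objective
    rpo_hours: float        # recovery point objective
    impact_per_hour: float  # financial impact, as validated by the finance team
    depends_on: list = field(default_factory=list)  # names of upstream applications


def rto_conflicts(responses):
    """Find applications whose upstream dependency has a longer RTO than they do."""
    by_name = {r.name: r for r in responses}
    conflicts = []
    for r in responses:
        for upstream in r.depends_on:
            dep = by_name.get(upstream)
            if dep is not None and dep.rto_hours > r.rto_hours:
                conflicts.append((r.name, upstream))
    return conflicts


def recovery_order(responses):
    """Order applications by shortest RTO first, then by highest hourly impact."""
    return sorted(responses, key=lambda r: (r.rto_hours, -r.impact_per_hour))


# Illustrative data only; the applications and figures are made up.
responses = [
    BiaResponse("reporting", 24, 12, 300, depends_on=["customer-db"]),
    BiaResponse("billing", 4, 1, 5000, depends_on=["customer-db"]),
    BiaResponse("customer-db", 8, 1, 2000),
]

for app, upstream in rto_conflicts(responses):
    print(f"{app} cannot meet its RTO: it depends on {upstream}, which has a longer RTO")

print([r.name for r in recovery_order(responses)])
```

A conflict such as the one flagged here is worth raising in the awareness sessions, since it usually means one of the two application owners has misunderstood RTO or the dependency.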

Maximizing the Return on Investment in Hardware

Can the Disaster Recovery Hardware be utilized economically?

In general, DR solutions are costly, requiring substantial investment in additional server and
networking hardware to replicate existing data centers, which increases infrastructure needs
accordingly. These expenditures inflate the cost of IT while reducing average system
utilization. These cost and complexity challenges have effectively restricted or degraded many
IT disaster recovery plans.

It may be very difficult to convince management to purchase new hardware when everyone knows
that it will sit idle until a disaster strikes the primary site. In addition, when the new
hardware is known to perform better than the existing production hardware, there will be
pressure to use it for production.

One solution in those cases is to use the old hardware for DR and the new hardware for
production. Another is to use repurposing software, which allows servers to be used as a staging
or QA environment under normal circumstances and to bring up the disaster recovery environment
quickly when disaster strikes. This way the hardware procured is not kept idle and is used
effectively in normal times.

Also, consolidation and virtualization options should be analyzed while planning any hardware
requirements for disaster recovery, since they can reduce those requirements considerably.

Lesson Learned: Consolidation, virtualization and repurposing software can be very useful in
optimizing the cost of hardware.

Improving the Disaster Recovery Procedure

How to Improve the Response and Reduce the Recovery Time

In a time of disaster, there is tremendous pressure and stress to get everything back up and
running and available to users. In manual processes, mistakes will be made for a variety of
reasons. Thus, it is suggested to automate the recovery process as much as possible. A
simplified and automated disaster recovery process reduces unnecessary delays and manual errors
during the recovery.

Also, traditional recovery procedures involve several groups, such as operating system and
database administrators. It is recommended to reduce these dependencies to a minimum and
designate a first responder who can run all these procedures alone as an immediate step and
contact the respective administrators only when a procedure runs into an issue. Simple UNIX
scripts can automate most of the steps in the recovery procedure, which simplifies the steps,
avoids manual syntax errors and reduces the recovery time. These measures make for a recovery
procedure that can respond quickly to any disaster incident.

Lesson Learned: Simple UNIX scripts can make the recovery steps less complex, avoid manual
errors and reduce the recovery time substantially.
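
The first-responder idea can be sketched as a small driver that runs the recovery steps in order and stops at the first failure, naming the group to contact. The example below is a minimal sketch in Python rather than a plain shell script; the `Step` record, the step descriptions and the team names are hypothetical, and the `exit 0` / `exit 1` commands are placeholders standing in for real recovery scripts.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recovery step: a shell command plus the team to call if it fails."""
    description: str
    command: str
    escalate_to: str


def run_recovery(steps):
    """Run steps in order; stop at the first failure and report who to contact.

    Returns (descriptions of completed steps, failed step or None).
    """
    completed = []
    for step in steps:
        result = subprocess.run(
            step.command, shell=True, capture_output=True, text=True
        )
        if result.returncode != 0:
            print(f"FAILED: {step.description} (exit code {result.returncode})")
            print(f"Contact: {step.escalate_to}")
            return completed, step
        print(f"OK: {step.description}")
        completed.append(step.description)
    return completed, None


# Placeholder commands: "exit 0" simulates success, "exit 1" simulates a failure.
steps = [
    Step("Mount replicated storage", "exit 0", "storage administrators"),
    Step("Start database", "exit 1", "database administrators"),
    Step("Start application", "exit 0", "application team"),
]

completed, failed = run_recovery(steps)
```

Stopping at the first failure keeps later steps from running against a half-recovered environment, and the `escalate_to` field tells the first responder which administrators to call only when they are actually needed.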
