Crisis management
COLLEGE OF TELECOM & ELECTRONICS
INSA 483: Seminar
Lecturer: Ibrahim Aledane
RESEARCH PROPOSAL
The Importance Of Crisis Management In Data Centers
Name : Abobakr Ahmed Al-Shanqiti
Student ID : 439152275
Name : Musleh Ali Al-Shamrani
Student ID : 439135448
Date of Submission: 15 April 20
Abstract:
When a disaster occurs, and there will be one someday, it is too late to start
planning how you will manage the situation. Confusion in a stressful situation can
very easily arise, and can be fatal. All companies need to prepare for crisis
management and emergency responses during disasters such as IT and utility
failures, terrorist attacks, fraud, sabotage, theft, flooding and fire. The aim of your
overall disaster recovery plan is to enable your organization to protect its staff and
assets during an incident, resume and maintain its key activities during the
disruption and return swiftly to normal operations. However, the plans to manage
the event, the process to formally declare an IT disaster, practicing how to work as a
corporate team during the event and manage your ongoing event communications,
etc. are often "forgotten" in the development of your detailed technical recovery
plans. As real experiences have shown, planning for communications and risk
mitigation during the event just does not work!
In this proposal, we focus on demonstrating the importance of crisis management
centers in institutions and data centers, and on showing why crisis management
matters for facing problems both before and after they occur.
We explain the data center environment and what affects it, taking NTT FACILITIES
as an example. We then introduce crises and crisis management, describe the plan
that must be in place to manage crises effectively and successfully, and discuss
crisis management frameworks.
1. Introduction:
Modern data center infrastructure gives companies a robust, low-maintenance
environment in which to run their business applications. To keep the business
running around the clock, new data centers need to be designed to be reliable,
cost-effective and flexible enough to meet business requirements. In addition to
running daily services, many companies have turned their attention to disaster
recovery, crisis management and major incident management (MIM), to ensure that no
revenue is lost in the event of a disaster.
Data centers built for disaster recovery purposes rely heavily on major incident
management. The organization in question was looking for a data center solution to
match its current MIM process. Based on the available documentation, a new data
center was to be built that would follow industry standards as well as the
organization's internal standards to ensure the best possible solution for business
continuity1.
Today, with the development of Information and Communication Technologies,
companies invest in their IT services and data centers to provide fast and
continuous services to their users and customers. They equip these data centers
with systems that offer the best service and the highest accessibility, and they
invest increasingly in the whole IT infrastructure so that these systems do not
suffer access problems. Companies need a well-organized Disaster Recovery Plan for
the continuity of their data centers, server systems and IT services, so that they
can manage the disaster recovery period effectively and come through the disaster
with minimal data and economic loss; in a disaster they may lose access to part of
the system, or all IT and server systems may be out of order.
A malfunction or crisis in a data center can lead to a loss of information, which
in turn can paralyze a company and cause its operations to fail.
In this proposal, we will highlight the importance of crisis and disaster management
in data centers and provide guidance and strategies for overcoming crises and
unforeseen situations to deal with losses in data centers and technology services and
increase their flexibility.
1 Aliisa Partio, Data Center Disaster Recovery & Major Incident Management, Lahti
University of Applied Sciences, Finland, 2017, p. 1.
2. Objectives:
Businesses operate in an increasingly risky environment. For setting up and
managing an effective business continuity management system and managing its
crises, an organization needs to define a risk assessment process that will enable it
to understand the threats and vulnerabilities of its critical activities. It is necessary to
assess the impact that would arise if an identified threat became an incident and
caused a business disruption. Crises may be the product of an unforeseen
combination of interdependent risks. They develop in unpredictable ways, and the
response usually requires genuinely creative, as opposed to prepared, solutions. The
roles of strategic management are amplified during a crisis1.
The objective of activating the Crisis Management group is to:
gain an effective overview of the incident,
help exert control over events as they unfold,
minimize damage and consequences,
improve performance and efficiency,
ensure that other activities can be resumed as soon as possible.
This proposal helps clarify the role of crisis management in maintaining
the data center, the importance of a good information and communications
system that can provide the required information in times of crisis,
the role of the workforce in times of crisis, and the importance of forming task
forces that can deal with crises and reduce their negative effects.
3. Research Question:
The following research questions for the proposal are based on the business
requirements:
- What is the importance of crisis management in the data center?
- How can business service continuity be ensured within the data center platform?
- Which technical solution for the data center platform should be used, and
where?
1 Haris Hamidovic, An Introduction to Crisis Management, ISACA Journal, Volume 5, 2012, p. 1.
4. Literature review:
4.1 DATA CENTER ENVIRONMENT
When planning or building a new data center, it is crucial to understand the current
environment to achieve the best possible outcome.
We are now in the era of cloud computing, in which internet-based data is handled
from remote locations. Data is entered, stored, processed, deposited and backed up
on central servers located in dedicated buildings. A data center is a place where
all these servers are gathered and operated with state-of-the-art technology1.
Data centers contain IT equipment used for the processing and storage of data, and
communications networking. They are the backbone of IT networks across the globe
and include extensive supporting infrastructures required to power and cool the IT
equipment. A data center can be as simple as a single rack in a server closet or as
complex as a large warehouse, typically having built-in redundancy for the avoidance
of downtime. Data centers are high energy consumers2.
Data centres house servers and networking and storage equipment, and are
considered the central nervous system of the 21st century. They contain
comprehensive mechanical and electrical infrastructures to support the
energy-intensive computing required to perform one or more of the following functions:
- The physical housing of IT equipment such as computers, servers, switches,
routers, data storage devices, racks, and related equipment.
- The storage, management, processing and exchange of digital data.
- The provision of application services or management for data processing, such
as web hosting, internet, intranet and telecommunication.
Data centres vary in size from a single rack in a server closet to huge server farms
with floor areas reaching 150,000 m². Data centres are used by businesses,
corporations, educational establishments and governments to provide web hosting
and the internet, the storage of company information, and the processing of business
transactions. Based on the importance of continued access to the data, they display
varying levels of:
1 Abdur Rashid, Data Center Architecture Overview, National Academy for Planning and Development,
MINISTRY OF PLANNING, Volume 28, 2019, p. 33.
2 Beth Whitehead, Deborah Andrews, Assessing the environmental impact of data centers part 1:
Background, energy use and metrics, ELSEVIER JOURNAL, 2014, p. 1.
- Reliability: the probability that a component/system/data centre operates
without failure over a set time period. Facilities can have the same
availability, but a facility that has one outage per year is more reliable than a
facility that has many failures lasting the same amount of time in total.
- Availability: the average time per time period that a
component/system/data centre operates as designed, without downtime.
- Redundancy: the topology of supporting infrastructures that ensures a
component/system/data centre remains available in the event of a failure.
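The difference between availability and reliability described above can be illustrated with a small calculation. The following is a minimal sketch, not a standard metric definition: the outage logs are invented for illustration, and mean time between failures is used here only as an assumed, simple proxy for reliability.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


def availability(outage_hours):
    """Fraction of the year the facility operated as designed."""
    downtime = sum(outage_hours)
    return (HOURS_PER_YEAR - downtime) / HOURS_PER_YEAR


def mean_time_between_failures(outage_hours):
    """Average uptime per outage; an assumed, simple proxy for reliability."""
    uptime = HOURS_PER_YEAR - sum(outage_hours)
    return uptime / len(outage_hours)


# Hypothetical outage logs (hours per outage) for two facilities.
facility_a = [8.0]        # one 8-hour outage
facility_b = [0.5] * 16   # sixteen 30-minute outages, 8 hours in total

if __name__ == "__main__":
    for name, log in (("A", facility_a), ("B", facility_b)):
        print(name, f"{availability(log):.5f}", mean_time_between_failures(log))
```

Both facilities lose the same eight hours, so their availability is identical, yet facility A runs far longer between failures and is therefore the more reliable of the two.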
All the data centers currently use VMware's virtualization platform.
System hardware has been sized to meet the local business requirements,
and all sites use a single harmonized hardware vendor, which in this case is
IBM. Depending on the local infrastructure and the organization's history, differing
network and storage configurations sometimes make the hardware platform
very heterogeneous and therefore difficult to maintain. By using a standardized
platform vendor and standardized hardware options, the data center design aims
to be as universal as possible.
The organization operates in several markets, such as the food industry
and restaurant services. Data center operations need to support
operations in all countries and to be transferable between business
units. This places high demands on the data center specialist when designing
data centers that can meet all the business requirements. These business
requirements vary by business application, but in general almost all
essential applications require the data centers to run 24 hours a day, 7
days a week, 365 days a year.
The data center platform must be built on a high-availability platform,
which is then backed up by a working disaster recovery and crisis design. High
availability in this case is achieved with VMware's cluster
technology, in which two or more hosts work together to provide a functioning
platform for virtual machines. The organization had identified a growing need for
a standardized approach to disaster recovery.
4.2 DATA CENTER LAYOUT
A standard for data center infrastructure was first presented in TIA-942. Since
then, modern data center designs have gained more detailed specifications,
because TIA-942 concentrates primarily on telecommunications infrastructure.
Data centers are described from a tier (level) perspective by all major industry
standards, for example ANSI/TIA-942-A, ANSI/BICSI and EN 50600-x, and by their
commercial counterpart, the UPTIME INSTITUTE. There is only a slight variation
in specifications between the different tiers.
An example of a large modern data center is shown in Figure 1, which
covers aspects such as space management, security, hardware and
cabling.
Figure 1: data center layout (NTT Facilities)
NTT FACILITIES offers a comprehensive range of data center solutions, including
everything from site selection and countermeasures for power outages, heat,
earthquakes, lightning strikes and noise through to security and monitoring and
maintenance. The company also has the project management capacity to keep
costs, quality and deadlines on track, and thus provide invaluable support for the
construction of reliable, low-cost data centers.
NTT FACILITIES investigates all the options to create data centers tailor made to
provide all the functions you require, and provides the steadfast support needed
to ensure your data center is reliable and well-balanced.
4.3 CRISIS MANAGEMENT
Crisis response is an approach to dealing with an event in a professional
manner that addresses the critical needs of the moment. The focus is on surviving
the crisis in progress and easing its effects as much as possible. Planning
is designed to address the worst-case scenarios. The goal is to establish
protocols and procedures to guide management decision-making, employee
actions, and client expectations.
Preparing for crisis situations and responding appropriately to them if they occur
can mean the difference between closure and survival, or even flourishing. After
all, crises can be fertile
opportunities for learning and change if an organization is equipped with the
right tools to handle them. There are two sides to managing any crisis: planning
and response. Organizations that anticipate the possibility of a crisis and prepare
properly will be better equipped to manage such situations or avoid them
altogether.1
Crisis and crisis management are normal and inevitable occurrences in
organizations. Even with the need to excel, attain perfection and maintain an
ethical code, something always goes wrong. Often times the communications or
public relations department has the mandate to handle a crisis when it occurs
and provide solutions to it before it taints the organization’s image2.
4.3.1 What is crisis?
PAS 200:2011 defines a crisis as an “inherently abnormal, unstable and complex
situation that represents a threat to the strategic objectives, reputation
or existence of an organization”3.
Other definitions describe a crisis as:
- An unexpected event that threatens the wellbeing of a company.
- A significant disruption to the company and its normal operations which
impacts its customers, employees and/or investors.
1 Aktouf, Management and Theories of Organizations in the 1990s: Toward a Critical Radical
Humanism?, Academy of Management, 1992.
2 Alex Keya, Crisis and Crisis Management in Organizations, St. Paul's University, Kenya, p. 2.
3 British Standards Institution, PAS 200:2011: Crisis management – Guidance and good practice.
Crises develop in unpredictable ways, and the response usually requires
genuinely creative, as opposed to pre-prepared, solutions. Pre-prepared solutions
of the sort designed to deal with more predictable and structured incidents are
unlikely to work in complex and ill-structured crises. They may, in fact, be
counterproductive. As crisis management is about making major strategic decisions
in abnormal, unstable and complex situations, a lengthy and complicated manual of
the sort familiar to incident managers would be more of a hindrance than a help. The
crisis management plan “is not a guide as to what to do next in a given situation”
but rather a framework in which good decisions can be taken.
To be effective, crisis management must further enhance this capability and
be better prepared to respond to new and unimagined risks, as well as to
manage the ever-growing number and diversity of stakeholders, many of whom
have conflicting agendas. This is even more critical given the increasingly
complex organizations of the business world, with their regular restructuring,
mergers, acquisitions and divestments. It can only be
achieved by working in an integrated way.
Identifying and evaluating risks and issues is the first step, but it is the
management of the risks and issues which is critical and most challenging for
most organizations, especially when dealing with intangible issues. Not all crises
are preventable. However, having effective risk and issues management
processes in place will help organizations foresee, plan scenarios, be more
proactive and decide on whether to take, treat, transfer or terminate the risk.
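One way to make the four risk responses above concrete is a small risk register. This is a minimal sketch under stated assumptions: the 1 to 5 scoring scales, the likelihood-times-impact score, the `Risk` and `Response` names and the example entries are all illustrative and are not taken from the cited sources.

```python
from dataclasses import dataclass
from enum import Enum


class Response(Enum):
    TAKE = "take"            # accept the risk as it is
    TREAT = "treat"          # reduce its likelihood or impact
    TRANSFER = "transfer"    # shift it to another party, e.g. through insurance
    TERMINATE = "terminate"  # stop the activity that causes it


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)
    response: Response

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical register entries for a data center.
register = [
    Risk("Utility power failure", 3, 5, Response.TREAT),
    Risk("Flooding of the server room", 1, 5, Response.TRANSFER),
    Risk("Minor cabling fault", 2, 1, Response.TAKE),
]

if __name__ == "__main__":
    for risk in rank(register):
        print(risk.score, risk.name, risk.response.value)
```

Ranking by score only orders the discussion; the choice among the four responses remains a management decision recorded against each risk.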
Actual crisis management planning deals with the loss, just as disaster recovery
and business continuity planning deal with the situation after the loss. Crisis
management is about being prepared to handle adversity and minimise impact
most effectively and facilitating the management process during chaos. Crises do
not discriminate based on a company's size or notoriety1.
4.3.2 What is Crisis Management?
Crisis management is a critical organizational function. Failure can result
in serious harm to stakeholders, losses for an organization, or end its very
existence. Public relations practitioners are an integral part of crisis
management teams.
Britain’s Publicly Available Specification (PAS, 2011) defines crisis
management as the ability to creatively make major tactical decisions in
irregular, unstable and intricate situations.
Crisis management is the art of making decisions to head off or mitigate
the effects of such an event, often while the event itself is unfolding.
1 Ellis Holman, Managing a Disaster: Getting Started with IT Crisis Management and Emergency
Response Teams, 2011, p. 8.
Special measures taken to solve problems caused by a crisis.
A set of procedures applied in the handling, containment and resolution of an
emergency in planned and coordinated steps.
Crisis management requires a manager who specializes in crisis management,
or a high-level executive if the crisis concerns an organization-wide issue.
Either way, the person in charge must possess shrewd crisis decision-making
abilities in order to lessen the effects of the crisis.
Depending on the nature of the crisis and the extent of the damage, either the
Business Managers or the Senior Management will be required to take charge.
When an emergency or crisis affects the entire organization and involves dealing
with external agencies, the media and legal formalities, the senior management
of the organization needs to step in and take charge of the situation, duly
assisted by the Business Managers and other designated officials as per the plan.
4.3.2.1 Crisis management frameworks
Crisis management comprises various phases: preparedness before crisis,
response to limit damages during the crisis and feedback after the crisis.
Before a crisis, preparedness consists in developing knowledge and capacities in
order to effectively anticipate, respond and recover from a crisis1:
- Risk assessment constitutes the fundamental first step in preparedness:
preparing for crisis requires identifying and analysing major threats, hazards
and related vulnerabilities.
- Early warning systems based on the detection of these threats serve to
activate pre-defined emergency or contingency plans.
- Stockpiling, maintaining equipment and supplies, training and exercising
emergency response forces and related co-ordination mechanisms through regular
drills all contribute toward preparedness.
- Appropriate institutional structures, clear mandates supported by
comprehensive policies and legislation and the allocation of resources for all
these capacities through regular budgets are also instrumental for thorough
preparedness to crisis.
Once a crisis actually occurs, the response phase begins:
- Detection of a crisis may come about through various sources (e.g.
monitoring networks and early-warning systems, public authorities, citizens,
media, private sector, etc.). It may build up over time or happen suddenly.
1 Charles Baubion, OECD Risk Management: Strategic Crisis Management, 2013, pp. 9-10.
- Monitoring the development of a crisis in order to make sense of its
characteristics and ascertain the operational picture requires an appropriate
intelligence organisation.
- Selection of appropriate contingency plans and activation of appropriate
emergency response networks.
- Response efforts need to be coordinated, monitored and adapted as the
crisis develops through the tactical and strategic oversights of crisis cells at
the appropriate levels.
- Standard operating procedures (SOPs) should govern operations and co-
ordination and should include information sharing and communication
protocols as well as scaling-up mechanisms to mobilise additional emergency
response means.
- In addition to ensuring co-operation and exerting decision-making, leadership
plays a key role in crisis communication. Communicating with the media and
the general public to provide sense of events, to maintain trust in the
emergency responders and government, and to transmit specific messages is
an essential function of leaders during crisis.
4.3.2.2 Crisis management plan:
When and if a disaster strikes any business operation or organization, what helps
the Organization to deal with the crisis effectively, continue to run the business
operations to the extent possible and get back on the recovery path are the
Disaster Recovery and Business Continuity plans. However, the effectiveness of
the plans depends largely upon the preparedness of the teams involved and the
ownership.
Any Disaster Recovery or Crisis Management plan involves strategic management
decisions as well as commitment of resources in terms of financial resources as
well as human resources. Execution of the plan also depends upon the
facilitation, effective leadership and control of the team that owns the plans. In
terms of crisis the management needs to demonstrate complete ownership over
the recovery process.
The crisis management plan (CMP) is a set of policies and procedures to help
data center operators prepare for, respond to, and learn from crisis situations
that could eventually lead to a true emergency or disaster that would then
require the execution of emergency operating procedures (EOPs). The CMP should
be closely reviewed by all major stakeholders who would participate in the process.
Preparation and prevention
The best crisis management tool is prevention. It is commonly known that most
data center outages are a direct or indirect result of human error. To minimise
errors, data center personnel should undergo intensive training in change
management procedures to ensure proper behavior and execution for work in or
around critical facility systems. All data center work procedures (standard
operating procedures/SOPs) should be created with safety and operational risk
mitigation as the primary goal. It is recommended that all procedures be peer
reviewed on site and undergo an additional review by a quality assurance
specialist.
Detection and incident classification
Not all events appear out of nowhere or are easily identifiable at first glance. It is
important to be able to recognise their early warning signs and threshold
qualities. There is a distinction between an urgent situation and a crisis. An
urgent situation that is being managed with a proven process or procedure
would not normally be considered a crisis. One of the defining characteristics of a
crisis is a loss of control. If a situation passes outside the boundaries of what can
be reliably managed and becomes, or threatens to become out of control, a crisis
may ensue. Another characteristic of a crisis would be a high level of severity.
Data center infrastructure management (DCIM) software tools can be an
effective way to centrally monitor data center system state changes and alarms
to provide more proactive notification of problems and conditions that could
lead to a crisis or disaster.
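The distinction drawn above between an urgent situation and a crisis can be written down as a simple classification rule. This is a minimal sketch, assuming a 1 to 5 severity scale and a threshold of 4; the function name, the scale and the threshold are illustrative choices and do not reflect any particular DCIM product.

```python
HIGH_SEVERITY = 4  # assumed threshold on an assumed 1-5 severity scale


def classify(under_control: bool, has_proven_procedure: bool, severity: int) -> str:
    """Classify an event as 'urgent' or 'crisis'."""
    # Loss of control is the defining characteristic of a crisis.
    if not under_control:
        return "crisis"
    # A severe event with no proven procedure to manage it threatens to
    # slip out of control, so it is treated as a crisis as well.
    if severity >= HIGH_SEVERITY and not has_proven_procedure:
        return "crisis"
    # Everything else is an urgent situation that is still being managed.
    return "urgent"


if __name__ == "__main__":
    # A severe but controlled event handled by a proven procedure.
    print(classify(under_control=True, has_proven_procedure=True, severity=5))
    # A minor event that has nevertheless escaped control.
    print(classify(under_control=False, has_proven_procedure=True, severity=2))
```

Keeping the rule explicit makes it easier to agree in advance who declares a crisis and on what grounds, rather than debating it while the event unfolds.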
Response and mitigation
Once a crisis or disaster has been declared, the first inclination on the part of
well-meaning operators might be to immediately jump in and take action to fix
the problem. Until the situation is fully understood and a well-considered
response plan created, however, such actions run the risk of causing further
harm or downtime. Except in obvious cases requiring immediate action (e.g.,
fire), the proper course of action is to craft a plan of action with subject matter
experts and key stakeholders. The time invested in these activities will often, in
the long run, provide a safer, surer, and longer lasting solution than hasty action.
After any first response activities, the primary task is to assess the situation. Basic
information must be put together about the scope and severity of the incident,
as well as the state and stability of the plant. This data must be quickly
established and continuously updated in order to ensure good decision making
and accurate communications.
Recovery and analysis
Once the incident has been fully resolved, a failure analysis report should be
prepared and issued to key stakeholders. It is best to do this quickly – within one
week of the incident’s resolution – while the experience is still fresh in people’s
minds.
The plan should be inclusive: it cannot limit itself to disasters, or to one or two
types of crises (for example, non-physical crises). Organizations are exposed to
more than one type of crisis, so the plan must identify actions to be taken for a
number of different crisis scenarios, based on the specific “type” of crisis. The
actions needed for a product safety problem will probably differ from the actions
needed when the organization experiences an incident that threatens its
reputation, or a financial crisis.1
4.3.2.3 Objectives of Crisis Management in Data Center:
Crisis management involves managing and facilitating decision making,
authorizing and providing financial resources, facilitating co-ordination and
support from external agencies, and facilitating and overseeing medical and other
facilities for staff and employees where needed, together with reviewing security
and taking adequate measures to enhance security around the site. More
importantly, the crisis team, which includes senior management, will need to deal
with the media, government agencies and insurers, as well as with the families and
relatives of employees and the public, and manage communications on behalf of the
Company. Managing the media and providing the right information is of vital
importance to the Company’s reputation during any crisis, because the information
provided can adversely affect the Company’s share price as well as the markets.
It is the crisis team that holds the executive authority to take decisions of every
kind, including financial decisions, to ensure that the crisis is managed and
recovery gets underway without glitches2.
Crisis management is key to damage control when an organization is in trouble.
Crisis management has many goals, including3:
1. Identifying the Real Problem
The first objective of crisis management is to identify the problem that created
the crisis -- something not always straightforward to do. In fact, it might be a
mystery as to how it all started. Therefore, it’s crucial to investigate and dig
deeper into understanding the problem, so that all sides have a better
understanding of how chaos occurred as a result of the conflict. In trying to
achieve this crucial objective, neither side should withhold information, and both
1 RICHARD RNOLD, Crisis Management Planning and Execution, 2007, p. 2.
2 Prachi Juneja, Disaster Recovery and Crisis Management, 2015.
3 University Business: Three Steps for Managing Serious Challenges.
parties must have a nonjudgmental tone when investigating the source of the
problem.
2. Managing the Flow of Information
The second objective of crisis management is to manage the flow of information.
Always anticipate that news of the conflict will come out, especially in the age of
the Internet and social media websites. If the harmful event is something that
affects the public, then it’s always best to prepare a press release or hold a press
conference as a preliminary step to cool the panic that they might have as a
result of the conflict. Inform the public, or whomever is affected, what steps the
company is taking to alleviate the problem. Keep things transparent.
3. Understanding the Adversary
The third objective of crisis management is to understand the adversary, that is,
assuming it is someone or some group, as opposed to something. If adversaries
believe they have no leverage, then they will believe that there’s no point in
negotiating -- and the crisis escalates big time. However, they might hold some
advantages that you might not be aware of. It’s best to figure this out on your
own rather than asking them, so that once you find it, you'll know how to
negotiate in a way that both parties walk away satisfied.
4.3.2.4 Preparation and prevention:
The best crisis management tool is prevention. It is commonly known that most data
center outages are a direct or indirect result of human error. Many of these errors occur
during installation and maintenance, activities that are performed or supervised by the
Facility Engineering staff. To minimize errors, data center personnel should undergo
intensive training in change management procedures to ensure proper behavior and
execution for work in or around critical facility systems. All data center work procedures
(standard operating procedures, or “SOPs”) should be created with safety and
operational risk mitigation as the primary goal. It is recommended that all procedures be
peer reviewed on site and undergo an additional review by a Quality Assurance
specialist for technical and procedural accuracy. In particular, they should be scrutinized
for proper risk categorization, safety preparation, work task sequencing, and back-out
procedures. Another important activity is the identification of probable and/or
consequential system failure modes, which is a precursor to development of emergency
operating procedures (EOPs). This exercise not only identifies what EOPs are needed as
previously explained, but it will also help prevent such incidents from occurring as a
natural consequence of the identification and preparation process. Once established,
regular drills of EOPs will maximize preparation and staff coordination.
4.3.2.5 Response and mitigation:
Once a crisis or disaster has been declared, the first inclination on the part of
well-meaning operators might be to immediately jump in and take action to fix the
problem. Until the situation is fully understood and a well-considered response plan
created, however, such actions run the risk of causing further harm or downtime. Except
in obvious cases requiring immediate action (e.g., fire), the proper course of action is to
circle the wagons and craft a plan of action with subject matter experts and key
stakeholders.
The time invested in these activities will often, in the long run, provide a safer, surer,
and longer lasting solution than hasty action. However, of course, if there is an
immediate threat to human safety or the physical plant that can be safely mitigated,
immediate action should be taken. Common sense dictates that if someone is or is about
to be injured the need for action outweighs the need for deliberation - provided that the
consequences of such actions do not recklessly endanger anyone. Similarly, if there were
a containable fire and the safe means to extinguish it, doing so would take precedence
over anything else. These are just two possible examples where a first response would
be justifiable and prudent. That being said, extreme caution should be employed in any
situation where the need for an immediate first response is indicated. Only when the
stakes are high and the consequences predictable should such actions be considered.
After any first response activities, the primary task is to assess the situation. Basic
information must be put together about the scope and severity of the incident, as well
as the state and stability of the plant. This data must be quickly established and
continuously updated in order to ensure good decision making and accurate
communications. Doing this well requires staff who are well trained, drilled, and who are
quick thinking and calm under pressure.
4.3.2.6 Recovering from crises:
While organizations may have very effective data redundancy in their on-site data
center, they often overlook the requirement for off-site data storage. Having to
recover from a crisis is a situation that companies should strive to avoid. To
achieve this, redundancy should be built in for heating, ventilation and air
conditioning (HVAC), for physical connectivity paths and devices, and for power
and storage. At the software layer, technologies such as mirroring and redundant
arrays of independent disks (RAID) can further increase redundancy and
reduce the need for crisis recovery. On-site data center redundancy
provides fast recovery from a hardware or software error without the
need for a full crisis recovery. Crisis recovery can be built in various
ways, depending on the business needs. Data center duplication is one of
the most robust approaches, since the aim is that a whole site can be lost
without affecting any of the business processes. If data
center duplication is not possible, organizations can build their own data
center specifically for crisis recovery, with the minimum hardware required
to keep the business running, or they may opt for a colocation facility.
A colocation facility is a data center from which services are provided for rental to
retail customers.1
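The effect of redundancy described above can be illustrated with a short calculation. It is a sketch under two stated assumptions that do not come from the cited sources: the redundant units fail independently of each other, and a single surviving unit can carry the full load. The function name and the sample figure of 0.99 are hypothetical.

```python
def combined_availability(unit_availability: float, redundant_units: int) -> float:
    """Availability of a group of redundant units.

    Assumes independent failures and that one working unit is enough,
    so the group is down only when every unit is down at the same time.
    """
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("unit_availability must be between 0 and 1")
    if redundant_units < 1:
        raise ValueError("redundant_units must be at least 1")
    probability_all_down = (1.0 - unit_availability) ** redundant_units
    return 1.0 - probability_all_down
```

Under these assumptions, calling `combined_availability(0.99, 2)` models two mirrored units that are each available 99% of the time. In practice failures are often correlated (for example, a fire affecting a whole room), which is one reason off-site storage and a separate recovery site are still needed.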
5. Design:
Data center disaster recovery can be considered a success if your organization
manages to easily recover and rapidly resume operations without causing major
disruption to your business. Creating a comprehensive data center disaster
recovery plan can help you mitigate potential risks and threats, prepare for
possible disasters, and minimize their impact on business performance and
productivity.
An effective data center disaster recovery plan generally includes the following
steps:
Establishing a disaster recovery (DR) team and assigning roles and
responsibilities
Conducting an operational risk assessment and business impact analysis
Establishing critical data and applications as well as their level of priority
Developing data center disaster recovery strategies
Documenting the data center disaster recovery plan and sharing it with
employees
Testing and updating the data center disaster recovery plan
Unlike other disaster recovery plans, the data center disaster recovery plan
focuses solely on how to recover your data center facility and restore critical
data and IT infrastructure to normal operating mode after a disaster.
Why Do You Need Data Center Disaster Recovery?
In order to stay competitive in their respective markets, organizations need to
provide on-demand services to their customers and minimize the risk of data
loss. This explains the increase in demand for virtualization technology because
business owners are interested in simplifying data center management,
optimizing resource utilization, cutting costs, and ensuring on-demand scalability
and flexibility.
As a result, data center facilities have dramatically transformed in the last
decade. Traditional on-premises data centers have in many cases been replaced
with large-scale virtual environments. However, data centers are still extremely
fragile and can be exposed to various dangers and threats such as security
breaches, data theft, ransomware attacks, viruses and worms.
1 Chad Bahan, The Disaster Recovery Plan, 2003, p. 6.
Modern data centers are constantly evolving and their capabilities are growing
exponentially. The same goes for attacks on these centers, as they become ever
more sophisticated and hard to predict and avoid. Thus, it is critical that you
prepare for potential disasters in advance and stay aware of their possible
consequences.
In order to ensure reliable data protection and efficient system recovery, a
responsible business owner needs to consider which data center disaster
recovery strategies work best for their particular data center facility. On the basis
of the chosen DR strategies, you can create a comprehensive data center disaster
recovery plan which can guide you through the entire DR process.
How Does Virtualization Help With Disaster Recovery Within a Data Center?
As mentioned above, traditional on-premises data centers are now being
replaced with virtualization platforms on a mass scale. The main reason for this is
the multiple benefits that virtualization can provide you with, no matter the size
of the organization or amount of expected workload. Let’s discuss how
virtualization can help with data center disaster recovery in detail below.
Improving resource utilization
Traditional data centers are highly dependent on physical servers, with each
being dedicated to conducting a specific operation or running a single
application. Due to this, most hardware resources are left unused and wasted.
With virtualization, you can abstract away the underlying physical hardware and
replace it with virtual hardware. Thus, you can consolidate multiple virtual
machines (VMs) on top of a single physical server and effectively share
computing resources among those VMs.
Eliminating compatibility issues
Traditional data centers house computer systems which run on a variety of
models of servers, which may lead to hardware compatibility issues during data
center disaster recovery. In such a case, you need to install similar hardware on
both the production center and DR site to prevent disaster recovery from failing.
However, building a DR site with hardware equipment similar to that used in the
primary site can be a costly option.
With virtualization, on the other hand, a VM can be easily recovered to any
hardware. Physical hardware does not have to be compatible on both sites to
efficiently perform data center disaster recovery. All you need is a remote
location with a few physical servers which are properly set up and ready to take
on the production workload, should there be any need.1
1 Jessie Reed, Data Center Disaster Recovery: A Complete Guide, 2019.
Conducting successful data center disaster recovery
Virtualization makes it easier to protect your critical data and applications by
creating VM backups and replicas and storing them at a remote location.
Virtualization also allows you to easily move VMs from one server to another,
without affecting your VM performance or data integrity. As a result, if a disaster
affects your data center, you can rapidly move the production workload to a DR
site and resume your operations there.
Currently, the virtualization market provides many different backup and recovery
solutions which allow you to schedule backup and replication jobs, conduct
failover and failback testing, and completely automate the DR process.
Non-disruptive testing
Even once you have built a DR site and designed a comprehensive DR plan, there
is still a high risk of failing at data center disaster recovery. Thus, you need to
conduct DR plan testing in order to verify that the data center disaster recovery
plan is functional, identify any issues and inconsistencies in your DR plan, and
later update it accordingly. Third-party data protection solutions allow for testing
DR strategies even during working hours without affecting your production
environment.
Ensuring cost-efficiency
With virtualization, organizations can reduce the expenses of purchasing and
maintaining physical hardware in data centers. Due to the efficient use of
available physical resources, you can build a DR site which requires less
equipment, takes up less physical space, and is easy to maintain.
Moreover, virtualization can considerably reduce the data center footprint,
meaning that to support DR activities, you now need a smaller number of
physical servers, less networking hardware, and fewer server racks. Essentially,
the smaller your data center footprint, the better your chances of recovering
successfully during data center disaster recovery.
Minimized downtime
If a disaster strikes a traditional on-premises data center, it generally takes weeks
or even months, depending on the exact nature of the damage caused, to
resume operations and restore the production center to its original state. In
contrast, the time spent on recovery of a virtualized data center is significantly
shorter because you can easily back up critical data and applications, store them
in a remote location, and rapidly fail over to a DR site, should disaster strike.
Many data protection solutions can even enable you to automate the DR process
from start to finish, thus reducing downtime and minimizing its impact on your
productivity.
How to design a Data Center Disaster Recovery Plan:
Creating a data center disaster recovery plan is extremely important as it may
affect the outcome of your disaster recovery. In order to design a comprehensive
data center disaster recovery plan, you first need to conduct operational risk
assessment and business impact analysis. As a result, you will be able to identify
the risks that your data center is most exposed to, measure their possible impact
on your productivity, and evaluate the preparedness of your infrastructure for
data center disaster recovery.
Thanks to this plan, you can determine which recovery objectives are most
appropriate for your business, which DR strategies work best for a specific DR
scenario, and which data and applications should be considered most critical for
your virtual environment and, thus, be recovered first.
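The idea of recovering the most critical data and applications first can be sketched as a simple ordering. The `Workload` type, the numeric priority scale (1 is most critical) and the workload names in the usage note are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """An application or data set covered by the DR plan (illustrative)."""
    name: str
    priority: int  # assumed scale: 1 = most critical, larger = less critical


def recovery_order(workloads):
    """Return workload names in the order they should be recovered.

    Lower priority numbers come first; names break ties so that the
    resulting order is deterministic.
    """
    ranked = sorted(workloads, key=lambda w: (w.priority, w.name))
    return [w.name for w in ranked]
```

For example, given hypothetical workloads named `billing-db` (priority 1), `web-portal` (priority 2) and `reporting` (priority 3), `recovery_order` lists `billing-db` first. In a real plan the priorities would come from the business impact analysis rather than being assigned by hand.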
Keep in mind that a data center disaster recovery plan doesn’t simply provide
you with guidelines on how to respond to a disaster. A comprehensive data
center disaster recovery plan should include the measures and procedures
required for preventing a DR event from occurring, detecting potential threats
and risks, and mitigating vulnerabilities of your data center.
In order to validate such control measures, you need to periodically test your
data center disaster recovery plan and check the preparedness of your data
center facility for an actual DR event. This way, you will be able to identify
inconsistencies in your plan and improve the DR preparedness of your data
center and IT infrastructure.
6. Methodology and Procedures:
Information technology business continuity and crisis recovery planning has
become a popular topic due to increased information system interdependency.
Organizations cannot afford downtime, primarily for financial reasons.
Methodologies have emerged to guide organizations through the planning,
implementation, and maintenance lifecycle phases. Execution is addressed
from a theoretical point of view. Business interruptions due to the unplanned
downtime of IT systems will always remain a risk. Good preparation is the
best defense, and will help ensure responses are timely, effective, and
error-free. Preparedness begins with developing emergency operating
procedures (EOPs) for all identified high-risk failure scenarios, such as the loss
of a chiller plant, failure of the generator to start, and so on. Escalation
procedures also need to be developed and rehearsed to ensure the chain of
command is informed and the appropriate resources are brought to bear as
the situation develops. Scenario drills should be regularly conducted to
rehearse and evaluate both team and individual emergency response
effectiveness. Once an incident has been dealt with and its effects mitigated,
an analysis should be conducted to understand what the root causes were
and how effective the emergency response was in dealing with the problem.
Formal failure analysis for significant facility events is a fundamental part of
the overall continuous improvement process that is needed to reduce failures
and improve response effectiveness in future events.
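The relationship between emergency operating procedures and escalation can be sketched as a lookup table plus a time-based rule. Everything specific here is hypothetical: the scenario keys, the step wording, the role names and the 15-minute escalation interval are placeholders, not values from the cited sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    """An emergency operating procedure (EOP) for one failure scenario."""
    scenario: str
    steps: tuple              # ordered actions, illustrative wording only
    escalation_chain: tuple   # roles to inform, lowest level first


# Hypothetical EOPs for the two scenarios named in the text.
EOPS = {
    "chiller_plant_loss": Procedure(
        scenario="Loss of a chiller plant",
        steps=("Confirm the alarm", "Start standby cooling", "Monitor room temperature"),
        escalation_chain=("shift engineer", "facility manager", "crisis management team"),
    ),
    "generator_fail_to_start": Procedure(
        scenario="Failure of the generator to start",
        steps=("Confirm the alarm", "Attempt a manual start", "Check remaining battery time"),
        escalation_chain=("shift engineer", "facility manager", "crisis management team"),
    ),
}


def roles_to_inform(scenario_key, minutes_unresolved, minutes_per_level=15):
    """Return the roles that should have been informed by now.

    One more level of the chain is added for every `minutes_per_level`
    minutes the incident stays unresolved (an assumed rule).
    """
    if minutes_unresolved < 0:
        raise ValueError("minutes_unresolved cannot be negative")
    if minutes_per_level <= 0:
        raise ValueError("minutes_per_level must be positive")
    chain = EOPS[scenario_key].escalation_chain
    levels = min(len(chain), 1 + minutes_unresolved // minutes_per_level)
    return list(chain[:levels])
```

Writing the chain down as data, instead of leaving it to memory, makes it something that scenario drills can rehearse and that the post-incident analysis can check against what actually happened.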
In this proposal, we explain how such crises occur and how to prevent and
overcome them through techniques and methods that must be followed both
before a crisis, to reduce its likelihood, and during it.
The case study relied heavily on a design science approach as well as previously
gathered knowledge from working with data centers and data center hardware.
7. Ethical Considerations:
All ethical standards and legal procedures were observed in this proposal.
8. Delimitation and Limitation:
8.1 Delimitation:
As can be seen from the above, in practice it is often quite difficult, if not
impossible, to distinguish between an emergency situation, a crisis and a
catastrophe. When a large-scale incident occurs, it places varying demands
on certain agencies in certain time intervals. What one agency defines as a
catastrophe another will define as an ongoing crisis or emergency situation.
Each crisis represents a unique event that entails a unique combination of
needs and demands in terms of providing an adequate response.
Besides, crises occur in totally differing contexts, which makes attempts at
comparison extremely difficult.
8.2 Limitations:
Limitations include:
Difficulty identifying actors
Limitations on what may be discussed due to risk of liability
Degraded memory of actual events and policies active at the time of the
incident
Very little data exists on dealing with public affairs crises in a multi-service
overseas environment. However, crises in such environments routinely make
headlines. Each crisis is different, circumstances are influenced by time,
location and intensity, and the task of developing comprehensive plans is
daunting. Many crisis plans provide checklists or lessons learned from actual
situations but fail to take into account scientific methods for discovering
trends and using empirical data to control and predict future actions.
References:
Partio, A, 2017, DATA CENTER DISASTER RECOVERY & MAJOR INCIDENT
MANAGEMENT, Lahti University of Applied Sciences, Finland.
Hamidovic, H, 2012, An Introduction to Crisis Management, ISACA
JOURNAL, Volume 5.
Rashid, A, 2019, Data Center Architecture Overview, National Academy
for Planning and Development MINISTRY OF PLANNING, Volume 28.
Whitehead, B, and Andrews, D, 2014, Assessing the environmental
impact of data centers part 1: Background, energy use and metrics,
ELSEVIER JOURNAL.
Aktouf, 1992, Management and Theories of Organizations in the 1990s:
Toward A Critical Radical Humanism? Academy of Management.
Keya, A, CRISIS AND CRISIS MANAGEMENT IN ORGANIZATIONS, St. Paul's
University, Kenya.
British Standards Institution, PAS 200:2011: crisis management– Guidance
and good practice.
Holman, E, 2011, Managing a Disaster: Getting Started with IT Crisis
Management and Emergency Response Teams.
Baubion, C, 2013, OECD Risk Management: Strategic Crisis Management.
RNOLD, R, 2007, Crisis Management Planning and Execution.
University Business: Three Steps for Managing Serious Challenges
Bahan, C, 2003, The Disaster Recovery Plan [referenced 15.10.2016].
Available at
https://www.sans.org/readingroom/whitepapers/recovery/disaster-recovery-plan-1164
Juneja, P, 2015, Disaster Recovery and Crisis Management. Available at
https://www.managementstudyguide.com/disaster-recovery-and-crisis-management.htm
Reed, J, 2019, Data Center Disaster Recovery: A Complete Guide.
Available at
https://www.nakivo.com/blog/data-center-disaster-recovery-a-complete-guide/