An Information Technology Wake-up Call
Disaster Recovery Planning
Impact to Capital Markets Technology
& Data Center Critical Infrastructure
TECHNICAL WHITE PAPER
Assess and Mitigate Risk and Vulnerabilities to Business Continuity and Disaster Recovery
In the New York Metropolitan Area
Vincent Pelly
Scott Haglund
Sophie Pascal, Contributing Editor
Table of Contents
Executive Summary
What happens to Business when the lights go out?
Intended Audience and Structure
Keeping Business in Business
The unanticipated hidden risks
Lessons Learned
Review of past events is key to effective Disaster Recovery
Risks to the Critical Infrastructure
Climate Conditions and Patterns
Seismic Activity and Risk
Electrical Distribution and the Power Grid
Data Center Reliability Classification
Best Practices
For Infrastructure Design
High Availability & Disaster Recovery
RTO and RPO
Expectations for Continuous Availability
Virtualization
Replication and Network Bandwidth
Database Replication
Types of Backup Recovery and Replication Architectures
Disaster Recovery Site Selection
Summary and Recommendations
Best Practices
Business Continuity Management Framework
Elements of Business Recovery Planning
FEMA Flood Maps
Appendix A
FEMA Flood Hazard Mapping - HIGH
FEMA Flood Hazard Mapping - LOW
Appendix B
Natural Disaster Risk Profiles for Data Centers
Appendix C
East Coast Liquidity Venues
Works Cited & References
About Citihub
About the Authors
Executive Summary
What happens to Business when the lights go out?
In the aftermath of Hurricane Sandy, significant flooding to coastal areas caused a majority of
the Northeastern United States to be left without commercial electricity. Many businesses lost
power because their buildings were located in zones that were flooded with seawater and
because the main electrical panels were located below the rising water level. Generators that
supported data centers could not be supplied with fuel because their fuel pumps were located in
flooded basements. Firms that had not pre-purchased fuel or secured delivery contracts for their backup
generators were unable to operate their data centers beyond fuel storage capacity, and firms
that did pre-purchase fuel could not receive deliveries due to flooded roadways. Employees
were unable to access their offices, critical staff members were unable to travel to offsite
recovery locations because government mandates forbade access to roadways for non-
essential personnel, and customers were unable to complete online transactions. The overall
impact of Hurricane Sandy was evaluated at between $30 billion and $50 billion.1
The numerous failures of mission-critical IT infrastructure brought immediate attention to some
very important design flaws in today's Recovery plans and processes. These flaws show
that data center facilities are vulnerable, leaving Business exposed to outages it cannot afford.
The objective of this white paper is to provide senior executives with an overview of Disaster
Recovery preparedness as well as the potential risks and vulnerabilities that exist in critical
infrastructure, specifically in the New York metropolitan area. It will also help senior executives to
become aware of critical details that may not be covered in their current Disaster Recovery plans.
We at Citihub believe in the importance of having an end-to-end Business Continuity solution
that includes not only a tested and validated data center and infrastructure design, but also the
ability to provide staff with remote access to the key applications needed to continue operations.
The recommendations listed in this white paper outline high-level frameworks designed for
addressing business systems redundancy. They also demonstrate how to significantly reduce
data loss by using design principles and best practices to build the Disaster Recovery
system that best supports Business requirements.
Although the target industry is financial services, this paper can serve as a primary reference for
building the appropriate Disaster Recovery solution for any company, regardless of industry or
geography.
Finally, this paper will offer a long-term business case for addressing critical vulnerabilities as
well as factors that senior executives should take into consideration when setting priorities
regarding critical infrastructure. This will ensure Business Continuity and prevent loss of
revenue in the event of another major outage.
1 http://online.wsj.com/article/SB10001424052970204712904578092663774022062.html?mod=googlenews_wsj
Intended Audience and Structure
This white paper is intended to help senior management and senior-level executives of financial
services institutions navigate the Business Continuity and Disaster Recovery landscape. It
outlines successful implementation strategies and best practices, and assumes that readers
have basic knowledge of networks and infrastructure, as well as awareness of the geographical
specificity of their businesses.
Citihub will examine how site selection, power, cooling, and inadequacies within the system
recovery architecture can contribute to a data center's risk of downtime. The analysis will
explore specific data center infrastructure vulnerabilities, and suggest recommendations and
best practices that identify and remediate gaps within the infrastructure to minimize downtime
and achieve the highest possible return on investment.
Keeping Business in Business
The unanticipated hidden risks
The technological ecosystem supporting financial markets relies heavily on centralized data
centers, infrastructure and communication networks as the core processing engines of capital
markets. Uninterrupted operations are critical to the daily operations of the financial services
industry, serving e-commerce, market data and pricing, matching engines, settlements and
other critical systems, transactions and data that enable sell-side and buy-side firms to maintain
worldwide market liquidity.
Firms are at risk when the IT infrastructure is disrupted: systems are down and
information is unavailable, adversely impacting business operations. Financial institutions, including
retail banks and institutional securities firms, require reliable and consistent operations to
support front and back office systems. This is particularly true of settlement and clearing firms that process
open transactions and communications with customers, counterparties and third parties.
Disruptions to daily operations can prevent financial institutions from managing liquidity,
which can increase financial risk to their organizations.
These are some of the business and technical drivers behind the design and implementation of
robust Disaster Recovery plans. They should be given priority when selecting proper
backup sites and developing sound Recovery management processes.
Examples of system outages that should be considered when designing business and system
resiliency plans:
• Isolated failures caused by software or hardware errors, or by recent system upgrades that were
not fully tested
• External outages to telecommunications and electrical feeds caused by inadvertent damage
to primary lines
• Loss of critical infrastructure and mechanical and electrical systems, as well as failure of
backup systems to provide continuous operations
• Widespread outages caused by natural disasters and catastrophic events
Immediate threats and consequences of not having a Disaster Recovery plan:
• Loss of revenue and of customer confidence, and damage to the corporate brand and
reputation, can arise when clients are unable to access systems and account information
or execute transactions
• Cost to restore operations to normal state; without proper planning and Disaster Recovery
management, this can be expensive
• Potential fines or fees can be imposed for non-compliance when inadequate resiliency
plans result in extended outages2
2 Dodd-Frank H.R. 4173 – 316 “(ii) establish and maintain emergency procedures, backup facilities, and a plan for disaster recovery”
Lessons Learned
Review of past events is key to effective Disaster Recovery
Business today has not fully internalized the significant findings of the interagency paper
discussed below, which was issued almost ten years ago.
During the past 12 years the East coast of the United States, in particular the Northeast and the
New York metropolitan area, has experienced several widespread power outages related to
extreme weather conditions that have greatly impacted technology infrastructure. These events
confirm that our critical IT infrastructure is vulnerable to regional disruption (power outages,
climate change and natural disasters), as demonstrated by the increase in wide-scale and
regional disruptions over the past decade.
In response to these events, IT executives have planned accordingly by revising Business
Continuity plans and introducing alternative backup sites, such as tertiary sites in geographical
regions that are outside the location of the primary corporate site. Within the financial services
community, senior industry leaders along with the Federal Reserve Board, OCC and SEC
issued in 2005 an interagency white paper3 that described best practices to strengthen the
resiliency of U.S. financial services post 9/11. The paper stressed the critical importance of
protecting the financial system from new risks associated with widespread outages by focusing
on the following high-level Business Continuity objectives:
• Rapid recovery and timely resumption of critical operations
• Key staff to resume critical operations in one major operating location
• Comprehensive testing that demonstrates effective internal and external continuity
arrangements
3 Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System, September 2005,
www.sec.gov/rules/concept/34-46432.htm
“Firms that play significant roles in critical financial markets should maintain sufficient
geographically dispersed resources, including staff, equipment and data to recover clearing
and settlement activities within the business day on which a disruption occurs. Firms may
consider the costs and benefits of a variety of approaches that ensure rapid recovery from a
wide-scale disruption. However, if a backup site relies largely on staff from the primary site, it
is critical for the firm to determine how staffing needs at the backup site would be met if a
disruption results in loss or inaccessibility of staff at the primary site.”
- Federal Reserve White Paper on the Resiliency of the U.S. Financial System, 2005
The results of the Federal interagency white paper, as well as the analyses and discussions
held with financial industry technology experts and practitioners, show that sound practices
based on the above key points have resulted in the development and implementation of best
practices regarding Business Continuity. It is understood industry-wide that many firms at the
time did not embrace the urgency of the report, mostly because of cost considerations. Today its
recommendations can no longer be ignored.
On the strength of that interagency paper and in reviewing past and recent events, it is
imperative that these key points be taken into consideration when designing and building the
Disaster Recovery architecture:
• Perform a top-down assessment of critical business activities, mapped to the
supporting IT systems and key staff members.
• Prioritise the systems to recover first and assign the required support staff, allowing for a
potentially limited capability in recovery mode.
• Establish a crisis management team that will coordinate activities and make prioritisation
calls on the ground. Critical time is often lost in the decision-making process to invoke a
Disaster Recovery plan.
• Build a solid Recovery plan around established backup site(s) for data centers and all key
business staff, separate from the core processing location.
• Periodically test back-up systems and network connectivity, and perform application role
swaps on a scheduled basis to ensure Recovery plans function properly.
Comprehensive Disaster Recovery testing should be end-to-end and involve telecommunication
firms, third-party service providers and securities exchanges, as well as vetting of the business
process and the proper activation sequence for application systems. It should also serve to
familiarize business users with operational procedures in unusual situations.
Risks to the Critical Infrastructure
Climate Conditions and Patterns
“NOAA estimated approximately $1 billion in damage that occurred in 2011 from 12-14 major events”4
- NOAA 2012
A significant concern when reviewing an organization’s primary and recovery sites is their
geographic vulnerability to severe weather. Tools and resources available from FEMA and the
National Oceanic and Atmospheric Administration5, together with historical weather patterns,
can provide data on locations that have sustained consistent damage from severe weather.
Below is a summary of the NOAA 2011 and 2012 National Events Map for the U.S.
Significant U.S. Weather and Climate Events
The summary of risk profiles in Appendix B, drawn from the Uptime Institute Natural Disaster
Risk Profiles6, outlines the geographic risks that severe weather poses to data center sites.
A data center in or near the storm path should expect disruption, as well as minor to severe
infrastructure damage, when subject to the following natural disasters:
• Tornado
• Hurricane
• Earthquake
• Ice Storm
• Blizzard
• Thunderstorm
• Lightning
• Flood
For detailed FEMA flood maps of the New York Metro area, please refer to Appendix A, which
lists the primary locations of critical data centers serving financial services.
4 http://www.noaa.gov/extreme2011/index.html
5 NOAA National Climatic Data Center, State of the Climate: National Overview for Annual 2012, published online December 2012 from
http://www.ncdc.noaa.gov/sotc/national/2012/13.
6 Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
Seismic Activity and Risk
Historically, earthquakes and seismic activity have been rare in the New York
metropolitan area, with the exception of the 2011 Virginia Earthquake7, which produced tremors
throughout the New York area. Although no damage or outages occurred during the 2011 event,
it is a best practice to evaluate seismic activity when selecting a primary and recovery site.
The following maps summarize the historical impact, magnitude and spread of seismic
activity for the U.S. and New York area.8
U.S. & New York Area Seismic Hazard Map
Source: USGS
7 http://en.wikipedia.org/wiki/2011_Virginia_earthquake
8 United States Geological Survey, http://earthquake.usgs.gov/earthquakes/states/new_york/hazards.php
Electrical Distribution and the Power Grid
When planning for alternative Disaster Recovery backup locations, as well as performing a risk
and vulnerabilities assessment on the primary site, another key area of concern relates to the
location of the power utilities and the major interconnections of the power grid. This type of
assessment becomes critical when planning for 2N9 redundancy for primary and secondary
locations. In order to lower the risk of localized power outages, a full disclosure of the locations
of power stations, substations and feeds to the facility, as well as the redundancy within the
feeds, is necessary to determine where electrical power gaps may exist.
9 When referring to the data center utility feed, a 2N system provides double the capacity needed, with the two feeds running separately so
that there is no single point of failure.
U.S. Electrical Grid and Power Plants
“The U.S. electric grid is a complex network of independently owned and operated
power plants and transmission lines.”
- NPR, Visualizing The U.S. Electric Grid
Source: NPR
Data Center Components
The critical items within a data center include a number of systems that control and run the
electrical and mechanical components necessary for successful operation. Many of these
systems are tied into the Building Management Systems (BMS), and others are directly linked to
IT monitoring systems. Within the past two years, the industry has taken the stance that both
BMS and IT critical systems should be managed and monitored by a single system and reported
via a holistic dashboard. These systems are part of the building envelope, and each contains a
set of core delivery mechanisms and risk profiles.
During Hurricane Sandy, many of the critical systems, specifically the electrical and mechanical
(M&E), were severely damaged due to the fact that storm surge water entered the basements
and took down main electrical panels, water and fuel pumps, etc. Many data centers with
generator fuel pumps located in basements had difficulty starting up backup generators, and in
some cases fuel had to be manually delivered to generators located on higher floors (via the
bucket brigade).
In addition to the M&E systems, other common infrastructure dependencies required to
maintain operations during a recovery period are generally related to the operations of the
telecommunications infrastructure. During a widespread outage it is critical that the
telecommunications infrastructure remain intact across the United States. Firms can mitigate
this risk by implementing resiliency through the use of circuit diversity and routing when
establishing geographically dispersed facilities.
Source: Citihub
Data Center Reliability Classification
Several data center industry experts have defined reliability classifications for the data center infrastructure. The term reliability covers a
variety of subjects, including availability, durability and quality, that describe how the data center has been engineered. The following five
performance-based classes have been defined to classify the reliability of the data center, based on the Building Industry Consulting Service
International (BICSI) standard for IT systems10.
Class F0: Single Path without Alternate Power Source
• Class F0 supports the basic environmental and energy requirements of the IT functions
without supplementary equipment
• Capital cost avoidance is the major driver
• There is a high risk of downtime due to planned and unplanned events
• Class F0 facilities maintenance can be performed during non-scheduled hours, and downtime
of several hours or even days has minimum impact on the mission
• A critical power distribution system separate from the general use power systems would
not exist
• No back-up generator system
• The system might deploy power conditioning or surge protective devices to allow the
specific equipment to function adequately (utility grade power does not meet the basic
requirements of critical equipment)
• No redundancy for power or air conditioning

Class F1: Single Path
• Class F1 supports the basic environmental and energy requirements of the IT functions
• There is a high risk of downtime due to planned and unplanned events
• Class F1 facilities maintenance can be performed during non-scheduled hours, and the
impact of downtime is relatively low
• The critical power distribution system would deploy a power conditioning device to allow
the critical equipment to function adequately (utility grade power does not meet the basic
requirements of critical equipment)
• No redundancy of any kind would be used for power or air conditioning for a similar reason

Class F2: Single Path with Redundant Components
• Class F2 provides a level of reliability higher than Class F1 to reduce the risk of
downtime due to component failure
• In Class F2 facilities there is a moderate risk of downtime due to planned and unplanned
events
• Maintenance activities can typically be performed during unscheduled hours
• The critical power system would need redundancy in those parts of the electrical
distribution system that are most likely to fail
• These would include any products that have a high parts count or moving parts, such as
UPS, controls, air conditioning, generators or ATS
• In addition, it may be appropriate to specify premium quality devices that provide longer
life or better reliability

Class F3: Concurrently Maintainable
• Class F3 provides additional reliability and maintainability to reduce the risk of
downtime due to natural disasters, human-driven disasters, planned maintenance, and repair
activities
• Maintenance and repair activities will typically need to be performed during full
production time with no opportunity for curtailed operations
• The critical power system in a Class F3 facility must provide for reliable, continuous
power even when major components (or, where necessary, major subsystems) are out of
service for repair or maintenance
• To protect against unplanned downtime, the power system must be able to sustain
operations while a dependent component or subsystem is out of service

Class F4: Fault Tolerant
• Class F4 eliminates downtime through the application of all tactics to provide continuous
operation regardless of planned or unplanned activities
• All recognizable single points of failure from the point of connection to the utility to
the point of connection to the critical loads are eliminated
• Systems are typically automated to reduce the chances for human error and are staffed 24×7
• Rigorous training is provided for the staff to handle any contingency
• Compartmentalization and fault tolerance are prime requirements for a Class F4 facility
• The critical power system in a Class F4 facility must provide for reliable, continuous
power even when major components (or, where necessary, major subsystems) are out of
service for repair or maintenance
• To protect against unplanned downtime, the power system must be able to sustain
operations while a dependent component or subsystem is out of service
10 BICSI Standards for Data Centers, https://www.bicsi.org/default.aspx
Best Practices
For Infrastructure Design
High Availability & Disaster Recovery
High Availability and Disaster Recovery are both concepts related to Business Continuity. But
whereas Business Continuity applies to the whole business (including IT), HA & DR typically are
more related to IT Continuity, as part of overall Business Continuity. High Availability solutions
mainly address outages at a single site, while Disaster Recovery solutions mainly address sudden,
site-wide disasters. High Availability and Disaster Recovery objectives and metrics are different.
A highly available site provides resiliency from errors of the underlying platform and single points
of failure. Availability encompasses reliability, recovery, and failure. One of the most common
measures of availability is the percentage of time that a given system is active and working. The
following table correlates the percentage of availability to calendar time equivalents.
Acceptable Uptime   Downtime per Day   Downtime per Month   Downtime per Year
99%                 14.40 minutes      7 hours              3.65 days
99.9%               86.40 seconds      43 minutes           8.77 hours
99.99%              8.64 seconds       4 minutes            52.60 minutes
99.999%             0.86 seconds       26 seconds           5.26 minutes
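The figures in the table follow from simple arithmetic: allowed downtime is the unavailable fraction multiplied by the length of the period. The sketch below is illustrative only. The function name and the period lengths (a 365.25-day year, and a month taken as one twelfth of it) are assumptions rather than definitions from this paper, so results may differ slightly from the rounded table values.

```python
# Convert an availability percentage into allowed downtime.
# Assumed period lengths: a 365.25-day year and a month of one twelfth of a year.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12


def downtime_seconds(availability_percent: float, period_seconds: float) -> float:
    """Return the seconds of downtime allowed in a period at the given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (1.0 - availability_percent / 100.0) * period_seconds


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        per_day = downtime_seconds(pct, SECONDS_PER_DAY)
        per_month = downtime_seconds(pct, SECONDS_PER_MONTH)
        per_year = downtime_seconds(pct, SECONDS_PER_YEAR)
        print(f"{pct}%: {per_day:.2f} s/day, "
              f"{per_month / 60:.1f} min/month, {per_year / 3600:.2f} h/year")
```

Working in seconds and converting only for display keeps the calculation independent of how each table cell happens to be expressed.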
RTO and RPO
The Recovery Time Objective (RTO) is the elapsed time from service interruption until service
is restored. It answers the question: "How long can you be without service?" RTO represents a
time limit that cannot be exceeded without facing severe consequences. A unified High
Availability and Disaster Recovery approach would establish both an uptime objective and an
RTO for each service.
The Recovery Point Objective (RPO), on the other hand, is the point in time represented by the
data upon service resumption. It answers the question: "How old can the data be?"
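As a rough illustration of how the two measures can be checked after a test or a real outage, the sketch below computes the recovery time and the data age from three timestamps and compares them with targets. The function names, the sample timestamps and the targets are hypothetical. The sketch also assumes the common convention of measuring data age as the gap between the last recoverable copy and the moment of interruption.

```python
from datetime import datetime, timedelta


def achieved_recovery_time(outage_start: datetime, service_restored: datetime) -> timedelta:
    """Elapsed time from service interruption until service is restored."""
    return service_restored - outage_start


def achieved_recovery_point(outage_start: datetime, last_recoverable_copy: datetime) -> timedelta:
    """Age of the newest recoverable data at the moment of interruption (assumed convention)."""
    return outage_start - last_recoverable_copy


def meets_objectives(recovery_time: timedelta, recovery_point: timedelta,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True when both the recovery time and the data age are within their objectives."""
    return recovery_time <= rto and recovery_point <= rpo


if __name__ == "__main__":
    # Hypothetical incident: the values below are for illustration only.
    outage = datetime(2013, 1, 15, 9, 0)
    last_copy = datetime(2013, 1, 15, 8, 45)
    restored = datetime(2013, 1, 15, 11, 30)

    time_taken = achieved_recovery_time(outage, restored)
    data_age = achieved_recovery_point(outage, last_copy)
    ok = meets_objectives(time_taken, data_age,
                          rto=timedelta(hours=4), rpo=timedelta(minutes=30))
    print(f"recovery time {time_taken}, data age {data_age}, objectives met: {ok}")
```

Recording these two values for every Disaster Recovery test gives a simple trend line against the objectives set for each service.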
Expectations for Continuous Availability
Data Replication
The two basic methods of data replication are synchronous and asynchronous. In general
terms, synchronous capabilities are used for shorter distances, and asynchronous capabilities
are used for longer distances. The method chosen depends on Business Recovery
requirements.
Synchronous replication ensures that a remote copy of the data, identical to the primary copy,
is created at the time the primary copy is updated. In synchronous replication, an update
operation is not considered done until completion is confirmed at both the primary and
secondary site. An incomplete operation is rolled back at both locations, ensuring that the
remote copy is always an exact mirror image of the primary.
Asynchronous replication places data updates in a queue on the primary server and does not
wait for acknowledgment from the secondary server. If the primary server fails, any updates
that have not yet been copied across the network to the secondary server are lost, so
application data may be lost in this type of failure.
Most companies cannot tolerate more than a few hours or even minutes of downtime without
serious impact to the bottom line. Synchronous data replication may be the appropriate solution
for companies seeking the fastest possible data recovery, minimal data loss, and protection
against database integrity problems.
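The difference between the two methods can be seen in a toy model. The sketch below is not a real replication product: the class and method names are hypothetical, network transfer is reduced to moving items between in-memory lists, and the rollback of incomplete synchronous operations is not modeled. It only shows which updates would be missing from the secondary copy if the primary failed at a given moment.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of a primary copy and a secondary copy of the same data."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self.pending = deque()  # updates queued on the primary, not yet shipped

    def write(self, update: str) -> None:
        self.primary.append(update)
        if self.synchronous:
            # Synchronous: the write completes only once the secondary holds it too.
            self.secondary.append(update)
        else:
            # Asynchronous: the write completes now; shipping happens later.
            self.pending.append(update)

    def ship_pending(self, max_updates: int) -> None:
        """Send up to max_updates queued updates to the secondary."""
        for _ in range(min(max_updates, len(self.pending))):
            self.secondary.append(self.pending.popleft())

    def updates_lost_if_primary_fails(self) -> list:
        """Updates present on the primary but not yet on the secondary."""
        return list(self.pending)


if __name__ == "__main__":
    for mode in (True, False):
        store = ReplicatedStore(synchronous=mode)
        for update in ("trade-1", "trade-2", "trade-3"):
            store.write(update)
        store.ship_pending(1)  # only one queued update crosses the network in time
        label = "synchronous" if mode else "asynchronous"
        print(label, "would lose:", store.updates_lost_if_primary_fails())
```

In the asynchronous case the size of the pending queue at the moment of failure is what the Recovery Point Objective has to tolerate.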
Virtualization
Virtualization makes it possible to implement Disaster Recovery plans at a significantly lower
cost. Since virtual machines are hardware-independent, any physical server can be used as a
recovery target for any virtual machine. As virtualization also makes it possible to consolidate
workloads onto fewer servers, organizations can significantly reduce the cost of hardware for
Disaster Recovery by reducing the number of servers needed at the primary site.
Many organizations have already embraced the benefits of virtualization, as it can add
tremendous value to Disaster Recovery planning. Before virtualization, Disaster Recovery was
often too expensive to implement, and many organizations chose only to protect the most
critical applications. Consolidating multiple physical servers as virtual hosts significantly reduces
the number of physical servers that need to be recovered in the event of an outage.
Replication and Network Bandwidth
Network bandwidth can also introduce challenges to data replication strategies. It’s important to
understand how much data can change within a given period of time, referred to as the
replication latency window. From the rate of change in a given system, one can determine the
amount of bandwidth needed. The network bandwidth guideline below can assist with these
calculations.
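A minimal sketch of that calculation follows. It assumes decimal units (1 GB = 10^9 bytes) and introduces a hypothetical `link_efficiency` factor for protocol overhead and contention; neither the function name nor the factor comes from this paper.

```python
def required_bandwidth_mbps(changed_gb: float, window_hours: float,
                            link_efficiency: float = 1.0) -> float:
    """Bandwidth in Mbit/s needed to replicate changed_gb within window_hours.

    link_efficiency is the assumed usable fraction of the nominal link rate.
    """
    if changed_gb < 0:
        raise ValueError("changed_gb must not be negative")
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    if not 0.0 < link_efficiency <= 1.0:
        raise ValueError("link_efficiency must be in the range (0, 1]")
    bits_to_send = changed_gb * 1e9 * 8
    window_seconds = window_hours * 3600
    return bits_to_send / window_seconds / 1e6 / link_efficiency


if __name__ == "__main__":
    # Hypothetical workload: 9 GB of changed data must be replicated every hour.
    print(f"{required_bandwidth_mbps(9, 1):.1f} Mbit/s at full efficiency")
    print(f"{required_bandwidth_mbps(9, 1, link_efficiency=0.7):.1f} Mbit/s at 70% efficiency")
```

Sizing against the peak change rate rather than the daily average is the more conservative choice, because replication falls behind precisely when the system is busiest.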
Database Replication
Database replication is similar to database mirroring. These solutions use production database
transaction logs to maintain a current copy of the production database on a standby server. In
the event of a server outage, the database replication software automatically promotes the
standby database to become the production database. There are traditionally no restrictions on where
the databases can reside, provided that they can communicate with each other.
Synchronous replication, however, does have some drawbacks. It has a theoretical distance
limitation of 200 kilometres (km) or 124 miles, but the practical distance limitation for a busy
system could be as little as 50 km or 30 miles.
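One reason distance matters is that every synchronous write must wait for at least one network round trip before it completes. The sketch below estimates that minimum delay. It assumes light travels through optical fiber at roughly 200,000 km per second (about 5 microseconds per kilometre one way), an approximation that is not taken from this paper. Real circuits follow longer routes and add equipment delay, so the results should be read as lower bounds.

```python
# Assumed propagation speed of light in optical fiber (approximate).
FIBER_KM_PER_SECOND = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip propagation delay in milliseconds over distance_km of fiber."""
    if distance_km < 0:
        raise ValueError("distance_km must not be negative")
    return 2.0 * distance_km / FIBER_KM_PER_SECOND * 1000.0


if __name__ == "__main__":
    for km in (50, 200):
        print(f"{km} km: at least {round_trip_ms(km):.1f} ms added to every synchronous write")
```

A busy system issuing thousands of dependent writes per second feels this added delay far more than a lightly loaded one, which is consistent with the shorter practical limit quoted above.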
Estimated Hours To Replicate, by Capacity
Network          20 GB    80 GB   120 GB   200 GB   300 GB    730 GB
T1               42.33   169.31   253.97   423.28   634.92   1544.97
10Base-T LAN      6.50    26.01    39.01    65.02    97.52    237.31
DS3 / T3          1.50     6.02     9.03    15.05    22.57     54.93
100Base-T LAN     0.65     2.60     3.90     6.50     9.75     23.73
OC3               0.42     1.68     2.52     4.19     6.29     15.31
OC12              0.10     0.42     0.63     1.05     1.57      3.82
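The same kind of estimate can be reproduced for other link speeds or data volumes. The following is a minimal sketch, not the method used to build the table above: the efficiency factor (the share of nominal line rate actually available to replication traffic) is an assumption, as are the function names and the use of decimal gigabytes. The nominal line rates are the standard figures for each link type.

```python
# Sketch: estimate replication time, or the link speed needed for a given window.
# Assumptions: 1 GB = 10**9 bytes; `efficiency` is a hypothetical usable share of line rate.
NOMINAL_MBPS = {
    "T1": 1.544,
    "10Base-T LAN": 10.0,
    "DS3 / T3": 44.736,
    "100Base-T LAN": 100.0,
    "OC3": 155.52,
    "OC12": 622.08,
}


def hours_to_replicate(data_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to move `data_gb` over a link of `link_mbps` megabits per second."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    bits = data_gb * 8 * 1000**3
    usable_bits_per_sec = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_sec / 3600


def required_mbps(changed_gb: float, window_hours: float, efficiency: float = 0.7) -> float:
    """Link speed (Mbit/s) needed to ship `changed_gb` within the replication window."""
    if window_hours <= 0 or not 0 < efficiency <= 1:
        raise ValueError("window_hours must be positive and efficiency in (0, 1]")
    bits = changed_gb * 8 * 1000**3
    return bits / (window_hours * 3600) / efficiency / 1_000_000


if __name__ == "__main__":
    for name, mbps in NOMINAL_MBPS.items():
        print(f"{name:14s} 200 GB in {hours_to_replicate(200, mbps):8.2f} h")
    print(f"45 GB changed per hour needs about {required_mbps(45, 1):.0f} Mbit/s")
```

Results will differ from the table wherever the efficiency assumption differs, so the output is best treated as an order-of-magnitude check.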
Types of Backup Recovery and Replication Architectures
Choosing the best-suited backup and recovery option for an organization can be challenging.
Businesses typically expect little to no downtime when recovering from a disaster or other
type of outage, and implementing solutions that meet this expectation may represent a sizable
investment.
Management will have to decide which recovery option best fits the organization’s needs,
particularly in relation to risk assessment, compliance and other requirements, as outlined
earlier in this paper.
Single Site Backup and Recovery
• Backups and snapshots required for off-site storage must be created periodically
• Data can only be as up-to-date as the last backup: daily, weekly or monthly
• Recovery is limited to the point in time of the last backup
Multi-Site Asynchronous Data Replication
• Asynchronous replication is supported by disk arrays, networks and host-based replication
products
• Changes to data are committed to the source first, then buffered or journaled and sent to the
replication target(s)
• It is designed to work over long distances and greatly reduces bandwidth requirements
• This can introduce delays ranging from nearly instantaneous to several hours, dependent on
network latency
• There is also no guarantee that the secondary system will have the most recent copy of the data
if the primary fails
Multi-Site Synchronous Data Replication
• Used primarily for high-end transactional applications that require instantaneous failover if
the primary node fails
• With synchronous replication, data is written to the primary and secondary storage systems at
the same time, and the write is not complete until it is acknowledged by both local and remote
storage systems
• Synchronous replication requires considerable bandwidth, which also makes it more expensive
Cloud Backup and Recovery
• Applications and data remain on-premises in this approach, with data being backed up into the
cloud and restored onto on-premises hardware when a disaster occurs
• In other words, the backup in the cloud becomes a substitute for tape-based off-site backups
• Many backup software vendors now provide options to back up directly to popular cloud service
providers such as AT&T, Amazon, Microsoft and Rackspace
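The difference between the two replication modes can be illustrated with a toy model. This is a simplified sketch under stated assumptions: the class names are hypothetical, storage is an in-memory dictionary, and the network is a direct method call. It does not represent any vendor's product.

```python
# Toy model contrasting synchronous and asynchronous replication semantics.
from collections import deque


class Replica:
    """Stands in for the remote (secondary) storage system."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value) -> bool:
        self.data[key] = value
        return True  # acknowledgement back to the primary


class SyncPrimary:
    """A write completes only after both local and remote copies are updated."""

    def __init__(self, remote: Replica):
        self.data = {}
        self.remote = remote

    def write(self, key, value) -> bool:
        self.data[key] = value
        return self.remote.apply(key, value)  # caller waits for the remote ack


class AsyncPrimary:
    """A write completes locally; changes are journaled and shipped later."""

    def __init__(self, remote: Replica):
        self.data = {}
        self.remote = remote
        self.journal = deque()

    def write(self, key, value) -> bool:
        self.data[key] = value
        self.journal.append((key, value))
        return True  # acknowledged before the remote has the change

    def ship(self, max_entries=None) -> int:
        """Send journaled changes to the remote, oldest first."""
        shipped = 0
        while self.journal and (max_entries is None or shipped < max_entries):
            self.remote.apply(*self.journal.popleft())
            shipped += 1
        return shipped

    def unreplicated(self) -> int:
        """Changes that would be lost if the primary failed right now."""
        return len(self.journal)
```

In this model, anything still in the journal when the primary fails never reaches the secondary, which is the exposure described for asynchronous replication above.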
Disaster Recovery Site Selection
When assessing the type of backup, recovery and replication architecture, one of the key
components is disaster recovery site selection. Drawing on leading industry practices, the
following recommendations provide guidance for selecting a disaster recovery data center site.
In general, primary and backup sites should not be subject to the same threat profile (severe
weather risks, power grid, and flood zones).
• Disaster Recovery sites should be located a significant distance [11] from the primary site
• Proven practices suggest a minimum of 50 to 200 miles from the primary data center, though
neither the SEC nor the FSA [12] mandates a specific distance
• Leading Disaster Recovery practices indicate between 200 and 800 miles, provided there are no
technical limitations imposed by solution architectures such as low-latency / algorithmic
trading, synchronous replication, and Fibre Channel distance limitations
• Avoid flood-prone areas, major airport flight paths and earthquake areas, and ensure diversity
of power feeds
• Mitigate key-man risk by ensuring labor pool resiliency (data center staff and application
recovery resources) and creating appropriate documentation for cross-regional training
[11] 2003 SEC guidelines on Disaster Recovery (http://www.sec.gov/news/studies/34-47638.htm)
[12] FSA BCM guide (http://www.fsa.gov.uk/pubs/other/bcm_guide.pdf)
Figure: Google Earth Imagery 2013. Blue/red pins mark data centers; red area (0-25 miles), yellow area (25-200 miles, marginal), green area (200-800 miles). Map label: 50 to 200 miles.
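A candidate site can be screened against these distance bands with a great-circle calculation. The sketch below is illustrative only: the band boundaries follow the map legend above, the function names are hypothetical, and the coordinates are approximate city-center values for New York and Chicago rather than actual data center locations. Straight-line distance is also only a first filter; it says nothing about shared power grids or flood zones.

```python
# Sketch: classify primary/backup site separation using the map's distance bands.
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in miles between two latitude/longitude points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def separation_band(miles: float) -> str:
    """Map a distance to the red / yellow / green bands from the figure legend."""
    if miles < 25:
        return "red"
    if miles < 200:
        return "yellow (marginal)"
    if miles <= 800:
        return "green"
    return "beyond mapped range"


if __name__ == "__main__":
    # Approximate city-center coordinates, used purely as an example.
    new_york = (40.7128, -74.0060)
    chicago = (41.8781, -87.6298)
    miles = great_circle_miles(*new_york, *chicago)
    print(f"{miles:.0f} miles: {separation_band(miles)}")
```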
Summary and Recommendations
Target Focus Areas
When performing an evaluation and assessment of IT critical infrastructure, certain issues
should be addressed in order to properly frame and design a sound Business Recovery plan.
The following interview questions can be used as a guide when assessing an environment:
1. Can the IT infrastructure be trusted to withstand a major disruption?
2. Has the resiliency of the Data Center, Network and Compute environment been
proven?
3. Has a Disaster Recovery test been performed recently? Were the critical business
applications included in the last test? What were the results?
4. Have the business requirements been mapped to the IT infrastructure via a top-down
review?
5. Does management fully understand the regulatory ramifications of not adhering to
sound business recovery plans?
If those questions cannot be answered, then the business may be at risk of failure because of
its inability to recover production systems.
Citihub recommends an end-to-end assessment of IT infrastructure, along with an in-depth
review of business continuity plans.
A detailed infrastructure assessment of the Disaster Recovery plan and processes should
include the following:
• A thorough review of the existing primary and backup data centers, as well as the network
and compute infrastructure, and the Disaster Recovery plan designs and architecture
• An assessment of critical backup systems and confirmation that generator fuel pumps are
not located in high-risk areas such as the basements of buildings in flood zones
• A review of schedules for regular backup exercises and confirmation of failover procedures;
confirmation that critical power has been tested and generators are functioning with
sufficient fuel levels
• A review of regional and local FEMA flood zone maps (US), or the international equivalent,
to determine the level of acceptable risk for data centers and critical systems
• An understanding of fuel delivery schedules and the assurance that contracts are in place
for emergency fuel delivery, taking into consideration that hospitals and emergency facilities
have priority for fuel deliveries
• A review of the backup data center location, making sure that the site is outside the primary
geographic area and on separate utility grids if possible
• The education of teams for preparedness, so that they act proactively and at the appropriate
time (for example, not delaying the switch to backup power until the middle of the event)
• An evaluation of service provider backup plans to identify dependency risks
• The evaluation of remote access procedures and support systems; confirmation of sufficient
capacity to support key staff working remotely
Best Practices
To help spearhead a Business Continuity Management plan and a Disaster Recovery program,
the following best practices can drive awareness of the critical nature of these processes and
help senior management establish new plans or revise existing ones and eliminate gaps.
• Establish a planning group to develop resiliency designs and recovery strategies
• Build management awareness by establishing Key Performance Indicators (KPIs) for Disaster
Recovery, to include the following:
- Status of previous Disaster Recovery events/tests with periodic reports to
senior management
- Other core IT competencies that are critical to Disaster Recovery planning
- Periodic tests to verify implementation of the Disaster Recovery plan and
reports about gaps and risks
- A review process that includes the deployment of new solutions
• Perform Risk Assessments and Audits that will:
- Complete a top-down inventory assessment of all critical assets required to
sustain operations
- Review process structure assessments, audits, and reports
- Assess gaps and risks from previous events or audits
- Create an implementation plan to eliminate gaps
- Document Disaster Recovery plan actions and escalation procedures
- Build comprehensive training material
- Develop test verification criteria and procedures
• Separate people from technology and confirm which business processes require onsite staff to
resume operations
• Establish a real remote access strategy for staff who are unable to commute during severe
weather conditions
Business Continuity Management Framework
Source: Citihub Business Continuity Management and Disaster Recovery Framework
Elements of Business Recovery Planning
The business process assessment for determining critical areas of recovery begins with a top-
down review as shown below. This approach confirms the technical infrastructure and
dependencies associated with each business process.
The above process enables end-to-end mapping of dependencies critical to providing an
understanding of the key components that make up an application system. In order to determine
business unit IT needs, and provide a gap analysis against IT capabilities, Citihub has
developed a business impact analysis methodology on critical processes and the IT systems
which support them.
The three areas of focus are:
Business Unit Overview: The business unit overview and readiness heat map is used to capture
business process criticality and IT capability readiness in the event of a catastrophic outage.
Process Summary: The key process summary examines business processes and rates the impact of a
sustained outage on the business on three dimensions: Operational Impact, Financial Impact and
Reputational Impact.
Application Requirements Summary: The application requirements gap analysis section summarizes
the applications each business unit requires and provides a RAG status when compared against
IT capabilities.
Source: Citihub
Source: Citihub Business Impact Analysis Methodology
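To make the three-dimension rating and the RAG (red/amber/green) comparison concrete, the following is a hypothetical sketch. It is not Citihub's methodology: the 1 to 5 scoring scale, the use of recovery time in hours as the compared quantity, the amber tolerance and all names are assumptions chosen only for illustration.

```python
# Hypothetical sketch of a business impact rating and a RAG gap check.
from dataclasses import dataclass


@dataclass
class ProcessImpact:
    """Impact of a sustained outage, scored 1 (low) to 5 (severe); the scale is assumed."""
    name: str
    operational: int
    financial: int
    reputational: int

    def criticality(self) -> int:
        # Assumption: the worst single dimension drives overall criticality.
        return max(self.operational, self.financial, self.reputational)


def rag_status(required_hours: float, achievable_hours: float,
               amber_tolerance: float = 1.5) -> str:
    """Compare the recovery time a business unit needs with what IT can deliver."""
    if achievable_hours <= required_hours:
        return "GREEN"
    if achievable_hours <= required_hours * amber_tolerance:
        return "AMBER"
    return "RED"


if __name__ == "__main__":
    settlement = ProcessImpact("Trade settlement", operational=5, financial=4, reputational=3)
    print(settlement.name, "criticality", settlement.criticality())
    print("Gap status:", rag_status(required_hours=4, achievable_hours=5))
```

A real assessment would weight the dimensions and thresholds according to the organization's own risk appetite.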
FEMA Flood Maps
Appendix A illustrates one of the more critical vulnerabilities that exist within the New York
metropolitan area. The storm surge during Hurricane Sandy [13], which caused major flooding in
parts of the region, impacted critical systems in the core BMS and data center M&E, as well as
transportation infrastructure in and out of New York City and the Tri-State area.
The maps are ranked high to low by impact due to flooding and storm surge severity.
Rank: LOW
Risk: No impact due to storm surge
Impact: None
Mitigation: Ensure redundancy site is active and tested
Rank: MEDIUM
Risk: Storm surge impact can occur but is unlikely
Impact: Partial or no building damage and/or access to main entrance
Mitigation: Ensure redundancy site is active and tested; recovery plans activated
Rank: HIGH
Risk: Storm surge impact is severe
Impact: Damage to main electrical switch gear and/or generators or fuel pumps
Mitigation: Ensure redundancy site is active and tested; recovery plans activated; staff plan
activated
[13] http://www.nhc.noaa.gov/refresh/graphics_at3+shtml/030345.shtml?gm_esurge
Appendix A
FEMA Flood Hazard Mapping - HIGH
New York Locations: Lower Manhattan and 55 Water Street
New York Locations: 25 Broadway and 32 Ave of the Americas
New Jersey Locations: 410 Commerce Blvd. and 760 Washington Ave.
New Jersey Locations: 545 Washington Blvd. and 755 Secaucus Road
New Jersey Locations: 15 Enterprise Ave. North and 300 Boulevard East
FEMA Flood Hazard Mapping - LOW
New York Locations: 111 8th Ave. and 360 Hamilton Ave., White Plains
New York Locations: 480 North Bedford Road, Chappaqua and 11 Skyline Drive, Hawthorne
New Jersey Locations: 1400 Federal Blvd. and 3003 Woodbridge Ave.
New Jersey Locations: 165 Halsey Street and 100 Delawanna Ave.
Chicago Locations: 350 East Cermak, Chicago, IL and 2905 Diehl Road, Aurora, IL
Appendix B
Natural Disaster Risk Profiles for Data Centers
Type: Tornado
On-Site: In or near the storm path, expect disruption and minor to severe infrastructure damage
Off-Site: In or near the storm path, expect disruption and minor to severe infrastructure damage
Impact:
• Advance warning of tornado potential but no site-specific warning
• Employees remain at site
• Duration is brief although intense
• Roof and outside equipment (cooling towers, etc.) damaged or destroyed
• Potential damage to the building structure
• Loss of local utility and communications
Type: Hurricane
On-Site: In or near the storm path, expect disruption and minor to severe infrastructure damage
Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
Impact:
• Significant advance warning
• Duration is hours to a few days
• Employees may require evacuation from site
• Post-storm security may be required
• Emergency supplies needed for at least several days
• Roof and outside equipment (cooling towers, etc.) damaged or destroyed
• Potential damage to the building structure
• Loss of local utility and communications
• Repair of regional damage may require days, weeks or longer for massive reconstruction of
electric power transmission or distribution facilities
• Potential for off-site public infrastructure damage
Type: Earthquake
On-Site: Expect catastrophic damage and disruption to data centers near the epicenter and
infrastructure damage to data centers further away
Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
Impact:
• No warning
• Brief duration with the threat of continued aftershocks
• Employees may be unable to leave site
• Emergency supplies needed for several days of operation
• Building structural damage
• Toppling of unbraced computer hardware and site infrastructure equipment, including collapse
of raised floor
• Site may be isolated for an extended period
• Highways and bridges may be damaged or destroyed, preventing movement of diesel fuel and
other operating supplies required for continued operation
• Power and communications may sustain extensive damage requiring days, weeks or longer to
repair
Type: Ice Storm / Blizzard
On-Site: Expect some disruption or failure of data center if outside equipment is not designed to
survive severe ice and snow accumulation
Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
Impact:
• Several days' warning generally expected
• Storm or multiple storms may last several days with cumulative effects
• Employees may be unable to leave or enter site
• Emergency supplies needed for at least several days
• Ice damage to structure and outside equipment
• Roof failure from excessive snow load
• Potential freezing of pipes
• Loss of overhead power and/or communications lines over large areas may require several
days, weeks or longer to repair
• Roads dangerous or impassable
Type: Thunderstorm / Lightning
On-Site: Expect disruption ranging from disaster to no impact depending on distance to lightning
strike and proper operation of surge suppression, UPS, and engine-generator systems
Off-Site: Expect frequent momentary public utility disruptions from lightning strikes hitting the
electric power transmission grid
Impact:
• Special sensors can provide minutes of storm approach warning
• Duration is brief but may recur daily during thunderstorm season
• Frequent UPS battery discharges shorten remaining battery life
• Extended power interruption if utility service is overhead or radial and a nearby lightning
strike causes protective devices to open
• Possible flooding and roof leakage
• Momentary undervoltages can affect hundreds of square miles
• Fires started by lightning can destroy public infrastructure located in rural areas
Type: Flood
On-Site: Expect catastrophic damage and disruption to data centers in severe flood areas or with
infrastructure systems below grade
Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
Impact:
• Several days' warning generally expected
• Employees may be unable to leave site
• Emergency supplies needed for at least several days of operation
• Site infrastructure damage requiring days to weeks to repair
• Site may be isolated for an extended period
• Highways and bridges may be damaged, preventing movement of diesel fuel and other
operating supplies required for continued operation
• Power and communications may sustain extensive damage requiring days, weeks or longer to
repair
Source: Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
Appendix C
East Coast Liquidity Venues
The New York metro area is responsible for approximately 94% of the volume of shares traded
for the US cash equity market [14]. The following maps illustrate the major liquidity venues in
the New York and Chicago metropolitan locations.
New York
Chicago
[14] http://www.batstrading.com/market_data/daily_volume/
Works Cited & References
Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System,
September 2005. http://www.sec.gov/news/studies/34-47638.htm
Dodd-Frank H.R. 4173 Wall Street Reform and Consumer Protection Act, January 2010
National Public Radio (NPR), Visualizing The U.S. Electric Grid, April 24th, 2009,
www.npr.org/templates/story/story.php?storyId=110997398
NOAA National Climatic Data Center, State of the Climate: National Overview for Annual 2012, published
online December 2012 from http://www.ncdc.noaa.gov/sotc/national/2012/13.
Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
United States Geological Survey, http://earthquake.usgs.gov/earthquakes/states/new_york/hazards.php
BICSI Standards for Data Centers, https://www.bicsi.org/default.aspx
Colocation Selection, best practices and critical considerations for choosing the right data center
colocation solution. Bill Kleyman, Cloud and Virtualization Architect, October 2012
Climate Change and Infrastructure, Urban Systems, and Vulnerabilities, Technical Report for the U.S.
Department of Energy in Support of the National Climate Assessment, February 29, 2012
The historic nor’easter of 13-14 March 2010, Richard H. Grumm, National Weather Service
About Citihub
Founded in 1998, Citihub provides IT expertise to some of the world’s leading enterprise
organizations and is comprised of industry veterans who relish the challenge of complex
technology and cultural change. We take a fresh approach to the technical challenges of today
and believe in partnering with our clients through change. Citihub clients include Investment
Banks, Hedge Funds, Media, and Manufacturing.
About the Authors
Vincent Pelly
Vincent Pelly is an Associate Partner at Citihub with more than 30 years of experience across
the financial services industry with specialization in infrastructure, program management and IT
strategy. He has extensive experience managing large enterprise projects in infrastructure and
data center advisory and technology implementation, and has managed large infrastructure
transformation programs.
Scott Haglund
Scott Haglund is an independent consultant with more than 30 years of experience in the
development and execution of global infrastructure strategy, architecture, transformation,
technology roadmaps, optimization, and service delivery standards for the enterprise. He
specializes in data center automation strategies, and has led many enterprise infrastructure
transformation programs.

Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Servicecallgirls2057
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfJos Voskuil
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckHajeJanKamps
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCRashishs7044
 
Islamabad Escorts | Call 03274100048 | Escort Service in Islamabad
Islamabad Escorts | Call 03274100048 | Escort Service in IslamabadIslamabad Escorts | Call 03274100048 | Escort Service in Islamabad
Islamabad Escorts | Call 03274100048 | Escort Service in IslamabadAyesha Khan
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportMintel Group
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMintel Group
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation SlidesKeppelCorporation
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...lizamodels9
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africaictsugar
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCRashishs7044
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCRashishs7044
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy Verified Accounts
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Timedelhimodelshub1
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaoncallgirls2057
 

Recently uploaded (20)

NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdfNewBase  19 April  2024  Energy News issue - 1717 by Khaled Al Awadi.pdf
NewBase 19 April 2024 Energy News issue - 1717 by Khaled Al Awadi.pdf
 
Marketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent ChirchirMarketplace and Quality Assurance Presentation - Vincent Chirchir
Marketplace and Quality Assurance Presentation - Vincent Chirchir
 
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptxContemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
Contemporary Economic Issues Facing the Filipino Entrepreneur (1).pptx
 
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort ServiceCall US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
Call US-88OO1O2216 Call Girls In Mahipalpur Female Escort Service
 
Digital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdfDigital Transformation in the PLM domain - distrib.pdf
Digital Transformation in the PLM domain - distrib.pdf
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 
Corporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information TechnologyCorporate Profile 47Billion Information Technology
Corporate Profile 47Billion Information Technology
 
8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR8447779800, Low rate Call girls in Saket Delhi NCR
8447779800, Low rate Call girls in Saket Delhi NCR
 
Islamabad Escorts | Call 03274100048 | Escort Service in Islamabad
Islamabad Escorts | Call 03274100048 | Escort Service in IslamabadIslamabad Escorts | Call 03274100048 | Escort Service in Islamabad
Islamabad Escorts | Call 03274100048 | Escort Service in Islamabad
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
India Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample ReportIndia Consumer 2024 Redacted Sample Report
India Consumer 2024 Redacted Sample Report
 
Market Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 EditionMarket Sizes Sample Report - 2024 Edition
Market Sizes Sample Report - 2024 Edition
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
Call Girls In Sikandarpur Gurgaon ❤️8860477959_Russian 100% Genuine Escorts I...
 
Kenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby AfricaKenya’s Coconut Value Chain by Gatsby Africa
Kenya’s Coconut Value Chain by Gatsby Africa
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
 
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
8447779800, Low rate Call girls in Kotla Mubarakpur Delhi NCR
 
Buy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail AccountsBuy gmail accounts.pdf Buy Old Gmail Accounts
Buy gmail accounts.pdf Buy Old Gmail Accounts
 
Call Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any TimeCall Girls Miyapur 7001305949 all area service COD available Any Time
Call Girls Miyapur 7001305949 all area service COD available Any Time
 
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City GurgaonCall Us 📲8800102216📞 Call Girls In DLF City Gurgaon
Call Us 📲8800102216📞 Call Girls In DLF City Gurgaon
 

White paper data center critical infrastructure risk and vulnerabilities

  • 1. An Information Technology Wake-up Call Disaster Recovery Planning Impact to Capital Markets Technology & Data Center Critical Infrastructure TECHNICAL WHITE PAPER Assess and Mitigate Risk and Vulnerabilities to Business Continuity and Disaster Recovery In the New York Metropolitan Area Vincent Pelly Scott Haglund Sophie Pascal, Contributing Editor
  • 2. Table of Contents
    Executive Summary
    What happens to Business when the lights go out?
    Intended Audience and Structure
    Keeping Business in Business
    The unanticipated hidden risks
    Lessons Learned
    Review of past events is key to effective Disaster Recovery
    Risks to the Critical Infrastructure
    Climate Conditions and Patterns
    Seismic Activity and Risk
    Electrical Distribution and the Power Grid
    Data Center Reliability Classification
    Best Practices
    For Infrastructure Design
    High Availability & Disaster Recovery
    RTO and RPO
    Expectations for Continuous Availability
    Virtualization
    Replication and Network Bandwidth
    Database Replication
    Types of Backup Recovery and Replication Architectures
    Disaster Recovery Site Selection
    Summary and Recommendations
    Best Practices
    Business Continuity Management Framework
    Elements of Business Recovery Planning
    FEMA Flood Maps
    Appendix A
    FEMA Flood Hazard Mapping - HIGH
    FEMA Flood Hazard Mapping - LOW
  • 3. Table of Contents (cont’d)
    Appendix B
    Natural Disaster Risk Profiles for Data Centers
    Appendix C
    East Coast Liquidity Venues
    Works Cited & References
    About Citihub
    About the Authors
  • 4. Executive Summary
    What happens to Business when the lights go out?
    In the aftermath of Hurricane Sandy, significant flooding in coastal areas left a majority of the Northeastern United States without commercial electricity. Many businesses lost power because their buildings were located in zones flooded with seawater and their main electrical panels sat below the rising water level. Generators that supported data centers could not be supplied with fuel because the fuel pumps were located in flooded basements. Firms that had not pre-purchased fuel or secured delivery contracts for their backup generators were unable to operate their data centers beyond their fuel storage capacity, and firms that had pre-purchased fuel could not receive deliveries because roadways were flooded. Employees were unable to access their offices, critical staff members were unable to travel to offsite recovery locations because government mandates barred non-essential personnel from the roadways, and customers were unable to complete online transactions. The overall impact of Hurricane Sandy was estimated at between $30 billion and $50 billion. [1]
    The numerous failures of mission-critical IT infrastructure drew immediate attention to some very important design flaws in today's Recovery plans and processes. These flaws show that data center facilities are vulnerable, leaving Business exposed to outages it cannot afford.
    The objective of this white paper is to provide senior executives with an overview of Disaster Recovery preparedness, as well as of the potential risks and vulnerabilities that exist in critical infrastructure, specifically in the New York metropolitan area. It will also help senior executives become aware of critical details that may not be covered in their current Disaster Recovery plans.
    We at Citihub believe in the importance of an end-to-end Business Continuity solution that includes not only a tested and validated data center and infrastructure design, but also the ability to provide staff with remote access to the key applications needed to continue operations. The recommendations in this white paper outline high-level frameworks for addressing business systems redundancy. The paper also demonstrates how to significantly reduce data loss by applying design principles and best practices to build the Disaster Recovery system that best supports Business requirements. Although the target industry is financial services, this paper can serve as a primary reference for building the appropriate Disaster Recovery solution for any company, regardless of industry or geography.
    Finally, this paper offers a long-term business case for addressing critical vulnerabilities, as well as factors that senior executives should take into consideration when setting priorities regarding critical infrastructure. Addressing these points will help ensure Business Continuity and prevent loss of revenue in the event of another major outage.
    [1] http://online.wsj.com/article/SB10001424052970204712904578092663774022062.html?mod=googlenews_wsj
  • 5. Intended Audience and Structure
    This white paper is intended to help senior management and senior-level executives of financial services institutions navigate the Business Continuity and Disaster Recovery landscape. It outlines successful implementation strategies and best practices, and assumes that readers have basic knowledge of networks and infrastructure, as well as awareness of the geographical specificity of their businesses.
    Citihub will examine how site selection, power, cooling, and inadequacies within the system recovery architecture can contribute to a data center's risk of downtime. The analysis will explore specific data center infrastructure vulnerabilities and recommend best practices that identify and remediate gaps within the infrastructure, in order to minimize downtime and achieve the highest possible return on investment.
  • 6. Keeping Business in Business
    The unanticipated hidden risks
    The technological ecosystem supporting financial markets relies heavily on centralized data centers, infrastructure and communication networks as the core processing engines of capital markets. Uninterrupted service is critical to the daily operations of the financial services industry: it underpins e-commerce, market data and pricing, matching engines, settlements and the other critical systems, transactions and data that enable sell-side and buy-side firms to maintain worldwide market liquidity. Firms are at risk when the IT infrastructure is disrupted; systems are down and information is unavailable, adversely impacting business operations.
    Financial firms, including retail banks and institutional securities firms, require reliable and consistent operations to support front and back office systems. This is particularly true of settlement and clearing firms that process open transactions and communications with customers, counterparties and third parties. Disruptions to daily operations can prevent financial institutions from managing liquidity, which can increase financial risk to their organizations. These are some of the business and technical drivers behind the design and implementation of robust Disaster Recovery plans, and they should be given priority when selecting proper backup sites and developing sound Recovery management processes.
    Examples of system outages that should be considered when designing business and system resiliency plans:
      • Isolated failures caused by software or hardware errors, or by recent system upgrades that were not fully tested
      • External outages to telecommunications and electrical feeds caused by inadvertent damage to primary lines
      • Loss of critical infrastructure and mechanical and electrical systems, as well as failure of backup systems to provide continuous operations
      • Widespread outages caused by natural disasters and catastrophic events
    Immediate threats and consequences of not having a Disaster Recovery plan:
      • Loss of revenue and of customer confidence, and damage to the corporate brand and reputation, can arise when clients are unable to access systems and account information or execute transactions
      • Cost to restore operations to a normal state; without proper planning and Disaster Recovery management, this can be expensive
      • Potential fines or fees can be imposed for non-compliance when inadequate resiliency plans lead to extended outages [2]
    [2] Dodd-Frank H.R. 4173 – 316 “(ii)establish and maintain emergency procedures, backup facilities, and a plan for disaster recovery”
  • 7. Lessons Learned
    Review of past events is key to effective Disaster Recovery
    Business today has not fully internalized the significant findings of the interagency white paper described in this section, which dates back almost ten years. During the past 12 years the East Coast of the United States, in particular the Northeast and the New York metropolitan area, has experienced several widespread power outages related to extreme weather conditions that have greatly impacted technology infrastructure. These events confirm that our critical IT infrastructure is vulnerable to regional disruption (power outages, climate change and natural disasters), as demonstrated by the increase in wide-scale and regional disruptions over the past decade. In response to these events, IT executives have revised Business Continuity plans and introduced alternative backup sites, such as tertiary sites in geographical regions outside that of the primary corporate site.
    Within the financial services community, senior industry leaders along with the Federal Reserve Board, OCC and SEC issued in 2005 an interagency white paper [3] that described best practices to strengthen the resiliency of U.S. financial services post 9/11. The paper stressed the critical importance of protecting the financial system from new risks associated with widespread outages by focusing on the following high-level Business Continuity objectives:
      • Rapid recovery and timely resumption of critical operations
      • Key staff to resume critical operations in one major operating location
      • Comprehensive testing that demonstrates effective internal and external continuity arrangements
    [3] Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System, September 2005, www.sec.gov/rules/concept/34-46432.htm
    “Firms that play significant roles in critical financial markets should maintain sufficient geographically dispersed resources, including staff, equipment and data to recover clearing and settlement activities within the business day on which a disruption occurs. Firms may consider the costs and benefits of a variety of approaches that ensure rapid recovery from a wide-scale disruption. However, if a backup site relies largely on staff from the primary site, it is critical for the firm to determine how staffing needs at the backup site would be met if a disruption results in loss or inaccessibility of staff at the primary site.” - Federal Reserve White Paper on the Resiliency of the U.S. Financial System, 2005
  • 8. The results of the Federal interagency white paper, as well as the analyses and discussions held with financial industry technology experts and practitioners, show that sound practices based on the above key points have shaped the development and implementation of Business Continuity best practices. It is understood industry-wide that many firms at the time did not embrace the urgency of the report, mostly because of cost considerations. Today its findings can no longer be ignored.
    On the strength of that interagency paper, and in reviewing past and recent events, it is imperative that these key points be taken into consideration when designing and building the Disaster Recovery architecture:
      • Performing a top-down assessment of critical business activities, mapped to the supporting IT systems and key staff members
      • Prioritizing the systems to recover first and assigning the required support staff for a potentially limited capability in recovery mode
      • Establishing a crisis management team that will coordinate activities and make prioritization calls on the ground. Critical time is often lost in the decision-making process to invoke a Disaster Recovery plan.
      • Having a solid Recovery plan built around established backup site(s), separate from the core processing location, for data centers and all key business staff
      • Periodically testing backup systems and network connectivity, and performing application role swaps on a scheduled basis to ensure that Recovery plans function properly
    Comprehensive Disaster Recovery testing should be end-to-end and involve telecommunication firms, third-party service providers and securities exchanges, as well as vetting of the business process and the proper activation sequence for application systems. It should also serve to familiarize business users with operational procedures in unusual situations.
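The prioritization point above can be illustrated with a minimal sketch. It assumes each system has already been given a criticality rank and a support-staff requirement during the top-down assessment; the `BusinessSystem` class, the ranking scale, the system names and the staffing figures are all hypothetical and are not taken from the interagency paper.

```python
from dataclasses import dataclass


@dataclass
class BusinessSystem:
    name: str
    criticality: int    # hypothetical scale: 1 = most critical business activity
    support_staff: int  # staff needed to run the system in recovery mode


def recovery_order(systems, available_staff):
    """Pick the systems to recover first, most critical first,
    without exceeding the staff available at the recovery site."""
    plan = []
    remaining = available_staff
    # Sort by criticality, then by name so that ties are resolved predictably.
    for system in sorted(systems, key=lambda s: (s.criticality, s.name)):
        if system.support_staff <= remaining:
            plan.append(system.name)
            remaining -= system.support_staff
    return plan


# Illustrative inventory only; real values come from the business assessment.
systems = [
    BusinessSystem("market-data", 2, 3),
    BusinessSystem("settlement", 1, 4),
    BusinessSystem("reporting", 3, 2),
    BusinessSystem("order-matching", 1, 5),
]

print(recovery_order(systems, available_staff=10))
```

With a limited recovery team, lower-ranked systems simply wait until more staff become available, which is the kind of prioritization call a crisis management team would otherwise have to make under time pressure.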
  • 9. Risks to the Critical Infrastructure
    Climate Conditions and Patterns
    “NOAA estimated approximately $1 billion in damage that occurred in 2011 from 12-14 major events” [4] - NOAA 2012
    A significant concern when reviewing an organization’s primary and recovery sites is their geographic vulnerability to severe weather. Tools and resources available from FEMA and the National Oceanic and Atmospheric Administration [5], together with historical weather patterns, can provide data on locations that have suffered consistent damage from severe weather. Below is a summary of the NOAA 2011 and 2012 National Events Map for the U.S.
    [Figure: Significant U.S. Weather and Climate Events, 2011 and 2012]
    The summary of the Uptime Institute Natural Disaster Risk Profiles [6] in Appendix B outlines the risks to data center sites that are geographically associated with severe weather. A data center in or near the storm path should expect disruption, as well as minor to severe infrastructure damage, when subject to the following natural disasters:
      • Tornado
      • Hurricane
      • Earthquake
      • Ice Storm
      • Blizzard
      • Thunderstorm
      • Lightning
      • Flood
    For detailed FEMA flood maps of the New York Metro area, please refer to Appendix A, which lists the primary locations of critical data centers serving financial services.
    [4] http://www.noaa.gov/extreme2011/index.html
    [5] NOAA National Climatic Data Center, State of the Climate: National Overview for Annual 2012, published online December 2012 from http://www.ncdc.noaa.gov/sotc/national/2012/13.
    [6] Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
  • 10. Seismic Activity and Risk
    Historically, earthquakes and seismic activity have been rare in the New York metropolitan area, with the exception of the 2011 Virginia Earthquake [7], which produced tremors throughout the New York area. Although no damage or outages occurred during the 2011 event, it is a best practice to evaluate seismic activity when selecting a primary and recovery site. The following graphs summarize the historical impact, magnitude and spread of seismic activity for the U.S. and the New York area. [8]
    [Figure: U.S. & New York Area Seismic Hazard Map. Source: USGS]
    [7] http://en.wikipedia.org/wiki/2011_Virginia_earthquake
    [8] United States Geological Survey, http://earthquake.usgs.gov/earthquakes/states/new_york/hazards.php
  • 11. Electrical Distribution and the Power Grid
    When planning alternative Disaster Recovery backup locations, and when performing a risk and vulnerabilities assessment of the primary site, another key area of concern is the location of the power utilities and of the major interconnections of the power grid. This type of assessment becomes critical when planning for 2N [9] redundancy for primary and secondary locations. To lower the risk of localized power outages, a full disclosure of the locations of power stations, substations and feeds to the facility, as well as of the redundancy within the feeds, is necessary to determine where electrical power gaps may exist.
    [9] When referring to the data center utility feed, a 2N system provides double the capacity needed, with the two sets running separately so that there is no single point of failure.
    [Figure: U.S. Electrical Grid and Power Plants. Source: NPR]
    “The U.S. electric grid is a complex network of independently owned and operated power plants and transmission lines.” - NPR, Visualizing The U.S. Electric Grid
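As a rough illustration of the utility-feed assessment above, the sketch below checks a deliberately simplified reading of 2N: at least two feeds, served from different substations, must each be able to carry the full critical load on their own. The function name, substation labels and capacities are hypothetical, and a real assessment would also cover the rest of the power path (UPS, switchgear, distribution).

```python
def meets_2n(required_kw, feeds):
    """Simplified 2N check for utility feeds.

    feeds is a list of (substation_name, capacity_kw) pairs.
    Returns True when at least two distinct substations each supply
    a feed that can carry the full required load alone.
    """
    independent_sources = {
        substation for substation, capacity_kw in feeds
        if capacity_kw >= required_kw
    }
    return len(independent_sources) >= 2


# Illustrative figures only.
critical_load_kw = 800

print(meets_2n(critical_load_kw, [("substation-A", 1000), ("substation-B", 1000)]))  # two independent full-capacity feeds
print(meets_2n(critical_load_kw, [("substation-A", 1000), ("substation-A", 1000)]))  # both feeds share one substation
print(meets_2n(critical_load_kw, [("substation-A", 1000), ("substation-B", 500)]))   # second feed is undersized
```

The second case shows why disclosure of substation and feed locations matters: two feeds that look redundant on paper still share a single point of failure upstream.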
  • 12. Data Center Components
    The critical items within a data center comprise a number of systems that control and run the electrical and mechanical components necessary for successful operation. Many of these systems are tied into the Building Management Systems (BMS), and others are directly linked to IT monitoring systems. Within the past two years, the industry has taken the stance that both BMS and critical IT systems should be managed and monitored by a single system and reported via a holistic dashboard. These systems are part of the building envelope, and each contains a set of core delivery mechanisms and risk profiles.
    During Hurricane Sandy, many of the critical systems, specifically the mechanical and electrical (M&E) systems, were severely damaged because storm surge water entered the basements and took down main electrical panels, water and fuel pumps, and other equipment. Many data centers with generator fuel pumps located in basements had difficulty starting their backup generators, and in some cases fuel had to be carried manually to generators located on higher floors (by bucket brigade).
    In addition to the M&E systems, the other common infrastructure dependencies required to maintain operations during a recovery period generally relate to the telecommunications infrastructure. During a widespread outage it is critical that the telecommunications infrastructure remain intact across the United States. Firms can mitigate this risk by building in resiliency through circuit diversity and diverse routing when establishing geographically dispersed facilities.
    Source: Citihub
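The circuit diversity point above can be checked mechanically once path data has been obtained from the carriers. The sketch below assumes each circuit is described as a list of path elements (carrier, conduit, central office); the element names are hypothetical, and the quality of the result depends entirely on how complete the carrier-supplied path data is.

```python
def shared_path_elements(primary, secondary):
    """Return the path elements that two circuits have in common, sorted by name."""
    return sorted(set(primary) & set(secondary))


# Hypothetical path data for two circuits between a primary and a recovery site.
primary_circuit = ["carrier-A", "conduit-east", "central-office-1"]
secondary_circuit = ["carrier-B", "conduit-east", "central-office-2"]

overlap = shared_path_elements(primary_circuit, secondary_circuit)
if overlap:
    print("Circuits are not fully diverse; shared elements:", overlap)
else:
    print("No shared elements found in the listed path data")
```

In this example the two circuits use different carriers but run through the same conduit, so a single physical cut could take down both.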
  • 13. Data Center Reliability Classification
Several data center industry experts have defined reliability classifications for data center infrastructure. The term reliability covers a variety of subjects, including availability, durability and quality, that describe how the data center has been engineered. The following five performance-based classes classify the reliability of the data center, based on the Building Industry Consulting Service International (BICSI) standard for IT systems [10].

Class F0: Single Path without Alternate Power Source
Class F1: Single Path
Class F2: Single Path with Redundant Components
Class F3: Concurrently Maintainable
Class F4: Fault Tolerant

Class F0
 - Supports the basic environmental and energy requirements of the IT functions without supplementary equipment
 - Capital cost avoidance is the major driver
 - There is a high risk of downtime due to planned and unplanned events
 - Maintenance is performed during non-scheduled hours, and downtime of several hours or even days has minimal impact on the mission
 - A critical power distribution system separate from the general-use power systems would not exist
 - No back-up generator system
 - The system might deploy power conditioning or surge protective devices to allow the specific equipment to function adequately (utility grade power does not meet the basic requirements of critical equipment)
 - No redundancy for power or air conditioning

Class F1
 - Supports the basic environmental and energy requirements of the IT functions
 - There is a high risk of downtime due to planned and unplanned events
 - Maintenance can be performed during non-scheduled hours, and the impact of downtime is relatively low
 - The critical power distribution system would deploy a power conditioning device to allow the critical equipment to function adequately (utility grade power does not meet the basic requirements of critical equipment)
 - No redundancy of any kind would be used for power or air conditioning, for a similar reason

Class F2
 - Provides a level of reliability higher than Class F1 to reduce the risk of downtime due to component failure
 - There is a moderate risk of downtime due to planned and unplanned events
 - Maintenance activities can typically be performed during unscheduled hours
 - The critical power system would need redundancy in those parts of the electrical distribution system that are most likely to fail
 - These would include any products that have a high parts count or moving parts, such as UPS, controls, air conditioning, generators or ATS
 - In addition, it may be appropriate to specify premium quality devices that provide longer life or better reliability

Class F3
 - Provides additional reliability and maintainability to reduce the risk of downtime due to natural disasters, human-driven disasters, planned maintenance, and repair activities
 - Maintenance and repair activities will typically need to be performed during full production time, with no opportunity for curtailed operations
 - The critical power system must provide reliable, continuous power even when major components (or, where necessary, major subsystems) are out of service for repair or maintenance
 - To protect against unplanned downtime, the power system must be able to sustain operations while a dependent component or subsystem is out of service

Class F4
 - Eliminates downtime through the application of all tactics to provide continuous operation regardless of planned or unplanned activities
 - All recognizable single points of failure, from the point of connection to the utility to the point of connection to the critical loads, are eliminated
 - Systems are typically automated to reduce the chances of human error and are staffed 24×7
 - Rigorous training is provided for the staff to handle any contingency
 - Compartmentalization and fault tolerance are prime requirements
 - The critical power system must provide reliable, continuous power even when major components (or, where necessary, major subsystems) are out of service for repair or maintenance
 - To protect against unplanned downtime, the power system must be able to sustain operations while a dependent component or subsystem is out of service

[10] BICSI Standards for Data Centers, https://www.bicsi.org/default.aspx
  • 14. Best Practices For Infrastructure Design
High Availability & Disaster Recovery
High Availability (HA) and Disaster Recovery (DR) are both concepts related to Business Continuity. Whereas Business Continuity applies to the whole business (including IT), HA and DR relate more specifically to IT continuity as one part of overall Business Continuity. High Availability solutions mainly address outages within a single site, while Disaster Recovery solutions mainly address sudden, site-wide disasters. The objectives and metrics for the two are different.
A highly available site provides resiliency against errors in the underlying platform and against single points of failure. Availability encompasses reliability, recovery, and failure. One of the most common measures of availability is the percentage of time that a given system is active and working. The following table correlates the percentage of availability to calendar time equivalents.

Acceptable Uptime   Downtime per day   Downtime per month   Downtime per year
99%                 14.40 minutes      7 hours              3.65 days
99.9%               86.40 seconds      43 minutes           8.77 hours
99.99%              8.64 seconds       4 minutes            52.60 minutes
99.999%             0.86 seconds       26 seconds           5.26 minutes

RTO and RPO
The Recovery Time Objective (RTO) is the elapsed time from service interruption until service is restored. It answers the question: "How long can you be without service?" RTO represents a time limit that cannot be exceeded without facing severe consequences. A unified High Availability and Disaster Recovery approach would establish both an uptime objective and an RTO for each service. The Recovery Point Objective (RPO), on the other hand, is the point in time that the data represents when service resumes. It answers the question: "How old can the data be?"
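The calendar equivalents in the availability table follow from simple arithmetic. The sketch below assumes a 365.25-day year and a month of one twelfth of a year; those conventions are assumptions chosen because they agree with the table's rounded values, and the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25               # assumed average year length
DAYS_PER_MONTH = DAYS_PER_YEAR / 12  # assumed average month length


def downtime_seconds(availability_pct, period_days):
    """Allowed downtime, in seconds, for an availability target over a period."""
    unavailable_fraction = 1 - availability_pct / 100
    return unavailable_fraction * period_days * SECONDS_PER_DAY


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.999):
        per_day = downtime_seconds(pct, 1)
        per_month = downtime_seconds(pct, DAYS_PER_MONTH)
        per_year = downtime_seconds(pct, DAYS_PER_YEAR)
        print(f"{pct}%: {per_day:.2f} s/day, "
              f"{per_month / 60:.2f} min/month, {per_year / 3600:.2f} h/year")
```

Expressing every cell in seconds first and converting units only for display avoids the mixed units (seconds, minutes, hours, days) that make the table hard to compare row by row.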
  • 15. Expectations for Continuous Availability
Data Replication
The two basic methods of data replication are synchronous and asynchronous. In general terms, synchronous replication is used over shorter distances and asynchronous replication over longer distances. The method chosen depends on Business Recovery requirements.
Synchronous replication ensures that a remote copy of the data, identical to the primary copy, is created at the time the primary copy is updated. An update operation is not considered complete until completion is confirmed at both the primary and the secondary site. An incomplete operation is rolled back at both locations, ensuring that the remote copy is always an exact mirror image of the primary.
Asynchronous replication places data updates in a queue on the primary server and does not wait for acknowledgment of those updates from the secondary server. As a result, any updates that have not yet been copied across the network to the secondary server are lost if the primary server fails, and application data may be lost in this type of failure.
Most companies cannot tolerate more than a few hours, or even minutes, of downtime without serious impact to the bottom line. Synchronous data replication may be the appropriate solution for companies seeking the fastest possible data recovery, minimal data loss, and protection against database integrity problems.
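The difference between the two methods can be illustrated with a toy in-memory model. This is a minimal sketch, not a real replication product: the class names, the `link_up` flag and the `ship` method are all hypothetical, and a real system would also have to handle partial failures, ordering and retries.

```python
from collections import deque


class Replica:
    """Stand-in for the secondary site's copy of the data."""
    def __init__(self):
        self.data = {}


class SyncPrimary:
    """Commits a write only if the secondary applies it as well."""
    def __init__(self, secondary):
        self.data = {}
        self.secondary = secondary

    def write(self, key, value, link_up=True):
        if not link_up:
            # No acknowledgment from the secondary: nothing is committed anywhere.
            return False
        self.secondary.data[key] = value
        self.data[key] = value
        return True


class AsyncPrimary:
    """Commits locally, queues the update, and ships it to the secondary later."""
    def __init__(self, secondary):
        self.data = {}
        self.secondary = secondary
        self.queue = deque()

    def write(self, key, value):
        self.data[key] = value
        self.queue.append((key, value))
        return True

    def ship(self, max_updates):
        """Send up to max_updates queued updates to the secondary."""
        shipped = 0
        while self.queue and shipped < max_updates:
            key, value = self.queue.popleft()
            self.secondary.data[key] = value
            shipped += 1
        return shipped

    def updates_lost_on_failure(self):
        """Updates that exist only on the primary and would be lost if it failed now."""
        return len(self.queue)
```

In this model the length of the asynchronous queue at the moment of failure is the data loss, which is why asynchronous designs are usually discussed in terms of an acceptable RPO.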
  • 16. Virtualization
Virtualization makes it possible to implement Disaster Recovery plans at a significantly lower cost. Because virtual machines are hardware-independent, any physical server can be used as a recovery target for any virtual machine. Because virtualization also makes it possible to consolidate workloads onto fewer servers, organizations can significantly reduce the cost of hardware for Disaster Recovery by reducing the number of servers needed at the primary site.
Many organizations have already embraced virtualization, as it can add tremendous value to Disaster Recovery planning. Before virtualization, Disaster Recovery was often too expensive to implement fully, and many organizations chose to protect only their most critical applications. Consolidating multiple physical servers as virtual machines on shared hosts significantly reduces the number of physical servers that need to be recovered in the event of an outage.
  • 17. Replication and Network Bandwidth
Network bandwidth can also introduce challenges to data replication strategies. It is important to understand how much data can change within a given period of time, referred to as the replication latency window. From the rate of change in a given system, one can determine the amount of bandwidth needed. The network bandwidth guideline below can assist with these calculations.
Database Replication
Database replication is similar to database mirroring. These solutions use production database transaction logs to maintain a current copy of the production database on a standby server. In the event of a server outage, the database replication software automatically promotes the standby database to become the production database. There are traditionally no restrictions on where the databases can reside, provided that they can communicate with each other. Synchronous replication, however, does have some drawbacks. It has a theoretical distance limitation of 200 kilometres (124 miles), but the practical distance limitation for a busy system could be as little as 50 km (about 30 miles).

Estimated Hours To Replicate, by Capacity
Network          20 GB     80 GB     120 GB    200 GB    300 GB    730 GB
T1               42.33     169.31    253.97    423.28    634.92    1544.97
10Base-T LAN     6.50      26.01     39.01     65.02     97.52     237.31
DS3 / T3         1.50      6.02      9.03      15.05     22.57     54.93
100Base-T LAN    0.65      2.60      3.90      6.50      9.75      23.73
OC3              0.42      1.68      2.52      4.19      6.29      15.31
OC12             0.10      0.42      0.63      1.05      1.57      3.82
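The bandwidth calculation described above can be sketched as follows. The function names and the `efficiency` parameter are assumptions introduced for illustration; the table does not state what protocol overhead or effective throughput it assumes, and its figures are higher than a raw line-rate estimate (for example, 20 GB over a 1.544 Mbps T1 at full line rate works out to about 28.8 hours). The fibre propagation figure of roughly 200 km per millisecond is an approximation.

```python
def hours_to_replicate(capacity_gb, link_mbps, efficiency=1.0):
    """Hours needed to copy capacity_gb (decimal GB) over a link of link_mbps.
    efficiency is an assumed fraction of the line rate that is usable for data."""
    bits = capacity_gb * 1e9 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


def required_mbps(changed_gb, window_hours, efficiency=1.0):
    """Link speed needed to ship changed_gb within a replication window."""
    bits = changed_gb * 1e9 * 8
    return bits / (window_hours * 3600) / 1e6 / efficiency


def fiber_round_trip_ms(distance_km, km_per_ms=200.0):
    """Approximate round-trip propagation delay over fibre, ignoring equipment delay."""
    return 2 * distance_km / km_per_ms


if __name__ == "__main__":
    print(f"{hours_to_replicate(20, 1.544):.2f} h for 20 GB over a T1 at line rate")
    print(f"{required_mbps(45, 10):.1f} Mbps to ship 45 GB of changes in 10 hours")
    print(f"{fiber_round_trip_ms(200):.1f} ms round trip at 200 km")
```

The round-trip figure is what makes distance matter for synchronous replication: every write waits at least that long for the remote acknowledgment, so a busy system reaches its practical limit well before the theoretical one.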
  • 18. Types of Backup Recovery and Replication Architectures
Choosing the best-suited backup and recovery option for an organization can be challenging. Traditionally, businesses request little to no downtime when recovering from a disaster or other type of outage. Implementing these types of solutions may represent a sizable investment. Management will have to decide which recovery option best fits the organization's needs, particularly in relation to risk assessment, compliance and other requirements, as outlined earlier in this paper.

Single Site Backup and Recovery
 - Backups and snapshots required for off-site storage must be created periodically
 - Data can only be as up-to-date as the last backup (daily, weekly or monthly)
 - Recovery is limited to the point in time of the last backup

Multi-Site Asynchronous Data Replication
 - Asynchronous replication is supported by disk arrays, networks and host-based replication products
 - Changes to data are committed to the source first, then buffered or journaled and sent to the replication target(s)
 - It is designed to work over long distances and greatly reduces bandwidth requirements
 - This can introduce delays ranging from near-instantaneous to several hours, depending on network latency
 - There is also no guarantee that the secondary system will have the most recent copy of the data if the primary fails

Multi-Site Synchronous Data Replication
 - Used primarily for high-end transactional applications that require instantaneous failover if the primary node fails
 - With synchronous replication, data is written to the primary and secondary storage systems at the same time, and a write is not complete until it is acknowledged by both the local and remote storage systems
 - Synchronous replication requires considerable bandwidth, which also makes it more expensive

Cloud Backup and Recovery
 - Applications and data remain on-premises in this approach, with data being backed up into the cloud and restored onto on-premises hardware when a disaster occurs
 - In other words, the backup in the cloud becomes a substitute for tape-based off-site backups
 - Many backup software vendors now provide options to back up directly to popular cloud service providers such as AT&T, Amazon, Microsoft and Rackspace
  • 19. Disaster Recovery Site Selection
When assessing the type of backup recovery and replication architecture, one of the key components is the selection of the disaster recovery site. Based on leading industry practices, the following recommendations provide guidance for disaster recovery data center site selection. In general, primary and backup sites should not be subject to the same threat profile (severe weather risks, same power grid, same flood zones).
 - Disaster Recovery sites should be located a significant distance [11] from the primary site
 - Proven practices suggest a minimum of 50 to 200 miles from the primary data center, though neither the SEC nor the FSA [12] mandates a specific distance
 - Leading Disaster Recovery practices indicate between 200 and 800 miles, provided there are no technical limitations imposed by solution architectures such as low latency / algorithmic trading, synchronous replication, and Fibre Channel distance limitations
 - Avoid flood-prone areas, major airport flight paths and earthquake-prone areas, and ensure diversity of power feeds
 - Mitigate key man risk by ensuring labor pool resiliency (data center staff and application recovery resources) and creating appropriate documentation for cross-regional training
Figure: Google Earth Imagery 2013. Blue/red pins (data centers); red area (0 – 25 miles); yellow area (25 – 200 miles, marginal); green area (200 – 800 miles).
[11] 2003 SEC guidelines on Disaster Recovery (http://www.sec.gov/news/studies/34-47638.htm)
[12] FSA BCM guide (http://www.fsa.gov.uk/pubs/other/bcm_guide.pdf)
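The distance bands in the map legend can be applied to a pair of candidate sites with a great-circle distance calculation. The sketch below uses the standard haversine formula; the coordinates, the function names, and the handling of the exact band boundaries (for instance, treating exactly 200 miles as green) are assumptions for illustration, and straight-line distance is only a first filter that says nothing about shared power grids or flood zones.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_band(miles):
    """Map a distance to the legend bands: red 0-25, yellow 25-200, green 200-800."""
    if miles < 25:
        return "red"
    if miles < 200:
        return "yellow (marginal)"
    if miles <= 800:
        return "green"
    return "beyond 800 miles"


if __name__ == "__main__":
    # Approximate city-centre coordinates, used purely as an example.
    new_york = (40.7128, -74.0060)
    chicago = (41.8781, -87.6298)
    miles = great_circle_miles(*new_york, *chicago)
    print(f"{miles:.0f} miles -> {distance_band(miles)}")
```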
  • 20. Summary and Recommendations
Target Focus Areas
When performing an evaluation and assessment of IT critical infrastructure, certain issues should be addressed in order to properly frame and design a sound Business Recovery plan. The following interview questions can be used as a guide when assessing an environment:
 1. Can the IT infrastructure be trusted to withstand a major disruption?
 2. Has the resiliency of the Data Center, Network and Compute environment been proven?
 3. Has a Disaster Recovery test been performed recently? Were the critical business applications included in the last test? What were the results?
 4. Have the business requirements been mapped to the IT infrastructure via a top-down review?
 5. Does management fully understand the regulatory ramifications of not adhering to sound business recovery plans?
If those questions cannot be answered, the business may be at risk of failure because of its inability to recover production systems. Citihub would recommend an end-to-end assessment of IT infrastructure, along with an in-depth review of business continuity plans. A detailed infrastructure assessment of the Disaster Recovery plan and processes should include the following:
 - A thorough review of the existing primary and backup data centers, the network and compute infrastructure, and the Disaster Recovery plan designs and architecture
 - An assessment of critical backup systems and confirmation that generator fuel pumps are not located in high-risk areas such as building basements in flood zones
 - A review of schedules for regular backup exercises and confirmation of failover procedures, including confirmation that critical power has been tested and that generators are functioning with sufficient fuel levels
 - A review of regional and local FEMA flood zone maps (US), or the international equivalent, to determine the level of acceptable risk for data centers and critical systems
 - An understanding of fuel delivery schedules and assurance that contracts are in place for emergency fuel delivery, taking into consideration that hospitals and emergency facilities have priority for fuel deliveries
 - A review of the backup data center location, making sure that the site is outside the primary geographic area and on a separate utility grid if possible
 - The education of teams for preparedness, so that they act proactively and at the appropriate time (for example, not delaying the switch to backup power until the middle of the event)
 - An evaluation of service provider backup plans to identify dependency risks
 - An evaluation of remote access procedures and support systems, with confirmation of sufficient capacity to support key staff working remotely
  • 21. Best Practices
To help spearhead a Business Continuity Management plan and a Disaster Recovery program, the following best practices can drive awareness of the critical nature of these processes and help senior management establish or revise plans and eliminate gaps.
 - Establish a planning group to develop resiliency designs and recovery strategies
 - Build management awareness by establishing Key Performance Indicators (KPIs) for Disaster Recovery, to include the following:
    - Status of previous Disaster Recovery events/tests, with periodic reports to senior management
    - Other core IT competencies that are critical to Disaster Recovery planning
    - Periodic tests to verify implementation of the Disaster Recovery plan, and reports about gaps and risks
    - A review process that includes the deployment of new solutions
 - Perform Risk Assessments and Audits that will:
    - Complete a top-down inventory assessment of all critical assets required to sustain operations
    - Review process structure assessments, audits, and reports
    - Assess gaps and risks from previous events or audits
    - Create an implementation plan to eliminate gaps
    - Document Disaster Recovery plan actions and escalation procedures
    - Build comprehensive training material
    - Develop test verification criteria and procedures
 - Separate people from technology and confirm which business processes require onsite staff to resume operations
 - Establish a real remote access strategy for staff who are unable to commute during severe weather conditions
  • 22. Business Continuity Management Framework
Figure: Business Continuity Management and Disaster Recovery Framework. Source: Citihub
  • 23. Elements of Business Recovery Planning
The business process assessment for determining critical areas of recovery begins with a top-down review, as shown below. This approach confirms the technical infrastructure and dependencies associated with each business process.
Figure: top-down review. Source: Citihub
The above process enables end-to-end mapping of dependencies, which is critical to understanding the key components that make up an application system.
In order to determine business unit IT needs and provide a gap analysis against IT capabilities, Citihub has developed a business impact analysis methodology covering critical processes and the IT systems that support them. The three areas of focus are:
 - Business Unit Overview: the business unit overview and readiness heat map is used to capture business process criticality and IT capability readiness in the event of a catastrophic outage
 - Process Summary: the key process summary examines business processes and rates the impact of a sustained outage on the business in three dimensions: Operational Impact, Financial Impact and Reputational Impact
 - Application Requirements Summary: the application requirements gap analysis section summarizes the applications each business unit requires and provides a RAG (red/amber/green) status when compared against IT capabilities
Figure: Business Impact Analysis Methodology. Source: Citihub
  • 24. FEMA Flood Maps
Appendix A illustrates one of the more critical vulnerabilities that exist within the New York metropolitan area. The storm surge during Hurricane Sandy [13], which caused major flooding in parts of the region, impacted critical systems in the core BMS and data center M&E, as well as transportation infrastructure in and out of New York City and the Tri-State area. The maps are ranked from high to low by the severity of flooding and storm surge impact.

Rank: LOW
  Risk: No impact due to storm surge
  Impact: None
  Mitigation: Ensure redundant site is active and tested

Rank: MEDIUM
  Risk: Storm surge impact can occur but is unlikely
  Impact: Partial or no building damage and/or access to main entrance
  Mitigation: Ensure redundant site is active and tested; recovery plans activated

Rank: HIGH
  Risk: Storm surge impact is severe
  Impact: Damage to main electrical switchgear and/or generators or fuel pumps
  Mitigation: Ensure redundant site is active and tested; recovery plans activated; staff plan activated

[13] http://www.nhc.noaa.gov/refresh/graphics_at3+shtml/030345.shtml?gm_esurge
  • 25. Appendix A: FEMA Flood Hazard Mapping - HIGH
New York Locations: Lower Manhattan and 55 Water Street
New York Locations: 25 Broadway and 32 Ave of the Americas
New Jersey Locations: 410 Commerce Blvd. and 760 Washington Ave.
  • 26. FEMA Flood Hazard Mapping - HIGH (cont'd)
New Jersey Locations: 545 Washington Blvd. and 755 Secaucus Road
New Jersey Locations: 15 Enterprise Ave. North and 300 Boulevard East
  • 27. FEMA Flood Hazard Mapping - LOW
New York Locations: 111 8th Ave. and 360 Hamilton Ave., White Plains
New York Locations: 480 North Bedford Road, Chappaqua and 11 Skyline Drive, Hawthorne
  • 28. FEMA Flood Hazard Mapping - LOW (cont'd)
New Jersey Locations: 1400 Federal Blvd. and 3003 Woodbridge Ave.
New Jersey Locations: 165 Halsey Street and 100 Delawanna Ave.
  • 29. FEMA Flood Hazard Mapping - LOW (cont'd)
Chicago Locations: 350 East Cermak, Chicago, IL and 2905 Diehl Road, Aurora, IL
  • 30. Appendix B: Natural Disaster Risk Profiles for Data Centers

Tornado
  On-Site: In or near the storm path, expect disruption and minor to severe infrastructure damage
  Off-Site: In or near the storm path, expect disruption and minor to severe infrastructure damage
  Impact:
   - Advance warning of tornado potential but no site-specific warning
   - Employees remain at site
   - Duration is brief although intense
   - Roof and outside equipment (cooling towers, etc.) damaged or destroyed
   - Potential damage to the building structure
   - Loss of local utility and communications

Hurricane
  On-Site: In or near the storm path, expect disruption and minor to severe infrastructure damage
  Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
  Impact:
   - Significant advance warning
   - Duration is hours to a few days
   - Employees may require evacuation from site
   - Post-storm security may be required
   - Emergency supplies needed for at least several days
   - Roof and outside equipment (cooling towers, etc.) damaged or destroyed
   - Potential damage to the building structure
   - Loss of local utility and communications
   - Repair of regional damage may require days, weeks or longer for massive reconstruction of electric power transmission or distribution facilities
   - Potential for off-site public infrastructure damage

Earthquake
  On-Site: Expect catastrophic damage and disruption to data centers near the epicenter and infrastructure damage to data centers further away
  Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
  Impact:
   - No warning
   - Brief duration with the threat of continued aftershocks
   - Employees may be unable to leave site
   - Emergency supplies needed for several days of operation
   - Building structural damage
   - Toppling of unbraced computer hardware and site infrastructure equipment, including collapse of raised floor
   - Site may be isolated for an extended period
   - Highways and bridges may be damaged or destroyed, preventing movement of diesel fuel and other operating supplies required for continued operation
   - Power and communications may sustain extensive damage requiring days, weeks or longer to repair

Source: Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
  • 31. Natural Disaster Risk Profiles for Data Centers (cont'd)

Ice Storm / Blizzard
  On-Site: Expect some disruption or failure of the data center if outside equipment is not designed to survive severe ice and snow accumulation
  Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
  Impact:
   - Several days' warning generally expected
   - Storm or multiple storms may last several days with cumulative effects
   - Employees may be unable to leave or enter site
   - Emergency supplies needed for at least several days
   - Ice damage to structure and outside equipment
   - Roof failure from excessive snow load
   - Potential freezing of pipes
   - Loss of overhead power and/or communications lines over large areas may require several days, weeks or longer to repair
   - Roads dangerous or impassable

Thunderstorm / Lightning
  On-Site: Expect disruption ranging from disaster to no impact, depending on distance to the lightning strike and proper operation of surge suppression, UPS, and engine-generator systems
  Off-Site: Expect frequent momentary public utility disruptions from lightning strikes hitting the electric power transmission grid
  Impact:
   - Special sensors can provide minutes of storm approach warning
   - Duration is brief but may recur daily during thunderstorm season
   - Frequent UPS battery discharges shorten remaining battery life
   - Extended power interruption if utility service is overhead or radial and a nearby lightning strike causes protective devices to open
   - Possible flooding and roof leakage
   - Momentary undervoltages can affect hundreds of square miles
   - Fires started by lightning can destroy public infrastructure located in rural areas

Flood
  On-Site: Expect catastrophic damage and disruption to data centers in severe flood areas or with infrastructure systems below grade
  Off-Site: Expect severe region-wide damage to public infrastructure, utilities and communications
  Impact:
   - Several days' warning generally expected
   - Employees may be unable to leave site
   - Emergency supplies needed for at least several days of operation
   - Site infrastructure damage requiring days to weeks to repair
   - Site may be isolated for an extended period
   - Highways and bridges may be damaged, preventing movement of diesel fuel and other operating supplies required for continued operation
   - Power and communications may sustain extensive damage requiring days, weeks or longer to repair

Source: Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
  • 32. Appendix C: East Coast Liquidity Venues
The New York metro area is responsible for approximately 94% of the volume of shares traded in the US cash equity market [14]. The following maps illustrate the major liquidity venues in the New York and Chicago metropolitan areas.
Figure: liquidity venue maps (panels: New York, Chicago)
[14] http://www.batstrading.com/market_data/daily_volume/
  • 33. Works Cited & References
 - Interagency Paper on Sound Practices to Strengthen the Resilience of the U.S. Financial System, September 2005. http://www.sec.gov/news/studies/34-47638.htm
 - Dodd-Frank H.R. 4173 Wall Street Reform and Consumer Protection Act, January 2010
 - National Public Radio (NPR), Visualizing The U.S. Electric Grid, April 24th 2009, www.npr.org/templates/story/story.php?storyId=110997398
 - NOAA National Climatic Data Center, State of the Climate: National Overview for Annual 2012, published online December 2012 from http://www.ncdc.noaa.gov/sotc/national/2012/13.
 - Uptime Institute, Natural Disaster Risk Profiles for Data Centers, http://uptimeinstitute.com/publications
 - United States Geological Survey, http://earthquake.usgs.gov/earthquakes/states/new_york/hazards.php
 - BICSI Standards for Data Centers, https://www.bicsi.org/default.aspx
 - Colocation Selection, best practices and critical considerations for choosing the right data center colocation solution. Bill Kleyman, Cloud and Virtualization Architect, October 2012
 - Climate Change and Infrastructure, Urban Systems, and Vulnerabilities, Technical Report for the U.S. Department of Energy in Support of the National Climate Assessment, February 29, 2012
 - The historic nor'easter of 13-14 March 2010, Richard H. Grumm, National Weather Service
  • 34. About Citihub
Founded in 1998, Citihub provides IT expertise to some of the world's leading enterprise organizations and is composed of industry veterans who relish the challenge of complex technology and cultural change. We take a fresh approach to the technical challenges of today and believe in partnering with our clients through change. Citihub clients include Investment Banks, Hedge Funds, Media, and Manufacturing.
About the Authors
Vincent Pelly
Vincent Pelly is an Associate Partner at Citihub with more than 30 years of experience across the financial services industry, specializing in infrastructure, program management and IT strategy. He has extensive experience managing large enterprise projects in infrastructure and data center advisory and technology implementation, and has managed large infrastructure transformation programs.
Scott Haglund
Scott Haglund is an independent consultant with more than 30 years of experience in the development and execution of global infrastructure strategy, architecture, transformation, technology roadmaps, optimization, and service delivery standards for the enterprise. He specializes in data center automation strategies and has led many enterprise infrastructure transformation programs.