This file contains the slide presentation from the 2008 Network World IT Roadmap Keynote Address. It covers the concepts and events surrounding the successful disaster recovery of Lagasse IT functions following Hurricane Katrina's landfall in 2005.
Katrina Recovery - Lagasse Inc
Property of United Stationers Inc
Surviving Katrina
Marshall Lancaster
April 2nd, 2008
2. March 30, 2015
Property of United
2
Agenda
Introduction
Katrina Overview and Impact
Case Study: Lagasse Katrina Recovery
» Business overview and history
» Key terms
» Business imperatives during Hurricane Katrina
» Lagasse DR design and results
General recommendations
» “Best practices” for DR
» Cost-effective DR
» Lessons learned and recommendations
Review / Questions and Answers
Introduction
In August, 2005, Lagasse Inc. was headquartered in New
Orleans, LA. Hurricane Katrina and the subsequent events
made the entire area inaccessible, including the primary
computing facility and customer care center.
Despite the storm, Lagasse experienced no system down-time, and
recorded its second and third largest sales days in the week
following.
In addition, the ability to quickly redeploy personnel allowed a
major acquisition and integration effort to continue on schedule.
Hurricane Katrina Overview
Made second U.S. landfall as a Category 3/4 Hurricane
» Initially impacted Florida as a Category 1
» Massive storm, briefly a Category 5
• New Orleans received sustained Cat 1 /
Cat 2 winds for several hours
» Levees breached in 73 locations
• Major failures: 17th Street Canal, Industrial Canal, London Avenue Canal
• 80% of New Orleans flooded
» Over 1,400 lives lost in the aftermath
• 1,100 in New Orleans
Massive relocation resulted
» Over 6,000 evacuees in Chicago (the largest non-Southern migration)
Regional Impact Assessment
Immediate loss of all critical
infrastructure services
» Power, water, food, fuel
» Medical and police severely strained
First 14 days:
» Most basic infrastructure still off-line
» Access to city highly restricted by
local and federal authorities
15 days to present:
» Slow, arduous recovery process with on-going structural problems
» Huge deficit of professional talent due to massive relocations
» Persistently degraded social services (medical, educational, etc.)
Lagasse Business Overview
Wholesale distribution company focused on Janitorial-Sanitation,
Foodservice, and Paper products
»Over 1,000 associates nationwide
»2007 Gross Sales approx. $750M
»31 Warehouse facilities
Immediately prior to Katrina, acquired Sweet Paper
Sales Corporation, with 10 facilities and $250M in
annual sales
Lagasse is a wholly owned subsidiary of United
Stationers, a $4.5B broad-line wholesale distribution
company (NASDAQ:USTR)
Hurricane Katrina
Primary Lagasse data center
located in New Orleans, LA.
Impacted business functions:
»Largest Customer Call Center
»New Orleans Distribution Facility
»Operations, Procurement
»Marketing, Pricing, Product Management
»Accounting, Finance, Executive
»Information Technology
Most Information Technology staff were based out of New
Orleans
Over 200 associates were displaced, the majority in
professional and support activities. 90% of administrative
capacity was disrupted
Associates (Before the Storm) / Associates (One Week After the Storm)
[Two maps of associate locations. Legend: one marker = ~25 associates; other markers = Data Center, Warehouse hub, Warehouse]
Major Timeline
[Timeline chart; horizontal axis: Aug, Sept, Oct, Nov, Dec, Jan ‘06. Rows:]
E-commerce Implementation
SP Facility Integration - Raleigh
Disaster Recovery Effort
Disaster Asset Recovery Team (DART)
Business Recovery Effort (BCP)
New Orleans Headquarters Closed
IT Team Move (Chicago)
SP Facility Integration - Boston
SP Facility Integration - San Francisco
SOX Compliance Audit
SP Facility Integration - Atlanta
SP Facility Integration - Tampa
Key Terms
Recovery Point Objective (RPO): maximum amount of
data loss that is deemed acceptable by data owners /
business
Recovery Time Objective (RTO): maximum period during
which data can be unavailable for access or modification
Seven Tiers of Disaster Recovery (SHARE 1992):
»Tier 0: No recovery capabilities
»Tier 1: Tape backup, no off-site hardware
»Tier 2: Tape backup, with off-site hardware
»Tier 3: Electronic vaulting – across the wire protection of some data
»Tier 4: Point-in-time-copies
»Tier 5: Transaction integrity
»Tier 6: Zero or near-zero data loss
»Tier 7: Highly automated integrated solutions
Not adopted by Lagasse
Lagasse DR Plan
Multiple ‘dry-runs’ had improved capabilities
» (2002) Hurricanes Isidore, Lili
» (2004) Hurricane Ivan
» (2005) Hurricane Dennis
Established three application tiers for planning recovery
» Tier 1: Required for pick, pack, ship, and manual order entry
• Semi-automated recovery
• 0-15 minutes of data loss, 6 hour recovery
» Tier 2: Platforms critical for consistent customer experience
• Limited staff required to recover
• 0-24 hours of data loss, 24-72 hours to recover
» Tier 3: Computing resources that support reporting, administration
• Specialized staff required to recover, no commitment to recover
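The three application tiers above pair a data-loss limit (RPO) with a recovery-time limit (RTO). As a minimal sketch, assuming the upper bound of each range on the slide is the commitment, a rehearsal or post-incident review could check measured results against the tiers like this. The class and function names are hypothetical, not part of the Lagasse plan.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class TierObjective:
    name: str
    rpo: Optional[timedelta]  # maximum tolerated data loss; None = no commitment
    rto: Optional[timedelta]  # maximum tolerated time to recover; None = no commitment


# Upper bounds taken from the slide; Tier 3 carries no recovery commitment.
TIERS = {
    1: TierObjective("Tier 1", rpo=timedelta(minutes=15), rto=timedelta(hours=6)),
    2: TierObjective("Tier 2", rpo=timedelta(hours=24), rto=timedelta(hours=72)),
    3: TierObjective("Tier 3", rpo=None, rto=None),
}


def meets_objectives(tier: int, data_loss: timedelta, recovery_time: timedelta) -> bool:
    """Return True if an observed (or rehearsed) recovery stays within the tier's limits."""
    objective = TIERS[tier]
    if objective.rpo is None or objective.rto is None:
        return True  # no commitment, so nothing to violate
    return data_loss <= objective.rpo and recovery_time <= objective.rto
```

For example, `meets_objectives(1, timedelta(minutes=10), timedelta(hours=5))` passes, while 30 minutes of data loss on a Tier 1 system does not.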
Planning is the act of trading chaos
for error ~ Author unknown
Lagasse Computing Environment (cont.)
Geographically diverse (Chicago backup DC)
Leveraged parent-company infrastructure
Simplified architecture: avoided application and
infrastructure complexity whenever possible
Utilized technology refresh cycles to redeploy old
production equipment into DR
‘Blended’ recovery approach: range of SHARE DR tiers 1
through 6, depending on application importance and
complexity
Supportive processes and technologies in place
»Database transaction logging (e.g., Oracle archived redo logs)
»DNS / DHCP
ERP Recovery
Database log files are generated to /db/dbrecover
Log files are FTP’d to remote
server directory
»Only new files are migrated
»Utilizes sequence numbering
UNIX cron job runs hourly to
‘roll forward’ the database
transactions on backup server
» Limitation: DR database cannot be used for
testing
Additional process automates replication for key UNIX files
and directories (e.g., /etc/passwd, printer definitions)
[Diagram: the Production Server writes DB file01, DB file02, … DB filexx to /db/dbrecover; the files are copied to /db/dbrecover on the Backup / Test Server]
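The "only new files" and "sequence numbering" points can be sketched as follows. This is an illustration only, assuming log file names end in their sequence number (as in file01, file02 on the slide). The function names are hypothetical, and the actual FTP transfer and database roll-forward commands are left out. The gap check reflects that a roll forward has to apply logs in order with none missing.

```python
import re
from typing import Iterable, List, Optional

# Assumed naming convention: the trailing digits are the sequence number.
_SEQ = re.compile(r"(\d+)$")


def sequence_of(filename: str) -> Optional[int]:
    """Extract the trailing sequence number, or None if the name has none."""
    match = _SEQ.search(filename)
    return int(match.group(1)) if match else None


def files_to_ship(local_files: Iterable[str], last_shipped: int) -> List[str]:
    """Return log files newer than last_shipped, in the order they must be applied."""
    numbered = [(sequence_of(name), name) for name in local_files]
    pending = [(seq, name) for seq, name in numbered
               if seq is not None and seq > last_shipped]
    return [name for _, name in sorted(pending)]


def first_gap(shipped_sequences: Iterable[int]) -> Optional[int]:
    """Return the first missing sequence number, or None if the run is contiguous."""
    ordered = sorted(set(shipped_sequences))
    for previous, current in zip(ordered, ordered[1:]):
        if current != previous + 1:
            return previous + 1
    return None
```

An hourly job on the backup side could call `first_gap` before rolling forward and stop if a log is missing, rather than applying logs out of order.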
Network Recovery
All IP address-based connectivity
was removed and replaced with
DNS naming system
Recovery configurations for
routers were stored on
remote host servers
»Changes to ‘IP Helper’
statements cause redirection of
DHCP and DNS requests
»NAT statements utilized to redirect
any systems that do not support DNS
(TFTP, remote terminal services, etc.)
High bandwidth link used for file synchronization becomes
primary connectivity mode for remote warehouses and call
centers
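Replacing IP addresses with names pays off because a fail-over then changes one record per service instead of every client. The toy model below illustrates that idea. It is not a DNS server, and the host names and addresses are invented placeholders (documentation address ranges), not Lagasse's configuration.

```python
from typing import Dict, List

# Hypothetical names and documentation-range addresses; not real Lagasse values.
PRIMARY: Dict[str, str] = {
    "erp.example.internal": "192.0.2.10",
    "edi.example.internal": "192.0.2.20",
}
BACKUP: Dict[str, str] = {
    "erp.example.internal": "198.51.100.10",
    "edi.example.internal": "198.51.100.20",
}


class NameTable:
    """Stand-in for a name service: clients know names, never addresses."""

    def __init__(self, records: Dict[str, str]) -> None:
        self._records = dict(records)

    def resolve(self, name: str) -> str:
        return self._records[name]

    def repoint(self, new_records: Dict[str, str]) -> List[str]:
        """Fail over by swapping addresses; return the names that changed."""
        changed = [name for name, address in new_records.items()
                   if self._records.get(name) != address]
        self._records.update(new_records)
        return sorted(changed)
```

A client that calls `resolve("erp.example.internal")` before and after `repoint(BACKUP)` reaches the backup site with no change on the client side. Systems that cannot use names still need the NAT redirection described above.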
Systems Recovery - Various
E-storefront
» Tape restore to like hardware
for initial build
» Code changes manually passed
across WAN
» Configuration changes required:
• Messaging
• E-storefront DB
• Catalog DB
• Application integration layer
EDI
» External DNS re-targeted to
Chicago host address
Application Integration Suite
» Tape restore to like hardware
for initial build
» Incremental changes and
patches passed across WAN
Document Imaging System
» Replaced older document
imaging system, deployed to
Chicago in known working state
» All production config changes
made to backup system
Small Parcel Shipping
» Automatically handled as part of
DNS re-pointing
Systems Recovery - Failed
Legacy E-commerce / E-storefront
» Restore from backup tape was only partially successful
» Heavy modification of original source code prevented ‘clean install’
» Old environment recovered as part of ‘DART’ (Disaster Asset Recovery
Team), which was an effort that recovered viable technology assets from
the New Orleans data center
» Resolved by moving customers to new product
Hyperion Data Warehouse
» Tape recovery proved unsuccessful,
and staff with the knowledge to rebuild
were not available for almost 3 weeks
» Recovered as part of DART
Activity Based Costing System
» Tape unsuccessful, recovered during
DART
Business Continuity
Headquarters relocated to
United Stationers corporate
offices in Chicago
‘Southeast Headquarters’ set
up in Atlanta in order to support
displaced professional workers
Temporary ‘remote’ call centers
established in Dallas, Atlanta,
San Antonio, and Miami
Due to the on-going integration of Sweet Paper, project
teams had to be re-assembled quickly
»First 2-3 weeks spent rebuilding contact information
»IT support re-assembled in Chicago, Oaks, Atlanta
»Many associates became ‘virtualized’, operating from hotels, new
houses, and Internet cafes
Lessons Learned
Do not locate your primary data systems in New Orleans!
(other cities may not be great choices, either…)
Tape recovery is extremely complex, and not very reliable
Communication quickly breaks down during disaster
»Cell phones are nearly useless if homed in disaster area
»Associate web-site (alternate DNS) and alternate associate hot-line
were invaluable fall-backs
Laptops, Citrix, and VPN are powerful BCP tools
People are more important than plan…
»It all starts with hiring / development
»Can they be where you need them when the time comes?
Lessons Learned (cont.)
A robust DRP is only part of the answer; a solid BCP
completes the puzzle
‘Force Majeure’ clauses exist in most off-site storage
vendor contracts and agreements
Best Practices DR Architecture
Rely on best-in-class co-location partner
» Provides security, power, cooling, and fire suppression management
» Geographically diverse, preferably 200+ miles between DCs
Select premier storage supplier
» Block-level replication for Tier 1, file-level for Tier 2
» Tape recovery reserved for Tier 3
Create network architecture with telecommunications
provider redundancy and ‘last-mile’ diversity
Automated fail-over, clustering
Build highly specialized technical competencies
» Required to support complex recovery environment
Supporting Technologies
Application Stacking: Hosting multiple applications
within the same compute environment. Applications share
hardware, operating systems, databases, web servers, etc.
[Diagram: three stacking layouts, each on a single OS. Data Base Stacking: DB (N) and DB (N+1) holding Scheme1 through Scheme6. Applications App1 through App6 on one OS. URL Stacking: Web Srv (N) and Web Srv (N+1) serving URL1 through URL6]
Capacity on Demand: Ability to dynamically
allocate additional compute capacity (CPU, RAM, Storage,
etc.) to multiple virtualized environments as needed.
[Diagram: a Resource Monitor allocates and de-allocates COD from a CPU Pool. 4:00 PM: CPU Load > 90%, allocate COD, reducing CPU load to 50%. 6:00 PM: CPU Load < 50%, de-allocate COD, returning CPUs to the pool]
Virtualization: Establishment of ‘virtual’ compute environments
from a larger pool of resources. Enablers include enterprise servers
with virtualization capabilities and VM software solutions.
[Diagram: six virtual environments, App 1 through App 6, each with its own OS, DB, and Web Srv]
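The capacity-on-demand example on this slide (allocate when CPU load exceeds 90%, release when it falls below 50%) amounts to a simple threshold rule. A minimal sketch follows, assuming one CPU moves per monitoring step. The step size, class name, and function name are assumptions for illustration, not a vendor API.

```python
from dataclasses import dataclass


@dataclass
class CpuPool:
    spare: int          # CPUs still available in the shared pool
    allocated: int = 0  # COD CPUs currently lent to the environment


ALLOCATE_ABOVE = 0.90    # from the slide: allocate when load exceeds 90%
DEALLOCATE_BELOW = 0.50  # from the slide: release when load drops below 50%


def monitor_step(pool: CpuPool, cpu_load: float) -> str:
    """Apply one resource-monitor decision and return the action taken."""
    if cpu_load > ALLOCATE_ABOVE and pool.spare > 0:
        pool.spare -= 1
        pool.allocated += 1
        return "allocate"
    if cpu_load < DEALLOCATE_BELOW and pool.allocated > 0:
        pool.spare += 1
        pool.allocated -= 1
        return "deallocate"
    return "hold"
```

The gap between the two thresholds keeps the monitor from flapping: a load of 70% triggers neither action.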
Current United Stationers Design
Tier 1 - applications replicated using EMC’s SRDF/A
Tier 2 - supported by SRDF/A or DoubleTake software
Tier 3 - utilize tape recovery (with some tape virtualization)
[Diagram labels: United Stationers; Production Data Center; Non-Production Data Center; High Speed Network; MPLS WAN; Internet]
Cost-effective DR Architecture
Take advantage of technology refresh cycles
» Older equipment becomes DR / test environment
Use application and database stacking
Utilize ‘owned’ space where possible
» Remote call centers, facilities, parent data centers
» Small footprint co-location if necessary
Perform market-based risk analysis, and target recovery
plan towards most likely scenarios
The ‘doughnut tire’ approach…
Maintain the minimum DR capacity to get through a crisis
Scenario Pre-Planning
[Chart: Impact vs. Likelihood, plotting Power Failure, Facility Loss, Facility Uninhabitable, Facility Inaccessible, and Hurricane, with a red curve marking the low-priority region]
Disaster mitigation efforts
prioritized based upon:
• Likelihood
• Impact
Focus placed on higher impact,
higher probability events
Limited resources expended on
items inside the red curve
Hurricane and inaccessible
facility (flooding) deemed
largest risks.
He who defends everything defends nothing ~ Frederick the Great
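One common way to turn a likelihood/impact chart into a priority list is to score each scenario by likelihood times impact and spend effort only above a cut-off. The sketch below assumes that scoring rule, which the slide does not state. The 1-5 scores are invented for illustration and are not taken from the presentation.

```python
from typing import Dict, List, Tuple

# Invented (likelihood, impact) scores on a 1-5 scale, for illustration only;
# the slide plots these scenarios but gives no numbers.
EXAMPLE_SCORES: Dict[str, Tuple[int, int]] = {
    "Power Failure": (3, 2),
    "Facility Loss": (1, 5),
    "Facility Uninhabitable": (2, 4),
    "Facility Inaccessible": (4, 4),
    "Hurricane": (4, 5),
}


def rank_scenarios(scores: Dict[str, Tuple[int, int]]) -> List[Tuple[str, int]]:
    """Rank scenarios by likelihood x impact, highest first (ties broken by name)."""
    ranked = [(name, likelihood * impact)
              for name, (likelihood, impact) in scores.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


def worth_mitigating(scores: Dict[str, Tuple[int, int]],
                     threshold: int) -> List[Tuple[str, int]]:
    """Keep only scenarios whose score reaches the threshold (outside the curve)."""
    return [(name, score) for name, score in rank_scenarios(scores)
            if score >= threshold]
```

With these placeholder scores and a threshold of 10, only Hurricane and Facility Inaccessible remain, which matches the slide's conclusion by construction rather than by evidence.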
Creating Value in a Difficult Economy
[Chart: IT Spending Percentage, Current Spend vs. New Spend, split into Non-Discretionary and Discretionary IT Spending. Categories: Operations / Maintenance / KTLO; Strategic Infrastructure; New Applications; Application Modification]
Cost-effective, targeted DR
will be a critical component
for reducing operational
expenses.
As pressures to control operating expenses increase, IT departments will
be forced to make trade-offs between strategic and non-strategic initiatives.
Differentiating
technologies
Protect investments
in this area…
One Last Point… The People Element
During a disaster, what are your associates’ priorities?
» Safety, family, property, (employment)
» How much focus will they devote to company objectives?
» Create steps to mitigate associate concerns
What will be the long-term impacts?
» “Liberation Effect”
• People view disaster as a life-changing
event (and an opportunity)
• Career changes, relocations, etc.
» Post-traumatic stress disorder
• Irrational decision making
• Shock, depression
How will you be affected…?
[Figure: Maslow’s Hierarchy. Levels shown: Physiological Needs, Survival Needs, Belonging Needs, Esteem Needs, Understanding, Aesthetic, Transcendence, Self-Actualization]
Final Recommendations
Lead formal application tiering exercise with business
leadership
»E-mail is Tier 1, regardless of business determination
»Include personnel availability assumptions in planning
Seek out and eliminate all examples of ‘hard-coding’
Use sound technology architectural principles every day
(corollary: the more complex your environment, the more
complex it is to recover)
Use application and database stacking in recovery center
Capacity on demand is an option, albeit expensive
Make testing as real as possible
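Hunting for ‘hard-coding’ can be partly automated. As a rough sketch, the function below flags IPv4 literals in configuration or script text so they can be reviewed and replaced with names. The function name is hypothetical. The pattern will also flag four-part version strings whose parts are all 255 or less, so treat the output as candidates for review, not a definitive list.

```python
import re
from typing import List, Tuple

# Candidate dotted quads; octets are validated below, so 1.2.3.400 is skipped.
_CANDIDATE = re.compile(
    r"(?<![\d.])(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})(?![\d.])"
)


def find_hardcoded_ips(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, address) pairs for IPv4 literals found in text."""
    hits = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in _CANDIDATE.finditer(line):
            if all(int(octet) <= 255 for octet in match.groups()):
                hits.append((line_number, match.group(0)))
    return hits
```

Running this over exported router configs, cron scripts, and application settings gives a starting inventory of addresses that would need to change during a fail-over.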
Final Recommendations (cont.)
Work with the business to build effective (but simple) BCP
Review all major supplier contracts for ‘force majeure’
exclusions
Create a robust communication strategy
»Avoid cell phones as the primary channel
»Create fail-safe contact methodology (associate web-site, alternate
phone number, etc.)
Provide all critical team members with laptops and remote
connectivity (e.g., VPN or Citrix)