The Nuts and Bolts of Disaster Recovery

Presented at InnoTech Oregon 2013. All rights reserved.

Published in: Business, Technology

  1. The Nuts and Bolts of Disaster Recovery. Steve Knipple, VP Engineering & Operations. May 2, 2013
  2. Agenda: Introduction; Audience Survey; Definitions; Industry Statistics; Disaster Recovery-as-a-Service; Planning; Building; Running (and Testing); Real-Life Examples; Q&A
  3. A bit about me
     • Steve Knipple, VP Engineering and Operations
     • 17 years’ experience in IT enterprise strategy, architecture, management and operations
     • A specialist in corporate IT transformational programs
     • B.S.M.E., University of Wisconsin
     • M.S. in Applied Information Management, University of Oregon
     • 12 years at Munich-based Wacker Chemical standardizing, consolidating, and improving their global infrastructure disaster tolerance
     • 3 years at EasyStreet Online Services developing and growing their colocation, cloud and disaster recovery service offerings
  4. Disaster Recovery (from 50,000 feet)
     • The processes, policies, and procedures related to preparing for recovery or continuation of the technology infrastructure that is vital to an organization after a natural or human-induced disaster
     • A subset of business continuity (which is larger in scope)
     • The longest-running IT initiative in most IT organizations
         • Usually “next year’s” project
         • Frequently “check box and pray”
         • Periodically “I think we got it”
         • Sometimes “we’re covered”
     • It’s different today…
         • There are more cattle
         • There are better options to take care of kittens
         • The technology has dramatically matured; it is now much more about the execution of best practices
  5. Where are you with respect to Disaster Recovery?
     • How many people have Disaster Recovery solutions in place? (Why?)
     • How many people believe their Disaster Recovery solutions work? (Why?)
     • How many people have Disaster Recovery as an active initiative? (How long has this been active?)
  6. Some scary DR stats: According to Gartner, what percentage of SMBs have a comprehensive DR plan in place? 35%
  7. Some scary DR stats: According to The Hartford Financial Services Group, what percentage of businesses that experience a disaster without an emergency plan never reopen? 43%
  8. Some scary DR stats: Of those 57% that do reopen, what percentage are still in business two years later? 29%
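
Taken together, the two Hartford figures imply an overall two-year survival rate. The short calculation below is only a sketch of that arithmetic; it assumes, as the slide wording suggests, that the 29% applies to the 57% that reopen. The function and variable names are illustrative.

```python
def two_year_survival(reopen_rate: float, survive_given_reopen: float) -> float:
    """Share of affected businesses still operating two years after the disaster."""
    return reopen_rate * survive_given_reopen


never_reopen = 0.43          # from slide 7
reopen = 1 - never_reopen    # the 57% quoted on slide 8
overall = two_year_survival(reopen, 0.29)
print(f"{overall:.1%}")      # prints 16.5%
```

In other words, on these figures only about one in six businesses hit by a disaster without an emergency plan is still operating two years later.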
  9. Some scary DR stats: According to International Data Corp (IDC), what is the average cost of business downtime? $84,000 per hour
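
To make the hourly figure concrete, the sketch below multiplies it by an outage length. It assumes cost scales linearly with time, which is a simplification, and the IDC number is an average, so a given business should substitute its own estimate. The function name is illustrative.

```python
IDC_AVERAGE_COST_PER_HOUR = 84_000  # USD, the average quoted on the slide


def downtime_cost(hours: float, cost_per_hour: float = IDC_AVERAGE_COST_PER_HOUR) -> float:
    """Estimated cost of an outage, assuming cost grows linearly with duration."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * cost_per_hour


print(f"${downtime_cost(4):,.0f}")  # a 4-hour outage: prints $336,000
```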
  10. There’s plenty of risk (no matter where you go)
  11. And we’re waiting for trouble… Source: EMC2, European Disaster Recovery Survey 2011: “Data Today Gone Tomorrow: How Well Companies Are Poised For IT Recovery”
  12. Yet many don’t act… Major DR challenges
     • Organizational: no clear goals; no plan; no budget; no time
     • Technological: no “as built”; legacy applications that can’t leverage modern capabilities
     • Operational: no playbook; no “single pane of glass”
  13. The nuts and bolts of Disaster Recovery: How to apply the basics of “Plan, Build, Test and Run” to Disaster Recovery
  14. The nuts and bolts of Disaster Recovery: PLAN. A Business Discussion
  15. What is a ‘service failure’?
     • The catastrophic failure of an IT environment, caused by an unplanned event, which results in service interruption within or to the business
     • or
     • A significant and extended failure of an IT service resulting in:
         • A failure to satisfy SLA(s) or KPA(s)
         • Revenue loss or significant core service impact
         • Loss of market share, confidence, or goodwill
         • Adverse customer impact
         • Exposure to litigation / legal action
  16. Service interruption spectrum [chart: events plotted by Costs per Occurrence ($ to $$$$$) against Event Frequency (No. Annually: 100, 10, 1, 1/10). Events shown: Virus, Data Corruption, Worms, Application Outage, Disk Failure, Component Failure, Network Problem, Power Failure, Building Fire, Flood / Tornado, Terrorism / Civil Unrest, Pandemic, Earthquake, Earthquake (Damage). Regions labeled: High Availability, Business Continuity, Disaster Resiliency, Disaster Recovery.] Source: Coherence Advisors, 2012
  17. Classifying interruptions
     • Routine interruptions
         • Limited-scale disruption, short-term effects
         • Generally have an informal continuance plan
         • Momentary interruptions cause inconvenience
         • Constrained human resources, transportation
     • Non-routine interruptions
         • Potential for disaster, significant disruption
         • Large-scale incidents, long-term effects
         • Potential to adversely affect customers, brand
         • Generally have a formal continuance plan
         • Interrupted personnel, supply chain, transportation
     • Source: Coherence Advisors, 2012
  18. Identifying Disaster Recovery goals
     • Recovery Point Objective (RPO)
         • Determines replication frequency
         • Guide to determine production system load and bandwidth requirements
     • Recovery Time Objective (RTO)
         • Guide to determine failover mechanism and procedures
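
The link between RPO, replication frequency, and bandwidth can be sketched with a deliberately simple model. It assumes periodic (snapshot-style) replication in which the worst-case data loss is one replication interval plus the time needed to ship that interval's changes, so interval + transfer time must fit inside the RPO. The function name, parameters, and example numbers are illustrative assumptions, and the model ignores compression, bursts, and protocol overhead.

```python
def required_bandwidth_mbps(change_rate_gb_per_hour: float,
                            rpo_minutes: float,
                            interval_minutes: float) -> float:
    """Minimum sustained bandwidth (megabits/s) so that one replication
    interval plus its transfer time stays within the RPO."""
    if not 0 < interval_minutes < rpo_minutes:
        raise ValueError("interval must be positive and shorter than the RPO")
    changed_gb = change_rate_gb_per_hour * interval_minutes / 60
    transfer_window_s = (rpo_minutes - interval_minutes) * 60
    changed_megabits = changed_gb * 8 * 1000  # decimal GB to megabits
    return changed_megabits / transfer_window_s


# Hypothetical workload: 10 GB of changes per hour, 1-hour RPO, replicate every 30 minutes.
print(round(required_bandwidth_mbps(10, 60, 30), 1))
```

Shortening the interval toward zero lowers the bandwidth floor toward the raw change rate, while pushing the interval close to the RPO leaves almost no time to transfer and drives the requirement up sharply.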
  19. What would happen to my business during a catastrophic IT service outage?
     • My business IS IT services
         • You get news coverage (Netflix, reddit)
         • Better have a plan B
     • IT services are critical to the day-to-day operation of my business
         • Most large businesses today (ERP systems, CRM, mail, etc.)
     • IT services augment my business
         • I have an informational website
  20. Plan (Get the train on the track)
     • You need to make some major “forking” decisions and adjust those to your limitations (and there are no right answers)
         • What is your RTO (Recovery Time Objective)? Defined as: the duration of time and a service level within which a process must be restored after a disaster in order to avoid unacceptable consequences
         • What is your RPO (Recovery Point Objective)? Defined as: the maximum tolerable period in which data might be lost from an IT service outage
         • How far away from your main site is “far enough”? (associated with WHERE you do business)
         • Decide what events constitute a “disaster” to you (usually related to extended downtime and recoverability)
     • You need to understand your IT services infrastructure BEFORE you start Disaster Recovery
         • What are your critical services?
         • What dependencies exist between your services?
         • Identify legacy (and often problematic) systems
     • Decide who will do the work and how you will pay for it
         • OpEx vs. CapEx
         • Build vs. buy
         • Insource vs. outsource
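
One way to capture "what dependencies exist between your services" is a dependency map that can be sorted into a safe recovery order. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the service names and their dependencies are hypothetical placeholders, not taken from the talk.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: each service lists the services it depends on.
DEPENDENCIES = {
    "directory": set(),
    "database": {"directory"},
    "app": {"database", "directory"},
    "web": {"app"},
}


def recovery_order(dependencies: dict) -> list:
    """Return services ordered so every service comes after its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # A cycle means no valid start order exists; it has to be untangled first.
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


print(recovery_order(DEPENDENCIES))
```

A circular dependency surfaces as an error here, which is usually a useful finding in its own right when documenting an environment before building DR.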
  21. The nuts and bolts of Disaster Recovery: BUILD. An Execution Challenge
  22. If you’ve had the business discussion, it’s all about execution… “Maybe we can do this… Get the toolbox.”
  23. RTO decision drives your options
  24. RPO decisions drive your data replication strategy
     • Synchronous replication
         • Guarantees “zero data loss.” Data is not available until it is successfully written to two environments
         • Can be handled at either the infrastructure or application layer
     • Asynchronous replication
         • A delay exists between the primary and secondary systems
         • Sometimes you need to quiesce (or pause) the primary system
         • Oftentimes used in addition to synchronous replication to enable multiple state backups
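
The difference between the two strategies can be illustrated with a toy in-memory model. This is a sketch only: the class names are invented for illustration, lists stand in for storage, and real replication happens in the storage, hypervisor, or database layer rather than in application code like this.

```python
from collections import deque


class Replica:
    def __init__(self):
        self.log = []


class SyncPrimary:
    """A write returns only after both primary and replica hold the record."""
    def __init__(self, replica):
        self.log, self.replica = [], replica

    def write(self, record):
        self.log.append(record)
        self.replica.log.append(record)  # caller waits for the second copy


class AsyncPrimary:
    """A write returns immediately; records are shipped to the replica later."""
    def __init__(self, replica):
        self.log, self.replica, self.pending = [], replica, deque()

    def write(self, record):
        self.log.append(record)
        self.pending.append(record)

    def ship(self):
        while self.pending:
            self.replica.log.append(self.pending.popleft())


def lost_on_failover(primary) -> int:
    """Records the replica would be missing if the primary failed right now."""
    return len(primary.log) - len(primary.replica.log)


sync = SyncPrimary(Replica())
for r in ("a", "b", "c"):
    sync.write(r)

lagging = AsyncPrimary(Replica())
lagging.write("a")
lagging.write("b")
lagging.ship()
lagging.write("c")  # written after the last shipment, not yet replicated

print(lost_on_failover(sync), lost_on_failover(lagging))  # prints 0 1
```

The asynchronous backlog is the data at risk, which is why the replication delay has to be kept inside the RPO.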
  25. Your business operation drives geography
     • Where are your customers and employees?
     • If a localized or regional disaster would also affect your entire customer base, it may not make sense to utilize geo-distant DR
     • If you have a national/international presence, your DR footprint should also be national
  26. Your past choices determine your DR options
  27. Build vs. buy decision drives which services you buy
     • Implementation levels
         • BASIC: physical infrastructure (compute power, data center, connectivity, etc.)
         • ENHANCED: add infrastructure replication technology
         • PREMIUM: add operation plan (playbook), application-level replication, recovery testing
     • Do it all internally or…
         • Colocation
         • IaaS cloud service
         • Enterprise backup
         • Mix of colocation, cloud, and professional services
  28. IaaS clouds offer several options
  29. Develop (and test) Recovery Playbook
     • High-level example: IV. Failover Recovery Plan
         1. Disaster is declared
         2. Promote Directory Services
         3. Promote standby backend databases
         4. Bring up application services
         5. Verify functionality
         6. Switch DNS
         7. Clean up
     • “Soft” DR testing simplifies failback to primary site.
     • “Hard” DR testing assumes you can’t fail back… you need to migrate back.
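
The failover plan above can be sketched as an ordered list of steps that halts at the first failure, so that, for example, DNS is never switched if verification fails. The step names come from the slide; the runner, the `ok`/`fail` callables, and the stop-on-failure policy are illustrative assumptions, since real steps would call into directory, database, and DNS tooling.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_playbook(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order; stop at the first one that reports failure.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


def ok() -> bool:      # placeholder for a step that succeeds
    return True


def fail() -> bool:    # placeholder for a step that fails
    return False


STEP_NAMES = [
    "Disaster is declared",
    "Promote Directory Services",
    "Promote standby backend databases",
    "Bring up application services",
    "Verify functionality",
    "Switch DNS",
    "Clean up",
]

# Simulated drill in which verification fails: the run stops before DNS is switched.
drill = [(n, fail if n == "Verify functionality" else ok) for n in STEP_NAMES]
done, failed_at = run_playbook(drill)
print(len(done), failed_at)  # prints 4 Verify functionality
```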
  30. The nuts and bolts of Disaster Recovery: RUN. A Discipline Challenge
  31. What does it mean to run DR?
     • The Disaster Recovery environment is part of the production environment
         • All regular processes (incident, problem, change management) apply to the whole environment
         • Regular testing is necessary to validate the environment
     • Special processes are required to specify…
         • Disaster declaration
         • Emergency teams and roles
         • Communication plans
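
Regular testing is easier to act on when each test is scored against the agreed objectives. The sketch below compares measured results from a DR test with RPO and RTO targets; the class, field names, and sample measurements are hypothetical, and the targets shown are example values to be replaced with your own.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class DrTestResult:
    data_loss: timedelta       # age of the newest data recovered at the DR site
    recovery_time: timedelta   # time from declaration to verified service


def meets_objectives(result: DrTestResult, rpo: timedelta, rto: timedelta) -> bool:
    """True when the test stayed within both the RPO and the RTO."""
    return result.data_loss <= rpo and result.recovery_time <= rto


RPO = timedelta(hours=1)   # example target
RTO = timedelta(hours=4)   # example target

passing = DrTestResult(data_loss=timedelta(minutes=20), recovery_time=timedelta(hours=3))
slow = DrTestResult(data_loss=timedelta(minutes=20), recovery_time=timedelta(hours=5))
print(meets_objectives(passing, RPO, RTO), meets_objectives(slow, RPO, RTO))  # prints True False
```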
  32. Single pane of glass for compute / applications
  33. Single pane of glass for networking (including bandwidth)
  34. Maintain the DR Playbook
     • Change management
         • Different systems
         • People changes
         • Technology changes
     • Regular testing
         • Keeps Disaster Recovery “top of mind” in the organization
         • Identifies opportunities for improvements
  35. The nuts and bolts of Disaster Recovery: Real-Life Examples, Trends
  36. Customer example #1
     • Oregon-based hospital
         • Large skilled internal IT staff
         • Significant assets already in place
         • Hardware refresh provided opportunity to improve DR
     • Solution
         • Primary infrastructure in EasyStreet colocation
         • 9-cabinet cage
         • Redundant diverse connectivity
         • DR infrastructure located at hospital
         • Data replication / DR playbook managed by hospital IT
  37. Customer example #2
     • Arizona-based healthcare provider
         • New “green-field” Clinical Information System
         • Complicated modern application
         • Extremely high availability / performance required
     • Solution
         • HOT/HOT Disaster Recovery solution (RPO 1 hour, RTO 4 hours)
         • Identical dedicated private clouds in Beaverton and Phoenix
         • Multiple replication techniques used: database / storage / hypervisor based
         • DR Playbook jointly developed by customer and EasyStreet
  38. Customer example #3
     • Oregon-based education assessment provider
         • Existing complicated environment
         • Highly skilled internal IT department
         • Failed DR attempts in the past at IT service provider
         • Critical systems that are the company’s business
     • Solution
         • HOT/HOT Disaster Recovery solution (RPO 1 hour, RTO 4 hours)
         • Primary private cloud in Beaverton, secondary lower-capacity private cloud in Phoenix
         • Multiple replication techniques used: database / storage / hypervisor based
         • DR Playbook jointly developed by customer and EasyStreet
  39. Disaster Recovery trends from an IaaS provider perspective
     • Most new customers are actively pursuing DR solutions
     • Long-term customers are looking at ways to consolidate primary infrastructure in the cloud and then use the savings to enable DR capability
     • Most are doing it in steps and taking 6 to 9 months to fully complete (primarily due to application complexity)
     • Technology improvements are enabling DR that simply was unaffordable in the past
     • Customers are reinvesting savings in improving operational processes and institutionalizing disaster recovery and business continuity
  40. Where do you start?
     1. Understand where you are, where you want to be, and when you want to be there
         a. RTO / RPO, organizational risk tolerance... Make the case.
         b. Categorize your services and their dependencies
     2. Make basic forking decisions
         a. Geography… Does it matter?
         b. Cold/Warm/Hot?
         c. How much do you do in-house and what do you source?
     3. Establish a long-term plan with incremental, achievable milestones
     4. Start small and build on success. Don’t overreach in the beginning.
     5. Maintain operational focus (DR is not a project… It is part of day-to-day operations)
  41. Thank you! And remember…
     • Disaster Recovery is really a journey… It is difficult but very achievable
     • Disaster Recovery is more than a project; it fundamentally changes how you run your business
     • It takes a team to be successful!
     • Call 503-601-2617
     • Email stevek@easystreet.com
