Note to Presenter: View in Slide Show mode for animation. When EMC or its partners talk about remote replication, they usually mean replication between storage at two locations. The source and target are physically separated to reduce the risks associated with co-location. Remote replicated systems could be across a campus, across a town, or across the globe. The physical distance and the technology selected can affect how quickly you recover from a disruption and how much data is lost.
Organizations normally set requirements for how much lost data and how much time to come back online is acceptable. The recovery point objective (RPO) is the amount of data, measured in time, that can be lost without being catastrophic to the business. The recovery time objective (RTO) is the amount of time that it takes to recover the data and restart your business services from the recovered data. Remote replication can provide very low RPOs (at or close to zero) and very small RTOs, depending on implementation. The bottom line is that replication is appropriate for all types of data, and the RPO and RTO you target are going to affect your implementation.
For multiple RPOs, and for remote replication with zero or low RPO and near-instant to instant recovery using DVR-like technology, EMC offers the RecoverPoint family.
Disaster Recovery - Business & Technology
Varrow Madness, March 15, 2012
Andrew Miller, Technical Consultant
t: @andriven
w: www.thinkmeta.net
One Big Reason to Do This
Expectations for Disaster Recovery ≠ IT Capabilities for Disaster Recovery
What is a Disaster?
• Disaster: An event that affects a service or system such that significant effort is required to restore the original performance level.
  » IT Service Management Forum
But what does that look like IN OUR ENVIRONMENT?
What disaster and recovery scenarios should we plan for?
Where do we begin?
How do we do it?
Disaster Recovery vs. Operational Recovery
• Disaster Recovery
  – To cope with and recover from an IT crisis that moves work to an alternative system in a non-routine way
  – A real “disaster” is large in scope and impact
  – DR typically implies failure of the primary data center and recovery to an alternate site
• Operational Recovery
  – Addresses more “routine” types of failures (server, network, storage, etc.)
  – Events are smaller in scope and impact than a full “disaster”
  – Typically implies recovering to alternate equipment within the primary data center
• Business expectations for the recovery timeframe are typically shorter for “operational recovery” issues than for a true “disaster”
• Each should have its own clearly defined objectives
Risks, Threats and Vulnerabilities
Risk is a function of the likelihood of a given threat acting upon a particular potential vulnerability, and the resulting impact of that adverse event on the organization.
Some threats that can cause Disasters…
• Human Error
• Localized IT systems / network failure
• Extended power outage
• Telecommunications outage
• Storm / Weather damage
• Earthquake / Volcano
• Fire in the facility
• Facility flooding
• Local evacuation
• Cyber attack
• Sabotage
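The risk definition and threat list above can be turned into a rough ranking exercise. The sketch below is only an illustration: the 1-to-5 scales, the multiplicative score, and the example ratings are all assumptions, not part of the original slides, and a real assessment would take its ratings from the BIA interviews.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (minor) to 5 (severe)


def risk_score(threat: Threat) -> int:
    """Assumed scoring rule: risk = likelihood x impact."""
    return threat.likelihood * threat.impact


# Example ratings are invented for illustration only.
threats = [
    Threat("Extended power outage", likelihood=3, impact=4),
    Threat("Earthquake / Volcano", likelihood=1, impact=5),
    Threat("Human Error", likelihood=4, impact=4),
]

# Highest score first, so planning effort goes to the biggest risks.
ranked = sorted(threats, key=risk_score, reverse=True)

for t in ranked:
    print(f"{t.name}: {risk_score(t)}")
```

A multiplicative score is one common convention; any monotonic function of likelihood and impact would fit the definition on the slide.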
(Varrow) Disaster Recovery Approach
• Interviews with key personnel to understand Business Process priorities and establish Business Impact Analysis (BIA).
• Review existing IT production infrastructure, including applications, servers, storage, network, and external connectivity. Identify Risks and Gaps.
• Establish Disaster Impact Scenarios and Disaster Recovery strategies to meet requirements.
• Recommend Roadmap for establishing recovery capabilities and documenting plans.
• Implement required recovery capabilities.
• Develop framework and content for IT DR Plan.
• Develop maintenance and test procedures for IT DR Plan.
• Address Business Continuity requirements and planning as appropriate.
What is the Business Impact Analysis?
• A conversation between IT and key stakeholders to understand:
  – What are the most time-critical and information-critical business processes?
  – How does the business REALLY rely upon IT Service and Application availability?
  – What are the Student, Financial, Regulatory, Reputational, and other impacts of IT Service and Application unavailability?
  – What availability or recoverability capabilities are justifiable based on these requirements, potential impact, and costs?
Disaster Recovery: Key Measures
[Figure: timeline from 5 a.m. to 7 p.m. showing the Recovery Point Objective (RPO) and Recovery Time Objective (RTO) around a disaster declared at 10 a.m.]
• RPO: Amount of data lost from failure, measured as the amount of time from a disaster event
• RTO: Targeted amount of time to restart a business service after a disaster event
Disaster Recovery: Key Measures
• Recovery Time Objective (RTO): Maximum duration of disruption of service
• Recovery Point Objective (RPO): Point in time to which application data is recovered / Maximum data loss
[Figure: Recovery Point and Recovery Time scales, each running from weeks through days, hours, and minutes to seconds on either side of real time, plotted against cost.]
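As a worked example of the two measures, the sketch below compares an actual data-loss window and downtime against RPO and RTO targets. Only the 10 a.m. disaster time comes from the slides; the calendar date, the 6 a.m. last good copy, the 3 p.m. restore time, and both targets are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_recoverable_copy: datetime, disaster: datetime) -> timedelta:
    """Time between the last recoverable copy and the disaster (compare with RPO)."""
    return disaster - last_recoverable_copy


def downtime(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time between the disaster and service restart (compare with RTO)."""
    return service_restored - disaster


# The date and all times except 10 a.m. are assumptions for illustration.
disaster = datetime(2012, 3, 15, 10, 0)
last_copy = datetime(2012, 3, 15, 6, 0)
restored = datetime(2012, 3, 15, 15, 0)

loss = data_loss_window(last_copy, disaster)   # 4 hours of data lost
outage = downtime(disaster, restored)          # 5 hours of disruption

# Hypothetical targets agreed with the business.
rpo_target = timedelta(hours=1)
rto_target = timedelta(hours=8)

print("RPO met:", loss <= rpo_target)
print("RTO met:", outage <= rto_target)
```

With these numbers the RTO is met but the RPO is not, which would point toward more frequent copies or replication rather than faster restore procedures.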
BIA - Example Priority Tiers
• Priority 1 (High Availability / Immediate Recovery): Services whose unavailability for more than a brief period can have a severe impact on customers or time-critical business operations.
• Priority 2 (1-2 day recovery): Services whose unavailability significantly impacts customers or business operations.
• Priority 3 (3-5 day recovery): Services which can tolerate up to five days of disruption in a disaster.
• Priority 4 (6-10 day recovery): Services which can tolerate up to ten days of disruption in a disaster.
  – Priority 3 and 4 systems may be restored in less time, depending on the situation. However, higher priority functions will be restored first.
• Priority 5 (“Best effort” recovery): Non-critical services which can tolerate two weeks or more of disruption in a disaster. These systems will be restored on a best-effort basis, after other more critical systems have been restored and ongoing operations have resumed.
  – Priority 5 systems may be restored in less time, depending on the situation. However, higher priority functions will be restored first. In some cases, systems deemed to not be required for continued operations may not be restored.
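The example tiers above can be expressed as a simple lookup from tolerable disruption to priority. This is a sketch only: reading “a brief period” as less than one day, and placing anything above ten days in Priority 5, are assumptions made here to close gaps the slide leaves open.

```python
def priority_tier(tolerable_days: float) -> int:
    """Map tolerable disruption (in days) to the example BIA priority tier."""
    if tolerable_days < 0:
        raise ValueError("tolerable_days must be non-negative")
    if tolerable_days < 1:    # assumption: "brief period" means under one day
        return 1
    if tolerable_days <= 2:   # 1-2 day recovery
        return 2
    if tolerable_days <= 5:   # 3-5 day recovery
        return 3
    if tolerable_days <= 10:  # 6-10 day recovery
        return 4
    return 5                  # best-effort recovery


# Hypothetical services and tolerances, for illustration only.
services = {"Email": 0.25, "Payroll": 2, "Intranet wiki": 14}
for name, days in services.items():
    print(f"{name}: Priority {priority_tier(days)}")
```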
What does it take to RECOVER from an IT Disaster?
• Data Protection
  – Backups, Replication
• Recovery Facility
  – Location to rebuild IT infrastructure or provision services
• Data Recovery & Storage
  – Get Data into a form that is usable
• Servers / Compute Capacity
  – Sufficient servers or virtual compute capacity to actually run the applications
• Network, Voice, and Data Communications
  – Connect servers, storage and workers
  – Connect the recovery site to work sites
  – Communicate with customers
  – Includes network, telecom, demarcation equipment; cabling; telecom provisioning
• DR Plan
  – Documented and tested procedures for what to do, and how to do it
• People
Example Disaster Recovery Strategies
• Priority 1 (4 hour RTO or less)
  – Disaster Recovery Strategy: Establish hot site for systems and data in a secondary data center at a remote location that is unlikely to be impacted by a local or regional event.
  – Data Protection Approach: Replicate / remote mirror / short interval remote disk-to-disk backup
• Priority 2 (24-48 hour RTO)
  – Disaster Recovery Strategy: Maintain sufficient remote physical or virtual infrastructure for restoration. Ensure sufficient space/power in recovery facility.
  – Data Protection Approach: Remote disk-to-disk backup
• Priority 3 (72 hour RTO)
  – Disaster Recovery Strategy: Ensure ability to quickly acquire infrastructure for restoration. Ensure sufficient space/power in recovery facility.
  – Data Protection Approach: Tape (with sufficient off-site rotation) or remote disk-to-disk backup
• Priority 4 (1-2 week RTO)
  – Disaster Recovery Strategy: Ensure ability to quickly acquire infrastructure for restoration. Ensure sufficient space/power in recovery facility.
  – Data Protection Approach: Tape (with sufficient off-site rotation) or remote disk-to-disk backup
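One way to read the strategy table is as a selection rule: given a required RTO, pick the least demanding tier that can still meet it. The sketch below encodes that reading. The selection rule itself, and treating each tier's upper bound (4, 48, 72, and 336 hours) as its achievable RTO, are assumptions for illustration, not something the slides state.

```python
# (max RTO in hours, tier, data protection approach) taken from the example table;
# using the upper bound of each range as the tier's achievable RTO is an assumption.
STRATEGIES = [
    (4, "Priority 1", "Replicate / remote mirror / short interval remote disk-to-disk backup"),
    (48, "Priority 2", "Remote disk-to-disk backup"),
    (72, "Priority 3", "Tape (with sufficient off-site rotation) or remote disk-to-disk backup"),
    (14 * 24, "Priority 4", "Tape (with sufficient off-site rotation) or remote disk-to-disk backup"),
]


def select_strategy(required_rto_hours: float) -> tuple:
    """Return the least demanding tier whose RTO still fits the requirement.

    Requirements tighter than 4 hours fall back to Priority 1 ("4 hour RTO or less").
    """
    chosen = STRATEGIES[0]
    for row in STRATEGIES:  # rows are ordered from fastest to slowest recovery
        if row[0] <= required_rto_hours:
            chosen = row
    return chosen


for hours in (2, 12, 48, 100):
    _, tier, approach = select_strategy(hours)
    print(f"Required RTO {hours} h -> {tier}: {approach}")
```

Note that a 12-hour requirement lands in Priority 1 under this rule, because the next tier is only expected to recover within 24 to 48 hours.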
Storage Arrays + Replication
[Figure: RecoverPoint architecture. The production site has application servers, a RecoverPoint appliance, a SAN, and storage arrays holding the production LUNs, a local copy, and the production and local journals. The optional disaster recovery site has standby servers, a RecoverPoint appliance, a SAN, and storage arrays holding the remote copy and remote journal. The two sites are linked over Fibre Channel/WAN for RecoverPoint bi-directional replication/recovery.]
• Host-based write splitter
• Fabric-based write splitter
• Symmetrix VMAXe, VNX-, and CLARiiON-based write splitter
[Figure: Site A (Primary) and Site B (Recovery), each with vCenter Server, Site Recovery Manager, and vSphere, connected by vSphere Replication and storage-based replication.]
• vSphere Replication: Simple, cost-efficient replication for Tier 2 applications and smaller sites
• Storage-based Replication: High-performance replication for business-critical applications in larger sites