2. Introduction –
VMware for Business Continuity
2 Confidential
3. Disasters Happen. Do You Need Protection?
43% of companies experiencing disasters never
re-open, and 29% close within two years.
(McGladrey and Pullen)
93% of businesses that lost their data center for
10 days went bankrupt within one year.
(National Archives & Records Administration)
40% of all companies that experience a major
disaster will go out of business if they cannot
gain access to their data within 24 hours.
(Gartner)
Top executives say 10 hours to recovery;
IT managers say up to 30 hours.
(Harris Interactive)
4. Business-Critical Applications Require Business Continuity
% of Application Instances Running on VMware in Customer Base
[Bar chart: 2010 vs. 2011 for MS Exchange, MS SharePoint, MS SQL, Oracle Middleware, Oracle DB and SAP; values range from 18% to 67%]
Source: VMware customer survey, Jan 2010 and April 2011 interim results,
Data: Total number of instances of that workload deployed in your organization and the percentage of those instances that are virtualized
Availability Expectations on vSphere Continue to Increase
RTOs decreasing from >24 hours to <12 hours
5. Drawbacks Of Traditional Business Continuity Solutions
Application-level availability silos: complex and expensive requirements
[Diagram: separate per-application stacks for Availability / Middleware, Local Availability, Data Protection and Disaster Recovery. Labels: Oracle RAC, Oracle DataGuard, MS Clustering, DB Mirroring, DB Access Groups, CCR / SCR, Java App Server Cluster, Session State Replication]
Shared availability services (backup, data replication): longer RTOs and RPOs
6. Improving Business Continuity At All Levels
[Diagram: vSphere hosts at a local site and a failover site]
Local Availability
• vSphere High Availability (improved in 2011)
• vSphere Fault Tolerance
• vMotion and Storage vMotion
Disaster Recovery
• vCenter Site Recovery Manager (improved in 2011)
• Includes vSphere Replication (new in 2011)
Data Protection
• vSphere Data Recovery (improved in 2011)
• Storage APIs for Data Protection
7. Transforming Cost And Complexity Of Business Continuity
[Chart: RTO / RPO (Days, Hours, Minutes, Continuous) versus cost per app ($100, $1,000, $10,000)]
Plotted solutions:
• App-level availability (Oracle RAC, MSCS, …)
• VMware business continuity (HA, FT, vMotion, SRM, VDR)
• Shared availability services (traditional backup, replication)
VMware business continuity compared with app-level availability solutions:
• Similar RTOs
• Much lower cost / complexity
VMware business continuity compared with traditional backup and replication:
• Much better RTOs
• Similar or lower cost
8. Better Business Continuity Is #1 Objective For Virtualization
Top Five Objectives for Virtualization
• Use virtualization to improve Business Continuity and Disaster Recovery (BCDR): 46%
• Improve virtual machine performance: 33%
• Increase the server consolidation ratio: 32%
• Improve VM environment management: 31%
• More mission-critical applications: 24%
Source: WW VMware customer survey, January 2010 (N=1083)
10. Challenges of Traditional Disaster Recovery
Complex Recovery Plans
Expensive: >$10K per app
Unreliable Failovers
[Diagram: software, apps, hosts, storage, facilities and network, each marked with a question mark]
Failure to meet business requirements
• Long RTOs – days to weeks
• Too much time and resources consumed
11. vSphere Provides The Best Foundation For Disaster Recovery
Consolidation: Cost-Efficient Infrastructure
• Reduced hardware requirements at recovery site
• Use recovery hardware to run low-priority apps
Hardware Independence: Flexible Infrastructure
• Eliminate need for identical hardware across sites
• Enable waterfalling of equipment to recovery site
Encapsulation: Simple Application Protection
• Entire system, including application, OS, and data, is stored as virtual machine files
• Entire system can be protected with data protection tools
12. Encapsulation Simplifies Application Protection And Recovery
Physical: 40+ hrs.
• Configure hardware, install OS, configure OS, install backup agent, start “Single-step automatic recovery”
Virtual: < 4 hrs.
• Restore VM, power on VM
Simplify recovery
• No operating system re-install or bare-metal recovery
• No time spent reconfiguring hardware
Standardize recovery process
• Consistent process independent of applications,
operating systems and hardware
13. vCenter Site Recovery Manager Ensures Simple, Reliable DR
Site Recovery Manager complements vSphere to provide the simplest and most reliable disaster protection and site migration for all applications
[Diagram: Site A (Primary) and Site B (Recovery), each with VMware vCenter Server, Site Recovery Manager, VMware vSphere and servers]
Provide cost-efficient replication of applications to failover site
• Built-in vSphere Replication
• Broad support for storage-based replication
Simplify management of recovery and migration plans
• Replace manual runbooks with centralized recovery plans
• From weeks to minutes to set up new plan
Automate failover and migration processes for reliable recovery
• Enable frequent non-disruptive testing
• Ensure fast, automated failover
• Automate failback processes
14. SRM Momentum
Introduced in Q2 2008
125,000+ units sold
5,000+ customers
50% annual growth in 2010
“If your organization is already taking advantage of virtualization,
then adding Site Recovery Manager to handle disaster recovery
is a no-brainer.”
― Jerry Wilkin
Senior Systems Administrator, Dayton Superior Corp
15. Key Components Of SRM 5
Site Recovery Manager
• Manages recovery plans
• Automates failovers and failbacks
• Tightly integrated with vCenter and replication
Choice of Replication Options
vSphere Replication
• Bundled with SRM
• Replicates virtual machines between vSphere clusters
Storage-Based Replication (3rd party)
• Provided by replication vendor
• Integrated via replication adapters created, certified and supported by replication vendor
[Diagram: vCenter Server, Site Recovery Manager, vSphere and storage; required at both protected and recovery sites]
16. Site Recovery Manager Complements vSphere For DR
[Comparison: Traditional DR versus VMware]
vSphere Functionality
• Consolidation to reduce costs
• Hardware independence at failover site
• Encapsulation for simple recovery of entire systems
SRM Functionality
• vSphere Replication
• Simple management of recovery and migration plans
• Automated DR failover and non-disruptive testing
• Streamline planned migrations and automated failback
17. SRM Provides Broad Application Coverage
RTO: 30 minutes to hours
RPO: Flexible based on storage replication
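[Chart: RTO (Days, Hours, Continuous) versus RPO (Days, Hours, Synchronous) for Tier 1, Tier 2 and Tier 3 applications. Regions shown: app-level geo-clustering / load balancing at continuous RTO, and Site Recovery Manager at RTOs of hours to days]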
18. SRM Supports Flexible Topologies
Active-Passive Failover (production site to recovery site)
• Most common traditional scenario
• Expensive dedicated resources
Active-Active Failover (production site to recovery site)
• Leverage recovery infrastructure for test, development, training
• Utilize sunk cost of recovery site
Bi-directional Failover (production site to production site)
• Production applications at both sites
• Each site acts as the recovery site for the other
Shared Recovery Sites
• Many-to-one failover
• Particularly useful for Remote Office / Branch Office
19. What’s New In Site Recovery Manager 5.0?
vSphere Replication (expand DR coverage to Tier 2 apps and smaller sites)
• Bundled with SRM at no additional cost
• Provides simple, cost-efficient replication between vSphere clusters
Streamline planned migrations (for disaster avoidance, planned maintenance, …)
Automated failback
• Bi-directional recovery plans
• Automates failback to original site
Planned migration
• New workflow that can be applied to any recovery plan
• Ensures no data-loss, application-consistent migrations of virtual machines
Others
• More granular control over VM startup order
• Protection-side APIs
• IPv6 support
21. DR Coverage Often Limited Due To High Protection Costs
Need to expand DR protection
• Tier 2 / 3 applications in larger datacenters
• Small and medium businesses
• Remote office / branch offices
[Diagram: Corporate Datacenter with Tier 1 apps (protected) and Tier 2 / 3 apps (backup only); small sites (Small Business, Remote Office / Branch Office) with backup only]
22. SRM Provides Broad Choice of Replication Options
[Diagram: Site A (Primary) and Site B (Recovery), each with vCenter Server, Site Recovery Manager and vSphere, connected by vSphere Replication and by storage-based replication]
vSphere Replication
Simple, cost-efficient replication for Tier 2 applications and smaller sites
Storage-based Replication
High-performance replication for business-critical applications in larger sites
23. vSphere Replication For Cost-Efficient, Simple Replication
Cost-efficient
Reduce storage costs by 2X
• Support for heterogeneous storage across sites, including non-replicating storage
• Use lower-end or older storage at failover site
Eliminate replication software costs
• vSphere Replication included with Site Recovery Manager at no additional cost
Simple
Manage replication directly from vCenter
• Eliminate complex interactions with storage teams
Manage replication at the individual VM level
• Eliminate need for complicated VM-to-LUN mapping
Powerful
15 minute RPOs
• Set RPOs between 15 minutes and 24 hours
Efficient network utilization
• Replicate only changed disk areas
Highly scalable
• 500 virtual machines
Limitations
• No automated failback
• File-level consistency only (except planned migration)
• No FT, templates, linked clones, physical RDMs
24. Expand DR Protection To Tier 2 Apps And Small Sites
Storage, Replication, and SRM Costs per Protected VM
• Storage replication (large site): $2,000/VM, covering Tier 1 storage at the failover site, storage replication software and SRM Enterprise
• vSphere Replication (small site): $600/VM, covering Tier 2 storage at the failover site and SRM Standard
[Diagram: Corporate Datacenter with Tier 1 apps on storage replication and Tier 2 / 3 apps on vSphere Replication; small sites (Small Business, Remote Office / Branch Office) on vSphere Replication]
25. Simplify Replication Management With vSphere Replication
Storage-based Replication
[Diagram labels: SharePoint, Web, App, Hub, SQL, VMFS A datastore on LUN 1, VMFS B datastore on LUN 2, Datastore Group, vSphere Admin, Storage Admin]
vSphere Replication
[Diagram labels: SharePoint, Web, App, SQL, vSphere Admin]
Overview
vSphere Replication provides simple management of replication
• Managed directly from vCenter
• Managed at the individual VM-level
Benefits
• Eliminate complex interactions between vSphere and storage teams to set up replication
• Eliminate need to shuffle VMs between datastores to map applications to replicated LUNs
26. vSphere Replication Architecture
Tightly Integrated With SRM, vCenter and ESX
[Diagram: Protected Site and Recovery Site, each with vCenter Server, Site Recovery Manager and a vSphere Replication Management Server. VSR Agents run on the ESX / ESXi hosts at the protected site; a vSphere Replication Server runs at the recovery site. Each site uses any storage supported by vSphere.]
28. Simple Setup And Management of Recovery And Migration Plans
From Complex Runbooks…
• Weeks or months to set up
• Error-prone
• Quickly falls out of sync with apps and infrastructure changes
…to Simple Recovery Plans
• Simple recovery plan set up in minutes
• Fewer steps means far less room for errors
• Simple to keep in sync with changes
29. Five Simple Steps To Create Recovery And Migration Plans
Create Recovery Plans in 5 Steps…
Step 1: Map production site resources to recovery site
• Resource pools
• vSwitches
• VM folders
Step 2: Select virtual machine protection groups to include in recovery
Step 3: Select low-priority VMs to suspend at recovery site
Step 4: Specify boot sequence of recovered VMs
Step 5: Customize IP addresses of recovered VMs
Optional: Add messages and custom scripts
…And Eliminate Manual Steps of Traditional Recovery
Reconfigure individual hosts
Recover entire systems including OS and application binaries
Coordinate storage and replication processes for recovery
• Stop replication and make replicated LUNs writable
• Present data to applications
• Present VMs to vSphere
Reconfigure physical switching infrastructure
30. Application Consistent Recovery With SRM
Storage-based replication: application consistency widely available
• Enabled by replication management software
• Typically relies on agents in the VMs to properly quiesce applications
• For both DR failover and planned migrations
vSphere Replication: Application consistency for planned migrations only
• File-system consistency for DR failover via VSS requester in VMware Tools
[Diagram: Application Consistency Enabled by Replication Provider. Replication management software quiesces the application, the app-consistent VM is replicated, and the app-consistent VM is presented to SRM.]
32. Beyond DR: Disaster Avoidance And Planned Migrations
3 typical use-cases for SRM
Disaster Failover
Recover from unexpected site failure
• Full or partial site failure
The most critical but least frequent use-case
• Unexpected site failures do not happen often
• When they do, fast recovery is critical to the business
Disaster Avoidance
Anticipate potential datacenter outages
• For example: in case of an anticipated hurricane, floods, forced evacuation, etc.
Initiate preventive failover for smooth migration
• Leverage SRM 'planned migration' to ensure no data-loss
• 'Automated failback' enables easy return to original site
Planned Migration
Most frequent SRM use case
• Planned datacenter maintenance
• Global load balancing
Streamline routine migrations across sites
• Test to minimize risk
• Execute partial failovers
• Leverage SRM 'planned migration' to ensure no data-loss
• 'Automated failback' enables bi-directional migrations
33. SRM Reduces Recovery Risk With Frequent Testing
Traditional Disaster Recovery
[Chart: recovery risk over time, rising during the TESTING GAP between infrequent DR tests; annotation: lack of confidence in DR process]
• During the testing gap, organizations can't be sure that they can recover the current IT environment
• A failover scenario may take days or weeks to complete, leaving the business at extreme risk
Site Recovery Manager
[Chart: recovery risk over time with frequent DR testing]
SRM provides assurance that DR objectives will be met.
34. SRM Enables Frequent Non-Disruptive Testing
Non-disruptive Testing
[Diagram: recovery site with vSphere, replication, a LUN snapshot and an isolated test environment]
Overview
Automate test execution
• Execute recovery plan
• Customizable for testing with extra callouts and breakpoints
• Log results of the test
Isolated test environment
• Snapshot replicated LUNs
• Launch VMs in fenced network
• Reset environment after test
Benefits
Confidence and documentation that DR requirements are satisfied
Quickly identify and remediate potential issues
Reduce cost and resources required for DR testing
• Eliminate traditional 'DR testing weekends'
35. Automate DR Failover Processes
DR Failover
[Diagram: Site A and Site B, both running vSphere, with replication between them. 1: Raise alert when heartbeat lost. 2: User initiates failover. 3: Stop replication and present LUNs to vSphere. 4: Recover VMs.]
Overview
Automatically detect site failures
• Require user to manually initiate failover
Automate recovery process
• Stop replication and present replicated LUNs to vSphere
• Execute user-defined recovery plan
Benefits
• Ensure fast and predictable failovers and migrations
• Consistently meet business requirements
• Minimize risk of user errors
36. Testing and Executing Recovery Plans
[Screenshot callouts: steps in recovery plan; status and time stamps; when to execute; user confirmation message]
37. Planned Migrations For App Consistency & No Data Loss
Planned Migration
[Diagram: Site A and Site B, both running vSphere, with replication between them. 1: Shut down production VMs. 2: Sync data, stop replication and present LUNs to vSphere. 3: Recover app-consistent VMs.]
Overview
Two workflows can be applied to recovery plans:
• DR failover
• Planned migration
Planned migration ensures application consistency and no data-loss during migration
• Graceful shutdown of production VMs in application consistent state
• Data sync to complete replication of VMs
• Recover fully replicated VMs
Benefits
• Better support for planned migrations
• No loss of data during migration process
• Recover 'application-consistent' VMs at recovery site
38. Automated Failback To Streamline Bi-Directional Migrations
Automated Failback
[Diagram: Site A and Site B, both running vSphere, with reverse replication and the original recovery plan reversed]
Overview
Re-protect VMs from Site B to Site A
• Reverse replication
• Apply reverse resource mapping
Automate failover from Site B to Site A
• Reverse original recovery plan
Restrictions
• Does not apply if Site A has undergone major changes / been rebuilt
• Not available with vSphere Replication
Benefits
• Simplify failback process
• Automate replication management
• Eliminate need to set up new recovery plan
• Streamline frequent bi-directional migrations
40. Successful Business Continuity Requires Careful Planning
Business Requirements / Business Impact
Analysis (BIA)
• Map service Tiers by availability requirements and cost
• For each service, identify Availability requirements,
Recovery Time Objectives (RTO), Recovery Point
Objectives (RPO)
Application Dependency Mapping
• Identify dependencies between application
components
• Weakest link in the chain? (AD, DNS, etc)
Business Continuity Design
• App-specific solutions / virtualization for HA and DR / backup only
• Budget ahead of time
• Project planning / phasing
Use Professional Services
• VMware PSO
• VMware BCDR Competency partners (300+ highly qualified partners)
41. SRM 5 Editions Lineup
SRM 5
                                      Standard                  Enterprise
Price per protected virtual machine   $195                      $495
(license only)
Scalability Limits
• Maximum protected VMs               75 virtual machines (1)   Unlimited (2)
Features
• Support for storage-based replication
• Centralized recovery plans
• Non-disruptive testing
• Automated DR failover
• vSphere Replication
• Automated failback
• Planned migration
1. Maximum of 75 VMs per site and per SRM instance
2. Subject to the product's technical scalability limits
New in SRM 5.0
42. VMware BC/DR Service Offerings
VMware vCenter Site Recovery Manager Jumpstart
• The VMware vCenter Site Recovery Manager Jumpstart provides you
with a proof-of-concept, on-site installation and configuration of SRM
• 3 days on-site, 5 participants max
Custom BCDR Plan and Design Service
• Comprehensive architectural design for BCDR, covering data protection, local
availability, and disaster recovery.
• Address customer-specific requirements
• Flexible engagement model and duration
Traditional business continuity solutions are complex and expensive.
Business continuity can be implemented at the application level with application-specific clustering solutions such as Oracle RAC, MS clustering for SQL, Exchange Database Availability Groups (DAG), etc. App-level clustering usually provides best-in-class availability, but at very high cost and complexity. There are also solutions that can be implemented at the infrastructure layer, typically backup and data replication. These tend to be more cost-efficient than app-level solutions, but can’t quite compare in terms of uptime and RTOs.
VMware provides a suite of Business Continuity solutions to offer holistic BCDR protection to all applications running on the vSphere platform. These solutions provide simple, cost-effective protection with a common solution for all your applications. The VMware BCDR solutions include:
• Local availability products to protect applications against downtime of individual hosts. This includes vSphere HA and FT for unplanned downtime, as well as vMotion and Storage vMotion for planned downtime.
• Data protection solutions to back up entire VMs, including OS, application binaries, and application data, in a simple, non-disruptive manner. This includes the vSphere Data Recovery product, designed for smaller deployments, and the Storage APIs for Data Protection that enable third party backup vendors to integrate directly with vCenter and vSphere.
• vCenter Site Recovery Manager and the new vSphere Replication product to protect applications against site failures.
The new Cloud Infrastructure Launch improves many of our BCDR capabilities, including vSphere HA, vSphere Data Recovery, and vCenter Site Recovery Manager, and introduces the new vSphere Replication product.
VMware vSphere™ is the leading virtualization platform that forms the foundation for building cloud infrastructures. It is specifically designed to holistically manage large collections of infrastructure (CPUs, storage, networking) as seamless, flexible and dynamic resource pools that can be allocated to the right users and applications on demand. vSphere comprises infrastructure services that transform server, storage and network hardware into a shared resource, and application services that are built in and available to all applications that run on it.
VMware business continuity products provide the best of both worlds: similar RTOs and RPOs to app-level availability, but with the simplicity and cost-efficiency of shared availability services.
VMware business continuity solutions provide protection at similar cost to traditional backup and replication, but with much better RTOs. For example, recovering an application from a traditional backup can take many hours or even days. Recovering an application from a VM backup only takes a few minutes, since the whole system, including OS, application binaries and data, is encapsulated as a single set of files. Similarly, doing a DR failover using traditional replication can take days, while recovering using SRM can take less than one hour.
VMware business continuity provides similar RTOs to app-level availability, but at much lower cost and complexity. For example, recovering from a failed host with VMware HA takes a few minutes, similar to recovering from a host failure using MSCS. But VMware HA does not require dedicated standby systems, expensive software licenses, or complicated setup.
We all understand that virtualization helps consolidate and reduce costs, and that is typically the first thing that comes to mind when thinking about virtualization. But it’s also interesting to note that for our existing customers, the #1 objective for virtualization is consistently to improve BCDR.
VMware vSphere is inherently a better platform for BCDR that offers many new BCDR options. And customers that know VMware well make it a priority to leverage these inherent capabilities to improve their BCDR configurations.
Let’s do a quick overview of SRM for disaster recovery.
Key Points:
Disaster recovery in the traditional physical x86 world is hard for the following reasons:
• Easiest and most reliable recovery requires complex server and recovery site infrastructure
• The recovery process is also typically fairly complex due to hardware dependencies, application dependencies, etc.
• Due to the complexity of the process, having accurate and up-to-date documentation (a disaster recovery “runbook”) is essential, as is ensuring that everyone is thoroughly trained on recovery procedures and executes them correctly when needed
• As a result, recovery requirements often fail to be met: recovery takes too long, is often unsuccessful, and requires significant investment of IT $$ and staff
Script:
Traditional disaster recovery plans depend on a very complex set of processes and infrastructure: duplicate datacenters, duplicate server infrastructure, processes for getting data to a recovery site, processes for restarting servers, processes for reinstalling operating systems, and so on. Because disaster recovery can be so complex, organizations often find themselves unable to provide good protection to more than a privileged few of their production workloads, leaving other workloads (e.g. file/print servers, internal web servers, departmental applications) unprotected or poorly protected.
Because of the complexity of disaster recovery plans and infrastructure, organizations are heavily dependent on significant amounts of personnel training, on the accuracy and completeness of thick paper “runbooks” that document the recovery process, and on perfect execution of the recovery process when an outage does occur.
Because testing is disruptive and expensive, organizations have a limited ability to ensure that all of their training, documentation, and execution is practiced and can successfully recover their IT services.
As a result of these challenges, tests of recovery plans often fail; basic recovery of critical workloads, if successful at all, often takes days or weeks; and a significant amount of IT time and resources is consumed by managing and maintaining recovery plans. In short, most firms fail to meet the continuity requirements set by their organizations.
I want to review 3 key features of virtualization that lead to better disaster recovery.
Consolidation. Server consolidation means doing more with less. You’re reducing your physical footprint, which also means you can streamline your disaster recovery plans and standardize your recovery process.
Hardware Independence. This means being able to recover onto any x86 hardware. So, you have the flexibility to buy different servers for your recovery site, or even fewer servers. Continuing to virtualize your production site will free up additional machines that can then be moved over to your recovery site.
Encapsulation. Because virtualization captures everything about a server into just a few files on disk, the real benefit here is mobility: the ability to move your VMs wherever you want. You can back them up in the same way you currently protect your other files. You can also replicate them to your disaster recovery site, so that they will be available when you need to recover from an outage.
SRM provides very broad application coverage.
The RPO is set by the replication product in use. Since SRM supports a very broad range of replication products, the RPO can be set anywhere from days to synchronous replication, depending on business needs, replication capabilities, and network bandwidth.
The RTO will depend on the actual configuration, but is typically in the range of 30 minutes to a few hours. Factors that will influence this include the number of VMs that need to be recovered, the number of hosts available to recover those VMs, and the replication technology.
SRM can be used to protect the vast majority of applications. The only applications that cannot be protected with SRM are those that need better RTOs than 30 minutes to a few hours. Applications that need continuous or near-continuous RTOs will need to be protected using geo-clustering.
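As a rough illustration of the coverage rule described above, the sketch below sorts applications by their required RTO. The 30-minute threshold comes from these notes; the `App` class, the function name and the sample applications are hypothetical and are not part of any SRM interface.

```python
from dataclasses import dataclass

# Threshold taken from the notes: SRM RTOs typically start at about 30 minutes.
SRM_MIN_RTO_MINUTES = 30


@dataclass
class App:
    name: str
    required_rto_minutes: float  # business requirement (hypothetical input)


def protection_approach(app: App) -> str:
    """Return a coarse recommendation based only on the required RTO."""
    if app.required_rto_minutes < SRM_MIN_RTO_MINUTES:
        # Continuous or near-continuous RTOs need app-level geo-clustering.
        return "app-level geo-clustering"
    return "Site Recovery Manager"


if __name__ == "__main__":
    # Sample applications are invented for illustration only.
    for app in (App("trading", 0), App("erp", 120), App("intranet", 24 * 60)):
        print(app.name, "->", protection_approach(app))
```

A real tiering exercise would also weigh RPO, cost and application dependencies; this sketch looks at RTO alone.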
Key Points:
• Site Recovery Manager can be used in a number of different failover scenarios
• In particular, SRM helps you to make better use of your investment in a second site: you can use that second site for other workloads when you aren’t in a disaster recovery scenario rather than just having it sit idle
Script:
Site Recovery Manager can be used in a number of different failover scenarios:
Active-Passive: Site Recovery Manager absolutely supports the traditional active-passive DR scenario, where a production site running applications is recovered at a second site that is idle until failover is required. Although the most common configuration, this scenario also means that you are paying a lot of money for a DR site that is idle most of the time.
Active-Active: To make better use of the recovery site, Site Recovery Manager also enables you to leverage your recovery site for other workloads when you aren’t using it for DR. Site Recovery Manager can be configured to automatically shut down or suspend VMs at the recovery site as part of the failover process so that you can easily free up compute capacity for the workloads being recovered.
Bidirectional: Site Recovery Manager can also provide bidirectional failover protection so that you can run active production workloads at both sites and fail over to the other site in either direction. The spare capacity at the other site will be used to run the VMs that are failed over.
Local Failover: Although less common, some of our customers need to be able to fail over within a given “site” or campus, for example when a storage array failure occurs or when building maintenance forces you to move workloads to a different campus building. These customers are leveraging Site Recovery Manager to perform these failovers.
Today, DR coverage is often limited to Tier 1 apps in larger datacenters. In many cases, Tier 2 / 3 apps and smaller sites do not have true DR protection, but are only protected with backups. This is because traditional DR protection is cost prohibitive and too complex to be broadly applied to all applications and sites.
Unfortunately, this level of DR protection leaves a substantial business risk as day-to-day activities still rely extensively on the Tier 2 / 3 apps and smaller sites. Ideally, organizations should have a simple, cost-effective, reliable DR plan in place for all their applications and sites.
vSphere Replication has been designed specifically to provide very cost-efficient, simple, yet powerful replication for SRM deployments.
It is more cost-efficient because it reduces both storage costs and replication costs. At the storage layer, vSphere Replication eliminates the need to have higher end storage arrays at both sites. Customers can use lower end, different storage across sites, including Direct Attached Storage. For example, one popular option is to have Tier 1 storage at the production site, but lower end storage at the failover site, such as older arrays or less expensive arrays. vSphere Replication is also bundled with SRM at no additional cost, eliminating the additional cost of storage-based replication licenses.
vSphere Replication is also inherently simpler than storage-based replication. Replication is managed directly from vCenter, eliminating dependencies on storage teams. And it is managed at the level of individual VMs, making the setup of SRM much simpler.
Despite its simplicity and cost-efficiency, vSphere Replication is still a robust, powerful replication solution. It can provide 15 minute RPOs, and gives the flexibility to set RPOs between 15 minutes and 24 hours. It tracks changed disk areas and only replicates the latest deltas to increase network efficiency, and will scale upwards of 500 virtual machines. It does however have a few limitations: no support for automated failback (in this initial release), file-level consistency only, and no support for FT, linked clones, templates, and physical-mode RDMs.
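To make the "replicate only changed disk areas" idea concrete, here is a minimal sketch, assuming a disk can be modelled as a list of fixed-size blocks. It is a simplification for illustration, not how vSphere Replication is implemented: the function names are hypothetical, and a real product would track writes as they happen rather than compare whole disk images. The RPO bounds (15 minutes to 24 hours) are the ones quoted in these notes.

```python
from typing import Dict, List

# RPO bounds quoted in the notes above.
MIN_RPO_MINUTES = 15
MAX_RPO_MINUTES = 24 * 60


def validate_rpo(minutes: int) -> int:
    """Reject RPO settings outside the 15-minute to 24-hour range."""
    if not MIN_RPO_MINUTES <= minutes <= MAX_RPO_MINUTES:
        raise ValueError(
            f"RPO must be between {MIN_RPO_MINUTES} and {MAX_RPO_MINUTES} minutes"
        )
    return minutes


def changed_blocks(previous: List[bytes], current: List[bytes]) -> Dict[int, bytes]:
    """Return only the blocks that differ since the last replication cycle."""
    if len(previous) != len(current):
        raise ValueError("disk images must have the same number of blocks")
    return {
        index: new
        for index, (old, new) in enumerate(zip(previous, current))
        if old != new
    }


def apply_delta(replica: List[bytes], delta: Dict[int, bytes]) -> List[bytes]:
    """Apply the changed blocks to a copy of the replica at the failover site."""
    updated = list(replica)
    for index, block in delta.items():
        updated[index] = block
    return updated


if __name__ == "__main__":
    source_before = [b"boot", b"data-v1", b"logs"]
    source_after = [b"boot", b"data-v2", b"logs"]
    delta = changed_blocks(source_before, source_after)
    replica = apply_delta(source_before, delta)
    print(len(delta), "block(s) sent;", "replica in sync:", replica == source_after)
```

Only the blocks in `delta` cross the network, which is the point of the "efficient network utilization" claim on the slide.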
vSphere Replication is a great way to expand DR protection to Tier 2 / 3 apps and smaller sites.
Storage-based DR protection is fairly expensive, with the cost mostly driven by storage capacity on Tier 1 storage arrays and additional replication licenses. The typical storage, replication and SRM cost is in the range of $2,000 per VM. While this is much better than physical DR, it is still fairly high and can be cost-prohibitive for less business-critical environments.
vSphere Replication is much more cost-effective. By supporting the use of lower-end storage arrays, eliminating the need for dedicated replication licenses, and offering lower-cost SRM Standard licenses, the cost per VM is reduced by 3X to approximately $600 / VM.
The much lower cost per VM enables organizations to expand their DR protection to many more applications and sites.
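The per-VM figures above can be turned into a quick back-of-the-envelope comparison. This sketch uses only the two approximate costs quoted in the notes ($2,000 and $600 per protected VM); the function names and the 100-VM example are assumptions made for illustration.

```python
# Approximate per-VM costs quoted in the notes above (storage + replication + SRM).
STORAGE_BASED_COST_PER_VM = 2000
VSPHERE_REPLICATION_COST_PER_VM = 600


def protection_cost(vm_count: int, cost_per_vm: int) -> int:
    """Total DR protection cost for a number of protected VMs."""
    if vm_count < 0:
        raise ValueError("vm_count must not be negative")
    return vm_count * cost_per_vm


def reduction_factor() -> float:
    """Ratio between the storage-based and vSphere Replication cost per VM."""
    return STORAGE_BASED_COST_PER_VM / VSPHERE_REPLICATION_COST_PER_VM


if __name__ == "__main__":
    vms = 100  # hypothetical number of Tier 2 / 3 VMs to protect
    print("storage-based:", protection_cost(vms, STORAGE_BASED_COST_PER_VM))
    print("vSphere Replication:", protection_cost(vms, VSPHERE_REPLICATION_COST_PER_VM))
    print("reduction factor: %.1fx" % reduction_factor())
```

With these inputs the ratio works out to a little over 3, consistent with the "3X" figure in the notes.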
Setting up an SRM recovery plan is done in 5 simple steps.
1. The user maps resources at the production site to resources at the failover site. This includes resource pools, vSwitches, and VM folders. The VMs will automatically be mapped to the appropriate resources upon failover.
2. The user selects which virtual machines and virtual machine protection groups to include in the plan. This enables users to set up plans for failing over only part of the protected VMs.
3. The user selects which low-priority VMs to suspend at the recovery site, for example test and dev VMs that will be suspended to make room for the recovered VMs.
4. The user specifies the boot sequence of recovered VMs: for example, first the database, then the application tier, and finally the web tier.
5. Finally, the user can customize re-IPing of VMs to comply with the network at the failover site.
Optionally, users can add messages, pauses for execution of manual steps, and custom scripts, for example to update DNS servers with new IP addresses.
SRM eliminates many of the traditional recovery steps by automatically coordinating recovery across infrastructure layers. For example:
• Individual hosts no longer need to be re-configured at the recovery site.
• Users no longer need to worry about recovering entire systems including the OS, application binaries, and application data, since the entire system is recovered with the VM.
• Coordination of replication and storage is done automatically. Users no longer need to worry about stopping replication, presenting the LUNs to vSphere, and registering the VMs into vCenter.
• There is no need to reconfigure physical switching infrastructure.
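The five steps above can be pictured as a simple data structure. The sketch below is a hypothetical model written for illustration, not the SRM API or its storage format: the `RecoveryPlan` class, its field names and the sample VM names are all assumptions. It shows how the inputs from steps 1 to 5 fit together and how the boot sequence from step 4 could be derived.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoveryPlan:
    # Step 1: production-site resource -> recovery-site resource
    resource_mappings: Dict[str, str] = field(default_factory=dict)
    # Step 2: protection groups included in this plan
    protection_groups: List[str] = field(default_factory=list)
    # Step 3: low-priority VMs to suspend at the recovery site
    suspend_at_recovery_site: List[str] = field(default_factory=list)
    # Step 4: VM name -> boot priority (lower numbers start first)
    boot_priority: Dict[str, int] = field(default_factory=dict)
    # Step 5: VM name -> IP address to use at the recovery site
    recovery_ips: Dict[str, str] = field(default_factory=dict)

    def boot_sequence(self) -> List[str]:
        """Return VMs in start order: by priority, then by name for stable output."""
        return sorted(self.boot_priority, key=lambda vm: (self.boot_priority[vm], vm))


if __name__ == "__main__":
    # All names and addresses below are invented for illustration.
    plan = RecoveryPlan(
        resource_mappings={"prod-pool": "dr-pool", "prod-vswitch": "dr-vswitch"},
        protection_groups=["sharepoint"],
        suspend_at_recovery_site=["test-vm-01", "dev-vm-02"],
        boot_priority={"web01": 3, "db01": 1, "app01": 2},
        recovery_ips={"db01": "10.20.0.11", "app01": "10.20.0.12", "web01": "10.20.0.13"},
    )
    print(plan.boot_sequence())
```

The ordering mirrors the example in the notes: database first, then the application tier, and finally the web tier.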
Customers often wonder whether SRM enables application-consistent recoveries. In other words, whether the application can be recovered in a clean state, or whether the application is recovered in a crash-consistent state.
The short answer is that it is possible to ensure application-consistent recoveries. The application consistency is enabled by the underlying replication technology. Many replication vendors have management products for this purpose, such as EMC’s Replication Manager or NetApp’s SnapManager. Typically, these products place an agent in the VM that will quiesce the VM prior to executing the replication. The quiesced VM at the recovery site is then used for recovery purposes.
In this initial release, vSphere Replication will only provide file-level consistency, and not application consistency.
Even when application consistency is not ensured by the replication product, the ‘planned migration’ workflow, new with SRM 5, ensures application-consistent migrations for planned migrations. This is the case for both storage-based and vSphere Replication.
SRM 5 now comes in two editions: Standard and Enterprise. Standard is designed for smaller environments, and can be used to protect up to 75 VMs (per site and per SRM instance). Enterprise is designed for larger environments with more than 75 VMs to protect. Both editions are full-featured and include vSphere Replication, automated failback, and planned migration.
The tiered SRM editions provide a particularly attractive price-point for SMBs and Remote Offices, at $195 per VM. This is a steep reduction from the standard SRM 4 list price of $450 / VM. This more attractive price-point will enable smaller organizations and sites to leverage vSphere Replication and significantly reduce the overall DR costs.
SRM Enterprise is priced at $495 / VM, a very small increment over SRM 4, and it now includes vSphere Replication and the other new SRM 5 capabilities.
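A small sketch of the edition arithmetic described above. The prices ($195 and $495 per protected VM, license only) and the 75-VM Standard limit come from the slide; the rule of picking Standard whenever the VM count fits is an assumption made for illustration, and the function name is hypothetical.

```python
from typing import Tuple

# List prices per protected VM (license only) and the Standard limit from the slide.
STANDARD_PRICE_PER_VM = 195
ENTERPRISE_PRICE_PER_VM = 495
STANDARD_MAX_VMS = 75  # per site and per SRM instance


def license_cost(protected_vms: int) -> Tuple[str, int]:
    """Return (edition, total license cost) for one site and one SRM instance.

    Assumption: Standard is chosen whenever the VM count is within its limit.
    """
    if protected_vms < 0:
        raise ValueError("protected_vms must not be negative")
    if protected_vms <= STANDARD_MAX_VMS:
        return ("Standard", protected_vms * STANDARD_PRICE_PER_VM)
    return ("Enterprise", protected_vms * ENTERPRISE_PRICE_PER_VM)


if __name__ == "__main__":
    for count in (20, 75, 76, 300):
        edition, cost = license_cost(count)
        print(f"{count} VMs -> {edition}: ${cost:,}")
```

The jump in total cost just past 75 VMs is worth keeping in mind when sizing a site that is close to the Standard limit.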
VMware provides a comprehensive set of BCDR services.
The VMware SRM Jumpstart helps organizations get started with SRM. It’s a short 3-day service offering, designed to do an initial PoC / on-site installation of SRM, integrate SRM with the storage arrays, and set up a simple recovery plan.
VMware also offers a custom BCDR Plan and Design service, intended to provide a holistic BCDR service covering local availability, data protection, and disaster recovery.