2. Cross-site Availability – Typical choices today
[Diagram: two sites treated as one = stretched cluster (Site A active, Site B active, stretched storage, one vCenter); two sites treated as two = disaster recovery (Site A active, Site B passive, replicated storage, vCenter and SRM at each site)]
3. Active-Active datacenters – use cases
Planned Maintenance
• Planned maintenance of one site without any service downtime
• Transparent to app owners and end users
• Avoid lengthy approval processes
• Ability to migrate applications back after maintenance is complete
Automated Recovery
• Automated initiation of VM restart or recovery
• Very low RTO for the majority of unplanned failures
• Allows users to focus on app health after recovery, not how to recover VMs
Disaster Avoidance
• Prevent service outages before an impending disaster (e.g. hurricane, rising flood levels)
• Avoid downtime, not recover from it
• Zero data loss possible if you have the time
4. What you need for an Active-Active datacenter model
Stretched Storage solution
– Storage clustering solution that supports distributed data mirroring
– Read/write access to the same volumes from both sites
– Some tie-break mechanism to avoid split-brain
– Examples: EMC VPLEX, IBM SVC, NetApp MetroCluster, etc.
– Stretched Network
[Diagram: storage controllers at each site in front of Backend Array (Site 1) and Backend Array (Site 2)]
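The tie-break requirement above can be illustrated with a small sketch. It is a simplified model, not how any of the listed products arbitrate: the `SiteView` type, the witness concept as modelled here, and the `preferred` parameter are all assumptions made for illustration. The point is only that after a partition at most one site keeps write access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteView:
    """Reachability as seen from one site (hypothetical model)."""
    name: str
    sees_peer: bool      # inter-site mirror link is up
    sees_witness: bool   # independent tie-breaker is reachable


def sites_allowed_to_write(a: SiteView, b: SiteView, preferred: str) -> set:
    """Return the names of the sites that may keep serving writes."""
    if a.sees_peer and b.sees_peer:
        # Mirror is healthy, so both sites can safely serve the same volume.
        return {a.name, b.name}
    # Partition or site loss: only sites that reach the witness are candidates.
    candidates = [s.name for s in (a, b) if s.sees_witness]
    if not candidates:
        # Nobody can prove the other side is down, so suspend writes everywhere.
        return set()
    # The witness grants the volume to exactly one site, favouring the preferred one.
    winner = preferred if preferred in candidates else candidates[0]
    return {winner}
```

With both links healthy the function returns both sites; once the sites lose sight of each other it never returns more than one, which is the property that prevents split-brain.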
5. Active-active datacenter network model
[Diagram: multi-site, single vCenter with stretched clusters; stretched storage; Web, App, and DB tiers; north-south (N-S) connectivity at each site]
6. Active-Passive datacenters – use cases
Unplanned Failover
• Recover from unexpected site failure (full or partial)
• Most common use case
• Fast and accurate recovery usually critical to customers
• Workflow driven
• High degree of confidence if regular test failovers have been performed
Planned Migration
• Planned datacenter maintenance
• Global load balancing or distribution of service
• Using test feature to minimize risk
• Execute partial failovers
• Automated failback enables bi-directional migrations
Preventative Failover
• Anticipate potential datacenter outages
• Initiate preventative failover for smooth migration of services
• Graceful shutdown of services to be migrated, zero data loss
7. What you need for an Active-passive datacenter model
Replicated Storage solution
– Storage or software based replication configured between sites
– vCenter per site
– SRM server per site
– Network can be stretched or not
– The concept is referred to as active-passive; in reality each site is active and simply acts as the passive DR location for its counterpart
[Diagram: Site A (Active) and Site B (Passive) with replicated storage; vCenter and SRM at each site]
8. NSX 6.2 Integration with SRM 6.1
[Diagram: implicit mapping between SRM A and SRM B; a distributed switch at each site; NSX Universal Logical Switch]
9. Cake and eat it
• Most common requests
– Use both stretched and non-stretched storage in same design
– Leverage operational benefits of SRM for stretched storage
– Use SRM to drive large scale migrations where needed on stretched solutions
• Can this be done?
– Prior to vSphere 6.0 the answer was NO, it's one or the other
– Reaction from customers was usually this…
10. What has changed? – vMotion anywhere!
• Support introduced in vSphere 6.0
• Requires vCenter & ESXi 6.0 or later
• Simultaneously changes
– Compute
– Storage
– Network
– vCenter
• vMotion without shared storage
• Increased scale
– Pool resources across vCenter servers
[Diagram: two vCenter / VMware vSphere environments connected by stretched networks]
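The version requirement on this slide can be expressed as a simple pre-check. This is a minimal sketch that covers only the "6.0 or later" rule stated above; the function names are hypothetical, the version strings are assumed to be dotted numbers, and any other prerequisites a real environment may have are out of scope.

```python
def parse_version(text: str) -> tuple:
    """Turn '6', '6.0' or '6.5.1' into a comparable tuple of ints."""
    parts = tuple(int(part) for part in text.split("."))
    # Pad to at least major.minor so that '6' compares equal to '6.0'.
    return parts + (0,) * (2 - len(parts))


MIN_VERSION = (6, 0)  # from the slide: vCenter and ESXi 6.0 or later


def meets_version_requirement(vcenter: str, esxi_hosts: list) -> bool:
    """True if the vCenter and every listed ESXi host are at 6.0 or later."""
    versions = [vcenter, *esxi_hosts]
    return all(parse_version(v) >= MIN_VERSION for v in versions)
```

For example, `meets_version_requirement("6.0", ["6.0", "5.5"])` is false because one host is older than 6.0.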
11. This cross vCenter vMotion layout seems familiar…
[Diagram: Cross vCenter vMotion layout (Site A active, Site B active, stretched storage) next to SRM layout (Site A active, Site B passive, replicated storage); vCenter and SRM at each site]
12. So what can be done now?
[Diagram: vCenter and SRM at each site, with replicated storage, stretched storage, and vSphere Replication between the sites]
13. Why customers ask for SRM integration with stretched clusters
• vCenter Availability
– Failure of the site where vCenter is running disrupts management of both sites
• Operational Watchdogs
– Availability specific alarms, alerts and events
– Configuration validation on the fly
• DRS and HA are not site aware
– VMs are recovered and migrated to any site, which may not be what you want!
– Could result in additional East-West traffic when your network is not designed to handle it
• No Orchestration or Testability
– Stretched Clusters lack a repeatable, testable procedure to handle unplanned failures
– HA will restart VMs based on VM restart order, but doesn't give you granular control of VM dependencies or customization
15. Active-active datacenters with SRM 6.1
[Diagram: vCenter, SRM, and VMware vSphere at each site; Volume A at Site 1 and Volume A at Site 2, both with full R/W access; stretched networks]
16. Scenario 1: local host failures in one site
[Diagram: two-site stretched layout with vCenter and SRM at each site; Volume A with full R/W access at both sites; stretched networks]
HA handles local failures
17. Scenario 2: disaster avoidance at one site
[Diagram: two-site stretched layout with vCenter and SRM at each site; Volume A with full R/W access at both sites; stretched networks]
SRM invokes vMotion as per VM priority and dependencies
18. Scenario 3: faster recovery from unplanned failures
[Diagram: two-site stretched layout with vCenter and SRM at each site; Volume A with full R/W access at both sites; stretched networks]
SRM orchestrates entire site failover including dependencies
20. SRM with stretched storage: Initial setup
• New SRA interface
– Contact your array vendor for the SRA availability and supported array models
– Most vendors will provide a single SRA to manage both stretched and non-stretched volumes
– Existing SRAs (with no stretched storage support) will continue to work as is with new SRM
• Configure stretched storage volumes using the array UI/tools
• SRM will discover stretched arrays/volumes through the SRA
– Use the SRM UI to verify stretched volumes and how they map to datastores
– Use the SRM UI to verify the site preference for stretched volumes
[Diagram: SRM (Array Manager, Replication Manager) connects through the SRA to the vendor management interface at each site; the arrays hold stretched and non-stretched storage]
22. Introducing - Storage Policy Protection Groups (SPPG)
• New style protection group leveraging storage profiles (SRM 6.1)
• Level of indirection and automation compared to traditional protection groups
• Policy based approach reduces OpEx by handling VM protection lifecycle automatically
• Simpler integration of VM provisioning, migration, and decommissioning with other solutions such as vRealize Automation
[Diagram: Profile Driven Protection Group built on a Storage Policy]
23. SRM with stretched storage: configuring protection
• Stretched storage supported ONLY with Storage-Profile based Protection Groups (SPPG)
• Configure storage profiles for stretched volumes at each site
• Configure protection groups for the storage profiles
• SRM will automatically protect all VMs assigned to the storage profiles in the group
• Can mix stretched and non-stretched storage in the same protection group
• Protection groups with stretched devices MUST have a preferred direction
• Preferred direction MUST match any site preference defined at the storage layer
• Cannot create two groups in opposing directions using same devices
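The last three rules above lend themselves to a configuration check. The sketch below is a hypothetical model, not SRM's own validation: `Device`, `ProtectionGroup`, and `validate` are illustrative names, and it assumes that "preferred direction" means the group's protected site must be the site that holds the storage-layer preference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Device:
    name: str
    stretched: bool
    site_preference: Optional[str] = None  # set at the storage layer


@dataclass
class ProtectionGroup:
    name: str
    protected_site: Optional[str]  # the group's direction runs from this site
    devices: List[Device]


def validate(groups: List[ProtectionGroup]) -> List[str]:
    """Return human-readable rule violations; an empty list means OK."""
    errors = []
    first_use = {}  # device name -> (group name, protected site) of first use
    for group in groups:
        for dev in group.devices:
            if dev.stretched and group.protected_site is None:
                errors.append(f"{group.name}: {dev.name} needs a preferred direction")
            elif dev.stretched and dev.site_preference not in (None, group.protected_site):
                errors.append(f"{group.name}: direction conflicts with site preference of {dev.name}")
            seen = first_use.setdefault(dev.name, (group.name, group.protected_site))
            if seen[1] != group.protected_site:
                errors.append(f"{group.name}: {dev.name} already used by {seen[0]} in the opposite direction")
    return errors
```

A group that mixes a stretched volume preferring site A with a non-stretched LUN, protected from site A, passes with no errors.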
24. SRM with stretched storage: configuring recovery
• Create one or more recovery plans for all or some protection groups
– Same protection group can belong to multiple recovery plans
– Can mix stretched and non-stretched storage (only if they are SPPG group types)
• Configure recovery settings for each VM
– Stretched VMs** can depend on non-stretched VMs and vice versa
– Can configure IP customization for stretched VMs that do not have stretched networks (for DR/Test)
– Can assign scripts to stretched VMs
– Can opt out of vMotion for some VMs even if they reside on stretched storage
** Stretched VMs refers to VMs on stretched storage datastores
25. SRM with stretched storage: test failover
• Use test failover to make sure an unplanned failover would succeed
• Stretched VMs are included in the test failover
• Stretched VMs are powered on in an isolated network
• SRM will perform vMotion host compatibility tests as part of the test failover
• SRM will not perform vMotion as part of the test failover
• Complicated environmental issues (e.g. network latency) may remain undetected
• Test failover for stretched storage requires array support
– Not all arrays support snapshotting of stretched devices
– Contact your array vendor for specific compatibility requirements
[Diagram: test failover between two sites, each with vCenter and SRM]
26. SRM with stretched storage: planned migration
• Use planned failover to avoid an expected outage
• Choose whether to use vMotion when initiating a planned migration
– Enabled: SRM uses vMotion for VMs on stretched storage
– Disabled: SRM powers VMs off and on as normal
• SRM will reassign site preference to recovery site for all stretched volumes (if array supports it)
• SRM will perform a storage sync to make sure no blocks are left at the protected site
• Planned failover with a mix of stretched and non-stretched VMs is OK
[Diagram: Site A (Active) and Site B (Passive) with stretched storage; vCenter and SRM at each site]
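The vMotion choice during a planned migration, combined with the per-VM opt-out mentioned under recovery settings, comes down to a small decision per VM. The following sketch only restates that decision; the `Action` enum and the parameter names are hypothetical and do not correspond to SRM settings.

```python
from enum import Enum


class Action(Enum):
    VMOTION = "live-migrate with vMotion"
    POWER_CYCLE = "power off at protected site, power on at recovery site"


def migration_action(on_stretched_storage: bool,
                     plan_uses_vmotion: bool,
                     vm_opted_out: bool = False) -> Action:
    """Pick how one VM moves during a planned migration."""
    # vMotion applies only when the plan enables it, the VM sits on
    # stretched storage, and the VM has not been opted out individually.
    if on_stretched_storage and plan_uses_vmotion and not vm_opted_out:
        return Action.VMOTION
    # Everything else follows the normal power off / power on path.
    return Action.POWER_CYCLE
```

In a mixed plan, stretched VMs can therefore move live while non-stretched VMs in the same plan are power-cycled.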
27. SRM with stretched storage: unplanned failover
• Use unplanned failover to recover from a disaster
– Initiated when the protected site is no longer functional
• Stretched VMs are powered on at the recovery site
– Stretched and non-stretched VMs are recovered together
– Priority tiers and VM dependencies are honored across all VMs
– SRM will coordinate with the array to guard against any VMs still running at the protected site
• Stretched volumes are recovered faster – shorter RTO
– Stretched volumes are already visible at the recovery site
– No need to wait for costly surfacing, mounting and host rescan operations
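One way to picture "priority tiers and VM dependencies are honored across all VMs" is as a tier-by-tier topological sort. The sketch below is an illustration under stated assumptions, not SRM's algorithm: the input shape and the `recovery_order` function are hypothetical, lower tier numbers are assumed to start first, and only dependencies within the same tier are considered.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def recovery_order(vms: dict) -> list:
    """vms maps VM name -> (priority tier, set of VM names it depends on)."""
    order = []
    for tier in sorted({tier for tier, _ in vms.values()}):
        members = {name for name, (t, _) in vms.items() if t == tier}
        # Only same-tier dependencies are used here; ordering across tiers
        # comes from the tiers themselves (an assumption of this sketch).
        graph = {name: vms[name][1] & members for name in sorted(members)}
        # static_order() yields each VM after everything it depends on.
        order.extend(TopologicalSorter(graph).static_order())
    return order


if __name__ == "__main__":
    plan = {
        "dns": (1, set()),
        "db": (2, set()),
        "app": (2, {"db"}),
        "web": (2, {"app"}),
    }
    print(recovery_order(plan))
```

Whether a VM sits on stretched or non-stretched storage does not enter the ordering, which matches the statement that both kinds are recovered together. A dependency cycle makes `TopologicalSorter` raise `graphlib.CycleError`.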
28. SRM with stretched storage: reprotect and failback
• Use reprotect to reverse the roles of sites after a successful planned failover
– Repairs replication for stretched devices
– Ensures the new protected site (former recovery site) has the site preference for stretched devices
• Reprotect after an unplanned failover
– Rerun planned failover once the protected site becomes available
• Failback
– Initiate a planned failover after the reprotect to migrate all VMs back to the former protected site
– It is recommended to perform a test failover first to make sure everything is ready
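The failover, reprotect, failback sequence above can be read as a small state machine over site roles. This is a minimal sketch with hypothetical names (`SitePair`, `planned_failover`, `reprotect`); it tracks only which site is protected and where the VMs run, and ignores replication repair and site preference handling.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SitePair:
    protected: str        # site currently in the protected role
    recovery: str         # site currently in the recovery role
    vms_running_at: str   # where the workloads are running right now


def planned_failover(pair: SitePair) -> SitePair:
    """Move the VMs from the protected site to the recovery site."""
    if pair.vms_running_at != pair.protected:
        raise ValueError("reprotect first: VMs are not at the protected site")
    return replace(pair, vms_running_at=pair.recovery)


def reprotect(pair: SitePair) -> SitePair:
    """Reverse the roles so the site now running the VMs becomes protected."""
    if pair.vms_running_at != pair.recovery:
        raise ValueError("nothing to reprotect: no failover has happened")
    return SitePair(protected=pair.recovery, recovery=pair.protected,
                    vms_running_at=pair.recovery)
```

Failback is then `planned_failover(reprotect(state))` applied to the post-failover state, which brings the VMs back to the original site; a further `reprotect` restores the original roles.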
29. Key Takeaways
• SRM is a great solution for Active-Active datacenters
• SRM enhances Continuous Availability with rich orchestration
• SRM with vMotion enables ZERO service downtime for disaster avoidance
• No longer trade off testability and repeatability when choosing the Active-Active model
• SRM + Live Migration is a game changer in IT operations