vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentation-3c3c
  • Traditional disaster recovery plans depend on a very complex set of processes and infrastructure: duplicate server infrastructure, identical storage infrastructure, processes for getting data to a recovery site, processes for restarting servers, processes for reinstalling operating systems and applications, and so on. Because of this complexity, organizations depend heavily on significant amounts of personnel training, on the accuracy and completeness of the documented recovery process, and on perfect execution of that process when an outage occurs. Testing can be disruptive and expensive, so organizations have only a limited ability to verify that their training, documentation, and execution will actually succeed. Traditional storage technologies require 2 to 3 times the production capacity: a complete copy of storage at the DR site and, in some cases, a third copy of the data for DR testing. WAN utilization levels can also be unacceptable. This is why tests of recovery plans often fail; basic recovery of critical workloads, if successful at all, often takes days or weeks and a significant amount of IT time and resources. Most firms fail to meet the continuity requirements set by their organizations and find themselves unable to protect more than a few of their production workloads, leaving other workloads (e.g., file/print servers, internal web servers, departmental applications) unprotected or poorly protected.
  • One of the most valuable features in SRM is its ability to enable non-disruptive DR testing. When a DR test is performed, SRM provisions a private network at the recovery site, connects the VMs to that private network, and powers them on. SRM can automate the shutdown of any VMs at the recovery site, such as dev/test VMs, to free compute and memory resources for the DR test and for real failover. SRM integrates with NetApp FlexClone to automatically provision FlexClone volumes for use in the DR test.
  • Configuring protection, performing a test failover, and failing over for an unplanned outage are workflows that have been in SRM since version 1. In addition, SRM version 5 adds several new workflows, described on the following slides.
  • The most important new feature in SRM 5 is automated failback. Prior to version 5, array-based replication had to be manually reversed and the SRM environment completely reconfigured in order to use SRM for failback to the original site. SRM 5 introduces a new workflow called reprotect. After performing a planned failover, you execute the reprotect workflow to prepare for failback. After an unplanned failover, the reprotect workflow can also be executed, provided the original storage was not permanently lost and has been recovered to an online state. When the reprotect workflow is executed, SRM uses the NetApp storage adapter to reverse the SnapMirror relationships and resynchronize the storage in the opposite direction. With SnapMirror, when the original storage survives, only the delta of new data written at the recovery site since failover must be replicated back to the original site. After the reprotect workflow completes, the roles of the two SRM sites are reversed: the original protected site becomes the recovery site and the original recovery site becomes the protected site. Performing an automated failback is then simply a matter of executing the planned failover workflow, which properly shuts down the VMs, synchronizes storage, and starts the VMs at the recovery site.
  • To enable smaller environments that might not have array-based replication software to use some of the SRM workflows, VMware has introduced a host-based replication capability in SRM 5 called vSphere Replication. This feature uses a replication appliance (a VM) running at each site that drives the replication process. Replication is managed through the vCenter SRM plug-in as a property of each VM, allowing per-VM granularity of replication. vSphere Replication supports the planned/unplanned failover and test failover workflows in SRM, but it does not support the reprotect workflow required for automated failback. While SRM 5 supports ESX hosts from 3.5Ux through version 5, vSphere Replication requires that hosts be running vSphere ESXi version 5. Array-based replication such as NetApp SnapMirror is a more efficient means of replicating multiple VMs in one replication job because it replicates entire datastores. Array-based replication and vSphere Replication can be used in the same environment.
  • SRM 5 introduces improvements to the recovery workflows that significantly reduce the time required to recover VMs. When a VM is recovered at the recovery site, the VM settings (configuration information stored in the .vmx file) must be reconfigured with the proper values for the recovery site, such as the unique identifier (UUID) of the datastore the VM is stored in and the networks the VM is to be connected to. Prior to SRM version 5, this reconfiguration was performed per VM, in serial fashion, during the prepare storage step of the SRM recovery plan. This meant that no VMs could begin to boot until all VMs had been reconfigured and the recovery plan moved on to the VM startup step. In SRM 5 the reconfiguration process has been moved to an independent step, where multiple VMs can be reconfigured concurrently and each VM can be powered on as soon as it has been reconfigured. SRM 5 also makes use of a new vSphere API that allows multiple VMs to be started with one API request. Some environments require that the VM guest OS be reconfigured with different network information such as IP address, subnet mask, or DNS server. To perform this reconfiguration in SRM 4, a VM customization specification could be created defining the network settings to be changed. Applying a customization specification involves booting the VM once to set the customization, then rebooting it to apply the customization and start the VM, causing a 2x increase in recovery time for each VM. SRM 5 now makes use of the VMware VIX API, which allows network reconfiguration to be performed without an additional reboot of the VM.
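The benefit of moving VM reconfiguration out of the serial prepare-storage step can be sketched with a toy timing model. This is purely illustrative, with made-up numbers, not SRM internals or measured figures:

```python
# Toy model (illustrative only): compare an SRM 4 style flow, where every
# VM is reconfigured serially before any VM boots, with an SRM 5 style
# flow, where reconfigurations run concurrently and each VM powers on as
# soon as its own reconfiguration finishes.

def srm4_style(reconfig_times, boot_times):
    # All reconfigurations complete first (serially); VMs then boot in
    # parallel, so total time is the reconfig sum plus the longest boot.
    return sum(reconfig_times) + max(boot_times)

def srm5_style(reconfig_times, boot_times):
    # Reconfigurations are concurrent and each VM boots immediately after
    # its own reconfiguration, so total time is the worst single VM path.
    return max(r + b for r, b in zip(reconfig_times, boot_times))

reconfig = [30, 30, 30, 30]   # seconds per VM (hypothetical)
boot = [120, 120, 120, 120]   # seconds per VM (hypothetical)
print(srm4_style(reconfig, boot))  # 240
print(srm5_style(reconfig, boot))  # 150
```

Even in this simplified model, removing the serialization shortens recovery from the sum of all reconfigurations to the single longest reconfigure-plus-boot path.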
  • The NetApp Disaster Recovery Adapter version 1.4.3 for SRM version 4, and the new NetApp FAS/V-Series Storage Replication Adapter version 2.0, which must be used with SRM version 5, are both multiprotocol adapters supporting the FC, iSCSI, and NFS VMware storage protocols in one adapter. The NetApp adapters create fully thin-provisioned FlexClone environments for SRM test failover, automatically turning off volume and LUN space guarantees in the created FlexClone volumes, so that a DR test does not require 2x capacity and can be performed without interrupting replication. The NetApp adapters also support configuration of MultiStore vFiler units as storage arrays in SRM. Support for NetApp Data ONTAP operating in Cluster-Mode is planned for a future release of the NetApp adapter.
  • Volume SnapMirror can also take advantage of SnapMirror native network compression to further reduce network utilization in low-bandwidth WAN environments.
  • NetApp FlexClone technology allows replicated data to be instantaneously made writable and presented to the ESX hosts for storage. This enables very quick and space efficient DR testing with VMware SRM. The SRM DR testing component leverages FlexClone functionality to create a copy of the DR data in a matter of seconds, requiring only a small percentage of additional capacity for writes that occur during testing. Because of the low capacity requirements and quick provisioning provided by FlexClone, DR test environments can be created frequently to allow more aggressive and regular DR testing schedules.
  • During recovery from a DR event, performance of the system is critical. NetApp Flash Cache and FAS deduplication provide significantly faster VM boot times, which can dramatically improve overall recovery times. Deploying NetApp Flash Cache reduces the number of expensive disks that must be purchased, eliminates the need to deploy expensive SSD drives, reduces overall disk I/O, and provides virtual storage tiering that requires no administrative overhead or time to configure. Shared data that is read most often by the ESX hosts is readily available in high-speed cache, requiring no disk overhead and improving overall recovery (RTO) times. The FAS6240 and FAS6280 now come standard with NetApp Flash Cache.
  • Upgrading from SRM version 4 to version 5 requires uninstalling the SRM 4 software, installing the SRM 5 software, and using a configuration import utility provided by VMware. In NetApp environments it is important to note that you must uninstall the SRM 4 storage adapter prior to uninstalling SRM version 4. If SRM version 4 is uninstalled before you attempt to uninstall the NetApp adapter, the adapter uninstall will fail and you will have to remove the storage adapter manually by deleting the software and editing the Windows registry.
  • There are several ways to configure storage layouts and replication in VMware environments using NetApp storage. Adhering to the best practices described in documents such as TR-3749, NetApp and VMware Storage Best Practices, will allow an environment to be supported by SRM. It is recommended to follow the configuration workflows described in the NetApp FAS/V-Series Storage Replication Adapter admin guide and release notes, and to make checking the environment configuration part of configuring the SRM environment. This helps ensure that the first test failover attempted is a successful one. In Microsoft Windows environments, Microsoft does not recommend replication of Active Directory servers, as this can lead to out-of-sync AD databases and an inability for an AD server to service login attempts. See http://support.microsoft.com/kb/875495 for information about AD issues. Instead of replicating AD servers, you should have AD servers permanently provisioned at your SRM recovery site. To provide name resolution and user authentication services in the DR test network, clone the AD server at the recovery site just prior to running the DR test. Once the cloning is done, and before powering on the VM, be sure to connect the cloned AD server to the DR test network. After the AD VM is powered on in the test network, the five Flexible Single Master Operations (FSMO) roles in the Active Directory forest must be seized per the procedure described in the following Microsoft KB: http://support.microsoft.com/kb/255504. The five roles are Schema master, Domain naming master, RID master, PDC emulator, and Infrastructure master. The cloned AD server will then operate privately within the DR test network and can provide AD services for VMs in test failover mode.
  • Today Site Recovery Manager provides no mechanism for the SRA to report which destination in a multiple-destination replication scenario is the one intended for DR failover. For this reason each source volume must be replicated to only one destination. If Site Recovery Manager is not properly discovering a replicated datastore, use the snapmirror status or snapmirror destinations command on the source system to determine whether there are other SnapMirror relationships for that same volume; there might be relationships left over from a data migration or a lab setup. In a cascaded SnapMirror relationship, only the first hop of the SnapMirror transfer (A to B in an A-to-B-to-C scenario) may be used with SRM.
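The check described above can be done from the Data ONTAP 7-Mode CLI on the source controller. The volume name below is hypothetical:

```shell
# On the source controller, list every SnapMirror destination for the
# volume backing the datastore; more than one entry means a leftover
# relationship that can confuse SRM discovery.
source_filer> snapmirror destinations src_vol

# Alternatively, review the status of all relationships on the system.
source_filer> snapmirror status
```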
  • To support MultiStore vFiler units as storage arrays in SRM, you must turn on the vfiler.vol_clone_zapi_allow option on the physical controller hosting the vFiler unit. This option allows FlexClone commands to be sent directly to the vFiler unit.
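On the 7-Mode CLI this is a single option setting on the physical controller (the prompt name is hypothetical):

```shell
# On the physical controller hosting the vFiler unit, allow FlexClone
# (vol clone) API calls to be sent directly to vFiler units.
physical_filer> options vfiler.vol_clone_zapi_allow on
```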
  • For LUNs to be recovered by SRM on a NetApp FAS/V-Series storage array, the LUNs must be in an igroup of type "vmware". Remember that RDM LUNs (LUNs that are connected to the ESX host and then provisioned to VMs) must also be in an igroup of type "vmware", but the LUN type should be whatever the guest OS requires. For Disaster Recovery Adapter version 1.4.3 and earlier, you must pre-create igroups and add initiators at the recovery site. Typically some storage will already be connected to the ESX hosts at the recovery site (such as the SRM datastore for temporary placeholder VMs), so an igroup will already exist there. If using MultiStore vFiler units as storage arrays, do not forget to create igroups and add initiators, as vFiler units might not already have LUNs connected and so may not have any igroups. A new feature introduced in NetApp FAS/V-Series SRA version 2.0 is the automatic creation of igroups. SRA 2.0 always automatically creates igroups for failover tests, and during real recovery, if no igroup exists that exactly matches the initiators contained in the SRM recovery request, the SRA creates a new igroup. Note that SRA 2.0 requires SRM version 5. You should never pre-add replicated LUNs to an igroup at the recovery site; this will generate an error during recovery. SRM must be allowed to add the LUNs to the igroup.
  • In order for Site Recovery Manager and the SRA to properly map NFS mounts to ESX hosts, the exports must be listed in the /etc/exports file on the source array. Exports created manually by using the exportfs command on the CLI will not be detected by Site Recovery Manager. You can use the exportfs command with the -p (permanent) option to add the export to the /etc/exports file automatically. Exports must also contain values in the rw field of the export security settings. If a share is exported using rw with no values (rw to everyone), Site Recovery Manager and the SRA will be unable to determine that the share is used specifically by ESX hosts in the Site Recovery Manager environment. SRM will not allow you to protect empty datastores. If you would like an empty datastore to be protected, create a dummy VM in that datastore (no OS is required in the VM); SRM will then detect the datastore and allow you to protect it.
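A persistent export that satisfies both requirements above might look like the following 7-Mode command. The host names and volume path are hypothetical; the key points are the -p flag (so the export is written to /etc/exports) and an rw list naming specific ESX hosts rather than rw to everyone:

```shell
# Create a persistent export, written to /etc/exports, with explicit
# rw and root access for the ESX hosts mounting the datastore.
filer> exportfs -p rw=esx01:esx02,root=esx01:esx02 /vol/vm_datastore
```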
  • Site Recovery Manager requires a 1-to-1-to-1 relationship between [VM] – [protection group] – [array manager].In this example these layouts are supported because each VM (VM5 and VM6) has data on only one array at either site.
  • Site Recovery Manager requires a 1-to-1-to-1 relationship between [VM] – [protection group] – [array manager]. In this layout VM 5 would not be supported for recovery by Site Recovery Manager because, at the recovery site, VM 5 has data on both array C and an RDM on array D. This would also be unsupported if VM 5 had data on both arrays at the protected site. The requirement is due to behaviors in both SRM and the NetApp adapter. NetApp controllers are administered separately via API, so each controller must be added to SRM as its own array manager. SRM makes a single array manager call to recover all the storage required by a set of VMs; it does not support making two array manager calls (one to each controller) to recover a single VM. A VM configured with storage on more than one array therefore cannot be recovered.
  • There are many ways to configure storage and replication in NetApp and VMware environments. The recommendation is to use volumes as NFS datastores, or to store LUNs in a volume, and to use volume-level SnapMirror. It is possible to configure multiple qtrees in one volume, each as an NFS datastore, or to provision multiple LUNs inside a volume, or multiple qtrees with a LUN in each qtree; however, these configurations have implications in SRM deployments, especially as they pertain to the new failback capabilities in SRM 5. If you are using volume-level SnapMirror, have provisioned multiple qtrees in a volume, and are exporting each qtree as a different NFS datastore, SRM will support this for failover, but there are implications. Because you are using volume-level SnapMirror, if you fail over any of the qtrees the SRA must perform a SnapMirror break for the whole volume, including all of the qtrees; however, only the VMs and qtrees in the failed-over recovery plan will be recovered by SRM. If you then perform a failback with SRM 5 of one subset of the qtrees in that volume, there is a risk of disrupting the non-failed-over qtrees at the target failback site. The same is true of a configuration that uses qtrees with a LUN provisioned in each qtree, and of multiple LUNs provisioned in one volume without qtrees. If you are using volume-level SnapMirror, have provisioned qtrees in the volume, and have configured one NFS export at the volume level but mount each qtree in the volume as a separate datastore, this configuration is not supported and will report an error in SRM.
  • Mixing volume-level mirroring with qtree-level mirroring, LUNs in qtrees with LUNs in volumes, or volume-level NFS exports with qtree-level exports can all be problematic for an SRM environment, as SRM and the SRA attempt to match resources between the protected and recovery sites. The recommendation is to use the same granularity (volume or qtree) for replication, NFS export, and NFS mount point to avoid issues.
  • If you are using volume-level SnapMirror and have provisioned multiple LUNs in a single volume, you should configure all the LUNs in that volume into a single recovery plan to support failback. If you configure each LUN into a different recovery plan and fail over any individual LUN, then because you are using volume-level SnapMirror the SRA must perform a SnapMirror break for the whole volume, including all of the LUNs; however, only the VMs and LUNs in the failed-over recovery plan will be recovered by SRM. If you then perform a failback with SRM 5 of one of the LUNs in that volume, there is a risk of disrupting the non-failed-over LUNs at the target failback site.
  • SRA 2.0 cannot support mixed ALUA configurations. A mixed ALUA configuration is one where a single ESX host, or multiple ESX hosts in the same ESX cluster, has some initiators configured in ALUA-enabled igroups while the same host or hosts have other initiators configured in ALUA-disabled igroups. An example of an unsupported single-host configuration is one where some initiators are in an ALUA-enabled igroup for VMFS LUNs and different initiators are in an ALUA-disabled igroup for RDM LUNs supporting MSCS in a VM. An example of an unsupported cluster configuration is an ESX cluster containing both ESX 3.5 hosts and ESX 4.x or 5.0 hosts, where the ESX 3.5 initiators must be in an ALUA-disabled igroup and the ESX 4.x or 5.0 hosts should be in an ALUA-enabled igroup, and where SRM resource mappings are done at the cluster level.
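The rule above reduces to a simple check: a host (or cluster) is in a mixed configuration if its initiators appear in both ALUA-enabled and ALUA-disabled igroups. A sketch with hypothetical WWPNs and data, not NetApp tooling:

```python
def is_mixed_alua(igroups):
    """igroups: list of (initiator_set, alua_enabled) pairs for the
    igroups used by one ESX host or one ESX cluster. Returns True if the
    igroups disagree on the ALUA setting (unsupported by SRA 2.0)."""
    flags = {alua for _initiators, alua in igroups}
    return len(flags) > 1

# Unsupported: VMFS initiators in an ALUA-enabled igroup, plus RDM
# initiators for MSCS in an ALUA-disabled igroup, on the same host.
mixed_host = [({"21:00:00:e0:8b:aa:aa:aa"}, True),
              ({"21:00:00:e0:8b:bb:bb:bb"}, False)]
print(is_mixed_alua(mixed_host))  # True

# Supported: all igroups for the host agree on the ALUA setting.
uniform_host = [({"21:00:00:e0:8b:aa:aa:aa"}, True),
                ({"21:00:00:e0:8b:bb:bb:bb"}, True)]
print(is_mixed_alua(uniform_host))  # False
```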
  • When configuring the SRA in array managers, note that the IP addresses from the NetApp storage networks are added in the NFS IP Addresses field. Multiple addresses are supported, separated by commas. SRM 4 and 5 support NAS datastore connections on private storage networks and ESX host connections to a single storage controller on multiple addresses. These designs are described in TR-3749, NetApp and VMware vSphere Storage Best Practices: http://media.netapp.com/documents/tr-3749.pdf
  • In this environment we have a private network for storage, where we’ve configured two subnets to use to connect to storage. Some datastores are mounted over one subnet and some over the other.
  • The storage discovery process was changed in SRM 5 to report replication direction in the SRM interface. Because of this, any replicated storage devices that are detected but are not part of the SRM environment may show up in the SRM interface with an error or warning on them. This can be prevented using the new volume filtering capability in SRA 2.0.
  • To prevent undesired storage devices from showing up in the SRM interface, you can use the volume include and volume exclude lists on the edit array managers screen. If you enter a string of text in the volume include list, the SRA will report only storage devices (NFS datastores or LUNs) on volumes whose names contain that string. If you enter a string of text in the volume exclude list, the SRA will omit any storage devices (NFS datastores or LUNs) on volumes whose names contain that string. You can enter multiple strings separated by commas to include or exclude multiple patterns.
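The substring-matching behavior described above can be sketched as follows. This is a hypothetical helper illustrating the semantics, not SRA code, and the volume names are made up:

```python
def filter_volumes(volumes, include=None, exclude=None):
    """Sketch of the SRA 2.0 volume include/exclude semantics: each list
    holds substrings; a volume is reported only if it matches some
    include pattern (when an include list is given) and matches no
    exclude pattern."""
    def matches(name, patterns):
        return any(p in name for p in patterns)

    result = []
    for vol in volumes:
        if include and not matches(vol, include):
            continue  # include list given and no pattern matched
        if exclude and matches(vol, exclude):
            continue  # an exclude pattern matched
        result.append(vol)
    return result

vols = ["srm_vm_vol1", "srm_vm_vol2", "lab_test_vol", "migrate_tmp"]
print(filter_volumes(vols, include=["srm_"]))             # ['srm_vm_vol1', 'srm_vm_vol2']
print(filter_volumes(vols, exclude=["lab_", "migrate"]))  # ['srm_vm_vol1', 'srm_vm_vol2']
```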
  • If you are using an IP address, a connection name, or a non-default host name of the source system in a SnapMirror relationship, some extra configuration must be done to enable SRA 2.0 to support this configuration.
  • This configuration information is required for SRA 2.0 in order to support the use of IP addresses when the SnapMirror relationships are reversed. If the customer environment requires the use of IP addresses for connecting SnapMirror relationships (for example, name resolution is not available for private replication networks), then the SRA at each site must have an internal way of resolving those IP addresses into host names. Using the use_ip_for_snapmirror_relation option and the ip_hostname_mapping.txt file provides support for this type of environment. The configuration files for the NetApp adapter are stored on each SRM server, by default at C:\Program Files (x86)\VMware\VMware vCenter Site Recovery Manager\storage\sra\ONTAP. Entries in the ip_hostname_mapping.txt file are case sensitive.
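Conceptually the mapping file gives the SRA a private IP-to-hostname lookup. The sketch below assumes a simple one-pair-per-line layout with hypothetical addresses; check the SRA 2.0 admin guide for the exact file format:

```python
def load_ip_hostname_map(text):
    """Parse an IP-to-hostname mapping, assuming one whitespace-separated
    'ip hostname' pair per line (assumed layout; see the SRA admin guide
    for the authoritative format). Entries are case sensitive, so no
    case folding is applied."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        ip, hostname = line.split()[:2]
        mapping[ip] = hostname  # preserve case exactly as written
    return mapping

sample = """\
# replication network addresses (hypothetical)
192.168.10.11 FilerA
192.168.10.12 FilerB
"""
mapping = load_ip_hostname_map(sample)
print(mapping["192.168.10.11"])  # FilerA
```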
  • The default interval for discovery can be changed with the storage.storagePingInterval advanced setting: right-click a site in the SRM sites tab, select Storage, then change the value of storage.storagePingInterval (in seconds).
  • When reversing replication relationships, SRM requests that the SRA check the configuration of the existing SnapMirror relationship to determine information about it, such as the update schedule. The SRA applies the SnapMirror update schedule to the reversed relationship. However, the Data ONTAP API call that sets the relationship options does not set values for compression or the SnapMirror TCP window size (wsize). If a customer requires non-default values for compression (off) or window size (2MB), these settings should be applied after the relationship has been reversed.
  • To clean up SnapMirror after replication reversal:
    1. Release snapshots on the current destination (this simply removes the locks on the unnecessary snapshots at the current destination):
       on_current_destination> snapmirror release vol_name current_source_filer:vol_name
    2. Delete older SnapMirror snapshots from the current source (there are typically two per snapshot to delete; because SnapMirror-named snapshots include the name of the active destination system, you can safely delete the SnapMirror snapshots that contain the name of the current source system):
       on_current_source> snap delete vol_name snapshot_name
    3. Allow the next scheduled update to propagate the snapshot deletion to the current destination, or perform a SnapMirror update on the current destination:
       on_current_destination> snapmirror update -S current_source_filer:vol_name current_dest_filer:vol_name
  • This is the same issue described on the slide titled "Mixed iSCSI and FC Environments." Because SRM includes both iSCSI and FC initiators in the same request, and NetApp Data ONTAP does not allow iSCSI initiators in ALUA-enabled igroups, if a customer is using FC with ALUA enabled the ESX recovery host must have the iSCSI initiator disabled. SRM includes the iSCSI initiator in the failover request even if no iSCSI targets are configured in the initiator, so the iSCSI initiator must be disabled, not simply unconfigured. As noted on that slide, SRA 2.0 cannot support mixed ALUA configurations: a single ESX host, or multiple ESX hosts in the same ESX cluster, must not have some initiators in ALUA-enabled igroups while other initiators are in ALUA-disabled igroups.
  • By default the NetApp adapter recovers FlexVol volumes to the last replication point transferred by NetApp SnapMirror. The 1.4.3 release of the NetApp SRM adapter adds the capability to recover NetApp volume snapshots created by NetApp SnapManager for Virtual Infrastructure (SMVI). This feature is not available in NetApp FAS/V-Series Storage Replication Adapter 2.0 for SRM 5. This functionality is currently limited to snapshots created without the option to create a VMware consistency snapshot when the SMVI backup is created. Because the adapter by default recovers the same type of image as a non-quiesced SMVI snapshot, this feature currently has limited use cases. An example use case is an application that must be recovered to the specific point in time captured by the SMVI backup: the application runs in a VM that cannot be recovered unless it is recovered from a specific state, a custom script is used with SMVI to place the application into that state, and the SMVI backup performed during that state creates the NetApp Snapshot. The Snapshot recovered by the adapter then contains the VM with the application in the required state. The following configuration rules apply when using the option to recover non-quiesced SMVI snapshots with the NetApp adapter:
    – This feature requires SMVI version 2.0 or newer.
    – Only recovery to the most recently created SMVI snapshot is allowed.
    – The option to create a VMware consistency snapshot when the SMVI backup is created must be disabled for the SMVI job that creates the backup used for this purpose. (The NetApp adapter will determine whether this option was disabled before allowing use of the snapshot.)
    – The option is supported only with volume-level asynchronous SnapMirror.
    – There should be only one VMware datastore in each NetApp FlexVol volume being recovered.
    – A SnapRestore license is required on the NetApp system.
    – This is a global option that applies to all recovery plans executed while it is set. To enable it for specific recovery plans, change the option before running the desired plan.

Presentation Transcript

  • Larry Touchette Technical Marketing VMware Site Recovery Manager and NetApp FAS/V-Series SE Technical Presentation 1
  • Agenda  DR Challenges & VMware Site Recovery Manager  New features in SRM version 5  NetApp Value in VMware SRM environments  System and Software Requirements  SRM Workflows and Array Interaction  Best Practices and Configuration Rules  SRM and SRA Configuration Workflows  Limitations 2
  • 3 DR Challenges & VMware Site Recovery Manager
  • Traditional Disaster Recovery  Involves: – Complex processes and infrastructure – Precise training, documentation, and execution  Requires: – Dedicated, identical hardware – Significant consumption of time and resources – 2x to 3x the capacity used for production – Unacceptable levels of WAN utilization  Results in: – Inability to test or frequently failed tests – Recovery times of days or weeks – Ability to protect only a few important workloads 4
  • VMware Site Recovery Manager  Advanced workflow automation for DR setup, testing and failover, and failback vCenter™ SRM VMware® ESX® vSphere™ SRM vCenter VMware ESX vSphere − Allows dual purposing of hardware for production or test/dev − Protects more of the environment for less cost − Integrates with NetApp SnapMirror and NetApp FlexClone® Recovery SiteProtected Site NetApp SnapMirror® 5
  • VMware SRM Failover  Configure protection groups at primary site  Build recovery plans at the DR site  After disaster execute recovery plan at DR site  SnapMirror® break automatically performed Protected Site Recovery Site Protection Groups Recovery Plan ® NetApp SnapMirror NetApp SnapMirror® 6
  • NetApp SnapMirror® VMware SRM DR Testing  SRM DR testing: verifies that DR plan is reliable without interrupting production  Automatically creates private network and FlexClone® volumes for testing Protected Site Recovery Site Protection Groups Recovery Plan 7 ®
  • VMware Site Recovery Manager  SRM is bidirectional − Sites can protect each other Protected / Recovery Site Protected / Recovery Site Protection Group Recovery Plan Protection Group Recovery Plan NetApp SnapMirror® 8 ®
  • Site Recovery Manager Major Features  Protect  Test failover  Failover (unplanned)  Centralized administration  vSphere™ replication (host-based replication)  Failover performance improvements  Test failover with storage synchronization  Planned failover with storage synchronization  Automated failback New in SRM 5 9
  • Automated Failback in SRM 5  Reverses the SnapMirror® replication relationships  Resynchronizes storage replication in opposite direction  Reverses the roles of the two sites (only for the VMs in the affected recovery plan)  Then failback is simply the planned failover workflow NetApp FAS Controller Recovery Site NetApp® FAS Controller Protected Site FlexVol Volume 3 LUN4 FlexVol Volume 4 LUN5 FlexVol Volume 7 FlexVol Volume 8 LUN4 LUN5 NetApp SnapMirror NetApp SnapMirror FlexVol® Volume 3 LUN4 FlexVol Volume 4 LUN5 NetApp SnapMirror 10
  • Centralized SRM 5 Administration  SRM 5 administration for both sites can be performed by connecting to either site’s vSphere™ client Protected Site Recovery Site vCenter™ SRM VMware ESX vSphere SRM vCenter VMware ESX® vSphere vSphere Client SRM Administrator 11
• vSphere Replication in SRM 5: Comparison with SnapMirror
   NetApp SnapMirror
  − Datastore granularity of replication
  − Support for automated failback
  − Supports physical mode RDMs
  − Supports Fault Tolerance and linked clones
  − Supports powered-off VMs
   vSphere Replication
  − Per-VM granularity of replication
  − Supports ESX hosts of different versions
   The two can be used in the same environment
• SRM 5 Performance Improvements
   VM reconfiguration step removed from the prepare-storage step
  − VMs can start powering on as soon as each VM is reconfigured
   Multiple VMs powered on with one request
  − Reduces serialization of VM startup
   New method for reconfiguring VM IP addresses
  − Does not require additional reboots of VMs
• NetApp Value in SRM Environments
• NetApp FAS/V-Series Storage Replication Adapter
   Multiprotocol support for FC, iSCSI, and NFS in one adapter
   Fully thin-provisioned FlexClone® DR test environments
   Support for MultiStore® vFiler® units as SRM storage arrays
  [Diagram: an ESX® cluster accessing a NetApp® FAS array through VMFS, NFS, and RDM LUNs]
• SnapMirror and FAS Deduplication
   FAS deduplication on primary storage
   Only unique data is replicated to the DR site
  [Diagram: new data written at the protected site is deduplicated before SnapMirror® replication to the recovery site]
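As a sketch, deduplication is enabled per FlexVol® volume on the 7-Mode controller console (the volume name vol1 and controller prompt fas1 are examples, not from the deck):

```
fas1> sis on /vol/vol1          # enable deduplication on the volume
fas1> sis start -s /vol/vol1    # scan and deduplicate the existing data
fas1> sis status /vol/vol1      # check deduplication progress
fas1> df -s /vol/vol1           # show the resulting space savings
```

Because SnapMirror replicates the volume as stored, the savings achieved here carry over to what crosses the WAN.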
• SnapMirror Network Compression
   SnapMirror® native compression reduces WAN utilization
  [Diagram: data deduplicated at the protected site is compressed for the SnapMirror transfer and decompressed at the recovery site]
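A minimal sketch of enabling SnapMirror network compression in 7-Mode: it is configured per relationship in /etc/snapmirror.conf on the destination, and requires a named connection entry. Host and volume names below are examples:

```
# /etc/snapmirror.conf on the destination controller (example names)
conn1=multi(fas1,fas2)
conn1:volsrc fas2:voldst compression=enable 0 * * *
```

The `compression=enable` argument applies only to relationships defined through a named connection line such as `conn1`; check the Data ONTAP documentation for your release before relying on this syntax.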
• FlexClone: Space-Efficient DR Testing
   NetApp FlexClone®
  − Allows frequent nondisruptive testing
  − Reduces the capacity needed for DR testing to only the data written during tests
  [Diagram: aggregate capacity divided into storage used by replicated datastores, storage used for FlexClone volume creation (metadata only), and storage used for writes during DR testing]
• Virtual Storage Tiering with NetApp Flash Cache
   Provides the performance boost needed during critical recovery times
   Faster boot times
   Fewer physical disks required
   No SSD required
   Less disk I/O performed
   Virtual tiering without configuration overhead
• System and Software Requirements
• VMware Requirements for SRM in vSphere
   Installed at both the protected and recovery sites:
  − A vSphere™ vCenter™ Server
  − A vSphere Site Recovery Manager Server
    – SRM 4.1 requires vCenter Server 4.1
    – SRM 5.0 requires vCenter Server 5.0
  − ESX® Servers
    – Multiple ESX versions from 3.5Ux to 5.0, with a mix of update releases, are supported with both SRM 4 and 5; see the compatibility matrix for the appropriate SRM version at www.vmware.com/support/pubs/srm_pubs.html
  • NetApp Adapter Requirements  The NetApp® Storage Replication Adapter (SRA) is free software available to VMware® SRM customers. Obtain the SRA from: Software download page on now.netapp.com or VMware SRM download page www.vmware.com/go/download-srm  NetApp licenses required on protected and recovery site storage − SnapMirror® − iSCSI, FCP, or NFS − FlexClone® 22
  • NetApp Adapter Requirements  All NetApp® FAS and V-Series platforms qualified with VMware® vSphere™ are supported – See supported NetApp platforms at www.vmware.com/resources/compatibility: select Storage/SAN from What are you looking for box, select NetApp from Partner Name box, and click the Update button  For SRM storage support per SRM version, see www.vmware.com/pdf/srm_storage_partners.pdf 23
  • NetApp Adapter Requirements  NetApp Data ONTAP® version support − 7.2.4 or greater required − 7.3.2 or greater required for MultiStore® vFiler® support − Includes NetApp Data ONTAP 8 operating in 7-Mode  Support for NetApp Data ONTAP operating in Cluster- Mode is planned for future release of the NetApp adapter 24
• Data ONTAP 7-Mode and Adapter Version Dependencies

  NetApp Adapter Version    Minimum Data ONTAP* Version    Supported SRM Version
  1.4 NAS                   7.2.2                          4.x
  1.4.2 SAN                 7.2.4                          4.x
  1.4.3 (unified)           7.2.4                          4.x
  1.4.3 (using vFiler®)     7.3.2                          4.x
  2.0** (unified)           7.2.4                          5.0
  2.0** (using vFiler)      7.3.2                          5.0

  Current as of September 2011. Please check the latest documentation for up-to-date support.
  * 7-Mode only, including version 8. Support for Cluster-Mode is planned for a future version of the NetApp® SRA.
  ** SRA 2.0 requires SRM 5 and cannot be used with SRM version 4.
• Replication Software Support
   Supported replication products
  − Volume SnapMirror®
  − Qtree SnapMirror
   Unsupported replication products
  − SnapVault®
  − Failover between MetroCluster™ nodes is not supported; however, MetroCluster can be the source or destination for SnapMirror with SRM
  − Support for NetApp Data ONTAP operating in Cluster-Mode is planned for a future release of the NetApp® adapter
• Upgrading from SRM 4 to SRM 5
   VMware® supports upgrading from SRM 4 to SRM 5
  − It is not an in-place upgrade but a remove-and-import process: uninstall SRM 4, install SRM 5, then use the import utility to import the configuration into SRM 5
   In a NetApp® environment, the SRM 4 adapter must be uninstalled before uninstalling SRM 4
  − Otherwise a later uninstall of the SRM 4 adapter will fail and require manual removal
• SRM and Array Interaction
• Test Failover with Storage Update
   Test recovery workflow
  − SRM optionally requests an update of replication
  − The NetApp® SRA performs a SnapMirror® update as requested
  − SRM requests a temporary copy of the replica images
  − The NetApp SRA creates FlexClone volumes
  − The SRA adds LUNs to igroups or creates NFS exports
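The steps the SRA automates are roughly equivalent to the following 7-Mode console sketch on the recovery-site controller (volume, LUN, igroup, and network names are illustrative only):

```
fas2> snapmirror update voldst                          # refresh the replica before testing
fas2> vol clone create voldst_clone -b voldst           # FlexClone volume backed by the latest Snapshot copy
fas2> lun map /vol/voldst_clone/lun1 esx_test_igroup    # SAN case: present the cloned LUN to the test hosts
fas2> exportfs -io rw=192.168.2.0/24 /vol/voldst_clone  # NAS case: temporarily export the clone for testing
```

Because the test runs against the clone, the SnapMirror relationship to voldst itself stays intact and production replication continues.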
• Planned Failover with Storage Update
   Planned failover workflow
  − SRM requests a SnapMirror® update of replication
  − SRM shuts down the VMs at the protected site
  − SRM requests a second update of replication
  − SRM requests promotion of the replica images
  − The SRA breaks the SnapMirror relationships, making the storage writable
  − The SRA adds LUNs to igroups or creates NFS exports
  − SRM recovers the VMs at the recovery site
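The storage side of this workflow corresponds to a 7-Mode console sequence along these lines on the recovery-site controller (names are examples, not from the deck):

```
fas2> snapmirror update voldst             # final update after the protected-site VMs are shut down
fas2> snapmirror quiesce voldst            # let any in-flight transfer finish
fas2> snapmirror break voldst              # promote the replica: the destination volume becomes writable
fas2> lun map /vol/voldst/lun1 esx_igroup  # present the LUN to the recovery-site ESX hosts
```

After the break, the destination holds a read-write copy and SRM can register and power on the VMs there.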
• Reprotect for Automated Failback
   Reprotect workflow (to prepare for failback)
  − SRM requests reversal of replication
  − The SRA performs a SnapMirror® resync in the reverse direction (which resynchronizes replication)
  − SRM reverses the roles of the protected and recovery sites for the affected protection groups
  − The SRM administrator can now perform a planned failover to fail back to the original site
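The reverse resync the SRA performs is, in 7-Mode console terms, a single command run on the original protected-site controller, which is now the destination (controller and volume names are examples):

```
fas1> snapmirror resync -S fas2:voldst fas1:volsrc   # reverse the relationship: fas2 is now the source
```

Resync reuses the common Snapshot™ copy shared by the two volumes, so only the changes made at the recovery site since failover are transferred back.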
• Best Practices and Configuration Rules
• SRM Best Practices
   Following SRM best practices means following the required practices, described below, to achieve a successful SRM test failover
  − The first few tests usually fail
  − Follow the prescribed setup workflows
  − Make configuration checking part of setup, before attempting a test failover
   Clone AD servers for DR testing
  − Microsoft best practice is not to replicate AD servers
  • Required Practices for NetApp Adapters  Source volume must be replicated to only one destination − Volume fanout with SnapMirror® is not supported − Failover to second or further destination in a SnapMirror cascade relationship is not supported. For example: In A  B  C cascade, failover between A and B is supported, failover between A and C is not supported. 35
  • Required Practices for NetApp Adapters  MultiStore® vFiler® support requires zapi option enabled on physical controller >options vfiler.vol_clone_zapi_allow on 36
  • Required Practices for NetApp Adapters  LUNs at source must be in igroup of type “vmware” Note: RDMs use LUN type of Guest OS, igroup type of “vmware”  Adapter 1.4.x and earlier requires igroups preexist at recovery site – Don’t forget about creating igroups in destination vFiler® units  Adapter 2.0 for SRM 5 automatically creates igroups during failover and test failover  Replicated LUNs must not be preadded to igroups; SRM adds them for test and failover 37
  • Required Practices for NetApp Adapters  Exports must be in /etc/exports file − Temporary manual exports are not discovered  Exports must use values in RW security field − Exports RW to all are not discovered Discoverable: /vol/vol1 -rw=192.168.2.0/24,root=192.168.2.0/24 Not discoverable: /vol/vol1 -rw,root=192.168.2.0/24  Datastores must have VMs in them to be discovered 38
• Supported Replication Layouts
   Each NetApp® controller or vFiler® unit is a separate array in Site Recovery Manager
   A VM must have data on only one array in each site
  [Diagram: a protected-site FAS HA pair (arrays A and B) replicating to a recovery-site FAS HA pair (arrays C and D); VM 5 resides entirely on arrays A/C and VM 6 entirely on arrays B/D]
• Unsupported Replication Layouts
   A VM with data on more than one array at either site cannot be protected with SRM
  [Diagram: VM 5 with RDMs spanning both arrays of a FAS HA pair at the protected site, an unsupported layout]
  • Using Qtrees with SRM  If using volume SnapMirror (VSM) with multiple qtrees exported as NFS datastores or each containing LUNs – Single qtree failover is possible but not recommended, use one recovery plan for all qtrees – Failback of one qtree in a volume with multiple qtrees is not supported as this could affect other VMs at the failback target site  Using VSM replication with volume-level export, but qtree in volume as mount point is not supported 41
• Using Qtrees with SRM: Recommendation
   Use the same level for replication and the datastore
  − If using VSM, export and mount the volume, or store the LUN in the volume
  − If using QSM, export and mount the qtree, or store the LUN in the qtree
• Multiple LUNs in One Volume
   With multiple LUNs in one volume, all LUNs in that volume should be failed over in the same recovery plan
  − Failback of one LUN in a volume with multiple LUNs is not supported, as this could affect other VMs at the failback target site when the VSM relationship is reversed
• Mixed iSCSI and FC Environments
   Supported: failover in either direction between sites where one site uses FC and the other uses iSCSI
   Not supported: failover to ESX® hosts having a mix of iSCSI and FC in the same cluster or recovery group is not supported by VMware® or NetApp®
• Configuration Workflows
• Prerequisites and Recommendations
  1. VMware® infrastructure at each site
     – vCenter™ server and ESX® servers
     – VMware licensing
  2. Install the VMware SRM application at each site
     – Typically installed on its own VM
     – Can share a database server with vCenter
     – Enable HTTP access between the SRM servers (port 80)
  3. Install the SRA on the SRM server at each site
  4. Supporting infrastructure at each site
     – Active Directory for authentication
     – DNS for name resolution
     – Create a VM placeholder datastore at each site
  • Configuration Workflows  Perform configuration checking as a part of the setup workflow At protected site: 1. Verify LUNs are in igroup of type “vmware” 2. Verify NFS exports have –rw security entries 3. Verify proper SnapMirror® relationships exist 47 NetApp
• Implementation Workflows (NetApp / vCenter)
   At the recovery site:
  1. Verify the controller (or vFiler®) has an igroup with OS type "vmware" (not needed for version 5)
  2. Verify the proper SnapMirror® relationships exist
  3. Verify storage network connectivity between NetApp® storage ports and ESX® VMkernel ports (Ethernet VLANs, FC zoning, etc.)
  4. Provision storage for placeholder VMs
  5. Create a private DR testing network if required
  6. Check host VM ownership if not using DRS (if not using VMware® DRS, VMs are started on the ESX host that owns the placeholder VM)
  • Implementation Workflows  SRM 5 has clickable workflows in the vSphere™ client interface on the SRM Getting Started tab 49  Follow the steps in order for a successful SRM setup
  • Using the NFS IP Addresses Field  When adding the NetApp® controller in the Array Manager, enter the controller NFS addresses in the NFS IP addresses  See network layout example on following slide 50
• Using the NFS IP Addresses Field
  [Diagram: a FAS controller with admin IP 192.168.10.50 on the shared admin network, and storage IPs 192.168.50.50 and 192.168.51.50 on a private storage network serving NAS shared storage; the storage IPs are the addresses entered in the NFS IP Addresses field]
  • Volume Filtering in NetApp SRA 2.0  In SRM 5, replicated volumes that are not part of the VMware® environment may be reported with an error or warning in the SRM interface  In the above example the vmcoe volumes are not a desired part of this SRM environment 52
  • Volume Filtering in NetApp SRA 2.0  The volume filter fields on the array manager configuration screen can be used to include or exclude certain volumes from SRM discovery 53 Volumes containing the text “vmcoe” are excluded
  • SnapMirror by IP Address with SRA 2.0  If SnapMirror® relationships are created on the destination controller using source IP address as shown here: At protected site SnapMirror status shows: Source Destination State Lag Status f3170a:volsrc f3170c:voldst Source 00:05:04 Idle At recovery site SnapMirror status shows: Source Destination State Lag Status 10.72.192.75:volsrc f3170c:voldst Snapmirrored 00:09:29 Idle 54 IP address instead of host name of source controller
  • SnapMirror by IP Address with SRA 2.0  Then you must configure the use_ip_for_snapmirror_relation option in the ontap_config.txt file at each site  And configure the IP address to hostname mapping in the ip_hostname_mapping.txt file at each site as shown here: f3170a = 10.72.192.75 f3170c = 10.72.192.78 (entries are case sensitive)  Configuration files are by default at C:Program Files (x86)VMwareVMware vCenter Site Recovery ManagerstoragesraONTAP 55
• Limitations
  • Limitations  Automated Storage DRS Considerations – SRM 5 is not yet integrated with vSphereTM 5 Automated Storage DRS – If Storage DRS performs a migration of a VM from a replicated datastore to a non-replicated datastore the migrated VM will no longer be protected 57
  • Limitations  When reversing SnapMirror® relationships SRA will configure same replication schedule on new destination – However, currently, compression and tcp window size cannot be set by SRA and must be set manually after reversal if nondefault setting is required 58
  • Limitations  After reversing SnapMirror® relationship, SRA 2.0 does not remove SnapMirror Snapshot™ copies that were used for replication in the other direction – After replication reversal administrator can remove snapshots (see process in notes) – A solution is being planned for a future SRA release 59
  • Limitations  iSCSI initiators should be disabled in the ESX® recovery hosts if those hosts are also using FC and ALUA – If an FC connected ESX host has the iSCSI initiator enabled, then SRM will include both the FC and iSCSI initiators in the failover connection request – Data ONTAP® does not support adding a LUN to an iSCSI igroup and an FC ALUA-enabled igroup at the same time – This configuration is also not supported by VMware® SRM 60
  • Limitations  Non-quiesced SVMI snapshot recovery feature – Not available in SRM 5 adapter – Supported only with SRM 4 adapter 1.4.3 – Has very limited use cases – Has specific configuration requirements (See appendix of TR-3671 and notes below) 61
• Field Resources
   SE technical presentation on the Field Portal: https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=36857
  − Describes NetApp capabilities, values, best practices, requirements, and limitations in an SRM environment
  − Contains links to matrices, docs, and articles
   Customer presentation on the Field Portal: https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=24728
  − Sales enablement presentation covering NetApp SnapMirror integration with SRM
  − Contains a subset of the SE deck slides
  • Resources  NetApp SRA Administration Guide and Release Notes in SRA package and on the NOW® site  SRM compatibility matrices for SRM, VC, ESX/ESXi www.vmware.com/support/pubs/srm_pubs.html  For SRM storage support per SRM version www.vmware.com/pdf/srm_storage_partners.pdf  VMware SRM download page www.vmware.com/go/download-srm  Supported NetApp platforms www.vmware.com/resources/compatibility: select Storage/SAN from What are you looking for box, select NetApp from Partner Name box, and click the Update button  VMware SRM Documentation Site www.vmware.com/support/pubs/srm_pubs.html 63
• Additional Resources
   NetApp TR-3671: VMware vSphere Site Recovery Manager in a NetApp Environment: media.netapp.com/documents/tr-3671.pdf (SRM 4 only; an SRM 5 update is a work in progress)
   RBAC rights for NetApp SRM version 4 adapters: https://kb.netapp.com/support/index?page=content&id=1010829
   RBAC rights for the NetApp SRM version 5 adapter: https://kb.netapp.com/support/index?page=content&id=1013325
© 2011 NetApp, Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of NetApp, Inc. Specifications are subject to change without notice. NetApp, the NetApp logo, Go further, faster, Data ONTAP, FlexClone, FlexVol, MetroCluster, MultiStore, NOW, SnapMirror, Snapshot, SnapVault, and vFiler are trademarks or registered trademarks of NetApp, Inc. in the United States and/or other countries. VMware and ESX are registered trademarks and vCenter and vSphere are trademarks of VMware, Inc. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such.