White Paper

USING VPLEX™ METRO WITH VMWARE HIGH AVAILABILITY AND FAULT TOLERANCE FOR ULTIMATE AVAILABILITY

Abstract

This white paper discusses using best-of-breed technologies from VMware® and EMC® to create federated continuous availability solutions. The following topics are reviewed:
• Choosing between federated Fault Tolerance or federated High Availability
• Design considerations and constraints
• Operational best practice
September 2012

Copyright © 2012 EMC Corporation. All Rights Reserved.

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

The information in this publication is provided "as is." EMC Corporation makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.

Use, copying, and distribution of any EMC software described in this publication requires an applicable software license.

For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com.
Table of Contents

Executive summary
    Audience
    Document scope and limitations
Introduction
EMC VPLEX technology
    VPLEX terms and Glossary
    EMC VPLEX architecture
    EMC VPLEX Metro overview
    Understanding VPLEX Metro active/active distributed volumes
    VPLEX Witness – An introduction
    Protecting VPLEX Witness using VMware FT
    VPLEX Metro HA
    VPLEX Metro cross cluster connect
Unique VPLEX benefits for availability and I/O response time
    Uniform and non-uniform I/O access
    Uniform access (non-VPLEX)
    Non-uniform access (VPLEX I/O access pattern)
    VPLEX with cross-connect and non-uniform mode
    VPLEX with cross-connect and forced uniform mode
Combining VPLEX HA with VMware HA and/or FT
    vSphere HA and VPLEX Metro HA (federated HA)
    Use cases for federated HA
    Datacenter pooling using DRS with federated HA
    Avoiding downtime and disasters using federated HA and vMotion
    Failure scenarios and recovery using federated HA
    vSphere FT and VPLEX Metro (federated FT)
    Use cases for a federated FT solution
    Failure scenarios and recovery using federated FT
    Choosing between federated availability or disaster recovery (or both)
    Augmenting DR with federated HA and/or FT
    Environments where federated HA and/or FT should not replace DR
Best practices and considerations when combining VPLEX HA with VMware HA and/or FT
    VMware HA and FT best practice requirements
    Networking principles and pre-requisites
    vCenter placement options
    Path loss handling semantics (PDL and APD)
    Cross-connect topologies and failure scenarios
    Cross-connect and multipathing
    VPLEX site preference rules
    DRS and site affinity rules
    Additional best practices and considerations for VMware FT
    Secondary VM placement considerations
    DRS affinity and cluster node count
    VPLEX preference rule considerations for FT
    Other generic recommendations for FT
Conclusion
References
Appendix A – vMotioning over longer distances (10ms)
Executive summary

The EMC® VPLEX™ family removes physical barriers within, across, and between datacenters. VPLEX Local provides simplified management and non-disruptive data mobility for heterogeneous arrays. VPLEX Metro and Geo provide data access and mobility between two VPLEX clusters within synchronous and asynchronous distances respectively. With a unique scale-out architecture, VPLEX's advanced data caching and distributed cache coherency provide workload resiliency, automatic sharing, balancing and failover of storage domains, and enable both local and remote data access with predictable service levels.

VMware vSphere makes it simpler and less expensive to provide higher levels of availability for important applications. With vSphere, organizations can easily increase the baseline level of availability provided for all applications, as well as provide higher levels of availability more easily and cost-effectively. vSphere makes it possible to reduce both planned and unplanned downtime. The revolutionary VMware vMotion™ (vMotion) capabilities in vSphere make it possible to perform planned maintenance with zero application downtime.

VMware High Availability (HA), a feature of vSphere, reduces unplanned downtime by leveraging multiple VMware ESX® and VMware ESXi™ hosts configured as a cluster to provide automatic recovery from outages, as well as cost-effective high availability for applications running in virtual machines.

VMware Fault Tolerance (FT) leverages the well-known encapsulation properties of virtualization by building fault tolerance directly into the ESXi hypervisor in order to deliver hardware-style fault tolerance to virtual machines. Guest operating systems and applications do not require modifications or reconfiguration. In fact, they remain unaware of the protection transparently delivered by ESXi and the underlying architecture.

By leveraging distance, VPLEX Metro builds on the strengths of VMware FT and HA to provide solutions that go beyond traditional "Disaster Recovery". These solutions provide a new type of deployment which achieves the absolute highest levels of continuous availability over distance for today's enterprise storage and cloud environments. When using such technologies, it is now possible to provide a solution that has both a zero Recovery Point Objective (RPO) and a zero "storage" Recovery Time Objective (RTO) (and a zero "application" RTO when using VMware FT).

This white paper is designed to give technology decision-makers a deeper understanding of VPLEX Metro in conjunction with VMware Fault Tolerance
and/or High Availability, discussing design, features, functionality and benefits. This paper also highlights the key technical considerations for implementing VMware Fault Tolerance and/or High Availability with VPLEX Metro technology to achieve "Federated Availability" over distance.

Audience

This white paper is intended for technology architects, storage administrators and EMC professional services partners who are responsible for architecting, creating, managing and using IT environments that utilize EMC VPLEX and VMware Fault Tolerance and/or High Availability technologies (FT and HA respectively). The white paper assumes that the reader is familiar with EMC VPLEX and VMware technologies and concepts.

Document scope and limitations

This document applies to EMC VPLEX Metro configured with VPLEX Witness. The details provided in this white paper are based on the following configurations:
• VPLEX GeoSynchrony 5.1 (patch 2) or higher
• VPLEX Metro HA only (Local and Geo are not supported with FT or HA in a stretched configuration)
• VPLEX clusters are within 5 milliseconds (ms) round trip time (RTT) of each other for VMware HA
• VPLEX clusters are within 1 millisecond (ms) round trip time (RTT) of each other for VMware FT
• Cross-connected configurations can be optionally deployed for VMware HA solutions (not mandatory)
• For VMware FT configurations, VPLEX cross cluster connect is in place (mandatory requirement)
• VPLEX Witness is deployed to a third failure domain (mandatory). The Witness functionality is required for VPLEX Metro to become a true active/active continuously available storage cluster.
• ESXi and vSphere 5.0 Update 1 or later are used
• Any qualified pair of arrays (both EMC and non-EMC) listed on the EMC Simple Support Matrix (ESSM) found here: https://elabnavigator.emc.com/vault/pdf/EMC_VPLEX.pdf
• The configuration is in full compliance with VPLEX best practice found here: http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/h7139-implementation-planning-vplex-tn.pdf

Please consult with your local EMC Support representative if you are uncertain as to the applicability of these requirements.

Note: While out of scope for this document, it should be noted that, in addition to all best practices within this paper, all federated FT and HA solutions also carry the same best practices and limitations imposed by the VMware HA and FT technologies themselves. For instance, at the time of writing VMware FT is only capable of supporting a single vCPU per VM (VMware HA does not carry the same vCPU limitation) and this limitation will prevail when federating a VMware FT cluster. Please ensure you review the VMware best practice documentation, as well as the limitations and considerations documentation (see the References section), for further information.
Introduction

Increasingly, customers wish to protect their business services from any event imaginable that would lead to downtime. Previously (i.e. prior to VPLEX), solutions to prevent downtime fell into two camps:
1. Highly available and fault tolerant systems within a datacenter
2. Disaster recovery solutions outside of a datacenter

The benefit of FT and HA solutions is that they provide automatic recovery in the event of a failure. However, the geographical protection range is limited to a single datacenter, therefore not protecting business services from a datacenter failure.

On the other hand, disaster recovery solutions typically protect business services using geographic dispersion so that if a datacenter fails, recovery would be achieved using another datacenter in a separate fault domain from the primary. Some of the drawbacks with disaster recovery solutions, however, are that they are human decision based (i.e. not automatic) and typically require a second, disruptive failback once the primary site is repaired. In other words, should a primary datacenter fail, the business would need to make a non-trivial decision to invoke disaster recovery.

Since disaster recovery is decision-based (i.e. manually invoked), it can lead to extended outages since the very decision itself takes time, and this decision is generally made at the business level involving key stakeholders. As most site outages are caused by recoverable events (e.g. an elongated power outage), faced with the "Invoke DR" decision some businesses choose not to invoke DR and to ride through the outage instead. This means that critical business IT services remain offline for the duration of the event. These types of scenarios are not uncommon in "disaster" situations, and non-invocation can be for various reasons. The two biggest ones are:
1. The primary site that failed can be recovered within 24-48 hours, therefore not warranting the complexity and risk of invoking DR.
2. Invoking DR will require a "failback" at some point in the future, which in turn will bring more disruption.

Other potential concerns to invoking disaster recovery include complexity, lack of testing, lack of resources, lack of skill sets and lengthy recovery time.

To avoid such pitfalls, VPLEX and VMware offer a more comprehensive answer to safeguarding your environments. By combining the benefits of HA and FT, a new category of availability is created. This new type of
category provides the automatic (non-decision based) benefits of FT and HA, but allows them to be leveraged over distance by using VPLEX Metro. This brings the geographical distance benefits normally associated with disaster recovery to the table, enhancing the HA and FT propositions significantly.

The new category is known as "Federated Availability" and enables bulletproof availability, which in turn significantly lessens the chance of downtime for both planned and unplanned events.
EMC VPLEX technology

VPLEX encapsulates traditional physical storage array devices and applies three layers of logical abstraction to them. The logical relationships of each layer are shown in Figure 1.

Extents are the mechanism VPLEX uses to divide storage volumes. Extents may be all or part of the underlying storage volume. EMC VPLEX aggregates extents and applies RAID protection in the device layer. Devices are constructed using one or more extents and can be combined into more complex RAID schemes and device structures as desired. At the top layer of the VPLEX storage structures are virtual volumes. Virtual volumes are created from devices and inherit the size of the underlying device. Virtual volumes are the elements VPLEX exposes to hosts using its Front End (FE) ports. Access to virtual volumes is controlled using storage views. Storage views are comparable to Auto-provisioning Groups on EMC Symmetrix® or to storage groups on EMC VNX®. They act as logical containers determining host initiator access to VPLEX FE ports and virtual volumes.

Figure 1 EMC VPLEX Logical Storage Structures
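To make the layering concrete, the following minimal Python sketch models the relationships described above (storage volume, extent, device, virtual volume, with a storage view controlling access). It is purely illustrative; the class and attribute names are this paper's shorthand, not VPLEX object names or any VPLEX API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class StorageVolume:          # LUN presented to VPLEX by a back-end array
    name: str
    size_gb: int

@dataclass
class Extent:                 # all or part of one storage volume
    source: StorageVolume
    size_gb: int

@dataclass
class Device:                 # one or more extents, optionally RAID protected
    name: str
    extents: List[Extent]
    raid: str = "raid-0"

    def size_gb(self) -> int:
        return sum(e.size_gb for e in self.extents)

@dataclass
class VirtualVolume:          # what VPLEX exposes on its front-end ports
    name: str
    device: Device

    def size_gb(self) -> int:  # inherits the size of the underlying device
        return self.device.size_gb()

@dataclass
class StorageView:            # maps host initiators and FE ports to virtual volumes
    name: str
    initiators: List[str]
    fe_ports: List[str]
    volumes: List[VirtualVolume] = field(default_factory=list)

# Example: one 500 GB array LUN carved into a single extent, device and virtual volume
lun = StorageVolume("array_A_lun_17", 500)
vvol = VirtualVolume("vv_esx_datastore_01", Device("dev_01", [Extent(lun, 500)]))
view = StorageView("esx_cluster_view", ["esx01_hba0"], ["FE-A0"], [vvol])
print(vvol.size_gb())   # 500 - the virtual volume inherits the device size
```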
VPLEX terms and Glossary

Term                       Definition

VPLEX Virtual Volume       Unit of storage presented by the VPLEX front-end ports to hosts.

VPLEX Distributed Volume   A single unit of storage presented by the VPLEX front-end ports of both VPLEX clusters in a VPLEX Metro configuration separated by distance.

VPLEX Director             The central processing and intelligence of the VPLEX solution. There are redundant (A and B) directors in each VPLEX Engine.

VPLEX Engine               Consists of two directors and is the unit of scale for the VPLEX solution.

VPLEX cluster              A collection of VPLEX engines in one rack.

VPLEX Metro                The cooperation of two VPLEX clusters, each serving their own storage domain, over synchronous distance, forming active/active distributed volume(s).

VPLEX Metro HA             As per VPLEX Metro, but configured with VPLEX Witness to provide fully automatic recovery from the loss of any failure domain. This can also be thought of as an active/active continuously available storage cluster over distance.

AccessAnywhere             The term used to describe a distributed volume using VPLEX Metro which has active/active characteristics.

Federation                 The cooperation of storage elements at a peer level over distance, enabling mobility, availability and collaboration.

Automatic                  No human intervention whatsoever (e.g. HA and FT).

Automated                  No human intervention required once a decision has been made (e.g. disaster recovery with VMware's SRM technology).
EMC VPLEX architecture

EMC VPLEX represents the next-generation architecture for data mobility and information access. The new architecture is based on EMC's more than 20 years of expertise in designing, implementing, and perfecting enterprise-class intelligent cache and distributed data protection solutions.

As shown in Figure 2, VPLEX is a solution for virtualizing and federating both EMC and non-EMC storage systems together. VPLEX resides between servers and heterogeneous storage assets (abstracting the storage subsystem from the host) and introduces a new architecture with these unique characteristics:
• Scale-out clustering hardware, which lets customers start small and grow big with predictable service levels
• Advanced data caching, which utilizes large-scale SDRAM cache to improve performance and reduce I/O latency and array contention
• Distributed cache coherence for automatic sharing, balancing, and failover of I/O across the cluster
• A consistent view of one or more LUNs across VPLEX clusters separated either by a few feet within a datacenter or across synchronous distances, enabling new models of high availability and workload relocation

Figure 2 Capability of an EMC VPLEX Local system to abstract heterogeneous storage (physical host layer, virtual storage layer (VPLEX), physical storage layer)
EMC VPLEX Metro overview

VPLEX Metro brings mobility and access across two locations separated by an inter-site round trip time of up to 5 milliseconds (host application permitting). VPLEX Metro uses two VPLEX clusters (one at each location) and includes the unique capability to support synchronous distributed volumes that mirror data between the two clusters using write-through caching.

Since a VPLEX Metro distributed volume is under the control of the VPLEX Metro advanced cache coherency algorithms, active data I/O access to the distributed volume is possible at either VPLEX cluster. VPLEX Metro is therefore a truly active/active solution which goes far beyond traditional active/passive legacy replication solutions.

VPLEX Metro distributes the same block volume to more than one location and ensures standard HA cluster environments (e.g. VMware HA and FT) can simply leverage this capability, and can therefore be easily and transparently deployed over distance too.

The key to this is to make the host cluster believe there is no distance between the nodes, so they behave exactly as they would in a single data center. This is known as "dissolving distance" and is a key deliverable of VPLEX Metro.

The other piece to delivering truly active/active FT or HA environments is an active/active network topology whereby Layer 2 of the same network resides in each location, giving truly seamless datacenter pooling. Whilst Layer 2 network stretching is a pre-requisite for any FT or HA solution based on VPLEX Metro, it is outside the scope of this document. Throughout the remainder of this document it is assumed that there is a stretched Layer 2 network between the datacenters where a VPLEX Metro resides.

Note: For technology options for stretching a Layer 2 network over distance, please see further information on Cisco Overlay Transport Virtualization (OTV), found here: http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/whitepaper/DCI_1.html, and Brocade Virtual Private LAN Service (VPLS), found here: http://www.brocade.com/downloads/documents/white_papers/Offering_Scalable_Layer2_Services_with_VPLS_and_VLL.pdf
Understanding VPLEX Metro active/active distributed volumes

Unlike traditional legacy replication, where access to a replicated volume is either in one location or another (i.e. an active/passive only paradigm), VPLEX distributes a virtual device over distance, which ultimately means host access is now possible in more than one location to the same (distributed) volume.

In engineering terms, the distributed volume that is presented from VPLEX Metro is said to have "single disk semantics", meaning that in every way (including failure) the disk will behave as one object, just as any traditional block device would. This therefore means that all the rules associated with a single disk are fully applicable to a VPLEX Metro distributed volume.

For instance, the following figure shows a single host accessing a single JBOD-type volume:

Figure 3 Single host access to a single disk

Clearly the host in the diagram is the only host initiator accessing the single volume.

The next figure shows a local two-node cluster.

Figure 4 Multiple host access to a single disk

As shown in the diagram, there are now two hosts contending for the single volume. The dashed rectangle in the figure shows that each of the nodes is required to be in a cluster or utilize a cluster file system so they can effectively coordinate locking to ensure the volume remains consistent.

The next figure shows the same two-node cluster, but now connected to a VPLEX distributed volume using VPLEX cache coherency technology.

Figure 5 Multiple host access to a VPLEX distributed volume

In this example there is no difference to the fundamental dynamics of the two-node cluster access pattern to the single volume. Additionally, as far as the hosts are concerned, they cannot see any difference between this and the previous example, since VPLEX is distributing the device between datacenters via AccessAnywhere™ (which is a type of federation). This means that the hosts are still required to coordinate locking to ensure the volume remains consistent.

For ESXi this mechanism is controlled by the cluster file system, Virtual Machine File System (VMFS), within each datastore. In this case each distributed volume is presented to the ESXi hosts and formatted with the VMFS file system.

The figure below shows a high-level physical topology of a VPLEX Metro distributed device.

Figure 6 Physical topology of a VPLEX Metro distributed device

This figure is a physical representation of the logical configuration shown in Figure 5. Effectively, with this topology deployed, the distributed volume can be treated just like any other volume, the only difference being that it is now distributed and available in two locations at the same time.

Another benefit of this type of architecture is extreme simplicity, since it is no more difficult to configure a cluster across distance than it is in a single data center.

Note: VPLEX Metro can use either 8Gb FC or native 10Gb Ethernet WAN connectivity (where the word "link" is written in the figure). When using FC connectivity, this can be configured either with a dedicated channel (i.e. separate, non-merged fabrics) or ISL based (i.e. where fabrics have been merged across sites). It is assumed that any WAN link will have a second, physically redundant circuit.

Note: It is vital that VPLEX Metro has enough bandwidth between clusters to meet requirements. EMC can assist in the qualification of this through the Business Continuity Solution Designer (BCSD) tool. Please engage your EMC account team to perform a sizing exercise.

For further details on VPLEX Metro architecture, please see the VPLEX HA TechBook found here: http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf
VPLEX Witness – An introduction

As mentioned previously, VPLEX Metro goes beyond the realms of legacy active/passive replication technologies since it can deliver true active/active storage over distance as well as federated availability. There are three main items that are required to deliver true "Federated Availability":
1. True active/active Fibre Channel block storage over distance.
2. Synchronous mirroring to ensure both locations are in lock step with each other from a data perspective.
3. External arbitration to ensure that under all failure conditions automatic recovery is possible.

The previous sections discussed items 1 and 2; we will now look at external arbitration, which is enabled by VPLEX Witness.

VPLEX Witness is delivered as a zero-cost VMware virtual appliance (vApp) which runs on a customer-supplied ESXi server. The ESXi server resides in a physically separate failure domain to either VPLEX cluster and uses different storage to the VPLEX clusters.

Using VPLEX Witness ensures that true Federated Availability can be delivered. This means that regardless of site or link/WAN failure, a copy of the data will automatically remain online in at least one of the locations.

When setting up a single distributed volume or a group of distributed volumes, the user chooses a "preference rule", which is a special property that each individual or group of distributed volumes has. It is the preference rule that determines the outcome after failure conditions such as site failure or link partition. The preference rule can be set to cluster A preferred, cluster B preferred, or no automatic winner.

At a high level, this has the following effect on a single or group of distributed volumes under different failure conditions, as listed below:
Preference rule       Cluster partition            Site A fails                 Site B fails
                      Site A       Site B          Site A       Site B          Site A       Site B
Cluster A preferred   ONLINE       SUSPENDED       FAILED       SUSPENDED       ONLINE       FAILED
                      (good)                       (bad, by design)             (good)
Cluster B preferred   SUSPENDED    ONLINE          FAILED       ONLINE          SUSPENDED    FAILED
                      (good)                       (good)                       (bad, by design)
No automatic winner   SUSPENDED (by design)        SUSPENDED (by design)        SUSPENDED (by design)

Table 1 Failure scenarios without VPLEX Witness

As we can see in Table 1 (above), if we only used the preference rules without VPLEX Witness, then under some scenarios manual intervention would be required to bring the volume online at a given VPLEX cluster (e.g. if site A is the preferred site, and site A fails, site B would also suspend). This is where VPLEX Witness assists, since it can better diagnose failures due to network triangulation and ensures that at any time at least one of the VPLEX clusters has an active path to the data, as shown in the table below:

Preference rule       Cluster partition            Site A fails                 Site B fails
                      Site A       Site B          Site A       Site B          Site A       Site B
Cluster A preferred   ONLINE       SUSPENDED       FAILED       ONLINE          ONLINE       FAILED
                      (good)                       (good)                       (good)
Cluster B preferred   SUSPENDED    ONLINE          FAILED       ONLINE          ONLINE       FAILED
                      (good)                       (good)                       (good)
No automatic winner   SUSPENDED (by design)        SUSPENDED (by design)        SUSPENDED (by design)

Table 2 Failure scenarios with VPLEX Witness

As one can see from Table 2, VPLEX Witness converts a VPLEX Metro from an active/active mobility and collaboration solution into an active/active continuously available storage cluster. Furthermore, once VPLEX Witness is deployed, failure scenarios become self-managing (i.e. fully automatic), which makes operations extremely simple since there is nothing to do regardless of the failure condition.
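The outcome logic in Tables 1 and 2 can be summarized in a few lines of code. The sketch below is simply a restatement of the two tables for illustration (the function and value names are this paper's own shorthand); it is not VPLEX's actual arbitration algorithm.

```python
def volume_state(preference, scenario, witness_deployed):
    """Return (site_a_state, site_b_state) for a distributed volume.

    preference : "A", "B" or None (no automatic winner)
    scenario   : "partition", "site_a_fails" or "site_b_fails"
    The values restate Tables 1 and 2; they are not a VPLEX implementation.
    """
    if preference is None:                       # "no automatic winner"
        a = "SUSPENDED" if scenario != "site_a_fails" else "FAILED"
        b = "SUSPENDED" if scenario != "site_b_fails" else "FAILED"
        return a, b

    if scenario == "partition":                  # inter-cluster link lost
        return ("ONLINE", "SUSPENDED") if preference == "A" else ("SUSPENDED", "ONLINE")

    if scenario == "site_a_fails":
        # Without the Witness, site B only stays online if it was already preferred.
        b = "ONLINE" if (witness_deployed or preference == "B") else "SUSPENDED"
        return "FAILED", b

    if scenario == "site_b_fails":
        a = "ONLINE" if (witness_deployed or preference == "A") else "SUSPENDED"
        return a, "FAILED"

# The problematic case from Table 1: the preferred site A fails and no Witness is deployed.
print(volume_state("A", "site_a_fails", witness_deployed=False))  # ('FAILED', 'SUSPENDED')
print(volume_state("A", "site_a_fails", witness_deployed=True))   # ('FAILED', 'ONLINE')
```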
Figure 7 below shows the high-level topology of VPLEX Witness.

Figure 7 VPLEX configured for VPLEX Witness

As depicted in Figure 7, the Witness VM is deployed in a separate fault domain (as defined by the customer) and connected to both VPLEX management stations via an IP network.

Note: The fault domain is defined by the customer and can range from different racks in the same datacenter all the way up to VPLEX clusters 5ms of distance away from each other (5ms measured round trip time latency, or typical synchronous distance). The distance that VPLEX Witness can be placed from the two VPLEX clusters can be even greater: the currently supported maximum round trip latency to the Witness is 1 second.
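The latency limits quoted in this paper (up to 5 ms RTT between VPLEX clusters for VMware HA, 1 ms for VMware FT / cross-connect, and up to 1 second from either cluster to the Witness) are easy to sanity-check during planning. The following sketch is a simple illustration of such a check; the measured RTT values are assumed inputs (e.g. from ping or a WAN test tool), not something the sketch gathers itself.

```python
# Round trip time limits referenced in this paper (milliseconds)
LIMITS_MS = {
    "metro_ha_inter_cluster": 5.0,    # VPLEX Metro with VMware HA
    "ft_cross_connect": 1.0,          # VMware FT / cross-connected campus
    "witness_to_cluster": 1000.0,     # VPLEX Witness in the third failure domain
}

def check_rtt(measured_ms: dict) -> None:
    """Compare measured RTTs (supplied by the planner) against the documented limits."""
    for link, limit in LIMITS_MS.items():
        rtt = measured_ms.get(link)
        if rtt is None:
            print(f"{link}: no measurement supplied")
        elif rtt <= limit:
            print(f"{link}: {rtt} ms is within the {limit} ms limit")
        else:
            print(f"{link}: {rtt} ms exceeds the {limit} ms limit - review the design")

# Hypothetical measurements for a two-site campus plus a remote Witness site
check_rtt({"metro_ha_inter_cluster": 2.8,
           "ft_cross_connect": 0.6,
           "witness_to_cluster": 180.0})
```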
Figure 8 below shows a more detailed connectivity diagram of VPLEX Witness.

Figure 8 Detailed VPLEX Witness network layout (the Witness must reside in a separate fault domain)

The Witness network is physically separate from the VPLEX inter-cluster network and also uses storage that is physically separate from either VPLEX cluster. As stated previously, it is critical to deploy VPLEX Witness into a third failure domain. The definition of this domain changes depending on where the VPLEX clusters are deployed. For instance, if the VPLEX Metro clusters are deployed into the same physical building, but perhaps in different areas of the datacenter, then the failure domain would be deemed the VPLEX rack itself. In that case VPLEX Witness could also be deployed into the same physical building, but in a separate rack.

If, however, each VPLEX cluster is deployed 50 miles apart in totally different buildings, then the failure domain would be the physical building and/or town. In this scenario it makes sense to deploy VPLEX Witness in another town altogether; and since the maximum supported round trip latency can be as much as one second, you could effectively pick almost any city in the world, especially given the bandwidth requirement is as low as 3 Kb/sec.
For more in-depth VPLEX Witness architecture details, please refer to the VPLEX HA TechBook, which can be found here: http://www.emc.com/collateral/hardware/technical-documentation/h7113-vplex-architecture-deployment.pdf

Note: Always deploy VPLEX Witness in a third failure domain and ensure that all distributed volumes reside in a consistency group with the Witness function enabled. Also ensure that the EMC Secure Remote Support (ESRS) Gateway is fully configured and that the Witness has the capability to alert if it fails for whatever reason (there is no impact to I/O if the Witness fails).

Protecting VPLEX Witness using VMware FT

Under normal operational conditions VPLEX Witness is not a vital component required to drive active/active I/O (i.e. if the Witness is disconnected or lost, I/O still continues). It does, however, become a crucial component to ensure availability in the event of site loss at either of the locations where the VPLEX clusters reside.

If, for whatever reason, the VPLEX Witness was lost and soon after there was a catastrophic site failure at a site containing a VPLEX cluster, then the hosts at the remaining site would also lose access to the remaining VPLEX volumes, since the remaining VPLEX cluster would consider itself isolated while the VPLEX Witness is also unavailable.

To minimize this risk, it is considered best practice to disable the VPLEX Witness function if it has been lost and will remain offline for a long time. Another way to ensure availability is to minimize the risk of a VPLEX Witness loss in the first place by increasing the availability of the VPLEX Witness VM running in the third location.

A way to significantly boost availability for this individual VM is to use VMware FT to protect VPLEX Witness at the third location. This ensures that the VPLEX Witness remains unaffected should a hardware failure occur to the ESXi server in the third failure domain that is supporting the VPLEX Witness VM.

To deploy this functionality, simply enable ESXi HA clustering for the VPLEX Witness VM across two or more ESXi hosts (in the same location), and once this has been configured, right-click the VPLEX Witness VM and enable Fault Tolerance. A scripted equivalent is sketched below.
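For environments that prefer scripting over the vSphere client, the same "Turn On Fault Tolerance" action can be driven through the vSphere API. The sketch below uses the open-source pyVmomi bindings and assumes the FT turn-on maps to the VirtualMachine CreateSecondaryVM_Task method; the host name, credentials and VM name are placeholders, and the snippet is an illustrative sketch rather than an EMC- or VMware-documented procedure.

```python
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_vm(content, name):
    """Walk the inventory and return the first VM with a matching name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vms = list(view.view)   # a production script would also destroy the view afterwards
    return next((vm for vm in vms if vm.name == name), None)

# Placeholder connection details for the vCenter managing the third-site ESXi hosts
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="password")      # add SSL context handling as appropriate
try:
    content = si.RetrieveContent()
    witness_vm = find_vm(content, "VPLEX-Witness")   # hypothetical VM name
    if witness_vm is None:
        raise SystemExit("Witness VM not found")

    # Turn On Fault Tolerance: vSphere creates the secondary VM on another host
    # in the (third-site) HA cluster. Equivalent to the right-click action above.
    task = witness_vm.CreateSecondaryVM_Task()
    print("FT secondary creation requested:", task.info.state)
finally:
    Disconnect(si)
```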
Note: At the time of writing, the FT configuration for VPLEX Witness is only supported within one location and not as a stretched / federated FT configuration. The storage that the VPLEX Witness uses should be physically contained within the boundaries of the third failure domain on local (i.e. not VPLEX Metro distributed) volumes. Additionally, it should be noted that HA alone is currently not supported for the Witness VM; only FT-protected or unprotected configurations are.
VPLEX Metro HA

As discussed in the two previous sections, VPLEX Metro is able to provide active/active distributed storage; however, we have seen that in some cases, depending on the failure, loss of access to the storage volume could occur if the preferred site fails, causing the non-preferred site to suspend too. Using VPLEX Witness overcomes this scenario and ensures that access to a VPLEX cluster is always maintained regardless of which site fails.

VPLEX Metro HA describes a VPLEX Metro solution that has also been deployed with VPLEX Witness. As the name suggests, VPLEX Metro HA effectively delivers truly available distributed storage volumes over distance and forms a solid foundation for additional layers of VMware technology such as HA and FT.

Note: It is assumed that all topologies discussed within this white paper use VPLEX Metro HA (i.e. VPLEX Metro and VPLEX Witness). This is mandatory to ensure fully automatic (i.e. decision-less) recovery under all the failure conditions outlined within this document.

VPLEX Metro cross cluster connect

Another important feature of VPLEX Metro that can be optionally deployed within a campus topology (i.e. up to 1ms RTT) is cross cluster connect.

Note: At the time of writing, cross-connect is a mandatory requirement for VMware FT implementations.

This feature pushes VPLEX Metro HA to an even greater level of availability, since an entire VPLEX cluster failure at a single location would not cause an interruption to host I/O at either location (using either VMware FT or HA).

Figure 9 below shows the topology of a cross-connected configuration:
Figure 9 VPLEX Metro deployment with cross-connect

As we can see in the diagram, the cross-connect offers an alternate path or paths from each ESXi server to the remote VPLEX cluster. This ensures that if, for any reason, an entire VPLEX cluster were to fail (which is unlikely since there is no single point of failure within a cluster), there would be no interruption to I/O, since the remaining VPLEX cluster will continue to service I/O across the remote cross link (alternate path).

It is recommended when deploying cross-connect that, rather than merging fabrics and using an Inter-Switch Link (ISL), additional host bus adapters (HBAs) should be used to connect directly to the remote datacenter's switch fabric. This ensures that fabrics do not merge and span failure domains.

Another important point to remember for cross-connect is that it is only supported for campus environments up to 1ms round trip time.

Note: When setting up cross-connect, each ESXi server will see double the paths to the datastore (50% local and 50% remote). It is best practice to set the pathing policy to fixed and mark the remote paths across to the other cluster as passive (standby). This ensures that the workload remains balanced and that each host commits I/O to only a single VPLEX cluster at any one time. An example of this configuration is sketched below.
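A minimal sketch of how the note above might be applied from the ESXi command line follows. The device identifier and path runtime names are hypothetical, and the exact procedure should be validated against the VPLEX and vSphere documentation for the versions in use; the sketch simply selects the Fixed path selection policy and prefers a local path, leaving the cross-connect paths to act as standby.

```python
# Hypothetical identifiers: the VPLEX distributed volume as seen by one ESXi host,
# a path to the local VPLEX cluster and a path reached over the cross-connect.
DEVICE      = "naa.6000144000000010f00000000000abcd"
LOCAL_PATH  = "vmhba1:C0:T0:L0"   # front-end port on the local VPLEX cluster
REMOTE_PATH = "vmhba2:C0:T4:L0"   # listed for context only; under the Fixed policy
                                  # the non-preferred (remote) paths are used only
                                  # if the preferred path becomes unavailable

# esxcli commands an administrator might run on each host (one device shown):
commands = [
    # Use the Fixed path selection policy for this device
    f"esxcli storage nmp device set --device {DEVICE} --psp VMW_PSP_FIXED",
    # Prefer a path that terminates on the local VPLEX cluster
    f"esxcli storage nmp psp fixed deviceconfig set --device {DEVICE} --path {LOCAL_PATH}",
]

for cmd in commands:
    print(cmd)   # or run via SSH / host profiles once validated for your environment
```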
Unique VPLEX benefits for availability and I/O response time

VPLEX is built from the ground up to perform block storage distribution over long distances at enterprise scale and performance. One of the unique core principles of VPLEX that enables this is its underlying and extremely efficient cache coherency algorithms, which enable an active/active topology without compromise. Since VPLEX is architecturally unique among virtual storage products, two simple categories are used to distinguish between the architectures.

Uniform and non-uniform I/O access

Essentially, these two categories are a way to describe the I/O access pattern from the host to the storage system when using a stretched or distributed cluster configuration. VPLEX Metro (under normal conditions) follows what is technically known as a non-uniform access pattern, whereas other products that function differently from VPLEX follow what is known as a uniform I/O access pattern. On the surface, both types of topology seem to deliver active/active storage over distance; however, it is only the non-uniform category that delivers true active/active access, and this carries some significant benefits over uniform-type solutions.

The terms are defined as follows:
1. Uniform access – All I/O is serviced by the same single storage controller, therefore all I/O is sent to or received from the same location, hence the term "uniform". Typically this involves "stretching" dual-controller active/passive architectures.
2. Non-uniform access – I/O can be serviced by any available storage controller at any given location; therefore I/O can be sent to or received from any storage target location, hence the term "non-uniform". This is derived from "distributing" multiple active controllers/directors in each location.

To understand this in greater detail and to quantify the benefits of non-uniform access, we must first understand uniform access.

Uniform access (non-VPLEX)

Uniform access works in a very similar way to a dual-controller array that uses an active/passive storage controller. With such an array a host would
generally be connected to both controllers in an HA configuration, so that if one failed the other would continue to process I/O. However, since the secondary storage controller is passive, no write or read I/O can be propagated to it or from it under normal operations. The other thing to understand here is that these types of architectures typically use cache mirroring, whereby any write I/O to the primary controller/director is synchronously mirrored to the secondary controller for redundancy.

Next, imagine taking a dual-controller active/passive array and physically splitting the controllers apart, stretching the array over distance so that the active controller resides in site A and the secondary controller resides in site B.

The first thing to note is that we now only have a single controller at either location, so we have already compromised the local HA ability of the solution, since each location now has a single point of failure.

The next challenge is to maintain host access to both controllers from either location. Let's suppose we have an ESXi server in site A and a second one in site B. If the only active storage controller resides at site A, then we need to ensure that hosts in both site A and site B have access to the storage controller in site A (uniform access). This is important since, if we want to run a host workload at site B, we will need an active path to connect it back to the active controller in site A, because the controller at site B is passive. This may be handled by a standard FC ISL which stretches the fabric across sites. Additionally, we will also require a physical path from the ESXi hosts in site A to the passive controller at site B, so that if there is a controller failure at site A, the controller at site B can service I/O.

As discussed in the previous section, this type of configuration is known as "uniform access" since all I/O for any given storage volume will be serviced uniformly by the exact same controller, passing all I/O to and from the same location. The diagram in Figure 10 below shows a typical example of a uniform architecture.
Figure 10 A typical uniform layout (split active/passive controllers with stretched fabrics)

As we can see in the diagram, hosts at each site connect to both controllers by way of the stretched fabrics; however, the active controller (for any given LUN) is only at one of the sites (in this case site A).

While not as efficient (in bandwidth and latency) as VPLEX, under normal operating conditions (i.e. where the active host is at the same location as the active controller) this type of configuration functions satisfactorily. However, this access pattern starts to become sub-optimal if the active host is propagating I/O at the location where the passive controller resides.

Figure 11 shows the numbered sequence of I/O flow for a host connected to a uniform configuration at the local (i.e. active) site.

Figure 11 Uniform write I/O flow example at the local site
The steps below correspond to the numbers in the diagram:
1. I/O is generated by the host at site A and sent to the active controller in site A.
2. The I/O is committed to local cache and synchronously mirrored to the remote cache over the WAN.
3. The local (active) controller's back end now mirrors the I/O to the back-end disks. It does this by committing a copy to the local array as well as sending another copy of the I/O across the WAN to the remote array.
4. The acknowledgments from the back-end disks return to the owning storage controller.
5. The acknowledgment is received by the host and the I/O is complete.

Now, let's look at a write I/O initiated from the ESXi host at location B, where the controller owning the LUN receiving the I/O resides at site A. The concern here is that each write at the passive site B has to traverse the link and be acknowledged back from site A. Before the acknowledgment can be given back to the host at site B from the controller at site A, the storage system has to synchronously mirror the I/O back to the controller in site B (both cache and disk), thereby incurring more round trips over the WAN. This ultimately increases the response time (i.e. negatively impacts performance) and bandwidth utilization.

The numbered sequence in Figure 12 shows a typical I/O flow for a host connected to a uniform configuration at the remote (i.e. passive) site.
Figure 12 Uniform write I/O flow example at the remote site

The following steps correspond to the numbers in the diagram:
1. I/O is generated by the host at site B and sent across the ISL to the active controller at site A.
2. The I/O is received by the controller at site A from the ISL.
3. The I/O is committed to local cache, mirrored to the remote cache over the WAN, and acknowledged back to the active controller in site A.
4. The active controller's back end now mirrors the I/O to the back-end disks at both locations. It does this by committing a copy to the local array as well as sending another copy of the I/O across the WAN to the remote array (this step may sometimes be asynchronous).
5. Both write acknowledgments are sent back to the active controller (back across the ISL).
6. The acknowledgment is returned to the host and the I/O is complete.

Clearly, if a uniform access device backs a VMware datastore with ESXi hosts at either location, I/O could be propagated from both locations simultaneously (e.g. if a VM were vMotioned to the remote location leaving at least one VM online at the previous location in the same datastore). Therefore, in a uniform deployment, I/O response time at the passive location will always be worse (perhaps significantly) than I/O response time at the active location. Additionally, I/O at the passive site could use up to three times the bandwidth of an I/O at the active controller site, due to the need to mirror the disk and cache as well as send the I/O across the ISL in the first place.

Non-uniform access (VPLEX I/O access pattern)

While VPLEX can be configured to provide uniform access, the typical VPLEX Metro deployment uses non-uniform access. VPLEX was built from the ground up for extremely efficient non-uniform access. This means it has a different hardware and cache architecture relative to uniform access solutions and, contrary to what you might have already read about non-uniform access clusters, provides significant advantages over uniform access for several reasons:
1. All controllers in a VPLEX distributed cluster are fully active. Therefore, if an I/O is initiated at site A, the write happens on the director in site A directly and is mirrored to site B before the acknowledgment is given. This ensures minimal response time and bandwidth usage (up to 3x better compared to uniform access) regardless of where the workload is running.
2. A cross-connection where hosts at site A connect to the storage controllers at site B is not a mandatory requirement (unless using VMware FT). Additionally, with VPLEX, if a cross-connect is deployed it is only used as a last resort in the unlikely event that a full VPLEX cluster has been lost (this would be deemed a double failure since a single VPLEX cluster has no single points of failure) or the WAN has failed/been partitioned.
3. Non-uniform access uses less bandwidth and gives better response times when compared to uniform access since, under normal conditions, all I/O is handled by the local active controllers (all controllers are active) and sent across to the remote site only once. It is important to note that both read and write I/O is serviced locally within VPLEX Metro.
4. Due to the active/active nature of VPLEX, should a full site outage occur, VPLEX does not need to perform a failover, since the remaining copy of the data was already active. This is another key difference when compared to uniform access, where the loss of the primary active node requires a failover to the passive node.

The diagram below shows a high-level architecture of VPLEX when distributed over a Metro distance:
Figure 13 VPLEX non-uniform access layout

As we can see in Figure 13, each host is connected only to the local VPLEX cluster, ensuring that I/O from either location is always serviced by the local storage controllers. VPLEX can achieve this because all of the controllers (at both sites) are in an active state and able to service I/O. Some other key differences to observe from the diagram are:

1. Storage devices behind VPLEX are connected only to their respective local VPLEX cluster and are not connected across the WAN, dramatically simplifying fabric design.
2. VPLEX has dedicated redundant WAN ports that can be connected natively to either 10 Gb Ethernet or 8 Gb Fibre Channel.
3. VPLEX has multiple active controllers in each location, ensuring there are no local single points of failure. With up to eight controllers in each location, VPLEX provides N+1 redundancy.
4. VPLEX uses and maintains single-disk semantics across clusters at two different locations.

I/O flow is also very different, and more efficient, when compared to uniform access, as the diagram below highlights.
Figure 14 High-level VPLEX non-uniform write I/O flow

The steps below correspond to the numbers in Figure 14:

1. Write I/O is generated by the host at either site and sent to one of the local VPLEX controllers (depending on path policy).
2. The write I/O is duplicated and sent to the remote VPLEX cluster.
3. Each VPLEX cluster now has a copy of the write I/O, which is written through to the back-end array at each location. The site A VPLEX does this for the array at site A, while the site B VPLEX does this for the array at site B.
4. Once the remote VPLEX cluster has acknowledged back to the local cluster, the acknowledgement is sent to the host and the I/O is complete.

Note: Under some conditions, depending on the access pattern, VPLEX may encounter what is known as a local write miss condition. This does not necessarily add another step, as the remote cache page owner is invalidated as part of the write-through caching activity. In effect, VPLEX is able to accomplish several distinct tasks through a single cache update messaging step.
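As a companion to the uniform-access sketch earlier, the fragment below models the same write against the four-step non-uniform flow, under the same illustrative assumptions (3 ms RTT, two protocol round trips per inter-cluster exchange). Again, this is a simplified model and not actual VPLEX behaviour.

```python
# Companion sketch to the uniform-access model: the non-uniform write flow in
# steps 1-4 under the same illustrative assumptions.

RTT_MS = 3.0
ROUND_TRIPS = 2              # assumed protocol round trips per inter-cluster exchange
HOP_MS = RTT_MS * ROUND_TRIPS

def non_uniform_write(io_kb=128):
    """Write issued at either site; the local directors are always active."""
    latency_ms = 0.0         # step 1: host -> local director, treated as negligible
    latency_ms += HOP_MS     # step 2: single inter-cluster transfer of the write
    wan_kb = io_kb           #         the payload crosses the WAN exactly once
    # step 3: each cluster writes through to its own local array - no WAN traffic
    # step 4: the remote acknowledgement is part of the exchange counted above
    return latency_ms, wan_kb

print(non_uniform_write())   # (6.0, 128) regardless of which site issued the write
```

The added latency and WAN payload are the same whichever site issued the write; that symmetry is what the table below quantifies.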
The table below shows a broad comparison of the expected increase in response time (in milliseconds) for both uniform and non-uniform layouts when using an FC link with a 3 ms round-trip time (and without any form of external WAN acceleration / fast-write technology). These numbers are additional overhead when compared to a local storage system of the same hardware, since I/O now has to be sent across the link.

Additional RT overhead in ms                 Site A           Site B
(based on 3 ms RTT and 2 round trips/IO)   read   write     read   write
Full Uniform (sync mirror)                   0      12        6      18
Full Uniform (async mirror)                  0       6        6      12
Non-Uniform (owner hit)                      0       6*       0       6*

* This is comparable to standard synchronous active/passive replication.

Table 3 Uniform vs. non-uniform response time increase

Note: Table 3 only shows the expected additional latency of the I/O on the WAN and does not include any other overheads, such as data propagation delay or additional machine time at either location for remote copy processing. Your mileage will vary.

As we can see in Table 3, topologies that use a uniform access pattern and a synchronous disk mirror can add significantly more time to each I/O, increasing the response time by as much as 3x compared to non-uniform.

Note: VPLEX Metro environments can also be configured using native IP connectivity between sites. This type of topology carries further response time efficiencies, since each I/O across the WAN typically incurs only a single round trip.
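The effect of the native IP option mentioned in the note above can be expressed with the same toy model used earlier: halving the assumed number of protocol round trips per inter-cluster exchange halves the added write latency. The figures are illustrative, not measured.

```python
# Same illustrative assumptions as the earlier sketches, varying only the
# assumed number of protocol round trips per inter-cluster exchange.
RTT_MS = 3.0
for label, round_trips in [("FC WAN, 2 round trips", 2), ("native IP WAN, 1 round trip", 1)]:
    print(label, "-> added non-uniform write latency:", RTT_MS * round_trips, "ms")
```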
Another factor to consider when comparing the two topologies is the amount of WAN bandwidth used. The table below shows a comparison between a full uniform topology and a VPLEX non-uniform topology for bandwidth utilization. The example I/O size is 128 KB and the results are also shown in KB.

WAN bandwidth used for a 128 KB I/O          Site A           Site B
                                            read   write     read   write
Full Uniform (sync or async mirror)          0      256       128    384
Non-Uniform                                  0      128*        0    128*

* This is comparable to standard synchronous active/passive replication.

Table 4 Uniform vs. non-uniform bandwidth usage

As one can see from Table 4, non-uniform access always performs local reads and only has to send the data payload once across the WAN for a write I/O, regardless of where the data was written. This is in stark contrast to a uniform topology, especially if the write occurs at the site with the passive controller: the data has to be sent once across the WAN (ISL) to the active controller, which then mirrors the cache page (synchronously over the WAN again) as well as mirroring the underlying storage back over the WAN, giving an overall 3x increase in WAN traffic compared to non-uniform.

VPLEX with cross-connect and non-uniform mode

A VPLEX Metro configuration with a cross cluster connect (supported up to 1 ms round-trip time) is sometimes referred to as "VPLEX in uniform mode," since each ESXi host is now connected to both the local and the remote VPLEX clusters.

While on the surface this does look similar to uniform mode, it still typically functions in a non-uniform manner. This is because, under the covers, all VPLEX directors remain active and able to serve data locally, maintaining the efficiencies of the VPLEX cache-coherent architecture. Additionally, when using cross-connected clusters it is recommended to configure the ESXi servers so that the cross-connected paths are only standby paths. Therefore, even with a VPLEX cross-connected configuration, I/O flow is still serviced locally from each local VPLEX cluster and does not traverse the link.
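One way to keep the cross-connected paths passive on an ESXi 5.x host is to select the fixed path selection policy and mark a path to the local VPLEX cluster as preferred. The sketch below simply assembles the corresponding esxcli calls; the device identifier and path name are placeholders, and the supported multipathing policy for a given environment should be taken from the VPLEX best-practice documentation rather than from this illustration.

```python
# Illustrative sketch only: mark a local path as preferred so that the
# cross-connected (remote) paths act as standby. Device and path names are
# placeholders; confirm the supported policy in the VPLEX best-practice guides.
import subprocess

DEVICE = "naa.60001440000000000000000000000000"   # placeholder VPLEX volume ID
LOCAL_PATH = "vmhba2:C0:T0:L0"                    # placeholder path to the local cluster

def prefer_local_path(device=DEVICE, path=LOCAL_PATH):
    # Select the FIXED path selection policy for the device...
    subprocess.check_call(["esxcli", "storage", "nmp", "device", "set",
                           "--device", device, "--psp", "VMW_PSP_FIXED"])
    # ...and pin the preferred path to a local VPLEX front-end port.
    subprocess.check_call(["esxcli", "storage", "nmp", "psp", "fixed",
                           "deviceconfig", "set",
                           "--device", device, "--path", path])

if __name__ == "__main__":
    prefer_local_path()
```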
The diagram below shows an example of this layout:

Figure 15 High-level VPLEX cross-connect with non-uniform I/O access

In Figure 15, each ESXi host now has an alternate path to the remote VPLEX cluster. Compared to the typical uniform diagram in the previous section, however, the underlying VPLEX architecture differs significantly since it remains identical to the non-uniform layout, servicing I/O locally at either location.

VPLEX with cross-connect and forced uniform mode

Although VPLEX functions primarily in a non-uniform model, there are certain conditions where VPLEX can sustain a type of uniform access mode. One such condition is when cross-connect is used and certain failures occur, forcing uniform mode.

One of the scenarios where this may occur is when VPLEX and the cross-connect network use physically separate channels and the VPLEX clusters are partitioned while the cross-connect network remains in place. The diagram below shows an example of this:
Figure 16 Forced uniform mode due to WAN partition

As illustrated in Figure 16, VPLEX will invoke the site preference rule, suspending access to a given distributed virtual volume at one of the locations (in this case site B). This ultimately means that I/O at site B has to traverse the link to site A, since the VPLEX controller path at site B is now suspended due to the preference rule.

Another scenario where this might occur is if one of the VPLEX clusters at either location becomes isolated or destroyed. The diagram below shows an example of a localized rack failure at site B which has taken the VPLEX cluster at site B offline.

Figure 17 VPLEX forced uniform mode due to cluster failure

In this scenario the VPLEX cluster remains online at site A (through VPLEX Witness) and any I/O at site B will automatically access the VPLEX cluster at site A over the cross-connect, thereby turning the standby path into an active path.
In summary, VPLEX can use forced uniform mode as a failsafe to ensure that the highest possible level of availability is maintained at all times.

Note: Cross-connected VPLEX clusters are only supported at distances of up to 1 ms round-trip time.
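Before moving on, it may help to condense how the preference rule and VPLEX Witness interact in the scenarios above. The function below is a deliberately simplified reading of that behaviour for a single distributed volume with site A preferred; it is not the actual VPLEX algorithm, which handles more cases (for example Witness isolation timers) than this illustration.

```python
# Simplified decision model for one distributed volume with a "site A preferred"
# rule, based on the behaviour described in this section. Real VPLEX logic
# covers more cases than this sketch.

def io_continues_at(site, partition=False, site_a_failed=False, site_b_failed=False,
                    witness_available=True, preferred="A"):
    """Return True if the distributed volume keeps servicing I/O at `site` ('A' or 'B')."""
    if (site == "A" and site_a_failed) or (site == "B" and site_b_failed):
        return False                      # the querying site itself is down
    if site_a_failed or site_b_failed:
        # Full site outage at the peer: Witness observes the failure and allows
        # the survivor to continue even if it is not the preferred cluster.
        return witness_available or site == preferred
    if partition:
        return site == preferred          # WAN partition: only the preferred site continues
    return True                           # no failure: both sites remain active

# WAN partition (Figure 16): site A keeps running, site B suspends.
print(io_continues_at("A", partition=True))      # True
print(io_continues_at("B", partition=True))      # False
# VPLEX cluster at site B lost (Figure 17): site A continues.
print(io_continues_at("A", site_b_failed=True))  # True
# Preferred site A lost: Witness lets the non-preferred survivor continue.
print(io_continues_at("B", site_a_failed=True))  # True
```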
Combining VPLEX HA with VMware HA and/or FT

Due to its core design, EMC VPLEX Metro provides the perfect foundation for VMware Fault Tolerance and High Availability clustering over distance, ensuring simple and transparent deployment of stretched clusters without any added complexity.

vSphere HA and VPLEX Metro HA (federated HA)

VPLEX Metro takes a single block storage device in one location and "distributes" it to provide single-disk semantics across two locations. This enables a "distributed" VMFS datastore to be created on that virtual volume.

On top of this, if the layer 2 network has also been "stretched," then a single vSphere instance (including a single logical datacenter) can also be "distributed" across more than one location and HA enabled for any given vSphere cluster. This is possible because the storage federation layer of VPLEX is completely transparent to ESXi. It therefore enables the user to add ESXi hosts at two different locations to the same HA cluster.

Stretching an HA failover cluster (such as VMware HA) with VPLEX creates a "federated HA" cluster over distance. This blurs the boundaries between local HA and disaster recovery, since the configuration has the automatic restart capabilities of HA combined with the geographical distance typically associated with synchronous DR.

Figure 18 VPLEX Metro HA with vSphere HA

For detailed technical setup instructions please see the VPLEX Procedure Generator (Configuring a distributed volume) as well as the "VMware vSphere® Metro Storage Cluster Case Study" white paper found here:
http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STOR-CLSTR-USLET-102-HI-RES.pdf for additional information around:

• Setting up Persistent Device Loss (PDL) handling (see the sketch after this list)
• vCenter placement options and considerations
• DRS enablement and affinity rules
• Controlling restart priorities (High/Medium/Low)
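On the first bullet, the referenced case study describes two settings that let vSphere HA restart virtual machines whose datastore device enters a Permanent Device Loss state. The snippet below is only a reminder of where those options live; the option names follow the vSphere 5.0 Update 1 era guidance and should be validated against the current VMware and EMC documentation before use.

```python
# Reminder sketch for the PDL-handling options discussed in the referenced
# case study (vSphere 5.0 U1 era). Treat the values as examples and verify
# them against current VMware / EMC guidance before applying.

# Per-ESXi-host option (set in /etc/vmware/settings): power off VMs whose
# backing device reports Permanent Device Loss so that HA can restart them.
HOST_SETTINGS = {"disk.terminateVMOnPDLDefault": "True"}

# vSphere HA cluster advanced option: treat the PDL-triggered power-off as an
# unclean shutdown so the affected VMs are eligible for restart elsewhere.
HA_ADVANCED_OPTIONS = {"das.maskCleanShutdownEnabled": "True"}

for name, value in list(HOST_SETTINGS.items()) + list(HA_ADVANCED_OPTIONS.items()):
    print(name, "=", value)
```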
Use Cases for federated HA

A federated HA solution is an ideal fit when a customer has two datacenters that are no more than 5 ms (round-trip latency) apart and wants to enable an active/active datacenter design whilst also significantly enhancing availability.

This type of solution brings several key business continuity benefits, including downtime and disaster avoidance as well as fully automatic service restart in the event of a total site outage. The configuration also needs to be deployed with a stretched layer 2 network to ensure seamless capability regardless of which location a VM runs in.

Datacenter pooling using DRS with federated HA

A valuable feature of the federated HA solution is that VMware DRS (Distributed Resource Scheduler) can be enabled and will function relatively transparently within the stretched cluster.

Using DRS effectively means that the vCenter/ESXi server load can be distributed over two separate locations, driving up utilization and using all available, formerly passive, assets. With DRS enabled, the configuration can be considered as two physical datacenters acting as a single logical datacenter. This has significant benefits since it brings the ability to use what were once passive assets at a remote location in a fully active state.

To enable this functionality, DRS can simply be switched on within the stretched cluster and configured by the user to the desired automation level. Depending on the setting, VMs will then automatically start to distribute between the datacenters (please read http://www.vmware.com/files/pdf/techpaper/vSPHR-CS-MTRO-STOR-CLSTR-USLET-102-HI-RES.pdf for more details).

Note: A design consideration to take into account if DRS is desired within a solution is to ensure that there are enough compute and network resources at each location to take the full load of the business services should either site fail.

Avoiding downtime and disasters using federated HA and vMotion

Another benefit of a federated HA solution with vSphere is the ability to avoid planned as well as unplanned downtime. This is achievable using the vMotion capability of vCenter to move a running VM (or group of VMs) to any ESXi server in another (physical) datacenter. Since the vMotion capability is now federated over distance, planned downtime can be avoided for events that affect an entire datacenter location.

For instance, suppose we needed to perform a power upgrade at datacenter A which would result in the power being offline for two hours. Downtime can be avoided, since all running VMs at site A can be moved to site B before the outage. Once the outage has ended, the VMs can be moved back to site A using vMotion while keeping everything completely online.

This use case can also be employed for anticipated, yet unplanned, events. For instance, if a hurricane is in close proximity to your datacenter, this solution brings the ability to move the VMs elsewhere, avoiding any potential disaster.

Note: During a planned event where power will be taken offline, it is best to engage EMC support to bring the VPLEX down gracefully. However, in a scenario where time does not permit (perhaps a hurricane), it may not be possible to involve EMC support. In this case, if site A were destroyed there would still be no interruption, assuming the VMs were vMotioned ahead of time, since VPLEX Witness would ensure that the site that remains online keeps full access to the storage volume once site A has been powered off. Please see the Failure scenarios and recovery using federated HA section below for more details.
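As a rough illustration of the evacuation use case above, the sketch below uses pyVmomi (which post-dates this paper; PowerCLI or the vSphere Client achieve the same result) to vMotion every running VM off the site A hosts onto the site B hosts. The vCenter address, credentials, and the host-naming convention used to identify each site are placeholders.

```python
# Hedged sketch of evacuating site A ahead of a planned outage by vMotioning
# running VMs to site B. Host names, credentials and the "esx-a-*" / "esx-b-*"
# naming convention are placeholders, not part of the referenced solution.
from pyVim.connect import SmartConnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator", pwd="***")
content = si.RetrieveContent()

hosts = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.HostSystem], True).view
site_a_hosts = [h for h in hosts if h.name.startswith("esx-a-")]
site_b_hosts = [h for h in hosts if h.name.startswith("esx-b-")]

# Collect the powered-on VMs currently running at site A.
running = [v for h in site_a_hosts for v in h.vm
           if v.runtime.powerState == "poweredOn"]

# Round-robin the VMs onto the site B hosts with high-priority vMotion tasks.
tasks = [v.MigrateVM_Task(host=site_b_hosts[i % len(site_b_hosts)],
                          priority=vim.VirtualMachine.MovePriority.highPriority)
         for i, v in enumerate(running)]
```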
Failure scenarios and recovery using federated HA

This section addresses the different types of failure and shows how, in each case, VMware HA is able to continue or restart operations, ensuring maximum uptime. The configuration below is a representation of a typical federated HA solution:

Figure 19 Typical VPLEX federated HA layout (multi-node cluster)

The table below shows the different failure scenarios and the outcome:

Failure                     VMs at A                   VMs at B                   Notes
Storage failure at site A   Remain online /            Remain online /            Cache read misses at site A now incur additional
                            uninterrupted              uninterrupted              link latency; cache read hits and write I/O
                                                                                  response times remain the same
Storage failure at site B   Remain online /            Remain online /            Cache read misses at site B now incur additional
                            uninterrupted              uninterrupted              link latency; cache read hits and write I/O
                                                                                  response times remain the same
VPLEX Witness failure       Remain online /            Remain online /            Both VPLEX clusters dial home
                            uninterrupted              uninterrupted
All ESXi hosts fail at A    All VMs are restarted      Remain online /            Once the ESXi hosts are recovered, DRS can
                            automatically on the       uninterrupted              redistribute the VMs across both sites again
                            ESXi hosts at site B
