Learn how the MetroCluster architecture in clustered Data ONTAP differs from the previous version in 7-Mode, when and why MetroCluster should be used, and how to transition MetroCluster from 7-Mode to clustered Data ONTAP.
Slide 1
Next up:
MetroCluster in Clustered Data ONTAP
Webinar
Follow along on Twitter!
@FastLaneUS | #FLMC16
Slide 2
MetroCluster in Clustered Data ONTAP
Presented by Tia Williams
Slide 3
Agenda
Why MetroCluster for Clustered Data ONTAP
MetroCluster Architecture
Two-Node MetroCluster
MetroCluster Non-Disruptive Operations
Transitioning MetroCluster to Clustered Data ONTAP 8.3
Slide 4
Why MetroCluster for Clustered Data ONTAP: The Need for a Continuous Availability Solution
Types of outages:
Approximately 85% are planned events
Approximately 15% are unplanned events (1% natural disasters)
With shared infrastructure, negotiating downtime is next to impossible
Approximately 70% of unplanned events are due to internal data center failures
Mission-critical applications demand no data loss
Downtime equals loss of revenue and reputation
[Chart: Reasons for Storage Outage, split between planned and unplanned events, by cause: Other, Superstorm Sandy, Vendor Patch, Vendor Software, Human Error, Power Failure, Vendor Hardware. Source: The InfoPro, Storage Wave-17, 2013]
Slide 5
Why MetroCluster for Clustered Data ONTAP: MetroCluster Extends Non-Disruptive Operations Beyond the Data Center
The clustered Data ONTAP® operating system provides NDO within the data center:
Ability to withstand component failures
Ability to perform maintenance operations without disruption
Ability to perform technology refreshes without disruption
MetroCluster™ technology enables business continuity and continuous availability beyond the data center.
[Diagram: Cluster A in Data Center A connected via MetroCluster to Cluster B in Data Center B]
Slide 6
Why MetroCluster for Clustered Data ONTAP: Native Continuous Availability for Business-Critical Applications
MetroCluster maintains the availability of your storage infrastructure:
Non-disruptive operations leading to zero data loss
Set-it-once simplicity
Zero change management
Lower cost and complexity than competitive solutions
Seamless integration with storage efficiency, SnapMirror, NDO, and virtualized storage
Unified: supports both SAN and NAS
Up to 200 km between sites
Slide 7
Why MetroCluster for Clustered Data ONTAP: MetroCluster Protects Against
Controller failure
Storage or rack failure
Network failure
Local data center failure
Complete site failure
"Zero minutes of planned and unplanned downtime since 2009." (Jack Wolfskin)
[Diagram: Site/Building A and Site/Building B, up to 200 km apart]
Slide 8
Why MetroCluster for Clustered Data ONTAP: MetroCluster Leverages Local HA Failover
Non-disruptive operations: a Data ONTAP upgrade or platform refresh does not require an outage
Site switchover is required only for disasters and site-wide events
All local component failures are handled locally
Most workflows do not require site-level switchover
All nodes actively serve data to applications
Slide 9
Why MetroCluster for Clustered Data ONTAP: MetroCluster for VMware Environments
Virtualization makes the infrastructure mission critical
Completes VMware HA/FT: the same levels of availability for storage that VMware® HA and FT provide for VMs
Simplifies operations: zero interdependencies; no application or OS agents
Deploys with confidence: tested and documented interoperability; on the vMSC HCL since vSphere 5.0; NFS, iSCSI, FC, and FCoE; the only certified NAS solution
[Diagram: a failed server at Site 1 running VMware ESXi cuts over seamlessly, with no reboot, to an operating server at Site 2, backed by NetApp® MetroCluster™ fault tolerance]
Slide 10
Why MetroCluster for Clustered Data ONTAP: Comprehensive Protection with MetroCluster, SnapMirror, and SnapVault
MetroCluster™ with SnapMirror® and SnapVault® provides continuous availability within the data center and disaster recovery protection at unlimited distances. It also provides the ability to remotely back up and archive to tape for a fully integrated zero-data-loss three-way DR solution.
[Diagram: MetroCluster™ (up to 200 km) within the local data center, campus, or metro area, with multiple recovery points via Snapshot™ copies; SnapMirror® to a disaster recovery site and SnapVault® to a backup and recovery site, both at unlimited distance]
Slide 11
MetroCluster Architecture
Two separate two-node clusters, one at each site, separated by up to 200 km
The two clusters are connected through redundant fabrics
NVRAM is mirrored to the local HA partner and to the DR partner at the remote site, sharing the same ISL fabric as the storage replication
Data is written to the primary copy and synchronously replicated to the secondary copy at the remote site
Mirroring works at the aggregate level; each aggregate consists of two "plexes," one local and the other remote
Writes are performed synchronously to both plexes and reads are performed from the local storage (by default)
The cluster peering interconnect mirrors cluster configurations
[Diagram: Cluster A (Nodes A1, A2) in Data Center A and Cluster B (Nodes B1, B2) in Data Center B, 200 km apart; synchronous mirroring and NVRAM mirroring run over the ISL, alongside the cluster peering link]
Slide 12
MetroCluster Architecture: MetroCluster Replication Mechanism
Three different replication streams run between the two HA pairs across sites:
NVRAM is mirrored to the HA partner and to the DR partner (over Fibre Channel)
All disk traffic is mirrored at the aggregate level (over Fibre Channel); a dedicated switch fabric and ISLs are required
The cluster configuration is replicated via the peered IP network
• All cluster configuration information is mirrored to the remote site
• Can leverage existing, shared network infrastructure
Slide 14
MetroCluster Architecture
[Diagram: backend fabric detail. Site 1 (Nodes A1 and A2) and Site 2 (Nodes B1 and B2); each node has an FC-VI connection and 2 FC initiators into redundant backend fabrics; ATTO FC-SAS bridges attach DS4243 disk shelves (primary and secondary at each site); the sites are linked by FC ISLs (3 links shown), the cluster interconnect, and the cluster peering network. Legend: FC, FC ISL link, SAS, FC-VI, Ethernet]
Slide 15
MetroCluster Architecture: MetroCluster Failover Characteristics
Local failover/failback:
For workflows such as tech refreshes and Data ONTAP upgrades
For component failures
Simple, non-disruptive switchover/switchback (SO/SB):
No application/host scripting/action required
Planned or unplanned
One-command switchover
Three-command switchback
Slide 16
MetroCluster Architecture: MetroCluster Simplifies Management
[Diagram: Site 1 and Site 2 serving Exchange, SharePoint, and Oracle. With MetroCluster, creating a new volume is one step; aggregate mirroring protects it automatically. With other solutions, a new volume takes three steps: create the volume/LUN, create the replica LUN, and set up replication]
Slide 17
MetroCluster Architecture: MetroCluster Simplifies Management
[Diagram: On a failure, MetroCluster brings everything online with a single CFOD command; with other solutions, each volume (Exchange, SharePoint, Oracle, and every new volume) must have its SyncMirror® mirror broken and be brought online individually]
Slide 18
MetroCluster Requirements
[Diagram: FC switches at each site connected by FC-VI ISL links, FCP connections to the nodes, and a cluster peering interconnect over IP]
Switches: dedicated switches (Cisco®, Brocade)
ISL/FC link: dedicated fibre or dedicated wavelength (with DWDM)
Cluster peering: IP network
Platform rules:
MetroCluster™ is supported only on midrange and high-end controllers; FlexArray is supported
All nodes in a MetroCluster DR group need to be identical (platform, storage, switches)
Slide 19
Two-Node MetroCluster
Distances of up to 200 km between sites
All storage is fabric-attached and visible to both nodes
The same level of protection as an HA pair
Random read/write performance with inline compression enabled
Switchover and switchback transfer the cluster's entire workload between sites
Stretch MetroCluster is supported (500 meters)
*Four-node MetroCluster is highly recommended because of its local HA failover support
Slide 20
Supported Two-Node MetroCluster Configurations
Slide 21
Two-Node MetroCluster: Two-Node Automatic Unplanned Switchover
Two-node automatic unplanned switchover (AUSO) is the default in a two-node MetroCluster configuration
Automatic failover is triggered by node panic, reboot, power loss, or power-down
Disk ownership is transferred to the DR partner
A manual switchback is required to return to normal operations
*Not available in four-node MetroCluster
Slide 22
MetroCluster Non-Disruptive Operations
For any unplanned failure of a single node in a cluster, an automatic local HA failover is performed
Local HA failover is also performed for planned events:
Nondisruptive software upgrades
Nondisruptive controller refreshes
Addition of new HBAs, Flash Cache™ intelligent caching, and so on
[Diagram: Cluster A in Data Center A and Cluster B in Data Center B with synchronous mirroring over the ISL]
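As a hedged sketch of what a planned local failover looks like from the ONTAP command line (the node name is hypothetical), a node can be failed over to its HA partner before maintenance and given back afterward:

    storage failover takeover -ofnode site1-node2   # partner serves site1-node2's data during maintenance
    storage failover giveback -ofnode site1-node2   # return aggregates once site1-node2 is healthy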
Slide 23
Transitioning MetroCluster to Clustered Data ONTAP: Data ONTAP 8.3.0 MetroCluster Considerations
MetroCluster™ system size with the Data ONTAP® 8.3 operating system: 4 nodes (2 per site)
All aggregates have to be synchronously mirrored (see the sketch after this list)
You cannot convert clustered Data ONTAP into MetroCluster with data in place
Site switchover is for the entire cluster
Most failure scenarios are covered by local HA
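Because all data aggregates must be mirrored, aggregates on a MetroCluster system are created with both plexes from the start. A minimal sketch, assuming a hypothetical aggregate name and disk count:

    # -mirror true builds two plexes, drawing disks from both the local and remote pools
    storage aggregate create -aggregate aggr_data1 -diskcount 10 -mirror true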
Slide 24
Transitioning MetroCluster to Clustered Data ONTAP: MetroCluster Transition Overview
Technology used:
NetApp 7-Mode Transition Tool (7MTT): SnapMirror®-based transition
Downtime of minutes plus the time to stop and restart apps on the "active" nodes (similar to other clustered Data ONTAP® transitions)
Customer-selected tools:
Application-level migration tools
Hypervisor-level migration tools
Operating system volume manager tools
Slide 25
Transitioning MetroCluster to Clustered Data ONTAP: MetroCluster Transition Process
[Diagram: the existing MetroCluster and the new Data ONTAP® 8.3 MetroCluster both span the primary and secondary sites during transition]
ISLs can be shared during transition (limits apply)
Slide 26
Transitioning MetroCluster to Clustered Data ONTAP: MetroCluster Consolidation Savings
Consolidate two existing instances (for example, FAS32XX/62XX systems) into a single 4-node FAS80XX MetroCluster™ DR group
Opex savings:
50% savings from ISL sharing (approximately $100K annually)
Non-disruptive upgrades/tech refreshes (approximately $3K annually)
Space, power, and cooling savings from consolidation
Capex savings:
Extend the useful life of hardware (approximately $10K annually)
Slide 27
Transitioning MetroCluster to Clustered Data ONTAP: MetroCluster 7-Mode and Clustered Data ONTAP Comparison

Feature | Prior to Data ONTAP 8.3 | Data ONTAP 8.3
Synchronous data protection | Yes | Yes
Nondisruptive component failure and replacement | Yes | Yes
HA and DR | Yes (2 nodes) | Yes (2 or 4 nodes)
"Set and forget" ease of use | Yes | Yes
Cross-site switchover | Single command | Single command
Support/compatible with all key Data ONTAP® features | Dedupe, SM, SV, tape, etc. | Same + QoS, vol move, SVM
Clustered Data ONTAP value proposition | No | Data mobility, NDO
Telecom costs | No ISL sharing | 4 nodes, ISL sharing
Local HA | No | Yes
Maximum distance | 200 km | 200 km
Slide 28
MetroCluster Resources
NetApp TR-4375 MetroCluster for Clustered Data ONTAP 8.3.1
Clustered Data ONTAP 8.3 MetroCluster Installation and Configuration Guide
MetroCluster Service Guide
Slide 29
Summary
Why MetroCluster for Clustered Data ONTAP
MetroCluster Architecture
Two-Node MetroCluster
MetroCluster Non-Disruptive Operations
Transitioning MetroCluster to Clustered Data ONTAP 8.3
Slide 30
Related Courses: Clustered Data ONTAP
CIFS Administration (CIFS), 2 days
NFS Administration (NFS), 1 day
Data ONTAP Cluster Administration (DCADM), 5 days
SAN Scaling and Architecting (SANSA), 2 days
Slide 31
Join our Fast Lane Loyalty Program!
Receive prizes and other offers. Check out our current promotions!
You may ask, "If I have clustered Data ONTAP, why do I need MetroCluster?"
Well, clustered Data ONTAP provides nondisruptive operations within the data center. Clustered Data ONTAP's local HA capabilities allow you to withstand component failures and to perform maintenance and upgrades without disruption.
On the other hand, MetroCluster enables business continuity and continuous availability beyond the data center, protecting you from events that are beyond the control of the IT organization, such as natural disasters (fires, floods, hurricanes) and site-impacting failures (network outage, power loss, unrecoverable corruption). With MetroCluster, your organization remains up and running by leveraging the synchronously replicated copy at the secondary site.
MetroCluster protects against events such as:
Controller failure, in which local HA failover is leveraged for nondisruptive operation
Storage or rack failure, which makes the data inaccessible
Network failure, which makes the cluster or building inaccessible
Local data center failure; for example, if you have two data centers on one campus and one data center is unavailable because of a power, cooling, or networking issue
Complete site failure; for example, a natural disaster that requires evacuation of the entire campus, city, or county
Having a clustered Data ONTAP node pair on each side of the MetroCluster provides certain benefits.
First and foremost, you benefit from nondisruptive operations and nondisruptive upgrades. You can move volumes and LUNs between nodes in a cluster without downtime.
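As a rough illustration (the vserver, volume, and aggregate names are hypothetical), a volume can be relocated nondisruptively with a single command:

    # Moves vol1 to an aggregate on another node while it continues to serve data
    volume move start -vserver vs1 -volume vol1 -destination-aggregate node2_aggr1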
Second, the only time you need to switch over to the remote site is for real disasters and site-wide events. All local component failures are handled locally, and most workflows do not require site-level switchover.
Finally, these are active-active HA pairs, so all nodes actively serve data to applications.
Virtualization challenges our thinking about what is mission critical. In highly virtualized infrastructures running hundreds of non-mission-critical applications, the enterprise would likely be severely affected if all of those applications became unavailable simultaneously. In that case, it is the infrastructure that is mission critical, requiring zero data loss and recovery within minutes rather than hours.
MetroCluster has deep integration with VMware, and actually is necessary to complete a VMware HA/FT environment. Although VMware manages high availability for its virtual machines and can transparently relocate them to a second site, if the storage is not available at that second site the increase in latency or total unavailability of that data makes the failover incomplete. NetApp completes the switchover to the other site by providing local storage to the relocated virtual machines.
Since there are zero interdependencies and no application or operating system agents are required, MetroCluster is a very simple solution for maintaining data availability in the event of a switchover.
NetApp works very closely with VMware to enable appropriate testing and documentation of our interoperability. In fact, not only have we been on the vMSC compatibility list since version 5.0, we are the only certified NAS solution for VMware vMSC.
In combination with MetroCluster, customers can take data protection a step further.
<CLICK>
Customers can achieve continuous availability and protection from local data center disasters with MetroCluster and can further enhance their disaster recovery protection with SnapMirror, which enables them to asynchronously replicate data over any distance. From there data can be stored on disks for faster recovery or backed up to tape for archiving or near-line storage. This capability is sometimes referred to as three-way DR or zero data loss disaster recovery.
<CLICK>
MetroCluster can also be backed up remotely to disk and then tape using SnapVault. This option provides an even lower cost long-term archiving solution for data.
<CLICK>
For a fully integrated business continuity solution with disaster recovery and backup, all three can be implemented. This provides the range of data storage and protection options needed to meet the most stringent enterprise demands.
Now let’s have a look at the basic MetroCluster architecture.
<run the animation for each bullet>
There are two separate two-node clusters, one at each site, separated by up to 200 km
The clusters are connected through redundant fabrics
NVRAM is mirrored to the local HA partner and the DR partner on the remote site, sharing the same ISL fabric as the storage replication
Data is written to the primary copy and synchronously replicated to the secondary copy at the remote site
MetroCluster works at the aggregate level, and each aggregate consists of two "plexes," one local and the other remote
Writes are performed synchronously to both plexes and reads are performed from the local storage (by default), but reads can be configured to use both local and remote storage; this can be useful when the two clusters are close enough that latency is not an issue, with the benefit that read performance can be increased
The cluster peering interconnect mirrors cluster configurations
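As a hedged sketch, once a MetroCluster configuration is in place, its state and health can be inspected from either cluster (output omitted):

    metrocluster show          # configuration state and mode of the local and remote sites
    metrocluster check run     # run health checks on nodes, interconnects, and aggregates
    metrocluster check show    # summarize the results of the most recent check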
There are three replication streams between the HA pairs that make up a MetroCluster system.
Nonvolatile RAM (NVRAM) is a key component of all NetApp FAS systems. MetroCluster requires that all data written to a controller be synchronously mirrored not only to its own HA pair, but also to the remote HA pair. In addition, the actual data written to disk is mirrored at the aggregate level. Both of these functions require Fibre Channel connectivity, with a dedicated switch fabric and dedicated ISLs.
In addition, the cluster configuration is replicated over a shared IP network.
-----------------------
NOTES:
The IP link is new for clustered Data ONTAP to enable cluster peering; it was not needed for 7-Mode.
The cluster peering used is the same as that used between clusters for SnapMirror and SnapVault (the point is that it’s not something new and untested).
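A minimal sketch of establishing that peering relationship (the intercluster LIF addresses are hypothetical):

    # Run on each cluster, pointing at the peer cluster's intercluster LIF addresses
    cluster peer create -peer-addrs 10.10.1.1,10.10.1.2
    cluster peer show    # verify the relationship is available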
Animate to build the diagram.
As noted, local HA failover typically handles most issues facing a data center, including tech refresh and Data ONTAP® upgrades.
But even when a switchover is required, it is a rather seamless process with MetroCluster. No application or host scripting is required, and no action is required on the part of the host. Whether the switchover is planned or unplanned, only a single command is required to switch over and only three commands are required to switch back.
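As a sketch of that sequence using the Data ONTAP 8.3 command set (run against the surviving site; output and confirmation prompts omitted):

    metrocluster switchover                    # one command to switch over
    # Switchback takes three commands once the failed site is repaired:
    metrocluster heal -phase aggregates        # resynchronize the mirrored data aggregates
    metrocluster heal -phase root-aggregates   # then heal the root aggregates
    metrocluster switchback                    # return services to the recovered site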
Note that a planned switchover will have lower outage time for applications, because the MetroCluster systems at both sites negotiate for clean and fast switchover.
This slide compares MetroCluster to other vendor solutions. Looking at the top half of the slide, with a competitive vendor solution, when you set up a replication relationship you have to create a new volume on the primary data center storage. Then you have to create a destination volume on the second storage and set up the replication relationship. That’s three steps. MetroCluster mirrors at the aggregate level, so when you create the volume on the primary storage it is automatically set up on the second storage, including the replication relationship. That’s one step compared to three. So, any time you need to make config changes and so on, management is a lot simpler.
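A small sketch of that one step (the vserver, volume, and aggregate names are hypothetical): because the aggregate is mirrored, creating the volume is all that is required, with no separate replication setup:

    # One step: the new volume inherits the aggregate-level mirroring automatically
    volume create -vserver vs1 -volume appdata -aggregate aggr_data1 -size 500g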
It's just as easy when it comes to addressing unplanned downtime. With other vendor solutions, if there's a failure on the primary, you have to break the mirror for every volume you have and bring each secondary volume online in the second data center. Imagine if you had hundreds of volumes; that could increase your downtime significantly, and you could be exposed to human error, for example, forgetting to bring a volume online. With MetroCluster, a single command initiates the failover and brings all the volumes online at the secondary site. With OnCommand Workflow Automation, you can even script switchover and switchback.
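In command terms, that single-command failover is the forced form of the switchover command shown earlier; roughly:

    # Declares a disaster and brings all mirrored volumes online at the surviving site
    metrocluster switchover -forced-on-disaster true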
No need to move anything over from primary to secondary (everything is the same on both sides). At the host level you may need to move things over (if switching over to a secondary site you may need to move virtual server images); otherwise, MetroCluster can pick up the IP addresses of the primary host so no changes need to be made to redirect the host to the secondary site.
Benefit: Reduced unplanned downtime and human error.
Getting into more detail on the connectivity requirements:
Dedicated switches are required, and MetroCluster supports equipment from both Cisco and Brocade
For the ISL/FC Links: Dedicated fiber or dedicated wavelength (with DWDM)
Cluster peering uses a shared IP network
And, as you can see, everything is redundant.
Platform rules:
MetroCluster is only supported in midrange and high-end controllers, and NetApp FlexArray virtualization software is fully supported, so you can leverage your investment in non-FAS storage arrays for your continuous availability needs.
All nodes in a MetroCluster DR group need to be identical (platform, storage, switches)
NOTES:
Links between FC switches: 1 supported, 2 recommended
MetroCluster DR group = 4 nodes total = 2 separate clustered Data ONTAP HA pairs
You can disable AUSO with the command metrocluster modify -automatic-switchover-onfailure false
Follow the animation…
(self-explanatory)
If you are an existing MetroCluster customer upgrading from 7-Mode MetroCluster, here are some things to consider.
First, we recommend refreshing your platform as well as the underlying storage. This is true for any 7-Mode to clustered Data ONTAP migration.
As far as migration is concerned, there are a number of choices. The method you use depends on the type of application and environment you have.
In general, you can use the 7-Mode Transition Tool (7MTT) to replicate data from your current system to your new MetroCluster system. As with any upgrade from 7-Mode to clustered Data ONTAP, some downtime is required.
In addition, you can make use of application-level, hypervisor-level, or operating system–level tools that may be able to migrate some of your applications without disruption.
From a high-level perspective, what you need to do is:
CLICK
Implement your new MetroCluster, then replicate your data to the new MetroCluster. In this example we show SnapMirror replicating the data as part of the process using the 7-Mode Transition Tool.
CLICK
Then you reconfigure your applications to stop using your old MetroCluster and
CLICK
start using the data on the new MetroCluster.
CLICK
Note that ISLs can be shared during transition, but you need to verify that the existing ISLs will be able to handle the increased load of both MetroCluster systems running at the same time and that they can support the new MetroCluster ISL speeds (4, 8, or 16 Gbps).
Switches can also be shared; however:
The switches must be supported by the new MetroCluster
They must have unused switch ports
A separate VLAN is required for each MetroCluster
Hide/skip for customers with only one 7-Mode MetroCluster
Another benefit of upgrading to MetroCluster 8.3 is the savings you can get from consolidating more than one existing MetroCluster into a single new system.
First, you get a number of operational expense savings:
Since you only need one set of ISLs, you end up saving 50% on those costs.
You also get the benefits of nondisruptive operations and nondisruptive tech refreshes from moving to clustered Data ONTAP.
Finally, consolidation typically saves you on space, power, and cooling costs.
From a capital expense perspective, you will save money by extending the useful life of your hardware, another benefit of moving to clustered Data ONTAP. This is because you can keep using it until the end of its term, when you can upgrade without disruption. In the past you would have had to refresh months in advance in order to plan for downtime across all of your stakeholders.
---------------------
NOTE:
Again, this is savings from moving to clustered Data ONTAP, for which your planning cycle for upgrades is much shorter because your upgrade is nondisruptive. That means that instead of planning six to eight months in advance of term's end and migrating three to five months in advance of term's end, you can plan three months in advance and migrate within weeks of the end of your term. So a 3-year term that really only lasted 2.5 years in the past meant a refresh cycle of 2.5 years, losing 6 months of the useful life of the equipment due to the necessary early refresh. Now you get to use your equipment for nearly the full term of your investment.
This table illustrates some of the differences between MetroCluster in Data ONTAP 8.3 and MetroCluster prior to Data ONTAP 8.3.
The primary differences are:
MetroCluster in Data ONTAP 8.3 supports all of the key clustered Data ONTAP features, such as nondisruptive operations, data mobility (vol move), QoS, and SVMs
ISL sharing
Local HA
----------------------------------------
NOTE:
ISL sharing prior to Data ONTAP 8.3 was only for Twin MetroCluster.