WHITE PAPER | Uninterrupted Access to Cluster Shared Volumes (CSVs) Synchronously Mirrored Across Metropolitan Hot Sites
Failover Cluster support in Windows Server 2008 R2 with Hyper-V provides
a powerful mechanism to minimize the effects of planned and unplanned
server downtime. It coordinates live migrations and failover of workloads
between servers through a Cluster Shared Volume (CSV). The health of
the cluster depends on maintaining continuous access to the CSV and the
shared disk on which it resides.
In this paper you will learn how DataCore Software solves a
longstanding stumbling block for clustered systems spread across
metropolitan sites by providing uninterrupted access to the CSV despite
the many technical and environmental conditions that conspire to
disrupt it.
CSV shared disk exposes cluster to single
point of failure & disruption
A glaring shortcoming of many cluster configurations is their
dependence on a single, shared disk subsystem to house their CSV.
Despite standard precautions such as RAID sets and internally redundant
controllers, a single “bullet-proof” storage subsystem cannot be
counted on by itself during:
a.	 Planned maintenance (major firmware upgrades, major hardware
repairs)
b.	 Facility problems (power failure, loss of air conditioning, water leak)
c.	 Technician error (unintentional power down)
d.	 Site-wide disaster
Synchronous replication alone doesn’t prevent downtime
Some IT organizations seeking protection against a single point of failure go to great expense to
synchronously mirror the CSV between a primary online disk at one location and an offline mirrored
replica at a different location, while spreading servers across both sites. When one site’s storage is taken
out of service, they use special scripts to remap the offline replica to the clustered servers. While this precaution
protects against data loss, it does not prevent the cluster from going down and taking numerous
applications with it (Figure 1).
Simply put, the process of cutting over to the offline replica disrupts cluster operations, causing workloads
to suffer downtime from the moment access to the primary copy of the CSV is lost. Only after the offline
replica is brought online and mapped to the surviving servers can the environment be restarted. For this
reason, the procedure is considered a Disaster Recovery alternative.
Switching workloads non-disruptively to another site when the main site’s storage device goes down
proves to be a far more elusive business continuity objective.
Figure 1. Application Downtime
Non-stop Access to CSV from DataCore Software
DataCore Software plays a unique role in Microsoft Failover Cluster configurations by providing
uninterrupted access to CSVs that are synchronously mirrored between two sites.
Figure 2. Conceptual view of CSV Shared disk provided by DataCore Software
Like other replication technologies, DataCore storage virtualization software synchronously mirrors CSV
updates to separate physical disks stored at each location, but with the following important distinctions:
1. The multi-site mirrors behave as a single virtual shared disk from the cluster’s perspective. Each
cluster node designates a preferred path and an alternate path to the virtual disk on which the CSV is
stored using a standard Multi-path I/O (MPIO) driver. The preferred path routes I/O requests to the
mirror image closest to the node (at the same site), while the alternate path directs requests to the
mirror image at the other site.
2. When the disk or disk subsystem housing the CSV at one site is taken out of service, the clustered
node’s MPIO driver merely receives an error on the preferred path and automatically fails over
to the alternate path. The mirror image of the virtual disk at the other site transparently fields the
I/O request without hesitation. Neither the cluster nor its applications are aware of the automatic
redirection. There is no downtime waiting for an offline copy to be brought online and remapped at the
other location. (A brief sketch of this path-failover behavior follows this list.)
Figure 3. Single, shared disk view of CSV spread across metropolitan Hot-Hot sites
Figure 4. Transparent switchover to CSV mirror at Hot site
3. There are no host scripts or custom coding changes needed to take advantage of the shared virtual disk split
across sites, because the mirrored virtual disk appears as a well-behaved, single, multi-ported shared
device.
4. All synchronous mirroring is performed outside the disk subsystems, allowing different models and
brands to be used at either end.
5. The synchronous mirroring software and associated virtualization stack does not compete for host
cycles when run on its own physical nodes, and it takes advantage of large caches to speed up the
response.
6. The storage virtualization/mirroring software may be run in any one of three places:
a. On a dedicated node at each site acting as a front-end to that site’s storage pool
b. On a virtual machine (VM) alongside other VM workloads in a physical server at each site.
c. Layered on top of Hyper-V, accessible by multiple VMs on the same physical server at each site.
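To illustrate the path behavior described in points 1 and 2 above, here is a minimal Python sketch that models one mirror image per site and an MPIO-style selector that issues I/O on the preferred (local) path and transparently retries on the alternate (remote) path when the local storage is out of service. The class and method names are illustrative only; they are not DataCore or Windows MPIO APIs, and only the path selection is modeled, not the mirroring that happens beneath it.

```python
class MirrorImage:
    """One synchronously mirrored copy of the virtual disk, housed at a single site."""
    def __init__(self, site):
        self.site = site
        self.online = True
        self.blocks = {}

    def write(self, lba, data):
        if not self.online:
            raise IOError(f"storage at site {self.site} is out of service")
        self.blocks[lba] = data


class MpioPathSelector:
    """Routes I/O over the preferred (local) path; fails over to the alternate path on error."""
    def __init__(self, preferred, alternate):
        self.preferred = preferred
        self.alternate = alternate

    def write(self, lba, data):
        try:
            self.preferred.write(lba, data)   # normal case: the local mirror services the I/O
        except IOError:
            self.alternate.write(lba, data)   # transparent redirection to the other site


site_a, site_b = MirrorImage("A"), MirrorImage("B")
paths = MpioPathSelector(preferred=site_a, alternate=site_b)

paths.write(100, b"csv metadata")   # fielded by Site A
site_a.online = False               # Site A storage taken out of service
paths.write(101, b"vm image data")  # silently redirected to Site B; the cluster sees no error
```

The point of the sketch is that the failover happens inside the path selector: the cluster node that issued the write never sees the error, which is why no remapping scripts or restarts are needed.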
Non-stop Access to CSV from DataCore Software
Combining Windows Failover Cluster capabilities with DataCore multi-site shared virtual disk redundancy
prevents a site-wide outage or disaster from disrupting mission-critical applications. It uniquely enables
hot-hot sites within metropolitan distances.
Rather than declaring a “disaster” when one site’s cluster servers and storage subsystems are taken
offline, the cluster fails over applications to the alternate site’s servers. Those servers simply continue
to execute using the same virtual shared disks. Only now, the I/O requests for the CSV and the virtual
shared disks are fielded by the surviving storage subsystems at the alternate site.
Figure 5. Cluster switchover to Hot site using mirrored CSV
Restoring Original Conditions
How the original conditions are restored following a site-wide outage depends on whether the outage
was planned or unplanned, and on whether equipment was damaged. To better
understand the restoration behavior, it’s useful to see how DataCore software handles an I/O to the CSV
stored on a virtual disk. For any mirrored virtual disk, one DataCore node owns the primary copy and
another holds the secondary copy. They maintain these copies in lockstep by synchronously mirroring
updates in real time from the primary copy to the secondary copy.
In Figure 6, Site A owns the primary mirror copy labeled “P” and Site B holds the secondary copy labeled “S”.
The preferred path from the cluster node in Site A to the CSV is assigned to the DataCore node that holds
the primary copy of the mirrored set. Under normal operation, all read and write requests issued from
Site A cluster servers to the CSV are serviced by the primary copy local to that site. The secondary
copy need only keep up with new updates. Generally, DataCore nodes are configured to control primary
copies for some virtual disks and secondary copies for others, thereby evenly balancing their read workloads.
In step 1, the cluster server writes to the virtual disk over the preferred path. The DataCore node at
Site A caches the write and immediately retransmits it to the partner node responsible for the secondary copy
in step 2. The partner node treats it like any other write request, putting the data into its cache and
acknowledging it in step 3.
When Node A receives the acknowledgement from Node B, it signals I/O complete to the cluster server
in step 4, having met the conditions for multi-cast stable storage. Sometime later, based on destaging
algorithms, each node stores the cached blocks onto its respective physical disks, as illustrated
by steps 5 and 6. These slower disk writes do not figure directly into the cached response time,
so applications sense much faster write behavior. An important note: DataCore nodes do not share
state to keep the mirrored copies synchronized, as is done in the clustered server design.
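As a rough illustration of the six-step write flow just described, the following Python sketch shows the ordering of events: the primary node caches the write, mirrors it to the partner, and signals I/O complete only once both caches hold the data; destaging to physical disk happens later and does not affect the acknowledged response time. The names are hypothetical, not DataCore code.

```python
class DataCoreNode:
    """Models one storage virtualization node holding a copy of the mirrored virtual disk."""
    def __init__(self, name):
        self.name = name
        self.cache = {}      # write-back cache: lba -> data
        self.disk = {}       # backing physical disk
        self.partner = None  # node holding the other mirror copy

    def write_from_cluster(self, lba, data):
        """Steps 1-4: cache locally, mirror to the partner, then signal I/O complete."""
        self.cache[lba] = data                 # step 1: this node caches the write
        self.partner.mirror_write(lba, data)   # steps 2-3: partner caches and acknowledges
        return "I/O complete"                  # step 4: both copies now hold the update

    def mirror_write(self, lba, data):
        self.cache[lba] = data                 # partner treats it like any other write request

    def destage(self):
        """Steps 5-6: later, each node flushes cached blocks to its own physical disks."""
        self.disk.update(self.cache)
        self.cache.clear()


node_a, node_b = DataCoreNode("Site A"), DataCoreNode("Site B")
node_a.partner, node_b.partner = node_b, node_a

node_a.write_from_cluster(lba=42, data=b"csv block")   # acknowledged once both caches hold it
node_a.destage(); node_b.destage()                     # asynchronous destaging, off the response path
```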
From a physical standpoint, best practices call for the DataCore nodes to be maintained in separate
chassis at different locations, each with its respective portion of the disk pool, so that each node can benefit
from separate power, cooling and uninterruptible power supplies (UPS). The physical separation reduces
the possibility that a single mishap will affect both members of the mirrored set. The practical limit for
synchronous mirroring today is around 35 kilometers using Fibre Channel connections between nodes
over dark fiber.
Figure 6. Synchronous mirroring between DataCore storage virtualization nodes
Restoration Following a Planned Site-Wide Outage
For a planned site-wide outage, say for air conditioning repairs, or in anticipation of a forecasted storm,
the site’s mirror disks may be gracefully taken out of service while the other site takes over. In this
scenario, the DataCore software at the alternate site keeps track of what disk blocks were updated
during the planned outage.
When the equipment is put back into service, the DataCore software copies the latest information
from those changed blocks to the corresponding mirror image until both sites are back in sync. At that
point, the cluster can be restored to run across both sites. The time to restore the site is determined by
how many distinct blocks changed during the outage; applying only the incremental changes
shortens the restoration period significantly.
Figure 7. Keeping track of changes when one site is taken out of service
Figure 8. Resynchronization of mirrored CSV during site restoration
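The change-tracking behavior can be pictured with a small sketch. Assuming a simple set of “dirty” logical block addresses (the paper does not describe DataCore’s actual data structures, so the names below are hypothetical), the surviving site records which blocks change while its partner is out of service and later copies only those blocks back:

```python
class MirrorCopy:
    """The physical disk image behind one side of the mirrored virtual disk."""
    def __init__(self):
        self.blocks = {}          # lba -> data
        self.in_service = True


class SurvivingSite:
    """The site that keeps running while its partner's storage is out of service."""
    def __init__(self, local, remote):
        self.local, self.remote = local, remote
        self.changed = set()      # lbas updated while the remote copy was offline

    def write(self, lba, data):
        self.local.blocks[lba] = data
        if self.remote.in_service:
            self.remote.blocks[lba] = data   # normal synchronous mirroring
        else:
            self.changed.add(lba)            # planned outage: just remember what changed

    def resynchronize(self):
        """Copy only the changed blocks; restore time scales with the delta, not the disk size."""
        for lba in self.changed:
            self.remote.blocks[lba] = self.local.blocks[lba]
        self.changed.clear()
        self.remote.in_service = True


site_a, site_b = MirrorCopy(), MirrorCopy()
surviving = SurvivingSite(local=site_b, remote=site_a)
site_a.in_service = False                      # Site A gracefully taken out of service
surviving.write(7, b"update during outage")    # tracked, not mirrored
surviving.resynchronize()                      # Site A catches up from the change set only
```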
Restoration Following an Unplanned Site-Wide Outage
Should a more catastrophic event occur in which the equipment at one
site cannot be gracefully taken out of service, a similar but lengthier
restoration process occurs. There are two levels of severity. First,
consider the case where the servers and storage virtualization
nodes crashed but the storage devices were not damaged. When the
storage virtualization nodes are rebooted, the DataCore software
takes inventory of its last known state, particularly of cached disk
blocks that were not captured on the disks. The alternate site must
then update all those “dirty” blocks, along with all blocks known
to have changed during the outage, before the two sites can fully
resynchronize.
In a more severe case, some or all of the physical disks at one site may
be damaged and must be completely reinitialized using the latest
image from the alternate hot site. Clearly, this will add to the time it
takes to restore normal operations. However, in both scenarios, the
urgency to do so has been reduced because the alternate hot site
is continuing to service applications without any major disruption.
And in neither case are any special host scripts required to bring
back the original conditions.
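Sticking with the hypothetical block-set picture from the planned-outage sketch above, the difference after an unplanned outage is the size of the set to be copied: the blocks the crashed node had cached but never destaged (the “dirty” blocks) are added to the blocks the surviving site changed during the outage, and a damaged disk degenerates into resynchronizing every block. A minimal sketch, with illustrative names only:

```python
def blocks_to_resync(dirty_on_crashed_node, changed_during_outage,
                     disk_damaged, all_blocks):
    """Illustrative only: which blocks the alternate site must refresh before full resync."""
    if disk_damaged:
        return set(all_blocks)                            # severe case: full reinitialization
    return dirty_on_crashed_node | changed_during_outage  # crash without disk damage

# Example: three dirty cached blocks plus two blocks changed while the site was down
print(blocks_to_resync({10, 11, 12}, {12, 40},
                       disk_damaged=False, all_blocks=range(1000)))
# -> the union {10, 11, 12, 40} (set order is arbitrary)
```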
Bottom line: No storage-related downtime
IT organizations derive numerous operational benefits from
combining Windows Failover Cluster capabilities with DataCore’s
device-independent approach to storage virtualization, particularly
when the cluster is split between metropolitan sites. The most far-reaching
is the elimination of the single points of failure and disruption
that are responsible for storage-related downtime today.
Figure 9. Resume Normal Operations
©2010 DataCore Software Corporation All rights reserved.
DataCore, the DataCore logo, SANsymphony, and SANmelody
are trademarks or registered trademarks of DataCore Software
Corporation. All other products, services and company names
mentioned herein may be trademarks of their respective owners.
For more information on storage
virtualization, please visit:
www.datacore.com