Microsoft SQL Server Clustering vs. VMware HA


Pros and cons of using Microsoft SQL Server clustering for high availability vs. VMware HA with Symantec ApplicationHA.

Personally, I prefer clustering.

Published in: Education, Technology

Microsoft SQL Server Clustering vs. VMware HA

  1. High Availability Microsoft SQL Server Database Architecture
     VM HA and Symantec Application Availability vs. Microsoft Clustering
     February 2012
  2. The Problem
     Difference of opinion on building a “High Availability” database environment.
     • Infrastructure Team prefers: VMware High Availability & Symantec ApplicationHA
     • Architecture Team prefers: Microsoft Failover Clustering
  3. Factors driving the difference of opinion
     Infrastructure Team
     • Prefer VM HA / ApplicationHA because, out of the box, it provides high availability without the cost or complexity of traditional clustering solutions
     • Unfamiliar with MS Clustering Services
     • Clustering restricts use of vMotion dynamic scaling. Moving Clustered Applications between Blades will require Guest OS Downtime
     • Clustering adds complexity to backup procedures
     Architecture Team
     • Prefer MS Clustering because it is well integrated at application level and is industry best practice
     • Unfamiliar with VMware HA and Symantec ApplicationHA
     • Concerned that ApplicationHA will not recognize all circumstances that cause application unavailability
     • Undefined scripting effort required for application monitoring with VM HA, and continuing M&O will be required to support scripts
     • Concerned that VM HA and present M&O support will not deliver required solution availability during hours of operation
  4. HA Drivers (subset)
     Clinical Application              | Business Days | Hours of Use | Availability Requirements
     Appeals Tracking                  | 7 Days        | 0700 - 1900  | 99.999
     Document Management System        | 7 Days        | 0600 - 1800  | 99.999
     SharePoint                        | 7 Days        | 0700 - 1900  | 99.99
     Clinical Operations Review System | 5 Days        | 0800 - 1700  | 99.999
     Dental Imaging                    | Clustering Mandated by Vendor
     Dictation and Transcription       | 7 Days        | 24 Hours     | 99.99
     Digital Signature                 | 7 Days        | 24 Hours     | 99.99
     Information Portal                | 7 Days        | 24 Hours     | 99.99
     Radiology Information System      | Clustering Mandated by Vendor
     Business days, hours of use, and availability requirements were obtained from available business requirements documents and verbally from user leadership.
  5. Microsoft Clustering Pros
     • Supports application level awareness
     • Will survive a single node OS system crash
     • Redundant Node in the event of a SQL Node failure
     • Minimizes downtime
     • Permits an automatic response to a failed server or software (no human intervention)
     • Supports upgrades without forcing users off the system for extended periods of time
     • Applications connected to SQL remain available while maintenance/patching is performed on the redundant Node
     • Doesn’t require any servers to be renamed; when failover occurs, it is transparent to end-users
     • Faster recovery during HA events, e.g., Node BSOD, SQL connection or authentication failures
     • Failing back is quick, and can be done once the primary server is fixed and put back online
     • Is a Microsoft supported solution
     • Works without snapshots
  6. Microsoft Clustering Cons
     • Additional Cost to deploy and maintain the redundant Nodes
     • Potential added environment cost for active/passive implementations
     • Decreased use of VM functionality (no vMotion…)
     • Added implementation and management complexity
     • Requires more experienced DBAs and network administrators
     • Complexity added to SQL and VMware environment
     • Any HA event requires server admin and/or DBA interaction, anywhere from Node reboot to rebuild [not self healing]
     • In a situation where both Nodes have failed, recovery time may be greatly increased due to the added complexity
     • No Snapshot or Full Virtual Machine backup option available; a Node or Cluster loss could require a rebuild (RTO = days, not hours)
       – This is a wash: backup/recovery options are exactly the same for HA vs. Clustering because SQL does not support snapshots
     • VMware Host patching/maintenance would have to be done after hours and would require DBA participation
       – Would potentially require a DBA, but would NOT require after-hours work (failover can be forced)
     • VMware Functionality is reduced for all Clustered SQL Nodes, e.g., Snapshot, vMotion, DRS, Storage DRS, Storage vMotion
       – Snapshotting is not supported
  7. VM HA + ApplicationHA Pros
     • Eliminates the need for dedicated standby hardware and the installation of additional software
     • Less infrastructure implementation effort
     • Supports full range of VM functionality (leads to maximized resource utilization)
     • Reduced implementation and management complexity
     • Application agnostic
     • Reduced Cost because no redundant Node is necessary for HA
     • Reduced Complexity for SQL and VMware environments
       – This is not accurate. If you add in Symantec ApplicationHA, at best this is a wash; at worst you’ve created a new development M&O project which is infinitely more complex than additional hardware.
     • In a situation where the SQL Server has failed entirely, recovery time is much shorter since we will leverage a complete Virtual Machine recovery option through Symantec NetBackup (RTO = minutes or hours)
       – See the earlier note: this is a wash, because bare metal recovery will be required in either situation since snapshots aren’t supported.
     • VMware Host patching/maintenance could be accomplished without after-hours maintenance windows or DBA participation
     • In many HA events, e.g., SQL connection or authentication failures, ApplicationHA can take action against individual Windows and SQL components, eliminating reboot as the only option for resolution [self healing]
       – This concept of self healing vs. not self healing is a red herring. If the server dies and anything except a reboot is required, neither setup is “self healing”.
     • Full VMware Functionality can be realized for the SQL Servers, e.g., Snapshot, vMotion, DRS, Storage DRS, Storage vMotion
       – Again, snapshots are not supported
  8. VM HA + ApplicationHA Cons
     • Added application dev implementation effort to support application awareness, and continuing M&O (additional coverage required)
     • Added complexity: multiple components of HA solution
     • OS crash will result in downtime and requires human intervention
     • If VM HA fails to recognize a system crash, human intervention is required
     • Requires snapshotting (snapshotting of SQL and SharePoint is not supported by Microsoft due to data corruption issues)
     • Some HA events (e.g., BSOD, or SQL connection or authentication failures that ApplicationHA was not able to resolve) may require the Server to be restarted, which could take approximately 30-60 seconds
     • Applications connected to SQL are not available while maintenance/patching is performed on the SQL Server during scheduled maintenance windows. If something happens to the server during patching, full recovery must be executed before service availability returns.
     • To adhere to VMware recommended best practices to achieve true HA, a hot standby database server, with SQL Server running and replication established between the two databases, must be running. In the event of a failure, an application developer must manually redirect the application. This is added DBA complexity and added reliance on AppDev.
  9. Business Sets Availability Requirements!
     Availability       | Downtime (per year) | Downtime (per week)
     90% (1-nine)       | 36.5 days/year      |
     99% (2-nines)      | 3.65 days/year      |
     99.9% (3-nines)    | 8.76 hours/year     | 10 minutes/week
     99.99% (4-nines)   | 52 minutes/year     | 1 minute/week
     99.999% (5-nines)  | 5 minutes/year      | 6 seconds/week
     99.9999% (6-nines) | 31 seconds/year     |
     Need to determine if availability is measured:
     1) During operational time (i.e. expected use), which does not include scheduled maintenance windows
     2) On a 24 hr basis, which includes non-operational time
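
The downtime figures on slide 9, and the open question of which measurement basis applies, can be checked with a little arithmetic. The Python sketch below is illustrative only and not part of the original deck. The function names are hypothetical, a year is approximated as 365 days (24 hr basis) or 52 weeks (operational basis), and the sample rows are copied from the HA drivers on slide 4.

```python
SECONDS_PER_DAY = 24 * 60 * 60
YEAR_24H = 365 * SECONDS_PER_DAY  # 24 hr basis: every second of the year counts


def downtime_budget_seconds(availability_pct, period_seconds):
    """Seconds of downtime allowed in a period at the given availability."""
    return (1 - availability_pct / 100) * period_seconds


def operational_seconds_per_year(days_per_week, hours_per_day):
    """Expected-use time only; assumes 52 weeks per year (an approximation)."""
    return 52 * days_per_week * hours_per_day * 3600


# (application, business days/week, hours of use/day, availability %)
# Hours of use are derived from the slide 4 ranges, e.g. 0700 - 1900 = 12 hours.
APPS = [
    ("Appeals Tracking", 7, 12, 99.999),
    ("Clinical Operations Review System", 5, 9, 99.999),
    ("Dictation and Transcription", 7, 24, 99.99),
]

for name, days, hours, pct in APPS:
    full = downtime_budget_seconds(pct, YEAR_24H)
    oper = downtime_budget_seconds(pct, operational_seconds_per_year(days, hours))
    print(f"{name}: {full / 60:.1f} min/year on a 24 hr basis, "
          f"{oper / 60:.1f} min/year during hours of use")
```

Under these assumptions, the five-nines budget for the Clinical Operations Review System drops from roughly 5.3 minutes per year on a 24 hr basis to roughly 1.4 minutes per year when only hours of use are counted. For comparison, a single 60-second restart of the kind described on slide 8 would use most of that smaller budget.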