Professional Association for SQL Server




SQL Server 2012
High Availability and DR
Joey D’Antoni
2200 GMT
Thank You to our Sponsors
About Me

•   @jdanton on Twitter
•   Principal Architect SQL Server, Comcast Cable
•   Joedantoni.wordpress.com
•   Videos and Blogs at SSWUG.org
•   Vice President of the Philadelphia SQL Server User
    Group
    – SQL Saturday #121 Philadelphia—June 9th
Agenda

• SQL Server 2008 to 2012—What’s Changed in HA and
  DR
• Geo-Clustering
• All about Availability Groups
Learning Objectives

•   SQL Server HA and DR
•   What’s involved in SQL Clustering
•   How clustering and Availability Groups work
•   What’s new in 2012 HA/DR
Licensing (What’s New)

• The Availability Group features will require the Enterprise
  Edition of SQL Server
• The licensing model for SQL Enterprise Edition has
  changed. Consult your friendly Microsoft sales
  representative for more details
• AlwaysOn read-only replicas will need to be licensed
Windows Core Support

• No GUI version of Windows
• Allows for fewer patches
• Uses PowerShell and MMCs for support
Windows Core
High Availability (HA) and Disaster
Recovery (DR) Options in SQL 2008

 •   Backup and Recovery
 •   Failover Cluster Instances (FCI)
 •   Mirroring
 •   Log Shipping
 •   Replication
 •   SAN Replication*
 •   Virtualization*
High Availability (HA) and Disaster
Recovery (DR) Options in SQL Server 2012

• Backup and Recovery
• Failover Cluster Instances (FCI)
• Mirroring
• Availability Groups (2012)
•   Log Shipping
•   Replication
•   SAN Replication*
•   Virtualization*
What’s new in SQL Server 2012 HA/DR

•   AlwaysOn Availability Groups
•   SMB Support for Failover Cluster Instances
•   Multi-subnet clustering is supported
•   Flexible Failover
SQL Server Failover Clustering
Architecture
SQL Failover Clustering in 2008

• SQL Clustering required a single subnet across the
  whole cluster
• Cluster failover is controlled by the looksAlive/isAlive
  checks: looksAlive checks that the SQL service is running,
  and isAlive runs SELECT @@SERVERNAME
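The two pre-2012 checks can be sketched as follows. This is a minimal model, assuming the checks can be expressed as plain callables: `service_is_running` and `run_query` are hypothetical stand-ins for what the cluster resource DLL actually does, not a real API, and retries and polling intervals are left out.

```python
from typing import Callable, Optional


def looks_alive(service_is_running: Callable[[], bool]) -> bool:
    """Cheap check: is the SQL Server service running at all?"""
    return service_is_running()


def is_alive(run_query: Callable[[str], Optional[str]]) -> bool:
    """Deeper check: can the instance answer a trivial query?

    Models the isAlive check running SELECT @@SERVERNAME; an exception
    or an empty result counts as a failure.
    """
    try:
        server_name = run_query("SELECT @@SERVERNAME")
    except Exception:
        return False
    return bool(server_name)


def needs_failover(service_is_running: Callable[[], bool],
                   run_query: Callable[[str], Optional[str]]) -> bool:
    """Simplified: fail over as soon as either check fails (no retries)."""
    return not (looks_alive(service_is_running) and is_alive(run_query))
```

The split mirrors the design on the slide: a cheap service check that can run often, backed by a query that proves the instance can actually do work.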
SQL Failover Clustering in 2012

•   Full support for geo-distributed clusters
•   SMB Storage (File Shares) Supported for FCI
•   Flexible failover model based on sp_server_diagnostics
•   TempDB on Non-shared Disk Resource
    – Makes PCI-based Solid State Drive an option
Quorum



 It’s not just bad cologne
 anymore
Quorum
                     Are you
                     there?




         Why Yes I
         am here
Understanding Quorum

• There are several slides on this topic; it is critical!
  – In a nutshell, your cluster has to be able to talk to itself to keep the
    cluster service up and running
  – This applies to both SQL Server Failover Cluster Instances and
    AlwaysOn Availability Groups
Quorum

• Quorum is critical—contains master copy of the cluster’s
  configuration
• Serves as a tiebreaker if network communications
  between cluster nodes fail
• If Quorum fails—cluster is shut down until it’s restored
Quorum Models

•   Node and Disk Majority (Default)
•   Node Majority
•   No Majority (Quorum Disk Only)
•   Node and File Share Majority (Good for Geo Clusters)
Quorum Failure Tolerance



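 Number of Nodes                          2     3     4     5     6     7
 Node Majority                            0     1     1     2     2     3
 Node and Disk/File Share Majority        1     1     2     2     3     3

  • Each node and the witness (disk or file share) holds one vote; the
    cluster stays up only while more than half of all votes are online
  • Node Majority: tolerates RoundUp(Total # of Nodes/2) - 1 node failures
  • Node and Disk/File Share Majority, witness up: tolerates
    RoundDown(Total # of Nodes/2) node failures
  • Node and Disk/File Share Majority, witness down: tolerates
    RoundDown(Total # of Nodes/2) - 1 node failures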
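Failure tolerance can be checked by counting votes directly instead of relying on a formula. This sketch assumes static voting as in Windows Server 2008/2008 R2 (one vote per node, one for the witness, and quorum requires more than half of all configured votes); the function name and parameters are hypothetical.

```python
def node_failures_tolerated(nodes: int, witness: bool = False,
                            witness_online: bool = True) -> int:
    """Largest number of failed nodes that still leaves the cluster with quorum.

    Static voting: each node holds one vote and a disk or file share witness
    holds one more. Quorum requires more than half of all configured votes.
    Returns -1 if the cluster has no quorum even with every node up.
    """
    total_votes = nodes + (1 if witness else 0)
    witness_votes = 1 if (witness and witness_online) else 0
    tolerated = -1
    for failed in range(nodes + 1):
        online_votes = (nodes - failed) + witness_votes
        if 2 * online_votes > total_votes:  # strict majority of all votes
            tolerated = failed
        else:
            break  # losing more nodes can only lower the vote count
    return tolerated
```

For example, `node_failures_tolerated(3, witness=True)` returns 1, the same as Node Majority for three nodes: an odd node count plus a witness gives an even vote total, so the witness adds no extra tolerance there.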
Why Do Clusters Failover?

 • Initiated by failures
   in hardware or
   software

 • Checked by
   isAlive/looksAlive
   processes (in
   2008 R2 and earlier)
Flexible Failover—New for 2012

• Replaces looksAlive/isAlive functionality in SQL Clusters
  (and is used for Availability Groups)
• The SQL Server resource DLL now runs sp_server_diagnostics
  – Failover behavior is controlled by two settings
    • HealthCheckTimeout (Default 60 sec/Minimum 15 sec)
    • FailureConditionLevel
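How HealthCheckTimeout turns into the "unresponsive" condition can be sketched as below. It assumes, as Books Online describes for failover cluster instances, that sp_server_diagnostics reports results at an interval of one third of the timeout; the constants come from the slide, and the function names are hypothetical.

```python
MIN_TIMEOUT_MS = 15_000      # minimum HealthCheckTimeout (15 sec)
DEFAULT_TIMEOUT_MS = 60_000  # default HealthCheckTimeout for an FCI (60 sec)


def report_interval_ms(timeout_ms: int = DEFAULT_TIMEOUT_MS) -> int:
    """How often sp_server_diagnostics reports: one third of the timeout."""
    if timeout_ms < MIN_TIMEOUT_MS:
        raise ValueError(f"HealthCheckTimeout must be at least {MIN_TIMEOUT_MS} ms")
    return timeout_ms // 3


def is_unresponsive(last_report_ms: int, now_ms: int,
                    timeout_ms: int = DEFAULT_TIMEOUT_MS) -> bool:
    """True when no diagnostic result has arrived within the timeout window."""
    return now_ms - last_report_ms > timeout_ms
```

With the default 60-second timeout, `report_interval_ms()` returns 20000, so a result is expected roughly every 20 seconds.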
Flexible Failover Policies for
Clusters

 Level         Condition                                                Description
 0             No automatic failover or restart                         No failover or restart is triggered automatically on any failure condition.
 1             Failover or restart on server down                       SQL Server service is down.
 2             Failover or restart on server unresponsive               SQL Server instance is not responsive (resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout setting).
 3 (Default)   Failover or restart on critical server errors            System stored procedure sp_server_diagnostics returns ‘system error’. (Critical errors > 20)
 4             Failover or restart on moderate server errors            System stored procedure sp_server_diagnostics returns ‘resource error’. (Moderate errors > 17)
 5             Failover or restart on any qualified failure conditions  System stored procedure sp_server_diagnostics returns ‘query_processing error’. (Deadlock)
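The policy levels are cumulative: each level reacts to its own condition plus every condition of the lower levels. A minimal sketch of that decision, assuming hypothetical string labels for the conditions in the table (they are not values returned by sp_server_diagnostics):

```python
from enum import IntEnum


class FailureConditionLevel(IntEnum):
    NO_AUTOMATIC_FAILOVER = 0
    SERVER_DOWN = 1
    SERVER_UNRESPONSIVE = 2
    CRITICAL_SERVER_ERRORS = 3  # default
    MODERATE_SERVER_ERRORS = 4
    ANY_QUALIFIED_FAILURE = 5


# Hypothetical condition labels, mapped to the lowest level that reacts to each.
CONDITION_LEVEL = {
    "service_down": FailureConditionLevel.SERVER_DOWN,
    "unresponsive": FailureConditionLevel.SERVER_UNRESPONSIVE,
    "system_error": FailureConditionLevel.CRITICAL_SERVER_ERRORS,
    "resource_error": FailureConditionLevel.MODERATE_SERVER_ERRORS,
    "query_processing_error": FailureConditionLevel.ANY_QUALIFIED_FAILURE,
}


def triggers_failover(configured: FailureConditionLevel, condition: str) -> bool:
    """A configured level covers its own condition and every lower-level one."""
    return configured >= CONDITION_LEVEL[condition]
```

On a real FCI the level is changed in T-SQL with `ALTER SERVER CONFIGURATION SET FAILOVER CLUSTER PROPERTY FailureConditionLevel = 3;`, not in client code.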
What Is Stretch Clustering?

• Also known as Geo-Clustering
Geo Cluster
Geo-Distributed Clustering

• Requires SAN replication ($$$$)
• Two of everything
• Requires really fast network connection
• Requires some trickery at the network/DNS level for
  connectivity
• Witness Disk (Quorum)
    – Can be physical (SAN) disk, or cluster file share
Geo-distributed Failover Clustering

• Was available in SQL 2008, but easier to implement in
  2012
• Won’t be used by most organizations due to cost and
  complexity
Review—DR Options in SQL 2008

• Mirroring
  – Allowed automatic failover, but only one target
  – Mirror target is unreadable
• Log Shipping
  – Allowed multiple targets, but failover was a manual process, requiring a
    connection string change
• Replication
AlwaysOn Availability Groups
AlwaysOn Requirements

•   Windows Enterprise (Clustering is a requirement)
•   SQL Server Enterprise Edition
•   Windows Cluster
•   No shared storage is required
•   Quorum Disk (File Share if multi-site or local storage)
AlwaysOn Architecture
Flexible AG Failover

• Similar to how a failover clustered instance fails over
• Uses sp_server_diagnostics health checks, with a default
  health-check timeout of 30 seconds (60 seconds for an FCI)
• Also, similar quorum model to Windows Failover
  Clustering
Allows for SAN-Less HA/DR

• This is not a huge thing for SQL Server in larger
  organizations, but a big win for medium-sized businesses
• Allows much easier native SQL DR in Virtual
  Environments
Considerations for Availability Groups

• All SQL servers (including the secondary in the
  DR site) must be in the same Windows domain
• All the databases must be in FULL recovery
  model
• The unit of failover (for local HA, as well as DR)
  is at the AG level, i.e., group of databases – not
  the instance
Failover Scenarios



                                        Synchronous-       Synchronous-
                        Asynchronous-   commit mode with   commit mode with
                        commit mode     manual-failover    automatic-failover
                                        mode               mode

   Automatic failover   No              No                 Yes

   Manual failover      No              Yes                Yes

   Forced failover      Yes             Yes                No
Read Only Replicas

• Can have up to 4 secondary replicas, any of which can be readable
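• The SQL Server 2012 client libraries support routing read-only
  connections to these replicas
• Can take backups from read-only copies*
  – Full backups must be copy-only, so they do not affect the primary’s
    backup or log chain
• Indexing is identical on all replicas (no reporting-only indexes on a
  secondary)
• Bad queries on a readable secondary can affect the status of that replica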
Client Connections in This Model

• Availability Group Listener
  – Works just like a failover clustering instance (single
    instance, single IP)
  – Creates a VCO (AD Virtual Computer Object)—similar to a cluster
    virtual object


• Read-only Connections
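  – Read-only routing requires a SQL Server 2012 client (for example,
    SQL Server Native Client 11.0) that sets ApplicationIntent=ReadOnly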
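Read-only routing can be modeled roughly as below, assuming SQL Server 2012 behavior: a connection that declares `ApplicationIntent=ReadOnly` goes to the first replica in the primary's read-only routing list that is online and readable, and every other connection goes to the primary. The `Replica` type, the function, and the server names are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    name: str
    online: bool
    readable: bool  # secondary allows read-only connections


def route_connection(read_only_intent: bool, primary: str,
                     routing_list: List[str],
                     replicas: Dict[str, Replica]) -> Optional[str]:
    """Pick the server a new connection to the listener should reach."""
    if not read_only_intent:
        return primary  # default (read-write) intent always goes to the primary
    for name in routing_list:
        replica = replicas.get(name)
        if replica is not None and replica.online and replica.readable:
            return replica.name  # first available entry wins; no load balancing
    return None  # no routable replica; fallback behavior is not modeled here
```

The ordered list is the key design point in this model: the first healthy entry gets all read-only traffic, so list order decides which secondary carries the reporting load.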
Client Connections
Client Connections

• Always specify MultiSubnetFailover=True in the listener
  connection string
• From Books Online

   “will significantly reduce failover time
  for single and multi-subnet AlwaysOn
  topologies.”

• Applies to SQL Server Failover Cluster Instances as well
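A small helper that builds listener connection strings, assuming ADO.NET (SqlClient) keyword syntax; `AGListener` and `Sales` in the example are placeholder names. Other drivers may spell keywords or values differently (ODBC, for example, uses `Yes` rather than `True`), so check your driver's documentation.

```python
def listener_connection_string(listener: str, database: str,
                               read_only: bool = False) -> str:
    """Build a SqlClient-style connection string for an AG listener or FCI name."""
    parts = {
        "Server": listener,
        "Database": database,
        "Integrated Security": "SSPI",
        "MultiSubnetFailover": "True",  # always set, per the slide
    }
    if read_only:
        parts["ApplicationIntent"] = "ReadOnly"  # opt in to read-only routing
    # dicts keep insertion order, so the keywords come out in a stable order
    return ";".join(f"{key}={value}" for key, value in parts.items())
```

`listener_connection_string("AGListener", "Sales")` yields `Server=AGListener;Database=Sales;Integrated Security=SSPI;MultiSubnetFailover=True`.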
Turning On AlwaysOn
Availability Groups Demo
Summary

•   Lots of Change in the HA/DR Space
•   Licensing also changes—talk to your MS rep
•   SQL Server Failover Clusters still a good HA option
•   AlwaysOn Availability Groups add a lot more flexibility to
    DR
Contact Info

• Twitter: @jdanton
• jdanton1@yahoo.com
• Blog: joedantoni.wordpress.com
Thank You to our Sponsors


Editor's Notes

  • #6 ELS: Change order here to match previous slide better, and follow order of slides later on (I moved them): SQL Server HA and DR; What’s new in 2012 HA/DR; What’s involved in SQL Clustering; How clustering and Availability Groups work
  • #7 ELS: I think I would put the last bullet about mirroring on the next slide. To me, nothing changes about licensing for mirroring (right?), and it’s still available in Standard and Enterprise, right? If so, then I would classify it as a functionality “change” rather than licensing. Mirroring as a technology will be going away in a future version of SQL, so if you would like to have automatic DR, Standard edition will not be an option.
  • #8 The reason why I have this in my HA/DR presentation is that Core will reduce the number of patches that need to be applied to your servers. Without IE and many other attack vectors, Microsoft expects the number of patches needed to be reduced by about 50%.
  • #10 SQL Server clustering is the most obvious high availability solution that everyone knows about. However, mirroring between two SQL Servers (with a witness server) can also provide a level of both HA and DR. The other two options are a little more controversial and more complicated to set up. Both peer-to-peer replication and SQL Log Shipping can provide some measure of HA, but there are caveats to this, and some data loss is possible. This is a little outside the scope of this presentation, so if you would like more detail on these topics, I highly recommend Paul Randal’s white paper on SQL HA and DR options. I’ll provide a link at the end of this presentation. ELS: This slide has High Availability spelled out, the next has HA. Make them consistent, either
  • #11 DR Options: yes, backup and recovery is your first line of defense in the event of a disaster. You should have extensive monitoring and notification around your backup process, and take regular transaction log backups if you need point-in-time recovery. Mirroring is probably the best high availability option. With a witness server (a server that sits in between the two mirrors) you get automatic failover if your primary instance goes down. Most applications that use Microsoft connection libraries to reach your database can support mirroring. The only negative is that unless you have Enterprise Edition, you are limited to synchronous mirroring, which can have a performance impact on your primary. Enterprise Edition brings in asynchronous mirroring, which allows for greater flexibility and distance between sites with no performance impact. Log shipping and Replication: both of these will require manual intervention in the event of a failure. However, they are very mature technologies and can work over great distances. This is not a DR scenario, but I have an application which replicates from the US to Switzerland over a nominal network connection, running on SQL 2000, and I haven’t had to touch it in two years. (Knocks on wood.) Lastly, SAN replication: this is really cool technology, and can enable the concept of geo-distributed clusters (also covered in Paul’s white paper). This is pretty far out of scope for today’s presentation, but I’ll say this: while really cool, it’s really complex to set up, and really expensive. You need additional software from your SAN vendor, which is always pretty pricey, and the additional network bandwidth to transfer bits in real time over the network. When I was at Wyeth, we did this between Philadelphia and Pearl River NY for the SAP system that ran the business. But the cost made it prohibitive to do much else. Also, when it goes wrong, it can be ugly.
  • #13 ELS Maybe change “Traditional” to be 2008, and note that it’s still an option in 2012
  • #14 ELS: Change title to be like next one (Clustering in 2008)
  • #25 Insert picture here
  • #27 Mention the DNS Time To Live (TTL) value for the cluster DNS name; this applies to both AGs and SQL FCIs.
  • #36 The amount of time that the database will be unavailable during a failover depends on the type of failover and its cause. For more information, see Estimate the Interruption of Service During Failover of an Availability Group (SQL Server). Important: To support client connections after failover, except for contained databases, logins and jobs defined on any of the former primary databases must be manually recreated on the new primary database. For more information, see Management of Logins and Jobs for the Databases of an Availability Group (SQL Server).
  • #37 ELS: I moved this slide and the next one DOWN (moved Failover Modes and Failover Scenarios up)