1. SQL Server 2012
High Availability
and DR
Northern Virginia SQL Users
Group
23-January-2012
2. About Me
@jdanton on Twitter
Joedantoni.wordpress.com
Videos and Blogs at SSWUG.org
Vice President of the Philadelphia SQL
Server UG
3. Agenda
Licensing Changes
SQL Server 2008 to 2012—What’s
Changed in HA and DR
Geo-Clustering
All about Availability Groups
4. Learning Objectives
SQL Server HA and DR
What’s involved in SQL Clustering
How it works
What’s new in 2012 HA/DR
This presentation is geared towards DBAs, so feel free to
stop me at any time with questions
5. High Availability and DR
Options in SQL 2008
SQL Server Clustering
SQL Server Mirroring
Peer to Peer Replication
SQL Server Log Shipping*
6. Licensing (What’s changed)
The Availability Group features will require
the Enterprise Edition of SQL Server
The licensing model for SQL Enterprise
Edition has changed. Consult your friendly
Microsoft sales representative for more
details
Mirroring is listed as being deprecated
from Standard Edition. Will still be there in
2012
7. SQL Server 2012
Extended Events are used much more
heavily
Slipstream Install no longer required—SQL
will check for updates from your Windows
Update source
Can use internet Windows Update or
internal source
8. Windows Core Support
No GUI version of Windows
Allows for fewer patches
Uses PowerShell and MMCs for support
10. HA and DR Options in SQL
Server 2012
Backup and Recovery
Mirroring
Availability Groups (2012)
Log Shipping
Replication
SAN Replication*
Virtualization*
11. What’s new in SQL Server 2012
HA/DR
Multi-subnet clustering is supported
Flexible Failover
The BIG one—Always On Availability
Groups
13. Clustering--2008
SQL Clustering required 1 subnet to be
used across the whole cluster
Cluster failover is controlled by
isAlive/looksAlive processes, which check
the SQL service and run SELECT @@SERVERNAME
14. Clustering 2012
Full support for geo-distributed clusters
Flexible failover model
TempDB on Non-shared Disk Resource
Makes PCI-based Solid State Drive an
option
No check for this as of CTP3—instance
won’t start if TempDB drive location not
available
17. Geo-Distributed Clustering
Requires SAN replication ($$$$)
Two of everything
Requires really fast network connection
Requires some trickery at the
network/DNS level for connectivity
New Term: Witness Disk
Can be physical (SAN) disk, or cluster file
share
19. Takeaway 2012
This feature was available in 2008, just
much more complicated to implement
from a network perspective
Won’t be used by 95% of organizations
20. Why Do Clusters Failover?
• Initiated by failures in
hardware or software
• Checked by
isAlive/LooksAlive
processes (in 2008R2
and below)
21. Flexible Failover
Replaces looksAlive/isAlive functionality in
SQL Clusters (and is used for Availability
Groups)
Now runs sp_server_diagnostics
Two new parameters
HealthCheckTimeout (Default 60
sec/Minimum 15 sec)
Failure Condition Level
22. Flexible Failover Policies for
Clusters
Level | Condition | Description
0 | No automatic failover or restart | Indicates that no failover or restart will be triggered automatically on any failure conditions.
1 | Failover or restart on server down | SQL Server service is down.
2 | Failover or restart on server unresponsive | SQL Server instance is not responsive (Resource DLL cannot receive data from sp_server_diagnostics within the HealthCheckTimeout settings).
3 (Default) | Failover or restart on critical server errors | System stored procedure sp_server_diagnostics returns ‘system error’. (Critical errors > 20)
4 | Failover or restart on moderate server errors | System stored procedure sp_server_diagnostics returns ‘resource error’. (Moderate errors > 17)
5 | Failover or restart on any qualified failure conditions | System stored procedure sp_server_diagnostics returns ‘query_processing error’. (Deadlock)
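The levels in this table are cumulative: a policy set to level N reacts to every condition listed at level N and below. Here is a minimal Python sketch of that decision, just to make the logic concrete. The class, the dictionary and the condition names are made up for illustration and are not part of any SQL Server or cluster API; the defaults (level 3, 60 seconds, 15 second minimum) are the ones quoted on the slides.

```python
from dataclasses import dataclass

# Hypothetical condition names mapped to the level at which they first
# trigger a failover or restart, following the table above.
CONDITION_LEVELS = {
    "server_down": 1,
    "server_unresponsive": 2,
    "system_error": 3,
    "resource_error": 4,
    "query_processing_error": 5,
}


@dataclass
class FailoverPolicy:
    """Illustrative model of the two flexible failover settings."""
    failure_condition_level: int = 3    # default level per the table
    health_check_timeout_sec: int = 60  # default per the slide

    def __post_init__(self):
        if not 0 <= self.failure_condition_level <= 5:
            raise ValueError("level must be between 0 and 5")
        if self.health_check_timeout_sec < 15:
            raise ValueError("HealthCheckTimeout minimum is 15 seconds")

    def triggers_failover(self, condition: str) -> bool:
        # A condition triggers action when its level is at or below
        # the configured level; level 0 therefore never triggers.
        return CONDITION_LEVELS[condition] <= self.failure_condition_level


if __name__ == "__main__":
    policy = FailoverPolicy()
    for name in CONDITION_LEVELS:
        print(name, policy.triggers_failover(name))
```

With the default policy, a ‘system error’ triggers a failover or restart but a ‘resource error’ does not; raising the level to 4 or 5 makes the cluster progressively more sensitive.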
24. Understanding Quorum
There are a few slides on this topic because it’s a
little confusing
In a nutshell, your cluster has to be able to
talk to itself to keep the cluster service up and
running
This applies to both SQL Server Failover
Cluster Instances and AlwaysOn Availability
Groups
25. Quorum
Quorum is critical—contains master copy
of the cluster’s configuration
Serves as a tiebreaker if network
communications between cluster nodes
fail
If Quorum fails—cluster is shut down until
it’s restored
26. Quorum Models
Node and Disk Majority (Default)
Node Majority
No Majority (Quorum Disk Only)
Node and File Share Majority (Good for
Geo Clusters)
27. Quorum Failure Tolerance
Number of Nodes | 2 | 3 | 4 | 5 | 6 | 7
Node Majority | 0 | 1 | 1 | 2 | 2 | 3
Node and Disk/File Share Majority | 1 | 2 | 2 | 3 | 3 | 4
• Assuming Disk is Up Calculation is: Cluster Up = RoundUp(Total # of Nodes/2)
• Assuming Disk is Down Calculation is: Cluster Up = RoundUp(Total # of Nodes/2) - 1
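One way to sanity-check the numbers on this slide is to count votes directly. The sketch below assumes static quorum: each node has one vote, a disk or file share witness adds one more, and the cluster stays up only while a strict majority of all votes is reachable. The function names are made up for illustration. It reproduces the Node Majority row exactly. For the witness models it agrees with the RoundUp rule of thumb for even node counts and comes out one lower for odd node counts, so treat the rule of thumb as an approximation there.

```python
def votes_needed(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def node_failures_tolerated(nodes: int, witness: bool = False,
                            witness_up: bool = True) -> int:
    """How many nodes can fail while the cluster keeps quorum.

    Assumes one vote per node plus one vote for the witness (if any).
    """
    total_votes = nodes + (1 if witness else 0)
    needed = votes_needed(total_votes)
    # A reachable witness contributes one of the votes we need.
    witness_vote = 1 if (witness and witness_up) else 0
    nodes_needed = max(needed - witness_vote, 0)
    return max(nodes - nodes_needed, 0)


if __name__ == "__main__":
    for n in range(2, 8):
        print(n,
              node_failures_tolerated(n),
              node_failures_tolerated(n, witness=True),
              node_failures_tolerated(n, witness=True, witness_up=False))
```

The three printed columns are Node Majority, a witness model with the witness online, and a witness model with the witness offline.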
28. DR in SQL 2008
Mirroring
Allowed automatic failover, but only one
target
Mirror target is unreadable
Log Shipping
Allowed multiple targets, but failover a
manual process, requiring a connection
string change
Replication
30. AlwaysOn Requirements
Windows Enterprise (Clustering is a
requirement)
SQL Server Enterprise Edition
Windows Cluster
No shared storage is required
Quorum Disk Preferred
32. Flexible AG Failover
Similar to how a failover clustered
instance fails over
Connects to instance every 30 seconds to
perform health check
Also, similar quorum model to Windows
Failover Clustering
33. Allows for SAN Less HA/DR
This isn’t a huge thing for SQL Server at big
shops
It may allow us to incorporate a level of
DR into a virtual environment
34. Client Connections in This
Model
Availability Group Listener (Yes, SQL Server
now has a listener)
Works just like a failover clustering instance
(single instance, single IP)
Creates a VCO (AD Virtual Computer
Object)
35. Contained Databases
Isolate Database from Instance
Currently only fully supported with SQL
Logins
No numbered procedures
Eases database movement
Allows for ease of migration to Azure
Not quite baked out as of RC0
36. Read Only Replicas
Can have up to 3
SQL Client 2012 will allow for this routing
specifically
Can take backups from read-only copies*
Copy Only Backups (only full copy, does
not affect primary log)
Indexing must be the same on replicas
Bad queries can affect status of replica
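On the client side, routing to a readable secondary is requested through the connection string: the client connects to the availability group listener and declares read-only intent with the ApplicationIntent keyword. A small Python sketch of building such a string follows. The helper function, the listener name ag-listener and the database name Sales are made up for illustration, and the exact keyword spelling can vary by driver, so check your client library. Read-only routing also has to be configured on the availability group itself before this has any effect.

```python
def build_connection_string(listener: str, database: str,
                            read_only: bool = False,
                            port: int = 1433) -> str:
    """Assemble a key=value connection string aimed at an AG listener."""
    parts = {
        "Server": f"tcp:{listener},{port}",
        "Database": database,
        "Integrated Security": "SSPI",
    }
    if read_only:
        # Declares that this connection only reads, which lets the
        # listener hand it to a readable secondary when routing is set up.
        parts["ApplicationIntent"] = "ReadOnly"
    return ";".join(f"{key}={value}" for key, value in parts.items())


if __name__ == "__main__":
    # Hypothetical listener and database names.
    print(build_connection_string("ag-listener", "Sales", read_only=True))
```

Reporting workloads would use the read-only form, while the application's normal read-write connections leave the keyword off and land on the primary.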
37. Considerations for Availability
Groups
All SQL servers (including the secondary in the
DR site) in the same Windows domain
All the databases must be in FULL recovery
model
The unit of failover (for local HA, as well as DR)
is at the AG level, i.e., group of databases –
not the instance
No delayed apply on the secondary
Removing log shipping means the regular log
backup job is removed
Need to re-establish periodic log backup
(essential for truncating the log)
38. Failover Modes
Automatic failover
Planned manual failover (without data
loss)
Forced manual failover (with possible
data loss)
39. Failover
Failover type | Asynchronous-commit mode | Synchronous-commit mode with manual-failover mode | Synchronous-commit mode with automatic-failover mode
Automatic failover | No | No | Yes
Manual failover | No | Yes | Yes
Forced failover | Yes | Yes | No
41. Summary
Lots of Change in the HA/DR Space
Licensing also changes—talk to your MS
rep
SQL Server Failover Clusters still a good HA
option
AlwaysOn Availability Groups add a lot
more flexibility to DR
SQL Server clustering is the most obvious high availability solution, and the one everyone knows about. However, mirroring between two SQL Servers (with a witness server) can also provide a level of both HA and DR. The other two options are a little more controversial and more complicated to set up. Both peer-to-peer replication and SQL log shipping can provide some measure of HA, but there are caveats, and some data loss is possible. This is a little outside the scope of this presentation, so if you would like more detail on these topics, I highly recommend Paul Randal’s white paper on SQL HA and DR options. I’ll provide a link at the end of this presentation.
Extended Events came out in SQL Server 2008, but very few people, myself included, paid much attention. Those who did found the implementation awkward and confusing. Only a few people persevered enough to discover just how powerful and amazing these things are, which is why anyone who wants to learn about Extended Events should plan on starting in one place: Jonathan Kehayias’ blog. Yeah, Books Online helps get you started, but Jonathan really makes it all take off.
DR options: yes, backup and recovery is your first line of defense in the event of a disaster. You should have extensive monitoring and notification around your backup process, and take regular transaction log backups if you need point-in-time recovery.

Mirroring is probably the best high availability option. With a witness server (a server that sits in between the two mirrors) you get automatic failover if your primary instance goes down. Most applications that use Microsoft connections to your database can support mirroring. The only negative is that unless you have Enterprise Edition, you are limited to synchronous mirroring, which can have a performance impact on your primary. Enterprise Edition brings in asynchronous mirroring, which allows for greater flexibility and distance between sites with little performance impact on the primary.

Log shipping and replication: both of these will require manual intervention in the event of a failure. However, they are very mature technologies and can work over great distances. This is not a DR scenario, but I have an application which replicates from the US to Switzerland over a nominal network connection, running on SQL 2000, and I haven’t had to touch it in two years. (Knocks on wood.)

Lastly, SAN replication: this is really cool technology, and it can enable the concept of geo-distributed clusters (also covered in Paul’s white paper). This is pretty far out of scope for today’s presentation, but I’ll say this: while really cool, it’s really complex to set up, and really expensive. You need additional software from your SAN vendor, which is always pretty pricey, and the additional network bandwidth to transfer bits in real time over the network. When I was at Wyeth, we did this between Philadelphia and Pearl River, NY for the SAP system that ran the business. But the cost made it prohibitive to do much else. Also, when it goes wrong, it can be ugly.
A partially contained database is a contained database that allows the use of uncontained features. Partially contained databases do not allow the following actions or entities:
• Numbered procedures
• Schema-bound objects that depend on built-in functions with collation changes
• Binding change resulting from collation changes, including references to objects, columns, symbols, or types
• Replication, change data capture, and change tracking
Use the sys.dm_db_uncontained_entities and sys.sql_modules (Transact-SQL) views to return information about uncontained objects or features. By determining the containment status of the elements of your applications, you can discover what objects or features need to be replaced or altered for use in a fully contained database.
Automatic failover: Automatic failover is supported only when the current primary and one secondary replica are both configured with failover mode set to AUTOMATIC and the secondary replica is currently synchronized. If the failover mode of either the primary or secondary replica is MANUAL, automatic failover cannot occur. It occurs only between a primary replica and a secondary replica that are configured for synchronous-commit mode and automatic failover mode, when the secondary replica is in the SYNCHRONIZED state.

Planned manual failover (without data loss): Planned manual failover, or manual failover, is useful for administrative purposes. It is supported only if both the primary replica and secondary replica are configured for synchronous-commit mode and the secondary replica is currently synchronized (in the SYNCHRONIZED state). A database administrator manually initiates a manual failover.

Forced manual failover (with possible data loss): Intended only for disaster recovery, forced manual failover, or forced failover, is supported only when the synchronization health of the target availability replica is either NOT_SYNCHRONIZING or SYNCHRONIZING. This is the only form of failover supported in asynchronous-commit availability mode.

Automatic failover set: Exists only when a pair of availability replicas (including the current primary replica) are configured for synchronous-commit mode with automatic failover, if any. An automatic failover set takes effect only if the secondary replica is currently SYNCHRONIZED with the primary replica.

Synchronous-commit failover set: Exists only when a set of two or three availability replicas (including the current primary replica) are configured for synchronous-commit mode. A synchronous-commit failover set takes effect only if the secondary replicas are configured for manual failover mode and at least one secondary replica is currently SYNCHRONIZED with the primary replica.

Entire failover set: Within a given availability group, the set of all availability replicas whose operational state is currently ONLINE, regardless of availability mode and of failover mode. The entire failover set becomes relevant when no secondary replica is currently SYNCHRONIZED with the primary replica.
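The rules in these notes boil down to a small decision function. The Python sketch below encodes them exactly as written here: automatic failover needs both replicas in synchronous-commit mode with AUTOMATIC failover and a SYNCHRONIZED secondary, planned manual failover needs synchronous commit on both sides and a SYNCHRONIZED secondary, and forced failover applies when the target is NOT_SYNCHRONIZING or SYNCHRONIZING. The Replica class, the function name and the returned labels are made up for illustration and do not correspond to any SQL Server API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """Illustrative replica configuration."""
    availability_mode: str  # "SYNCHRONOUS_COMMIT" or "ASYNCHRONOUS_COMMIT"
    failover_mode: str      # "AUTOMATIC" or "MANUAL"


def supported_failovers(primary: Replica, secondary: Replica,
                        secondary_state: str) -> set:
    """Return the failover types the notes allow for this pair."""
    both_sync = (primary.availability_mode == "SYNCHRONOUS_COMMIT"
                 and secondary.availability_mode == "SYNCHRONOUS_COMMIT")
    both_auto = (primary.failover_mode == "AUTOMATIC"
                 and secondary.failover_mode == "AUTOMATIC")
    synchronized = secondary_state == "SYNCHRONIZED"

    result = set()
    if both_sync and both_auto and synchronized:
        result.add("automatic")
    if both_sync and synchronized:
        result.add("planned_manual")
    if secondary_state in ("NOT_SYNCHRONIZING", "SYNCHRONIZING"):
        result.add("forced")
    return result


if __name__ == "__main__":
    sync_auto = Replica("SYNCHRONOUS_COMMIT", "AUTOMATIC")
    async_manual = Replica("ASYNCHRONOUS_COMMIT", "MANUAL")
    print(supported_failovers(sync_auto, sync_auto, "SYNCHRONIZED"))
    print(supported_failovers(sync_auto, async_manual, "SYNCHRONIZING"))
```

An asynchronous-commit secondary never reaches the SYNCHRONIZED state, which is why forced failover is the only option that ever comes back for it.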
The amount of time that the database will be unavailable during a failover depends on the type of failover and its cause. For more information, see Estimate the Interruption of Service During Failover of an Availability Group (SQL Server).

Important: To support client connections after failover, except for contained databases, logins and jobs defined on any of the former primary databases must be manually recreated on the new primary database. For more information, see Management of Logins and Jobs for the Databases of an Availability Group (SQL Server).