KoprowskiT_MaidenheadUG-High_Availability_of_SQL_in_the_context_of_SLA
If SQL Server is the heart of our environment, its health should be very important, right? If SQL Server is important, its availability for our businesses (internal and external) is important too. Our customers, and especially our managers, do not care where the data is stored, how it is stored, or what we do with it. The data must be available on demand, on time, at the moment of request. High Availability is our responsibility. How can we prepare our environment for HA? How is HA connected with the SLA? And why are Service Level Agreements important for us? In this session I want to discuss HA options for SQL Server (2008, 2012), our different customers, and Service Level Agreements (formal or not).

Usage Rights

CC Attribution-NonCommercial-ShareAlike License


KoprowskiT_MaidenheadUG-High_Availability_of_SQL_in_the_context_of_SLA Presentation Transcript

  • 1. HIGH AVAILABILITY OF SQL SERVER IN THE CONTEXT OF SLA
    Tobiasz Janusz Koprowski
  • 2. SELECT {BIO}
    • Polish SQL Server User Group Leader
    • Microsoft Certified Trainer
    • MCP, MCSA, MLSS, MLSBS, MCTS, MCITP, MCT
    • SQL Server MVP from 2010
    • Friends of RedGate PLUS
    • PASS SQL Azure Virtual Chapter Co-Founder
    • Blogger, Influencer, Technical Writer
    • Last 7 years (living) in Data Center in Wrocław
    • Generally about 12 years in IT/banking area
    • GITCA Technical Lead & Vice-Chair EMEA Board
    • Speaker at SQL Server Community Launch, Time for SharePoint, CodeCamps, SharePoint Community Launch, CISSP Day, InfoTRAMS, SQLSaturday, SQLBits, CarreerCon
    • Author of a few articles on TechNet (PL) and the WSS.pl portal
    • Deep Dives Co-Author: High availability of SQL Server in the context of Service Level Agreements (Chapter 18)
    • Working for MS Subject Matter Expert and MS Terminology community (Windows 7, 8 & Visual Studio 2010, 2011)
  • 3. Agenda
    • Back to school:
        − What is High Availability
        − What is a Service Level Agreement
    • Using HA in SQL Server 2008
    • HA solutions in SQL Server 2008, which means: Enterprise, Enterprise
    • Why SLA and DBA
    • Dependency of SLA and HA
    • Case Studies
    • Q&A
  • 4. What is High Availability?
    • High Availability (HA) means ensuring the continued operation of equipment and systems, usually in an enterprise production environment.
    • It is designed to prevent data loss as a result of:
        − software bugs
        − manufacturing defects
        − hardware failure
        − natural disasters
        − human error
        − other unforeseen events
  • 5. What is Disaster?
  • 6. What is Disaster?
  • 7. Two kinds of monster: PSO > USO > SLA
    • PSO: Planned System Outages (planned system unavailability)
        − Minimal planned unavailability, needed to carry out modernization work, install patches, or replace / extend hardware
        − Agreed with and accepted by the client, and not affecting the provisions of the HA and SLA, until...
    • ...USO: Unplanned System Outages (unplanned system unavailability)
        − an error that stops the environment from working, partially or totally, in a way that is tangible and measurable for the customer
        − resulting in high repair costs, as well as penalty payments for not meeting the SLA
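The distinction between planned and unplanned outages can be sketched as a small calculation. This is only an illustration under a stated assumption: agreed maintenance windows (PSO) are removed from the measured period and only unplanned outages (USO) count against the SLA. The `Outage` class and `sla_availability` function are hypothetical names, and the sample outage figures are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    minutes: float
    planned: bool  # True for a PSO agreed with the client, False for a USO


def sla_availability(period_minutes: float, outages: list[Outage]) -> float:
    """Availability in percent, counting only unplanned outages against the SLA."""
    planned = sum(o.minutes for o in outages if o.planned)
    unplanned = sum(o.minutes for o in outages if not o.planned)
    # Assumption: agreed maintenance windows are removed from the measured period.
    service_minutes = period_minutes - planned
    return 100.0 * (service_minutes - unplanned) / service_minutes


month = 30 * 24 * 60  # minutes in a 30-day month
outages = [Outage(120, planned=True), Outage(30, planned=False)]
print(f"{sla_availability(month, outages):.3f}%")
```

Whether planned windows are really excluded from the measured period is a contract detail, so the rule in `sla_availability` should be adapted to what the actual agreement says.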
  • 8. Performance metrics (HA)
    • What does availability of the order of 99.99% really mean?
    • 99.99% availability means 0.01% UNAVAILABILITY in a given period (e.g. a year), which...
    • How much is that in terms of the unavailability of the server / environment / database:
      Availability = MTBF / (MTBF + MTTR)
        − MTBF -> Mean Time Between Failures
        − MTTR -> Mean Time To Repair
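As a quick check of the formula above, here is a minimal sketch. The function names and the sample MTBF / MTTR values are assumptions made for the example, not figures from the session.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def max_mttr(mtbf_hours: float, target_availability: float) -> float:
    """Largest mean repair time that still reaches the target availability."""
    # Rearranged from A = MTBF / (MTBF + MTTR).
    return mtbf_hours * (1.0 - target_availability) / target_availability


# Hypothetical system: fails on average every 999 hours, takes 1 hour to repair.
print(f"{availability(999, 1):.4%}")
print(f"{max_mttr(999, 0.9999):.3f} hours")
```

The second function answers the practical question for a DBA: given how often the system fails, how fast does the repair have to be to keep the promised number of nines?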
  • 9. Unavailability in minutes, hours, days, weeks...

    Availability %            Downtime per year   Downtime per month*   Downtime per week
    90%                       36.5 days           72 hours              16.8 hours
    95%                       18.25 days          36 hours              8.4 hours
    98%                       7.30 days           14.4 hours            3.36 hours
    99%                       3.65 days           7.20 hours            1.68 hours
    99.5%                     1.83 days           3.60 hours            50.4 min
    99.8%                     17.52 hours         86.23 min             20.16 min
    99.9% ("three nines")     8.76 hours          43.2 min              10.1 min
    99.95%                    4.38 hours          21.56 min             5.04 min
    99.99% ("four nines")     52.6 min            4.32 min              1.01 min
    99.999% ("five nines")    5.26 min            25.9 s                6.05 s
    99.9999% ("six nines")    31.5 s              2.59 s                0.605 s
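A downtime budget like the one in this table can be derived from the availability percentage alone. The sketch below assumes a 365-day year and a 30-day month; `PERIOD_MINUTES` and `downtime_minutes` are names chosen for this example.

```python
# Period lengths in minutes (assumed: 365-day year, 30-day month).
PERIOD_MINUTES = {
    "year": 365 * 24 * 60,
    "month": 30 * 24 * 60,
    "week": 7 * 24 * 60,
}


def downtime_minutes(availability_percent: float, period: str) -> float:
    """Allowed downtime in minutes for the given availability and period."""
    return PERIOD_MINUTES[period] * (100.0 - availability_percent) / 100.0


for percent in (99.0, 99.9, 99.99):
    row = ", ".join(
        f"{period}: {downtime_minutes(percent, period):.2f} min"
        for period in PERIOD_MINUTES
    )
    print(f"{percent}% -> {row}")
```

Seeing the budget in minutes makes it easier to compare an SLA target against the real duration of a patch window or a restore.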
  • 10. What is SLA?
    • SLA: Service Level Agreement.
    • Its origins date back to 1980 and the agreements between operators and end customers.
    • A mutually negotiated contract for the provision of services (not just IT services, but these in particular)
    • It should be concluded formally, although an informal agreement is legally permissible
    • It defines the level and range of the services provided by means of measurable indicators (level of accessibility, usability, performance)
    • The contract should specify a minimum and maximum range for each service it covers
  • 11. Metrics of SLA
    There is no specific SLA measurement WITHOUT indicators!
    SAMPLE CALL CENTER / SERVICE DESK:
    • ABA (Abandonment Rate): Percentage of calls abandoned while waiting for a response.
    • ASA (Average Speed to Answer): Average time (usually in seconds) required for a call to be answered by the help desk.
    • TSF (Time Service Factor): Percentage of calls answered within a precise time frame, such as 80% in 20 seconds.
    • FCR (First Call Resolution): Percentage of calls where the problem was solved without having to switch to another expert.
    • TAT (Turn Around Time): The time it takes to complete certain tasks.
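To show how such indicators become measurable, here is a rough sketch that computes ABA, ASA and TSF from a list of call records. The `Call` record, the function names and the sample data are all hypothetical. TSF is computed here against all incoming calls, which is one of several conventions in use and therefore an assumption.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Call:
    wait_seconds: float  # time spent waiting before answer or hang-up
    abandoned: bool      # True if the caller gave up before being answered


def abandonment_rate(calls: list[Call]) -> float:
    """ABA: percentage of calls abandoned while waiting."""
    return 100.0 * sum(c.abandoned for c in calls) / len(calls)


def average_speed_to_answer(calls: list[Call]) -> float:
    """ASA: mean wait in seconds, over answered calls only."""
    return mean(c.wait_seconds for c in calls if not c.abandoned)


def time_service_factor(calls: list[Call], threshold_seconds: float) -> float:
    """TSF: percentage of all calls answered within the threshold (assumed convention)."""
    in_time = sum(
        1 for c in calls if not c.abandoned and c.wait_seconds <= threshold_seconds
    )
    return 100.0 * in_time / len(calls)


calls = [Call(5, False), Call(15, False), Call(30, False), Call(40, True)]
print(abandonment_rate(calls), average_speed_to_answer(calls), time_service_factor(calls, 20))
```

The same pattern applies to database SLAs: pick an indicator, define exactly how it is computed, and agree on that definition with the customer before measuring.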
  • 12. High Availability in SQL Server 2008
    Microsoft SQL Server 2008/2008R2/2012:
    • Database Mirroring
    • Database Snapshots
    • Windows Clustering
    • SQL Server Replication
    • Hot-add memory and CPU
    • Online Index Operations
    • Table and Index Partitioning
    • Failover Clustering
    • Peer-To-Peer Replication
    • Always On
  • 13. Solutions for HA for SQL Server

    AREA                    DATABASE MIRRORING         FAILOVER CLUSTERING                    TRANSACTIONAL REPLICATION  LOG SHIPPING
    Data Loss               no data loss               no data loss                           some data loss possible    some data loss possible
    Automatic Failover      YES (in HA mode)           YES                                    no                         no
    Transparent To Client   YES, autodirect            YES, connect to same IP                no, NLB helps              no, NLB helps
    Downtime                < 3 seconds                20 seconds or more + time to recovery  seconds                    seconds plus time to recovery
    Standby Read Access     Yes, with db snapshots     no data loss                           YES
    Data Granularity        DB only                    all systems and dbs                    table or view              DB only
    Masking of hdd failure  YES                        No, shared disk                        YES                        YES
    Special hardware        NO, duplicate recommended  Cluster HCL                            NO, duplicate recommended  NO, duplicate recommended
    Complexity              Some                       More                                   More                       More
  • 14. Why High Availability? (High Availability MSFT SLIDE)
    • Businesses need to work around the clock to meet customer demands
    • When systems are not running, businesses are losing revenue, opportunities, customers and reputation
    • High availability reduces the impact of required maintenance on day-to-day operations and helps recover quickly from disasters
    • Businesses need flexibility to easily build high availability solutions that meet business and technology needs
    • Diagram labels: Prevent Unplanned Downtime; Reduce Planned Downtime; Online operations; Multiple instance clustering; Live Migration; Automatic page repair with database mirroring; Hot-add CPU and RAM; Database snapshots; Peer-to-peer replication
  • 15. Prevent Unplanned Downtime (High Availability MSFT SLIDE)
    • Multiple-Instance Database Clustering
        − More than one passive node is available to host instances from multiple failovers on active nodes
        − Having multiple failover nodes provides greater availability
        − Multiple instances can share the same failover node, which reduces hardware costs
        − Simplified setup reduces administrative costs
    • Diagram labels: Applications & Business Logic; nodes marked Active, Failover, Offline, Active, Active
    • "Because of the critical nature of the G4S application, CASON sets up the servers in a failover cluster to ensure high availability." (CASON Case Study)
  • 16. Enhanced Database Mirroring (High Availability MSFT SLIDE)
    • High Performance Mirroring
        − Increase performance through asynchronous mirroring
    • Automatic Page Repair
        − Automatically detects page corruption and retrieves data from the mirror
        − Reduces downtime and management costs
        − Minimizes application changes to correctly handle I/O errors
    • Reporting from Mirror
        − Increase utilization of mirror server
        − Reduce need for reporting servers
    • Diagram labels: Applications & Business Logic; Principal; Mirror
    • "This is a really powerful enhancement because prior to this… you would have to run DBCC CHECKDB... and that would likely mean taking downtime… With SQL Server 2008 Database Mirroring you can avoid the effort and downtime." (Glenn Berry, Database Architect, NewsGator Technologies)
  • 17. Help Recover From User Errors (High Availability MSFT SLIDE)
    • Database Snapshots
        − Provide a read-only static view of the database at a point in time
        − Revert to a point in time before user error
        − Data loss is limited to changes after the snapshot
        − Run reports from a snapshot created on the mirror server in a mirror to better utilize resources
    • Diagram labels: Applications & Business Logic; Snapshot; Source
    • "Database snapshots allow you to create read-only databases for reporting and can also be useful in your data recovery efforts in the event of a disaster." (Tim Chapman, SQL Server Database Administrator)
  • 18. Maintain Databases Without Downtime (High Availability MSFT SLIDE)
    • Online Operations
        − Allow routine maintenance without corresponding downtime
        − Online index operations
        − Online page and file restoration
        − Online configuration of peer-to-peer nodes
        − Users and applications can access data while the table, key, or index is being updated
    • Diagram labels: Applications & Business Logic; Table; Index
    • "We recommend performing online index operations for business environments that operate 24 hours a day, seven days a week, in which the need for concurrent user activity during index operations is vital." (SQL Server Books Online)
  • 19. Minimize Planned Downtime and Increase Efficiency (High Availability MSFT SLIDE)
    • Live Migration
        − Move running instances of VMs between host servers
        − Virtual machines can be moved for maintenance or to balance workload on host servers
        − Perform maintenance on physical machines without any downtime
        − Requires Windows Server 2008 R2 Hyper-V
    • Diagram labels: Applications & Business Logic
    • "This server already runs on our cluster solution with high availability, but after we have tested live migration on the new hardware, we'll move it over to ensure optimal performance and reliability"
  • 20. Minimize Planned Downtime (High Availability MSFT SLIDE)
    • Hot-Add CPU and RAM
        − Dynamically add memory and processors to servers without incurring downtime
        − Requires hardware support for either physical or virtual hardware
    • Diagram labels: Applications & Business Logic
    • "Hot-add CPU is the ability to dynamically add CPUs to a running system. Adding CPUs can occur physically by adding new hardware, logically by online hardware partitioning, or virtually through a virtualization layer." (SQL Server Books Online)
  • 21. Access Data Seamlessly Across Servers (High Availability MSFT SLIDE)
    • Peer-to-Peer Replication
        − Increases reliability by replicating data to multiple servers
        − Provides higher availability in case of failure or to allow maintenance at any of the participating nodes
        − Offers improved performance for each node with geo-scale architecture
        − Add and remove servers easily without taking replication offline, by using the new topology wizard
    • Diagram labels: Applications & Business Logic
    • "[Microsoft] SQL Server 2008 replication proved to be very predictable and reliable in our testing. This helps us to create flexible and scalable replication solutions. Reliability must be at the foundation of all that we do." (Sergey Elchinsky, Leading System Engineer, Baltika Breweries)
  • 22. Database Mirroring
    • Mirroring maintains a mirror image of the data
        − Available only for two databases (principal, mirror)
        − A witness server is recommended
    • Requirements:
        − principal, mirror: only SQL Server Enterprise
        − witness: can be SQL Server Express
    • Availability for the database:
        − a copy of the database on a different physical and / or virtual server
    • Availability for the system:
        − a copy of the entire environment on a different physical and / or virtual server
  • 23. Database Mirroring Refresher: Synchronous Mode
    • KEY POINT: the mirror database is an EXACT copy of the principal
    • Diagram: numbered message flow between principal and mirror, each with its own DB and Log (step labels: Commit; Write to local log; Transmit to mirror; Committed in log; Write to remote log; Acknowledge; Acknowledge; Constantly redoing on mirror)
  • 24. Hot-add memory and CPU
    • SQL Server 2005 added the ability to use memory that is added "on the fly"
    • SQL Server 2008 extends these dynamic capabilities, allowing you to hot-add CPUs
    • "Hot-add" is the ability to connect RAM / CPU to the computer while it is running, and then refresh SQL Server so that it uses the new equipment ONLINE
    • The equipment must support hot-add (of course!)
        − Supported only in the Enterprise Edition running on a 64-bit version of Windows Server 2008 Datacenter / Enterprise
        − SQL Server does not automatically start using the new processor / memory
        − You need to run RECONFIGURE
        − Queries that are already running will not use the newly added memory / processor
  • 25. Hot-Add CPU: Affinity Masks
    • Affinity masks control which CPUs are used by SQL Server, and for what purpose
    • Any affinity masks will need to be updated after hot-adding new CPUs
        − If the affinity mask is set to non-zero, you will need to update it so that SQL Server knows it can use the new CPUs
        − On systems with > 32 CPUs, you will need to set the affinity64 mask to pick up the new CPUs
        − If you want to use the new CPUs for IO only, you must add the relevant bits to the affinity I/O (or affinity64 I/O) mask
    • Affinity masks in brief:
        − All zeroes means that Windows decides which CPUs are used
        − Non-zero: a single bit per CPU; if the bit is 1, SQL Server will use that CPU
        − The same bit cannot be set in both the affinity mask AND the affinity I/O mask
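The bit arithmetic behind affinity masks can be sketched outside SQL Server. This is only an illustration of the one-bit-per-CPU rule: the function names are hypothetical, and the sketch does not show how the resulting values are passed to the server.

```python
def mask_for_cpus(cpus) -> int:
    """Build a bitmask with one bit set per CPU id (0..63)."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu <= 63:
            raise ValueError(f"CPU id out of range: {cpu}")
        mask |= 1 << cpu
    return mask


def split_affinity(cpus) -> tuple[int, int]:
    """Return (affinity mask for CPUs 0-31, affinity64 mask for CPUs 32-63)."""
    full = mask_for_cpus(cpus)
    low = full & 0xFFFFFFFF
    high = (full >> 32) & 0xFFFFFFFF
    return low, high


def overlaps(affinity_mask: int, affinity_io_mask: int) -> bool:
    """True if any CPU bit is set in both masks, which is not allowed."""
    return (affinity_mask & affinity_io_mask) != 0


# Before hot-add: CPUs 0-3. After hot-add: CPUs 4-7 become available too.
before = mask_for_cpus(range(4))
after = mask_for_cpus(range(8))
print(bin(before), bin(after))
print(split_affinity([0, 1, 32, 33]))
```

Comparing the mask before and after a hot-add shows exactly which bits still have to be switched on for SQL Server to use the new CPUs.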
  • 26. Fast Manual Failover
    • In High Security mode (synchronous mirroring without a witness), manual failover is always used
    • In SQL Server 2005, in an emergency situation the database on the mirror is closed and restarted to force recovery of the non-committed part of the transaction log
        − This can greatly increase the failover time
        − Consider a database with hundreds of files, which all have to be opened to start the database
    • SQL Server 2008 removes this step, thus speeding up failover and reducing the need for an emergency shutdown
  • 27. SEND and REDO queues
    • SEND Queue
        − Unsent log: the amount of log not yet sent to the mirror
        − Represents possible data loss
    • REDO Queue
        − The amount of log sent to the mirror that still has to be redone on the mirror
        − Represents failover time
    • Diagram: both queues plotted over time on the mirror
  • 28. Peer-to-Peer Topology (?)
    • SQL Server 2005 introduced the ability to use peer-to-peer (or "two-way") Transactional Replication
        − A great way to scale the resources needed to work
        − Partly also a way to have a redundant copy
    • One major drawback: changing the peer-to-peer topology required stopping ALL activity on the servers in the topology
    • In SQL Server 2008:
        − these restrictions have been removed (in most cases)
        − the peer-to-peer setup wizard in SSMS has also been upgraded
        − Switching partitions can be repeated
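The two queues translate into two rough estimates: how much recent work could be lost, and how long the mirror needs before it can take over. The sketch below assumes you already have the queue sizes and the rates from your own monitoring; the function names and sample numbers are hypothetical.

```python
def estimated_data_loss_seconds(send_queue_kb: float, log_generation_kb_per_s: float) -> float:
    """Seconds of recent log that exist only on the principal (possible data loss)."""
    return send_queue_kb / log_generation_kb_per_s


def estimated_redo_seconds(redo_queue_kb: float, redo_rate_kb_per_s: float) -> float:
    """Seconds the mirror needs to roll forward its queue (part of failover time)."""
    return redo_queue_kb / redo_rate_kb_per_s


# Hypothetical monitoring sample.
print(estimated_data_loss_seconds(send_queue_kb=2_000, log_generation_kb_per_s=500))
print(estimated_redo_seconds(redo_queue_kb=60_000, redo_rate_kb_per_s=3_000))
```

Both numbers are estimates under steady rates; a burst of log activity changes them quickly, so they are best tracked over time rather than read once.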
  • 29. Topology Wizard• The wizard now is graphical, with drag-n-drop functionality for making topology connections
  • 30. SQL Server 2012 & AlwaysOn | marketing
    • Help reduce planned and unplanned downtime with the new integrated high availability and disaster recovery solution, SQL Server AlwaysOn.
    • Simplify deployment and management of HA requirements using integrated configuration and monitoring tools.
    • Improve IT cost efficiency and performance using Active Secondary.
    • Reduce planned downtime with Windows Server Core.
  • 31. SQL Server 2012 & AlwaysOn | technical
    • AlwaysOn Failover Cluster Instances: As part of the SQL Server AlwaysOn offering, AlwaysOn Failover Cluster Instances leverages Windows Server Failover Clustering (WSFC) functionality to provide local high availability through redundancy at the server-instance level: a failover cluster instance (FCI). An FCI is a single instance of SQL Server that is installed across Windows Server Failover Clustering (WSFC) nodes and, possibly, across multiple subnets. On the network, an FCI appears to be an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.
    • AlwaysOn Availability Groups: AlwaysOn Availability Groups is an enterprise-level high-availability and disaster recovery solution introduced in SQL Server 2012 to enable you to maximize availability for one or more user databases. AlwaysOn Availability Groups requires that the SQL Server instances reside on Windows Server Failover Clustering (WSFC) nodes.
    • Database mirroring: Avoid using this feature in new development work, and plan to modify applications that currently use this feature. We recommend that you use AlwaysOn Availability Groups instead. Database mirroring is a solution to increase database availability by supporting almost instantaneous failover. Database mirroring can be used to maintain a single standby database, or mirror database, for a corresponding production database that is referred to as the principal database. For more information, see Database Mirroring (SQL Server).
    • Log shipping: Like AlwaysOn Availability Groups and database mirroring, log shipping operates at the database level. You can use log shipping to maintain one or more warm standby databases (referred to as secondary databases) for a single production database that is referred to as the primary database. For more information about log shipping, see About Log Shipping (SQL Server).
  • 32. SQL Server 2012 & AlwaysOn
  • 33. SLA - what does this have to do with the DBA
    • Production hours:
        − Hours in which the partition / table / database must be available
        − May be different for different parts of a database, for example depending on the application
    • Percentage of service time:
        − The percentage of time within a given time range when the service / partition / table / database is available
    • Hours reserved for downtime:
        − Announcing these hours of downtime (technical breaks) in advance makes the users' work easier
        − Customer support methods
        − The response time of the HelpDesk
        − DBA response time for an event
  • 34. SLA - what does this have to do with the DBA
    • Number of users on the system
        − Number of transactions processed per unit of time
    • Acceptable performance levels for access to the various operations
        − Minimum time required to replicate between the different servers
    • Deadline for data recovery after failures
        − Accidental deletion of data
        − Damage to the database
        − SQL Server crash
        − OS / server crash
    • Time it takes to bring the data back online (e.g. read / write access to the sales table) so that sales can continue
        − Maximum amount of space
        − Maximum number of tables / databases
        − Number of users in specific roles
  • 35. Why is the SLA so important?
    • In fact, it's more than just a signed agreement between the client and your boss.
    • It is also a contract that YOU need to meet
    • If the signed agreement promises zero downtime and zero data loss (an abstraction?), then you need to make sure you can fulfil this contract even in the event of corruption (e.g. data changed / deleted on purpose by an authorized user).
    • If you cannot meet the SLA, the business is exposed to downtime and data loss
    • The end result is submitting your CV to a recruitment agency...
  • 36. Do you think you can meet your Service Level Agreement?
    • You need to know the conditions / requirements of the SLA if you are to meet them
    • How can you meet it if you do not know that an SLA exists?
    • How can you review the contract if nobody invited you to the meeting where the Service Level Agreement was created?
    • The end result is submitting your CV to a recruitment agency...
  • 37. Do you think you can meet your SLA?
    • The recovery plan looks great on paper, but have you ever tested it?
    • Suppose this situation:
        − We allow 15 minutes of unavailability for a 100 GB database.
        − We are able to substitute a copy of the user database within those 15 minutes
        − What will you do if the database is damaged?
        − What will you do in the event of a disk failure?
        − What will you do if the motherboard burns out?
        − What will you do when an FC cable is cut?
        − How much time will it take to recover from a backup?
        − How much time will it take to bring a backup tape from a second location 25 kilometers away, through the city center at 14:00?
    • Do you still meet the SLA of 15 minutes of downtime?
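A back-of-the-envelope check makes the question above concrete. The 100 GB size, the 25 km distance and the 15-minute limit come from the scenario; the restore throughput (100 MB/s) and the average driving speed through the city (25 km/h) are invented assumptions, and the function names are hypothetical.

```python
def restore_minutes(db_size_gb: float, restore_mb_per_s: float) -> float:
    """Time to restore the database from backup at a sustained throughput."""
    return db_size_gb * 1024 / restore_mb_per_s / 60


def transport_minutes(distance_km: float, speed_km_h: float) -> float:
    """Time to bring the backup tape from the second location."""
    return distance_km / speed_km_h * 60


def meets_sla(total_minutes: float, allowed_minutes: float) -> bool:
    return total_minutes <= allowed_minutes


# Assumed values: 100 MB/s restore throughput, 25 km/h through city traffic.
restore = restore_minutes(db_size_gb=100, restore_mb_per_s=100)
transport = transport_minutes(distance_km=25, speed_km_h=25)
total = restore + transport
print(f"restore {restore:.1f} min + transport {transport:.1f} min = {total:.1f} min")
print("SLA met" if meets_sla(total, allowed_minutes=15) else "SLA missed")
```

Under these assumptions even the restore alone takes longer than the allowed 15 minutes, which is the point of the slide: plug in your own measured numbers from a real restore test.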
  • 38. Summary
    • Database mirroring
    • Log Shipping
    • Hot-add CPU
    • Transactional Replication
    • Failover clustering enhancements
    • Peer-to-peer replication enhancements
    • Clouds (Google, Azure, Amazon...)
  • 39. Summary
    • You need to know that the SLA exists
    • You must take part in creating the Service Level Agreement (requirements / features / technology)
    • You need to have contingency plans, and they must be TESTED
    • You must know your responsibilities
    • You must be technically able to meet the SLA
  • 40. Resources
    • Database mirroring
        − http://www.sqlskills.com/blogs/paul/2007/10/11/SQLServer2008PerformanceBoostForDatabaseMirroring.aspx
        − http://www.sqlskills.com/blogs/paul/2007/10/01/SQLServer2008NewPerformanceCountersForDatabaseMirroring.aspx
        − http://www.sqlskills.com/blogs/paul/2007/09/27/SQLServer2008AutomaticPageRepairWithDatabaseMirroring.aspx
    • Backup compression
        − http://www.sqlskills.com/blogs/paul/2008/01/09/SQLServer2008BackupCompressionCPUCost.aspx
        − http://www.sqlskills.com/blogs/paul/2007/09/20/SQLServer2008BackupCompression.aspx
    • Hot-add CPU
        − http://www.sqlskills.com/blogs/paul/2008/01/10/SQLServer2008HotAddCPUAndAffinityMasks.aspx
    • DBCC CHECKDB
        − http://www.sqlskills.com/blogs/paul/CategoryView,category,CHECKDB%2BFrom%2BEvery%2BAngle.aspx
    • Failover clustering
        − http://www.microsoft.com/windowsserver2008/failover-clusters.aspx
    • Peer-to-peer replication
        − http://www.sqlskills.com/blogs/paul/2007/12/07/SQLServer2008ConfiguringPeertoPeerReplication.aspx
  • 41. AFTER SESSION {next contact}
    • MAIL: KoprowskiT@windowslive.com
    • MSG: KoprowskiT@windowslive.com
    • SKYPE: tjkoprowski
    • TWITTER: @KoprowskiT
    • SlideShare (post-sessions): http://www.slideshare.net/Anorak
    BLOGS:
    • ITPRO Anorak's Vision: http://itblogs.pl/notbeautifulanymore/ [PL/EN]
    • Volume Licensing Specialites: http://koprowskit.eu/licensing/ [PL/EN]
    • My MVP Blog: http://koprowskit.eu/geek/ [PL/EN/ES]
  • 42. PLEASE RATE MY SESSION. THANK YOU