SQLintersection keynote: A Tale of Two Teams

I shared the stage with Kevin Kline. Paul Randal and Kimberly L. Tripp organized an excellent conference. This slide deck covers how to design large MS SQL Server architectures, with thousands of databases, that are high performance and yet easy to manage. ioMemory by Fusion-io provides the performance, and SQL Sentry provides an interface to manage and monitor thousands of databases.

  • Interestingly, the default configuration of the server is generally quite good. Even at very high scale there is not much additional work that can be done. The closest you get to a magic “make SQL Server go faster” trace flag is 834 (http://support.microsoft.com/kb/920093 and http://msdn2.microsoft.com/en-us/library/aa366720.aspx), which enables Windows large-page allocations for the buffer pool. If you see a flat node, it will fill up eventually if you start doing enough work in SQL.
  • On heavy OLTP systems, there is enough NIC traffic that the NICs need their own CPU cores to process the TCP work. Use the affinity mask to segregate the NIC cores.
      • When increasing connections to ~6000 (users had think time), you’ll start seeing waits on THREADPOOL.
      • Solution: increase sp_configure ‘max worker threads’.
      • Probably don’t want to go higher than 4096.
      • Gradually increase it; default max is 980.
      • Avoid killing yourself in thread management; the bottleneck is likely somewhere else.
  • (Should be around 4:30 pm) PCI-e v1 bus:
      • X4 slot: 750MB/sec
      • X8 slot: 1.5GB/sec
      • X16: fast enough, around 3GB/sec
      • Some “v2 compliant” PCI-e buses still run at v1 speeds!
  • Interesting Shape, what’s causing it?
  • The hardware between the CPU and the physical drive is often complex, with different topologies depending on vendor and technology.
      • Two major topologies for SQL Server storage:
          • DAS (Direct Attached Storage). Standards: (SCSI), SAS, SATA. RAID controller in the machine. PCI-X or PCI-E direct access.
          • SAN (Storage Area Network). Standards: iSCSI or Fibre Channel (FC). Host Bus Adapters or network cards in the machine. Switches / fabric access to the disk.
      • SAN & tiered storage arrays:
          • SAN: Data is explicitly placed on various disk groups, which the admin must track. Moving data between tiers is manual and typically offline. Granularity is whatever the admin decides to move. Depends on the admin tracking storage hot spots and usage.
          • Tiered SAN: The array tracks usage patterns and automatically moves data between storage tiers. Data movement is in the background and fully online. Granularity is LUN today, moving to finer-grained. Depends on the array tracking storage hot spots and usage.
  • (Should be around 4:45-4:50 pm)
  • (Should be around 5:00 – 5:10 pm)
  • First image = basic AG for high availability and disaster recovery. Second image = Node and File Share Majority quorum model. Third image = Node Majority quorum model.
  • These are the default tools.
  • Open up the Always On view for Instance/Group Matrix. Illustrate the thick green lines from SQL2 to SQL4.
      • Start the job on SQL2, “Keynote workload”. The lines will begin to turn red. Discuss how the IO load on SQL2 (Fusion-io) is backing up on SQL4 (slow disks).
      • Start the job on SQL1, “Keynote move C/F to SQL1”. Takes a couple of minutes. Nodes shut down, then flip over to SQL1. The restore to create the database file is super fast, with super-low latency because of Fusion-io. May have time to run “Keynote workload” again on SQL1 after the flip.
      • Start the job on SQL1, “Revert”. Takes even more time. The main point here is how long it takes, because we have to restore and reinitialize the database on the slow disks. It may not finish in time because the initialization and restore take so much longer.
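
The two quorum models named in the notes above (Node Majority, and Node and File Share Majority) both come down to counting votes. Below is a minimal Python sketch of that arithmetic, assuming one vote per node plus one vote for a file share witness when present. The function names are illustrative only and are not part of any Windows clustering API.

```python
def votes_needed(total_votes):
    """Smallest number of votes that forms a strict majority."""
    return total_votes // 2 + 1


def has_quorum(votes_online, total_votes):
    """True if the surviving voters still form a strict majority."""
    return votes_online >= votes_needed(total_votes)


def tolerated_failures(total_votes):
    """How many voters can be lost before quorum is lost."""
    return total_votes - votes_needed(total_votes)


if __name__ == "__main__":
    # Node Majority with 3 nodes: 3 votes in total.
    print(votes_needed(3), tolerated_failures(3))
    # Node and File Share Majority with 4 nodes + 1 witness: 5 votes in total.
    print(votes_needed(5), tolerated_failures(5))
    # An even vote count split down the middle has no majority on either side.
    print(has_quorum(2, 4))
```

Under these assumptions an even number of nodes gains a tie-breaking vote from the file share witness, which is the usual reason to pair the witness with even node counts.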

SQLintersection keynote: A Tale of Two Teams (Presentation Transcript)

  • 1. High Performance and High Availability with SQL Server 2012 AlwaysOn
      • Sumeet Bansal, Fusion-io
      • Kevin Kline, SQL Sentry
  • 2. Introduction: A Tale of Two Teams
      • Two rival teams … each working to satisfy an important customer.
      • What’s the hardware solution? What’s the software solution?
      • © SQLintersection. All rights reserved. http://www.SQLintersection.com
  • 3. The Customer
      • A real-world major financial institution headquartered in London.
      • A core banking application: credit card transactions from ATMs and branches.
      • Requirement: 10,000 business transactions/sec (not IOPS!).
      • Highly available using AlwaysOn across hundreds of nodes in many Availability Groups.
      • ... AND IT LOOKS LIKE THIS AT LOAD ...
  • 4. Meeting the Requirements: Hardware
      • High performance on SQL Server means tuning the FULL STACK.
      • Key Takeaway: This is NOT going to be easy…
      • Stack: OS, SQL, CPU, HBA, NIC, Array, Cache, Spindles
  • 5. First Surprise - Memory
      • At scale, SQL Server does a generally good job of memory management by default.
      • Some improvements are possible on large CPU/memory boxes dedicated to SQL Server:
          • Lock Pages in Memory: Big performance gain! Use gpedit.msc to grant it to the SQL Server service account.
          • Large page allocations (trace flag 834, startup option -T834): On Windows 2008 R2, previous issues with this trace flag are fixed. Around 10% throughput increase.
          • NUMA node memory distribution: Beware!
          • Set max memory close to box max if a dedicated box is available.
  • 6. Second Surprise - NICs
      • At scale, network traffic will generate a LOT of interrupts for the CPU.
          • These must be handled by CPU cores.
          • Must distribute packets to cores for processing.
      • Rule of thumb (OLTP): 1 NIC / 16 cores.
          • Watch the DPC activity in Task Manager.
          • Remove SQL Server (using affinity masking) from the NIC cores.
  • 7. Drive Selection - General
      • Number of files matters for SQL Server.
          • TempDB and user databases have multiple files, on segregated arrays.
          • Other important configs: NTFS allocation unit size at 64 KB; HBA queue depth at 64; Storport HBA driver.
      • Number of drives matters.
          • More drives = more speed.
          • True for both SAN and DAS ...
          • Less so for SSD, but still relevant (especially for NAND).
      • If designing for performance, make sure the topology can handle it!
          • Understand the path to the drives.
      • Consider workload: random or sequential?
      • Key Takeaway: Validate and compare configurations prior to deployment.
  • 8. Rules of Thumb – Disk IO
      • Traditional spindle throughput:
          • 10K RPM: 100-130 IOPS, ‘full stroke’
          • 15K RPM: 150-180 IOPS, ‘full stroke’
          • Can achieve 2x or more when ‘short stroking’ the disks (using less than 20% capacity of the physical spindle).
          • These are for random 8K I/O.
      • Aggregate throughput when sequential access:
          • Between 90MB/sec and 125MB/sec for a single drive.
          • If truly sequential, any block size over 8K will give you these numbers.
          • Some 3.5” drives are slightly faster than 2.5”.
      • Approximate latency: 3-5ms
      • Cable speed:
          • Theoretical: 1.5GB/sec
          • Typical: 1.2GB/sec
      • PCI-e v2 bus:
          • X4 slot: 1.5 – 1.8GB/sec
          • X8 slot: 3GB/sec
      • HBA speed:
          • 4Gbit: ~500MB/sec
          • 8Gbit: ~1GB/sec on PCI-e X4 v2 bus
          • Typical: 350-400MB/sec on 4Gbit, doubled on 8Gbit
  • 9. What’s Causing these Non-Disk Bottlenecks?
      • [Chart annotations: added disk pair; backplane limit; added controller. Data labels: 140, 140, 110.]
  • 10. Understand the Full Stack to the drives
      • Key Takeaway: The deeper the topology, the greater the latency, and the more important the tuning.
      • Best practices:
          • Understand the topology, potential bottlenecks and theoretical throughput of each component in the path!
          • Engage storage engineers early in the process.
      • Two major topologies for SQL Server storage:
          • DAS – Direct Attached Storage
          • SAN – Storage Area Networks
  • 11. Traditional Centralized Architecture
      • [Diagram labels: SERVERS (application, CPU and memory, HBA); NETWORK (switches); STORAGE, performance optimized (target adapters, CPU and memory, RAID controllers, HDD/SSD; active and archive data). Latency and processing time: milliseconds. Workloads: databases, virtualization, web-scale.]
  • 12. Shared Data Decentralization
      • [Diagram labels: SERVERS (application, CPUs, NAND flash, RAID controller, HDD/SSD). Active data: microseconds. Archive data: milliseconds. Latency and processing time. Workloads: databases, virtualization, web-scale.]
  • 13. The SAN – Panacea to All IO Issues… YEAH RIGHT!
      • [Chart legend: green = checkpoint, red = tx/sec, black = disk latency]
  • 14. DAS vs. SAN - Summary
      • Cost. SAN: high, offset by better utilization. DAS: low, may waste space.
      • Flexibility. SAN: more, abstraction allows online configuration changes. DAS: less, get it right the first time!
      • Skills required. SAN: complex, with a steep learning curve. DAS: simple and well understood.
      • Additional features. SAN: snapshots; storage replication; thin provisioning. DAS: none.
      • Performance. SAN: not high performance technology. DAS: high performance for small investment.
      • Reliability. SAN: more, very high reliability. DAS: less, depending on RAID level.
      • Clustering support. SAN: yes. DAS: no (special implementations exist).
      • So, which should we choose? SAN or DAS?
  • 15. Let’s See What It Can Do!
      • 1 x MS SQL Server
      • 4 x 1.2TB
      • 25 billion transactions/day (equivalent to the estimated number of credit card transactions around the globe in a single day)
      • http://www.fusionio.com/blog/powering-global-commerce-with-sql-server-iomemory/
  • 16. Demo: Turn difficult disk IO tuning into easy ioMemory plug-n-play.
  • 17. Meeting the Requirements: Software
      • Highly transparent instrumentation means monitoring the FULL STACK.
      • Key Takeaway: This is NOT going to be easy…
      • Stack: OS, SQL, CPU, HBA, NIC, Array, Cache, Spindles
  • 18. Instrumentation: PerfMon
      • Throughput: measured in MB/sec or IOPS by PerfMon, Logical Disk object:
          • Disk Read Bytes/sec
          • Disk Write Bytes/sec
          • Disk Reads/sec
          • Disk Writes/sec
      • Latency: measured in milliseconds (ms) by PerfMon, Logical Disk object:
          • Avg. Disk sec/Read
          • Avg. Disk sec/Write
          • More on healthy latency values later.
      • Key Takeaway: For transparency, PerfMon gives a limited picture of performance.
  • 19. Instrumentation: Profiler / Trace
      • High overhead.
      • Lots of experience needed to filter the results.
      • Deprecated! (But only for the relational engine.)
      • Key Takeaway: Shows triggered events, but not a comprehensive view of the whole system. Not a reliable long-term solution.
  • 20. Instrumentation: DMVs
      -- SQL Server 2012 Diagnostic Information Queries, by Glenn Berry, @GlennAlanBerry
      -- http://sqlserverperformance.wordpress.com/
      -- http://sqlskills.com/blogs/glenn/
      -- Get total buffer used by DB for current instance
      SELECT DB_NAME(database_id) AS [Database Name],
             COUNT(*) * 8/1024.0 AS [Cached Size (MB)]  -- 8 KB pages converted to MB
      FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
      WHERE database_id > 4          -- exclude system databases
        AND database_id <> 32767     -- exclude ResourceDB
      GROUP BY DB_NAME(database_id)
      ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);
      • Great information! Built in for SQL Server 2005+.
      • No history. No correlation. No interpretation.
      • Key Takeaway: Very useful. Not very usable.
  • 21. Instrumentation: Extended Events
      • Low overhead.
      • Lots of experience needed to filter the results.
      • How much memory or space? Other administrative questions to answer…
      • Key Takeaway: Deep data, but is it actionable and proactive information?
  • 22. Instrumentation: Notifications
      • Per-server setup.
      • Requires the SQL Server Agent service.
      • Can only capture error message/level, WMI metrics, and PerfMon metrics.
      • Key Takeaway: Alerts are available, but with high support requirements and limited proactivity.
  • 23. Demo: Bringing all the instrumentation together for meaningful, actionable performance information.
  • 24. Meeting the Requirements: HA
      • Need more flexibility than in legacy approaches like log shipping and database mirroring.
      • Need a shared-nothing architecture.
      • Key Takeaway: This is not too bad UNTIL we scale up …
      • Stack: OS, SQL, CPU, HBA, NIC, Array, Cache, Spindles
  • 25. Availability Groups Fundamentals
  • 26. Special Considerations: AlwaysOn
      • Granular control and some visibility into AlwaysOn through SSMS: right-click, then Show Dashboard.
      • Designed for small-scale implementations.
      • As with earlier tools, the user carries the risk and the requirement for expertise.
  • 27. Demo: HA + DR management and monitoring at scale.
  • 28. How Did We Do It?
      • [Diagram: the full stack (OS, SQL, CPU, HBA, NIC, Array, Cache, Spindles) reduced to OS + SQL, CPU, Fusion-io + SQL Sentry]
  • 29. References
      • Thomas Kejser, SQLCAT, and high performance IO tuning: http://blog.kejser.org/tag/sqlcat/ and http://blog.kejser.org/
      • Jonathan Kehayias & xEvents: http://www.sqlskills.com/blogs/jonathan/category/extended-events/
      • Joe Sack & AlwaysOn: http://www.sqlskills.com/blogs/joe/answering-questions-with-the-alwayson-dashboard/
      • SQLPerformance.com (Jonathan Kehayias) instrumentation overhead analysis: http://www.sqlperformance.com/2012/10/sql-trace/observer-overhead-trace-extended-events
  • 30. Review
      • High performance IO is very hard when restricted to disk-only architectures.
          • ioMemory from Fusion-io is the solution!
      • Highly transparent monitoring and alerting, especially for HA, is very hard with native tools and features.
          • Performance Advisor from SQL Sentry is the solution!
      • Visit our booths to see the latest releases and sign up for free trials and demonstrations!
  • 31. Don’t forget to enter your evaluation of this session using EventBoard! Questions? Thank you!
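
The rules of thumb on slide 8 lend themselves to quick back-of-envelope sizing. The Python sketch below uses the slide's per-spindle IOPS ranges and its throughput figures to estimate how many spindles a random-I/O target would need and which component in the path caps throughput. The figure of 5 I/Os per business transaction is an assumption for illustration only: the deck states the requirement in transactions/sec and gives no I/O ratio. The function names and the example path are also hypothetical.

```python
import math

# Rule-of-thumb random 8K IOPS per spindle at full stroke (slide 8).
IOPS_PER_SPINDLE = {"10k": (100, 130), "15k": (150, 180)}


def spindles_needed(target_iops, rpm="15k", short_stroke_factor=1.0):
    """Return (best_case, worst_case) spindle counts for a random-IOPS target.

    short_stroke_factor=2.0 models the slide's "2x or more" short-stroking gain.
    """
    low, high = IOPS_PER_SPINDLE[rpm]
    best = math.ceil(target_iops / (high * short_stroke_factor))
    worst = math.ceil(target_iops / (low * short_stroke_factor))
    return best, worst


def path_bottleneck(components_mb_per_sec):
    """Return (name, MB/sec) of the slowest component in the I/O path."""
    return min(components_mb_per_sec.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    # ASSUMPTION: 5 I/Os per business transaction (not from the deck).
    target = 10_000 * 5
    print(spindles_needed(target, "15k"))                           # (278, 334)
    print(spindles_needed(target, "15k", short_stroke_factor=2.0))  # (139, 167)

    # Throughput figures taken from slide 8, expressed in MB/sec.
    path = {"HBA 8Gbit": 1000, "PCI-e v2 x4": 1500, "cable (typical)": 1200}
    print(path_bottleneck(path))                                    # ('HBA 8Gbit', 1000)
```

Even with short stroking, the spindle counts run into the hundreds under this assumed ratio, which is the kind of gap the deck's "validate and compare configurations prior to deployment" takeaway is meant to expose.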