1. DAT 402 Designing High Performance I/O System with SQL Server Dubi Lebel Dubi.lebel@gmail.com
2. Query B runs twice a week and takes about 10 minutes to return its result set. Query A runs two thousand times every day and takes up to 1 second to return its result set. Which of these queries has the greater effect on disk I/O?
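The slide gives only run counts and durations, so a rough first pass is to total the elapsed time each query accumulates per week. The sketch below does that arithmetic. Treating elapsed time as a proxy for disk I/O is an assumption; the slide gives no read or write counts.

```python
# Weekly elapsed time for the two queries described on the slide.
# Assumption: elapsed time is only a rough proxy for disk I/O.
SECONDS_PER_MINUTE = 60
DAYS_PER_WEEK = 7


def weekly_seconds(runs_per_week: int, seconds_per_run: float) -> float:
    """Total elapsed seconds per week for a query."""
    return runs_per_week * seconds_per_run


# Query A: 2000 runs per day, up to 1 second each (upper bound).
query_a_seconds = weekly_seconds(2000 * DAYS_PER_WEEK, 1.0)
# Query B: 2 runs per week, about 10 minutes each.
query_b_seconds = weekly_seconds(2, 10 * SECONDS_PER_MINUTE)

print(f"Query A: up to {query_a_seconds / SECONDS_PER_MINUTE:.0f} minutes per week")
print(f"Query B: about {query_b_seconds / SECONDS_PER_MINUTE:.0f} minutes per week")
```

Under these figures the small, frequent query can accumulate far more time per week than the large, rare one.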
3. Insert 100 rows wrapped in 1 transaction, or insert 100 single rows (100 transactions). Which of these inserts is faster? Which one has the greater effect on disk I/O?
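One way to reason about this question is to count synchronous transaction log writes. The sketch below assumes that every commit waits for at least one log flush and that 100 small rows fit in a single flush. The 1 ms log write latency is a hypothetical figure, not a measurement.

```python
# Lower bound on time spent waiting for transaction log flushes.
# Assumptions: one synchronous log flush per commit; latency is hypothetical.
LOG_WRITE_LATENCY_MS = 1.0


def commit_wait_ms(transactions: int, log_write_latency_ms: float) -> float:
    """Minimum time spent waiting on log flushes for the given commit count."""
    return transactions * log_write_latency_ms


one_transaction_ms = commit_wait_ms(1, LOG_WRITE_LATENCY_MS)
hundred_transactions_ms = commit_wait_ms(100, LOG_WRITE_LATENCY_MS)

print(f"100 rows in 1 transaction: at least {one_transaction_ms:.0f} ms of log waits")
print(f"100 single-row transactions: at least {hundred_transactions_ms:.0f} ms of log waits")
```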
4. Amdahl's Law: system speed-up limited by the slowest part! CPU Performance: 60% per year Disk Performance < 10% per year (IO per sec) I/O system performance limited by mechanical delays (disk I/O)
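Amdahl's Law can be written as speed-up = 1 / ((1 - p) + p / s), where p is the fraction of run time that gets faster and s is how much faster it gets. The sketch below applies it to a hypothetical workload that spends half its time on the CPU and half waiting on disk. The 50/50 split and the five-year horizon are illustrative assumptions.

```python
def amdahl_speedup(improved_fraction: float, improvement_factor: float) -> float:
    """Overall speed-up when only `improved_fraction` of run time gets faster."""
    if not 0.0 <= improved_fraction <= 1.0:
        raise ValueError("improved_fraction must be between 0 and 1")
    if improvement_factor <= 0:
        raise ValueError("improvement_factor must be positive")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / improvement_factor)


# Hypothetical workload: 50% CPU time, 50% waiting on disk I/O.
cpu_fraction = 0.5
# CPU improving 60% per year for five years; disk speed left unchanged.
cpu_factor_after_5_years = 1.6 ** 5

overall = amdahl_speedup(cpu_fraction, cpu_factor_after_5_years)
print(f"CPU is {cpu_factor_after_5_years:.1f}x faster, whole system only {overall:.2f}x")
```

However fast the CPU gets, the speed-up in this example can never reach 2x, because the disk half of the run time is untouched.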
5. Who am I? D.B.A. (Don’t Bother Asking). Architect at Logic Ind. Have worked as a database developer and admin since 1990. Have worked with SQL Server since the first (SYBASE) version, 3.6. Was at S.R.L. (R.I.P.) for 8½ years. Was at Microsoft for 7½ years as technical manager of SQL pre-sales (managed the DB track at Tech-Ed 3 times). Co-manage the SQL Server Israeli User Group with Ami Leven.
6. Thanks: Thomas Kejser - SQL CAT member http://sqlcat.com/members/tkejser.aspx Thomas with Henk van der Valk (BI405, 29/11/10, 11:30 - 12:45): world record SSIS ETL performance - 1.18 TB in under 10 minutes.
7. Key Takeaway: This is NOT going to be easy… 182 slides. You can either dive here or in the sea. But here you will see what you can’t see in the sea. The lessons in this session were written in sweat and blood.
8. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
14. What is IO for me? Throughput Latency Capacity How do you Measure it?
15. What is IO for me - Throughput: The amount of data successfully passed between storage and computer in a specified amount of time. Measured in MB/sec or IOPs. Performance Monitor, Logical Disk object: Disk Read Bytes/sec, Disk Write Bytes/sec, Disk Reads/sec, Disk Writes/sec.
16. What is IO for me - Latency: A synonym for delay: how much time it takes for a packet of data to get from one designated point to another. Measured in milliseconds (ms). Performance Monitor, Logical Disk object: Avg. Disk sec/Read, Avg. Disk sec/Write. More on healthy latency values later.
17. What is IO for me - Capacity: Capacity is just capacity. Measured in GB/TB. The easy one! Is it important? Key Takeaway: Don’t think about disk I/O as disk capacity.
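The two throughput units on this slide are linked by the I/O size: MB/sec = IOPs x I/O size. A minimal sketch of the conversion, using illustrative I/O sizes and rates:

```python
def mb_per_sec(iops: float, io_size_kb: float) -> float:
    """Convert an I/O rate into bandwidth for a fixed I/O size."""
    return iops * io_size_kb / 1024


def iops_for_bandwidth(bandwidth_mb_per_sec: float, io_size_kb: float) -> float:
    """Convert bandwidth back into an I/O rate for a fixed I/O size."""
    return bandwidth_mb_per_sec * 1024 / io_size_kb


# Small random I/O: many operations, little bandwidth.
print(f"180 IOPs at 8KB   = {mb_per_sec(180, 8):.2f} MB/sec")
# Large sequential I/O: few operations, a lot of bandwidth.
print(f"100 MB/sec at 64KB = {iops_for_bandwidth(100, 64):.0f} IOPs")
```

This is why both counters (bytes/sec and reads or writes/sec) are worth capturing: either one alone hides the I/O size.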
19. Terminology - Basic Disk - Spindles – Physical disks in the Storage Array Array - Box with the Spindles in it
20. Terminology – ACR”N JBOD - Just a Bunch of Disks SAME – Stripe and Mirror Everything RAID - Redundant Array of Inexpensive Disks DAS - Direct Attached Storage NAS - Network Attached Storage SAN - Storage Area Network CAS - Content Addressable Storage
22. Terminology – Adv. LUN - Logical Unit Number Host - The Server or Servers a LUN is presented to. Disk - How the OS sees a LUN when presented IOps - Physical Operation To Disk Sequential IO - Reads or writes which are sequential on the spindle Random IO - Reads or writes which are located at random positions on the spindle
24. The Full Stack: SQL Server, Windows, CPU, PCI Bus, I/O Controller / HBA, Cabling, Array Cache, Spindle
25. The Traditional Hard Disk Drive [Figure labels: cover mounting holes (cover not shown), base casting, spindle, slider (and head), case mounting holes, actuator arm, platters, actuator axis, actuator, flex circuit (attaches heads to logic board), SATA interface connector, power connector] Source: Panther Products
36. Numbers to Remember - Spindles. Traditional spindle throughput in random 8/16K I/O: 10K RPM – 100-130 IOPs at ‘full stroke’; 15K RPM – 150-180 IOPs at ‘full stroke’. Can achieve 2x or more when ‘short stroking’ the disks (using less than 20% of the capacity of the physical spindle). Aggregate throughput for sequential access: between 90MB/sec and 125MB/sec for a single drive. If truly sequential, any block size over 8K will give you these numbers. Depends on drive form factor: 3.5” drives are slightly faster than 2.5”. Approximate latency: 3-5ms.
37. Scaling of Spindle Count - Short vs. Full Stroke. Each 8 disks exposes a single 900GB LUN. RAID group capacity ~1.1TB. Test data set is fixed at 800GB: a single 800GB test file for a single LUN (8 disks), two 400GB test files across two LUNs, etc… Lower IOPs per physical disk when more of the capacity of the physical disks is used (longer seek times).
38. The “New” Hard Disk Drive: SSD (Solid State Drive). No moving parts!
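The per-spindle figures above can be turned into a rough spindle count for a target random I/O rate. The sketch below uses the low end of the slide's ranges; the 5,000 IOPs target is hypothetical, and RAID overhead and cache effects are ignored.

```python
import math

# Conservative (low end) random IOPs per spindle, taken from the slide.
FULL_STROKE_IOPS = {"10K RPM": 100, "15K RPM": 150}
SHORT_STROKE_GAIN = 2  # the slide quotes "2x or more" when short stroking


def spindles_needed(target_iops: float, iops_per_spindle: float) -> int:
    """Smallest whole number of spindles that can deliver target_iops."""
    return math.ceil(target_iops / iops_per_spindle)


target = 5000  # hypothetical workload requirement
for drive, iops in FULL_STROKE_IOPS.items():
    full = spindles_needed(target, iops)
    short = spindles_needed(target, iops * SHORT_STROKE_GAIN)
    print(f"{drive}: {full} spindles full stroke, {short} short stroked")
```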
39. Paul S. Randal SQLskills.com http://www.sqlskills.com/BLOGS/PAUL/category/Benchmarking.aspx Benchmarking-Introducing-SSDs-part 1 , 2 and 3
40. SSD - Game Changer! No moving parts Power consumption (20%) Read operations Random = Sequential low latency on access
41. SSD - NAND Flash. Throughput, especially random, much higher than a traditional drive: typically 10**4 IOPS for a single drive (examples: Intel X25 and FusionIO). Storage is organized into cells of 512KB. Each cell consists of 64 pages, each page 8KB. When a cell needs to be rewritten, the 512KB block must first be erased. This is an expensive operation and can take very long. The disk controller will attempt to locate free cells before trying to delete existing ones. Writes can be slow; a DDR ”write cache” is often used to ”overcome” this limitation. When blocks fill up, NAND becomes slower with use, but only up to a certain level: eventually it peaks out. Still MUCH faster than typical drives.
42. SSD - Battery Backed DRAM. Throughput close to speed of RAM: typically 10**5 IOPS for a single drive. The drive is essentially DRAM: RAM on a PCI card (example: FusionIO) ...or with a fiber interface (example: DSI3400). Battery backed up to persist storage. Be careful about downtime: how long can the drive survive with no power? As RAM prices drop, these drives are becoming larger. Extremely high throughput: watch the path to the drives.
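The cell geometry on this slide explains why small rewrites are expensive. The sketch below works through the arithmetic, assuming the worst case in which no free cell is available and the whole 512KB cell must be erased and rewritten to change a few pages.

```python
# Cell geometry from the slide.
PAGE_KB = 8
PAGES_PER_CELL = 64
CELL_KB = PAGE_KB * PAGES_PER_CELL  # 64 pages x 8KB = 512KB


def worst_case_write_amplification(pages_changed: int) -> float:
    """Physical KB rewritten per logical KB changed when a full cell is rewritten."""
    if not 1 <= pages_changed <= PAGES_PER_CELL:
        raise ValueError("pages_changed must be between 1 and PAGES_PER_CELL")
    return CELL_KB / (pages_changed * PAGE_KB)


print(f"Cell size: {CELL_KB}KB")
print(f"Changing 1 page rewrites {worst_case_write_amplification(1):.0f}x the data")
print(f"Changing all 64 pages rewrites {worst_case_write_amplification(64):.0f}x the data")
```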
43. SSD Directly on PCI-X Slot > 10,000 IOPs Mixed read/write Latency < 1ms PCI bus bottleneck
45. Question: SSD is an evolution of the DiskOnKey. What is the most dangerous event that can lead you to lose all your DiskOnKey data? Don’t put all your eggs in one basket!
51. The most important parameters are: -kW means writes (as opposed to reads); -t2 means two threads; -s120 means test for 120 seconds; -dM means drive letter M; -o1 means one outstanding request (not piling up requests); -frandom means random access (as opposed to -fsequential); -b64 means 64KB IOs.
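When running a series of tests it helps to generate the command line from named values rather than retyping flags. The helper below is a hypothetical convenience wrapper, not part of SQLIO; it only assembles the flags described on this slide, plus the -BH and -LS flags that appear in the command line on the output slide.

```python
def build_sqlio_command(kind, threads, seconds, drive, outstanding,
                        pattern, block_kb, test_file, extra_flags=()):
    """Assemble a sqlio command line from the parameters described on the slide."""
    flags = [
        f"-k{kind}",         # W = writes, R = reads
        f"-t{threads}",      # number of threads
        f"-s{seconds}",      # test duration in seconds
        f"-d{drive}",        # drive letter
        f"-o{outstanding}",  # outstanding requests per thread
        f"-f{pattern}",      # random or sequential
        f"-b{block_kb}",     # I/O size in KB
        *extra_flags,
    ]
    return " ".join(["sqlio", *flags, test_file])


command = build_sqlio_command("W", 2, 120, "M", 1, "random", 64,
                              "Testfile.dat", extra_flags=("-BH", "-LS"))
print(command)
```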
52. The Output
E:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s120 -dM -o1 -frandom -b64 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, -1361967296 counts per second
2 threads writing for 120 secs to file M:\Testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 1 outstanding
buffering set to use hardware disk cache (but not file cache)
using current size: 24576 MB for file: M:\Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics:
IOs/sec: 1539.50
MBs/sec: 96.21
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 572
histogram:
ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+
%: 66 32 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
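The throughput figures in this output can be cross-checked against each other. The sketch below recomputes MBs/sec from IOs/sec and the 64KB I/O size, then estimates the average latency from the number of I/Os in flight (Little's Law). The latency estimate assumes both threads kept exactly one request outstanding for the whole run.

```python
# Figures copied from the sqlio output on the slide.
IOS_PER_SEC = 1539.50
REPORTED_MB_PER_SEC = 96.21
IO_SIZE_KB = 64
THREADS = 2
OUTSTANDING_PER_THREAD = 1

# Bandwidth implied by the I/O rate and I/O size.
computed_mb_per_sec = IOS_PER_SEC * IO_SIZE_KB / 1024

# Little's Law: average latency = I/Os in flight / I/O rate.
ios_in_flight = THREADS * OUTSTANDING_PER_THREAD
implied_latency_ms = ios_in_flight / IOS_PER_SEC * 1000

print(f"Computed bandwidth: {computed_mb_per_sec:.2f} MB/sec "
      f"(reported {REPORTED_MB_PER_SEC})")
print(f"Implied average latency: {implied_latency_ms:.2f} ms")
```

An implied latency between 1 and 2 ms is consistent with the histogram, where almost every I/O falls in the 0 ms or 1 ms bucket.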
55. Iometer: http://www.iometer.org/ An I/O subsystem measurement and characterization tool for single and clustered systems. Developed by Intel Corp. (1998); given to the Open Source Development Lab (OSDL) in November 2001. Last update: 2008-06-22-rc2.
58. One worker per processor. Can play a significant role in observed performance.
68. The Full Stack: SQL Server, Windows, CPU, PCI Bus, I/O Controller / HBA, Cabling, Array Cache, Spindle
69. Hardware between the CPU and the physical drive. Different topologies depending on vendor and technology. Best Practices: Understand the topology, potential bottlenecks and theoretical throughput of the components in the path. Engage storage engineers early in the process. The deeper the topology, the more latency.
70. Controller Network components between disks and server Multiple disks connected to a computer system through a controller Failure detection and recovery (checksum, bad sector remapping)
71. Disk interface standards. Fibre Channel (FC): fastest, bus speeds between 2 and 4 Gbit/sec. SCSI (Small Computer System Interface): older technology, slower bus speeds. (S)ATA ((Serial) AT Attachment): newer technology, even slower bus speeds. Enterprise Flash Disks (EFDs): newest technology, same bus speeds as FC.
72. Cache. System cache: buffers data between disk and interface. Disk cache: uses DRAM to cache recently accessed blocks; blocks are usually replaced in LRU order; minimum 8-16MB; needs an operational battery for reliable writes.
73. Cache size, checkpoint 2GB vs. 8GB Key Takeaway: Write cache helps, up to a certain point
74. I/O Systems [Diagram: Processor and Cache connect over the Memory - I/O Bus to Main Memory and to the I/O Controllers (Disk, Disk, Graphics, Network); the controllers signal the processor with interrupts]
75. DAS vs. SAN DAS – Direct Attached Storage Standards: (SCSI), SAS, SATA RAID controller in the machine PCI-X or PCI-E direct access SAN – Storage Area Networks Standards: iSCSI or Fibre Channel (FC) Host Bus Adapters or Network Cards in the machine Switches / Fabric access to the disk
76. Path to the drives - DAS [Diagram: PCI Bus, Controller, Cache, Shelf Interface]
77. Path to the Drives – DAS ”on chip” [Diagram: PCI Bus, Controller]
78. Path to the drives - SAN [Diagram: PCI Bus, HBA, Switch, Fiber Channel Ports, Controllers/Processors, Cache] Best Practice: Make sure you have the tools to monitor the entire path to the drives. Understand the utilization of individual components.
79. SQL Server on DAS - Pros. Beware of non-disk-related bottlenecks: the SCSI/SAS controller may not have the bandwidth to support the disks; the PCI bus should be fast enough (example: need a PCIe x8 slot to consume 1GB/sec). Can use Dynamic Disks to stripe LUNs together: bigger, more manageable partitions. Often a cheap way to get high performance. Very few skills required to configure.
80. SQL Server on DAS - Cons. Cannot grow storage dynamically: buy enough capacity from the start, or plan the database layout to support growth (example: allocate two files per LUN to allow doubling of capacity by moving half the files). Inexpensive and simple way to create a very high performance I/O system. Important: No SAN = No Cluster! Must rely on other technologies (e.g. Database Mirroring) for maintaining redundant data copies. Consider requirements for storage-specific functionality (e.g. DAS will not do snapshots). Be careful about single points of failure (example: loss of a single storage controller may cause the loss of many drives).
81. Numbers to Remember - DAS. SAS cable speed: theoretical 1.5GB/sec, typical 1.2GB/sec. PCIe v1 bus: x4 slot 750MB/sec; x8 slot 1.5GB/sec; x16 slot fast enough, around 3GB/sec. PCIe v2 bus: x4 slot 1.5-1.8GB/sec; x8 slot 3GB/sec. Be aware that a PCIe bus may be “v2 compliant” but still run at v1 speeds.
82. SQL Server on SAN – Pitfalls (1/2). Sizing on capacity instead of performance. Over-estimating the ability of the SAN or array. Overutilization of shared storage. Lack of knowledge about the physical configuration and the potential bottlenecks or expected limits of the configuration, which makes it hard to tell if performance is reasonable on the configuration. The array controller or the bandwidth of the connection is often the bottleneck. Key Takeaway: The bottleneck is not always the disk. One-size-fits-all solutions are probably not optimal for SQL Server. Get the full story: make sure you have the SAN vendor's tools to measure the path to the drives. Capture counters that provide the entire picture (see benchmarking section).
83. SQL Server on SAN – Pitfalls (2/2) Storage technologies are evolving rapidly and traditional best practices may not apply to all configurations Assuming physical design does not matter on SAN Over estimating the benefit of array cache Physical isolation practices become more important at the high end High volume OLTP, large scale DW Isolation of HBA to storage ports can yield big benefits This largely remains true for SAN although some vendors claim it is not needed
84. Numbers to Remember - SAN. HBA speed: 4Gbit – theoretical around 500MB/sec, realistically between 350 and 400MB/sec; 8Gbit will do twice that. But remember the limits of the PCIe bus: an 8Gbit card will require a PCIe x4 v2 slot or faster. Max throughput per storage controller: varies by SAN vendor, check specifications. Drives are still drives – there is no magic.
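The "numbers to remember" are most useful when lined up along one path, because the slowest component sets the ceiling. The sketch below does this for a hypothetical path of eight spindles behind a single 4Gbit HBA in an x4 v1 slot, using the figures quoted on these slides.

```python
def bottleneck(path_mb_per_sec: dict) -> tuple:
    """Return (component, MB/sec) for the slowest component in the path."""
    name = min(path_mb_per_sec, key=path_mb_per_sec.get)
    return name, path_mb_per_sec[name]


# Hypothetical path; per-component figures come from the slides.
path = {
    "8 spindles, sequential (8 x 90MB/sec)": 8 * 90,
    "PCI bus, x4 v1 slot": 750,
    "4Gbit HBA (realistic low end)": 350,
}

component, limit = bottleneck(path)
print(f"Path is limited to about {limit}MB/sec by: {component}")
```

In this example the drives could deliver roughly twice what the HBA can carry, which is the point of the takeaway that the bottleneck is not always the disk.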
87. Monitoring - Windows View of I/O Make sure to capture all of these for the complete picture…
88. Validating a System for High Throughput OLTP: Cached files vs. Disk Access. From Cache. By the way – notice the queue depth. From Disk.
89. Random or Sequential? Knowing if your workload is random or sequential in nature can be a hard question to answer. Depends a lot on application design. SQL Server Access Methods counters can give some insights: high values of Readahead pages/sec indicate a lot of sequential activity; high values of index seeks/sec indicate a lot of random activity. Look at the ratio between the two. The transaction log is always sequential. Best Practice: Isolate the transaction log on dedicated drives.
90. Configuring Disks in Windows: The one-slide best practice. Use disk alignment at 1024KB. Use GPT if MBR is not large enough. Format partitions with a 64KB allocation unit size. One partition per LUN. Only use Dynamic Disks when there is a need to stripe LUNs using Windows striping (e.g. Analysis Services workload). Tools: Diskpar.exe, DiskPart.exe and DmDiag.exe; Format.exe; Disk Manager.
91. Ensure Disks are Formatted Correctly. The worst scenario? Random operations using 64K IO and 64K chunk size: one sector off and you are hitting two disks for every IO, thus halving the random performance potential. Note: On a RAID array this means accessing two different stripe units on two separate disks. Graphics Source: Jimmy May
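The ratio mentioned on this slide can be computed from two counter samples. The sketch below is only an illustration: the sample values are hypothetical, and the ratio is a relative indicator to compare over time or between workloads, not an absolute threshold.

```python
def sequential_to_random_ratio(readahead_pages_per_sec: float,
                               index_seeks_per_sec: float) -> float:
    """Ratio of read-ahead activity to index seek activity (higher = more sequential)."""
    if index_seeks_per_sec == 0:
        return float("inf") if readahead_pages_per_sec > 0 else 0.0
    return readahead_pages_per_sec / index_seeks_per_sec


# Hypothetical counter samples for two workloads.
samples = {
    "reporting workload": (5000.0, 100.0),
    "order entry workload": (50.0, 5000.0),
}

for name, (readahead, seeks) in samples.items():
    ratio = sequential_to_random_ratio(readahead, seeks)
    print(f"{name}: read-ahead pages per index seek = {ratio:.2f}")
```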
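The "one sector off" effect can be checked with integer arithmetic. The sketch below counts how many 64KB stripe units a single 64KB I/O touches for two partition starting offsets. The 63-sector (32,256 byte) offset is used as an example of a misaligned partition; the 1024KB offset is the alignment recommended in this deck.

```python
STRIPE_UNIT_BYTES = 64 * 1024
IO_SIZE_BYTES = 64 * 1024


def is_aligned(partition_offset_bytes: int, stripe_unit_bytes: int) -> bool:
    """True when the partition starts on a stripe unit boundary."""
    return partition_offset_bytes % stripe_unit_bytes == 0


def stripe_units_touched(start_byte: int, io_size_bytes: int,
                         stripe_unit_bytes: int) -> int:
    """Number of stripe units (and so disks) a single I/O has to access."""
    first = start_byte // stripe_unit_bytes
    last = (start_byte + io_size_bytes - 1) // stripe_unit_bytes
    return last - first + 1


for label, offset in (("63 sectors", 63 * 512), ("1024KB", 1024 * 1024)):
    touched = stripe_units_touched(offset, IO_SIZE_BYTES, STRIPE_UNIT_BYTES)
    print(f"Offset {label}: aligned={is_aligned(offset, STRIPE_UNIT_BYTES)}, "
          f"stripe units per 64KB I/O={touched}")
```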
96. How Many Filegroups? Performance: Filegroups can be used to separate tables/indexes, allowing selective placement of these at the disk level. Separate objects requiring more data files due to high page allocation rate. Can be used to separate I/O patterns. Administration considerations (primarily): Backup can be performed at the filegroup or file level. Partial availability: the database is available if the primary filegroup is available; other filegroups can be offline. A filegroup is available if all its files are available. Tables and Indexes: Can specify separate filegroups for in-row data and large-object data. Best Practice: Place LOB data in a dedicated filegroup. Partitioned Tables: Each partition can be in its own filegroup. A partition per filegroup may provide a better archiving strategy. Partitions can be moved in and out of the table. Best Practice: Do not place data in the PRIMARY filegroup; allocate a new filegroup and set it as the default.
97. Shared storage environments: sharing at the disk level and of other components (e.g. service processors, cache, etc.)
101. Storage Selection: General Pitfalls. There are organizational barriers between DBAs and storage administrators: each needs to understand the other's “world”. Shared storage environments: sharing at the disk level and of other components (e.g. service processors, cache, etc.). Sizing only on “capacity” is a common problem. Key Takeaway: Take latency and throughput requirements (MB/s, IOPs and max latency) into consideration when sizing storage. One-size-fits-all configurations. The storage vendor should have knowledge of SQL Server and Windows best practices when the array is configured, especially when advanced features are used (snapshots, replication, etc.).
103. OLAP - DWH Workloads Smaller number of Tlog Writes with longer writes (often 64K) T-Log buffers get written because buffer is full Often ‘sequential’ Read-Ahead with 64K or more from data files
104. Backup / Restore. Backup and restore operations utilize internal buffers for the data being read/written. The number of buffers is determined by the number of data file volumes and the number of backup devices, or by explicitly setting BUFFERCOUNT. If database files are spread across a few (or a single) logical volume(s), and there are a few (or a single) output device(s), optimal performance may not be achievable by default. Tuning can be achieved by using the BUFFERCOUNT parameter for BACKUP / RESTORE. More Information: http://sqlcat.com/technicalnotes/archive/2008/04/21/tuning-the-performance-of-backup-compression-in-sql-server-2008.aspx
105. FILESTREAM. Writes to varbinary(max) go through the buffer pool and are flushed during checkpoint. Reads and writes to FILESTREAM data do not go through the buffer pool (either T-SQL or Win32). T-SQL uses buffered access to read and write data. Win32 can use either buffered or non-buffered access, depending on the application's use of the APIs. FILESTREAM I/O is not tracked via sys.dm_io_virtual_file_stats. Best practice: place FILESTREAM data on a separate logical volume for monitoring purposes. Writes to FILESTREAM generate less transaction log volume than writes to varbinary(max): the actual FILESTREAM data is not logged. FILESTREAM data is captured as part of database backup and transaction log backup. May increase throughput capacity of the transaction log. http://sqlcat.com/technicalnotes/archive/2008/12/09/diagnosing-transaction-log-performance-issues-and-limits-of-the-log-manager.aspx
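Raising BUFFERCOUNT costs memory, so it is worth estimating the footprint before tuning. The sketch below assumes the commonly documented relationship that backup buffer memory is roughly BUFFERCOUNT x MAXTRANSFERSIZE. MAXTRANSFERSIZE is a BACKUP / RESTORE option not mentioned on the slide, and the values used are hypothetical.

```python
BYTES_PER_MB = 1024 * 1024


def backup_buffer_memory_mb(buffer_count: int, max_transfer_size_bytes: int) -> float:
    """Approximate memory used by backup buffers, in MB."""
    return buffer_count * max_transfer_size_bytes / BYTES_PER_MB


# Hypothetical settings to compare before running BACKUP with BUFFERCOUNT.
candidates = [
    (7, 1 * BYTES_PER_MB),
    (50, 4 * BYTES_PER_MB),
    (200, 4 * BYTES_PER_MB),
]

for buffer_count, transfer_size in candidates:
    memory_mb = backup_buffer_memory_mb(buffer_count, transfer_size)
    print(f"BUFFERCOUNT={buffer_count}, MAXTRANSFERSIZE={transfer_size}: "
          f"about {memory_mb:.0f}MB of buffers")
```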
106. TEMPDB User group 92 November 2009: Nothing is not more permanent than the temporary http://www.slideshare.net/sqlserver.co.il/nothing-is-not-more-permanent-than-the-temporary
107. Tools SQLIO Used to stress an I/O subsystem – Test a configuration’s performance http://www.microsoft.com/downloads/details.aspx?FamilyId=9A8B005B-84E4-4F24-8D65-CB53442D9E19&displaylang=en SQLIOSim Simulates SQL Server I/O – Used to isolate hardware issues 231619 HOW TO: Use the SQLIOStress Utility to Stress a Disk Subsystem http://support.microsoft.com/?id=231619 Fiber Channel Information Tool Command line tool which provides configuration information (Host/HBA) http://www.microsoft.com/downloads/details.aspx?FamilyID=73d7b879-55b2-4629-8734-b0698096d3b1&displaylang=en
108. KB Articles KB 824190 Troubleshooting Storage Area Network (SAN) Issues http://support.microsoft.com/?id=824190 KB 304415: Support for Multiple Clusters Attached to the Same SAN Device http://support.microsoft.com/?id=304415 KB 280297: How to Configure Volume Mount Points on a Clustered Server http://support.microsoft.com/?id=280297 KB 819546: SQL Server support for mounted volumes http://support.microsoft.com/?id=819546 KB 304736: How to Extend the Partition of a Cluster Shared Disk http://support.microsoft.com/?id=304736 KB 325590: How to Use Diskpart.exe to Extend a Data Volume http://support.microsoft.com/?id=325590 KB 328551: Concurrency enhancements for the tempdb database http://support.microsoft.com/?id=328551 KB 304261: Support for Network Database Files http://support.microsoft.com/?id=304261
109. General Storage References Microsoft Windows Clustering: Storage Area Networks http://www.microsoft.com/windowsserver2003/techinfo/overview/san.mspx StorPort in Windows Server 2003: Improving Manageability and Performance in Hardware RAID and Storage Area Networks http://www.microsoft.com/windowsserversystem/wss2003/techinfo/plandeploy/storportwp.mspx Virtual Device Interface Specification http://www.microsoft.com/downloads/details.aspx?FamilyID=416f8a51-65a3-4e8e-a4c8-adfe15e850fc&DisplayLang=en Windows Server System Storage Home http://www.microsoft.com/windowsserversystem/storage/default.mspx Microsoft Storage Technologies – Multipath I/O http://www.microsoft.com/windowsserversystem/storage/technologies/mpio/default.mspx Storage Top 10 Best Practices http://sqlcat.com/top10lists/archive/2007/11/21/storage-top-10-best-practices.aspx
110. Recommended Courses: SQL 2008. Maintaining a Microsoft SQL Server 2008 Database. Implementing a Microsoft SQL Server 2008 Database. And now, a special offer! Register for one of the SQL courses offered here, buy a complementary certification exam (70-432 TS: Microsoft SQL Server 2008, Implementation and Maintenance; 70-433 TS: Microsoft SQL Server 2008, Database Development) and receive an annual TechNet subscription and the option of a free exam retake. For more details and registration, contact the authorized colleges.
111. Recommended Courses: Business Intelligence. Implementing and Maintaining Microsoft SQL Server 2008 Reporting Services. Implementing and Maintaining Microsoft SQL Server 2008 Integration Services. Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services. For more details and registration, contact the authorized colleges.