IO Dubi Lebel



  1. 1. DAT 402: Designing a High Performance I/O System with SQL Server – Dubi Lebel
  2. 2. Query B runs twice a week and takes about 10 minutes to return its result set. Query A runs two thousand times every day and takes up to 1 second to return its result set. Which of these queries affects disk I/O more?
  3. 3. Insert 100 rows wrapped in 1 transaction, or insert 100 single rows (100 transactions). Which of these inserts is faster? Which one affects disk I/O more?
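The transaction question above hinges on log flush behavior: each COMMIT forces a synchronous transaction-log write. A toy Python model (the flush-per-commit assumption and the 5 ms latency figure are illustrative, not measurements):

```python
# Illustrative model (assumption): each COMMIT forces one synchronous
# transaction-log flush, and each flush costs roughly one write latency.

def log_flushes(rows, rows_per_transaction):
    """Number of synchronous log flushes needed to insert `rows` rows."""
    full, rest = divmod(rows, rows_per_transaction)
    return full + (1 if rest else 0)

# 100 single-row transactions -> 100 flushes; one wrapped transaction -> 1.
assert log_flushes(100, 1) == 100
assert log_flushes(100, 100) == 1

# With an assumed 5 ms write latency per flush, the flush count dominates:
latency_ms = 5
print(log_flushes(100, 1) * latency_ms, "ms vs",
      log_flushes(100, 100) * latency_ms, "ms")
```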
  4. 4. Amdahl's Law: system speed-up limited by the slowest part! CPU Performance: 60% per year Disk Performance < 10% per year (IO per sec) I/O system performance limited by mechanical delays (disk I/O)
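Amdahl's Law can be put into numbers; in this sketch the 60%/40% split between accelerable work and disk-bound work is an illustrative assumption:

```python
def amdahl_speedup(accelerated_fraction, component_speedup):
    """Amdahl's Law: overall speedup when only part of the work benefits."""
    return 1.0 / ((1.0 - accelerated_fraction)
                  + accelerated_fraction / component_speedup)

# If 40% of a workload is disk I/O we cannot speed up, even an infinitely
# fast CPU caps the overall speedup at 1 / 0.4 = 2.5x.
print(round(amdahl_speedup(0.6, 1e9), 2))
```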
  5. 5. Who am I? D.B.A. (Don’t Bother Asking) Architect at Logic Ind. Worked as a database dev & admin since 1990. Worked with SQL Server from its first (Sybase) version, 3.6. Been at S.R.L. (R.I.P.) 8½ years. Been at Microsoft 7½ years as technical manager of SQL pre-sales (managed the DB track at Tech-Ed 3 times). Co-manages the SQL Server Israeli User Group with Ami Levin.
  6. 6. Thanks to Thomas Kejser, SQL CAT member. Thomas, together with Henk van der Valk (BI405, 29/11/10, 11:30 - 12:45), holds the world record for SSIS ETL performance: 1.18 TB loaded in under 10 minutes.
  7. 7. Key Takeaway This is NOT going to be easy… 182 slides. You can either dive here or in the sea, but here you will see what you can’t see in the sea. The lessons in this session were written in sweat and blood.
  8. 8. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  9. 9. What is IO for me?
  10. 10. What is IO for me?
  11. 11. 1956: IBM 305 RAMAC Computer with Disk Drive
  12. 12. Seagate ST4053 40 MByte This was my disk on my first desktop
  13. 13. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  14. 14. What is IO for me? Throughput Latency Capacity How do you Measure it?
  15. 15. What is IO for me - Throughput The amount of successful data passing between storage and computer in a specified amount of time Measured in MB/sec or IOPs Performance Monitor: Logical Disk Disk Read Bytes / Sec Disk Write Bytes / Sec Disk Read / Sec Disk Writes / Sec
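The two throughput units on this slide are related by block size (MB/sec = IOPs × block size); a quick sanity-check calculation that reproduces the figures from the SQLIO output later in the deck:

```python
def mb_per_sec(iops, block_size_kb):
    """Convert an IOPs figure at a fixed block size into MB/sec."""
    return iops * block_size_kb / 1024.0

# 1,539.5 IOs/sec at 64KB blocks is roughly 96.2 MB/sec.
print(round(mb_per_sec(1539.5, 64), 2))
```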
  16. 16. What is IO for me - Latency A synonym for delay: how much time it takes for a packet of data to get from one designated point to another. Measured in milliseconds (ms). Performance Monitor: Logical Disk Avg. Disk Sec / Read Avg. Disk Sec / Write More on healthy latency values later
  17. 17. What is IO for me - Capacity Capacity is just capacity. Measured in GB/TB. The easy one! Does it matter? Key Takeaway: Don’t think about disk I/O as disk capacity
  18. 18. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  19. 19. Terminology - Basic Disk - Spindles – Physical disks in the Storage Array Array - Box with the Spindles in it
  20. 20. Terminology – Acronyms JBOD - Just a Bunch of Disks SAME – Stripe and Mirror Everything RAID - Redundant Array of Inexpensive Disks DAS - Direct Attached Storage NAS - Network Attached Storage SAN - Storage Area Network CAS - Content Addressable Storage
  21. 21. NAS vs. SAN
  22. 22. Terminology – Adv. LUN - Logical Unit Number Host - The Server or Servers a LUN is presented to. Disk - How the OS sees a LUN when presented IOps - Physical Operation To Disk Sequential IO - Reads or writes which are sequential on the spindle Random IO - Reads or writes which are located at random positions on the spindle
  23. 23. Cardiothoracic Surgery
  24. 24. The Full Stack I/O Controller / HBA Cabling Array Cache Spindle PCI Bus Windows CPU SQL Serv.
  25. 25. The Traditional Hard Disk Drive Cover mounting holes (cover not shown) Base casting Spindle Slider (and head) Case mounting holes Actuator arm Platters Actuator axis Actuator Flex Circuit (attaches heads to logic board) SATA interface connector Power connector Source: Panther Products
  26. 26. Disk Arm and Head Disk arm A disk arm carries disk heads Disk head Read and write on disk surface Read/write operation Disk controller receives a command with <track#, sector#> Seek the right cylinder (tracks) Wait until the right sector comes Perform read/write
  27. 27. Mechanical Components of A Disk Drive Tracks Concentric rings around the disk surface, bits laid out serially along each track Cylinder The set of tracks at the same arm position across all platters, 1000-5000 cylinders per zone, 1 spare per zone Sectors Each track is split into arcs of track (the minimum unit of transfer)
  28. 28. Numbers to Remember - Spindles Traditional Spindle throughput in random 8/16K I/O 10K RPM – 100 -130 IOPs at ‘full stroke’ 15K RPM – 150-180 IOPs at ‘full stroke’ Can achieve 2x or more when ‘short stroking’ the disks (using less than 20% capacity of the physical spindle) Aggregate throughput when sequential access: Between 90MB/sec and 125MB/sec for a single drive If true sequential, any block size over 8K will give you these numbers Depends on drive form factor, 3.5” drives slightly faster than 2.5” Approximate latency: 3-5ms
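The per-spindle figures above make back-of-the-envelope sizing easy; a sketch that assumes linear scaling until a controller or bus limit is hit (the 5,000 IOPS target below is an illustrative assumption):

```python
import math

def required_spindles(target_iops, iops_per_spindle):
    """Spindle count for a target random IOPS load, assuming linear
    scaling before any controller or bus bottleneck."""
    return math.ceil(target_iops / iops_per_spindle)

# To sustain 5,000 random IOPS on 15K RPM drives at ~170 IOPs each:
print(required_spindles(5000, 170))
```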
  29. 29. Scaling of Spindle Count - Short vs. Full Stroke Each 8 disks exposes a single 900GB LUN RAID Group capacity ~1.1TB Test data set is fixed at 800GB Single 800GB file for a single LUN (8 disks), two 400GB test files across two LUNs, etc… Lower IOPs per physical disk when more capacity of the physical disks is used (longer seek times)
  30. 30. The “New” Hard Disk Drive: SSD (Solid State Drive) No moving parts!
  31. 31. Paul S. Randal Benchmarking-Introducing-SSDs-part 1 , 2 and 3
  32. 32. SSD - Game Changer! No moving parts Low power consumption (~20%) Read operations: Random = Sequential Low latency on access
  33. 33. SSD - NAND Flash Throughput, especially random, much higher than a traditional drive Typically 10**4 IOPS for a single drive (examples: Intel X25 and FusionIO) Storage is organized into blocks of 512KB; each block consists of 64 pages of 8KB each When a block needs to be rewritten, the 512KB block must first be erased This is an expensive operation and can take very long The disk controller will attempt to locate free blocks before trying to erase existing ones Writes can be slow; a DDR ”write cache” is often used to ”overcome” this limitation As blocks fill up, NAND becomes slower with use But only up to a certain level – eventually it peaks out Still MUCH faster than typical drives
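The erase-before-rewrite cost described above can be put into numbers; a toy model using the slide's 512KB-block / 8KB-page geometry (the worst-case "rewrite the whole block" behavior is a simplification of what real controllers do):

```python
# Toy NAND geometry from the slide: 512KB erase blocks of 64 x 8KB pages.
BLOCK_KB, PAGE_KB, PAGES_PER_BLOCK = 512, 8, 64

def in_place_update_cost_kb():
    """KB physically written to update a page when no free block exists:
    the whole 512KB block is erased and rewritten."""
    return BLOCK_KB

def fresh_write_cost_kb(pages):
    """KB written when the controller finds free pages instead."""
    return pages * PAGE_KB

# Updating a single 8KB page in a full block can cost 512KB of writes,
# i.e. 64x write amplification in this worst-case model:
print(in_place_update_cost_kb() // fresh_write_cost_kb(1))
```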
  34. 34. SSD - Battery Backed DRAM Throughput close to speed of RAM Typically 10**5 IOPS for a single drive Drive is essentially DRAM RAM on a PCI card (example: FusionIO) ...or with a fiber interface (example: DSI3400) Battery backed up to persist storage Be careful about downtime, how long can drive survive with no power? As RAM prices drop, these drives are becoming larger Extremely high throughput, watch the path to the drives
  35. 35. SSD Directly on a PCIe Slot > 10,000 IOPs Mixed read/write Latency < 1ms PCI bus bottleneck
  36. 36. Overview of Drive Characteristics
  37. 37. Question SSD is an evolution of the Disk-on-Key. What is the most dangerous event that can cause you to lose all your Disk-on-Key data? Don’t put all the eggs in one basket!
  38. 38. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  39. 39. Tools SQLIO IOMETER SQLIOStress - SQLIOSim
  40. 40. SQLIO What is it Demo How to read results
  41. 41. How to Run SQLIO Brent Ozar – MVP, Quest Software
  42. 42. Write This Down. It’s Important.
sqlio -kW -t2 -s120 -dM -o1 -frandom -b64 -BH -LS Testfile.dat
sqlio -kR -t2 -s120 -dM -o2 -frandom -b64 -BH -LS Testfile.dat
sqlio -kW -t2 -s120 -dM -o8 -frandom -b64 -BH -LS Testfile.dat
sqlio -kW -t2 -s120 -dM -o16 -frandom -b64 -BH -LS Testfile.dat
sqlio -kR -t2 -s120 -dM -o64 -frandom -b64 -BH -LS Testfile.dat
sqlio -kR -t2 -s120 -dM -o128 -frandom -b64 -BH -LS Testfile.dat
sqlio -kW -t4 -s120 -dM -o1 -fsequential -b64 -BH -LS Testfile.dat
sqlio -kR -t4 -s120 -dM -o2 -fsequential -b64 -BH -LS Testfile.dat
sqlio -kR -t4 -s120 -dM -o4 -fsequential -b64 -BH -LS Testfile.dat
sqlio -kW -t4 -s120 -dM -o8 -fsequential -b64 -BH -LS Testfile.dat
  43. 43. The most important parameters are: -kW means writes (as opposed to reads) -t2 means two threads -s120 means test for 120 seconds -dM means drive letter M -o1 means one outstanding request (not piling up requests) -frandom means random access (as opposed to -fsequential) -b64 means 64KB IOs
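Since the test matrix varies only a few of these flags, it can be generated rather than typed by hand; a convenience sketch (assumes the same fixed drive, duration and test file as the slide):

```python
# Generate the random-IO half of the SQLIO test matrix from the slide:
# same drive (M:), duration (120s) and 64KB block size throughout,
# varying only operation and outstanding-IO depth.
def sqlio_cmd(op, threads, outstanding, pattern):
    """Build one sqlio command line; op is 'R' (read) or 'W' (write)."""
    return (f"sqlio -k{op} -t{threads} -s120 -dM -o{outstanding} "
            f"-f{pattern} -b64 -BH -LS Testfile.dat")

for op, depth in [("W", 1), ("R", 2), ("W", 8), ("W", 16), ("R", 64), ("R", 128)]:
    print(sqlio_cmd(op, 2, depth, "random"))
```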
  44. 44. The Output
E:\Program Files (x86)\SQLIO>sqlio -kW -t2 -s120 -dM -o1 -frandom -b64 -BH -LS Testfile.dat
sqlio v1.5.SG
using system counter for latency timings, -1361967296 counts per second
2 threads writing for 120 secs to file M:\Testfile.dat
using 64KB random IOs
enabling multiple I/Os per thread with 1 outstanding
buffering set to use hardware disk cache (but not file cache)
using current size: 24576 MB for file: M:\Testfile.dat
initialization done
CUMULATIVE DATA:
throughput metrics: IOs/sec: 1539.50 MBs/sec: 96.21
latency metrics: Min_Latency(ms): 0 Avg_Latency(ms): 0 Max_Latency(ms): 572
histogram: ms: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24+ %: 66 32 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
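Output like this is easy to post-process when running many SQLIO passes; a small sketch that pulls the headline metrics out of a captured run (the regex matching is my own, not part of the tool):

```python
import re

# A captured fragment of SQLIO cumulative output, as on the slide.
sample = """CUMULATIVE DATA:
throughput metrics:
IOs/sec:  1539.50
MBs/sec:    96.21
latency metrics:
Min_Latency(ms): 0
Avg_Latency(ms): 0
Max_Latency(ms): 572
"""

def parse_sqlio(text):
    """Extract the headline SQLIO metrics into a dict of floats."""
    metrics = {}
    for key in ("IOs/sec", "MBs/sec", "Avg_Latency(ms)", "Max_Latency(ms)"):
        m = re.search(re.escape(key) + r":\s*([\d.]+)", text)
        if m:
            metrics[key] = float(m.group(1))
    return metrics

print(parse_sqlio(sample))
```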
  45. 45. Jonathan Kehayias
  46. 46. IOMeter What is it Demo How to read results
  47. 47. IOMeter was developed by Intel Corp. in 1998 as an I/O subsystem measurement and characterization tool for single and clustered systems. It was given to the Open Source Development Lab (OSDL) in November 2001. Last update: 2008-06-22-rc2.
  48. 48. Note: If you leave this field at “0”, IOMeter will use all available disk space. Heuristics: One manager per server. One worker per processor. These settings can play a significant role in observed performance.
  49. 49. SQLIOStress - SQLIOSim What is it Demo How to read results
  50. 50. SQLIOSim Parser
  51. 51. HP StorageWorks
  52. 52. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  53. 53. I/O Controller / HBA Cabling Array Cache Spindle PCI Bus Windows CPU SQL Serv.
  54. 54. The hardware between the CPU and the physical drive. Different topologies depending on vendor and technology. Best Practices: Understand the topology, potential bottlenecks and theoretical throughput of components in the path. Engage storage engineers early in the process. The deeper the topology, the more latency.
  55. 55. Controller Network components between disks and server Multiple disks connected to a computer system through a controller Failure detection and recovery (checksum, bad sector remapping)
  56. 56. Disk interface standards Fibre Channel (FC) - Fastest, bus speeds between 2-4 Gbit/sec SCSI - Small Computer System Interface, older technology, slower bus speeds SATA - Serial ATA, newer technology, even slower bus speeds Enterprise Flash Disks (EFDs) - Newest technology, same bus speeds as FC
  57. 57. Cache System cache - buffers data between disk and interface Disk cache - uses DRAM to cache recently accessed blocks Blocks are usually replaced in LRU order Minimum 8-16MB Needs a battery, and the battery must be operational, for reliable writes
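The LRU replacement policy mentioned above can be sketched in a few lines of Python (a toy model of block replacement, not a real controller cache):

```python
from collections import OrderedDict

class LRUBlockCache:
    """Toy LRU block cache: fixed slot count, least recently used evicted."""

    def __init__(self, slots):
        self.slots, self.blocks = slots, OrderedDict()

    def access(self, block_id):
        """Touch a block; return True on a cache hit, False on a miss."""
        hit = block_id in self.blocks
        if hit:
            self.blocks.move_to_end(block_id)   # mark most recently used
        else:
            if len(self.blocks) >= self.slots:
                self.blocks.popitem(last=False)  # evict least recently used
            self.blocks[block_id] = True
        return hit

cache = LRUBlockCache(2)
print([cache.access(b) for b in [1, 2, 1, 3, 2]])  # [False, False, True, False, False]
```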
  58. 58. Cache size, checkpoint 2GB vs. 8GB Key Takeaway: Write cache helps, up to a certain point
  59. 59. I/O Systems interrupts Processor Cache Memory - I/O Bus I/O Controller I/O Controller I/O Controller Main Memory Graphics Disk Disk Network
  60. 60. DAS vs. SAN DAS – Direct Attached Storage Standards: (SCSI), SAS, SATA RAID controller in the machine PCI-X or PCI-E direct access SAN – Storage Area Networks Standards: iSCSI or Fibre Channel (FC) Host Bus Adapters or Network Cards in the machine Switches / Fabric access to the disk
  61. 61. Path to the drives - DAS Shelf Interface Cache Controller PCI Bus Shelf Interface PCI Bus Shelf Interface Cache Controller Shelf Interface
  62. 62. Path to the Drives – DAS ”on chip” Controller PCI Bus PCI Bus Controller
  63. 63. Path to the drives - SAN Cache Fiber Channel Ports Controllers/Processors Switch PCI Bus PCI Bus PCI Bus HBA PCI Bus Switch Best Practice: Make sure you have the tools to monitor the entire path to the drives. Understand utilization of individual components.
  64. 64. SQL Server on DAS - Pros Beware of non-disk-related bottlenecks: a SCSI/SAS controller may not have the bandwidth to support the disks, and the PCI bus should be fast enough (example: you need PCIe x8 to consume 1GB/sec). Can use Dynamic Disks to stripe LUNs together for bigger, more manageable partitions. Often a cheap way to get high performance at a low price. Very few skills required to configure.
  65. 65. SQL Server on DAS - Cons Cannot grow storage dynamically Buy enough capacity from the start … or plan database layout to support growth Example: Allocate two files per LUN to allow doubling of capacity by moving half the files Inexpensive and simple way to create very high performance I/O system Important: No SAN = No Cluster! Must rely on other technologies (ex: Database Mirror) for maintaining redundant data copies Consider requirements for storage specific functionality Ex: DAS will not do snapshots Be careful about single points of failure Example: Loss of single storage controller may cause many drives lost
  66. 66. Numbers to Remember - DAS SAS cable speed Theoretical: 1.5GB/sec Typical: 1.2GB/sec PCIe v1 bus x4 slot: 750MB/sec x8 slot: 1.5GB/sec x16 slot: fast enough, around 3GB/sec PCIe v2 bus x4 slot: 1.5 – 1.8GB/sec x8 slot: 3GB/sec Be aware that a PCIe bus may be “v2 compliant” but still run at v1 speeds.
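One way to use these numbers: the effective throughput of a DAS path is the minimum over its links. A sketch (the per-link limits are the slide's typical values; the 8-drive sequential figure is an illustrative assumption):

```python
def path_throughput(*component_limits_mb_s):
    """Effective throughput of an I/O path: the slowest link wins."""
    return min(component_limits_mb_s)

spindles = 8 * 100    # assumed: 8 drives at ~100 MB/sec sequential each
sas_cable = 1200      # typical SAS cable throughput from the slide
pcie_v1_x4 = 750      # PCIe v1 x4 slot from the slide

# Here the PCIe slot, not the disks, caps the system:
print(path_throughput(spindles, sas_cable, pcie_v1_x4))
```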
  67. 67. SQL Server on SAN – Pitfalls (1/2) Sizing on capacity instead of performance Over-estimating the ability of the SAN or array Overutilization of shared storage Lack of knowledge about physical configuration and the potential bottlenecks or expected limits of configuration Makes it hard to tell if performance is reasonable on the configuration Array controller or bandwidth of the connection is often the bottleneck Key Takeaway: Bottleneck is not always the disk One size fits all solutions is probably not optimal for SQL Server Get the full story, make sure you have SAN vendors tools to measure the path to the drives Capture counters that provide the entire picture (see benchmarking section)
  68. 68. SQL Server on SAN – Pitfalls (2/2) Storage technologies are evolving rapidly and traditional best practices may not apply to all configurations Assuming physical design does not matter on SAN Over estimating the benefit of array cache Physical isolation practices become more important at the high end High volume OLTP, large scale DW Isolation of HBA to storage ports can yield big benefits This largely remains true for SAN although some vendors claim it is not needed
  69. 69. Numbers to Remember - SAN HBA speed 4Gbit – theoretically around 500MB/sec, realistically between 350 and 400MB/sec. 8Gbit will do twice that, but remember the limits of the PCIe bus: an 8Gbit card will require a PCIe x4 v2 slot or faster. Max throughput per storage controller varies by SAN vendor – check specifications. Drives are still drives – there is no magic.
  70. 70. DAS vs. SAN
  71. 71. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  72. 72. Monitoring - Windows View of I/O Make sure to capture all of these for the complete picture…
  73. 73. Validating a System for High Throughput OLTP: Cached Files vs. Disk Access From Cache (by the way – notice the queue depth) From Disk
  74. 74. Random or Sequential? Knowing if your workload is random or sequential in nature can be a hard question to answer; it depends a lot on application design. SQL Server Access Methods can give some insight: high values of Readahead pages/sec indicate a lot of sequential activity; high values of index seeks/sec indicate a lot of random activity. Look at the ratio between the two. The transaction log is always sequential. Best Practice: Isolate the transaction log on dedicated drives.
  75. 75. Configuring Disks in Windows - The one-slide best practice Use disk alignment at 1024KB Use GPT if MBR is not large enough Format partitions with a 64KB allocation unit size One partition per LUN Only use Dynamic Disks when there is a need to stripe LUNs using Windows striping (i.e. an Analysis Services workload) Tools: Diskpar.exe, DiskPart.exe and DmDiag.exe Format.exe Disk Manager
  76. 76. Ensure Disks are Formatted Correctly The worst scenario? Random operations using 64K IO and a 64K chunk size: one sector off and you are hitting two disks for every IO, thus halving the random performance potential. Note: On a RAID array this means accessing two different stripe units on two separate disks. Graphics Source: Jimmy May
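The "one sector off" penalty above is simple arithmetic over stripe-unit boundaries; a sketch of how many stripe units (and hence disks, on RAID) a single IO touches:

```python
def stripe_units_touched(offset_bytes, io_bytes, stripe_unit_bytes):
    """Count the stripe units a single IO spans at a given starting offset."""
    first = offset_bytes // stripe_unit_bytes
    last = (offset_bytes + io_bytes - 1) // stripe_unit_bytes
    return last - first + 1

KB = 1024
# Aligned 64KB IO against a 64KB stripe unit: exactly one unit, one disk.
print(stripe_units_touched(0, 64 * KB, 64 * KB))       # 1
# One 512-byte sector off: the same IO crosses a boundary and hits two disks.
print(stripe_units_touched(512, 64 * KB, 64 * KB))     # 2
```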
  77. 77. Using Unaligned Partitions
  78. 78. Do multiple data files make a difference? Paul S. Randal: more drives typically yield better speed. True for both SAN and DAS... less so for SSD, but still relevant (especially for NAND).
  79. 79. How Many Data Files Do I Need? More data files do not necessarily equal better performance; the answer is determined mainly by 1) hardware capacity and 2) access patterns. The number of data files may impact scalability of heavy write workloads: there is potential for contention on allocation structures (PFS/GAM/SGAM), mainly a concern for applications with a high rate of page allocations on servers with >= 8 CPU cores, and more of a consideration for tempdb (in most cases). Data files can also be used to maximize the number of spindles, “striping” the database across more physical disks. Best practice: Pre-size data/log files, use equal sizes for files within a single filegroup and manually grow all files within a filegroup at the same time (vs. AUTOGROW).
  80. 80. How Many Filegroups? Performance Filegroups can be used to separate tables/indexes - allowing selective placement of these at the disk level Separate objects requiring more data files due to high page allocation rate Can be used to separate I/O patterns Administration consideration (primarily) Backup can be performed at the filegroup or file level Partial availability Database is available if primary filegroup is available; other filegroups can be offline A filegroup is available if all its files are available Tables and Indexes Can specify separate filegroups for in-row data and large-object data Best Practice: Place LOB data in a dedicated filegroup Partitioned Tables Each partition can be in its own filegroup Partition per filegroup may provide better archiving strategy Partitions can be moved in and out of the table Best Practice: Do not place data in PRIMARY filegroup, allocate a new filegroup and set this as default
  81. 81. Share storage environments At the disk level and other shared components (i.e. service processors, cache, etc…)
  82. 82. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  83. 83. Monitoring - SQL Server View of I/O
  84. 84. What is IO for me? Terminology. Tools. The path from client application to the storage and back. What affects the disk performance? Benchmark and Sizing Methodology. Workload - Design for Performance.
  85. 85. Storage Selection General Pitfalls There are organizational barriers between DBA’s and storage administrators Each needs to understand the others “world” Share storage environments At the disk level and other shared components (i.e. service processors, cache, etc…) Sizing only on “capacity” is a common problem Key Takeaway: Take latency and throughput requirements (MB/s, IOPs and max latency) into consideration when sizing storage One size fits all type configurations Storage vendor should have knowledge of SQL Server and Windows best practices when array is configured Especially when advanced features are used (snapshots, replication, etc…)
  86. 86. Disk Subsystem - SQL Server I/O Pattern Understanding I/O characteristics of common SQL Server operations/scenarios can help you determine how to configure storage
  87. 87. OLTP Workloads High number of small Tlog writes (often single digit KB) T-Log buffer is written because Commit is issued by the application Concurrency around writing into T-Log Buffer Majority ‘random’ single page reads
  88. 88. OLAP - DWH Workloads Smaller number of Tlog Writes with longer writes (often 64K) T-Log buffers get written because buffer is full Often ‘sequential’ Read-Ahead with 64K or more from data files
  89. 89. Backup / Restore Backup and restore operations utilize internal buffers for the data being read/written. The number of buffers is determined by: the number of data file volumes, the number of backup devices, or by explicitly setting BUFFERCOUNT. If database files are spread across a few (or a single) logical volume(s), and there are a few (or a single) output device(s), optimal performance may not be achievable by default. Tuning can be achieved by using the BUFFERCOUNT parameter for BACKUP / RESTORE More Information:
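When tuning BUFFERCOUNT it helps to know the memory it implies; a hedged sketch (assumes the buffer memory is roughly BUFFERCOUNT × MAXTRANSFERSIZE, with SQL Server's 1MB default transfer size for disk backups):

```python
def backup_buffer_memory_mb(buffercount, maxtransfersize_kb=1024):
    """Approximate memory (MB) used by BACKUP/RESTORE internal buffers:
    roughly BUFFERCOUNT * MAXTRANSFERSIZE (default transfer size 1MB)."""
    return buffercount * maxtransfersize_kb / 1024.0

# Raising BUFFERCOUNT from 7 to 50 at the default 1MB transfer size
# grows the backup's buffer memory from about 7MB to about 50MB:
print(backup_buffer_memory_mb(7), backup_buffer_memory_mb(50))
```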
  90. 90. FILESTREAM Writes to varbinary(max) go through the buffer pool and are flushed during checkpoint. Reads & writes to FILESTREAM data do not go through the buffer pool (either T-SQL or Win32). T-SQL uses buffered access to read & write data; Win32 can use either buffered or non-buffered, depending on the application's use of the APIs. FILESTREAM I/O is not tracked via sys.dm_io_virtual_file_stats; best practice is to separate it onto its own logical volume for monitoring purposes. Writes to FILESTREAM generate less transaction log volume than varbinary(max): the actual FILESTREAM data is not logged. FILESTREAM data is captured as part of database backup and transaction log backup. May increase throughput capacity of the transaction log.
  91. 91. TEMPDB User group 92, November 2009: Nothing is more permanent than the temporary.
  92. 92. Tools SQLIO Used to stress an I/O subsystem – Test a configuration’s performance SQLIOSim Simulates SQL Server I/O – Used to isolate hardware issues 231619 HOW TO: Use the SQLIOStress Utility to Stress a Disk Subsystem Fiber Channel Information Tool Command line tool which provides configuration information (Host/HBA)
  93. 93. KB Articles KB 824190 Troubleshooting Storage Area Network (SAN) Issues KB 304415: Support for Multiple Clusters Attached to the Same SAN Device KB 280297: How to Configure Volume Mount Points on a Clustered Server KB 819546: SQL Server support for mounted volumes KB 304736: How to Extend the Partition of a Cluster Shared Disk KB 325590: How to Use Diskpart.exe to Extend a Data Volume KB 328551: Concurrency enhancements for the tempdb database KB 304261: Support for Network Database Files
  94. 94. General Storage References Microsoft Windows Clustering: Storage Area Networks StorPort in Windows Server 2003: Improving Manageability and Performance in Hardware RAID and Storage Area Networks Virtual Device Interface Specification Windows Server System Storage Home Microsoft Storage Technologies – Multipath I/O Storage Top 10 Best Practices
  95. 95. Recommended Courses: SQL 2008 Maintaining a Microsoft SQL Server 2008 Database Implementing a Microsoft SQL Server 2008 Database And now, a special offer! Register for one of the SQL courses offered here, purchase a complementary certification exam: 70-432 TS: Microsoft SQL Server 2008, Implementation and Maintenance 70-433 TS: Microsoft SQL Server 2008, Database Development and receive an annual TechNet subscription and a free exam retake. For further details and registration, contact the certified training centers.
  96. 96. Recommended Courses: Business Intelligence Implementing and Maintaining Microsoft SQL Server 2008 Reporting Services Implementing and Maintaining Microsoft SQL Server 2008 Integration Services Implementing and Maintaining Microsoft SQL Server 2008 Analysis Services For further details and registration, contact the certified training centers.
  97. 97. Feedback and Facebook (Meirav – to complete)