HeroLympics Eng V03 Henk Vd Valk

Learn from this study how to get amazing performance from SSIS2008 and SQL2008!

Published in: Technology

  • HeroLympics Eng V03 Henk Vd Valk

    1. 2. SQL Server 2005 vs 2008 Integration Services: World Record Performance!
       Henk van der Valk, Workload Performance Architect, Unisys ES7000 Performance Centers [email_address]
    2. 3. Agenda
       • Performance:
         • SSIS 2005 vs SSIS 2008 performance study
         • Optimizing tricks for SSIS bulk inserts
       • And if time permits:
         • SQL2008 storage/IO tuning
         • Windows 2008 tuning
    3. 4. About the speaker
       • Deals with the largest SQL environments in the world
       • Co-founder of the ES7000 Performance Centers (2001)
       • Performance optimizer & troubleshooter
       • Hosts the Dutch SQLPass Performance SIG
       • Windows Datacenter Edition certified
    4. 5. Performance Study Goals
       • Work with the Microsoft SSIS dev team to test the improvements of the reworked SSIS pipeline, SQL 2008 versus SQL 2005
       • Document performance & scalability for loading and transforming data sets running on the Unisys ES7000/one (x64) servers
       • Live demos: Windows2008 DC build 6.0.6001 / SQL 2008 EE (Katmai) 10.0.1300.04
       • SSIS test history:
         • SSIS Sept. 2004, SQL 2005 Beta2, build 9.00.954
         • SSIS Feb. 2005, SQL 2005 IDW13, build 9.00.1094
         • SSIS Dec. 2006, Katmai build 9.0.9086.2
         • Katmai SSIS: build 10.0.1075.7 ((SQL_PreRelease).070927-0159)
         • Versus: SQL2005 post-SP2: version 9.0.3175
    5. 6. What’s the big deal with the Optimized Dataflow engine in SSIS 2008? A quick overview
    6. 7. The SSIS Pipeline
       [Diagram: sources (XML, DB, flat file, RAW, custom) are read through source adapters (OLEDB, ODBC, flat file, raw, custom), pass through transformations (Derived Column, Conditional Split, Aggregate, Fuzzy Lookup, Merge Join), and are written through destination adapters (OLEDB, ODBC, flat file, raw, custom) to destinations (DB, flat file, custom file)]
    7. 8. OS Platform memory support
       • Fast shared memory connection for running SSIS / SQL on the same system!
       • 32-bit: each SSIS package can use 3 GB RAM (up to 20 in parallel)
       • 64-bit: each SSIS package can use up to 2 TB, practically "unlimited for now"!

                                                 IA32 (32 CPUs)   IA64 (64 cores)   x64 (64 cores)
       Windows total virtual memory              4 GB             16 TB             16 TB
       Per-process virtual addressable memory    2 or 3 GB        8 TB              8 TB
       Supported physical memory                 64 GB            2 TB              2 TB
    8. 9. Hardware configuration
       • ES7000/540: 16-way / 16 GB, 32-bit 3.0 GHz Xeon MP
       • ES7000/420: 16-way / 64 GB, 64-bit 1.5 GHz Itanium-2
       • Unisys ES7000/one: 64 cores / 256 GB, 3.4 GHz x64
    9. 10. Lab Infrastructure
       • Both ES7000/one systems are identically configured (32 cores / 128 GB)
    10. 11. Test approach
        • The starting point:
          • TPC-H schema (decision support benchmark schema)
          • Random data generation utility to generate input files
          • Loading data from flat files (16 columns of data)
          • Each line-item file: 21.6 GByte, 100 million rows (10 files in total: 10 x 21.6 GByte, 10 x 100 million rows)
          • Measure duration + resource utilization
          • Increase the amount of parallelism by loading more files simultaneously
          • Increase HW resources (more CPUs, type of CPUs)
          • Compare Yukon vs Katmai execution times + max. total handling capacity
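
As a rough cross-check of the data volumes on this slide, the average row width follows from the file size and the row count. This is a minimal sketch, assuming "GByte" means decimal gigabytes (10^9 bytes); the variable names are illustrative and do not come from the original deck.

```python
# Figures taken from the slide: 10 files, each 21.6 GByte and 100 million rows.
FILE_BYTES = 21.6e9          # assumption: decimal gigabytes (10**9 bytes)
ROWS_PER_FILE = 100_000_000
FILE_COUNT = 10

# Average width of one line-item row in the flat file.
bytes_per_row = FILE_BYTES / ROWS_PER_FILE

# Totals when all ten files are loaded in parallel.
total_rows = FILE_COUNT * ROWS_PER_FILE
total_gbyte = FILE_COUNT * FILE_BYTES / 1e9

print(f"average row width: {bytes_per_row:.0f} bytes")
print(f"total rows: {total_rows:,}")
print(f"total input: {total_gbyte:.0f} GByte")
```

Under that assumption each line item averages a little over 200 bytes, which helps when converting between rows/sec and MB/sec figures elsewhere in the deck.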
    11. 12. Introduction to Aggregations …
        [Star schema diagram: a fact table (numeric performance measurements) surrounded by dimension tables]
        • Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, LocationKey, Sales, ...
        • Employee_Dim: EmployeeKey, EmployeeID, ...
        • Time_Dim: TimeKey, TheDate, ...
        • Product_Dim: ProductKey, ProductID, ...
        • Customer_Dim: CustomerKey, CustomerID, ...
        • Location_Dim: LocationKey, LocationID, ...
    12. 13. Package details
        • Number of distinct values:
          • Col1: 1 – 1000
          • Col2: 1 – 20
          • Col3: 1 – 1000
          • Col4: 1 – 1000
        • Packages created: 8 aggregations using group-bys
        • Each aggregation had:
          • 8 – (10) measures
          • comprised of min, max, sums, averages and counts
        [Table: group-by columns (Col1 to Col4) checked per aggregation, Agg1 to Agg10; the check-mark column positions were lost in extraction]
    13. 14. Aggregation Package, 1st design
    14. 15. 1st package / base design execution
    15. 16. Aggregation Results, 1st package
        • To process 1 file with 100 million LineItems:
        • Feb. 2005: on Itanium2 64-bit with SQL2005 this took:
          • 2 hours, 10 minutes
        • With hardware upgrade to ES7000/one (x64):
          • Execution on both SQL2005 and SQL2008:
          • 1 hour, 33 minutes
        • Conclusion:
          • New x64 hardware: 35% gain
          • No Katmai performance improvement for this basic aggregation (a blocking component)
    16. 17. Katmai Data Flow Engine: a new worker thread for each execution tree
    17. 18. Package with Multicast: aggregating up to 1 billion LineItems
        • 8 aggregations, 8 transforms with Multicast
        • Test scenario:
          • Run 1 up to 10 packages of 100 million lines each
          • Execute in parallel
    18. 19. Base Aggregation
        • 1 hour 4 min. to aggregate
          • 100 million rows / 22 GB
    19. 20. Optimization 1: Conditional split
        • SQL2005: 27 min. 48 sec
        • SQL2008: 10 min. 55 sec
    20. 21. Katmai Dataflow engine @ work
        • Yukon: 27 min. 48 sec
        • Katmai: 10 min. 55 sec
    21. 22. Yukon optimization: use "Union Alls" to create parallelism
        • SQL2005: elapsed time from 27 min. 48 sec
        • Down to: 10 min. 57 sec
          • Up to 6 CPUs are fully utilized
    22. 23. Katmai Dataflow engine @ work
        • Katmai: elapsed time 10 min. 55 sec
        • Conclusion: no need for "Union Alls" in SQL2008
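
The elapsed times quoted on the last few slides can be turned into speedup factors. The following is a small sketch using only the timings stated above; the helper name `to_seconds` and the dictionary labels are illustrative choices, not names from the deck.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min / sec' elapsed time into whole seconds."""
    return minutes * 60 + seconds

# Elapsed times as quoted on the slides.
timings = {
    "SQL2005, conditional split": to_seconds(27, 48),
    "SQL2005, with Union Alls": to_seconds(10, 57),
    "SQL2008, conditional split": to_seconds(10, 55),
}

baseline = timings["SQL2005, conditional split"]

# Speedup relative to the untuned SQL2005 package.
speedups = {name: baseline / secs for name, secs in timings.items()}

for name, secs in timings.items():
    print(f"{name}: {secs} s, {speedups[name]:.2f}x vs. baseline")
```

Both the hand-tuned SQL2005 package and the unmodified SQL2008 package come out at roughly two and a half times the baseline speed, which is the basis for the "no need for Union Alls" conclusion.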
    23. 24. Aggregating 1 billion LineItems: SQL2005 with Multicast
        • SQL2005, 10 packages in parallel:
          • Avg. CPU load 37%
          • 140 MByte/sec read disk IO from flat files
    24. 25. Aggregating 1 billion LineItems: SQL2008 with Multicast
        • SQL2008, 10 packages in parallel:
          • Avg. CPU load 100%
          • 270 MByte/sec read disk IO from flat files
    25. 26. 2nd package with Multicast: aggregating 1 billion LineItems
        • Using the Multicast in Katmai provides a significant increase in throughput
        • Processing 8+ packages in parallel uses all 32 available cores
    26. 27. Basic flat file throughput
    27. 28. Demo: Flat file input source throughput
        • Reading from a 100 million row / 22 GB flat file, with 15 / 5 / 1 columns:
          • Itanium2 / Yukon: 20 / 35 / 55 MB/sec
          • x64 (new hardware), both Katmai and Yukon: 72 / 92 / 130 MB/sec
    28. 29. Data flow engine threads
        • Yukon: # engine threads = 5
        • Katmai: # engine threads = 10
        • Tools to inspect the threads:
          • Sysinternals.com Process Explorer (look at the thread tab: CPU & context switches)
          • Pslist.exe dtsdebughost /d
          • Tlist.exe dtsdebughost
    29. 30. "World record" TPCH data loading on the Unisys ES7000/one with Windows2008 & SQL 2008: Bulk Inserts
    30. 31. Agenda
        • SQL 2008 tuning
          • Optimizations / configuration
          • Filegroups / files vs write IO size
    31. 32. 1 File per Filegroup?
        • SQL2008: 256 KB write IOs
        • 1 filegroup gives variable block sizes (64 KB to 256 KB IOs)
        • Check for PageIOLatch_UP
    32. 33. Initial : Maximum performance ?
    33. 34. Limit hit at 400 MB/sec read
    34. 35. 64 core / 64 Bulk Inserts with -x
        • Minidump analysis shows lots of perf logging overhead
        • Starting SQL Server with the -x option boosts throughput
    35. 36. SQL Server with startup parameter -x
    36. 37. Tip: Soft NUMA on 64 cores
        • Assign BULK INSERT tasks to dedicated CPUs (both SQL2005/2008)
        • [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node63]
        • "CpuMask"=hex:00,00,00,00,00,00,00,80
        • [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQLServer\SuperSocketNetLib\Tcp]
        • "ListenOnAllIPs"=dword:00000001
        • [HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQLServer\SuperSocketNetLib\Tcp\IPAll]
        • "TcpPort"="2000[0x00000001],2001[0x00000002],
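
The CpuMask value shown for Node63 is consistent with a single bit (bit 63) set in an 8-byte little-endian bitmap, and the TcpPort entries pair one port with one node bit. The sketch below reproduces those strings for any node under that reading; it assumes one CPU per soft-NUMA node and a port range starting at 2000 as on the slide. The function names are hypothetical helpers, not part of any SQL Server tooling, and the mapping format for nodes 32 and above is not shown on the slide.

```python
def cpu_mask_hex(cpu_index: int, width_bytes: int = 8) -> str:
    """Render a one-CPU affinity mask in the 'hex:..' byte list form used in .reg files.

    Assumption: the mask is a little-endian bitmap with exactly one bit set.
    """
    mask = 1 << cpu_index
    raw = mask.to_bytes(width_bytes, "little")
    return "hex:" + ",".join(f"{byte:02x}" for byte in raw)


def tcp_port_entry(base_port: int, node: int) -> str:
    """Render one 'port[node mask]' pair as it appears in the TcpPort value."""
    return f"{base_port + node}[0x{1 << node:08x}]"


# Node63 owns CPU 63; this prints the value from the slide.
print(cpu_mask_hex(63))

# The first two port-to-node pairs, matching the truncated TcpPort string.
print(",".join(tcp_port_entry(2000, node) for node in range(2)))
```

Generating the values rather than typing 64 registry entries by hand reduces the chance of a shifted bit silently binding two bulk insert streams to the same CPU.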
    37. 38. Tip: Sharpen data types: Money type (13% improvement)
        • Use the Money type instead of decimal columns
          • Money is stored as an 8-byte integer with 4 implied decimal digits. TDS (Tabular Data Stream) is the format SQL Server uses for transfer of data over the wire, and it does not support decimal or numeric.
          • (Both Yukon and Katmai)
    38. 39. T-SQL Bulk Inserts
        • LineItems: 64000 rows/sec per CPU
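
To illustrate what "an 8-byte integer with 4 implied decimal digits" means, the sketch below scales a decimal value by 10,000 and packs it into 8 bytes. It is only an illustration of the representation described on the slide; the helper names are hypothetical, and the little-endian packing is an assumption that says nothing about the actual byte layout SQL Server uses on disk or on the wire.

```python
import struct
from decimal import Decimal

MONEY_SCALE = 10_000  # 4 implied decimal digits


def money_to_int64(value: str) -> int:
    """Scale a decimal string to the integer a money-style column would hold."""
    scaled = Decimal(value) * MONEY_SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{value} has more than 4 decimal digits")
    return int(scaled)


def pack_money(value: str) -> bytes:
    """Pack the scaled value as a signed 64-bit integer (byte order is an assumption)."""
    return struct.pack("<q", money_to_int64(value))


scaled = money_to_int64("12.3456")
packed = pack_money("12.3456")
print(scaled, len(packed))
```

Because the value is a fixed-width integer, it can be copied without per-row precision and scale handling, which is one plausible reason for the improvement quoted on the slide.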
    39. 40. World record - TPCH Data loading With SSIS 2008 Bulk Inserts
    40. 41. Agenda
        • Now that SQL2008 is fully optimized, the focus shifts to SSIS:
        • Infrastructure
        • Windows2008
          • Network parameters
          • Interrupt Affinity tool
          • Settings
        • SSIS package optimizations
    41. 43. SSIS Base package – Control Flow
    42. 44. SSIS Base package – Data Flow
        • Data types sharpened
    43. 45. IntPolicy tool / Interrupt Affinity
        • On the ES7000 SQL Server, assign NIC interrupts & DPCs to specific CPUs
    44. 46. Interrupt Affinity set for 8 network cards
    45. 47. Intel Pro/1000 MT
        • Apply changes to each of the (16) network cards:
          • 0) Adaptive Inter-Frame Spacing disabled
          • 1) Flow control = Tx & Rx enabled
          • 2) Jumbo Packet = 9014 bytes enabled
          • 3) Client & server Interrupt Moderation = Medium; Coalesce buffers = 256
          • 4) Set server Rx buffers to 512 and server Tx buffers to 512
          • 5) Set client Rx buffers to 512 and client Tx buffers to 256
          • 6) Link speed 1000 Mbps full duplex
    46. 48. Other flat file best practices
        • Use the Fast Parse option when possible:
          • Flat file sources / destinations
          • Data Conversion and Derived Column transformations
          • Integer data types and date/time formats
        • Reduce volume where possible:
          • Don't push unneeded columns
          • Conditional split for filtering rows
          • Do not parse or convert columns unnecessarily
            • In a fixed-width format you can combine adjacent unneeded columns into one
            • Leave unneeded columns as strings
    47. 49. SSIS Bulk Insert package
        • Some basic optimizations found
        • Elapsed time: 2 min 56 sec
        • DtsDebugHost.exe: 1.7 GB flat file read, 1.9 GB written to SQL Server, approx. 200 MB RAM
    48. 50. IO Throughput
        • Average read bytes/sec: 10 MB/sec
        • Average write bytes/sec: 12 MB/sec
    49. 51. SSIS – IO tuning
        • Observation:
          • SSIS: 14K reads vs 465K writes (128 KB read IOs)
          • SQL: 465K reads vs 8800 writes (256 KB write IOs)
        • Time to tune the data transport between SSIS and SQL!
    50. 52. Tip: Increase packet size
        • SSIS: change the Connection Manager packet size from 0 to 32K
    51. 53. Result
        • 465K writes down to 58K write IOs
        • Elapsed time: 2 min 36 sec (= 20 sec less)
    52. 54. SQL Server 2008 startup options
        • Use SQL startup flags:
          • -x (do not collect perfmon data)
          • -E (256K allocations per file; default is 64K)
        • Use network packet size: 32767
        • Lock pages in memory privilege
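
The drop from 465K to 58K write IOs is close to what the packet size change alone would predict. The sketch below checks the arithmetic, assuming a packet size of 0 means the server default of 4096 bytes (the slide does not state the default) and that the number of writes scales inversely with packet size; the constant names are illustrative.

```python
DEFAULT_PACKET_BYTES = 4096      # assumption: default used when packet size is 0
TUNED_PACKET_BYTES = 32 * 1024   # "32K" from the slide

writes_before = 465_000          # observed write count before the change

# If the same payload is sent in larger packets, the write count shrinks
# by the ratio of the two packet sizes.
packet_ratio = TUNED_PACKET_BYTES // DEFAULT_PACKET_BYTES
expected_writes = writes_before // packet_ratio

# Elapsed time before and after, in seconds, as quoted on the slides.
elapsed_before = 2 * 60 + 56
elapsed_after = 2 * 60 + 36

print(f"packet ratio: {packet_ratio}x")
print(f"expected writes: {expected_writes}")
print(f"time saved: {elapsed_before - elapsed_after} s")
```

The predicted count is within rounding of the 58K reported, which suggests the write IOs between SSIS and SQL Server were bounded by packet size rather than by buffer size in this test.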
    53. 57. Result: Packages completed in less than 1800 seconds!
    54. 58. Windows 2008 Optimizations
    55. 59. Change DEP (Windows2008)
        • Tip from the Unisys TPCC benchmark team:
        • bcdedit /set nx OptIn
          • The W2K8 DEP policy is OptOut by default
    56. 60. Windows 2008 optimization
        • Enable MPIO, discover Multi-Path IO
    57. 61. Seeing Today - Securing tomorrow [email_address]
