HeroLympics Eng V03 Henk Vd Valk

SQL Server 2005 vs 2008 Integration Services World Record Performance! Henk van der Valk Workload Performance Architect Unisys ES7000 Performance Centers [email_address]

Agenda
- Performance: SSIS 2005 vs SSIS 2008 performance study
- Optimizing tricks for SSIS
- Bulk inserts
And if time permits:
- SQL2008 storage/IO tuning
- Windows 2008 tuning

About the speaker
- Deals with the largest SQL environments in the world
- Co-founder of the ES7000 Performance Centers (2001)
- Performance optimizer & troubleshooter
- Hosting the Dutch SQLPass Performance SIG
- Windows Datacenter Edition certified

Performance Study Goals
- Work with the Microsoft SSIS Dev team to test the improvements of the reworked SSIS pipeline in SQL 2008 versus SQL 2005
- Document performance & scalability for loading and transforming data sets running on the Unisys ES7000/one (x64) servers
- Live demos: Windows2008 DC build 6.0.6001 / SQL 2008 EE (Katmai) 10.0.1300.04
SSIS test history:
- SSIS Sept. 2004, SQL 2005 Beta2, build 9.00.954
- SSIS Feb. 2005, SQL 2005 IDW13, build 9.00.1094
- SSIS Dec. 2006, Katmai build 9.0.9086.2
- Katmai SSIS: build 10.0.1075.7 (SQL_PreRelease).070927-0159
- Versus: SQL2005 post-SP2: version 9.0.3175

What’s the big deal with the Optimized Dataflow engine in SSIS 2008? A quick overview

The SSIS Pipeline
- Sources: XML, DB, Flat File, RAW, Custom
- Source adapters: OLEDB Data Source, ODBC, Custom, Raw, Flat File
- Transformations: Derived Column, Conditional Split, Aggregate, Fuzzy Lookup, Merge Join
- Destination adapters: OLEDB Data Destination, ODBC, Custom, Raw, Flat File
- Destinations: DB, Flat File, Custom, File

OS Platform memory support
- Fast shared memory connection for running SSIS / SQL on the same system!
- 32-bit: each SSIS package can use 3 GB RAM (up to 20 in parallel)
- 64-bit: each SSIS package can use up to 2 TB, practically “unlimited for now”!

                                          IA32 (32 CPUs)   IA64 (64 cores)   x64 (64 cores)
Windows total virtual memory              4 GB             16 TB             16 TB
Per-process virtual addressable memory    2 or 3 GB        8 TB              8 TB
Supported physical memory                 64 GB            2 TB              2 TB

Hardware configuration
- ES7000/540: 16-way / 16 GB, 32-bit 3.0 GHz Xeon MP
- ES7000/420: 16-way / 64 GB, 64-bit 1.5 GHz Itanium-2
- Unisys ES7000/one: 64 cores / 256 GB, 3.4 GHz x64

Lab Infrastructure Both ES7000/one systems are identically configured (32 cores / 128 GB)

Test approach
- The starting point: TPC-H schema (decision support benchmark schema)
- Random data generation utility to generate input files
- Loading data from flat files (16 columns of data)
- For each line-item file: measure duration + resource utilization
- Increase the amount of parallelism by loading more files simultaneously
- Increase HW resources (more CPUs, type of CPUs)
- Compare Yukon vs Katmai execution times + max. total handling capacity

Size of flat files    Number of rows
10 x 21.6 GByte       10 x 100 million

Introduction to Aggregations …
- Fact table (numeric performance measurements): Sales_Fact (TimeKey, EmployeeKey, ProductKey, CustomerKey, LocationKey, Sales, ...)
- Dimension table: Employee_Dim (EmployeeKey, EmployeeID, ...)
- Dimension table: Time_Dim (TimeKey, TheDate, ...)
- Dimension table: Product_Dim (ProductKey, ProductID, ...)
- Dimension table: Customer_Dim (CustomerKey, CustomerID, ...)
- Dimension table: Location_Dim (LocationKey, LocationID, ...)

Package details Number of distinct values: Col1: 1 – 1000 Col2: 1 – 20 Col3: 1 – 1000 Col4: 1 – 1000 Packages created 8 aggregations using Group-bys Each Aggregation had: 8 – (10) measures comprised of min, max, sums, averages and counts Aggs Col1 col2 Col3 Col4 Agg1  Agg2  Agg3  Agg4  Agg5   Agg6   Agg7   Agg8   Agg9    Agg 10   

Aggregation Package 1st design

1st package / base design execution

Aggregation Results 1st package
To process 1 file with 100 million LineItems:
- Feb. 2005: on Itanium2 64-bit with SQL2005 this took 2 hours, 10 minutes
- With the hardware upgrade to ES7000/one x64, execution on both SQL2005 and SQL2008 took 1 hour, 33 minutes
Conclusion:
- New x64 hardware: 35% gain
- No Katmai performance improvements for this basic aggregation (blocking component)

Katmai Data Flow Engine: a new worker thread for each execution tree

Package with Multicast Aggregating up to 1 billion LineItems: 8 aggregations, 8 transforms with Multicast. Test scenario: run 1 up to 10 packages of 100 million lines each, executed in parallel

Base Aggregation 1 hour 4 min. to aggregate 100 million rows / 22 GB

Optimization1: Conditional split SQL2005: 27min. 48 sec SQL2008: 10min. 55 sec

Katmai Dataflow engine @ work Yukon: 27min. 48 sec Katmai: 10min. 55 sec

Yukon Optimization Use “Union All” transforms to create parallelism. SQL2005 elapsed time down from 27 min. 48 sec to 10 min. 57 sec. Up to 6 CPUs are fully utilized

Katmai Dataflow engine @ work Katmai elapsed time: 10 min. 55 sec. Conclusion: no need for “Union All” transforms in SQL2008
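The elapsed times quoted on these slides can be turned into speedup factors with simple arithmetic. The following is a minimal sketch; the helper names `to_seconds` and `speedup` are illustrative and do not come from the original deck.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min : sec' elapsed time into seconds."""
    return minutes * 60 + seconds


def speedup(baseline_s: int, optimized_s: int) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


# Elapsed times taken from the slides.
yukon_base = to_seconds(27, 48)       # SQL2005, Conditional Split package
yukon_union_all = to_seconds(10, 57)  # SQL2005 with the "Union All" trick
katmai = to_seconds(10, 55)           # SQL2008, same package, no trick needed

print(f"Katmai vs Yukon base:    {speedup(yukon_base, katmai):.2f}x")
print(f"Union All vs Yukon base: {speedup(yukon_base, yukon_union_all):.2f}x")
```

Both factors come out at roughly 2.5x, which is why the Union All workaround is no longer needed on SQL2008.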

SQL2005 10 packages in parallel: Avg. CPU load 37% 140 Mbyte/sec read Disk IO from FlatFiles Aggregating 1 billion LineItems SQL2005 with Multicast

SQL2008 10 packages in parallel: Avg. CPU load 100% 270 MByte/sec read Disk IO from FlatFiles Aggregating 1 billion LineItems SQL2008 with Multicast

2nd package with Multicast Aggregating 1 billion LineItems: using the Multicast in Katmai provides a significant increase in throughput. Processing 8+ packages in parallel uses all available 32 cores

Demo Flat file input source throughput
Reading from a 100 million row / 22 GB flat file with 15 / 5 / 1 columns:
- Itanium2 / Yukon: 20 / 35 / 55 MB/sec
- x64, both Katmai and Yukon: 72 / 92 / 130 MB/sec (new hardware)

Data flow Engine threads Yukon # Engine threads = 5, Katmai # Engine threads = 10. Sysinternals.com Process Explorer (look at the thread tab: CPU & context switches), Pslist.exe dtsdebughost /d, Tlist.exe dtsdebughost

“World record” TPC-H data loading on the Unisys ES7000/one with Windows2008 & SQL 2008 - Bulk Inserts -

Agenda SQL 2008 tuning Optimizations / Configuration Filegroups / files vs Write IO size

1 File per Filegroup? SQL2008: 256KB Write IOs 1 Filegroup gives variable blocksizes (64 KB - 256 KB IO’s) Check for PageIOLatch_UP

Initial : Maximum performance ?

64 core / 64 Bulk Inserts with -x Minidump analysis shows a lot of perf logging overhead. Starting SQL Server with the -x option boosts throughput:

SQL Server with startup parameter -x

Tip: Soft NUMA on 64 cores Assign BULK INSERT tasks to dedicated CPUs (both SQL2005/2008)
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node63]
"CpuMask"=hex:00,00,00,00,00,00,00,80
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQLServer\SuperSocketNetLib\Tcp]
"ListenOnAllIPs"=dword:00000001
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQLServer\SuperSocketNetLib\Tcp\IPAll]
"TcpPort"="2000[0x00000001],2001[0x00000002],
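The Node63 entry on this slide sets only the highest bit of a 64-bit mask, written as eight little-endian bytes. The sketch below shows how such a `CpuMask` value could be derived for any single CPU. It assumes one CPU per soft-NUMA node, which is inferred from the Node63 example rather than stated on the slide, and the function name `cpu_mask_reg_binary` is hypothetical.

```python
def cpu_mask_reg_binary(cpu_index: int, width_bytes: int = 8) -> str:
    """Format a single-CPU affinity mask the way a .reg export shows binary data.

    Assumption: the mask is stored little-endian (lowest byte first), which
    matches the Node63 example where only the last byte is 0x80.
    """
    if not 0 <= cpu_index < width_bytes * 8:
        raise ValueError(f"cpu_index must be in 0..{width_bytes * 8 - 1}")
    mask = 1 << cpu_index                       # one bit per logical CPU
    raw = mask.to_bytes(width_bytes, byteorder="little")
    return "hex:" + ",".join(f"{b:02x}" for b in raw)


# Reproduces the value shown on the slide for Node63.
print(cpu_mask_reg_binary(63))
# The first node in such a scheme would bind to CPU 0.
print(cpu_mask_reg_binary(0))
```

Generating the values this way avoids hand-editing 64 nearly identical registry entries, where a single misplaced byte would bind a node to the wrong CPU.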

Tip: Sharpen data types: Money type (13% improvement) Use the Money type instead of decimal columns. Money is stored as an 8-byte integer with 4 implied decimal digits. TDS (Tabular Data Stream) is the format SQL Server uses for transfer of data over the wire, and it does not support decimal or numeric. (Applies to both Yukon and Katmai)
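To make the "8-byte integer with 4 implied decimal digits" concrete, here is a minimal sketch of the scaling involved. The helper names are illustrative, and the `struct` packing only demonstrates the fixed 8-byte size; it is not meant to reproduce the exact on-disk or on-wire byte layout.

```python
import struct
from decimal import Decimal

MONEY_SCALE = 10_000  # 4 implied decimal digits


def money_to_int64(value: str) -> int:
    """Scale a decimal string to the integer a money-style column would hold."""
    scaled = Decimal(value) * MONEY_SCALE
    if scaled != scaled.to_integral_value():
        raise ValueError("money keeps at most 4 decimal digits")
    return int(scaled)


def pack_money(value: str) -> bytes:
    """Pack the scaled value as a signed 64-bit integer (size illustration only)."""
    return struct.pack("<q", money_to_int64(value))


print(money_to_int64("12.34"))   # scaled integer
print(len(pack_money("12.34")))  # always 8 bytes, whatever the value
```

A fixed-width integer needs no precision or scale handling per value, which is a plausible reason for the improvement reported on the slide.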

T-SQL bulkInserts lineItems 64000 rows/sec per CPU

World record - TPCH Data loading With SSIS 2008 Bulk Inserts

Agenda Now SQL2008 is fully optimized, shifting focus to SSIS : Infrastructure Windows2008 network parameters Interrupt Affinity tool settings SSIS Package optimizations

SSIS Base package – Control Flow

SSIS Base package – Data Flow Data types sharpened

IntPolicy tool / Interrupt Affinity On the ES7000 SQL Server, assign NIC interrupts & DPCs to CPUs

Interrupt Affinity set for 8 network cards

Intel Pro/1000 MT
Apply changes to each of the (16) network cards:
0) Adaptive Inter-Frame Spacing disabled
1) Flow control = Tx & Rx enabled; client & server Interrupt Moderation = Medium
2) Jumbo Packet = 9014 bytes enabled
3) Client & server Interrupt Moderation = Medium; Coalesce buffers = 256
4) Set server Rx buffers to 512 and server Tx buffers to 512
5) Set client Rx buffers to 512 and client Tx buffers to 256
6) Link speed 1000 Mbps Full Duplex

Other flat file best practices
- Use the Fast Parse option when possible:
  - Flat file source / destinations
  - Data Conversion and Derived Column transformations
  - Integer data types and date/time formats
- Reduce volume where possible:
  - Don’t push unneeded columns
  - Conditional split for filtering rows
- Do not parse or convert columns unnecessarily:
  - In a fixed-width format you can combine adjacent unneeded columns into one
  - Leave unneeded columns as strings

SSIS Bulk Insert package: some basic optimizations found
- Elapsed time: 2 min 56 sec
- DtsDebughost.exe: 1.7 GB flat file read, 1.9 GB written to SQL Server
- Approx. 200 MB RAM

IO Throughput Average Read bytes/sec 10 MB/sec Average Write bytes/sec 12 MB/sec

SSIS - IO tuning
Observation:
- SSIS: 14K reads vs 465K writes (128 KB read IOs)
- SQL: 465K reads vs 8800 writes (256 KB write IOs)
Time to tune the data transport between SSIS and SQL!

Tip: Increase packet size SSIS: change the Connection Manager Packet Size from 0 to 32K

Result: 465K writes down to 58K write IOs Elapsed time: 2 min 36 sec (= 20 sec less)
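The drop from 465K to 58K write IOs is consistent with simple packet arithmetic. The sketch below assumes that a Packet Size of 0 means the SQL Server default network packet size of 4096 bytes, that "32K" is the 32767-byte value mentioned on the startup options slide, and that each network packet maps to one write IO. The function name `write_ios` is illustrative.

```python
import math

DEFAULT_PACKET_BYTES = 4096   # assumed default when Packet Size is 0
TUNED_PACKET_BYTES = 32767    # the "32K" setting


def write_ios(total_bytes: int, packet_bytes: int) -> int:
    """Number of packets (one write IO each) needed to send total_bytes."""
    return math.ceil(total_bytes / packet_bytes)


# 465K writes at the default packet size is about the 1.9 GB seen earlier.
total_bytes = 465_000 * DEFAULT_PACKET_BYTES
print(f"Data sent: {total_bytes / 1e9:.2f} GB")

tuned = write_ios(total_bytes, TUNED_PACKET_BYTES)
print(f"Write IOs at 32K packets: {tuned:,}")
```

Under these assumptions the model lands at roughly 58K writes, matching the measured result, and the implied 1.9 GB agrees with the volume DtsDebughost.exe wrote to SQL Server.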

SQL Server 2008 startup options
- Use SQL startup flags:
  - -x (do not collect perfmon data)
  - -E (256K allocations per file, default 64K)
- Use network packet size: 32767
- Lock pages in memory privilege

Result: Packages completed in less than 1800 seconds!

Windows 2008 Optimizations

Change DEP (Windows2008) Tip from the Unisys TPC-C benchmark team: run "bcdedit /set nx OptIn". The Windows 2008 DEP policy is OptOut by default

Windows 2008 optimization Enable MPIO, discover Multi-Path IO

Seeing Today - Securing tomorrow [email_address]
