
Presentation: DB2 Best Practices for Optimal Performance


  1. 1. October 25–29, 2009 • Mandalay Bay • Las Vegas, Nevada 0 DB2 Best Practices for Optimal Performance Sunil Kamath Senior Technical Staff Member IBM Toronto Labs sunil.kamath@ca.ibm.com
  2. 2. Download this slide http://ouo.io/Fyhxku
  3. 3. Agenda Basics – Sizing workloads – Best Practices for Physical Design Benchmarks DB2 9.7 Performance Improvements – Scan Sharing – XML in DPF – Statement Concentrator – Currently Committed – LOB Inlining – Compression – Index Compression – Temp Table Compression – XML Compression – Range Partitioning with local indexes Summary
  4. 4. Performance “Truisms” There is always a bottleneck! Remember the 5 fundamental bottleneck areas: 1. Application 2. CPU 3. Memory 4. Disk 5. Network Balance is key! 2
  5. 5. Ideally one should understand: – The application – Load process requirements – Number of concurrent users/jobs – Largest tables' sizes – Typical query scenarios – Size of answer sets being generated – Response time objectives for loads and queries – Availability requirements – … Sizing a Configuration 3
  6. 6. Sizing “Rules of Thumb” Platform choice CPU Memory Disk – Space – Spindles 4
  7. 7. Platform Selection DB2 is highly optimized for all major platforms – AIX, Linux, Windows, Solaris, HP-UX – 64 bit is strongly recommended Much more than a performance question – Integration with other systems – Skills / Ease of Use – $$$ Often more than 1 “good” choice 5
  8. 8. Selecting DB2 with and without Data Partitioning (InfoSphere Warehouse) Differences becoming smaller – Function and manageability gaps Data Partitioning is less common for – OLTP, ERP, CRM Data Partitioning is most common for – Data Warehousing 6
  9. 9. Memory! How Much Do I Need? Highly dependent on many factors – Depends on number of users (connections) – Depends on the query workload – Depends on whether or not other software is sharing the machines being measured Advisable to allocate 5% of active data for bufferpool sizing New systems use 64-bit processors – If using 32-bit Windows/Linux/DB2 just use 4GB. 7
  10. 10. Disk! How Many GB Do I Need? More than you think! Don’t forget about – Working storage – Tempspace – Indexes, MQT’s etc. But big drives tend to give lots of space – 146/300GB drives now standard Raw data x 4 (unmirrored)* Raw data x 5 (RAID5)* Raw data x 8 (RAID10)* * Assumes no compression 8
  11. 11. Disk! How Many Spindles Do I Need? Need to define a balanced system – Don't want too few large disks • Causes I/O bottleneck Different kinds of requirements – IOPS • Latency – MB/sec • Throughput Don’t share disks for table/indexes with logs Don’t know how many disks in the SAN? – Make friends with storage Admin! 9
  12. 12. Basic Rules of Thumb (RoT) Meant to be approximate guidelines: – 150-200 GB active data per core – 50 concurrent connections per core – 8 GB RAM per core – 1500-2000 IOPS per core The above guidelines work for most virtualization environments as well These RoT are NOT meant to be a replacement or alternative to real workload sizing 10
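A rough worked example of applying these rules of thumb (illustrative numbers only, not a substitute for real workload sizing): for about 1.5 TB of active data, the guidelines suggest roughly 8–10 cores (150–200 GB per core), 64–80 GB of RAM (8 GB per core), headroom for 400–500 concurrent connections (50 per core), and storage able to sustain roughly 12,000–20,000 IOPS (1,500–2,000 IOPS per core).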
  13. 13. Additional Considerations for Virtualized environments Performance overhead with Hypervisor – Varies with type of hypervisor and environment Effect of over committing CPU at “system” level Effect of over committing memory at “system” level Effects of sharing same disks for multiple workloads 11
  14. 14. Building Your Database 12
  15. 15. Physical Database Design Create 1 database for each DB2 instance Issue “create database” with – Unicode codeset • Default starting with DB2 9.5 – Automatic Storage • Storage paths for tables/indexes etc • DBPATH for log etc. – Suitable pagesize Example – CREATE DB <DBNAME> AUTOMATIC STORAGE YES ON /fs1/mdmdb, /fs2/mdmdb, /fs3/mdmdb, /fs4/mdmdb DBPATH on /fs0/mdmdb USING CODESET UTF-8 TERRITORY <TERRITORY> COLLATE USING UCA400_NO PAGESIZE 8K; Suggestion: Make everything explicit to facilitate understanding 13
  16. 16. Selecting a Page Size Use a single page size if possible – For example, 8K or 16K With LARGE tablespaces there is ample capacity for growth OLTP – Smaller page sizes may be better (e.g. 8K) Warehouse – Larger page sizes often beneficial (e.g. 16K) XML – Use 32K page size Choosing an appropriate page size should depend on the access pattern of rows (sequential vs. random) With DB2 9.7, the tablespace limits have increased by 4x; For example, with 4K page size, the max tablespace size is now 8 TB 14
  17. 17. Tablespace Design Use automatic storage – Significant enhancements in DB2 9.7 Use Large tablespaces – Default since DB2 9.5 Disable file system caching via DDL as appropriate Ensure temp tablespaces exist – 1 for each page size, ideally just 1 Keep number of tablespaces reasonably small – 1 for look up tables in single node nodegroup – 1 for each fact table (largest tables) – 1 for all others Create separate tablespaces for indexes, LOBs Large tablespaces further help exploit table/index/temp compression 15
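A minimal DDL sketch reflecting these recommendations, assuming an automatic storage database; the bufferpool and tablespace names are illustrative, not from the original deck:
  CREATE BUFFERPOOL BP16K SIZE AUTOMATIC PAGESIZE 16K;
  CREATE LARGE TABLESPACE TS_FACT PAGESIZE 16K BUFFERPOOL BP16K NO FILE SYSTEM CACHING;
  CREATE LARGE TABLESPACE TS_INDEX PAGESIZE 16K BUFFERPOOL BP16K NO FILE SYSTEM CACHING;
  CREATE SYSTEM TEMPORARY TABLESPACE TS_TEMP16K PAGESIZE 16K BUFFERPOOL BP16K;
In an automatic storage database these tablespaces default to MANAGED BY AUTOMATIC STORAGE, so no container lists are needed.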
  18. 18. Choosing DMS vs. SMS Goal: – Performance of RAW – Simplicity/usability of SMS DMS FILE is the preferred choice – Performance is near DMS RAW • Especially when bypassing filesystem caching – Ease of use/management is similar to SMS • Can gradually extend the size – Flexible • Can add/drop containers • Can separate data/index/long objects into their own table space – Potential to transition to Automatic Storage Automatic storage is built on top of DMS FILE – But it automates container specification / management 16
  19. 19. Choosing DMS FILE vs. Automatic Storage Goal: – To maximize simplicity/usability Automatic Storage is the preferred choice with DB2 9.5 – Strategic direction • Receives bulk of development investment – Key enabler/prerequisite for future availability/scalability enhancements – Performance is equivalent to DMS FILE – Ease of use/management is superior • No need to specify any containers • Makes it easy to have many table spaces – Flexible • Can add/drop storage paths 17
  20. 20. Consider Schema optimizations Decide on how to structure your data – Consider distributing your data across nodes • Using DPF hash-partitioning – Consider partitioning your data by ranges • Using table range partitioning – Consider organizing your data • Using MDC (multi dimensional clustering) Auxiliary data structures – Do the right indexes exist ? • Clustered, clustering, include columns for unique index – Would Materialized query tables (MQT) help? You can feed dynamic snapshot into design advisor 18
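As a concrete sketch of these choices (table, column, database and file names are hypothetical), combining DPF hash distribution with MDC organization, followed by a Design Advisor run over a captured workload file:
  CREATE TABLE sales (
    sale_date DATE NOT NULL,
    store_id  INTEGER NOT NULL,
    cust_id   INTEGER NOT NULL,
    amount    DECIMAL(12,2)
  )
  DISTRIBUTE BY HASH (cust_id)        -- DPF hash partitioning across database partitions
  ORGANIZE BY DIMENSIONS (store_id);  -- MDC clustering on the store dimension

  db2advis -d MYDB -i workload.sql -m MICP   -- recommend MQTs, Indexes, MDC and Partitioning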
  21. 21. Table Design OK to have multiple tables in a tablespace Once defined, use ALTER table to select options – APPEND MODE - use for tables where inserts are at end of table (ALTER TABLE ... APPEND ON) • This also enables concurrent append points for high concurrent INSERT activity – LOCKSIZE - use to select table level locking (ALTER TABLE ... LOCKSIZE TABLE) – PCTFREE - use to reserve space during load/reorg (ALTER TABLE ...PCTFREE 10) Add pk/fk constraints after index creation 19
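Hedged examples of these ALTER TABLE options (the table names are placeholders):
  ALTER TABLE stage_orders APPEND ON;       -- new rows always go to the end of the table
  ALTER TABLE ref_country LOCKSIZE TABLE;   -- table-level locking for a small lookup table
  ALTER TABLE fact_sales PCTFREE 10;        -- reserve 10% free space during LOAD/REORG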
  22. 22. Table Design - Compression Compress base table data at row level – Build a static dictionary, one per table On-disk and in-memory image is smaller Need to uncompress data before processing Classic tradeoff: more CPU for less disk I/O – Great for IO-bound systems that have spare CPU cycles Large, rarely referenced tables are ideal 20
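A minimal sketch of enabling classic row compression and building the static dictionary (table name is illustrative):
  ALTER TABLE fact_sales COMPRESS YES;
  REORG TABLE fact_sales RESETDICTIONARY;   -- classic (offline) REORG builds the compression dictionary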
  23. 23. Index Design In general, every table should have at least 1 index – Ideally a unique index / primary key index Choose appropriate options – PCTFREE - should be 0 for read-only table – PAGE SPLIT HIGH/LOW – for ascending inserts especially – CLUSTER - define a clustering index – INCLUDE columns - extra cols in unique index for index-only access – COLLECT STATISTICS while creating an index With DB2 9.7 indexes can be compressed too! 21
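An illustrative CREATE INDEX that combines several of these options (index, table and column names are hypothetical):
  CREATE UNIQUE INDEX ix_orders_pk ON orders (order_id)
    INCLUDE (order_date, cust_id)            -- extra columns for index-only access
    PCTFREE 0                                -- read-mostly table
    PAGE SPLIT HIGH                          -- favour ascending-key inserts
    COLLECT SAMPLED DETAILED STATISTICS;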
  24. 24. Benchmarks DB2 is the performance leader TPoX 22
  25. 25. World Record Performance With TPC-C [Chart: tpmC, higher is better — DB2 8.2 on 64-way POWER5 (64x 1.9GHz, 2 TB RAM, 6400 disks): 3,210,540; DB2 9.1 on 64-way POWER5+ (64x 2.3GHz, 2 TB RAM, 6400 disks): 4,033,378; DB2 9.5 on 64-way POWER6 (64x 5GHz, 4 TB RAM, 10,900 disks): 6,085,166] TPC Benchmark, TPC-C, tpmC are trademarks of the Transaction Processing Performance Council. • DB2 8.2 on IBM System p5 595 (64 core POWER5 1.9GHz): 3,210,540 tpmC @ $5.07/tpmC, available May 14, 2005 • DB2 9.1 on IBM System p5 595 (64 core POWER5+ 2.3GHz): 4,033,378 tpmC @ $2.97/tpmC, available January 22, 2007 • DB2 9.5 on IBM POWER 595 (64 core POWER6 5.0GHz): 6,085,166 tpmC @ $2.81/tpmC, available December 10, 2008. Results current as of June 24, 2009; check http://www.tpc.org for latest results
  26. 26. World Record TPC-C Performance on x64 with RedHat Linux [Chart: tpmC, higher is better — DB2 9.5 on IBM x3950 M2 (Intel Xeon 7460, RHEL 5.2): 1,200,632; SQL Server 2005 (Intel Xeon 7350, Windows 2003): 841,809] TPC Benchmark, TPC-C, tpmC are trademarks of the Transaction Processing Performance Council. • DB2 9.5 on IBM System x3950 M2 (8 Processor 48 core Intel Xeon 7460 2.66GHz): 1,200,632 tpmC @ $1.99/tpmC, available December 10, 2008 • SQL Server 2005 on HP DL580G5 (8 Processor 32 core Intel Xeon 7350 2.93GHz): 841,809 tpmC @ $3.46/tpmC, available April 1, 2008. Results current as of June 24, 2009; check http://www.tpc.org for latest results
  27. 27. World record 10 TB TPC-H result on IBM Balanced Warehouse E7100 IBM System p6 570 & DB2 9.5 create top 10TB TPC-H performance [Chart: QphH, higher is better — IBM p6 570 / DB2 9.5: 343,551; HP Integrity Superdome-DC Itanium / Oracle 11g: 208,457; Sun Fire 25K / Oracle 10g: 108,099] • Significant proof-point for the IBM Balanced Warehouse E7100 • DB2 Warehouse 9.5 takes DB2 performance on AIX to new levels • 65% faster than Oracle 11g best result • Loaded 10TB data @ 6 TB / hour (incl. data load, index creation, runstats) TPC Benchmark, TPC-H, QphH are trademarks of the Transaction Processing Performance Council. • DB2 Warehouse 9.5 on IBM System p6 570 (128 core p6 4.7GHz): 343,551 QphH@10000GB, 32.89 USD per QphH@10000GB, available April 15, 2008 • Oracle 10g Enterprise Ed R2 w/ Partitioning on HP Integrity Superdome-DC Itanium 2 (128 core Intel Dual Core Itanium 2 9140 1.6 GHz): 208,457 QphH@10000GB, 27.97 USD per QphH@10000GB, available September 10, 2008 • Oracle 10g Enterprise Ed R2 w/ Partitioning on Sun Fire E25K (144 core Sun UltraSPARC IV+ 1500 MHz): 108,099 QphH@10000GB, 53.80 USD per QphH@10000GB, available January 23, 2006. Results current as of June 24, 2009; check http://www.tpc.org for latest results
  28. 28. World record SAP 3-tier SD Benchmark This benchmark represents a 3-tier SAP R/3 environment in which the database resides on its own server, where database performance is the critical factor DB2 outperforms Oracle by 68% and SQL Server by 80% – DB2 running on 32-way p5 595 – Oracle and SQL Server 2000 running on 64-way HP [Chart: top SAP SD 3-tier results by DBMS vendor, SD users, higher is better — DB2 8.2 on 32-way p5 595: 168,300; Oracle 10g on 64-way HP Integrity: 100,000; SQL Server on 64-way HP Integrity: 93,000] Results current as of June 24, 2009; check http://www.sap.com/benchmark for latest results
  29. 29. More SAP performance than any 8-socket server 15,600 SAP SD 2-Tier Users on the IBM Power 750 Express with DB2 9.7 on AIX 6.1 Result comparable to a 32-socket 128-core Sun M9000 [Chart: SAP SD 2-tier users — 4-socket systems: 32-core Sun T5440, 24-core Opteron, 32-core Power 750 Express; 8-socket systems: 48-core Opteron servers; 32-socket: 128-core Sun M9000] Results current as of March 03, 2010; check http://www.sap.com/benchmark for latest results
  30. 30. Best SAP SD 2-Tier performance with SAP ERP 6 EHP4 20% more performance, 1/4 the number of cores vs. Sun M9000 37,000 SAP users on SAP SD 2-Tier: Power 780 with DB2 is #1 overall, Power 750 with DB2 is #1 4-socket, System x3850 X5 with DB2 is #1 4-socket Windows [Chart: SAP SD users, all results with SAP ERP 6 EHP4 — 4 sockets: Sun T5440 SPARC 4p/32c/256t, IBM x3850 Nehalem-EX 4p/32c/64t, Power 750 4p/32c/128t; 8 sockets: Sun X4640 Opteron 8p/48c/48t, Fujitsu 1800E Nehalem-EX 8p/64c/128t, Power 780 8p/64c/256t; 32 sockets: Sun M9000 SPARC 32p/128c/256t; 64 sockets: Sun M9000 SPARC 64p/256c/512t] IBM Power System 780, 8p / 64c / 256t, POWER7, 3.8 GHz, 1024 GB memory, 37,000 SD users, dialog resp.: 0.98s, line items/hour: 4,043,670, dialog steps/hour: 12,131,000, SAPS: 202,180, DB time (dialog/update): 0.013s / 0.031s, CPU utilization: 99%, OS: AIX 6.1, DB2 9.7, cert# 2010013. Sun M9000, 64p / 256c / 512t, SPARC64 VII, 2.88 GHz, 1156 GB memory, 32,000 SD users, Solaris 10, Oracle 10g, cert# 2009046. Results current as of April 07, 2010; check http://www.sap.com/benchmark for latest results
  31. 31. First to Publish SPECjEnterprise2010 Benchmark Multi-tier end-to-end performance benchmark for Java EE 5 Single node result: 1014.40 EjOPS 8 node cluster result: 7903.16 EjOPS – Approx. 38,500 tx/sec, 135,000 SQL/sec – WAS 7 on 8x HS22 Blades (Intel Xeon X5570, 2-socket/8-core) – DB2 9.7 FP1 on x3850 M2 (Intel Xeon X7460, 4-socket/24-core), SLES 10 SP2 Result published on January 7, 2010 Results as of January 7, 2010
  32. 32. More Efficient Performance than Ever 3,000 Infor Baan ERP 2-Tier Users on the IBM Power 750 Express using DB2 9.7. More performance, with less space and far less energy consumption than ever Infor ERP LN Benchmark results on P6 / P7 — System: p 570 (P6) vs. p 750 (P7); Processor speed: 5 GHz vs. 3.55 GHz; No. of chips or sockets: 8 vs. 2; Cores / chip: 2 vs. 8; Total number of cores: 16 vs. 16; Total memory: 256 GB vs. 256 GB; AIX version: 6.1 vs. 6.1; DB2 version: 9.7 GA vs. 9.7 GA; # Infor Baan users: 2800 vs. 3000; # users / core: 175 vs. 187.5; # users / chip: 350 vs. 1500
  33. 33. Performance Improvements DB2 9.7 has tremendous new capabilities that can substantially improve performance When you think about the new features … – “It depends” – We don’t know everything (yet) – Your mileage will vary – Please provide feedback! 31
  34. 34. DB2 Threaded Architecture [Diagram: DB2 runs as a single, multi-threaded process (db2sysc). At the instance level, per-instance listeners (db2tcpcm for TCP/IP remote clients, db2ipccm for local clients using shared memory and semaphores) dispatch work from the DB2 client library to per-application coordinator agents (db2agent), drawn from an idle agent pool of pooled agents. Coordinator agents drive per-database active subagents (db2agntp). At the database level, prefetchers (db2pfchr), page cleaners (db2pclnr), log threads (db2loggw, db2loggr) and the deadlock detector (db2dlock) service the buffer pool(s), log buffer and logging subsystem, issuing big-block read, prefetch, page-write and log-write requests against the data disks and log disks]
  35. 35. Performance Advantages of the Threaded Architecture Context switching between threads is generally faster than between processes – No need to switch address space – Less cache “pollution” Operating system threads require less context than processes – Share address space, context information (such as uid, file handle table, etc) – Memory savings Significantly fewer system file descriptors used – All threads in a process can share the same file descriptors – No need to have each agent maintain its own file descriptor table 33
  36. 36. From the existing DB2 9 Deep Compression … Reduce storage costs Improve performance Easy to implement [Chart: compression ratios, DB2 9 vs. other — 1.5 times better, 2.0 times better, 3.3 times better, 8.7 times better] “With DB2 9, we’re seeing compression rates up to 83% on the Data Warehouse. The projected cost savings are more than $2 million initially with ongoing savings of $500,000 a year.” - Michael Henson “We achieved a 43 per cent saving in total storage requirements when using DB2 with Deep Compression for its SAP NetWeaver BI application, when compared with the former Oracle database. The total size of the database shrank from 8TB to 4.5TB, and response times were improved by 15 per cent. Some batch applications and change runs were reduced by a factor of ten when using IBM DB2.” - Markus Dellermann
  37. 37. Index Compression What is Index Compression? The ability to decrease the storage requirements of indexes through compression. By default, if the table is compressed, the indexes created for the table will also be compressed – including the XML indexes. Index compression can be explicitly enabled/disabled when creating or altering an index. Why do we need Index Compression? Index compression reduces disk cost and TCO (total cost of ownership) Index compression can improve runtime performance of queries that are I/O bound. When does Index Compression work best? – Indexes on tables stored in large RID DMS tablespaces (the default since DB2 9). – Indexes that have low key cardinality & high cluster ratio.
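For explicit control, a DB2 9.7 DDL sketch (index and table names are placeholders):
  CREATE INDEX ix_sales_cust ON fact_sales (cust_id) COMPRESS YES;
  ALTER INDEX ix_orders_pk COMPRESS YES;
  REORG INDEXES ALL FOR TABLE orders;   -- rebuilds existing indexes in compressed format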
  38. 38. Index Compression How does Index Compression Work? • DB2 will consider multiple compression algorithms to attain maximum index space savings through index compression. [Diagram: a pre-DB2 9.7 index page with a page header, a fixed slot directory (maximum size reserved), and index keys (e.g. AAAB,1,CCC / AAAB,1,CCD / BBBZ,1,ZZZ / BBBZ,1,ZZCCAAAE), each followed by its RID list]
  39. 39. Index Compression – Variable Slot Directory • In 9.7, a slot directory is dynamically adjusted in order to fit as many keys into an index page as possible. [Diagram: a DB2 9.7 index page where the slot directory is variable, so the space previously reserved for a maximum-size slot directory is saved and can hold additional index keys and RID lists]
  40. 40. Index Compression – RID List Compression • Instead of saving the full version of a RID, we can save some space by storing the delta between two RIDs. • RID List compression is enabled when there are 3 or more RIDs in an index page. [Diagram: a DB2 9.7 index page with a variable slot directory where each RID list is stored as a first RID plus RID deltas — e.g. the list 3011, 3025, 3026, 3027, … is stored as 3011, 14, 1, 1, 2, 4, 2, 1, 1 — saving space in every RID list]
  41. 41. Index Compression – Prefix Compression • Instead of saving all key values, we can save some space by storing a common prefix and suffix records. • During index creation or insertion, DB2 will compare the new key with adjacent index keys and find the longest common prefixes between them. [Diagram: a DB2 9.7 index page with a variable slot directory where the keys AAAB,1,CCC / AAAB,1,CCD and BBBZ,1,ZZZ / BBBZ,1,ZZCCAAAE are stored as the common prefixes AAAB,1,CC and BBBZ,1,ZZ plus the suffix records C, D, Z and CCAAAE, with compressed RID lists — saving space from both RID list and prefix compression]
  42. 42. Index Compression – Results in a Nutshell • Index compression uses idle CPU cycles and idle cycles spent waiting for I/O to compress & decompress index data. • When we are not CPU bound, we are able to achieve better performance in all inserts, deletes and updates. [Charts: simple index compression tests, elapsed time in seconds (lower is better) — without index compression 49.12 / 49.24 / 83.99 / 53.89 vs. with index compression 28.31 / 33.67 / 68.3 / 44.07 for simple select / insert / update / delete; machine utilization (user/system/idle/iowait) for a complex query warehouse database, base vs. index compression, for select/insert/update/delete; estimated index compression savings across seven warehouses of 16%, 20%, 24%, 31%, 50%, 55% and 57% — average 36%]
  43. 43. Temp Table Compression What is Temp Table Compression? The ability to decrease storage requirements by compressing temp table data Temp tables created as a result of the following operations are compressed by default: – Temps from Sorts – Created Global Temp Tables – Declared Global Temp Tables – Table queues (TQ) Why do we need Temp Table Compression on relational databases? Temp table spaces can account for up to 1/3 of the overall tablespace storage in some database environments. Temp compression reduces disk cost and TCO (total cost of ownership) 41
  44. 44. Temp Table Compression How does Temp Table Compression Work? – It extends the existing row-level compression mechanism that currently applies to permanent tables, into temp tables. [Diagram: a Lempel-Ziv dictionary is created from sample data — strings of data across a row such as Canada|Ontario|Toronto|Matthew or USA|Illinois|Chicago|John; common substrings (e.g. CanadaOntarioToronto → 0x12f0, Matthew → 0xe57a) are mapped to short symbols, so each row is saved in compressed form as a few dictionary symbols, e.g. 0x12f0,0xe57a]
  45. 45. Temp Table Compression – Results in a Nutshell For affected temp compression enabled complex queries, an average of 35% temp tablespace space savings was observed. For the 100GB warehouse database setup, this sums up to over 28GB of saved temp space. [Charts: query workload CPU analysis (user/sys/idle/iowait), baseline vs. temp compression, showing effective CPU usage; space for complex warehouse queries — 78.3 GB stored without temp compression vs. 50.2 GB with it, saving 35% space (lower is better); elapsed time — 183.98 minutes without vs. 175.56 minutes with temp compression, 5% faster (lower is better)]
  46. 46. XML Data Compression What is XML Data Compression? The ability to decrease the storage requirements of XML data through compression. XML Compression extends row compression support to the XML documents. If row compression is enabled for the table, the XML data will be also compressed. If row compression is not enabled, the XML data will not be compressed either. Why do we need XML Data Compression? Compressing XML data can improve storage efficiency and runtime performance of queries that are I/O bound. XML compression reduces disk cost and TCO (total cost of ownership) for databases with XML data 44
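Because XML compression piggybacks on row compression, a minimal sketch (hypothetical table) is simply:
  CREATE TABLE cust_docs (
    id  INTEGER NOT NULL PRIMARY KEY,
    doc XML
  ) COMPRESS YES;   -- row compression enabled, so both relational and XML data are compressed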
  47. 47. XML Data Compression How does XML Data Compression Work? – Small XML documents (< 32k) can be inlined with any relational data in the row and the entire row is compressed. • Available since DB2 9.5 – Larger XML documents that reside in a data area separate from relational data can also be compressed. By default, DB2 places XML data in the XDA to handle documents up to 2GB in size. – XML compression relies on a separate dictionary from the one used for row compression. [Diagram: an uncompressed row holds relational data plus XML; after compression, XML documents under 32KB are inlined and compressed with the row using dictionary #1, while 32KB–2GB XML documents stored in the XDA are compressed using a separate dictionary #2]
  48. 48. XML Data Compression – Results in a Nutshell Significantly improved query performance for I/O-bound workloads. Achieved 30% faster maintenance operations such as RUNSTATS, index creation, and import. Average compression savings of ⅔ across 7 different XML customer databases and about ¾ space savings for 3 of those 7 databases. [Charts: XML compression savings of 43%, 61%, 63%, 63%, 74%, 77% and 77% across the seven XML databases tested, average 67% (higher is better); average elapsed time for SQL/XML and XQuery queries over an XML and relational database using XDA compression — 31.1 sec without XML compression vs. 19.7 sec with it, 37% faster (lower is better)]
  49. 49. Range Partitioning with Local Indexes What does Range Partitioning with Local Indexes mean? – A partitioned index is an index which is divided up across multiple storage objects, one per data partition, and is partitioned in the same manner as the table data – Local indexes can be created using the PARTITIONED keyword when creating an index on a partitioned table (Note: MDC block indexes are partitioned by default) Why do we need Range Partitioning with local indexes? – Improved ATTACH and DETACH partition operations – More efficient access plans – More efficient REORGs. When does Range Partitioning with Local Indexes work best? – When frequent roll-in and roll-out of data is performed – When one tablespace is defined per range.
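An illustrative sketch (names and date ranges are hypothetical) of a range-partitioned table with a partitioned, i.e. local, index:
  CREATE TABLE sales_hist (
    sale_date DATE NOT NULL,
    amount    DECIMAL(12,2)
  )
  PARTITION BY RANGE (sale_date)
    (STARTING '2009-01-01' ENDING '2009-12-31' EVERY 1 MONTH);
  CREATE INDEX ix_sales_hist_amt ON sales_hist (amount) PARTITIONED;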
  50. 50. Range Partitioning with Local Indexes – Results in a Nutshell Partition maintenance with ATTACH: – 20x speedup compared to DB2 9.5 global index because of reduced index maintenance. – 3000x less log space used than with DB2 9.5 global indexes. Asynchronous index maintenance on DETACH is eliminated. Local indexes occupy fewer disk pages than 9.5 global indexes. – 25% space savings is typical. – 12% query speedup over global indexes for index queries – fewer page reads. [Charts: index size comparison, leaf page count (lower is better) — global index on RP table: 18,409 leaf pages vs. local index on RP table: 13,476, a 25% space savings; total time and log space required to ATTACH 1.2 million rows (lower is better) — log space of 651.84 MB with V9.5 global indexes vs. 0.05 MB with V9.7 local indexes built during ATTACH, 0.03 MB with V9.7 local indexes built before ATTACH, and 0.21 MB with no indexes (baseline), alongside the Attach/Set Integrity times in seconds]
  51. 51. Scan Sharing What is Scan Sharing? It is the ability of one scan to exploit the work done by another scan This feature targets heavy scans such as table scans or MDC block index scans of large tables. Scan Sharing is enabled by default on DB2 9.7 Why do we need Scan Sharing? Improved concurrency Faster query response times Increased throughput When does Scan Sharing work best? Scan Sharing works best on workloads that involve several clients running similar queries (simple or complex), which involve the same heavy scanning mechanism (table scans or MDC block index scans). 49
  52. 52. Scan Sharing How does Scan Sharing work? – When applying scan sharing, scans may start somewhere other than the usual beginning, to take advantage of pages that are already in the buffer pool from scans that are already running. – When a sharing scan reaches the end of file, it will start over at the beginning and finish when it reaches the point where it started. – Eligibility for scan sharing and for wrapping are determined automatically in the SQL compiler. – In DB2 9.7, scan sharing is supported for table scans and block index scans. [Diagram: with unshared scans, scans A and B each read pages 1–8 independently, re-reading pages and causing extra I/O; with a shared scan, B joins A part-way through, the pages are read once for both, and B wraps around to pick up the pages it missed]
  53. 53. Scan Sharing – Block Index and Table Scan Tests [Charts: block index scan test with Q1 (CPU intensive) and Q6 (I/O intensive) interleaved, queries started staggered every 10 sec, elapsed time with and without scan sharing (lower is better); table scan test — average time to run 100 instances of Q1: 1,284.6 sec with no scan sharing vs. 90.3 sec with scan sharing (lower is better)] • MDC Block Index Scan Sharing shows 47% average query improvement gain. • The fastest query shows up to 56% runtime gain with scan sharing. • 100 concurrent table scans now run 14 times faster with scan sharing!
  54. 54. Scan Sharing – Results in a Nutshell When running 16 concurrent streams of complex queries in parallel, a 67% increase in throughput is attained when using scan sharing. Scan sharing works fully on UR and CS isolation and, by design, has limited applicability on RR and RS isolation levels. [Chart: complex queries per hour throughput for a 10GB warehouse database, 16 parallel streams (higher is better) — scan sharing OFF: 381.92 vs. scan sharing ON: 636.43, a 67% throughput improvement]
  55. 55. XML Scalability on Infosphere Warehouse (a.k.a DPF) What does it mean? Tables containing XML column definitions can now be stored and distributed on any partition. XML data processing is optimized based on their partitions. Why do we need XML in database partitioned environments? As customers adopt the XML datatype in their warehouses, XML data needs to scale just as relational data XML data also achieves the same benefit from performance improvements attained from the parallelization in DPF environments. 53
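A minimal sketch (hypothetical table) of an XML column in a hash-distributed DPF table; note that the distribution key must be a non-XML column:
  CREATE TABLE orders_xml (
    order_id  INTEGER NOT NULL,
    order_doc XML
  )
  DISTRIBUTE BY HASH (order_id);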
  56. 56. XML Scalability on Infosphere Warehouse (a.k.a DPF) – Results in a Nutshell The charts show the elapsed time performance speedup of queries from a 4 partition setup to an 8 partition setup. Queries tested have a similar star-schema balance for relational and XML. Each query was run in 2 or 3 equivalent variants: – Completely relational (“rel”) – Completely XML (“xml”) – XML extraction/predicates with relational joins (“xmlrel”) (join queries only) XML queries/updates/deletes scale as well as relational ones. Average XML query speedup is 96% of relational. [Charts: elapsed time speedup from 4 to 8 partitions for simple queries (count with index, count without index, grouped aggregation, index update, collocated join, non-collocated join) and for ten complex queries, each in the rel / xml / xmlrel variants]
  57. 57. Statement Concentrator Why do we need the statement concentrator? This feature is aimed at OLTP workloads where simple statements are repeatedly generated with different literal values. In these workloads, the cost of recompiling the statements many times adds a significant overhead. Statement concentrator avoids this compilation overhead by allowing the compiled statement to be reused, regardless of the values of the literals. What is the statement concentrator? It is a technology that allows dynamic SQL statements that are identical, except for the value of its literals, to share the same access plan. The statement concentrator is disabled by default, and can be enabled either through the database configuration parameter (STMT_CONC) or from the prepare attribute 55
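Enabling it at the database level (the database name is a placeholder):
  db2 UPDATE DB CFG FOR mydb USING STMT_CONC LITERALS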
  58. 58. Statement Concentrator – Results in a Nutshell The statement concentrator allows prepare time to run up to 25x faster for a single user and 19x faster for 20 users. The statement concentrator improved throughput by 35% in a typical OLTP workload using 25 users. [Charts: effect of the statement concentrator on prepare times for 20,000 statements using 20 users — 436 sec with the concentrator off vs. 23 sec with it on, a 19x reduction in prepare time (lower is better); effect on an OLTP workload — throughput of 133 with the concentrator off vs. 180 with it on, a 35% throughput improvement (higher is better)]
  59. 59. Currently Committed What is Currently Committed? Currently Committed semantics have been introduced in DB2 9.7 to improve concurrency where readers are not blocked by writers to release row locks when using Cursor Stability (CS) isolation. The readers are given the last committed version of data, that is, the version prior to the start of a write operation. Currently Committed is controlled with the CUR_COMMIT database configuration parameter Why do we need the Currently Committed feature? Customers running high throughput database applications cannot tolerate waiting on locks during transaction processing and require non-blocking behavior for read transactions. 57
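Checking and enabling it through the database configuration (database name is a placeholder; CUR_COMMIT defaults to ON for databases created in DB2 9.7):
  db2 GET DB CFG FOR mydb | grep -i CUR_COMMIT
  db2 UPDATE DB CFG FOR mydb USING CUR_COMMIT ON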
  60. 60. Currently Committed – Results in a Nutshell By enabling currently committed, we use CPU that was previously idle (18%), leading to an increase of over 28% in throughput. With currently committed enabled, we see LOCK WAIT time reduced by nearly 20%. We observe expected increases in LSN GAP cleaners and increased logging. [Charts: throughput of an OLTP workload — 981.25 transactions per second with currently committed disabled vs. 1,260.89 with it enabled, allowing 28% more throughput (higher is better); CPU analysis (user/system/idle/iowait) with CC disabled vs. enabled, showing more effective CPU usage]
  61. 61. LOB Inlining Why do we need the LOB Inlining feature? Performance will increase for queries that access inlined LOB data as no additional I/O is required to fetch the LOB data. LOBS are prime candidates for compression given their size and the type of data they represent. By inlining LOBS, this data is then eligible for compression, allowing further space savings and I/O from this feature. What is LOB INLINING? LOB inlining allows customers to store LOB data within a formatted data row in a data page instead of creating separate LOB object. Once the LOB data is inlined into the base table row, LOB data is then eligible to be compressed. 59
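A sketch of inlining a LOB column so it becomes eligible for row compression (table, column and inline length are illustrative):
  CREATE TABLE documents (
    id    INTEGER NOT NULL PRIMARY KEY,
    notes CLOB(1M) INLINE LENGTH 1000
  ) COMPRESS YES;
  -- for an existing column, the inline length can be set (or increased) with:
  ALTER TABLE documents ALTER COLUMN notes SET INLINE LENGTH 1000;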
  62. 62. LOB Inlining – Results in a Nutshell INSERT and SELECT operations are the ones that benefit the most. The smaller the LOB, the bigger the benefit of inlining. For UPDATE operations, the larger the LOB, the better the improvement. We can expect that inlined LOBs will have the same performance as a VARCHAR(N+4). [Chart: % improvement of inlined vs. non-inlined LOBs for insert, select and update performance at 8K, 16K and 32K LOB sizes (higher is better)]
  63. 63. Summary of Key DB2 9.7 Performance Features Compression for indexes, temp tablespaces and XML data results in space savings and better performance Range Partitioning with local indexes results in space savings and better performance, including increased concurrency for certain operations like REORG and SET INTEGRITY. It also makes roll-in and roll-out of data more efficient. Scan Sharing improves workloads that have multiple heavy scans on the same table. XML Scalability allows customers to exploit the same benefits in data warehouses as exist for relational data Statement Concentrator improves the performance of queries that use literals, reducing their prepare times Currently Committed increases throughput and reduces contention on locks LOB Inlining allows this type of data to be eligible for compression 61
  64. 64. A glimpse at the Future Expect more leadership benchmark results on POWER7 and Nehalem EX Preparing for new workloads – Combined OLTP and Analytics Preparing for new operating environments – Virtualization – Cloud – Power-aware Preparing for new hardware – SSD storage – POWER7 – Nehalem EX 62
  65. 65. Conclusion DB2 is the performance benchmark leader New features in DB2 9.7 that further boost performance – For BOTH the OLTP and Data warehouse areas Performance is a critical and integral part of DB2! – Maintaining excellent performance • On current hardware • Over the course of DB2 maintenance – Preparing for future hardware/OS technology 63
  66. 66. Appendix – Mandatory SAP publication data Required SAP Information For more information regarding these results and SAP benchmarks, visit www.sap.com/benchmark. These benchmark fully complies with the SAP Benchmark Council regulations and has been audited and certified by SAP AG SAP 3-tier SD Benchmark: 168,300 SD benchmark users. SAP R/3 4.7. 3-tier with database server: IBM eServer p5 Model 595, 32-way SMP, POWER5 1.9 GHz, 32 KB(D) + 64 KB(I) L1 cache per processor, 1.92 MB L2 cache and 36 MB L3 cache per 2 processors. DB2 v8.2.2, AIX 5.3 (cert # 2005021) 100,000 SD benchmark users. SAP R/3 4.7. 3-tier with database server: HP Integrity Model SD64A, 64-way SMP, Intel Itanium 2 1.6 GHz, 32 KB L1 cache, 256 KB L2 cache, 9 MB L3 cache. Oracle 10g, HP-UX11i (cert # 2004068) 93,000 SD benchmark users. SAP R/3 4.7. 3-tier with database server: HP Integrity Superdome 64P Server, 64-way SMP, Intel Itanium 2 1.6 GHz, 32 KB L1 cache, 256 KB L2 cache, 9 MB L3 cache . SQL Server 2005, Windows 2003 (cert # 2005045) SAP 3-tier BW Benchmark: 311,004 throughput./hour query navigation steps.. SAP BW 3.5. Cluster of 32 servers, each with IBM x346 Model 884041U, 1 processor/ 1 core/ 2 threads, Intel XEON 3.6 GHz, L1 Execution Trace Cache, 2 MB L2 cache, 2 GB main memory. DB2 8.2.3 SLES 9. (cert # 2005043) SAP TRBK Benchmark: 15,519,000. Day processing no. of postings to bank accounts/hour. SAP Deposit Management 4.0. IBM System p570, 4 core, POWER6, 64GB RAM. DB2 9 on AIX 5.3. (cert # 2007050) 10,012,000 Day processing no. of postings to bank accounts/hour. SAP Account Management 3.0. Sun Fire E6900, 16 core, UltraSPARC1V, 56GB RAM, Oracle 10g on Solaris 10, (cert # 2006018) 8,279,000 Day processing no. of postings to bank accounts/hour/ SAP Account Management 3.0. HP rx8620, 16 core, HP mx2 DC,64 GB RAM, SQL Server on Windows Server (cert # 2005052) SD 2-tier SD Benchmark: 39,100 SD benchmark users, SAP ECC 6.0. Sun SPARC Enterprise Server M9000, 64 processors / 256 cores / 512 threads, SPARC64 VII, 2.52 GHz, 64 KB(D) + 64 KB(I) L1 cache per core, 6 MB L2 cache per processor, 1024 GB main memory, Oracle 10g on Solaris 10. (cert # 2008-042-1) 35,400 SD benchmark users, SAP ECC 6.0. IBM Power 595, 32 processors / 64 cores / 128 threads, POWER6 5.0 GHz, 128 KB L1 cache and 4 MB L2 cache per core, 32 MB L3 cache per processor, 512 GB main memory. DB2 9.5, AIX 6.1. (Cert# 2008019). 30,000 SD benchmark users. SAP ECC 6.0. HP Integrity SD64B , 64 processors/128 cores/256 threads, Dual-Core Intel Itanium 2 9050 1.6 GHz, 32 KB(I) + 32 KB(D) L1 cache, 2 MB(I) + 512 KB(D) L2 cache, 24 MB L3 cache, 512 GB main memory. Oracle 10g on HP-UX 11iV3. (cert # 2006089) 23,456 SD benchmark users. SAP ECC 5.0. Central server: IBM System p5 Model 595, 64-way SMP, POWER5+ 2.3GHz, 32 KB(D) + 64 KB(I) L1 cache per processor, 1.92 MB L2 cache and 36 MB L3 cache per 2 processors. DB2 9, AIX 5.3 (cert # 2006045) 20,000 SD benchmark users. SAP ECC 4.7. IBM eServer p5 Model 595, 64-way SMP, POWER5, 1.9 GHz, 32 KB(D) + 64 KB(I) L1 cache per processor, 1.92 MB L2 cache and 36 MB L3 cache per 2 processors, 512 GB main memory. (cert # 2004062) These benchmarks fully comply with SAP Benchmark Council's issued benchmark regulations and have been audited and certified by SAP. For more information, see http://www.sap.com/benchmark 64
