DB2 Real Estate – Buy, Invest, Sell, ... Reorg?!



  1. 1. DB2 Real Estate – Buy, Invest, Sell, … Reorg?! Bill Minor - IBM Toronto Lab bminor@ca.ibm.com TLU- 1243A Data Servers - DB2 for Linux, UNIX, Windows
  2. 2. Highlights The cost of disk storage represents a significant portion of the overall expense associated with large database systems. Once purchased, managing that storage can significantly add to the total cost of ownership. Effective management and utilization of disk space is instrumental in keeping your database Real Estate costs in check. The goals of this presentation are to: provide intimate details of the reorg utility; provide an overview of Data Management in DB2; highlight customer usage scenarios including best practices, monitoring, tuning, autonomics and troubleshooting; illustrate the role of reorganization in new Viper features such as Table Partitioning, Data Row Compression, and Large RIDs 2
  3. 3. Agenda ‘DB2 Real Estate’ Overview of Reorganization Table Compression Page and Extent Size Selection DMS Tablespace Architecture Registry Variables High Water Mark Large Record Identifiers (RIDs) Log Space Consumption 3
  4. 4. A Confession! I am not a Realtor, Financial Analyst, Investment Advisor, Stock Trader, Card Counter, Poker Tour Champ, … One does not have to be an expert to realize that investing in real estate is a significant proposition. By analogy, your DB2 ‘storage’ is a critical and valuable investment. Just as there are many facets, intricacies and strategies when dealing with Real Estate, so too with your management of DB2 Storage. 4
  5. 5. The Costs of ‘Information Real Estate’ Hardware, Software, Licensing, Support costs Poorly optimized and utilized database Database Administration/Management: People costs Infrastructure costs: floor space, power, cooling Frustration When it comes to storage, it is estimated that it costs (TCO) $5 for every $1 spent on physical storage. 5
  6. 6. DB2 ‘Real Estate’ What? Storage objects, DB2 tables. Table types: ‘regular’, Multidimensional Clustered (MDC), Range Partitioned (RPT), Range Clustered Tables (RCT), Database Partitioned. Also relevant – tablespace type and characteristics: SMS vs DMS; REGULAR vs. LARGE vs. TEMPORARY. There are many facets to managing DB2 Real Estate – I am going to focus on storage with an emphasis on table reorganization (reorg) 6
  7. 7. DB2 Reorganization Many changes to table data (INSERTs/UPDATEs/DELETEs) can affect the physical organization of table and index data to the point where performance is adversely affected Goals of REORG: Defragment or compact data onto fewer data pages Physically recluster data into the same logical sequence as an index Eliminate pointer-overflow records DB2 9 - build a (new) compression dictionary and to compress the rows in the table using the compression dictionary Conversion to Large Rids Schema changes The result: Access to a reorganized object can be done with minimal I/O and bufferpool misses as well as with maximum prefetcher effectiveness i.e. maintain or improve query performance 7
  8. 8. Access Modes of Table REORG 'Offline' ==> "Classic Reorg" (as pertains to Tables) ALLOW READ ACCESS (the default) ALLOW NO ACCESS (truly 'offline') 'Online' ==> "Inplace Reorg" (not to be confused with 'in-tablespace reorg' as pertains to classic table reorg) ALLOW WRITE ACCESS (the default) ALLOW READ ACCESS OFFLINE: Table available for read only access during reorg up to copy phase ONLINE: Table available for full S/I/U/D access during reorg 8
  9. 9. Table Reorganization Command (CLP Syntax) REORG {TABLE table-name Table-Clause} [On-DbPartitionNum-Clause] Table-Clause: [INDEX index-name] [[ALLOW {READ | NO} ACCESS] [USE tablespace-name] [INDEXSCAN] [LONGLOBDATA [USE long-tablespace-name]] [KEEPDICTIONARY | RESETDICTIONARY]] | [INPLACE [ [ALLOW {WRITE | READ} ACCESS] [NOTRUNCATE TABLE] [START | RESUME] | {STOP | PAUSE} ]] Examples: db2 reorg table staff index inx1_staff inplace allow write access db2 reorg table emp inplace pause on dbpartitionnum(10 to 100) db2 reorg table emp_resume longlobdata db2 reorg table department resetdictionary db2 reorg table payroll index pr1 use tempspace1 9
  10. 10. An Overview of Classic ('Offline') Table Reorg Processing Shadow copy approach Tablespace used to hold shadow copy is governed by user (USE clause) • For DMS tablespaces, implication to the ‘High Water Mark’ (more to come) TEMP tablespace is required and it varies (next slide) Phases: Dictionary Build, Sort, Build, Replace (or Copy), Index Rebuild Dictionary Build: there is an additional scan of the table data if INDEXSCAN specified Index build/rebuild is now parallelized in Viper II (no need to set INTRA_PARALLEL cfg) Processing Modes: Reclustering via table scan sort (default) or index scan (via INDEXSCAN clause) Space reclamation (compaction) via table scan LONG/LOB data is not reorged by default. When reorged, XML data is not "reorged"; only empty pages are removed 10
  11. 11. ‘Offline’ Table REORG - TEMP Space Usage Recall the phases of table reorg: Dictionary Build, Sort, Build, Replace (or Copy), Index Rebuild Three of these phases can consume TEMP tablespace Sort: table scan sort (default) processing if sort spills to disk Build: if the shadow copy is to be built in a temp (USE clause) Index Rebuild: if associated sort processing spills to disk If multiple temporary tablespaces exist: The table reorg ‘USE <tempspace>’ clause only guarantees that the specified tempspace is used for the table shadow copy • Index recreate and scan sort processing can use another available temp space (the choice is governed internally according to temp usage) 11
  12. 12. 'Offline' Table Reorg Reclustering: Table Scan Sort (default) Table is scanned and records are sorted in order to create the new reorganized version of the table (the reclustering index is not scanned) A reorg may be required because the clustering index isn't well clustered, so a table scan sort will give better I/O characteristics (may be slower for sparse tables where the index itself is somewhat small) Caveats: Table scan sort is disabled 'under-the-covers' if LONG/LOB data is to be reorganized or if the length of the sort record is too large (RID is included in sort record) Index recreate optimization: If the reclustering index is SMS type or unique DMS type, recreation of this index will not require a sort. Rather, this index is rebuilt by simply scanning the newly reorganized data table. Any other indexes that require recreation will involve a sort. If only the reclustering index (of the required type) exists, there are no temp space considerations in this case 12
  13. 13. 'Offline' Table REORG - Scan Sort Temp Storage db2 reorg table T1 index I1 db2 reorg table T1 index I1 use TEMPSPACE1 3x TEMP SORT 2x TEMP TDASPILL SHADOW T1 TDASPILL TDAMERGE T1 TDAMERGE SHADOW USERSPACE1 TEMPSPACE1 USERSPACE1 TEMPSPACE1 13
  14. 14. Inplace or ‘Online’ Table Reorganization Inplace Table Reorganization Rows moved within existing table object to re-establish clustering, reclaim free space, and eliminate overflows Executes as asynchronous background application (process name - db2reorg) Table must be at least 3 pages in size Cannot inplace reorg LONG/LOB data (use 'offline' reorg) Attributes: Minimal extra storage requirement Incremental: benefit of effects seen immediately No iterative log processing phase Table quiesce for object 'switch over' at end can be avoided Think of it as a Trickle Reorg 14
  15. 15. Online Table Reorganization Reclustering: vs. Space Reclamation: db2 reorg table t1 index i1 inplace db2 reorg table t1 inplace TIME VACATE PAGE RANGE: MOVE & CLEAN to make space Move rows from end of table, filling up holes at the start free space FILL PAGE RANGE: MOVE & CLEAN to fill space VACATE PAGE RANGE: MOVE & CLEAN to make space Backward scan starts at end, fills holes earlier Uses clustering index during FILL phases in table identified by simultaneous forward scan 15
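The space-reclamation behaviour described on this slide (a backward scan moving rows from the end of the table into holes found by a forward scan, followed by truncation) can be illustrated with a toy model. This is a minimal sketch only, assuming pages are plain Python lists with a fixed row capacity; the function name `compact_inplace` is hypothetical and nothing here reflects DB2's actual page format, locking, or logging.

```python
def compact_inplace(pages, capacity):
    """Toy model of inplace space reclamation (not DB2's implementation).

    pages    -- list of pages, each a list of row ids
    capacity -- assumed maximum number of rows per page
    """
    pages = [list(p) for p in pages]      # work on a copy
    front, back = 0, len(pages) - 1
    while front < back:
        if len(pages[front]) >= capacity:
            front += 1                    # forward scan: no hole on this page
        elif not pages[back]:
            back -= 1                     # backward scan: page already vacated
        else:
            # move one row from the end of the table into an earlier hole
            pages[front].append(pages[back].pop())
    while pages and not pages[-1]:
        pages.pop()                       # truncate empty pages at the end
    return pages


if __name__ == "__main__":
    table = [[1, 2], [3], [], [4, 5], [6]]
    print(compact_inplace(table, capacity=3))
```

Because rows are moved one at a time, the benefit accrues incrementally, which is the sense in which the slide calls this a trickle reorg.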
  16. 16. REORGCHK - Table Statistics db2 reorgchk on table bminor.staff3 Table statistics: F1: 100 * OVERFLOW / CARD < 5 F2: 100 * (Effective Space Utilization of Data Pages) > 70 F3: 100 * (Required Pages / Total Pages) > 80 SCHEMA NAME CARD OV NP FP ACTBLK TSIZE F1 F2 F3 REORG ---------------------------------------------------------------------------------------- Table: BMINOR.STAFF3 6144 0 153 153 - 276480 0 45 100 -*- ---------------------------------------------------------------------------------------- F1: 100 * OVERFLOW / CARD < 5 The total number of Overflow records in the table should be less than 5% F2: 100 * (Effective Space Utilization of Data Pages) > 70 There should be less than 30% free space in the table F3: 100 * (Required Pages / Total Pages) > 80 The number of pages that contains no rows at all should be less than 20% of the total number of pages in the table 16
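The three table formulas can be checked against the sample row on this slide. The sketch below is illustrative only: F1 and F3 follow the formulas as printed, while the F2 denominator, (FP - 1) * (page size - 76) with a 4 KB page, is an assumption chosen because it reproduces the value 45 shown in the sample output. The function name `reorgchk_table_flags` is hypothetical.

```python
def reorgchk_table_flags(card, overflow, npages, fpages, tsize,
                         page_size=4096, page_overhead=76):
    """Return (F1, F2, F3, flags) for one table.

    page_size and page_overhead are ASSUMPTIONS used for F2; they are not
    stated on the slide.
    """
    f1 = 100 * overflow // card if card else 0
    usable = (fpages - 1) * (page_size - page_overhead)
    f2 = 100 * tsize // usable if usable > 0 else 100
    f3 = 100 * npages // fpages if fpages else 100
    # '-' means the formula is satisfied, '*' means it suggests a reorg
    checks = (f1 < 5, f2 > 70, f3 > 80)
    flags = "".join("-" if ok else "*" for ok in checks)
    return f1, f2, f3, flags


if __name__ == "__main__":
    # Values taken from the BMINOR.STAFF3 row on the slide
    print(reorgchk_table_flags(card=6144, overflow=0, npages=153,
                               fpages=153, tsize=276480))
```

For the sample row only F2 fails, which matches the `-*-` in the REORG column: the table has no overflows and no empty pages, but its pages are less than half full.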
  17. 17. Classic vs Inplace Table Reorg. Classic Reorg: Approach: shadow copy technique; rebuilds the table in different storage; indexes are rebuilt and the table is truncated to its new size. Storage: if a TEMP tablespace is specified, the table is rebuilt there, then copied back; else, the table is rebuilt directly in the original tablespace. Phases: 0) Dictionary built (if necessary) 1) Sort 2) Build 3) Replace/Copy 4) Index Rebuild. Key Options: clustering index vs no clustering index; table scan/sort vs index scan; can select no access. Other Objects: indexes always rebuilt; LOB/LF/XML optionally reorganized. Availability: by default, the table is available for read until phase 3). Inplace Reorg: Approach: trickle row movement technique; moves rows within the existing table to re-establish clustering and/or pack rows so as to reclaim space. Storage: table reorganized within original table storage. Phases: 1) Move rows 2) Truncate (optional). Key Options: no truncation. Other Objects: neither indexes nor LOB/LF/XML reorganized. Availability: by default, the table is available for R/W access; if/when truncate is done, the table is available for read access. 17
  18. 18. Locks Acquired for Table Reorg. Classic Table Reorg: Catalog Locks (SYSCAT.TABLES): IS Table Lock; NS Row Lock. Table Being Reorganized: IX Tablespace Lock; U Table Lock; upgrade to Z Table Lock for Copy Phase. Inplace Table Reorg: Catalog Locks (SYSCAT.TABLES): IS Table Lock; NS Row Lock. Table Being Reorganized: IX Tablespace Lock; IS Table Lock; X Alter Table Lock; S Row Lock on rows moved/cleaned; upgrade to S Table Lock to prepare for Truncation; special Z Table Lock for drain/wait on Truncate 18
  19. 19. Table Reorganization Support Matrix. Classic Reorg: DPF: fully supported (can be invoked on all or specified DB partitions); Table Partitioning: supported (invoked on all table partitions); MDC: fully supported. Inplace Reorg: DPF: fully supported (can be invoked on all or specified DB partitions); Table Partitioning: not supported; MDC: not supported (not needed for reclustering) 19
  20. 20. Monitoring Table REORGs Table Snapshot: db2 get snapshot for tables on SAMPLE db2pd tool: db2pd -db SAMPLE -reorgs file=reorg_pd.out (db2pd -db SAMPLE -tcbstats) (db2pd -db SAMPLE -mempools) LIST HISTORY: db2 list history reorg all for SAMPLE View and Table Functions: db2 select * from sysibmadm.snaptab_reorg db2 select * from table(sysproc.snap_get_tab_reorg('SAMPLE', dbpartitionnum)) as tb db2 select * from table(sysproc.admin_list_hist( )) as listhistory db2 select * from table(sysproc.admin_get_tab_info('<schema>', '<tabname>')) as t db2 select * from table(sysproc.admin_get_tab_compress_info('<schema>', '<tabname>', '<exec-mode>')) as t Administrator Notification Log: $HOME/sqllib/db2dump/<instance_name>.nfy 20
  21. 21. REORG Table - History File (Example) db2 list history reorg all for SAMPLE Operation= Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID REORG -- --- ------------------ ---- --- ------------ ------------ -------------- Log file being G T 20070313103600 N S0000022.LOG written to when ---------------------------------------------------------------------------- REORG started Table: "BMINOR "."STAFF2" ---------------------------------------------------------------------------- Comment: REORG START Start Time: 20070313103600 'Online' ("N"=Online, "F"=Offline) End Time: 20070313103600 REORG Table REORG ---------------------------------------------------------------------------- ("T"=Table,"I"=Index) Op Obj Timestamp+Sequence Type Dev Earliest Log Current Log Backup ID -- --- ------------------ ---- --- ------------ ------------ -------------- REORG G T 20070313103729 N S0000023.LOG Status ---------------------------------------------------------------------------- Table: "BMINOR "."STAFF2" NOTE: "Comment" ---------------------------------------------------------------------------- field reports REORG Comment: REORG Done Status only for Start Time: 20070313103729 'online' case. For End Time: 20070313103729 Log file being 'offline' it specifies written to when reclustering index ---------------------------------------------------------------------------- REORG completed and temp space ids. 21
  22. 22. REORG Monitoring - Table Snapshot db2 reorg table staff3 index i3 Table Schema = BMINOR (Temp table for spilling sort) Table Name = STAFF3 Table Schema = <140><BMINOR > Table Type = User Table Name = TEMP (00007,00002) Data Object Pages = 1184 Table Type = Temporary Index Object Pages = 190 Data Object Pages = 820 Rows Read = Not Collected Rows Read = Not Collected Rows Written = 20736 Rows Written = 72178 Overflows = 0 Overflows = 0 Page Reorgs = 0 Page Reorgs = 0 Table Reorg Information: Reorg Type = 'Offline' reorg (read access up until Replace phase) Reclustering Table Reorg Reclustering reorg via table scan sort Allow Read Access Recluster Via Table Scan Reorg Data Only ID of index being used to recluster by Reorg Index = 1 Reorg Tablespace = 2 Start Time = 02/26/2007 13:48:48.908388 Reorg Phase = 1 - Sort Sort phase (phases only applicable to offline reorg) Max Phase = 5 Total number of phases to occur: Phase Start Time = 02/26/2007 13:48:48.923862 Status = Started Dictionary Build,Sort, Build, Replace, Current Counter = 986 Index Recreate Max Counter = 1183 Completion = 0 Progress indicator - currently 83% complete (986/1183x100) End Time = 22
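The progress figure quoted on this slide comes from the two counters in the snapshot. A minimal sketch of that arithmetic follows; the function name `reorg_progress` is hypothetical, and integer truncation is an assumption that happens to match the 83% quoted.

```python
def reorg_progress(current_counter, max_counter):
    """Percentage of the current reorg phase completed, truncated to an int."""
    if max_counter <= 0:
        return 0
    return 100 * current_counter // max_counter


if __name__ == "__main__":
    # Current Counter = 986, Max Counter = 1183 in the snapshot above
    print(reorg_progress(986, 1183))
```

The counters apply to the phase currently reported in `Reorg Phase`, so the percentage describes progress through that phase rather than through the whole reorg.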
  23. 23. ADMIN_LIST_HIST( ) Table Function - DPF Example Database Connection Information Database server = DB2/AIX64 9.1.2 SQL authorization ID = BMINOR Local database alias = SAMPLE DBPARTITIONNUM OBJECTTYPE SQLCODE START_TIME -------------- ---------- ----------- -------------- 0 T - 20070303152449 1 record(s) selected. myhost: db2 connect to sample completed ok db2_all "db2 connect to sample; db2 select Database Connection Information dbpartitionnum,objecttype,sqlcode,start_time from table'(('sysproc.admin_list_hist'())' Database server = DB2/AIX64 9.1.2 as listhistory where operation='G'" SQL authorization ID = BMINOR Local database alias = SAMPLE DBPARTITIONNUM OBJECTTYPE SQLCODE START_TIME -------------- ---------- ----------- -------------- 100 T -964 20070303152118 1 record(s) selected. OLR encounter sql0964 - 'log full', on myhost: db2 connect to sample completed ok dbpartitionnum 100 23
  24. 24. 'Offline' or 'Online' Table REORG? 'Offline' Table REORG: PROS: Provides the fastest table reorganization, especially if LOBs/LONGs are not required to be reorged (if they are, classic reorg is the only method that supports reorging LONG/LOBs) Indexes are rebuilt once the table has been reorganized Original version of the table is available read-only up until the last phase of reorg (replace phase) The only way to build a new compression dictionary, and/or to compress all rows in the table using the existing or newly created compression dictionary CONS: Large space requirement: shadow copy approach, so need approximately twice as much space as the original table Limited access: read-only until Replace/Copy phase All-or-nothing process Can only be stopped by an app or user who understands how to stop the process Recommendation: Choose this method if you can reorganize tables during a maintenance window 24
  25. 25. 'Offline' or 'Online' Table REORG? 'Online' Table REORG: PROS: Allows apps to access the table while executing Can be paused and resumed Runs asynchronously Requires less working storage since table is incrementally processed CONS: Slower than Classic method (~10-20x) Only allowed for tables with type-2 indexes Cannot reorganize LONG/LOBs Indexes are maintained, not rebuilt, so index reorganization may subsequently be required Requires more log space Recommendation: Choose this method for 24x7 operations with minimal maintenance windows 25
  26. 26. Additional REORG Notes Different tables can be reorged simultaneously as long as there are no resource constraints or limitations Restriction: for offline table reorg, DMS temp spaces cannot be shared by simultaneous reorgs If the table contains mixed row formats because table value compression has been activated or deactivated, an offline table reorganization can convert all the existing rows into the target row format If the table is partitioned onto several database partitions, and the table reorganization fails on any of the affected database partitions, only the failing database partitions will have the table reorganization rolled back The granularity of table reorg is at the Database Partition level, not the Table Range Partition level Table Ranges are reorganized sequentially, one after the other, and global indexes are rebuilt once all ranges have been reorganized 26
  27. 27. Reducing the Need to Reorganize Tables ALTER TABLE to add PCTFREE space to each data page (considered only by load and table reorg; range is from 0 to 99% with a default value of 0) Sort the data Load the data Creating multi-dimensional clustering (MDC) tables (For MDC tables, clustering is maintained on the columns that you specify as arguments to the ORGANIZE BY DIMENSIONS clause of the CREATE TABLE statement. However, REORGCHK might recommend reorganization of an MDC table if it considers that there are too many unused blocks or that blocks should be compacted) APPEND mode tables: If the index key values of new rows are always new high key values, for example, then the clustering attribute of the table will try to place them at the end of the table. Having free space in other pages will do little to preserve clustering. Hence, placing the table in append mode may be a better choice than a clustering index Automatic Dictionary Creation on Table Growth (TLU-1242A): Dictionary created as the table is populated and reaches a certain threshold in size (Viper II) 27
  28. 28. Online Index Reorg : Overview REORG {TABLE table-name Table-Clause | INDEXES ALL FOR TABLE table-name Index-Clause} [On-DbPartitionNum-Clause] Index-Clause: [ALLOW {READ | NO | WRITE} ACCESS] [{CLEANUP ONLY [ALL | PAGES] | CONVERT}] Goals: To improve physical clustering Remove fragmentation The table and original index are available for concurrent transactions There are 4 phases involved in OLIR: Build Phase: All indexes on the table are rebuilt in a new index storage object – a “shadow object” (as opposed to the “ghost index”) Log Catch up Phase: The catch up is done for all indexes on the table Object Switch Phase: Super exclusive table lock acquired “Shadow object” becomes THE index object Cleanup Phase: Old index object removed 28
  29. 29. Online Index Reorg : Table Partitioning and MDC Notes and Limitations MDC: REORG with ALLOW WRITE not supported • Note: ALLOW READ is supported Table Partitioning: Supports the ability to reorg individual indexes (as opposed to ALL indexes of a table) • Supported in all availability modes (ALLOW NONE, ALLOW READ, ALLOW WRITE) • Natural thing to do, since with table partitioning, each index for the table is in its own storage object (and OLIR operates on a storage object basis) Also supports REORG INDEXES ALL in ALLOW NONE 29
  30. 30. OLIR Hints & Tips Enlarge the util_heap_sz if you see ADM9500W in the Administration Notification Log (it will also appear in the db2diag.log) Informational log records are buffered in the utility heap If the utility heap is exhausted performance will suffer as the catch up phase will involve reading log files, and, possibly, retrieving them from archive Ensure the tablespace is large enough for the shadow/ghost object/index Remember for Reorg, the shadow object will contain all indexes, so will require (very approximately) the same amount of space as the current index object on the table For Create, the ghost index will simply require the space for the newly created index Use LARGE tablespaces Ensure you commit as soon as possible after index creations Minimizes time table S lock held 30
  31. 31. Locking Associated with Online Index Reorg Reorg Mode Catalog Locks Table for Indexes Being Reorganized (SYSCAT.TABLES) Online Index Reorg - IS Table Lock -“ALLOW NO ACCESS”: Z lock on table - NS Row Lock -“ALLOW READ ACCESS”: S lock on table -“ALLOW WRITE ACCESS”:IN lock on table -S drain lock for each index (all writers must be aware) -S lock at end to perform final catch-up -Quiesce concurrent writers: Z lock to perform index switch 31
  32. 32. Reorgchk Index Statistics Index statistics: F4: CLUSTERRATIO or normalized CLUSTERFACTOR > 80 The clustering ratio of an index should be greater than 80% (Low cluster ratio means index sequence not the same as table sequence) F5: 100 * (Space used on leaf pages / Space available on non-empty leaf pages) > MIN(50, (100 - PCTFREE)) Less than 50% of the space reserved for index entries should be empty F6: (100 - PCTFREE) * (Amount of space available in an index with one less level / Amount of space required for all keys) < 100 Determine if recreating the index would result in a tree having fewer levels F7: 100 * (Number of pseudo-deleted RIDs / Total number of RIDs) < 20 The number of pseudo-deleted RIDs on non-pseudo-empty pages should be less than 20 percent F8: 100 * (Number of pseudo-empty leaf pages / Total number of leaf pages) < 20 The number of pseudo-empty leaf pages should be less than 20 percent of the total number of leaf pages 32
  33. 33. Monitoring Online Index Reorg - Administrator Log 2007-03-12- Instance:bminor Node:000 PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618 relation data serv sqlrreorg_indexes Probe:15 Database:SAMPLE ADM9501W BEGIN online index reorganization on table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2"). ^^ 2007-03-12- Instance:bminor Node:000 PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618 relation data serv sqlrreorg_indexes Probe:18 Database:SAMPLE <instanceName>.nfy ADM9503W Online index reorganization proceeds on index ID "1" in table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2"). NOTIFYLEVEL=3 (default) ^^ 2007-03-12- Instance:bminor Node:000 PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618 relation data serv sqlrreorg_indexes Probe:18 Database:SAMPLE ADM9503W Online index reorganization proceeds on index ID "2" in table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2"). ^^ 2007-03-12- Instance:bminor Node:000 PID:40710(db2agent (SAMPLE)) TID:1 Appid:*LOCAL.bminor.070313032618 relation data serv sqlrreorg_indexes Probe:31 Database:SAMPLE ADM9502W END online index reorganization on table "BMINOR .STAFF" (ID "3") and table space "USERSPACE1" (ID "2"). 33
  34. 34. DB2 9 Deep Compression Estimate Compression: INSPECT ROWCOMPESTIMATE Enable Compression: CREATE TABLE …. [COMPRESS {NO | YES}] ALTER TABLE … [COMPRESS {NO | YES}] Compress a table: REORG TABLE <tabname> … [KEEPDICTIONARY | RESETDICTIONARY] (Session TLU-1242: Deep Compression) 34
  35. 35. Table REORG – Deep Compression EMPTY TABLE Uncompressed Row Data Compressed Row Data Dictionary Table INSERT REORG COMPRESS YES LOAD INDEX 35
  38. 38. Page Size Selection Default DB page size is 4K, can override on CREATE DATABASE CREATE DATABASE SAMPLE PAGESIZE 16 k Must always have a system temporary table space with a page size that matches the catalog table space (SYSCATSPACE) page size All CREATE BUFFERPOOL and CREATE TABLESPACE statements will default to the database page size unless explicitly specified Larger page sizes allow Larger capacity limits for objects (REGULAR or LARGE table spaces) Longer rows in tables, larger keys in indexes (25% of page size) Fewer logical and/or physical page reads (more things on each page) Smaller page sizes allow Possibly less page contention (fewer rows/keys on each page) Possibly better I/O behavior for pure OLTP environments page size * extent size == space per block for MDC tables Very, very important to prevent sparse blocks/cells 38
  39. 39. Extent Size Selection Unit of disk allocation to table storage objects Allocate an extent of “extent size” pages on init and extend of objects Round robin approach across all containers SMS allocates page by page until size exceeds 1 extent DMS table spaces have an EMP (Extent Map) storage object for each table storage object – 2 extents minimum per object (data, index) Larger extents allow for: Less frequent allocations during growth Less frequent EMP mapping during table scan Best for large tables Smaller extents allow for: Most optimal storage, less waste due to partial extents being used Less storage for empty or very small tables extent size == block size for MDC tables: Very, very important to prevent sparse blocks/cells 39
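The sizing relationships on the page size and extent size slides reduce to simple multiplication. The sketch below works through them; the function names are hypothetical, and the minimum of two extents per DMS object is taken from the slide above.

```python
def extent_bytes(page_size_kb, extent_size_pages):
    """Bytes in one extent; for an MDC table this is also one block."""
    return page_size_kb * 1024 * extent_size_pages


def min_dms_object_bytes(page_size_kb, extent_size_pages, min_extents=2):
    """Smallest footprint of a DMS table storage object (data or index),
    using the slide's figure of 2 extents minimum per object."""
    return min_extents * extent_bytes(page_size_kb, extent_size_pages)


if __name__ == "__main__":
    for page_kb, ext in [(4, 4), (4, 32), (16, 32)]:
        print(page_kb, ext,
              extent_bytes(page_kb, ext),
              min_dms_object_bytes(page_kb, ext))
```

With a 16 KB page and a 32-page extent, every MDC block occupies 512 KB, so a cell holding only a handful of rows still consumes that much; this is why the slides stress choosing both sizes to avoid sparse blocks.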
  40. 40. DMS Tablespace Architecture – ‘The Extent of Extents’ Statements: create tablespace dms1 managed by database using (file ‘c.1’ 50000) extent size 4; create table t1 …; insert into t1; create table t2; load into t1; load into t2. [Diagram: resulting extent layout. Extent 0: Tablespace Header; extent 1: First Extent of SMPs; extent 2: Object Table Extent; extent 3: Extent Map (EMP) for T1; extent 4: First Extent of Data Pages for T1; later extents hold the EMP for T2 and data (DAT) extents of T1 and T2] 40
  41. 41. DB2_OBJECT_TABLE_ENTRIES Registry Variable db2set DB2_OBJECT_TABLE_ENTRIES=nnnnn Specifies the expected number of objects in a table space. If you know that a large number of objects (for example, 1000 or more) will be created in a DMS table space, you should set this registry variable to the approximate number before creating the table space. This will reserve contiguous storage for object metadata during table space creation. Reserving contiguous storage reduces the chance that an online backup will block operations which update entries in the metadata (for example, CREATE INDEX, IMPORT REPLACE). It will also make resizing the table space easier because the metadata will be stored at the start of the table space. Tablespace Header Tablespace Header xx xx First Extent of SMPs First Extent of SMPs xx Object Table Extent Object Table Extent Object Table Extent Object Table Extent 41
  42. 42. DB2_TRUNCATE_REUSESTORAGE Registry Variable db2set DB2_TRUNCATE_REUSESTORAGE=IMPORT You can use this variable to resolve lock contention between the IMPORT with REPLACE command and the BACKUP ... ONLINE command. In some situations, online backup and truncate operations are unable to execute concurrently. When this occurs, you can set DB2_TRUNCATE_REUSESTORAGE to "IMPORT" or "import", and physical truncation of the object, including data, indexes, long fields, large objects and block maps (for multi-dimensional clustering tables), is skipped and only logical truncation is performed. That is, the IMPORT with REPLACE command empties the table, causing the object's logical size to decrease, but the storage on disk remains allocated. This registry variable is dynamic; you can set it or unset it without having to stop and start instance. You can set DB2_TRUNCATE_REUSESTORAGE before an online backup starts and then unset it after online backup completes. For multi-partitioned environments, the registry variable will only be active on the nodes on which the variable is set. DB2_TRUNCATE_REUSESTORAGE is only effective on DMS permanent objects. admin_get_tab_info( ) DATA_OBJECT_P_SIZE vs DATA_OBJECT_L_SIZE 42
  43. 43. The "High Water Mark" (HWM) It is the page number of the highest allocated page in a DMS tablespace. HWM is impacted by: 'Offline' REORG of a table within the DMS tablespace that the table resides in; Index REORG with either ALLOW READ ACCESS or ALLOW WRITE ACCESS. HWM affects: Redirected Restore - redefinition of containers allowing the tablespace to shrink in size; it cannot be shrunk lower than the HWM. Dropping or reducing the size of a container via ALTER TABLESPACE only affects extents above the HWM. [Diagram: db2 reorg table T1 in a DMS permanent tablespace, showing T1, the T1 shadow copy, the reorganized T1' and the HWM] 43
  44. 44. The "High Water Mark" (HWM) If there are no free extents below the HWM, then the only way to reduce the HWM is to drop the object holding it up. db2dart /DHWM: displays detailed tablespace information including which extents are free, which are in use and what object is using them, as well as information about the object holding up the HWM. db2dart /LHWM: provides guidance as to how the HWM might potentially be lowered. If a DMS table data object is holding up the HWM, then 'offline' REORG of the table within the DMS tablespace that the table resides in can be used to lower the HWM if enough free extents exist below the HWM to contain the shadow copy. If a DMS index object is holding up the HWM, index reorg may be able to reduce the HWM. db2dart /RHWM: if an empty SMP extent is holding up the HWM. http://www-1.ibm.com/support/docview.wss?rs=71&context=SSEPGG&q1=high+water+mark&uid=swg21234267&loc=en_US&cs=utf-8&lang=en Viper II: ALTER TABLESPACE REDUCE and Online Backup will remove these 44
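The reasoning about free extents below the HWM can be sketched with a toy extent map. This is an illustrative model only: the extent map is a plain list where each entry names the owning object or is `None` when free, the function names are hypothetical, and the shadow copy is assumed to need as many extents as the table currently owns. To inspect a real tablespace, use the db2dart options named on the slide.

```python
def hwm_report(extent_map):
    """Return (hwm_extent, holder, free_extents_below_hwm) for a toy map."""
    used = [i for i, owner in enumerate(extent_map) if owner is not None]
    if not used:
        return None, None, []
    hwm = used[-1]                         # highest allocated extent
    free_below = [i for i in range(hwm) if extent_map[i] is None]
    return hwm, extent_map[hwm], free_below


def offline_reorg_may_lower_hwm(extent_map, table):
    """True if `table` holds up the HWM and the free extents below the HWM
    could contain a shadow copy of it (ASSUMED to be the table's current
    size in extents)."""
    hwm, holder, free_below = hwm_report(extent_map)
    if holder != table:
        return False
    owned = sum(1 for owner in extent_map if owner == table)
    return len(free_below) >= owned


if __name__ == "__main__":
    emap = ["T1", None, None, None, "T2", "T1", "T1"]
    print(hwm_report(emap))
    print(offline_reorg_may_lower_hwm(emap, "T1"))
```

In the example, T1 holds the highest extent and three free extents sit below it, so a reorg whose shadow copy fits in those extents would let the HWM drop; if the free extents were missing, the reorg would instead allocate above the HWM and raise it.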
  45. 45. Large RID – the new default in DB2 9 RID – Row Identifier A reference to the location of a row in a table Contains the page number and the slot number (location on page) Before DB2 9 RID is 4 bytes, 3 byte page number and 1 byte slot number Default table space data type was REGULAR Tables (data part) could not be placed in LARGE table spaces DB2 9 New 6 byte RID, 4 byte page number and 2 byte slot number Infrastructure - runtime, sections, sort, log records, locks – all large RID Default table space data type for DMS table spaces is now LARGE Tables can now be placed in LARGE table spaces Indexes contain regular or large RIDs only, based on the table space type where the table data is stored; it has nothing to do with the type of table space where the index resides 45
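The capacity implied by the RID layout follows directly from the field widths. The sketch below works through the arithmetic for the pre-DB2 9 format (3-byte page number, 1-byte slot number); the function names are hypothetical. For the 6-byte RID the raw field width is only an upper bound, since the product limits quoted for LARGE table spaces are lower than a full 4-byte page number would allow.

```python
def addressable_pages(page_number_bytes):
    """Number of distinct page numbers a field of this width can hold."""
    return 2 ** (8 * page_number_bytes)


def slot_numbers(slot_bytes):
    """Number of distinct slot numbers a field of this width can hold."""
    return 2 ** (8 * slot_bytes)


def regular_tablespace_limit_gb(page_size_kb):
    """Size addressable with a 3-byte page number, in GB (2**30 bytes)."""
    return addressable_pages(3) * page_size_kb // (1024 * 1024)


if __name__ == "__main__":
    print(addressable_pages(3), slot_numbers(1))
    for page_kb in (4, 8, 16):
        print(page_kb, "KB pages:", regular_tablespace_limit_gb(page_kb), "GB")
```

A 1-byte slot number yields 256 slot values, which is why tables in REGULAR table spaces top out at roughly 255 rows per page regardless of how short the rows are.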
  46. 46. Large RIDs – More pages, More rows, Bigger Tables (new to DB2 9) More Pages: maximum tablespace size by page size (default), listed as page size: 4 Byte RID / 6 Byte RID (‘Large RIDs’). 4 KB: 64 GB / 2 TB; 8 KB: 128 GB / 4 TB; 16 KB: 256 GB / 8 TB; 32 KB: 512 GB / 16 TB. The 6 Byte RID limits are for tables in LARGE table spaces (DMS only), and also all SYSTEM and USER temporary table spaces. (Slide annotation: for tables in all tablespace types: regular, temporary, DMS, SMS.) More Rows per Page: maximum rows per page by page size, listed as page size: REGULAR tablespace min record length / max records; LARGE tablespace min record length / max records. 4 KB: 14 / 251; 12 / 287. 8 KB: 30 / 253; 12 / 580. 16 KB: 62 / 254; 12 / 1165. 32 KB: 127 / 253; 12 / 2335. Maximum number of rows: Large RIDs - 1.1 x 10^12; 4 byte RIDs - 4 x 10^9 46
  47. 47. Converting Existing Tablespaces To LARGE ALTER TABLESPACE <name> CONVERT TO LARGE New option must be the only option, cannot be combined with other alter capabilities Fully logged and supports ROLLBACK and RESTORE/ROLLFORWARD If table space is defined with AUTORESIZE YES If MAXSIZE is NONE, then growth of the table space is automatic! Else MAXSIZE is restricting table space growth and should be increased Otherwise, storage has to be increased to benefit from a larger capacity Enable AUTORESIZE or Add a new stripe set or Extend existing containers New tables created will fully support large RIDs, both page and slot numbers Previously existing tables continue to be restricted to ~255 rows/page and to 3 byte page numbers until a reorganization of the table or indexes occur SQL1236N Table "<table-name>" cannot allocate a new page because the index with identifier "<index-id>" does not yet support large RIDs BEST PRACTICE: Perform the ALTER TABLESPACE during upgrade/migration Be pro-active in rebuilding indexes on tables (or reorganizing tables) afterwards 47
  48. 48. Large RIDs - What Actions Need To Be Taken?
      • The table will not support a larger 4 byte page number until all indexes on the table support large RIDs

            SELECT TABNAME, TABSCHEMA, DBPARTITIONNUM
            FROM TABLE (ADMIN_GET_TAB_INFO( '', '' )) AS T
            WHERE LARGE_RIDS = 'P'

      • The table will not support >255 rows (slots) per page until the table itself has been reorganized with the classic/offline REORG TABLE

            SELECT TABNAME, TABSCHEMA, DBPARTITIONNUM
            FROM TABLE (ADMIN_GET_TAB_INFO( '', '' )) AS T
            WHERE LARGE_SLOTS = 'P'

      • Can my table benefit from large slots (more rows per page)?

            SELECT TABSCHEMA, TABNAME, AVGROWSIZE
            FROM SYSCAT.TABLES

        If the (average row size - 2) for a table is smaller than the minimum record length for the page size used, then there could be storage benefits from converting the table space to LARGE and reorganizing the table to enable large slots
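The large-slot rule of thumb can be applied to the `AVGROWSIZE` values returned by the last query. The sketch below hard-codes the REGULAR table space minimum record lengths from the "More Rows per Page" table on slide 46; the function name `may_benefit_from_large_slots` is hypothetical, and the result is only a hint that a closer look is worthwhile, not a guarantee of savings.

```python
# Minimum record length (bytes) in a REGULAR table space, keyed by page size,
# copied from the "More Rows per Page" table on slide 46.
MIN_REC_LEN_REGULAR = {4096: 14, 8192: 30, 16384: 62, 32768: 127}


def may_benefit_from_large_slots(avg_row_size: int, page_size: int) -> bool:
    """Apply the slide's rule: (average row size - 2) < minimum record length."""
    return (avg_row_size - 2) < MIN_REC_LEN_REGULAR[page_size]


if __name__ == "__main__":
    # Hypothetical (schema, table, AVGROWSIZE, page size) rows for illustration.
    sample = [
        ("APP", "AUDIT_FLAGS", 12, 8192),
        ("APP", "ORDERS", 180, 8192),
    ]
    for schema, table, avg, page in sample:
        verdict = "candidate" if may_benefit_from_large_slots(avg, page) else "no gain expected"
        print(f"{schema}.{table}: {verdict}")
```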
  49. 49. Log Consumption – INSERT and DELETE
      • Row images are logged so that DB2 can redo or undo actions
          • Real log space from the active log is written and consumed
          • Virtual log space from the active log is reserved for rollback
      • INSERT
          • Row image being inserted is logged (required for redo!)
          • Reserve log space for “delete” on undo
              • Space for row image is not required in reserved space
      • DELETE
          • Row image being deleted is logged (required for undo!)
          • Reserve log space for “insert” on undo
              • Space for row image is required in reserved space
      • When row compression is active, the row images are compressed, resulting in fewer bytes logged and reserved, and less log file usage
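The asymmetry between INSERT and DELETE can be sketched as a toy cost model. Everything numeric here is hypothetical: `LOG_RECORD_OVERHEAD` is a made-up placeholder for the fixed per-record bytes, not a DB2 figure, and the functions only encode the slide's rule about when the row image counts toward written or reserved space.

```python
from dataclasses import dataclass

# Hypothetical fixed bytes per log record; NOT a real DB2 value.
LOG_RECORD_OVERHEAD = 40


@dataclass
class LogCost:
    written: int   # bytes actually written to the active log
    reserved: int  # bytes reserved in the active log for a possible rollback


def insert_cost(row_len: int, overhead: int = LOG_RECORD_OVERHEAD) -> LogCost:
    # Redo needs the row image; the undo action is a delete, which does not.
    return LogCost(written=overhead + row_len, reserved=overhead)


def delete_cost(row_len: int, overhead: int = LOG_RECORD_OVERHEAD) -> LogCost:
    # Undo is an insert, so the row image is needed in the reserved space too.
    return LogCost(written=overhead + row_len, reserved=overhead + row_len)
```

Passing the compressed row length as `row_len` models the effect of row compression in this sketch: both the written and the reserved figures shrink with the row image.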
  50. 50. Log Consumption - UPDATE
      There are three different types of UPDATE log records written by DB2:
      1. Full before and after row image logging. The entire before and after image of the row is logged. This is the only type of logging performed on tables enabled with DATA CAPTURE CHANGES.
      2. Full XOR logging. The XOR differences between the before and after row images, from the first byte that is changing until the end of the smaller row, then any residual bytes in the longer row.
      3. Partial XOR logging. The XOR differences between the before and after row images, from the first byte that is changing until the last byte that is changing. Byte positions may be first/last bytes of a column. Row images must be the exact same length.
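The two XOR variants can be illustrated on plain byte strings. This is a sketch of the rules as stated on the slide, assuming rows are simple byte sequences; the function names are hypothetical and DB2's actual log record layout is not modelled. Because XOR is its own inverse, applying a partial XOR difference to either row image yields the other one.

```python
from typing import Tuple


def _first_diff(before: bytes, after: bytes) -> int:
    """Index of the first differing byte, or the shorter length if none differ."""
    for i, (b, a) in enumerate(zip(before, after)):
        if b != a:
            return i
    return min(len(before), len(after))


def full_xor(before: bytes, after: bytes) -> Tuple[int, bytes, bytes]:
    """Return (offset, xor_bytes, residual) for rows whose length changes."""
    start = _first_diff(before, after)
    shorter = min(len(before), len(after))
    # XOR from the first changed byte to the end of the shorter row.
    xored = bytes(b ^ a for b, a in zip(before[start:shorter], after[start:shorter]))
    # Residual: the bytes of the longer row beyond the shorter row's length.
    longer = before if len(before) > len(after) else after
    return start, xored, longer[shorter:]


def partial_xor(before: bytes, after: bytes) -> Tuple[int, bytes]:
    """Return (offset, xor_bytes) covering first through last changed byte."""
    if len(before) != len(after):
        raise ValueError("partial XOR requires rows of identical length")
    start = _first_diff(before, after)
    end = start
    for i in range(start, len(before)):
        if before[i] != after[i]:
            end = i + 1
    return start, bytes(b ^ a for b, a in zip(before[start:end], after[start:end]))


def apply_partial_xor(row: bytes, offset: int, xored: bytes) -> bytes:
    """Apply a partial XOR difference to one row image to obtain the other."""
    stop = offset + len(xored)
    patched = bytes(r ^ x for r, x in zip(row[offset:stop], xored))
    return row[:offset] + patched + row[stop:]
```

For the rows `b"Fred 500 10000 Plano TX 24355"` and `b"John 500 10000 Plano TX 24355"`, `partial_xor` covers only the first four bytes, whereas changing `Fred` to `Frank` alters the row length and `full_xor` has to cover everything from the third byte onward.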
  51. 51. Log Consumption – UPDATE Examples
      1. Full before and after row image logging (DATA CAPTURE CHANGES)
             Before: Fred 500 10000 Plano TX 24355
             After:  John 500 10000 Plano TX 24355
             Logged: Fred 500 10000 Plano TX 24355  John 500 10000 Plano TX 24355
      2. Full XOR logging (row length changes)
             Before: Fred 500 10000 Plano TX 24355
             After:  Frank 500 10000 Plano TX 24355
             Logged: 11011010100101001100100101001010101011010010101010110101010101010 01
      3. Partial XOR logging (row length does not change)
             Before: Fred 500 10000 Plano TX 24355
             After:  John 500 10000 Plano TX 24355
             Logged: 110110101001010011
  52. 52. Log Consumption – Full XOR Logging Details
      When the total length of the row is changing, which is common when variable length columns are updated and also when row compression is enabled, DB2 will determine which byte is first to be changing and log a Full XOR log record.
      1. Full XOR logging (length change) with changed column at/near beginning of row
             Before: Fred 500 10000 Plano TX 24355
             After:  Frank 500 10000 Plano TX 24355
             Logged: 11011010100101001100100101001010101011010010101010110101010101010 01
      2. Full XOR logging (length change) with changed column at/near end of row
             Before: 500 10000 Plano TX 24355 Fred
             After:  500 10000 Plano TX 24355 Frank
             Logged: 110110101001010011 01
  53. 53. Log Consumption – Partial XOR Logging Details
      When the total length of the row is not changing, even when row compression is enabled, DB2 will compute and write the optimal Partial XOR log record.
      1. Partial XOR logging (no length change) with a gap between columns being changed
             Before: Fred 500 10000 Plano TX 24355
             After:  John 500 12345 Plano TX 24355
             Logged: 11011010100000000000000001001001
      2. Partial XOR logging (no length change) with no gap between columns being changed
             Before: 500 Fred 10000 Plano TX 24355
             After:  500 John 12345 Plano TX 24355
             Logged: 1101101010010100110101
  54. 54. Log Consumption – Best Practices
      • Columns which are updated frequently (changing value) should be:
          • grouped together
          • defined towards or at the end of the table definition
      • These recommendations are independent of
          • Row compression
          • Row format (default or null/value compression)
      • The benefit would be:
          • better performance
          • fewer bytes logged
          • fewer log pages written
          • smaller active log requirement for transactions performing a large number of updates
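The effect of column placement can be estimated with the XOR rules from slide 50. The sketch below is a simplified model, assuming a row is just its column values joined by single spaces; `xor_payload_len` and `row` are hypothetical helpers, and real DB2 rows carry headers and binary encodings that are ignored here. It counts how many bytes of row data the XOR log record has to carry for the same update under different column orders.

```python
def xor_payload_len(before: bytes, after: bytes) -> int:
    """Bytes of row data an XOR log record carries under the slide's rules."""
    shorter = min(len(before), len(after))
    diffs = [i for i in range(shorter) if before[i] != after[i]]
    if len(before) == len(after):
        # Partial XOR: first changed byte through last changed byte.
        return diffs[-1] - diffs[0] + 1 if diffs else 0
    # Full XOR: first changed byte to the end of the shorter row,
    # plus the residual bytes of the longer row.
    start = diffs[0] if diffs else shorter
    return (shorter - start) + abs(len(before) - len(after))


def row(*fields: str) -> bytes:
    """Toy row image: column values joined by single spaces."""
    return " ".join(fields).encode("ascii")


if __name__ == "__main__":
    # Same-length update of two columns: scattered vs grouped at the end.
    scattered = xor_payload_len(
        row("Fred", "500", "Plano", "TX", "24355", "10000"),
        row("John", "500", "Plano", "TX", "24355", "12345"),
    )
    grouped = xor_payload_len(
        row("500", "Plano", "TX", "24355", "Fred", "10000"),
        row("500", "Plano", "TX", "24355", "John", "12345"),
    )
    print("same length, scattered:", scattered, "grouped at end:", grouped)

    # Length-changing update: changed column first vs last.
    first = xor_payload_len(
        row("Fred", "500", "10000", "Plano", "TX", "24355"),
        row("Frank", "500", "10000", "Plano", "TX", "24355"),
    )
    last = xor_payload_len(
        row("500", "10000", "Plano", "TX", "24355", "Fred"),
        row("500", "10000", "Plano", "TX", "24355", "Frank"),
    )
    print("length change, column first:", first, "column last:", last)
```

In this toy model, grouping the two updated columns shrinks the partial XOR span, and moving a length-changing column to the end shrinks the full XOR record even more, since everything after the first changed byte must otherwise be logged.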
  55. 55. Summary
      • Real Estate is a BIG investment
      • Knowing details about your DB2 ‘Real Estate’ will allow you to better leverage that investment
      • With DB2 9 (Viper) and Viper II (DB2 9.5), significant new functionality has been developed to help with the management of storage
      • Going forward, one can expect the trend to continue
  56. 56. DB2 Real Estate – Buy, Invest, Sell, … Reorg?! Bill Minor - IBM Toronto Lab bminor@ca.ibm.com TLU- 1243A Data Servers - DB2 for Linux, UNIX, Windows THANK YOU!!! Your feedback is greatly appreciated.