The Big Deal About Making Things Smaller: DB2 Compression
By Willie Favero
z/Journal, February/March 2008

Compression is fascinating. It's been around in one form or another for almost forever, or at least as long as we've had data. In the land of DB2, we started out with software compression. A compression "program" would be installed in place of the EDITPROC to compress your data as it was added to DB2. At one point, compression software could be a company's primary claim to fame. It worked, it was impressive, and it saved disk space. However, it could be expensive in terms of CPU usage. You used it when you absolutely needed to reduce the size of a tablespace. DB2, for a while, even supplied a sample compression routine based on the Huffman algorithm that could be used in place of the EDITPROC. Compression was considered a necessary evil.

In December 1993, that changed. Hardware-assisted compression, or Enterprise Systems Architecture (ESA) compression, became a feature of DB2. Because of the hardware assist, the cost of the compression process was significantly reduced. Compression began to gain in popularity. Turn the clock forward about 10 years, and although compression is now widely accepted, another problem reared its ugly head: short-on-storage conditions. One suggested method for reducing the amount of storage being used by DB2 was to turn off compression. The compression dictionary had to be loaded into memory, a resource that became increasingly valuable with the newer releases of DB2. Unfortunately, the amount of data in DB2 also had significantly increased. It wasn't all just data warehousing and Enterprise Resource Planning (ERP) applications causing it, either. The amount of data supporting everyday Online Transaction Processing (OLTP) also was increasing at an outstanding rate. Compression has become a must-have, must-use tool.

Compression is a topic that needs to be revisited, given the number of customers planning and upgrading to DB2 Version 8. This article will cover some compression basics, including why sometimes you're told to avoid compression, tablespace compression, and finally, index compression.

The Basics

Back in 1977, two information theorists, Abraham Lempel and Jacob Ziv, thought long strings of data should (and could) be shorter. This resulted in the development of a series of lossless data compression techniques, often referred to as LZ77 (LZ1) and LZ78 (LZ2), which remain widely used today. LZ stands for Lempel-Ziv. The 77 and 78 are the years in which they devised and then improved their lossless compression algorithm. Various forms of LZ compression are employed when you work with Graphics Interchange Format (GIF), Tagged Image File Format (TIFF), Adobe Acrobat Portable Document Format (PDF), ARC, PKZIP, COMPRESS and COMPACT on the Unix platform, and StuffIt on the Macintosh platform.

LZ77 is an adaptive, dictionary-based compression algorithm that works off a window of data, using the data just read to compress the next data in the buffer. Not being completely satisfied with the efficiency of LZ77, Lempel and Ziv developed LZ78. This variation is based on all the data available rather than just a limited amount.

In 1984, a third name joined the group when Terry Welch published an improved implementation of LZ78 known as LZW (Lempel-Ziv-Welch). The Welch variation improved the speed of implementation, but it's not usually considered optimal because it does its analysis on only a portion of the data. There have been other variations over the years, but these three are the most significant.
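
To make the idea of a dictionary that is built from the data itself more concrete, here is a minimal textbook-style LZ78 sketch in Python. It is an illustration only and makes no claim about how DB2 or the hardware assist builds or uses its dictionary; the function names `lz78_compress` and `lz78_decompress` are invented for this example.

```python
def lz78_compress(text):
    """Encode text as (dictionary index, next character) pairs.

    Index 0 means "no known prefix". The dictionary is built on the fly
    from phrases already seen, so it never has to be stored separately.
    """
    dictionary = {}   # phrase -> index, grows as the input is read
    output = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate            # keep extending the longest known phrase
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit it with an empty next character.
        output.append((dictionary[phrase], ""))
    return output


def lz78_decompress(pairs):
    """Rebuild the original text by replaying the same dictionary growth."""
    phrases = [""]    # index 0 is the empty prefix
    pieces = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        pieces.append(phrase)
        phrases.append(phrase)
    return "".join(pieces)


if __name__ == "__main__":
    pairs = lz78_compress("ABABABA")
    print(pairs)                    # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
    print(lz78_decompress(pairs))   # ABABABA
```

Seven characters become four pairs here, and the round trip returns exactly the original string, which is what "lossless" means. Longer inputs with more repetition let the phrases in the dictionary grow longer, which is where the real savings come from.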
IBM documentation refers only to Lempel-Ziv; it doesn't say which variation DB2 uses.

The term "lossless compression" is significant for our discussion. When you expand something that has been compressed and you end up with the exact same thing you started with, that's called lossless compression. It differs from "lossy compression," which describes what occurs when images (photographs) are compressed in commonly used formats such as Joint Photographic Experts Group (JPEG). The more you save a JPEG (recompressing an already compressed image), the more information about that image you lose until the image is no longer acceptable. This process would be problematic when working with data.

DB2's Use of Compression

Data compression today (and since DB2 V3) relies on hardware to assist in the compression and decompression process. The hardware is what prevents the high CPU cost that was once associated with compression. Hardware compression keeps getting faster because chip speeds increase with every new generation of hardware. The z9 Enterprise Class (EC) processors are even faster than zSeries machines, which were faster than their predecessors. Because compression support is built into a chip, compression speed gets faster as new processors get faster.

Currently, a compression unit shares the same Processor Unit (PU) as the CP Assist for Cryptographic Function on a z9 EC server. However, that's where any similarities stop. Data compression should never be considered another form of encryption. Don't confuse the two concepts. First, the dictionary used during compression is based on a specific algorithm, so the dictionary, in theory, could be figured out. Next, there's no guarantee that every row will get compressed. If a shorter row doesn't result from compression, the row is left in its original state. In fact, your methodology should be to first compress your data and then encrypt the compressed data. You do the opposite going the other way: decrypt and then decompress (expand).

The Dictionary

The dictionary is created by only the LOAD and REORG utilities. When using a 4K tablespace page size, it occupies 16 pages immediately following the page set header page and just before the space map page. Compression also is at the partition level. This is good because it lets you compress some partitions while leaving other partitions in an uncompressed format. But with all good, there's always a chance for bad. A single partitioned tablespace could create an awful lot of dictionaries: 4,096 in all in V8 if you had defined the maximum number of partitions.

Not all rows in a tablespace, or all tablespaces for that matter, can be compressed. If the row after compression isn't shorter than the original uncompressed row, the row remains uncompressed. You also can't turn on compression for the catalog, directory, work files, and LOB tablespaces.

The compression dictionary can take up to 64K (16 x 4K pages) of storage in the DBM1 address space. Fortunately, the dictionary goes above the bar in DB2 V8 and later releases.

A data row is compressed as it's inserted. However, if a row needs to be updated, the row must be expanded, updated, and recompressed to complete the operation, which makes UPDATE potentially the most expensive SQL operation when compression is turned on. The good news is that all changed rows, including inserted rows, are logged in their compressed format, so you might save a few bytes on your logs. And remember, the larger page sizes available in DB2 may result in better compression. After all, the resulting row after compression is variable length, so you might be able to fit more rows with less wasted space in a larger page size. Index compression, discussed later, doesn't use a dictionary.

When to Use Data Compression

There are situations where compression may not be the right choice. You must choose objects that will benefit from compression. Some tablespaces just don't realize any benefit from compression. If rows are already small and a page contains close to the maximum of 255 rows per page, making the rows smaller won't get you any additional rows on the page. You'll be paying the cost of compressing and just ending up with a bunch of free space in the page that you will never be able to take advantage of. Maybe someday the 255 rows per page restriction will be lifted from DB2, opening up many new compression opportunities.

Consider the other extreme: a table with quite large rows. If compression isn't sufficient to increase the number of rows per page, there's no advantage, or disk savings, in using compression. A 4,000 byte row with 45 percent compression is still 2,200 bytes. With only a maximum of 4,054 bytes available in a data page, the compressed length is still too large to fit a second row into a single page. If you're using large rows, don't give up immediately on compression. Remember, DB2 has page sizes other than 4K. Getting good compression and saving on space may require you to move to a larger 8K or 16K page size. You may be able to fit only one 3,000 byte row into a 4K page, but you can fit five of them into a 16K page, and you just saved the equivalent of a 4K page by simply increasing the page size.

What if you're now taking advantage of encryption? Usually, once you encrypt something, you'll gain little by compressing it. For that reason, consider compressing first, and then encrypting.

Taking advantage of compression can realize some significant disk savings, but did you know there may be other benefits to using compression, such as a bit of a performance boost? If you were to double the number of rows in a page because of compression, when DB2 loads that page in a buffer pool, it will be loading twice as many rows. When compression is on, data pages are always brought into the buffer pool in their compressed state. Having more rows in the same size pool could increase your hit ratio and reduce the amount of I/O necessary to satisfy the same number of getpage requests. Less I/O is always a good thing.

Another potential gain could come in the form of logging. When a compressed row is updated or inserted, it's logged in its compressed format. For inserts and deletes, this could mean reduced logging, and possibly reduced log I/O. Update, though, may be another story. Updates may or may not benefit from compression. But that is an entirely different discussion that doesn't fit here.

What about those recommendations against data compression? Actually, they really weren't recommendations against compression as much as a warning about the storage compression dictionaries could take up and a suggestion to be careful to use compression where you can actually gain some benefit. This guidance was all about storage shortages and, in almost all cases, emerged prior to DB2 Version 8.
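
The rows-per-page reasoning in this section can be sketched as a small Python calculation. The 255-row limit and the 4,054 usable bytes in a 4K page come from the article; assuming the same 42 bytes of page overhead for 8K and 16K pages is a simplification made for this sketch, and the function names are hypothetical.

```python
MAX_ROWS_PER_PAGE = 255              # limit cited in the article
PAGE_OVERHEAD_BYTES = 4096 - 4054    # from the article's 4K figure; ASSUMED equal for larger pages


def usable_bytes(page_size_kb):
    """Bytes available for rows in a page of the given size (simplified)."""
    return page_size_kb * 1024 - PAGE_OVERHEAD_BYTES


def compressed_row_bytes(row_bytes, percent_saved):
    """Row length after compression, rounded up to a whole byte."""
    return max(1, (row_bytes * (100 - percent_saved) + 99) // 100)


def rows_per_page(row_bytes, page_size_kb=4, percent_saved=0):
    """How many rows of this length fit in one page, capped at 255."""
    length = compressed_row_bytes(row_bytes, percent_saved)
    return min(usable_bytes(page_size_kb) // length, MAX_ROWS_PER_PAGE)


if __name__ == "__main__":
    # Large row: 45 percent compression still leaves one row per 4K page.
    print(rows_per_page(4000), rows_per_page(4000, percent_saved=45))     # 1 1
    # A bigger page helps a 3,000 byte row even without compression.
    print(rows_per_page(3000, page_size_kb=4), rows_per_page(3000, page_size_kb=16))  # 1 5
    # Small rows: already at the 255-row cap, so compression adds nothing.
    print(rows_per_page(15), rows_per_page(15, percent_saved=50))         # 255 255
```

If the number returned with compression is no higher than the number without it, compression buys no disk savings for that object, which is the test the section describes. Real estimates should come from DSN1COMP, since actual page layouts and compression ratios vary.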
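
As a rough illustration of why dictionary storage drew those warnings, the following Python sketch multiplies the article's figures: up to 64K (16 x 4K pages) per dictionary and up to 4,096 partitions in V8. It treats 64K as a worst case and assumes every compressed partition is open at once; the function name is invented for this example.

```python
DICTIONARY_BYTES = 16 * 4 * 1024     # 16 pages x 4K = 64K, the upper bound cited in the article


def dictionary_storage_mb(open_compressed_partitions):
    """Worst-case storage for compression dictionaries, in megabytes."""
    return open_compressed_partitions * DICTIONARY_BYTES / (1024 * 1024)


if __name__ == "__main__":
    print(dictionary_storage_mb(1))      # 0.0625
    print(dictionary_storage_mb(4096))   # 256.0
```

A single tablespace with 4,096 compressed partitions could therefore need up to 256 MB for dictionaries alone, which is small above the bar but was a serious amount in a 31-bit DBM1 address space.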
In DB2 V6 and V7, some DB2 users had issues with storage shortages. Increased business needs translate to more resources necessary for DB2 to accomplish what customers have grown to expect. The DBM1 address space took more and more storage to accomplish its expected daily activities. This eventually resulted in storage shortage issues. One of the many tuning knobs used to address this problem was reducing the usage of compression and therefore reducing the number of dictionaries that needed to be loaded into the DBM1 address space. Each compression dictionary can take up to 64K of storage, storage that many pools compete for. The entire compression dictionary is loaded into memory when a tablespace (or tablespace partition) is opened. Trimming the number of objects defined with COMPRESS YES could yield significant savings. Many shops were simply turning on compression for everything, with little analysis of the possible benefits, if any, they might gain vs. the cost of keeping all those dictionaries in memory.

Enter DB2 V8 and its 64-bit architecture. In V8, the compression dictionaries are loaded above the 2GB bar. You still must be cognizant of the amount of storage you're using to ensure you don't use more virtual storage than you have real storage. Keep that virtual backed by 100 percent real at all times with DB2. But dictionaries are one area that should no longer be of major concern. If it was necessary for you to turn off compression, or you just weren't considering using compression in V7 because you had storage concerns, with DB2 V8, even while in Compatibility Mode (CM), you can start to reverse some of those decisions. You can realize significant disk savings, and possibly even a bit of a performance boost, through the use of DB2's data compression. So it's time to start reviewing all those tablespaces not using compression and consider turning it back on in cases where you can determine there's a benefit. However, even with the virtual storage that comes with 64-bit processing, you still need to apply common sense. If, for example, you suddenly start investing in a bunch of tablespaces with 4,096 partitions, turning compression on for every partition may not be in your best interest.

If you think this is all pretty cool when used with a tablespace, then you're going to be thrilled with the next part of this discussion: index compression, delivered in DB2 9 for z/OS. You could possibly see even more benefit than with tablespace compression.

Index Compression

Taking liberties with the Monty Python catch phrase, "And Now for Something Completely Different…": index compression. Here it fits so well. Although DB2 9 adds index compression, the word compression is where the similarities to data compression end:

• There's no hardware assist
• It doesn't use the Lempel-Ziv algorithm
• No dictionary is used or created.

An index is compressed only on disk; it isn't compressed in the buffer pools or on the logs. DB2 uses prefix compression, similar to VSAM, because the index keys are ordered, and it compresses both index keys and RIDs. However, only leaf pages are compressed, and compression takes place at the page level, not the row level, as index pages are moved in and out of the buffer pools. A compressed index page also is always 4K on disk. When brought into the buffer pool, it's expanded to 8K, 16K or 32K, so the index must be defined to the appropriate buffer pool. How effective index compression is depends on the buffer pool page size you choose.

When an index page is moved into a buffer pool, the page is expanded; when moved to disk, the page is compressed. Because of this, CPU overhead is sensitive to buffer pool hit ratio. You want to make sure your pools are large enough so the index pages remain in the buffer pool to avoid expansion and compression of pages. You also should be aware that an increase in the number of index levels is possible with compression compared to an uncompressed index using the same page size.

One of the nice features of index compression is its lack of a dictionary. With no dictionary, there's no need to run the REORG or LOAD utilities prior to actually compressing your index data. When compression is turned on for an index, key and RID compression begins immediately. However, if you alter compression off for an index (ALTER COMPRESS NO) after already having used index compression, that index will be placed in REBUILD-pending (RBDP) or pageset REBUILD-pending (PSRBD) state, depending on the index type. REBUILD INDEX or REORG can be used to remove the pending state. In addition, if the index is using versioning, altering that index to use compression will place the index in REBUILD-pending status.

With the addition of index compression, a new column, COMPRESS, has been added to the DB2 catalog table SYSIBM.SYSINDEXES. It identifies whether compression is turned on (a value of Y) or off (a value of N) for an index.

If you're doing warehousing, and for some types of OLTP, it's quite possible you could use as much, if not more, disk space for indexes than for the data. With that in mind, index compression can make a huge difference when it comes to saving disk space.

The LOAD and REORG Utilities

The critical part of data compression is building the dictionary. The better the dictionary reflects your data, the higher your compression rates will be. Assuming a tablespace has been created or altered to COMPRESS YES, you have two choices for building your dictionary: either the LOAD or REORG utilities. These two utilities are the only mechanism available to you to create a dictionary for DB2's data compression.

The LOAD utility uses the first "x" number of rows to create the dictionary. However, no rows are compressed while the LOAD utility is actually building the dictionary. Once the dictionary is created, the remaining rows being loaded will be considered for compression. With the dictionary in place, any rows inserted (SQL INSERT) also will be compressed, assuming the compressed row is shorter than the original uncompressed row.

The REORG utility is usually the better choice for creating the dictionary. REORG sees all the data rows because the dictionary is built during its UNLOAD phase. It has the potential to create a more accurate, efficient dictionary than the LOAD utility. REORG also will compress all the rows in the tablespace during the RELOAD phase because the dictionary is now available. In theory, the more information used to create the dictionary, the better your compression. If you have a choice, pick the REORG utility to create the dictionary.

Creating a dictionary has the potential to be a CPU-expensive operation. If you're satisfied with your compression dictionary, why repeatedly pay that expense? Both REORG and LOAD
REPLACE use the utility keyword KEEPDICTIONARY to suppress building a new dictionary. This will avoid the cost of a dictionary rebuild, a task that could increase both your elapsed time and CPU time for the REORG process. This all seems straightforward to this point.

However, your upgrade to DB2 9 for z/OS will add a slight twist to the above discussion. With DB2 9, the first access to a tablespace by REORG or LOAD REPLACE changes the row format from Basic Row Format (BRF), the pre-DB2 9 row format, to Reordered Row Format (RRF). (See "Structure and Format Enhancements in DB2 9 for z/OS" in the August/September 2007 issue of z/Journal for more details on RRF.) This row format change does assume you're in DB2 9 New Function Mode (NFM) and none of the tables in the tablespace are defined with an EDITPROC or VALIDPROC.

The change to RRF is good news for the most part. When using variable length data, you will more than likely end up with a more efficient compression dictionary when you rebuild the dictionary after converting to RRF. A potential problem, though, is that many shops have KEEPDICTIONARY specified in their existing REORG and LOAD jobs, and the dictionary won't be rebuilt. The IBM development lab doesn't want to force everyone to change all their job streams for just one execution of REORG or LOAD just to rebuild the dictionary. Their solution: APAR PK41156 changes REORG and LOAD REPLACE so they ignore KEEPDICTIONARY for that one-time run when the rows are reordered, allowing a rebuild of the dictionary regardless of the KEEPDICTIONARY setting.

What if you really don't want to do a rebuild of the dictionary right now, regardless of what DB2 might want to do? How do you get around this APAR's change? Well, the APAR also introduces a new keyword for REORG and LOAD REPLACE that gives you a work-around and still doesn't require you to change your jobs if you simply want DB2 to rebuild the dictionary. The new keyword is HONOR_KEEPDICTIONARY and it defaults to NO. So, if you don't specify this keyword, your dictionary will be rebuilt. However, if you do want to "honor" your current dictionary, you can add this keyword to your REORG or LOAD REPLACE job and things will behave as they did in the past.

The DSN1COMP Utility

How do you know if compression will save you anything? Making a compression decision for an object in DB2 is anything but hit or miss. DB2 includes a mechanism to help you estimate what your disk savings could be. This stand-alone utility, DSN1COMP, will tell you what your expected savings could be should you choose to take advantage of data or index compression. You can run this utility against the VSAM data set underlying a tablespace or index space, the output from a full image copy, or the output from DSN1COPY. You can't run DSN1COMP against LOB tablespaces, the catalog (DSNDB06), the directory (DSNDB01), or workfiles (i.e., DSNDB07). Using DSN1COMP with image copies and DSN1COPY outputs can make gathering information about potential compression savings completely unobtrusive.

When choosing a VSAM LDS to run against, be careful if you're using online REORG (REORG SHRLEVEL CHANGE). Online REORG flips between I0001 and J0001 for the fifth qualifier of the VSAM data sets. You can query the IPREFIX column in the SYSTABLEPART or SYSINDEXPART catalog tables to find out which qualifier is in use.

You also should take the time to use the correct execution time parameters so the results you get are usable. Things such as PAGESIZE, DSSIZE, FREEPAGE, and PCTFREE should be set exactly as they are for the object you're running DSN1COMP against to ensure that its estimates are accurate. If you plan to build your dictionaries using the REORG utility as recommended, you'll also want to specify REORG at run-time. If you don't, DSN1COMP assumes the dictionary will be built by the LOAD utility. If you're using a full image copy as input to DSN1COMP, make sure you've specified the FULLCOPY keyword to obtain the correct results.

When running DSN1COMP against an index, you can specify the number of leaf pages that should be scanned using the LEAFLIM keyword. If this keyword is omitted, the entire index will be scanned. Specifying LEAFLIM could limit how long it might take DSN1COMP to complete.

So, how does DSN1COMP help you determine if compression is right for you? When it completes, it gives a short report, accounting for all the above information, which details what might have happened if compression had been turned on. For a tablespace, message DSN1940I gives you statistics with and without compression, and the percentage you should expect to save in kilobytes. It gives you the number of rows scanned to build the dictionary and the number of rows processed to deliver the statistics in the report. In addition, it lists the average row length before and after compression, the size of the dictionary in pages, the size of the tablespace in pages before and after compression, and the percentage of pages that would have been saved.

If DSN1COMP is run against an index, it reports on the number of leaf pages scanned, the number of keys and RIDs processed, how many kilobytes of key data were processed, and the number of kilobytes of compressed keys produced. The report that comes out of DSN1COMP for any index provides the possible percent reduction and buffer pool space usage for both 8K and 16K index leaf page sizes. This will help considerably when trying to determine the correct leaf page size.

Conclusion

Whether you're interested in compressing your data, indexes, or both, compression can provide a wealth of benefits, including saving you tons of disk space and possibly even improving your performance. With DB2 Version 8 (and above), there are few reasons not to take advantage, so now is a good time to put your compression plan together. As you do, take time to ensure that compression will actually accomplish the goal you have in mind. Never mandate it or simply reject it without proper analysis.

For More Information:

• IBM Redbook, DB2 for OS/390 and Data Compression (SG24-5261)
• z/Architecture Principles of Operation (SA22-7832), particularly for insight on the hardware instruction CMPSC
• RedPaper, "Index Compression With DB2 9 for z/OS" (REDP-4345).

About the Author

In the past 29 years, Willie Favero has been a customer, worked for IBM, worked for a vendor, and is now an IBM employee again. He has always worked with databases and has more than 20 years of DB2 experience. A well-known, frequent speaker at conferences and author of numerous articles on DB2, he's currently with North American Lab Services for DB2 for z/OS, part of IBM's Software Group.
Website: db2zos/