The Big Deal
About Making Things
By Willie Favero
Compression is fascinating. It’s been around in one form or another for almost forever, or at least as long as we’ve had data. In the land of DB2, we started out with software compression. A compression “program” would be installed in place of the EDITPROC to compress your data as it was added to DB2. At one point, compression software could be a company’s primary claim to fame. It worked, it was impressive, and it saved disk space. However, it could be expensive in terms of CPU usage. You used it when you absolutely needed to reduce the size of a tablespace. DB2, for a while, even supplied a sample compression routine based on the Huffman algorithm that could be used in place of the EDITPROC. Compression was considered a necessary evil.

In December 1993, that changed. Hardware-assisted compression, or Enterprise Systems Architecture (ESA) compression, became a feature of DB2. Because of the hardware assist, the cost of the compression process was significantly reduced. Compression began to gain in popularity. Turn the clock forward about 10 years, and although compression is now widely accepted, another problem reared its ugly head: short-on-storage conditions. One suggested method for reducing the amount of storage being used by DB2 was to turn off compression. The compression dictionary had to be loaded into memory, a resource that became increasingly valuable with the newer releases of DB2. Unfortunately, the amount of data in DB2 also had significantly increased. It wasn’t all just data warehousing and Enterprise Resource Planning (ERP) applications causing it, either. The amount of data supporting everyday Online Transaction Processing (OLTP) also was increasing at an outstanding rate. Compression has become a must-have and must-use tool.

Compression is a topic that needs to be revisited, given the number of customers planning and upgrading to DB2 Version 8. This article will cover some compression basics, including why sometimes you’re told to avoid compression, tablespace compression, and finally, index compression.

The Basics
Back in 1977, two information theorists, Abraham Lempel and Jacob Ziv, thought long strings of data should (and could) be shorter. This resulted in the development of a series of lossless data compression techniques, often referred to as LZ77 (LZ1) and LZ78 (LZ2), which remain widely used today. LZ stands for Lempel-Ziv. The 77 and 78 are the years they came up with and improved their lossless compression algorithm. Various forms of LZ compression are employed when you work with Graphic Interchange Format (GIF), Tagged Image File (TIF), Adobe Acrobat Portable Document Format (PDF), ARC, PKZIP, COMPRESS and COMPACT on the Unix platform, and StuffIt on the Macintosh platform.

LZ77 is an adaptive, dictionary-based compression algorithm that works off a window of data using the data just read to compress the next data in the buffer. Not being completely satisfied with the efficiency of LZ77, Lempel-Ziv developed LZ78. This variation is based on all the data available rather than just a limited amount.

In 1984, a third name joined the group when Terry Welch published an improved implementation of LZ78 known as LZW (Lempel-Ziv-Welch). The Welch variation improved the speed of implementation, but it’s not usually considered optimal because it does its analysis on only a portion of the data. There have been other variations over
the years, but these three are the most significant. IBM documentation refers to only Lempel-Ziv; it doesn’t distinguish between which variation DB2 uses.

The term “lossless compression” is significant for our discussion. When you expand something that has been compressed and you end up with the exact same thing you started with, that’s called lossless compression. It differs from “lossy compression,” which describes what occurs when images (photographs) are compressed in commonly used formats such as Joint Photographic Experts Group (JPEG). The more you save a JPEG (recompressing an already compressed image), the more information about that image you lose until the image is no longer acceptable. This process would be problematic when working with data.

DB2’s Use of Compression
Data compression today (and since DB2 V3) relies on hardware to assist in the compression and decompression process. The hardware is what prevents the high CPU cost that was once associated with compression. Hardware compression keeps getting faster because chip speeds increase with every new generation of hardware. The z9 Enterprise Class (EC) processors are even faster than zSeries machines, which were faster than their predecessors. Because compression support is built into a chip, compression speed gets faster as new processors get faster.

Currently, a compression unit shares the same Processor Unit (PU) as the CP Assist for Cryptographic Function on a z9 EC server. However, that’s where any similarities stop. Data compression should never be considered another form of encryption. Don’t confuse the two concepts. First, the dictionary used during compression is based on a specific algorithm, so the dictionary, in theory, could be figured out. Next, there’s no guarantee that every row will get compressed. If a shorter row doesn’t result from compression, the row is left in its original state. In fact, your methodology should be to first compress your data and then encrypt the compressed data. You do the opposite going the other way: decrypt and then decompress (expand).

The Dictionary
The dictionary is created by only the LOAD and REORG utilities. When using a 4K tablespace page size, it occupies 16 pages immediately following the page set header page and just before the space map page. Compression also is at the partition level. This is good because it lets you compress some partitions while leaving other partitions in an uncompressed format. But with all good, there’s always a chance for bad. A single partitioned tablespace could create an awful lot of dictionaries; 4,096 in all in V8 if you had defined the maximum number of partitions.

Not all rows in a tablespace, or all tablespaces for that matter, can be compressed. If the row after compression isn’t shorter than the original uncompressed row, the row remains uncompressed. You also can’t turn on compression for the catalog, directory, work files, and LOB tablespaces.

The compression dictionary can take up to 64K (16 x 4K page) of storage in the DBM1 address space. Fortunately, the dictionary goes above the bar in DB2 V8 and later releases.

A data row is compressed as it’s inserted. However, if a row needs to be updated, the row must be expanded, updated, and recompressed to complete the operation, making the UPDATE potentially the most expensive SQL operation when compression is turned on. The good news is that all changed rows, including inserted rows, are logged in their compressed format, so you might save a few bytes on your logs. And remember, the larger page sizes available in DB2 may result in better compression. After all, the resulting row after compression is variable length, so you might be able to fit more rows with less wasted space in a larger page size. Index compression, discussed later, doesn’t use a dictionary.

When to Use Data Compression
There are situations where compression may not be the right choice. You must choose objects that will benefit from compression. Some tablespaces just don’t realize any benefit from compression. If rows are already small and a page contains close to the 255 maximum rows per page, making the rows smaller won’t get you any additional rows on the page. You’ll be paying the cost of compressing and just ending up with a bunch of free space in the page that you will never be able to take advantage of. Maybe someday the 255 rows per page restriction will be lifted from DB2, opening up many new compression opportunities.

Consider the other extreme: a table with quite large rows. If compression isn’t sufficient to increase the number of rows per page, there’s no advantage, or disk savings, in using compression. A 4,000 byte row with 45 percent compression is still 2,200 bytes. With only a maximum of 4,054 bytes available in a data page, the compressed length is still too large to fit a second row into a single page. If you’re using large rows, don’t give up immediately on compression. Remember, DB2 has page sizes other than 4K. Getting good compression and saving on space may require you to move to a larger 8K or 16K page size. You may be able to fit only one 3,000 byte row into a 4K page, but you can fit five of them into a 16K page and you just saved the equivalent of a 4K page by simply increasing the page size.

What if you’re now taking advantage of encryption? Usually, once you encrypt something, you’ll gain little by compressing it. For that reason, consider compressing first, and then encrypting.

Taking advantage of compression can realize some significant disk savings, but did you know there may be other benefits to using compression, such as a bit of a performance boost? If you were to double the number of rows in a page because of compression, when DB2 loads that page in a buffer pool, it will be loading twice as many rows. When compression is on, data pages are always brought into the buffer pool in their compressed state. Having more rows in the same size pool could increase your hit ratio and reduce the amount of I/O necessary to satisfy the same number of getpage requests. Less I/O is always a good thing.

Another potential gain could come in the form of logging. When a compressed row is updated or inserted, it’s logged in its compressed format. For inserts and deletes, this could mean reduced logging, and possibly reduced log I/O. Update, though, may be another story. Updates may or may not benefit from compression. But that is an entirely different discussion that doesn’t fit here.

What about those recommendations against data compression? Actually, they really weren’t recommendations against compression as much as a warning about the storage that compression dictionaries could take up and a suggestion to be careful to use compression where you can actually gain some benefit. This guidance was all about storage shortages and, in almost all cases, emerged prior to DB2 Version 8.
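
The page-fit arithmetic from “When to Use Data Compression” can be checked with a few lines of Python. This is only a minimal sketch: the 4,054 usable bytes in a 4K page and the 255-row cap come from the discussion above, while the function names are hypothetical and the idea that larger pages lose the same 42 bytes of overhead is an assumption made purely for illustration, not a documented DB2 value.

```python
# Figures quoted in the article.
MAX_ROWS_PER_PAGE = 255       # DB2 limit on rows per data page
USABLE_BYTES_4K = 4054        # bytes available for rows in a 4K data page

# ASSUMPTION (illustration only): every page size loses the same
# overhead as a 4K page, i.e. 4096 - 4054 = 42 bytes.
ASSUMED_OVERHEAD = 4096 - USABLE_BYTES_4K


def compressed_length(row_bytes, savings_pct):
    """Row length after compression, given a percentage saving."""
    return row_bytes * (100 - savings_pct) // 100


def rows_per_page(row_bytes, page_bytes=4096, overhead=ASSUMED_OVERHEAD):
    """Whole rows that fit in one page, capped at the 255-row limit."""
    usable = page_bytes - overhead
    return min(usable // row_bytes, MAX_ROWS_PER_PAGE)


if __name__ == "__main__":
    # Large rows: 45 percent savings still leaves one row per 4K page.
    big = compressed_length(4000, 45)
    print(big, rows_per_page(4000), rows_per_page(big))

    # A 3,000-byte row: 4K page versus 16K page.
    print(rows_per_page(3000), rows_per_page(3000, page_bytes=16384))

    # Tiny rows: already at the 255-row cap, so shrinking them gains nothing.
    print(rows_per_page(15), rows_per_page(8))
```

With the article’s numbers, the sketch reports a 2,200-byte compressed row that still fits only once in a 4K page, one 3,000-byte row per 4K page versus five per 16K page, and 255 rows per page for tiny rows whether or not they are compressed. To size a real object, substitute the actual usable bytes for your page size rather than the assumed overhead.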
In DB2 V6 and V7, some DB2 users had issues with storage shortages. Increased business needs translate to more resources necessary for DB2 to accomplish what customers have grown to expect. The DBM1 address space took more and more storage to accomplish its expected daily activities. This eventually resulted in storage shortage issues. One of the many tuning knobs used to address this problem was reducing the usage of compression and therefore reducing the number of dictionaries that needed to be loaded into the DBM1 address space. Each compression dictionary can take up 64K of storage, storage that many pools compete for. The entire compression dictionary is loaded into memory when a tablespace (or tablespace partition) is opened. Trimming the number of objects defined with COMPRESS YES could yield significant savings. Many shops were simply turning on compression for everything, with little analysis of the possible benefits, if any, they might gain vs. the cost of keeping all those dictionaries in memory.

Enter DB2 V8 and its 64-bit architecture. In V8, the compression dictionaries are loaded above the 2GB bar. You still must be cognizant of the amount of storage you’re using to ensure you don’t use more virtual storage than you have real storage. Keep that virtual backed by 100 percent real at all times with DB2. But dictionaries are one area that should no longer be of major concern. If it was necessary for you to turn off compression or you just weren’t considering using compression in V7 because you had storage concerns, with DB2 V8, even while in Compatibility Mode (CM), you can start to reverse some of those decisions. You can realize significant disk savings, and possibly even a bit of a performance boost, through the use of DB2’s data compression. So it’s time to start reviewing all those tablespaces not using compression and consider turning it back on in cases where you can determine there’s a benefit. However, even with the virtual storage that comes with 64-bit processing, you still need to apply common sense. If, for example, you suddenly start investing in a bunch of tablespaces with 4,096 partitions, turning compression on for every partition may not be in your best interest.

If you think this is all pretty cool when used with a tablespace, then you’re going to be thrilled with the next part of this discussion: index compression, delivered in DB2 9 for z/OS. You could possibly see even more benefit than with tablespace compression.

Index Compression
Taking liberties with the Monty Python catch phrase, “And Now for Something Completely Different…”: index compression. Here it fits so well. Although DB2 adds index compression in DB2 9, the term compression is where similarities to data compression quickly end:

• There’s no hardware assist
• It doesn’t use the Lempel-Ziv algorithm
• No dictionary is used or created.

An index is compressed only on disk; it isn’t compressed in the buffer pools or on the logs. DB2 uses prefix compression, similar to VSAM, because the index keys are ordered, and it compresses both index keys and RIDs. However, only leaf pages are compressed, and compression takes place at the page level, not the row level, as index pages are moved in and out of the buffer pools. A compressed index page also is always 4K on disk. When brought into the buffer pool, it’s expanded to 8K, 16K or 32K, so the index must be defined to the appropriate buffer pool. The effectiveness of index compression is higher, depending on the buffer pool size you choose.

When an index page is moved into a buffer pool, the page is expanded; when moved to disk, the page is compressed. Because of this, CPU overhead is sensitive to buffer pool hit ratio. You want to make sure your pools are large enough so the index pages remain in the buffer pool to avoid expansion and compression of pages. You also should be aware that an increase in the number of index levels is possible with compression compared to an uncompressed index using the same page size.

One of the nice features of index compression is its lack of a dictionary. With no dictionary, there’s no need to run the REORG or LOAD utilities prior to actually compressing your index data. When compression is turned on for an index, key and RID compression begins immediately. However, if you alter compression off for an index (ALTER COMPRESS NO) after already having used index compression, that index will be placed in REBUILD-pending (RBDP) or pageset REBUILD-pending (PSRBD) state depending on the index type. REBUILD INDEX or REORG can be used to remove the pending state. In addition, if the index is using versioning, altering that index to use compression will place the index in REBUILD-pending status.

With the addition of index compression, a new column, COMPRESS, has been added to the DB2 catalog table SYSIBM.SYSINDEXES. It identifies whether compression is turned on for an index, with a value of Y (yes), or off, with a value of N (no).

If you’re doing warehousing, and for some types of OLTP, it’s quite possible you could use as much disk space for indexes as for the data, if not more. With that in mind, index compression can make a huge difference when it comes to saving disk space.

The LOAD and REORG Utilities
The critical part of data compression is building the dictionary. The better the dictionary reflects your data, the higher your compression rates will be. Assuming a tablespace has been created or altered to COMPRESS YES, you have two choices for building your dictionary: either the LOAD or REORG utilities. These two utilities are the only mechanism available to you to create a dictionary for DB2’s data compression.

The LOAD utility uses the first “x” number of rows to create the dictionary. However, there are no rows compressed while the LOAD utility is actually building the dictionary. Once the dictionary is created, the remaining rows being loaded will be considered for compression. With the dictionary in place, any rows inserted (SQL INSERT) also will be compressed, assuming the compressed row is shorter than the original uncompressed row.

The REORG utility is usually the better choice for creating the dictionary. REORG sees all the data rows because the dictionary is built during its UNLOAD phase. It has the potential to create a more accurate, efficient dictionary than the LOAD utility. REORG also will compress all the rows in the tablespace during the RELOAD phase because the dictionary is now available. In theory, the more information used to create the dictionary, the better your compression. If you have a choice, pick the REORG utility to create the dictionary.

Creating a dictionary has the potential to be a CPU-expensive operation. If you’re satisfied with your compression dictionary, why repeatedly pay that expense? Both REORG and LOAD
REPLACE use the utility keyword KEEPDICTIONARY to suppress building a new dictionary. This will avoid the cost of a dictionary rebuild, a task that could increase both your elapsed time and CPU time for the REORG process. This all seems straightforward to this point.

However, your upgrade to DB2 9 for z/OS will add a slight twist to the above discussion. With DB2 9, the first access to a tablespace by REORG or LOAD REPLACE changes the row format from Basic Row Format (BRF), the pre-DB2 9 row format, to Reordered Row Format (RRF). (See “Structure and Format Enhancements in DB2 9 for z/OS” in the August/September 2007 issue of z/Journal for more details on RRF.) This row format change does assume you’re in DB2 9 New Function Mode (NFM) and none of the tables in the tablespace are defined with an EDITPROC or VALIDPROC.

The change to RRF is good news for the most part. When using variable length data, you will more than likely end up with a more efficient compression dictionary when you rebuild the dictionary after converting to RRF. A potential problem, though, is that many shops have KEEPDICTIONARY specified in their existing REORG and LOAD jobs and the dictionary won’t be rebuilt. The IBM development lab doesn’t want to force everyone to change all their job streams for just one execution of REORG or LOAD just to rebuild the dictionary. Their solution: APAR PK41156 changes REORG and LOAD REPLACE so they ignore KEEPDICTIONARY for that one-time run when the rows are reordered, allowing a rebuild of the dictionary regardless of the KEEPDICTIONARY setting.

What if you really don’t want to do a rebuild of the dictionary right now, regardless of what DB2 might want to do? How do you get around this APAR’s change? Well, the APAR also introduces a new keyword for REORG and LOAD REPLACE that gives you a work-around and still doesn’t require you to change your jobs if you simply want DB2 to rebuild the dictionary. The new keyword is HONOR_KEEPDICTIONARY and it defaults to NO. So, if you don’t specify this keyword, your dictionary will be rebuilt. However, if you do want to “honor” your current dictionary, you can add this keyword to your REORG or LOAD REPLACE job and things will behave as they did in the past.

The DSN1COMP Utility
How do you know if compression will save you anything? Making a compression decision for an object in DB2 is anything but hit or miss. DB2 includes a mechanism to help you estimate what your disk savings could be. This stand-alone utility, DSN1COMP, will tell you what your expected savings could be should you choose to take advantage of data or index compression. You can run this utility against the VSAM data set underlying a tablespace or index space, the output from a full image copy, or the output from DSN1COPY. You can’t run DSN1COMP against LOB tablespaces, the catalog (DSNDB06), the directory (DSNDB01), or workfiles (i.e., DSNDB07). Using DSN1COMP with image copies and DSN1COPY outputs can make gathering information about potential compression savings completely unobtrusive.

When choosing a VSAM LDS to run against, be careful if you’re using online REORG (REORG SHRLEVEL CHANGE). Online REORG flips between the I0001 and J0001 for the fifth qualifier of the VSAM data sets. You can query the IPREFIX column in SYSTABLEPART or SYSINDEXPART catalog tables to find out which qualifier is in use.

You also should take the time to use the correct execution time parameters so the results you get are usable. Things such as PAGESIZE, DSSIZE, FREEPAGE, and PCTFREE should be set exactly as they are for the object you’re running DSN1COMP against to ensure that its estimates are accurate. If you plan to build your dictionaries using the REORG utility as recommended, you’ll also want to specify REORG at runtime. If you don’t, DSN1COMP assumes the dictionary will be built by the LOAD utility. If you’re using a full image copy as input to DSN1COMP, make sure you’ve specified the FULLCOPY keyword to obtain the correct results.

When running DSN1COMP against an index, you can specify the number of leaf pages that should be scanned using the LEAFLIM keyword. If this keyword is omitted, the entire index will be scanned. Specifying LEAFLIM could limit how long it might take DSN1COMP to complete.

So, how does DSN1COMP help you determine if compression is right for you? When it completes, it gives a short report, accounting for all the above information, which details what might have happened if compression had been turned on.

For a tablespace in message DSN1940I, you’re given statistics with and without compression, and the percentage you should expect to save in kilobytes. It gives you the number of rows scanned to build the dictionary and the number of rows processed to deliver the statistics in the report. In addition, it lists the average row length before and after compression, the size of the dictionary in pages, the size of the tablespace in pages before and after compression, and the percentage of pages that would have been saved.

If DSN1COMP is run against an index, it reports on the number of leaf pages scanned, the number of keys and RIDs processed, how many kilobytes of key data were processed, and the number of kilobytes of compressed keys produced. The report that comes out of DSN1COMP for any index provides the possible percent reduction and buffer pool space usage for both 8K and 16K index leaf page sizes. This will help considerably when trying to determine the correct leaf page size.

Conclusion
Whether you’re interested in compressing your data, indexes, or both, compression can provide a wealth of benefits, including saving you tons of disk space and possibly even improving your performance. With DB2 Version 8 (and above), there are few reasons not to take advantage, so now is a good time to put your compression plan together. As you do, take time to ensure that compression will actually accomplish the goal you have in mind. Never mandate it or simply reject it without proper analysis.

For More Information:
• IBM Redbook, DB2 for OS/390 and Data Compression (SG24-5261)
• z/Architecture Principles of Operation (SA22-7832), particularly for insight on the hardware instruction CMPSC
• RedPaper, “Index Compression With DB2 9 for z/OS” (REDP-4345).

About the Author
In the past 29 years, Willie Favero has been a customer, worked for IBM, worked for a vendor, and is now an IBM employee again. He has always worked with databases and has more than 20 years of DB2 experience. A well-known, frequent speaker at conferences and author of numerous articles on DB2, he’s currently with North American Lab Services for DB2 for z/OS, part of IBM’s Software Group.
Email: email@example.com
Website: www.ibm.com/software/data/db2/support/db2zos/
z/Journal • February/March 2008