Data Warehousing With DB2 for z/OS . . . Again!



By Willie Favero

z/Journal, June/July 2008

Decision support has always been in DB2's genetic makeup; it's just been a bit recessive for a while. It's been evolving over time, so suggesting DB2 for z/OS for your data warehousing shouldn't be a surprise. Today, DB2 for z/OS is a real warehouse contender and you should be giving it some serious consideration, again. Recall the original DB2 announcement materials from June 7, 1983, which said: "DB2 offers a powerful new data base management system for use where business and application requirements and data structures are subject to frequent changes. … As such, DB2 may be used to address decision support systems as well as traditional application areas." Decision Support Systems (DSS) are still actively used today; they contributed to the need for a data warehouse.

When DB2 became generally available in 1985, it was initially difficult to convince customers to use it for Online Transaction Processing (OLTP). At that point, many customers thought DB2's only purpose was decision support and information centers. Decision support prompted talk about warehousing and Business Intelligence (BI). It took a few years for DB2 to evolve into the transactional support database manager it is today. DB2 is more ready for your data warehouse today than ever before.

DB2's recently renewed presence in the warehouse world isn't coincidental, nor is it just DB2 trying to re-establish itself. Warehousing has been changing. Instead of determining what happened in the past, customers want to use all that information to make immediate decisions today. Instead of making information kept in the warehouse available to just a few, it's being leveraged by larger numbers of customers, including their customers. The focus is getting the information to the correct person when that person needs it. Information must arrive rapidly and accurately. It's starting to sound a lot like OLTP, isn't it? What better DBMS than DB2 for z/OS to satisfy these modern challenges?

Before evaluating DB2's usefulness for warehousing, let's establish some terms. Webster's New Millennium Dictionary defines a data warehouse as "a computer system that collects, stores, and manages large amounts of data for a company, to be used for business analysis and strategy." DB2 for z/OS definitely satisfies this definition. That same dictionary defines BI as "a broad range of applications and technologies for gathering, storing, analyzing, and providing access to data to help make business decisions." DB2 helps with BI, but it needs additional application support for this area and gets it from Cognos (an IBM company), IBM DataQuant, and DB2 Alphablox.

Your data warehouse is how you store the data; BI is how the data is analyzed. Both are integral parts of a comprehensive solution. BI applications serve no purpose if they have nothing to analyze, and a data warehouse is of little use without the tools to interrogate the information it contains.

Finally, this article assumes that the terms data mart, Operational Data Store (ODS), and DSS are all similar to the data warehouse concept and are sometimes used in conjunction with, and sometimes in place of, the term data warehouse.

DB2's Effect on Your Warehouse

Enhancements in DB2 Version 8 support data warehousing and associated analytics. DB2 9 for z/OS could improve your warehouse experience even more. The latest version continues a tradition of progress that's worth briefly reviewing. DSS support has been improved in some way with every release of DB2. Here are some highlights:
• DB2 V2.1 introduced the Resource Limit Facility (RLF), which lets you control the amount of CPU that a resource, in this case a query, can actually use. This applies to dynamic SQL, which will probably make up most of the warehouse SQL workload. This can be critical in controlling system resources, and RLF also can help you control the degree of parallelism obtained by a query.
• DB2 V3 delivered compression, which still has a major, immediate effect on data warehousing. Enabling compression for tablespaces can yield significant disk savings. In testing, numbers as high as 80 percent have been observed, though your mileage will definitely vary.
• DB2 V3 introduced I/O parallelism. With this first flavor, multiple I/Os could be started in parallel to satisfy a request.
• DB2 V4 introduced CP parallelism, which allowed a query to run across two or more CPs. A query could be broken into multiple parts, and each part could run under its own Service Request Block (SRB), performing its own I/O.
• DB2 V4 also introduced data sharing, which can be an asset in a warehouse environment. It allows access to the operational data by the warehouse and analytics, yet still lets you separate those applications into their own DB2, reducing the chances of the warehouse activity impacting operational transactions.
• DB2 V5 introduced sysplex query parallelism, which runs within a parallel sysplex and requires DB2's data sharing. A query is run across multiple CPs on multiple Central Electronic Complexes (CECs) in the parallel sysplex. Although there's some additional CPU used for setup when DB2 first decides to run a query in parallelism, there's a correlation between the degree of parallelism achieved and the elapsed time reduction. There also are DSNZPARMs and bind parameters that need to be set before parallelism can be used.

With those as highlights in DB2's evolution, let's review in greater detail two key concepts, compression and parallelism:

Compression: DB2's compression is specified at the tablespace level, is based on the Lempel-Ziv lossless compression algorithm, uses a dictionary, and is assisted by the System z hardware. Compressed data also is carried through into the buffer pools. This means compression could have a positive effect on reducing the amount of logging you do, because the compressed information is carried into the logs. This will reduce your active log size and the amount of archive log space needed. Compression also can improve your buffer pool hit ratios. With more rows in a single page after compression, fewer pages need to be brought into the buffer pool to satisfy a query's getpage requests. One of the additional advantages of DB2's hardware compression is the hardware speed. As hardware processor speeds increase, so does the speed of the compression built into the hardware's chipset. When implementing a data warehouse, size is often considered problematic, regardless of platform. DB2's hardware compression can help address that concern by reducing the amount of disk needed to fulfill your data warehouse storage requirements.

Parallelism: One method of reducing the elapsed time of a long-running query is to split that query across more than one processor. This is what DB2's parallelism does. Parallelism allows a query to run across two or more CPs. A query is broken into multiple parts, with each part running under its own Service Request Block (SRB) and performing its own I/O. Although there's some additional CPU used for setup when DB2 first decides to take advantage of query parallelism, there's a close correlation between the degree of parallelism achieved and the query's elapsed time reduction. There also are DSNZPARMs and bind parameters that need to be set before parallelism can be used.

Star Schema: There's a specialized case of parallelism called a star schema, a relational database's way of representing multi-dimensional data that is popular with data warehousing applications. A star schema is usually a large fact table with lots of smaller dimension tables (see Figure 1). For example, you might have a fact table for sales information. This sales table would hold most of your data. The dimension tables could represent the products that were sold, the stores where those products were sold, the date the sale occurred, any promotional data associated with the sale, and the employee responsible for the sale. Using star joins in DB2 requires enabling the feature through a DSNZPARM keyword. You also should check a few other ZPARMs before using star joins because they can affect a star join's performance.

Figure 1: A star schema
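A star schema's shape is easy to see in miniature. The sketch below builds a hypothetical fact table and two dimension tables in SQLite (via Python's sqlite3) and runs the kind of dimension-filtered aggregate a star join is designed to speed up. All table and column names are invented for illustration; DB2's actual star join processing is not involved here.

```python
import sqlite3

# Hypothetical star schema: one large fact table, two small dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);
""")
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(1, "widget"), (2, "gadget")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(10, "Austin"), (20, "Boston")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                 [(1, 10, 5.0), (1, 20, 7.5), (2, 10, 2.5)])

# A typical star join: filter on the small dimensions, aggregate the fact.
total = conn.execute("""
    SELECT SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_store   s ON s.store_id   = f.store_id
    WHERE p.name = 'widget' AND s.city = 'Austin'
""").fetchone()[0]
print(total)  # 5.0
```

In DB2, this same query shape against a far larger fact table is where the star join DSNZPARM and sparse index support earn their keep.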
Data Warehousing and Today's DB2

Many enhancements in DB2 V8 could directly impact a warehouse implementation; here are the ones that will have a positive effect on your data warehouse and application analytics:

• Clustering decoupled from partitioning
• Indexes created as deferred are ignored by the DB2 optimizer
• Fast cached SQL invalidation
• Automatic space management
• Statement IDs of cached statements as input to EXPLAIN
• Long-running, non-committing reader alerts
• Control Interval (CI) sizes larger than 4KB
• Multi-row INSERT/FETCH
• REOPT(ONCE) to reduce host variables' impact on access paths
• Index-only access for VARCHAR columns
• Backward index scan
• Distributed Data Facility (DDF) performance enhancements
• Up to 4,096 partitions
• Longer table and column names
• SQL statements up to 2MB
• Sparse index for star join
• More tables in a join
• Common table expressions
• Recursive SQL
• Indexable unlike types
• Materialized Query Tables (MQTs).

Let's look at some of these in more detail:

Backward index scan: Indexes are a huge performance asset to a data warehouse. Having the ability to read an index backward lets you avoid building both an ascending index and a descending index structure. Reducing the number of indexes reduces the cost of doing inserts and deletes, because every insert or delete must update every index on the table being changed. Backward index scan also can reduce the cost of doing updates if the update occurs against a column participating in the index. In addition, backward index scan reduces the amount of disk storage required to build all those indexes. With backward index scan, you need only one index where you previously needed two.

Multi-row FETCH and INSERT: This enhancement lets you read or insert multiple rows with a single SQL statement via an array. This reduces the CPU cost of running a FETCH or INSERT. This feature is completely usable in distributed application processing using Open Database Connectivity (ODBC) with arrays and dynamic SQL. This offers significant performance advantages; it could increase the performance of FETCH processing by 50 percent and INSERT processing by 20 percent. In fact, in some customer testing, this feature averaged a 76 percent improvement for FETCH and a 20 percent improvement for INSERT. Improving INSERT performance and reducing INSERT CPU consumption could be a significant help to your Extract, Transform, Load (ETL) processing.

Indexable unlike types: Mismatched data types can now be stage 1 and indexable. This is huge for those applications that don't support all the data types available in DB2 for z/OS.

Sparse index in-memory work files: For star join processing, sparse indexes can use many work files. DB2 V8 will attempt to put these work files in memory. This can result in a significant performance improvement for warehouse queries using star joins.

More partitions (4,096) and automatic space management: Growth is one of those inherent qualities of a warehouse, something you just expect to happen. With 4,096 partitions, a warehouse could grow to 16TB for a 4KB page size, or 128TB for a 32KB page size. This is for just one table. DB2 V8 also gives you automatic space management, the ability to let DB2 manage your primary and secondary space allocations. With the DSNZPARM MGEXTSZ activated, DB2 will manage the allocation of a tablespace's extents, ensuring that the tablespace can grow to its maximum size without running out of extents.
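The multi-row idea, one statement moving a whole array of rows, can be sketched with Python's sqlite3 `executemany`. This is only an analogy for the batching pattern (sqlite3 is not DB2's API, and the table name is invented); the point is that the application makes one call instead of a thousand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER, val TEXT)")

rows = [(i, f"row-{i}") for i in range(1000)]

# One statement, many rows: the program issues a single call and the
# driver moves the whole array, instead of 1,000 separate statements.
conn.executemany("INSERT INTO staging VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
print(count)  # 1000
```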
MQTs: Aggregate data can be important for warehousing. Summary information can be stored in an MQT, and in V8 the optimizer can make the decision to use the MQT in place of a query performing the summarization inline. This can significantly improve the performance of a warehouse query doing aggregations.

Your data warehouse and application analytics also will benefit from the other V8 changes previously listed but not detailed here.

DB2 9 for z/OS delivers more changes that will directly impact your warehouse and application analytics, including:

• Universal tablespaces
• Partition-by-growth as a means to remove the non-partitioned tablespace size limit
• Implicit object creation
• Clone tables
• The MERGE statement
• Identifying unused indexes
• Fast delete of all the rows in a partition (TRUNCATE)
• Deleting the first n rows
• Skipping uncommitted inserted/updated qualifying rows
• Index on expression
• Dynamic index ANDing
• Reduced temporary table materialization
• Generalized sparse index/in-memory data caching
• A new row internal structure for faster VARCHAR processing
• Simulating indexes in EXPLAIN (Optimization Service Center)
• More autonomic buffer pool tuning for Workload Manager (WLM) synergy
• Resource Limit Facility (RLF) support for end-user correlation
• RANK, DENSE_RANK, and ROW_NUMBER
• EXCEPT and INTERSECT
• pureXML.

Let's look at some of these in detail:

Universal tablespace: This is at the top of my DB2 enhancements and data warehousing lists. Consider the sometimes unpredictable but expected growth of a warehouse and the high possibility that many tables could be frequently refreshed. A universal tablespace is a cross between a partitioned tablespace and a segmented tablespace, giving you many of both of its parents' best features. When using a universal tablespace, you get the size and growth of partitioning while retaining the space management, mass delete performance, and insert performance of a segmented tablespace. It's kind of like having a segmented tablespace that can grow to 128TB of data, assuming the right DSSIZE and the right number of partitions are specified, and one that also gives you partition independence.

Index compression: Your first line of defense against a warehouse performance problem, after a well-written query, is creating indexes, lots of indexes. With warehousing, and for some types of OLTP, it's possible you could use as much, if not more, disk space for indexes than for the data. DB2 9 for z/OS index compression can make a huge difference when it comes to saving disk space. The implementation of index compression is nothing like data compression. It doesn't use a dictionary and there's no hardware assist. However, the lack of a dictionary could be a plus. With no dictionary, there's no need to run the REORG or LOAD utilities before compressing your index data. When compression is turned on for an index, key and Record Identifier (RID) compression immediately begins.
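The MQT behavior described earlier, a stored aggregate consulted instead of re-summarizing the fact table, can be sketched as an ordinary summary table. The names are hypothetical, and where DB2's optimizer can make the substitution automatically, here it is done by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (product TEXT, amount REAL)")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                 [("widget", 5.0), ("widget", 7.5), ("gadget", 2.5)])

# Materialize the aggregate once, the way an MQT stores a precomputed summary.
conn.execute("""
    CREATE TABLE sales_summary AS
    SELECT product, SUM(amount) AS total
    FROM fact_sales
    GROUP BY product
""")

# Later queries read the small summary instead of re-aggregating the fact
# table. (In DB2 V8 the optimizer can rewrite the query to do this itself.)
total = conn.execute(
    "SELECT total FROM sales_summary WHERE product = 'widget'").fetchone()[0]
print(total)  # 12.5
```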
SQL: Two new SQL statements, MERGE and TRUNCATE, can be highly effective when used with a data warehouse. MERGE, sometimes called "upsert," lets you change data without needing to know if the row already exists. With MERGE, if DB2 finds an existing row, it updates it. If it doesn't, it performs an INSERT. No more doing a SELECT first to see if the row exists, or doing an INSERT or UPDATE and checking to see if it failed. TRUNCATE is an easy way to remove all the rows from a table with a single SQL statement. It's especially handy if you're using DELETE triggers. Also, there's now an APPEND option on CREATE/ALTER TABLE that tells DB2 to ignore clustering during INSERT processing. This should improve INSERT performance by eliminating the need to figure out where the row should go, simply placing it at the end of the table.
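SQLite has no MERGE statement, but its upsert clause plays the same WHEN MATCHED / WHEN NOT MATCHED role, which makes the one-statement pattern easy to demonstrate. Table and column names are invented, and DB2's MERGE syntax differs from what is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE totals (id INTEGER PRIMARY KEY, amount REAL)")
conn.execute("INSERT INTO totals VALUES (1, 10.0)")

def upsert(row_id, amount):
    # SQLite's ON CONFLICT clause stands in for DB2's
    # WHEN MATCHED / WHEN NOT MATCHED branches of MERGE.
    conn.execute("""
        INSERT INTO totals (id, amount) VALUES (?, ?)
        ON CONFLICT (id) DO UPDATE SET amount = excluded.amount
    """, (row_id, amount))

upsert(1, 99.0)   # existing row: updated in place
upsert(2, 5.0)    # new row: inserted
print(conn.execute("SELECT id, amount FROM totals ORDER BY id").fetchall())
# [(1, 99.0), (2, 5.0)]
```

Either way, the application no longer probes for the row's existence before deciding between INSERT and UPDATE.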
Clone tables: Tables can often be completely replaced on a weekly, even daily, basis in a warehouse. Replacing a table can cause an outage, even if that outage seems short. Clone table support in DB2 9 gives you an easy way to create a replacement table while still accessing the original table, then use a command to switch which table SQL actually accesses. The concept is a lot like an online LOAD REPLACE, something previously unavailable in DB2.

RANK: DB2 9 gives you a little bit of Online Analytical Processing (OLAP) functionality with RANK, DENSE_RANK, and ROW_NUMBER. When used in an SQL statement, they return the ranking and row number as a scalar value. Rank is the ordinal value of a row in a defined set of rows. You can specify that the result be returned with (RANK) or without (DENSE_RANK) gaps. ROW_NUMBER is a sequential row number assigned to a result row.
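The three functions are easiest to compare side by side. This sketch uses SQLite's window functions, which follow the same SQL-standard semantics as DB2 9's; the sales data and names are invented. Note how RANK leaves a gap after the tie, DENSE_RANK does not, and ROW_NUMBER simply numbers the ordered rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("ann", 300.0), ("bob", 200.0),
                  ("cal", 300.0), ("dee", 100.0)])

# ann and cal tie at 300: RANK gives both 1 and skips 2;
# DENSE_RANK gives both 1 and continues with 2.
rows = conn.execute("""
    SELECT rep,
           RANK()       OVER (ORDER BY amount DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY amount DESC) AS drnk,
           ROW_NUMBER() OVER (ORDER BY amount DESC) AS rn
    FROM sales
""").fetchall()
for r in rows:
    print(r)
```

The order of the two tied rows (and therefore their ROW_NUMBER values) is not deterministic unless the ORDER BY is made unique.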
Index on expression: Sometimes a warehouse query requires changing a column in the WHERE clause via a function or possibly a calculation, some kind of predicate that would preclude index access for that column or make that predicate stage 2, affecting your query's performance. With index on expression, you can build the index on that expression and, because of the match, get index access. This could be a real performance boost for some warehouse and analytic queries.
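The same idea exists in SQLite, which makes it convenient to sketch: an index built on UPPER(name) lets an equality predicate on that expression use index access instead of a scan. The names are invented, and DB2's syntax and index-matching rules differ in their details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customer (name) VALUES (?)",
                 [("Smith",), ("smith",), ("Jones",)])

# A plain index on name cannot serve WHERE UPPER(name) = ...;
# an index on the expression itself can.
conn.execute("CREATE INDEX ix_name_upper ON customer (UPPER(name))")

cnt = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE UPPER(name) = 'SMITH'"
).fetchone()[0]
print(cnt)  # 2

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM customer WHERE UPPER(name) = 'SMITH'"
).fetchall()
print(plan)  # the plan should mention ix_name_upper
```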
There are many more new features delivered in DB2 Version 8 and DB2 9 that could significantly enhance a warehouse application. Though the functions and features just discussed can be especially handy for a warehouse and its application analytics, they benefit anyone using DB2 for z/OS.

Data Archiving

Size is always a major issue for a data warehouse. Warehouses can range in size from a few hundred gigabytes (GB) to dozens of terabytes (TB). We've already discussed how DB2 can help with your data warehouse's growth by taking advantage of the universal tablespace's partition-by-growth feature in DB2 9. You also should consider the need for timely data in your warehouse. How current does it have to be? If you need only the most current information for the last six months for your analysis, but you'd like to have the last two years available for the occasional special processing, don't keep it all in the active portion of your warehouse.

Consider using some form of archiving to remove the less frequently used data. You could use a secondary set of archive tables that you move the data into, with the option to include them in your queries only when necessary. You could use something as simple as partitioning, keeping the more current data in the currently accessed partitions with the hope that the SQL will select only the partitions with more current data. Of course, you always have the option to use a tool that does the archiving for you. You'll want something that makes your data available for access yet keeps it out of the active physical data sets. By removing some of the information not needed daily, you'll reduce the size of the warehouse. This will make the warehouse more manageable and could improve the performance of queries that need to scan larger portions of the warehouse.

Linux on System z

Rather than run your analytics on Unix or Windows, consider running them under Linux on System z on an Integrated Facility for Linux (IFL). Better yet, run them on multiple Linux systems on System z running under z/VM on that IFL. You get the security of running completely within your System z box and still get to take advantage of your IBM System z9 Integrated Information Processor (zIIP) specialty engines.

You can use HiperSockets when your Linux application wants to get to DB2 for z/OS in another Logical Partition (LPAR). HiperSockets alone yield a performance advantage over typical network solutions, but that discussion is for another time. HiperSockets use TCP/IP; DB2 work originating on Linux on System z uses the Distributed Relational Database Architecture (DRDA) protocol over HiperSockets and therefore uses enclaves. This means that up to approximately 50 percent of this analytic work coming from Linux on System z is zIIP eligible. (zIIPs were described in detail in the article "zIIPing Along," which appeared in the August/September 2006 issue of z/Journal.)

Conclusion

Will DB2 for z/OS always be the best choice for your data warehouse? Probably not. As with all things, there's always a best way to do things; sometimes it's DB2 for z/OS and sometimes it's DB2 LUW. There are many factors to examine when making a warehouse platform decision, including the placement of the data feeds and the available support expertise. The next time a platform conversation comes up, remember that:

• 25 of the top-25 worldwide banks are running on DB2 for z/OS.
• 23 of the top-25 U.S. retailers are running on DB2 for z/OS.
• Nine of the top-10 global life or health insurance providers are running on DB2 for z/OS.

Many companies seemingly know that if your business depends on your database, DB2 for z/OS is a good choice. Then there's the mainframe that DB2 runs on. The numbers there are equally impressive:

• 95 percent of the U.S. Fortune 500 companies use System z.
• 45 percent of the U.S. Fortune 1000 companies use System z.
• 71 percent of global Fortune 500 companies use System z.

Moreover, 80 percent of the world's corporate data resides on or originates on a mainframe. If DB2 for z/OS, combined with System z, is good enough to handle your business, it's good enough to handle your data warehouse … again!

About the Author

Willie Favero is an IBM senior certified IT software specialist and the DB2 SME with IBM's Silicon Valley Lab Data Warehouse on System z Swat Team. He has more than 30 years of experience working with databases, with more than 24 years working with DB2. He speaks at major conferences and user groups, publishes articles, and has one of the top technical blogs on the Internet.