IBM Software Group Presentation Template



  1. Information Management: Data Warehousing with DB2 for z/OS. Paul Wirth, IBM Software Group. © 2005 IBM Corporation
  2. Agenda
     • Refocus on Business Intelligence and Data Warehousing
     • When DB2 for z/OS?
     • Reference Architecture and Components
     • Best Practices / Benchmarking
  3. The terminology as used in this presentation
     • Business Intelligence (BI) and Data Warehousing (DWH) are sometimes used interchangeably. BI includes end-user tools for query, reporting, analysis, dashboarding, etc. The two concepts depend on each other.
     • BI almost always assumes that a Data Warehouse (DWH), Operational Data Store (ODS), or Data Mart (DM) exists with timely, trusted information.
       – An ODS is a subject-oriented database organized by business area: up to date (vs. historical) and detailed (vs. summarized).
       – A DM is a database designed to support the analysis of a particular business subject area. Its data has usually been transformed and aggregated from the source DWH or operational system, and it can be relational, multidimensional (OLAP), or statistical in nature.
     • A DWH depends on end-user tools that turn data into information.
     • Both terms (DWH and BI) address the desire for timely, accurate, available data delivered when, where, and how the end users want it.
  4. Traditional Topology
     Diagram: operational systems feed an ETL layer (extract, transform, load, with metadata) into an operational data store and an enterprise data warehouse; the warehouse supplies dependent data marts, alongside independent and line-of-business data marts.
  5. DB2 for z/OS: “Back into the ring”
     • July 2006 announcement (206-181/ENUS206-181.PDF) announces the availability of products to support the mission.
     • The time is right; there is a shift in customer requirements:
       – “Real-time” data access, with Service Level Agreements to match operational ones
       – Operational BI and embedded analytics: deliver BI to customer-facing people and applications (a broad audience)
       – Integrate BI components with operational systems and information portals
       – Massive numbers of queries against semi-aggregated data, i.e. data aggregated at a low level
     Diagram: the BI user population grows from fewer than 50 (executives, managers, analysts) in 1985 to more than 10,000 (customer service staff, customers) in 2007.
  6. Operational Business Intelligence and Embedded Content – a Customer Service View
     Screenshot: operational content highlighted in red.
  7. Operational Business Intelligence and Embedded Content – a Customer Service View
     Screenshot: embedded BI object highlighted in blue.
  8. Operational Business Intelligence and Embedded Content – a Customer Self Service View
     Operational content alone delivers just data to information consumers. (Operational content in red.)
  9. Operational Business Intelligence and Embedded Content – a Customer Self Service View
     Business Intelligence content puts the data into perspective. (Embedded BI object in blue.)
  10. Dynamic Warehousing: A New Approach to Leveraging Information
     • Traditional data warehousing: query and reporting to understand what happened.
     • Dynamic warehousing: OLAP and data mining to understand why and to recommend future action, plus Information On Demand to optimize real-time processes.
  11. DB2 for z/OS features that support data warehousing
     • 64-bit addressability
     • 2000-byte index keys
     • Materialized query tables (MQTs)
     • Multi-row operations
     • 225-way table joins
     • In-memory workfiles
     • Automatic space allocation
     • Non-uniform distribution statistics on non-indexed columns
     • Parallel sorting
     • Data partitioned secondary indexes
     • 2 MB SQL statements
     • Partition by growth
     • Index compression
     • Dynamic index ANDing
     • New internal row structure for faster VARCHAR processing
     • Fast delete of all the rows in a partition
     • Deleting the first n rows
     • Skipping uncommitted inserted/updated qualifying rows
     • Etc.
     A whitepaper can be downloaded from: library/en_US/detail/A016040Z53841K98.html
  12. When should I consider a Data Warehouse on DB2 for z/OS (zDWH)?
     • Data requires the highest levels of availability, resiliency, security, and recovery
     • Need a true real-time Operational Data Store (ODS)
       – Operational data is on System z
       – The ODS must be virtually in sync with the operational data
     • Embedded analytics and operational business intelligence
     • OLTP data on System z
     • Keep existing data marts/warehouses on System z
     • Consolidate data marts/warehouses on System z
     • Implement an enterprise data warehouse (EDW)
     • SAP Business Warehouse when SAP R/3 is on System z
     • Want to leverage and optimize existing System z skills and investment to service the mixed-workload environment
  13. DWH Solution Architecture using DB2 for z/OS
     Diagram: a DB2 data sharing group spanning CEC One and CEC Two with a coupling facility (CF); members A and B run OLTP, members C and D run the DWH. Within a data sharing environment, the data warehouse can reside in the same group as the transactional data.
  14. DWH Solution Architecture using DB2 for z/OS
     • Transactional and warehouse data are centralized and consolidated in one system.
     • An OLTP location alias spans members A, B, and C (serving transactional applications); an OLAP location alias spans members D and E (serving analytical applications).
     • All members see all data, but each member is optimized for a particular workload.
     • Location aliases are used for transaction routing.
     • Shared resources are managed by Workload Manager (WLM) and the Intelligent Resource Director (IRD).
     • A single-subsystem option exists for non-data-sharing environments.
  15. Initial load of the Data Warehouse
     • ETL is done using an ETL Accelerator, which in its current implementation is DataStage (IBM Information Server) running on a secondary system. Until the end of 2007, pSeries and xSeries are the only supported systems; later the ETL Accelerator will move to zLinux as soon as the Information Server is available there.
     • The data is extracted from the OLTP data sharing group via JDBC/ODBC, transformed by the DataStage parallel engine, and then loaded into the data warehouse tables.
     • Legacy data sources (VSAM, IMS, sequential/IAM, Software AG Adabas, CA Datacom, CA IDMS) are integrated through the Classic Federation Server.
     • Distributed data sources (DB2 UDB for LUW, Oracle, SQL Server) are integrated directly through the IBM Information Server.
  16. Access to Data Outside of DB2 for z/OS
     • WebSphere Classic Federation exposes legacy data sources (IMS, VSAM, ADABAS, IDMS) through a relational query processor: a DataServer with connectors and metadata makes all data access appear relational, via SQL over JDBC/ODBC.
     • The ETL server reads the legacy data sources via SQL through the ODBC driver, just as it would any other relational database. The data is then stored in staging tables or aggregated directly into the DB2 warehouse database.
     • The mapping from each legacy data source to the relational model is maintained with the Classic Data Architect and updated in the DataServer.
  17. Incremental Updates at Runtime
     • WebSphere Classic Data Event Publisher captures changes: a Change Capture Agent (a logger exit) detects changes made to the legacy data source (IMS, VSAM, IDMS, ADABAS, ...) by any updating application.
     • The Correlation Service maps the data; the Distribution Service and Publisher send the change events to the ETL server via WebSphere MQ.
     • More than one queue is set up for distributing the events: the Event Publisher also uses an administrative queue for synchronization and a recovery queue in case the logger exit fails.
     • DataStage reads the change events directly from WebSphere MQ and stores the changes in the DB2 data warehouse with low latency; optionally, the changes can be stacked in staging tables for a batch window.
  18. In-Database ETL triggered by DataStage
     • Wherever possible, “in database” transformations (ELT) are used to spare the transport of the data to the ETL Accelerator. The SQL is still sent from the ETL Accelerator to the database, so there is one place of documentation for all ETL steps.
     • This can also be used to shift data up the hierarchy within the Layered Data Architecture.
     Simple example:

     -- Aggregate salary data by department into AGGRSALARY
     INSERT INTO AGGRSALARY ( DEPTCODE, AVGBAND, AVGSALARY )
     SELECT DEPTCODE,
            AVG( BAND ) AS AVGBAND,
            AVG( SALARY ) AS AVGSALARY
     FROM STAFF
     GROUP BY DEPTCODE
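The ELT pattern on this slide can be sketched end to end with SQLite standing in for DB2 for z/OS. The STAFF/AGGRSALARY table and column names come from the slide's example; the sample rows are invented for illustration.

```python
import sqlite3

# Minimal ELT sketch: the transformation runs inside the database engine,
# so only the SQL statement travels from the ETL tool, not the detail rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STAFF (DEPTCODE TEXT, BAND INTEGER, SALARY REAL)")
conn.execute("CREATE TABLE AGGRSALARY (DEPTCODE TEXT, AVGBAND REAL, AVGSALARY REAL)")
conn.executemany(
    "INSERT INTO STAFF VALUES (?, ?, ?)",
    [("A01", 6, 52000.0), ("A01", 8, 78000.0), ("B02", 7, 61000.0)],
)

# The in-database aggregation step from the slide
conn.execute(
    """INSERT INTO AGGRSALARY (DEPTCODE, AVGBAND, AVGSALARY)
       SELECT DEPTCODE, AVG(BAND), AVG(SALARY)
       FROM STAFF
       GROUP BY DEPTCODE"""
)
rows = conn.execute("SELECT * FROM AGGRSALARY ORDER BY DEPTCODE").fetchall()
print(rows)  # one aggregated row per department
```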
  19. Information Server
     • Extract/transform with DataStage®; cleanse with QualityStage™.
     • All data transformation and cleansing can be deployed as SOA services and fed to the Information Server.
  20. Enterprise Data Warehouse using DB2 for z/OS: The Complete Solution Architecture
     Diagram: on z/OS, a DB2 data sharing group holds the data warehouse, with WebSphere Classic Federation (federation of legacy data sources via ODBC) and WebSphere Classic Data Event Publisher (change capture, feeding WebSphere MQ). On zLinux/xLinux/AIX, the Information Server (DataStage, QualityStage, metadata, SOA server) performs transformation and deployment, inserting/updating and extracting/querying the warehouse. The analytics and BI query and reporting workload sits on top.
  21. Enterprise Data Warehouse using DB2 for z/OS: The Complete Solution Architecture – Zoom Out
     Diagram: the zoomed-out view combines the earlier slides: the OLTP location alias (members A–C) and the OLAP location alias (members D and E) under WLM/IRD, the Information Server stack, WebSphere Classic Federation, and the Classic Data Event Publisher change-capture path through WebSphere MQ, with BI tools such as AlphaBlox, DataQuant, and QMF on top. The annotations repeat those of slides 14, 16, and 17.
  22. Alternative Architectures (Reporting & Analytics)
     • “Pure” System z BI solution from a data perspective:
       – ODS, DWH, and DMs in DB2 for z/OS
       – End-user tools (e.g. QMF, DataQuant, Business Objects, Cognos) access DB2 for z/OS directly (fat-client implementation) or via browser (web-server implementation)
       – The reporting solution may run on a distributed WAS, e.g. Alphablox, QMF, DataQuant, Cognos ReportNet, Business Objects Server
     • “Hybrid” BI solution from a data perspective:
       – ODS and data warehouse in DB2 for z/OS
       – Relational, multidimensional (OLAP), and statistical data marts on System p and/or System x supporting end-user tools, e.g. DB2 DWE, Hyperion Essbase, Cognos PowerPlay
  23. Best Practices
     • Accurate requirements for solution right-fit
     • Demonstration and POC systems
     • Böblingen Lab BI CoC / Teraplex Center equivalent
     • Papers in progress
     • Work with IBM
  24. Capacity Planning
     • Critical elements:
       – Number of users
       – Amount of data
       – Size and complexity of the query workload
       – Growth in maintenance workload
     • Critical system resources for a balanced system:
       – CPU
       – Central storage
       – I/O channels
       – Controllers (storage directors, cache, non-volatile storage) and disk
       – Parallel Sysplex / coupling facility resources (links, CPU, storage)
  25. DB2 Sizing Tool
     • Model based on workloads that were run in IBM lab environments; continually refined.
     • Classifies queries into 5 categories (trivial online ... complex online ... complex ad hoc) and provides the category definitions.
     • Inputs: query profiles (average number and type of queries per category), the percentage of data “touched” in each query category, and the size of the data warehouse.
     • Outputs: Large Systems Performance Reference (LSPR) ratios for the data warehouse workload and the required capacity on the chosen processor, including the zIIP offload percentage.
     • Alternate method: build a prototype and profile your own workload.
     • Consider starting small and growing incrementally (a benefit of the System z DWH environment).
  26. IBM zIIP leveraged by DWH workloads
     1. Business Intelligence applications via DRDA® over a TCP/IP connection
     2. Complex parallel queries
        – Star schema parallel queries (available June 30, 2006)
        – All other parallel queries (available July 31, 2006)
     3. DB2 utilities for index maintenance
  27. What is a Star Schema?
     • A star schema is a relational database schema for representing multidimensional data, sometimes graphically represented as a “star”: data is stored in a central fact table surrounded by dimension tables holding information on each perspective of the data.
     • Example: store the “facts” of a sale (units sold, price, ...) with product, time, customer, and store keys in a central fact table, and the full descriptive detail for each key in the surrounding dimension tables. This avoids redundantly storing information (such as the product description) for each individual transaction.
     • Complex star schema parallel queries include joining several dimensions of a star schema data set (such as promotion vs. product).
     • If the workload uses DB2 for z/OS V8 or later to join star schemas, portions of that DB2 workload are eligible to be redirected to the zIIP.
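The schema described above can be sketched with SQLite; the table names, keys, and sample values here are illustrative, not from the presentation.

```python
import sqlite3

# A tiny star schema: a central SALES fact table with keys into
# PRODUCT and STORE dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PRODUCT (PRODUCT_KEY INTEGER PRIMARY KEY, DESCRIPTION TEXT);
    CREATE TABLE STORE   (STORE_KEY   INTEGER PRIMARY KEY, CITY TEXT);
    CREATE TABLE SALES   (PRODUCT_KEY INTEGER, STORE_KEY INTEGER,
                          UNITS INTEGER, PRICE REAL);
    INSERT INTO PRODUCT VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO STORE   VALUES (10, 'Boston'), (11, 'Denver');
    INSERT INTO SALES   VALUES (1, 10, 5, 9.99), (2, 10, 2, 19.99),
                               (1, 11, 3, 9.99);
""")

# A typical star join: the fact table joined to several dimensions.
# On DB2 for z/OS, parallel portions of such queries are zIIP-eligible.
rows = conn.execute("""
    SELECT p.DESCRIPTION, s.CITY, SUM(f.UNITS) AS TOTAL_UNITS
    FROM SALES f
    JOIN PRODUCT p ON p.PRODUCT_KEY = f.PRODUCT_KEY
    JOIN STORE   s ON s.STORE_KEY   = f.STORE_KEY
    GROUP BY p.DESCRIPTION, s.CITY
    ORDER BY p.DESCRIPTION, s.CITY
""").fetchall()
print(rows)
```

Note how the fact table stores only keys and measures; descriptive detail lives once in each dimension table.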
  28. zIIP Exploitation – DB2 Complex Parallel Query Activity
     Extend beyond the star schema...
  29. Focus on Star Schema
     • Star schema workloads may benefit from two redirected tasks:
       1. The “main” task, the DRDA request: if the request comes in via DRDA over TCP/IP, it can take advantage of DRDA's use of the zIIP, just like any other network-attached enterprise application.
       2. The “child” tasks, the star schema parallel queries: if the business intelligence and data warehousing application uses star schemas, a significant amount of this processing is eligible to be redirected to the zIIP.
     • The child (star schema) and main (DRDA via TCP/IP) tasks are additive: combining them is expected to yield a larger redirect than DRDA via TCP/IP alone.
     • Longer-running queries see a higher benefit.
     • Benefits to a data warehousing application may vary significantly depending on the details of that application.
  30. BI Distributed with Complex Parallel Query
     Diagram: complex star schema parallel queries arriving via DRDA over a TCP/IP connection (via network or HiperSockets™) have portions of their work directed to the zIIP. The general CPs move from high to reduced utilization as the eligible DB2/DRDA/star-schema enclave SRB work executes on the zIIP. For illustrative purposes only; single application only. Actual workload redirects may vary depending on how long the queries run, how much parallelism is used, and the number of zIIPs and CPs employed.
  31. Before the zIIP PTFs
     1. A query enters the system.
     2. It is sliced into parallel tasks, classified via WLM.
     3. Each parallel task of the query accumulates Service Units (SUs), with the total aggregated across all tasks to determine when to invoke a period switch.
     WLM definition for a query requiring 50,000 SUs to complete:
     – Period 1: Importance 2, 1,000 SUs
     – Period 2: Importance 4, 4,000 SUs
     – Period 3: Importance 5, “the rest”
     Assuming even distribution/usage across all 5 CPs: after 200 SUs on each CP (1,000 total for the query), the tasks move to Period 2; after 800 more SUs on each CP (an overall total of 5,000 for the query), they move to Period 3.
  32. After the PTFs – zIIP Redirect Execution
     1. A query enters the system.
     2. It is sliced into parallel tasks, classified via WLM.
     3. Each parallel task accumulates Service Units, with the period switch determined for each task, not for the overall query.
     Same WLM definition (query requires 50,000 SUs; Period 1: Importance 2, 1,000 SUs; Period 2: Importance 4, 4,000 SUs; Period 3: Importance 5, “the rest”):
     – After a task gets 1,000 SUs on CP/zIIP, it moves to Period 2 (a total of 5,000 SUs for the overall query).
     – After 4,000 more SUs on CP/zIIP, the task moves to Period 3 (an accumulated total of 25,000 SUs for the overall query).
     If the query uses 5 parallel tasks and we assume no zIIP-on-CP time (3 general CPs, 2 zIIPs):
     – Period 1: 2,680 SUs run on CP, 2,320 SUs run on zIIP
     – Period 2: 4,000 SUs run on CP, 16,000 SUs run on zIIP
     – “The rest”: 5,000 SUs run on CP, 20,000 SUs run on zIIP
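The per-task period aging on the two slides above can be checked with a small sketch. It assumes, as the slides do, a 50,000-SU query split evenly into 5 parallel tasks with period durations of 1,000 and 4,000 SUs.

```python
# WLM period-switch arithmetic, before vs. after the zIIP PTFs.
TOTAL_SU, TASKS = 50_000, 5
P1, P2 = 1_000, 4_000
per_task = TOTAL_SU // TASKS  # 10,000 SUs consumed by each parallel task

# Before the PTFs: SUs were aggregated across all tasks, so the whole query
# left Period 1 after 1,000 total SUs and Period 2 after 5,000 total SUs.
before = {"period1_total": P1, "period2_total": P1 + P2}

# After the PTFs: each task is aged on its own, so the query as a whole
# accumulates 5 x 1,000 SUs in Period 1, 5 x 4,000 in Period 2, and the
# remainder in "the rest".
after = {
    "period1_total": TASKS * P1,
    "period2_total": TASKS * P2,
    "rest_total": TASKS * (per_task - P1 - P2),
}
print(before, after)
```

This reproduces the slide's totals: 5,000 SUs in Period 1, 20,000 in Period 2, and 25,000 in "the rest" (which is where the CP/zIIP split per period then applies).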
  33. zIIP Redirect
     • zIIP processors offer significant hardware and software savings.
     • The number of zIIP processors cannot exceed the number of general processors in a physical server; however, an LPAR can be configured to contain more zIIPs than general processors.
     • A percentage of parallel task activity is eligible to run on the zIIP. The actual offload percentage depends on:
       – The ratio of parallel to non-parallel work
       – Thresholds
       – Available zIIP capacity
       – The z/OS dispatcher algorithm
     • RMF and OMEGAMON reports provide a projection of the offload percentage prior to installing zIIP processors, but the actual offload will probably be slightly lower.
     • Benchmarks and internal workloads indicate an offload between 50% and 80% with a typical mix of queries.
  34. zIIP Experiences
     Charts: the first chart compares query execution times for 2 general CPs without parallel processing, 2 general CPs with parallelism, and 2 CPs plus 1 zIIP; the last bar shows the execution time when a zIIP engine is leveraged to complete the processing. The second chart (2 CPs, 1 CP + 1 zIIP, 2 CPs + 1 zIIP) shows the processing cycles consumed by the query workload on the general engines (dark bars) versus the processing redirected to the available zIIP engine (blue bars).
  35. Speaking of parallelism... is DB2 for z/OS a “parallel database”?
     • Query I/O parallelism: manages concurrent I/O requests for a single query, fetching pages into the buffer pool in parallel. This can significantly improve the performance of I/O-bound queries.
     • Query CP parallelism: enables true multitasking within a query. A large query is broken into multiple smaller queries that run simultaneously on multiple processors, accessing data in parallel and reducing the elapsed time. Starting with DB2 V8, parallel queries exploit zIIPs when they are available on the system, reducing cost.
     • Sysplex query parallelism: to further expand the processing capacity available for processor-intensive queries, DB2 can split a large query across different DB2 members in a data sharing group.
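As a loose analogy for query CP parallelism, the sketch below splits one large scan into sub-scans that run on separate workers and combines the partial results. This is not DB2 code, the 4-way degree and data are invented, and CPython threads give no real CPU speedup here; the point is only to illustrate the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor

# One "large query" over a million rows, decomposed into 4 range scans.
data = list(range(1, 1_000_001))
DEGREE = 4

def scan(chunk):
    # each "parallel task" scans and aggregates its own range of the data
    return sum(chunk)

# split the data into DEGREE contiguous ranges
bounds = [len(data) * i // DEGREE for i in range(DEGREE + 1)]
chunks = [data[bounds[i]:bounds[i + 1]] for i in range(DEGREE)]

# run the sub-scans concurrently and merge the partial aggregates
with ThreadPoolExecutor(max_workers=DEGREE) as pool:
    total = sum(pool.map(scan, chunks))
print(total)  # identical to a serial scan of the whole table
```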
  36. DB2 for z/OS Parallelism – Another Graphic
     Diagram: V3 introduced I/O parallelism (a single execution unit with multiple I/O streams); V4 introduced CP parallelism (multiple execution units, an originating TCB plus parallel-task SRBs, each with its own I/O stream); V5 introduced Sysplex query parallelism (parallel tasks spread across the Parallel Sysplex).
  37. Parallel Degree Determination
     • DB2 for z/OS has the flexibility to choose the degree of parallelism, based on: the number of partitions that qualify, the number of CP engines, processor speed, I/O vs. CPU cost, and data skew.
     • Degree determination is done by the DBMS, not the DBA, using statistics and the cost of the query to provide the optimal access path with low overhead, taking data skew into consideration.
  38. Trust with Limits – Set the Maximum Degree of Parallelism
     • Set the maximum degree between the number of CPs and the number of partitions:
       – CPU-intensive queries: closer to the number of CPs
       – I/O-intensive queries: closer to the number of partitions
     • Data skew can reduce the number of degrees.
  39. DB2 for z/OS Partitioning (Range)
     • Partitioning in DB2 for z/OS V8 and beyond is defined at the table level.
     • Maximum of 4,096 partitions; DB2 can generate a maximum of 254 parallel operations.
     • Effectively cluster by two dimensions.
     • Partition by growth: a DB2 9 feature that relates (in a way) to hashing.
     Diagram: a table partitioned by range, shown with a secondary index partitioned like the underlying data (DPSI) and a non-partitioned secondary index (NPSI).
  40. Partition by Time
     • Each partition holds data for a certain period (days, weeks, months, years, etc.).
     • Possible data skew due to seasonal factors.
     • Ease of operations:
       – Enables rolling off old data at regular intervals
       – Back up the latest data only
       – Coexistence of data refresh and queries: load into the current period while earlier periods are queried
     Diagram: monthly partitions for Jan 2007 through Dec 2007, with an empty partition at the end.
  41. Partition by Time – Advantages
     • Queries consume fewer resources: DB2 uses partition scanning instead of scanning the entire table.
     • Consistent query response times over time: adding history to the database does not affect query elapsed time.
     • Potentially smaller degrees of parallelism.
     • Data rolling: ALTER TABLE ... ROTATE PARTITION FIRST TO LAST.
     Diagram: partitions for 2003–2007; the 2003 data is archived, and after data rolling the partitions hold 2004–2008, with the new 2008 partition empty.
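The rolling-window effect of ROTATE PARTITION FIRST TO LAST can be modeled with a simple sketch: the oldest partition is rolled off (archived) and its slot is reused for the incoming, empty period. The years come from the slide; the deque model is only an illustration, not how DB2 implements rotation.

```python
from collections import deque

# Five yearly partitions, oldest first, as on the slide.
partitions = deque([2003, 2004, 2005, 2006, 2007])

def rotate(parts, new_period):
    """Roll off the oldest partition and reuse its slot for the new period."""
    archived = parts.popleft()   # archive the oldest data
    parts.append(new_period)     # the slot becomes the new, empty period
    return archived

archived = rotate(partitions, 2008)
print(archived, list(partitions))
```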
  42. Data Compression
     • DB2 compression should be strongly considered in a data warehousing environment.
     • Savings are generally 50% or more.
     • Data warehousing queries are dominated by sequential prefetch accesses, which benefit from DB2 compression.
     • Newer generations of z processors implement the compression instructions directly in circuits, yielding low single digits of overhead. Index-only access incurs no overhead.
     • More tables can be pinned in buffer pools.
     • Index compression is supported in V9.
  43. Hardware-Assisted Data Compression
     Charts: compression ratios achieved ranged from 46% to 61% (53% in the middle case). A second chart shows the effect of compression on elapsed time (in seconds) for an I/O-intensive and a CPU-intensive workload, breaking elapsed time into I/O wait, CPU, and compression-overhead CPU: the I/O-intensive workload gains from reduced I/O wait, while the CPU-intensive workload pays some compression CPU overhead.
  44. DB2 Compression Study
     • 5.3 TB of uncompressed raw data plus 1.6 TB of indexes, system, work space, etc.: 6.9 TB in total.
     • DB2 hardware compression reduces the raw data to 1.3 TB (75+% data space savings).
     • Total after compression: 1.3 TB + 1.6 TB = 2.9 TB (58% disk savings).
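The two savings percentages on this slide follow directly from the sizes, since only the raw data compresses while the 1.6 TB of indexes, system, and work space is unchanged:

```python
# Check of the compression-study figures: 5.3 TB raw data -> 1.3 TB,
# with 1.6 TB of indexes/system/work space left uncompressed.
raw, compressed, other = 5.3, 1.3, 1.6

data_savings = 1 - compressed / raw                      # data space savings
disk_savings = 1 - (compressed + other) / (raw + other)  # total disk savings
print(f"data: {data_savings:.0%}, disk: {disk_savings:.0%}")
```

This yields roughly 75% data space savings and 58% total disk savings, matching the slide.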
  45. Workload Management
     • Traditional workload management approach: screen queries before they start execution.
       – Time-consuming for DBAs.
       – Not always possible: some large queries slip through the cracks.
       – Running those queries degrades system performance.
       – Cancelling the queries wastes CPU cycles.
  46. Workload Management
     Think about this: the ideal workload manager policy for data warehousing.
     • Short query submitters expect an answer now; large query submitters expect an answer later. Who is impacted more in real time?
     • Consistently favor shorter-running work: keep short queries short through WLM period aging; there is no need to pre-identify them.
     • Selectively favor critical business users through WLM explicit prioritization.
     Diagram: priority declines through period aging as the query type moves from short to medium to long, and with declining business importance.
  47. Query Monopolization
     Diagram: end users submit queries into a work queue in front of the processors; a few monopolizing queries can back up the rest of the queue.
  48. Inconsistent Response Times for Short-Running Queries
     Diagram: the same short query completes in 5 seconds on Monday but takes 5 minutes on Tuesday, depending on the workload activity.
  49. Workload Management – Workload Manager Overview
     Diagram: WLM classification rules assign work from subsystems (DB2, DDF, JES2, OMVS, STC, IWEB) to service classes that reflect intelligent business objectives (e.g. Critical, Ad hoc, Miner, Warehouse Refresh) and to report classes for monitoring (e.g. Marketing, Sales, Headquarters, Test).
  50. Service Classification
     Period | Duration (SUs) | Performance Goal | Importance
     1 | 5,000 | Velocity = 80 | 2
     2 | 50,000 | Velocity = 60 | 2
     3 | 1,000,000 | Velocity = 40 | 3
     4 | 30,000,000 | Velocity = 30 | 4
     5 | | Discretionary |
     Duration is measured in service units of CPU usage.
  51. Query Monopolization Solution
     • 50 concurrent users, plus 4 “killer” queries: runaway queries cannot monopolize system resources, because the killer queries are aged into a low-priority class.
     • Period definitions used: 1: 5,000 SUs, velocity 80, importance 2; 2: 50,000 SUs, velocity 60, importance 2; 3: 1,000,000 SUs, velocity 40, importance 3; 4: 10,000,000 SUs, velocity 30, importance 4; 5: discretionary.
     • Chart: throughput and average response times for trivial, small, medium, and large queries change only marginally when the 4 killer queries are added (average response times in seconds: 12 vs. 10 trivial, 174 vs. 181 small, 456 vs. 461 medium, 1,468 vs. 1,726 large).
  52. Consistent Response Time
     Chart: average query elapsed time in seconds for trivial, small, medium, and large queries remains consistent for short-running work as concurrency grows from 20 to 50, 100, and 200 users.
  53. High Priority Queries – Service Classification
     Service class QUERY:
     Period | Duration (SUs) | Performance Goal | Importance
     1 | 5,000 | Velocity = 80 | 1
     2 | 20,000 | Velocity = 70 | 2
     3 | 1,000,000 | Velocity = 50 | 3
     4 | | Velocity = 30 |
     Service class CRITICAL:
     Period | Duration | Performance Goal | Importance
     1 | | Velocity = 90% | 1
     Duration is measured in service units of CPU usage.
  54. Dynamic Warehouses – Service Classification
     • WLM policies BASE and REFRESH both define service class QUERY identically: periods 1–4 with importance 2/3/4/5 and velocity 80/60/40/20, then discretionary.
     • The Insert/UTIL class differs between the policies: importance 1 with velocity 80 in one, importance 5 with velocity 10 in the other, so switching the active WLM policy raises or lowers the priority of the refresh work relative to queries.
  55. Planning for DASD Storage and I/O Bandwidth
     • Rules of thumb (ROTs) offered by IBM include allowances for:
       – Indexes, tables, and free space
       – Work files
       – Active and archive logs
       – DB2 directory and catalog
       – Temporary tables
       – MQTs
     • Balance bandwidth with the available processing power; ROTs are available based on current processor ratings.
  56. Implementation
     • Carefully plan DB2 data set placement: balance I/O activity among different volumes, control units, and channels to minimize I/O elapsed time and I/O queuing.
     • Use DFSMS.
     • Sort work files: large, many, and spread.
     • Dynamic Parallel Access Volumes (PAV): multiple addresses / unit control blocks, versus just multiple paths / channels.
     • Modified Indirect Access Word (MIDAW): increased channel throughput.
     • VSAM data striping.
     Diagram: example of sort workfile distribution with a 4-member data sharing group, spreading each member's workfiles across five volumes.
  57. Balanced I/O Configuration
     • Tests were run to show how fast a zSeries processor can scan data with different degrees of SQL complexity.
       – The last tests were based on z900 processors.
       – System z9 scan rates are projected from the LSPR ratio and will be confirmed from benchmark results (analysis not done yet).
     • IBM Storage provides tools to your IBM team to project the bandwidth of DS8000 subsystems.
     • Balance the CPU and I/O configurations by matching up scan rates.
  58. SMS Implementation
     • Use DB2 storage groups (to define and manage data sets) in conjunction with SMS storage classes/groups (for data set allocation).
     • Spread tables, indexes, and work files.
     • Use simple ACS routines with high-level qualifiers (HLQs) to direct data sets to the appropriate storage class / storage group.
     • Storage groups requiring maximum concurrent bandwidth should consist of addresses spread across the I/O configuration.
     • For storage classes, set the Initial Access Response Time (IART) parameter to non-zero so SMS uses its internal randomization algorithm for final volume selection.
  59. Planning for Central Storage
     Virtual storage (see also the Installation Guide); sizes in KB:
     Category | Your size | Default
     EDM pool storage | ? | 33,600
     Buffer pool | ? | 104,000
     Sort pool | ? | 2,000
     RID pool | ? | 8,000
     Data set control block storage | ? | 17,928
     Code storage | 30,000 | 30,000
     Working storage | ? | 55,800
     Total main storage (above the 16-MB line) | ? | 252,328
     Region size (below the 16-MB line) | | 1,160 (assumes SWA above the line)
     Real storage factors per category: EDM pool 1.0, buffer pool 1.0, sort pool 0.5, RID pool 0.5, data set control block storage 0.6, code storage 0.5, working storage 1.0.
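The real-storage factors in the table above can be turned into an estimate by weighting each category's virtual size by its factor. This formula (size times factor, summed) is an assumption about how the factors are meant to be applied; the sizes are the slide's defaults in KB, and note that they sum to 251,328 KB, slightly below the slide's stated 252,328 total, which presumably includes an item not broken out here.

```python
# Central-storage estimate from the slide's default sizes (KB) and
# "Real" factors.
categories = {
    "EDM pool":                (33_600, 1.0),
    "Buffer pool":             (104_000, 1.0),
    "Sort pool":               (2_000, 0.5),
    "RID pool":                (8_000, 0.5),
    "Data set control blocks": (17_928, 0.6),
    "Code":                    (30_000, 0.5),
    "Working storage":         (55_800, 1.0),
}

virtual_kb = sum(size for size, _ in categories.values())
real_kb = sum(size * factor for size, factor in categories.values())
print(virtual_kb, round(real_kb, 1))
```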
  60. 60. Information Management Large Memory Configurations
   Up to 512 GB of central memory for a single z9 server.
   Benchmark workload testing shows improved performance for certain queries
  – Higher buffer pool hit ratios
  – Reduced number of I/Os
  – Reduced CPU consumption
   Optimal performance requires a good understanding of the query workload and database design 60
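The I/O reduction follows directly from the hit ratio: only getpage requests that miss the buffer pool become disk reads. An illustrative estimate; the getpage count and hit ratios below are assumed values, not benchmark figures from the slide:

```python
# Illustrative only: hypothetical workload numbers, not IBM benchmarks.
def sync_reads(getpages: int, hit_ratio: float) -> int:
    """Pages that miss the buffer pool and must be read from disk."""
    return round(getpages * (1.0 - hit_ratio))

getpages = 10_000_000                       # getpage requests (assumed)
small_pool = sync_reads(getpages, 0.80)     # 80% hit ratio (assumed)
large_pool = sync_reads(getpages, 0.95)     # 95% after adding memory (assumed)

print(f"reads at 80% hit ratio: {small_pool:,}")
print(f"reads at 95% hit ratio: {large_pool:,}")
```

Going from an 80% to a 95% hit ratio cuts misses by a factor of four in this sketch, which is why memory helps only queries whose working set actually fits: a full-table scan much larger than the pool sees little benefit.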
  61. 61. Information Management Noteworthy zPARMs in a Data Warehouse Environment
  zPARM       Recommendation for DWH                       Comments
  CDSSRDEF    ANY                                          Allow parallelism for DW (ANY: parallelism, 1: no parallelism)
  CONTSTOR    NO                                           For best performance, specify NO. To resolve storage constraints in the DBM1 address space, specify YES. See also: MINSTOR
  DSVCI       YES                                          The DB2-managed data set has a VSAM control interval that corresponds to the buffer pool that is used for the table space
  MGEXTSZ     YES                                          Secondary extent allocations for DB2-managed data sets are sized according to a sliding scale
  MINSTOR     NO                                           For best performance, specify NO. To resolve storage constraints in the DBM1 address space, specify YES. See also: CONTSTOR
  OPTCCOS4    ON                                           Enables fix PK26760 (inefficient access plan)
  OPTIXIO     ON                                           Provides stable I/O costing with significantly less sensitivity to buffer pool sizes (the new default and recommended setting)
  OPTIORC     ON                                           explanation???
  PARAMDEG    X, where #Processors <= X <= 2*#Processors   If concurrency level is low, the ratio can be higher
  SRTPOOL     8000                                         Means an 8 MB sort pool
  STARJOIN    DISABLE                                      DISABLE unless SAP BW on z/OS is used
  MXQDC       15 (default)                                 If changed, set MXQDC=TABLES_JOINED_THRESHOLD*(2**N)-1
  61
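The MXQDC sizing rule from the last row is just arithmetic and can be checked directly. A tiny helper; the threshold and N values used below are illustrative, not tuning recommendations:

```python
# The slide's rule: if MXQDC is changed from its default of 15, set it to
#   MXQDC = TABLES_JOINED_THRESHOLD * (2**N) - 1
def mxqdc(tables_joined_threshold: int, n: int) -> int:
    return tables_joined_threshold * (2 ** n) - 1

# 1 * 2**4 - 1 == 15, i.e. the default fits the formula's shape.
print(mxqdc(1, 4))
# Illustrative alternative values (hypothetical, not a recommendation):
print(mxqdc(2, 4))
```

Any value of this form is one less than a multiple of a power of two, which is the shape the rule requires.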
  62. 62. Information Management Why DWH on System z?
   Qualities of Service
  – Superior Quality
  – Super Availability
  – Security and Regulatory Compliance
  – Scalability
  – Backup and recovery
  – zIIP specialty engine improves TCO
  – Better leverage System z skills and investment
   Operational data and the ODS together means
  – Reduced complexity
  – Reduced cost
  – Shared processes, tools, procedures
  – Streamlined compliance and security
   Positioned for the future
  – Web-based applications
  – XML support
  – Service Oriented Architecture (SOA) 62
  63. 63. Information Management Suggested reading list (The Classics)
   DB2 for OS/390 Capacity Planning, SG24-2244
   Capacity Planning for Business Intelligence Applications: Approaches and Methodologies, SG24-5689
   Building VLDB for BI Applications on OS/390: Case Study Experiences, SG24-5609
   Business Intelligence Architecture on S/390 Presentation Guide, SG24-5641
   e-Business Intelligence: Data Mart Solutions with DB2 for Linux on zSeries, SG24-6294
  63
  64. 64. Information Management New (sort of in some cases) Releases
   Best Practices for SAP Business Information Warehouse on DB2 for z/OS V8, SG24-6489
   DB2 UDB for z/OS: Design Guidelines for High Performance and Availability, SG24-7134
   Business Performance Management . . . Meets Business Intelligence, SG24-6340
   Preparing for DB2 Near-Realtime Business Intelligence, SG24-6071
   Disk storage access with DB2 for z/OS, REDP-4187
   How does the MIDAW facility improve the performance of FICON channels using DB2 and other workloads?, REDP-4201
   Index Compression with DB2 9 for z/OS, REDP-4345
   System Programmer’s Guide To: Workload Manager, SG24-6472
  64
  65. 65. Information Management What is DataQuant?
   DataQuant provides a comprehensive query, reporting and data visualization platform for both web and workstation-based environments.
   DataQuant introduces a wide variety of powerful business intelligence capabilities, from executive dashboards and interactive visual applications to information-rich graphical reports and ad hoc querying and data analysis.
   DataQuant provides two components
  – DataQuant for Workstation – An Eclipse-based environment for the development of query, report and dashboard solutions
  – DataQuant for WebSphere – A runtime environment capable of displaying DataQuant content using a “thin client” model 65
  66. 66. Information Management What is Alphablox?
   Platform for customized analytic applications and inline analytics
   Pre-built components (Blox) for analytic functionality
   Allows you to create customized analytic components that are embedded into existing business processes and web applications 66