  • Traditional warehousing focused on query and reporting to understand what happened, and evolved to enable OLAP and data mining to understand why those things happened and to recommend future action. Dynamic warehousing is a new approach to the primary business challenge organizations face today: delivering the right information to the right people at the right time, so that information is used more effectively and supports better business decisions. It's about information on demand to optimize real-time processes. Dynamic warehousing requires four key things: 1. Support for real-time access to aggregated, cleansed information, which can be delivered in the context of the activities and processes being performed; 2. Analytics that can be leveraged as part of a business process; 3. The ability to incorporate knowledge from unstructured information; and 4. A complete set of integrated capabilities that extend beyond the warehouse to enable Information on Demand. The distinction between data warehousing and online transaction processing is blurring. Data warehousing and analytic applications are accessing operational or near-real-time data. Transactions have become more complex to provide better interaction and productivity for people. Dynamic warehousing has capabilities and strengths on all IBM platforms. The traditional mainframe strengths of consistency with operational data, high security, and continuous availability match well with dynamic warehousing.
  • Spot the alteration
  • SMB customers are no different from enterprise-class customers concerning BI. They need the same kinds of applications and buy the same products and services. The only difference is that they are less likely to take on risky projects, start-up companies as partners, or new software releases. That is, they have a low tolerance for failure: it has to work the first time, every time. Each of the phrases shown identifies an application area that produces value for IBM clients. D Graham
  • Customers may use DW or BI to describe any one of a number of “solutions”. Customers' needs range from simple reporting off one data source to sophisticated analyses from multiple sources. BI or DW is likely part of a larger “solution”: Business Performance Management, Risk and Compliance, etc.
  • Dimension: describes business data elements (Product, Market, Time, etc.). A structural attribute of a cube, which is an organized hierarchy of categories (levels) that describe data in the fact table. These categories typically describe a similar set of members upon which the user wants to base an analysis. For example, a geography dimension might include levels for Country, Region, State or Province, and City. Members: dimension instances (West, LA, OR, WA, Qtr1, Jan, Feb, Mar, etc.). Measures: a dimension that contains the numeric values of facts (Sales, Profit, Margin, etc.). Slice: a subset of dimensions and members. Cell: a unit of data representing the intersection of dimensions in a cube.
  • Dimensions often contain hierarchical relationships, for example Year -> Quarter -> Month. Hierarchies define the navigation paths for the dimension, and usually the aggregation paths as well. Normalizing redundant attribute information results in a snowflake schema (not recommended). A dimension will often have more than one hierarchy.
  • Balanced: a hierarchy with meaningful levels and consistent depth; each member's parent is in the level directly above it. Example: Time. Unbalanced: parent-child relationship between levels; inconsistent semantic meaning for levels; non-uniform depth. Example: organizational chart. Ragged: consistent level semantics with inconsistent depths; some leaf members have gaps in the ancestor levels. Example: Geography. Network: no semantic meaning to a specific level order; no parent-child semantics; defines navigation paths but does not define aggregation paths. Example: Demographics.
  • Warehouse schema (Data Model) Logical OLAP Model Physical Model – mapping of logical OLAP model to the physical data model (schema)
  • Beginning with InfoSphere Warehouse 9.5.2, Cubing Services cubes are first-class data providers to Cognos 8 BI, enabling the full suite of Cognos clients and applications to take advantage of these powerful warehouse-based data cubes, including but not limited to: Query and Reporting Studios; Analysis Studio, for ad hoc analysis; Dashboarding and Scorecarding; and Cognos Analytics for Excel (Cafe). In addition to Cognos, the following clients can also natively access Cubing Services cubes: DB2 Query Management Facility (QMF) Enterprise Edition V9 Fix Pack 8 (XMLA interface); IBM DataQuant for Multiplatforms V1.2 Fix Pack 4 (XMLA interface); IBM DataQuant for z/OS V1.2 Fix Pack 4 (XMLA interface); Microsoft Excel (ODBO interface); Cubeware Cockpit (ODBO interface). Any tool that supports the standard ODBO or XMLA interfaces can natively access Cubing Services cubes. IBM Alphablox can also natively access Cubing Services cubes. ODBO = OLE DB for OLAP; XMLA = XML for Analysis.

    1. 1. OnLine Analytical Processing (OLAP) Andy Perkins zWarehouse SWAT Team [email_address]
    2. 2. The right information To the right people At the right time
    3. 3. Reporting grows up.. Historical Reports Query & Reporting to Understand What Happened Operational Reports Transaction Systems to understand what is happening in the business RIGHT NOW OLAP & Data Mining to Understand Why and Recommend Future Action Information Analysis
    4. 4. … to become Business Intelligence <ul><li>Simple reporting no longer sufficient </li></ul><ul><li>Business Intelligence : the process of gathering, consolidating, and analyzing data from multiple sources for strategic and tactical decision making. </li></ul><ul><ul><li>derives new value from transactional data </li></ul></ul><ul><ul><li>supports strategic planning, monitoring, and efficiency </li></ul></ul><ul><ul><li>delivers knowledge of the customer, suppliers, and channels </li></ul></ul><ul><ul><li>unifies the enterprise with actionable information for operational Business Intelligence </li></ul></ul><ul><li>Top quality BI relies on a secure, high performing, warehouse oriented infrastructure to deliver Information on Demand—based on open standards </li></ul>Analysis
    5. 5. Examples of Business Intelligence <ul><li>Financial Analytics </li></ul><ul><ul><li>Financial consolidation </li></ul></ul><ul><ul><li>Business Performance Monitoring (BPM) </li></ul></ul><ul><ul><li>Balanced Scorecards </li></ul></ul><ul><ul><li>ERP reporting </li></ul></ul><ul><li>CRM Analytics </li></ul><ul><ul><li>Customer segmentation </li></ul></ul><ul><ul><li>Customer acquisition & retention </li></ul></ul><ul><ul><li>Profitability analysis </li></ul></ul><ul><ul><li>Campaign management </li></ul></ul><ul><ul><li>Market basket analysis </li></ul></ul><ul><li>Other Analytics </li></ul><ul><ul><li>Demand Planning </li></ul></ul><ul><ul><li>Pricing elasticity analysis </li></ul></ul><ul><ul><li>Risk analysis </li></ul></ul><ul><ul><li>Inventory Forecasting </li></ul></ul><ul><ul><li>Supply chain forecasting </li></ul></ul><ul><ul><li>Supplier scorecards </li></ul></ul><ul><ul><li>Workforce analysis </li></ul></ul><ul><ul><li>Logistics trend analysis </li></ul></ul><ul><ul><li>Procurement analysis </li></ul></ul><ul><ul><li>Category management </li></ul></ul>DB2 Performance And Usage
    6. 6. Business Intelligence requires a good foundation… <ul><li>Business Intelligence (BI) and Data Warehousing (DW) are sometimes used interchangeably </li></ul><ul><ul><li>Typically BI includes end user tools for query, reporting, analysis, dashboarding etc. </li></ul></ul><ul><ul><li>Includes advanced analytics such as OnLine Analytical Processing (OLAP) and data mining </li></ul></ul><ul><ul><li>Both concepts depend on each other </li></ul></ul><ul><ul><ul><li>BI almost always assumes a Warehouse (WH), Operational Data Store (ODS) or Data Mart (DM) exists with timely, trusted information </li></ul></ul></ul><ul><ul><ul><li>A DW depends on end user tools that turn data into information. </li></ul></ul></ul><ul><li>Both terms (DW and BI) address the desire for timely, accurate, available data delivered when, where and how the end users want it </li></ul><ul><li>NEW TERM: Operational Intelligence or Operational BI </li></ul>
    7. 7. Multidimensional Reporting/Analysis <ul><li>A style of viewing information from various perspectives and aggregation levels over time </li></ul><ul><li>Start at a high level for seeing trends and finding outliers </li></ul><ul><ul><li>Sales vs Costs by Region by Product Category by Quarter for the last 5 quarters </li></ul></ul><ul><li>Drill down for more detail </li></ul><ul><ul><li>Sales vs Costs by Stores in the South Region by Product Category by Month for the last 2 quarters </li></ul></ul><ul><li>Change perspective and filter </li></ul><ul><ul><li>Sales vs Costs by Sales Person by Product Category by Month for the last 2 quarters for Store 25 </li></ul></ul>Not dependent on any particular technology Can be accomplished by iteratively requesting batch reports
    8. 8. OnLine Analytical Processing (OLAP) <ul><li>Interactive multidimensional analysis at the “speed of thought” </li></ul><ul><li>Great Calculations </li></ul><ul><ul><li>Simple aggregations: sums, averages </li></ul></ul><ul><ul><li>Time based calculations – 3 month moving averages </li></ul></ul><ul><ul><li>Multi-pass calculations – rank, percentage of total </li></ul></ul><ul><li>Aggregation </li></ul><ul><ul><li>Express queries in terms of dimensions </li></ul></ul><ul><ul><li>Aggregate using dimension hierarchies </li></ul></ul><ul><ul><li>Identify key indicators using business terminology </li></ul></ul><ul><li>Navigation </li></ul><ul><li>Dimensions: Product, Geography, Time </li></ul><ul><li>Dimensions have attributes: Products have colors, sizes, price ranges </li></ul><ul><li>Dimensions have hierarchical levels: Region->State->City </li></ul>
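The calculation styles named on this slide (time-based moving averages and multi-pass calculations such as rank and percentage of total) can be sketched in a few lines of Python. This is only an illustration: the function names and the sales figures are hypothetical and do not come from the presentation or from any OLAP product.

```python
# Hypothetical monthly sales for one product; the figures are made up.
monthly_sales = [("Jan", 100.0), ("Feb", 120.0), ("Mar", 140.0),
                 ("Apr", 160.0), ("May", 180.0)]

# Hypothetical sales by region, used for the multi-pass calculations.
region_sales = [("West", 300.0), ("East", 500.0), ("South", 200.0)]


def moving_average(series, window=3):
    """Time-based calculation: trailing average over `window` periods."""
    result = []
    for i in range(window - 1, len(series)):
        values = [value for _, value in series[i - window + 1:i + 1]]
        result.append((series[i][0], sum(values) / window))
    return result


def percent_of_total(rows):
    """Multi-pass calculation: pass 1 totals, pass 2 computes each share."""
    total = sum(value for _, value in rows)
    return [(key, 100.0 * value / total) for key, value in rows]


def rank(rows):
    """Multi-pass calculation: sort by value, then number the positions."""
    ordered = sorted(rows, key=lambda kv: kv[1], reverse=True)
    return [(key, position + 1) for position, (key, _) in enumerate(ordered)]


if __name__ == "__main__":
    print(moving_average(monthly_sales))
    print(percent_of_total(region_sales))
    print(rank(region_sales))
```

Rank and percentage of total are called multi-pass because each result depends on a value (the grand total, or the sorted order) that is only known after a first pass over all rows.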
    9. 9. General OLAP Architecture Data Warehouse Multidimensional Server Excel Report Server SQL MDX Web Server MDX Modeling and Admin Tooling
    10. 10. Desktop OLAP (DOLAP) Data Warehouse Multidimensional Server Client Desktop/Laptop SQL Extract
    11. 11. Multidimensional Storage OLAP (MOLAP) Data Warehouse Multidimensional Server Excel Report Server MDX Web Server MDX Modeling and Admin Tooling Extract
    12. 12. Relational OLAP (ROLAP) Data Warehouse Multidimensional Metadata Excel Report Server Web Server Modeling and Admin Tooling SQL
    13. 13. Hybrid OLAP (HOLAP) Data Warehouse Multidimensional Server Excel Report Server MDX Web Server MDX Modeling and Admin Tooling SQL
    14. 14. Data Modeling <ul><li>Star Schema model is predominant modeling style for the relational database </li></ul>Fact table(s) Store values Dept Dimension Time Dimension Account Dimension Project Dimension XYZ Dimension <ul><li>Dimension Tables </li></ul><ul><ul><li>Define the categories that organize the analyzed metrics </li></ul></ul><ul><ul><li>E.g., Stores, Time, Customer </li></ul></ul><ul><ul><li>Contain everything about that category that the business analysis might need (attributes) </li></ul></ul><ul><ul><li>Primary key identifies a single member at the lowest level of grain . </li></ul></ul><ul><li>Fact Tables </li></ul><ul><ul><li>Contain all the metrics (measures) for the business analysis </li></ul></ul><ul><ul><li>At the same grain as the dimension tables* </li></ul></ul><ul><ul><li>Foreign keys join back to the dimension tables to enable grouping and aggregating. </li></ul></ul>
    15. 15. Sample Star Schema
    16. 16. Snowflake Schema
    17. 17. Advantages of Star and Snowflake Schemas <ul><li>Reflect the dimensional nature of the business and the business questions </li></ul><ul><ul><li>SQL query is (longer but) very similar to the business question: </li></ul></ul>What were sales of shoes in Q1 by region? SELECT SUM(Fact.Sales), Store.Region FROM Fact, Store, Product, Time WHERE Fact.time_id = Time.time_id AND Fact.prod_id = Product.prod_id AND Fact.store_id = Store.store_id AND Time.Qtr = 'Q1' AND Product.product = 'Shoes' GROUP BY Store.Region For the Snowflake, we would simply see more joins to reach the table that has the granularity of the business question.
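To make the star-join query on this slide concrete, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. SQLite stands in for DB2 purely for illustration; the table names (`sales_fact`, `store_dim`, `product_dim`, `time_dim`) and all row values are assumptions, not part of the original deck.

```python
import sqlite3


def q1_shoe_sales_by_region():
    """Build a tiny star schema and answer: sales of shoes in Q1 by region."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE store_dim   (store_id INTEGER PRIMARY KEY, region  TEXT);
        CREATE TABLE product_dim (prod_id  INTEGER PRIMARY KEY, product TEXT);
        CREATE TABLE time_dim    (time_id  INTEGER PRIMARY KEY, qtr     TEXT);
        CREATE TABLE sales_fact  (time_id INTEGER, prod_id INTEGER,
                                  store_id INTEGER, sales REAL);

        INSERT INTO store_dim   VALUES (1, 'West'), (2, 'East');
        INSERT INTO product_dim VALUES (1, 'Shoes'), (2, 'Hats');
        INSERT INTO time_dim    VALUES (1, 'Q1'), (2, 'Q2');

        -- Hypothetical fact rows: (time_id, prod_id, store_id, sales)
        INSERT INTO sales_fact VALUES
            (1, 1, 1, 100.0),   -- Q1, Shoes, West
            (1, 1, 1,  25.0),   -- Q1, Shoes, West
            (1, 1, 2,  50.0),   -- Q1, Shoes, East
            (1, 2, 1, 999.0),   -- Q1, Hats,  West (filtered out)
            (2, 1, 1,  70.0);   -- Q2, Shoes, West (filtered out)
    """)
    rows = conn.execute("""
        SELECT s.region, SUM(f.sales)
        FROM sales_fact f
        JOIN time_dim    t ON f.time_id  = t.time_id
        JOIN product_dim p ON f.prod_id  = p.prod_id
        JOIN store_dim   s ON f.store_id = s.store_id
        WHERE t.qtr = 'Q1' AND p.product = 'Shoes'
        GROUP BY s.region
        ORDER BY s.region
    """).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for region, total in q1_shoe_sales_by_region():
        print(region, total)
```

The sketch uses explicit `JOIN ... ON` syntax rather than the comma-separated form on the slide; both express the same joins from the fact table to each dimension table.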
    18. 18. Advantages of Star and Snowflake Schemas <ul><li>SQL is straightforward for a tool to generate </li></ul><ul><li>Denormalized for faster reads </li></ul><ul><li>Optimized for n-way joins on the fact table </li></ul><ul><ul><li>With good RI (enforced or informational) the DB2 optimizer can do a good job with star joins </li></ul></ul><ul><li>Optimized for aggregations on the dimensional hierarchies </li></ul><ul><ul><li>Advisors and MQTs can help materialize aggregations </li></ul></ul><ul><li>Column calculations on the same row are efficient </li></ul><ul><ul><li>E.g., Profit = sales_col - COGS_col </li></ul></ul>
    19. 19. What is an OLAP Cube? [Figure: cube diagram with callouts for Dimension, Members, Measures, Slice, Dice, and Cell. Dimensions and members shown: Product (PRODUCT, Group1, Prod11 to Prod14), Time (Time, Qtr1, Jan, Feb, Mar), Market (MARKET, WEST, WA, OR, LA). Measure: Sales.]
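The cube terms on this slide (dimension, member, measure, cell, slice) can be illustrated with a small dictionary-based sketch. It borrows member names from the slide (Prod11, Jan, WA, OR), but the Sales values, the `DIMENSIONS` tuple, and the helper functions are hypothetical and are not how any particular cube server stores data.

```python
# Hypothetical cube: coordinates are (product, time, market); the value is Sales.
DIMENSIONS = ("product", "time", "market")

cube = {
    ("Prod11", "Jan", "WA"): 10.0,
    ("Prod11", "Feb", "WA"): 12.0,
    ("Prod12", "Jan", "WA"): 7.0,
    ("Prod11", "Jan", "OR"): 5.0,
    ("Prod12", "Jan", "OR"): 3.0,
}


def cell(cube, product, time, market):
    """A cell is the measure value at the intersection of one member per dimension."""
    return cube[(product, time, market)]


def slice_cube(cube, **fixed):
    """A slice fixes one or more dimensions to a member and keeps the rest free."""
    positions = {DIMENSIONS.index(name): member for name, member in fixed.items()}
    return {
        coords: value
        for coords, value in cube.items()
        if all(coords[i] == member for i, member in positions.items())
    }


if __name__ == "__main__":
    print(cell(cube, "Prod11", "Jan", "WA"))
    print(slice_cube(cube, market="WA"))
    print(slice_cube(cube, time="Jan", market="OR"))
```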
    20. 20. Levels <ul><li>Defines the “Resolution” or Granularity of the Dimension. </li></ul><ul><li>Consists of </li></ul><ul><ul><li>Level Key Attribute(s) </li></ul></ul><ul><ul><li>Default Attribute </li></ul></ul><ul><ul><li>Ordering Attribute(s) </li></ul></ul><ul><ul><li>Related Attribute(s) </li></ul></ul><ul><li>The Level Key uniquely identifies every member of the level. </li></ul>
    21. 21. Hierarchies <ul><li>Ordered Collection of Levels </li></ul><ul><li>Defines Navigation and Aggregation Paths </li></ul><ul><ul><li>Month -> Quarter -> Year </li></ul></ul><ul><ul><li>Week -> Year </li></ul></ul><ul><li>Various Types and Deployments </li></ul>2004 Qtr1 Jan Feb Mar Qtr2 Apr
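As a rough illustration of a hierarchy acting as an aggregation path, the sketch below rolls leaf-level month values up through Quarter to Year, using the members shown on the slide (2004, Qtr1, Qtr2, Jan to Apr). The `PARENT` mapping, the `roll_up` helper, and the sales figures are assumptions made for the example.

```python
# Child -> parent links for the Month -> Quarter -> Year hierarchy on the slide.
PARENT = {
    "Jan": "Qtr1", "Feb": "Qtr1", "Mar": "Qtr1",
    "Apr": "Qtr2",
    "Qtr1": "2004", "Qtr2": "2004",
}


def roll_up(leaf_values, parent):
    """Add each leaf value to every ancestor along the hierarchy."""
    totals = dict(leaf_values)
    for member, value in leaf_values.items():
        node = parent.get(member)
        while node is not None:
            totals[node] = totals.get(node, 0.0) + value
            node = parent.get(node)
    return totals


if __name__ == "__main__":
    # Hypothetical monthly sales.
    sales = {"Jan": 10.0, "Feb": 20.0, "Mar": 30.0, "Apr": 40.0}
    print(roll_up(sales, PARENT))
```

A second hierarchy on the same dimension (such as Week -> Year) would simply be another parent mapping over the same leaf data.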
    22. 22. Hierarchy Types Balanced Unbalanced Ragged Network
    23. 23. Issues of Measures <ul><li>Measure definitions </li></ul><ul><ul><li>Can be a simple mapping to a fact table column or calculated. </li></ul></ul><ul><ul><li>Calculated measures based on fact columns can and will be represented in the MQTs </li></ul></ul><ul><ul><li>Calculated measures defined by MDX statements will not be calculated in the MQT, which has performance implications. </li></ul></ul><ul><li>Aggregation functions define how the measures will be summarized up the hierarchy </li></ul><ul><ul><li>Defined: Calculate the values then aggregate the results. </li></ul></ul><ul><ul><li>None: Aggregate the inputs, then calculate the aggregates. </li></ul></ul><ul><ul><li>The order of aggregation to calculation is extremely important for non-additive functions. </li></ul></ul>
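Why the order of aggregation and calculation matters for non-additive measures is easiest to see with a ratio. The sketch below uses a hypothetical margin measure (profit / sales) over two made-up fact rows and compares the two orders described on the slide. The function names are illustrative only, and averaging is just one possible aggregation function.

```python
# Hypothetical fact rows: (sales, profit).
rows = [(100.0, 50.0), (300.0, 30.0)]


def calculate_then_aggregate(rows):
    """Compute the margin per row first, then aggregate (average) the results."""
    margins = [profit / sales for sales, profit in rows]
    return sum(margins) / len(margins)


def aggregate_then_calculate(rows):
    """Aggregate (sum) the inputs first, then compute the margin once."""
    total_sales = sum(sales for sales, _ in rows)
    total_profit = sum(profit for _, profit in rows)
    return total_profit / total_sales


if __name__ == "__main__":
    # The two orders disagree because a ratio is not additive.
    print(calculate_then_aggregate(rows))
    print(aggregate_then_calculate(rows))
```

For an additive measure such as Sales the two orders give the same total, which is why the distinction only becomes visible with ratios, percentages, and similar calculations.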
    24. 24. Cube Model Relational tables in DB2 fact table dimension tables dimension tables Join Attribute Attribute Join Hierarchy Measure Facts Dimension Cube Model Measure Level Join Attribute Cube dimension Cube Facts Cube hierarchy Cube Level Cube
    25. 25. InfoSphere Warehouse on System z <ul><li>Development tooling IDE - Design Studio </li></ul><ul><ul><li>Data Warehouse and OLAP tooling over DB2 for System z </li></ul></ul><ul><ul><li>Physical data modeling for relational tables </li></ul></ul><ul><ul><li>DB2-based data movement and transformation (SQW) </li></ul></ul><ul><ul><li>OLAP Modeling </li></ul></ul><ul><li>Runtime tooling </li></ul><ul><ul><li>OLAP Cube Server </li></ul></ul><ul><ul><li>Data movement and transformation runtime services (SQW) </li></ul></ul><ul><ul><li>Web-based Administration Console </li></ul></ul>
    26. 26. OLAP on DB2 for System z InfoSphere Warehouse IBM Cognos 8 BI IBM DataQuant, DB2 QMF, IBM Alphablox Microsoft Excel Universal Cube Access (MDX, ODBO, XMLA) Portals, Web Applications, Dashboards, Interactive Reports, Ad Hoc Analysis, Common Desktop Tools
    27. 27. InfoSphere Warehouse – Cubing Services InfoSphere Warehouse DB2 for z/OS Cube Server Excel Cognos 8 BI Server Linux LPAR Metadata and Data Cache SQL MDX Web Server MDX Design Studio & Admin Console Linux LPAR
    28. 28. Cube Server in Action – Startup Cubing Services DB2 Start Cube OLAP Metadata OLAP Metadata MQTs. SQL Dim Member Cache
    29. 29. Cube Server in Action – Query Processing Cubing Services DB2 MDX Query MDX calculation engine Data cache Can pre-populate cache with an MDX seed query OLAP Metadata OLAP Metadata MQTs SQL MDX Dim Member Cache
    30. 30. Develop Star Schema model and create tables
    31. 31. Populate Dimension and Fact Tables Populate Group dimension table Populate Fact table
    32. 32. Create Cube Model and Cube definition Measures Facts Dimensions Levels Hierarchies Cubes Facts Subset Dim Subset Cube Model - Superset Cube - Subset
    33. 33. Deploy model/cubes <ul><li>Deploy Cube </li></ul><ul><ul><li>Moves the “definition” of the cubes to the runtime environment, i.e., the Cube Server </li></ul></ul><ul><ul><li>Step 1 – use Design Studio to deploy to the metadata repository </li></ul></ul><ul><ul><li>Step 2 – use Administration Console to define and start a Cube Server </li></ul></ul><ul><ul><li>Step 3 – assign a Cube to the Cube Server and start it </li></ul></ul>
    34. 34. Optimize Cube <ul><li>Optimization of a cube means to optimize the cube’s access to DB2 </li></ul><ul><ul><li>This is accomplished by means of defining a performance layer of MQTs </li></ul></ul><ul><ul><li>The Optimization Wizard creates a recommended set of MQTs based on the Cube Model and sampling of data </li></ul></ul>
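The idea behind a performance layer of MQTs is that higher-level aggregates are precomputed so that queries read far fewer rows than the fact table holds. The sketch below illustrates only that idea, using SQLite from Python. SQLite has no materialized query tables and no optimizer rerouting, so an ordinary summary table (`sales_by_region_month`, a hypothetical name) stands in for an MQT and the query is pointed at it by hand. All table names and data are assumptions.

```python
import sqlite3


def build_demo():
    """Create a tiny fact table plus a precomputed summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE sales_fact (region TEXT, month TEXT, sales REAL);
        INSERT INTO sales_fact VALUES
            ('West', 'Jan', 10.0),
            ('West', 'Jan',  5.0),
            ('West', 'Feb', 20.0),
            ('East', 'Jan',  7.0);

        -- Stand-in for an MQT: aggregate stored at (region, month) grain.
        CREATE TABLE sales_by_region_month AS
            SELECT region, month, SUM(sales) AS sales
            FROM sales_fact
            GROUP BY region, month;
    """)
    return conn


def totals_from_fact(conn):
    """Answer 'sales by region' from the detailed fact rows."""
    return conn.execute(
        "SELECT region, SUM(sales) FROM sales_fact "
        "GROUP BY region ORDER BY region"
    ).fetchall()


def totals_from_summary(conn):
    """Answer the same question from the smaller precomputed table."""
    return conn.execute(
        "SELECT region, SUM(sales) FROM sales_by_region_month "
        "GROUP BY region ORDER BY region"
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo()
    print(totals_from_fact(conn))
    print(totals_from_summary(conn))
```

Both queries return the same totals because SUM is additive, so a (region, month) summary can be re-aggregated to the region level.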
    35. 35. Cube Performance Statistics <ul><li>From cube server performance log </li></ul><ul><li>Captured 54 queries from demo prep </li></ul><ul><li>Cube server started at 6:21am PST </li></ul><ul><li>Last MDX request at 8:48am PST </li></ul><ul><li>Queries < 1 sec </li></ul><ul><ul><li>42 total queries </li></ul></ul><ul><ul><li>33 queries satisfied from cube cache </li></ul></ul><ul><ul><li>9 queries went back to DB2 </li></ul></ul><ul><ul><li>Probably routed to MQTs </li></ul></ul><ul><li>Queries 1-10 secs </li></ul><ul><ul><li>8 total queries </li></ul></ul><ul><ul><li>8 queries went back to DB2 </li></ul></ul><ul><ul><li>Probably hit MQTs </li></ul></ul><ul><li>Queries 10-20 secs </li></ul><ul><ul><li>2 total queries </li></ul></ul><ul><ul><li>2 queries went back to DB2 </li></ul></ul><ul><ul><li>Possibly hit MQTs </li></ul></ul><ul><li>Long Queries – Opportunity for adding MQTs or database tuning </li></ul><ul><ul><li>1 at 120.81 secs – 120.30 seconds in DB2 </li></ul></ul><ul><ul><li>1 at 143.70 secs – 143.70 seconds in DB2 </li></ul></ul>Using only the initial MQT recommendation of 20 small MQTs. Started with a cold cache. When data was in cache, time to satisfy a query was mostly < .010 sec. Fact table size in DB2 ~ 2M rows.
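The bucket counts on this slide can be tallied to check that they add up to the 54 captured queries and to derive an overall cache hit ratio. The sketch below uses only the counts from the slide; the dictionary layout is an assumption, and the zero cache counts for the slower buckets are inferred from the slide stating that all of those queries went back to DB2.

```python
# Query counts per response-time bucket, as reported on the slide.
buckets = {
    "< 1 sec":    {"total": 42, "from_cache": 33, "to_db2": 9},
    "1-10 secs":  {"total": 8,  "from_cache": 0,  "to_db2": 8},
    "10-20 secs": {"total": 2,  "from_cache": 0,  "to_db2": 2},
    # The two long queries spent nearly all their time in DB2.
    "long":       {"total": 2,  "from_cache": 0,  "to_db2": 2},
}


def summarize(buckets):
    """Return (total queries, queries sent to DB2, cache hit ratio)."""
    total = sum(b["total"] for b in buckets.values())
    from_cache = sum(b["from_cache"] for b in buckets.values())
    to_db2 = sum(b["to_db2"] for b in buckets.values())
    return total, to_db2, from_cache / total


if __name__ == "__main__":
    total, to_db2, hit_ratio = summarize(buckets)
    print(f"{total} queries, {to_db2} went to DB2, "
          f"cache hit ratio {hit_ratio:.1%}")
```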
    36. 36. Further Reading <ul><li>Books </li></ul><ul><ul><li>Data Warehouse – from Architecture to Implementation by Barry Devlin </li></ul></ul><ul><ul><li>Building the Data Warehouse, 4th Edition - by W. H. (Bill) Inmon </li></ul></ul><ul><ul><li>The Data Warehouse Toolkit, by Ralph Kimball </li></ul></ul><ul><li>IBM Redbooks ( </li></ul><ul><ul><li>Dimensional Modeling: In a Business Intelligence Environment (SG24-7138) </li></ul></ul><ul><ul><li>Enterprise Data Warehousing with DB2 9 for z/OS (SG24-7637) </li></ul></ul><ul><ul><li>InfoSphere Warehouse: Cubing Services and Client Access Interfaces (SG24-7582) </li></ul></ul><ul><li>Websites </li></ul><ul><ul><li>International DB2 Users Group – </li></ul></ul><ul><ul><li>The Data Warehousing Institute – </li></ul></ul><ul><ul><li>BeyeNetwork - </li></ul></ul><ul><ul><li>IBM – </li></ul></ul>