Dimensional Modelling Session 2



  1. Dimensional Modeling (2). Gregory Ng, Data Warehouse / Business Intelligence Designer, 17th March 2008
  2. Dimension Model vs. ER Model
     ER Model:
       • Normalization removes redundancy and anomalies and improves integrity, up to 6NF
       • 3 major types of relationship: one-to-one, one-to-many, many-to-many
       • Optimized for INSERT, UPDATE and DELETE operations
       • i.e. perfect for OLTP applications (high volume of small transactions)
     Things to consider:
       • ER does not really model a business; rather, it models the micro relationships among data elements
       • Query optimization
  3. Dimension Model vs. ER Model (cont…)
     Dimension Model:
       • Denormalized to 2NF (reduces the number of tables and join paths), which creates redundancy
       • 1 major type of relationship: one-to-many
       • Ideal for SELECT operations
       • Top-down approach: focus on the business process
       • Designed to support analytical queries and user access
       • Handles anomalies within ETL
       • Predictable SQL
       • Perfect for OLAP applications
  4. Case Study 1
     Project: Writeaway (2009)
     Database: SQL Server 2000
     Reporting: Hyperion IR
     Star schemas: 4
     No. of records: ~1 mil
     Load: Complete refresh
     Typical report generation time: ~3 seconds
     Project build time: 4 months
     Highlights: Drill Across, Factless Fact Table, Dimension Outrigger, Dimension Bridging, Junk Dimension
  5. Case Study 2
     Project: Absenteeism (2006)
     Database: SQL Server 2000
     Reporting: Cognos
     Star schemas: 1
     No. of records: ~1 mil
     Load: Incremental
     Typical report generation time: ~25 seconds
     Project build time: 4 weeks
     Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse
  6. Case Study 3
     Project: Mortgage Wealth DNA (2009)
     Database: Teradata
     Reporting: Hyperion IR
     Star schemas: 3
     No. of records: ~150 mil
     Load: Incremental
     Typical report generation time: ~20-30 seconds
     Project build time: 3 months
     Highlights: Drill Across, Aggregate Join Index, Partitioning/Multi-Partitioning, 99% of aggregation done on the fly in Teradata to minimise data retrieval
  7. Case Study 4
     Project: Commway (2005)
     Database: SQL Server 2000
     Reporting: Cognos
     Star schemas: 3
     No. of records: ~10 mil
     Load: Incremental
     Typical report generation time: ~30 seconds
     Project build time: 18 months
     Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse, .NET Front End for Data Entry (4000+ Users)
  8. Skills we have now
       • Dimensional modeling techniques/templates for different processes and subject areas
       • Practiced appropriate dimensional modeling techniques in different scenarios: Conformed Dimension, Junk Dimension, Outrigger Dimension, Rapidly Changing Dimension, Dimension Bridging, Degenerate Dimension, Accumulating Snapshot Fact Table, Late-Arriving Fact, Factless Fact Table
       • Refining ETL coding techniques: fact-to-dimension foreign key lookup via natural key, source staging/staging/helper/interim table methodology
       • Data Warehouse architecture for Dimensional Modeling
       • Dimensional Modeling Workshop procedures
       • ETL mapping documentation
       • Reporting with a Dimensional Model: multi-pass SQL
       • Practiced Star Schema friendly Teradata functions: AJI, Partition, Multi-Partitions
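The "fact-to-dimension foreign key lookup via natural key" step listed above can be sketched roughly as follows. This is a minimal illustration, not the team's actual ETL code: the natural keys (`C-100`, `C-200`), the `UNKNOWN_KEY` convention and the function names are assumptions made for the example.

```python
# Hypothetical customer dimension rows: (surrogate_key, natural_key, name).
# The natural keys are invented for illustration.
CUSTOMER_DIM = [
    (1, "C-100", "Bill Owen"),
    (2, "C-200", "James Brown"),
]

# Assumed convention: facts whose dimension member is missing point at -1.
UNKNOWN_KEY = -1


def build_lookup(dim_rows):
    """Map each natural (source system) key to its surrogate key."""
    return {natural: surrogate for surrogate, natural, _name in dim_rows}


def resolve_fact_keys(source_rows, lookup):
    """Swap the natural key on each source row for the dimension surrogate key."""
    resolved = []
    for natural_key, amount in source_rows:
        resolved.append((lookup.get(natural_key, UNKNOWN_KEY), amount))
    return resolved


lookup = build_lookup(CUSTOMER_DIM)
facts = resolve_fact_keys([("C-100", 50.0), ("C-999", 12.5)], lookup)
```

Routing unmatched rows to a placeholder key, rather than dropping them, is one common choice; rejecting them into an error table is another.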
  9. Technologies we have now
       • State-of-the-art Teradata hardware
       • GDW in 3NF
       • Essbase Studio (EIS)
       • DataStage
       • Oracle Grid coming online?
       • OBIEE
  10. Shared Dimension (Conformed) and Drill Across
      Drilling across to the fact tables of different business processes is enabled by conformed dimensions.
  11. Shared Dimension (Conformed) and Drill Across (cont…)
      To produce the following drill across report:

          SELECT Customer, Actual Amount, Forecast Amount
          FROM
            -- Subquery "Act" returns Actuals
            ( SELECT Customer, SUM(Sales Amount) AS Actual Amount
              FROM Sales Fact, Customer JOIN … ) Act
          INNER JOIN
            -- Subquery "Fcst" returns Forecast
            ( SELECT Customer, SUM(Forecast Amount) AS Forecast Amount
              FROM Forecast Fact, Customer JOIN … ) Fcst
            -- Join for the above 2 result sets
          ON Act.Customer = Fcst.Customer AND …

      Customer      Actual Amount   Forecast Amount
      Bill Owen     $76859          $75768
      James Brown   $63548          $85676
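A runnable version of the drill-across pattern above, sketched with Python's built-in `sqlite3` module. The table and column names (`customer_dim`, `sales_fact`, `forecast_fact`, `customer_key`) and the individual fact rows are assumptions made for illustration; only the customer names and the report totals come from the slide. Each fact table is aggregated to the grain of the conformed Customer dimension first, and the two result sets are then joined on the conformed attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE sales_fact (customer_key INTEGER, sales_amount REAL);
CREATE TABLE forecast_fact (customer_key INTEGER, forecast_amount REAL);
-- Illustrative rows only; they are chosen to add up to the slide's totals.
INSERT INTO customer_dim VALUES (1, 'Bill Owen'), (2, 'James Brown');
INSERT INTO sales_fact VALUES (1, 70000), (1, 6859), (2, 63548);
INSERT INTO forecast_fact VALUES (1, 75768), (2, 80000), (2, 5676);
""")

DRILL_ACROSS = """
SELECT act.customer, act.actual_amount, fcst.forecast_amount
FROM (
    -- Pass 1: actuals, aggregated to the customer grain
    SELECT c.customer, SUM(s.sales_amount) AS actual_amount
    FROM sales_fact s
    JOIN customer_dim c ON c.customer_key = s.customer_key
    GROUP BY c.customer
) act
INNER JOIN (
    -- Pass 2: forecast, aggregated to the same grain
    SELECT c.customer, SUM(f.forecast_amount) AS forecast_amount
    FROM forecast_fact f
    JOIN customer_dim c ON c.customer_key = f.customer_key
    GROUP BY c.customer
) fcst ON act.customer = fcst.customer
ORDER BY act.customer
"""

rows = conn.execute(DRILL_ACROSS).fetchall()
```

Aggregating before joining matters: joining the two fact tables directly would multiply rows whenever a customer has more than one row in either table.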
  12. Junk Dimension
        • Groups flags and indicators into one dimension
        • Cleans up a cluttered design that already has too many dimensions
        • 4 indicators (as in the example table below) collapse into a single integer surrogate key in the fact table
        • Provides a smaller, quicker point of entry for queries (probably less relevant for databases with BITMAP indices, e.g. Oracle)
      See also: Kimball Design Tip #48: De-Clutter With Junk (Dimensions) http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf

      Key  Indicator1  Indicator2  Indicator3  Indicator4
      1    Y           Y           Y           Y
      2    Y           Y           Y           N
      3    Y           Y           N           Y
      4    Y           N           Y           Y
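One way to populate such a junk dimension is to enumerate every combination of flag values up front and assign each a surrogate key. The sketch below assumes four Y/N indicators as in the table; the function names are hypothetical, and the key order it produces is arbitrary, so it will not necessarily match the sample rows shown on the slide.

```python
from itertools import product

INDICATORS = ("indicator1", "indicator2", "indicator3", "indicator4")


def build_junk_dimension(values=("Y", "N")):
    """Return {flag combination: surrogate key} for every combination of flags."""
    combos = product(values, repeat=len(INDICATORS))
    return {combo: key for key, combo in enumerate(combos, start=1)}


junk_dim = build_junk_dimension()


def junk_key(flags):
    """Look up the single surrogate key that replaces the four flag columns."""
    return junk_dim[tuple(flags[name] for name in INDICATORS)]
```

With four two-valued flags the dimension has only 16 rows, so pre-building all combinations is cheap; with many flags it may be preferable to insert only the combinations actually observed in the source data.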
  13. Accumulating Snapshot Schema
      Useful for tracking a multi-step business process: the process history is captured in a single row.
      Designed to simplify query design and improve query performance.
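A minimal sketch of the accumulating snapshot idea, using Python's built-in `sqlite3`. The order fulfilment process, the table name and the milestone columns are assumptions chosen for illustration; the slide does not name a specific process. Each order has exactly one fact row, and that row is updated in place as each milestone is reached.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_fulfilment_fact (
        order_id       TEXT PRIMARY KEY,
        ordered_date   TEXT,
        shipped_date   TEXT,   -- NULL until the milestone is reached
        delivered_date TEXT
    )
""")

# Whitelist of milestones and the date column each one fills in.
MILESTONES = {
    "ordered": "ordered_date",
    "shipped": "shipped_date",
    "delivered": "delivered_date",
}


def record_milestone(order_id, milestone, date):
    """Create the row on first sight of the order, then fill in the milestone date."""
    column = MILESTONES[milestone]  # raises KeyError for unknown milestones
    conn.execute(
        "INSERT OR IGNORE INTO order_fulfilment_fact (order_id) VALUES (?)",
        (order_id,),
    )
    conn.execute(
        f"UPDATE order_fulfilment_fact SET {column} = ? WHERE order_id = ?",
        (date, order_id),
    )


record_milestone("SO-1", "ordered", "2008-03-01")
record_milestone("SO-1", "shipped", "2008-03-04")

snapshot = conn.execute("SELECT * FROM order_fulfilment_fact").fetchall()

# Lag between milestones is a single-row calculation, with no self-join needed.
lag = conn.execute(
    "SELECT julianday(shipped_date) - julianday(ordered_date) "
    "FROM order_fulfilment_fact WHERE order_id = 'SO-1'"
).fetchone()[0]
```

Because all milestone dates sit on one row, questions such as "how long between order and shipment?" need no joins across event rows, which is where the query simplicity comes from.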
  14. Roadmap
        • Conformed Dimensions (Product, Department, Date…) with full Slowly Changing Dimension (SCD) capability
        • Best practice ETL (error handling, batch controls, slowly changing dimension ETL, foreign key lookup, assigning surrogate keys, entity start/end date generation, naming standards…)
        • Star Schema design review process (we build it and we kill it until it can’t be killed!)
        • Dimensional Modeling training
        • Code generator: DataStage, Oracle Warehouse Builder?
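Several of the ETL items above (slowly changing dimension handling, surrogate key assignment, entity start/end date generation) come together in a Type 2 slowly changing dimension load. The sketch below assumes Type 2 (row versioning), which the slides do not specify, and uses a hypothetical Product dimension with a single tracked attribute; all names are illustrative.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ProductRow:
    surrogate_key: int
    natural_key: str
    department: str
    start_date: date
    end_date: Optional[date] = None  # None marks the current version


def apply_scd2(dim: List[ProductRow], natural_key: str,
               department: str, as_of: date) -> ProductRow:
    """Expire the current row if the attribute changed, then add a new version."""
    current = next(
        (r for r in dim if r.natural_key == natural_key and r.end_date is None),
        None,
    )
    if current is not None and current.department == department:
        return current  # attribute unchanged: keep the existing version
    if current is not None:
        current.end_date = as_of  # close off the old version
    new_row = ProductRow(
        surrogate_key=max((r.surrogate_key for r in dim), default=0) + 1,
        natural_key=natural_key,
        department=department,
        start_date=as_of,
    )
    dim.append(new_row)
    return new_row


product_dim: List[ProductRow] = []
apply_scd2(product_dim, "P-1", "Toys", date(2008, 1, 1))
apply_scd2(product_dim, "P-1", "Games", date(2008, 3, 17))  # change: new version
apply_scd2(product_dim, "P-1", "Games", date(2008, 4, 1))   # no change: no new row
```

Facts loaded after the change pick up surrogate key 2, while older facts keep pointing at key 1, so history is preserved without touching the fact table.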
  15. Myth busted
        • Teradata does not support Star Schema
        • Star Schema cannot support large volumes of data
        • Column-Store vs. Row-Store: “..column-store is able to process column-oriented data so effectively…finding that late materialization improves performance by a factor of three…compression provides about a factor of two on average…” [1]
      [1] D. J. Abadi, S. R. Madden, N. Hachem. Column-Stores vs. Row-Stores: How Different Are They Really? In SIGMOD’08.
  16. The road is long but we won’t get lost!
        • Books are on the way to our library!
            • The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data (Ralph Kimball)
            • Building the Data Warehouse (William E. Inmon)
            • Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance (Christopher Adamson)
            • The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Ralph Kimball)
        • Online materials (Kimball Group http://www.kimballgroup.com )
        • Bus Matrix Diagram
        • Some more interesting academic papers/research on my desk!
  17. Bus Matrix
  18. Previous presentations