Dimensional Modeling (2). Gregory Ng, Data Warehouse / Business Intelligence Designer. 17th March 2008
Dimension Model vs. ER Model. ER Model: normalized (up to 6NF) to remove redundancy and update anomalies and to improve integrity. Three major types of relationship: one-to-one, one-to-many and many-to-many. Optimized for INSERT, UPDATE and DELETE operations, i.e. well suited to OLTP applications (a high volume of small transactions). Things to consider: an ER model does not really model a business; rather, it models the micro-relationships among data elements. Query optimization.
Dimension Model vs. ER Model (cont…). Dimension Model: denormalized to 2NF (fewer tables and join paths), which creates redundancy. One major type of relationship: one-to-many. Ideal for SELECT operations. Top-down approach: focuses on the business process. Designed to support analytical queries and user access. Anomalies are handled within the ETL. Predictable SQL. Well suited to OLAP applications.
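To make the contrast concrete, here is a minimal star schema built with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_customer, dim_date) and the sample rows are hypothetical. Each dimension relates to the fact table one-to-many, the region attribute is kept denormalized inside the customer dimension, and the analytical query follows the same join, filter, group shape every time.

```python
import sqlite3

def build_star(conn: sqlite3.Connection) -> None:
    """Create a tiny star schema: one fact table and two dimensions (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key  INTEGER PRIMARY KEY,   -- surrogate key
            customer_name TEXT,
            region        TEXT                   -- denormalized: no separate region table
        );
        CREATE TABLE dim_date (
            date_key   INTEGER PRIMARY KEY,      -- e.g. 20080317
            month_name TEXT,
            year       INTEGER
        );
        CREATE TABLE fact_sales (
            customer_key INTEGER REFERENCES dim_customer(customer_key),
            date_key     INTEGER REFERENCES dim_date(date_key),
            sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?, ?)",
                     [(1, "Bill Owen", "North"), (2, "James Brown", "South")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(20080301, "March", 2008), (20080401, "April", 2008)])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)",
                     [(1, 20080301, 100.0), (1, 20080401, 50.0), (2, 20080301, 70.0)])

# Fact joined to each dimension (one-to-many), then grouped by dimension attributes.
STAR_QUERY = """
    SELECT c.region, d.year, SUM(f.sales_amount) AS total_sales
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date     d ON d.date_key     = f.date_key
    GROUP BY c.region, d.year
    ORDER BY c.region
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in conn.execute(STAR_QUERY):
        print(row)
```

Asking a different question usually means adding or swapping a dimension join and a GROUP BY column, which is what keeps star-schema SQL predictable.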
Case Study 1: Project Writeaway (2009). Database: SQL Server 2000; Reporting: Hyperion IR; Star schemas: 4; No. of records: ~1 mil; Load: complete refresh; Typical report generation time: ~3 seconds; Project build time: 4 months. Highlights: Drill Across, Factless Fact Table, Dimension Outrigger, Dimension Bridging, Junk Dimension.
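The slide does not describe Writeaway's factless fact table, so the sketch below uses a hypothetical class-attendance process to show the technique with Python's sqlite3 module. All table names and rows are illustrative assumptions. A factless fact table holds only dimension foreign keys; each row records that an event happened, and the "measure" is obtained by counting rows.

```python
import sqlite3

def build_attendance(conn: sqlite3.Connection) -> None:
    """Create a hypothetical factless fact table plus two dimensions, with a few sample events."""
    conn.executescript("""
        CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, student_name TEXT);
        CREATE TABLE dim_course  (course_key  INTEGER PRIMARY KEY, course_name  TEXT);
        -- Factless fact table: foreign keys only, no numeric measure column.
        CREATE TABLE fact_attendance (
            date_key    INTEGER,
            student_key INTEGER REFERENCES dim_student(student_key),
            course_key  INTEGER REFERENCES dim_course(course_key)
        );
    """)
    conn.executemany("INSERT INTO dim_student VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
    conn.executemany("INSERT INTO dim_course VALUES (?, ?)", [(1, "Math"), (2, "Science")])
    conn.executemany("INSERT INTO fact_attendance VALUES (?, ?, ?)",
                     [(20080317, 1, 1), (20080317, 2, 1), (20080318, 1, 2)])

# The analysis counts event rows instead of summing a measure.
ATTENDANCE_BY_COURSE = """
    SELECT c.course_name, COUNT(*) AS attendances
    FROM fact_attendance f
    JOIN dim_course c ON c.course_key = f.course_key
    GROUP BY c.course_name
    ORDER BY c.course_name
"""
```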
Case Study 2: Project Absenteeism (2006). Database: SQL Server 2000; Reporting: Cognos; Star schemas: 1; No. of records: ~1 mil; Load: incremental; Typical report generation time: ~25 seconds; Project build time: 4 weeks. Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse.
Case Study 3: Project Mortgage Wealth DNA (2009). Database: Teradata; Reporting: Hyperion IR; Star schemas: 3; No. of records: ~150 mil; Load: incremental; Typical report generation time: ~20-30 seconds; Project build time: 3 months. Highlights: Drill Across, Aggregate Join Index, Partitioning/Multi-Partitioning; 99% of aggregation done on Teradata on the fly, minimising data retrieval.
Case Study 4: Project Commway (2005). Database: SQL Server 2000; Reporting: Cognos; Star schemas: 3; No. of records: ~10 mil; Load: incremental; Typical report generation time: ~30 seconds; Project build time: 18 months. Highlights: Drill Across, Slowly Changing Dimension, Active Data Warehouse, .NET front end for data entry (4000+ users).
Skills we have now: dimensional modeling techniques/templates for different processes and subject areas; practice in applying the appropriate dimensional modeling technique to each scenario: Conformed Dimension, Junk Dimension, Outrigger Dimension, Rapidly Changing Dimension, Dimension Bridging, Degenerate Dimension, Accumulating Snapshot Fact Table, Late-Arriving Fact, Factless Fact Table; refined ETL coding techniques: fact-to-dimension foreign key lookup via natural key, source staging/staging/helper/interim table methodology; Data Warehouse architecture for dimensional modeling; Dimensional Modeling Workshop procedures; ETL mapping documentation; reporting with the dimensional model: multi-pass SQL; practice with star-schema-friendly Teradata features: AJI (Aggregate Join Index), partitioning, multi-partitioning.
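The fact-to-dimension foreign key lookup via natural key can be sketched in plain Python. This assumes the dimension is already loaded and exposes (surrogate key, natural key) pairs; StagedSale, build_key_map, resolve_fact_keys and the -1 "unknown member" row are illustrative conventions, not part of the original design. Misses are routed to the unknown row and reported, so they can be reprocessed (for example as late-arriving dimension members) rather than silently dropped.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

@dataclass(frozen=True)
class StagedSale:
    customer_code: str   # natural (business) key as delivered by the source system
    amount: float

UNKNOWN_KEY = -1  # assumed convention: surrogate key of a reserved "Unknown" dimension row

def build_key_map(dim_rows: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Map natural key -> surrogate key from (surrogate_key, natural_key) dimension rows."""
    return {natural: surrogate for surrogate, natural in dim_rows}

def resolve_fact_keys(staged: Iterable[StagedSale],
                      key_map: Dict[str, int]) -> Tuple[List[Tuple[int, float]], List[str]]:
    """Replace each natural key with its surrogate key; unmatched rows get UNKNOWN_KEY and are reported."""
    facts: List[Tuple[int, float]] = []
    misses: List[str] = []
    for row in staged:
        key = key_map.get(row.customer_code)
        if key is None:
            misses.append(row.customer_code)
            key = UNKNOWN_KEY
        facts.append((key, row.amount))
    return facts, misses
```

Building the map once per load keeps each lookup a dictionary access instead of a per-row query against the dimension table.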
Technologies we have now: state-of-the-art Teradata hardware; GDW in 3NF; Essbase Studio (EIS); DataStage; Oracle Grid (coming online?); OBIEE next.
Shared Dimension (Conformed) and Drill Across. Conformed dimensions make it possible to drill across facts from different business processes.
Shared Dimension (Conformed) and Drill Across (cont…). To produce the following drill-across report:
SELECT Act.Customer, Act.ActualAmount, Fcst.ForecastAmount
FROM
  -- Subquery "Act" returns actuals
  (SELECT Customer, SUM(SalesAmount) AS ActualAmount
   FROM SalesFact JOIN Customer …
   GROUP BY Customer) Act
INNER JOIN
  -- Subquery "Fcst" returns forecasts
  (SELECT Customer, SUM(ForecastAmount) AS ForecastAmount
   FROM ForecastFact JOIN Customer …
   GROUP BY Customer) Fcst
  -- Join the two result sets
  ON Act.Customer = Fcst.Customer AND …
Customer | Actual Amount | Forecast Amount
Bill Owen | $76859 | $75768
James Brown | $63548 | $85676
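The drill-across query can be tried end to end with Python's built-in sqlite3 module. This is a sketch under assumptions: the table and column names (dim_customer, fact_sales, fact_forecast) are hypothetical, and the sample rows were chosen only so that the output reproduces the report on the slide. Each fact is aggregated to the conformed customer grain before the join; joining the two fact tables row by row would inflate one fact's sum whenever the customer has more than one row in the other fact.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dim_customer  (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE fact_sales    (customer_key INTEGER, sales_amount INTEGER);
CREATE TABLE fact_forecast (customer_key INTEGER, forecast_amount INTEGER);
"""

# Aggregate each fact separately to the conformed dimension, then join the two result sets.
DRILL_ACROSS = """
SELECT act.customer_name, act.actual_amount, fcst.forecast_amount
FROM (SELECT c.customer_name, SUM(s.sales_amount) AS actual_amount
      FROM fact_sales s JOIN dim_customer c ON c.customer_key = s.customer_key
      GROUP BY c.customer_name) AS act
INNER JOIN
     (SELECT c.customer_name, SUM(f.forecast_amount) AS forecast_amount
      FROM fact_forecast f JOIN dim_customer c ON c.customer_key = f.customer_key
      GROUP BY c.customer_name) AS fcst
  ON act.customer_name = fcst.customer_name
ORDER BY act.customer_name
"""

def load_sample(conn: sqlite3.Connection) -> None:
    """Load illustrative rows whose totals match the report on the slide."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                     [(1, "Bill Owen"), (2, "James Brown")])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)",
                     [(1, 50000), (1, 26859), (2, 63548)])
    conn.executemany("INSERT INTO fact_forecast VALUES (?, ?)",
                     [(1, 75768), (2, 40000), (2, 45676)])

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_sample(conn)
    for row in conn.execute(DRILL_ACROSS):
        print(row)
```

As on the slide, the INNER JOIN drops customers that appear in only one of the two facts; an outer join would keep them with a missing amount on one side.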
Junk Dimension. Groups flags and indicators into a single dimension. Cleans up a cluttered design that already has too many dimensions. In the example table, 4 indicators are collapsed into a single integer surrogate key in the fact table. Provides a smaller, quicker point of entry for queries (probably less relevant for databases with bitmap indexes, e.g. Oracle). See also: Kimball Design Tip #48: De-Clutter With Junk (Dimensions), http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf
Key | Indicator1 | Indicator2 | Indicator3 | Indicator4
1 | Y | Y | Y | Y
2 | Y | Y | Y | N
3 | Y | Y | N | Y
4 | Y | N | Y | Y
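A junk dimension is usually pre-populated with every combination of its flags. The sketch below does that with itertools.product for the four Y/N indicators on the slide, giving 2^4 = 16 rows. The dictionary representation and the junk_key helper are hypothetical, and keys are assigned in enumeration order, so they need not match the numbering in the example table.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

FLAGS = ("Indicator1", "Indicator2", "Indicator3", "Indicator4")

def build_junk_dimension(values: Sequence[str] = ("Y", "N")) -> Dict[Tuple[str, ...], int]:
    """Enumerate every flag combination up front and give each one an integer surrogate key."""
    combos = product(values, repeat=len(FLAGS))
    return {combo: key for key, combo in enumerate(combos, start=1)}

def junk_key(junk_dim: Dict[Tuple[str, ...], int], **flags: str) -> int:
    """Return the single surrogate key that replaces the four flag columns in a fact row."""
    return junk_dim[tuple(flags[name] for name in FLAGS)]

if __name__ == "__main__":
    dim = build_junk_dimension()
    print(len(dim), junk_key(dim, Indicator1="Y", Indicator2="Y", Indicator3="Y", Indicator4="N"))
```

Because the table is small and fully enumerated, the ETL never has to insert new junk rows during a load; it only looks up the key.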
Accumulating Snapshot Schema. Useful for tracking a multi-step business process: it captures the process history in a single row. Designed to simplify query design and improve query performance.
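The accumulating snapshot pattern can be sketched with a hypothetical order-fulfilment process (ordered, shipped, delivered). The names OrderSnapshot, MILESTONES and apply_event are illustrative assumptions. The key point is that each milestone event updates the existing row for the order instead of appending a new fact row, and lag measures are derived from the milestone dates on that one row.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

@dataclass
class OrderSnapshot:
    """One row per order; milestone dates start empty and are filled in as the process advances."""
    order_id: str
    ordered_on: Optional[date] = None
    shipped_on: Optional[date] = None
    delivered_on: Optional[date] = None

    @staticmethod
    def _lag(start: Optional[date], end: Optional[date]) -> Optional[int]:
        # Lag in days; unknown until both milestones have happened.
        if start is None or end is None:
            return None
        return (end - start).days

    @property
    def days_to_ship(self) -> Optional[int]:
        return self._lag(self.ordered_on, self.shipped_on)

    @property
    def days_to_deliver(self) -> Optional[int]:
        return self._lag(self.ordered_on, self.delivered_on)

# Hypothetical milestone names mapped to the date column each one fills.
MILESTONES = {"ordered": "ordered_on", "shipped": "shipped_on", "delivered": "delivered_on"}

def apply_event(snapshots: Dict[str, OrderSnapshot], order_id: str,
                milestone: str, on: date) -> OrderSnapshot:
    """Update the existing row for this order in place rather than appending a new row."""
    row = snapshots.setdefault(order_id, OrderSnapshot(order_id))
    setattr(row, MILESTONES[milestone], on)
    return row
```

In a warehouse table the same idea becomes an UPDATE of the order's row keyed by the degenerate order number, which is why accumulating snapshots are one of the few fact tables that are revisited after the initial insert.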
Roadmap: conformed dimensions (Product, Department, Date…) with full Slowly Changing Dimension (SCD) capability; best-practice ETL (error handling, batch controls, slowly changing dimension ETL, foreign key lookup, surrogate key assignment, entity start/end date generation, naming standards…); a star schema design review process (we build it and we kill it until it can't be killed!); dimensional modeling training; code generators: DataStage, Oracle Warehouse Builder?
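Slowly changing dimension ETL with surrogate key assignment and entity start/end date generation can be sketched as an SCD Type 2 routine. This assumes a single tracked attribute (segment), a 9999-12-31 "high date" marking the current row, and half-open validity intervals (the old version's end date equals the new version's start date). CustomerVersion and apply_scd2 are hypothetical names; a real implementation would run inside the ETL tool against the dimension table.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

HIGH_DATE = date(9999, 12, 31)  # assumed convention for "still current"

@dataclass
class CustomerVersion:
    surrogate_key: int
    customer_code: str      # natural key
    segment: str            # tracked (Type 2) attribute
    start_date: date
    end_date: date = HIGH_DATE

    @property
    def is_current(self) -> bool:
        return self.end_date == HIGH_DATE

def apply_scd2(dim: List[CustomerVersion], customer_code: str,
               segment: str, effective: date) -> CustomerVersion:
    """Expire the current version if the tracked attribute changed, then add a new version."""
    current: Optional[CustomerVersion] = next(
        (r for r in dim if r.customer_code == customer_code and r.is_current), None)
    if current is not None and current.segment == segment:
        return current                      # no change: keep the existing version
    if current is not None:
        current.end_date = effective        # close the old version (half-open interval)
    new_row = CustomerVersion(
        surrogate_key=max((r.surrogate_key for r in dim), default=0) + 1,  # next surrogate key
        customer_code=customer_code,
        segment=segment,
        start_date=effective,
    )
    dim.append(new_row)
    return new_row
```

Facts loaded after the change look up the new surrogate key, so history recorded against the old version keeps reporting under the old segment.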
Myths busted: "Teradata does not support star schemas"; "Star schemas cannot support large volumes of data". Column-Store vs. Row-Store: "..column-store is able to process column-oriented data so effectively…finding that late materialization improves performance by a factor of three…compression provides about a factor of two on average… "[1]
[1] D. J. Abadi, S. R. Madden, N. Hachem. Column-Stores vs. Row-Stores: How Different Are They Really? In SIGMOD'08.
The road is long but we won't get lost! Books are on the way to our library: The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data (Ralph Kimball); Building the Data Warehouse (William H. Inmon); Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance (Christopher Adamson); The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Ralph Kimball). Online materials (Kimball Group, http://www.kimballgroup.com). Bus Matrix diagram. Some more interesting academic papers/research on my desk!
Bus Matrix