Dimensional Modelling Session 2
Transcript

  • 1. Dimensional Modeling (2) Gregory Ng Data Warehouse / Business Intelligence Designer 17th March 2008
  • 2. Dimension Model vs. ER Model
    • ER Model:
    • Normalization (up to 6NF) to remove redundancy and anomalies and to improve integrity
    • 3 major types of relationship: one-to-one, one-to-many, many-to-many
    • Optimized for INSERT, UPDATE and DELETE operations
    • i.e. Perfect for OLTP applications (high volume of small transactions)
    • Things to consider:
      • ER does not really model a business; rather, it models the micro relationships among data elements
      • Query optimization
  • 3. Dimension Model vs. ER Model (cont…)
    • Dimension Model:
    • Denormalized to 2NF (reduces the number of tables and join paths), which creates redundancy
    • 1 major type of relationship: one-to-many
    • Ideal for SELECT operations
    • Top-down approach: focus on the business process
    • Designed to support analytical queries and user access
    • Anomalies are handled within ETL
    • Predictable SQL
    • Perfect for OLAP applications
  • 4. Case Study 1
    • Project: Writeaway (2009)
    • Database: SQL Server 2000
    • Reporting: Hyperion IR
    • Star Schema: 4
    • No. of records: ~1 mil
    • Load: Complete refresh
    • Typical report generation time: ~3 seconds
    • Project build time: 4 months
    • Highlights:
      • Drill Across
      • Factless Fact Table
      • Dimension Outrigger
      • Dimension Bridging
      • Junk Dimension
  • 5. Case Study 2
    • Project: Absenteeism (2006)
    • Database: SQL Server 2000
    • Reporting: Cognos
    • Star Schema: 1
    • No. of records: ~1 mil
    • Load: Incremental
    • Typical report generation time: ~25 seconds
    • Project build time: 4 weeks
    • Highlights:
      • Drill Across
      • Slowly Changing Dimension
      • Active Data Warehouse
  • 6. Case Study 3
    • Project: Mortgage Wealth DNA (2009)
    • Database: Teradata
    • Reporting: Hyperion IR
    • Star Schema: 3
    • No. of records: ~150 mil
    • Load: Incremental
    • Typical report generation time: ~20-30 seconds
    • Project build time: 3 months
    • Highlights:
      • Drill Across
      • Aggregate Join Index
      • Partitioning/Multi-Partitioning
      • 99% of aggregation done on Teradata on the fly, to minimise data retrieval
  • 7. Case Study 4
    • Project: Commway (2005)
    • Database: SQL Server 2000
    • Reporting: Cognos
    • Star Schema: 3
    • No. of records: ~10 mil
    • Load: Incremental
    • Typical report generation time: ~30 seconds
    • Project build time: 18 months
    • Highlights:
      • Drill Across
      • Slowly Changing Dimension
      • Active Data Warehouse
      • .NET Front End for Data Entry (4000+ Users)
  • 8. Skills we have now
      • Dimension modeling techniques/templates for different processes and subject areas
      • Practiced appropriate dimensional modeling techniques in different scenarios: Conformed Dimension, Junk Dimension, Outrigger Dimension, Rapidly Changing Dimension, Dimension Bridging, Degenerate Dimension, Accumulating Snapshot Fact Table, Late Arrival Fact, Factless Fact Table
      • Refining ETL coding techniques; fact-to-dimension foreign key lookup via natural key, source staging/staging/helper/interim table methodology
      • Data Warehouse architecture for Dimensional Modeling
      • Dimensional Modeling Workshop procedures
      • ETL mapping documentation
      • Reporting with Dimensional Model; multi-pass SQL
      • Practiced Star Schema friendly Teradata functions; AJI, Partition, Multi-Partitions
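
The fact-to-dimension foreign key lookup via natural key listed above can be sketched roughly as follows. This is a minimal illustration, not the team's actual ETL code: the table contents, the column names (`customer_key`, `customer_id`) and the `UNKNOWN_KEY` convention for unmatched rows are all assumptions made for the example.

```python
# Hypothetical dimension rows: the surrogate key is assigned by the warehouse,
# the natural key (customer_id) comes from the source system.
customer_dim = [
    {"customer_key": 1, "customer_id": "C-100", "name": "Bill Owen"},
    {"customer_key": 2, "customer_id": "C-200", "name": "James Brown"},
]

UNKNOWN_KEY = -1  # assumed convention for natural keys with no dimension row yet


def build_lookup(dim_rows, natural_key, surrogate_key):
    """Map each natural key to its surrogate key."""
    return {row[natural_key]: row[surrogate_key] for row in dim_rows}


def resolve_fact_keys(staged_rows, lookup):
    """Swap the natural key on each staged row for the dimension's surrogate key."""
    return [
        {
            "customer_key": lookup.get(row["customer_id"], UNKNOWN_KEY),
            "sales_amount": row["sales_amount"],
        }
        for row in staged_rows
    ]


# Hypothetical staged source rows; the second one has no matching dimension row.
staged_sales = [
    {"customer_id": "C-100", "sales_amount": 250.0},
    {"customer_id": "C-999", "sales_amount": 75.0},
]

lookup = build_lookup(customer_dim, "customer_id", "customer_key")
fact_rows = resolve_fact_keys(staged_sales, lookup)
```

Routing unmatched rows to a placeholder key is only one option; a late-arriving-fact process might instead insert a stub dimension row and correct it later.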
  • 9. Technologies we have now
      • State-of-the-art Teradata hardware
      • GDW in 3rd NF
      • Essbase Studio (EIS)
      • DataStage
      • Oracle Grid coming online?
      • OBIEE
  • 10. Shared Dimension (Conformed) and Drill Across
    • Drilling across to the facts of different business processes can be enabled via conformed dimensions
  • 11. Shared Dimension (Conformed) and Drill Across (cont…)
    • SQL to produce the drill-across report shown at the end of this slide:
        SELECT Customer, Actual Amount, Forecast Amount
        FROM
          -- Subquery "Act" returns Actuals
          (SELECT Customer, SUM(Sales Amount) AS Actual Amount
           FROM Sales Fact, Customer JOIN …) Act
        INNER JOIN
          -- Subquery "Fcst" returns Forecast
          (SELECT Customer, SUM(Forecast Amount) AS Forecast Amount
           FROM Forecast Fact, Customer JOIN …) Fcst
          -- Join the above 2 result sets
          ON Act.Customer = Fcst.Customer AND …
    • Resulting report:
        Customer       Actual Amount   Forecast Amount
        Bill Owen      $76859          $75768
        James Brown    $63548          $85676
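
A runnable version of the drill-across pattern on this slide might look like the sketch below. It uses Python's built-in `sqlite3` module only so that the SQL can be executed anywhere; the table names (`customer_dim`, `sales_fact`, `forecast_fact`), the column names and the sample amounts are made up for illustration and are not the figures from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim  (customer_key INTEGER PRIMARY KEY, customer TEXT);
CREATE TABLE sales_fact    (customer_key INTEGER, sales_amount REAL);
CREATE TABLE forecast_fact (customer_key INTEGER, forecast_amount REAL);

-- Illustrative sample data only
INSERT INTO customer_dim  VALUES (1, 'Bill Owen'), (2, 'James Brown');
INSERT INTO sales_fact    VALUES (1, 100.0), (1, 50.0), (2, 70.0);
INSERT INTO forecast_fact VALUES (1, 120.0), (2, 40.0), (2, 40.0);
""")

# Each fact table is aggregated separately to the grain of the conformed
# dimension attribute, and only then are the two result sets joined.
DRILL_ACROSS_SQL = """
SELECT act.customer, act.actual_amount, fcst.forecast_amount
FROM (
    SELECT c.customer, SUM(s.sales_amount) AS actual_amount
    FROM sales_fact s
    JOIN customer_dim c ON c.customer_key = s.customer_key
    GROUP BY c.customer
) act
INNER JOIN (
    SELECT c.customer, SUM(f.forecast_amount) AS forecast_amount
    FROM forecast_fact f
    JOIN customer_dim c ON c.customer_key = f.customer_key
    GROUP BY c.customer
) fcst ON act.customer = fcst.customer
ORDER BY act.customer
"""

report = conn.execute(DRILL_ACROSS_SQL).fetchall()
```

Aggregating each fact before the join matters: joining the two fact tables directly on the customer key would multiply rows and inflate both sums.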
  • 12. Junk Dimension
      • Grouping of flags and indicators
      • Clean up cluttered design that already has too many dimensions
      • 4 indicators (as in the example table) collapsed into a single integer surrogate key in the fact table
      • Provide a smaller, quicker point of entry for queries (probably not so relevant for database with BITMAP indices, e.g. Oracle)
    • See Also: Kimball Design Tip #48: De-Clutter With Junk (Dimensions) http://www.kimballgroup.com/html/designtipsPDF/DesignTips2003/KimballDT48DeClutter.pdf
    Key  Indicator1  Indicator2  Indicator3  Indicator4
    1    Y           Y           Y           Y
    2    Y           Y           Y           N
    3    Y           Y           N           Y
    4    Y           N           Y           Y
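
One way to build and use such a junk dimension is sketched below. It assumes the dimension is pre-populated with every Y/N combination of the four indicators (16 rows), so the key numbering follows enumeration order and will not match the slide's table row for row; the function names are hypothetical.

```python
from itertools import product

INDICATORS = ("indicator1", "indicator2", "indicator3", "indicator4")


def build_junk_dimension(indicator_names, values=("Y", "N")):
    """Assign one integer surrogate key to every combination of flag values."""
    combos = product(values, repeat=len(indicator_names))
    return {combo: key for key, combo in enumerate(combos, start=1)}


def junk_key(junk_dim, **flags):
    """Look up the surrogate key for one fact row's flag values."""
    combo = tuple(flags[name] for name in INDICATORS)
    return junk_dim[combo]


junk_dim = build_junk_dimension(INDICATORS)

# A fact row then stores this single integer instead of four flag columns.
key = junk_key(junk_dim, indicator1="Y", indicator2="Y", indicator3="Y", indicator4="N")
```

An alternative is to add rows only for combinations actually seen in the source data, which keeps the dimension smaller when there are many indicators.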
  • 13. Accumulating Snapshot Schema
    • Useful for tracking a multi-step business process: the process history is captured in a single row
    • Designed to ease query design and improve query performance
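
As a rough sketch of the accumulating snapshot idea, the example below keeps one row per order and fills in a date column as each milestone is reached. The order-fulfilment process, the table name and the milestone columns are assumptions chosen for illustration; the slides do not say which process was modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE order_fulfilment_fact (
    order_id       TEXT PRIMARY KEY,
    ordered_date   TEXT,
    shipped_date   TEXT,
    delivered_date TEXT
)
""")

# Whitelist of milestone names to date columns (hypothetical process steps).
MILESTONES = {
    "ordered": "ordered_date",
    "shipped": "shipped_date",
    "delivered": "delivered_date",
}


def record_milestone(conn, order_id, milestone, iso_date):
    """Create the row on first sight, then update the milestone's date in place."""
    column = MILESTONES[milestone]  # taken from the whitelist, never from user input
    conn.execute(
        "INSERT OR IGNORE INTO order_fulfilment_fact (order_id) VALUES (?)",
        (order_id,),
    )
    conn.execute(
        f"UPDATE order_fulfilment_fact SET {column} = ? WHERE order_id = ?",
        (iso_date, order_id),
    )


record_milestone(conn, "O-1", "ordered", "2008-03-01")
record_milestone(conn, "O-1", "shipped", "2008-03-04")

snapshot = conn.execute("SELECT * FROM order_fulfilment_fact").fetchall()

# Because the whole history sits in one row, a lag is a single-row expression.
days_to_ship = conn.execute(
    "SELECT julianday(shipped_date) - julianday(ordered_date) "
    "FROM order_fulfilment_fact WHERE order_id = 'O-1'"
).fetchone()[0]
```

The row is revisited and updated, unlike a transaction fact table where rows are only inserted; milestones not yet reached simply stay NULL.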
  • 14. Roadmap
      • Conformed Dimensions (Product, Department, Date…) with full Slowly Changing Dimension (SCD) capability
      • Best practice ETL (error handling, batch controls, slowly changing dimension ETL, foreign key lookup, assigning surrogate keys, entity start/end date generation, naming standards, …)
      • Star Schema design review process (we build it and we kill it until it can’t be killed!)
      • Dimensional Modeling training
      • Code generator: DataStage, Oracle Warehouse Builder??
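
Several roadmap items above (slowly changing dimension ETL, assigning surrogate keys, entity start/end date generation) can be illustrated together with a Type 2 slowly changing dimension update. The sketch below is one possible shape under stated assumptions: rows are held in a plain Python list, the current version is marked by an open-ended `HIGH_DATE`, and validity is the half-open interval from `start_date` up to but excluding `end_date`. None of these conventions come from the slides.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed marker for "still current"


def apply_scd2(dim_rows, natural_key, attributes, effective):
    """Type 2 change: expire the current version and append a new one.

    Returns the surrogate key that fact rows should reference from now on.
    """
    current = next(
        (r for r in dim_rows
         if r["natural_key"] == natural_key and r["end_date"] == HIGH_DATE),
        None,
    )
    if current is not None and current["attributes"] == attributes:
        return current["surrogate_key"]  # nothing changed, keep the current version

    if current is not None:
        current["end_date"] = effective  # close the old version

    new_key = max((r["surrogate_key"] for r in dim_rows), default=0) + 1
    dim_rows.append({
        "surrogate_key": new_key,
        "natural_key": natural_key,
        "attributes": dict(attributes),
        "start_date": effective,
        "end_date": HIGH_DATE,
    })
    return new_key


# Hypothetical product dimension: product P-1 moves from Toys to Games.
product_dim = []
k1 = apply_scd2(product_dim, "P-1", {"department": "Toys"}, date(2008, 1, 1))
k2 = apply_scd2(product_dim, "P-1", {"department": "Toys"}, date(2008, 2, 1))
k3 = apply_scd2(product_dim, "P-1", {"department": "Games"}, date(2008, 3, 1))
```

Facts loaded before the change keep pointing at the old surrogate key, so historical reports still show the product under its old department.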
  • 15. Myth busted
      • Teradata does not support Star Schema
      • Star Schema cannot support large volumes of data
      • Column-Store vs. Row-Store: “…column-store is able to process column-oriented data so effectively…finding that late materialization improves performance by a factor of three…compression provides about a factor of two on average…” [1]
    • [1] D. J. Abadi, S. R. Madden, N. Hachem, Column-Stores vs. Row-Stores: How Different Are They Really? In SIGMOD’08.
  • 16. The road is long but we won’t get lost!
    • Books are on the way to our library!
      • The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data (Ralph Kimball)
      • Building the Data Warehouse (William E. Inmon)
      • Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance (Christopher Adamson)
      • The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Ralph Kimball)
    • Online materials (Kimball Group http://www.kimballgroup.com )
    • Bus Matrix Diagram
    • Some more interesting academic papers/research on my desk!
  • 17. Bus Matrix
  • 18. Previous presentations
