04 Dimensional Analysis - v6


Published in: Technology

  • A simplistic transactional schema showing 7 tables relating to sales orders
  • This is a star schema (snowflake schemas are discussed later) showing 4 tables that relate to the previous transactional schema.
    State and Country have been denormalized under Customer.
    Dimensions are in blue.
    These are the things that we analyse “by” (e.g. by Time, by Customer, by Region).
    The fact is in yellow.
    These are usually quantitative things that we are interested in.
  • We already have the data in a data model – why create another data model…? Well…
    What is currently called “Data Warehousing” or “Business Intelligence” was originally often called “Decision Support Systems”
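    We already have all the data in the OLTP system, so why replicate it in a dimensional model? The two models serve different purposes:
    OLTP data is atomic; dimensional data is summarised.
    OLTP supports transaction throughput; the dimensional model supports aggregate queries.
    OLTP data is current; dimensional data is historic.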
  • Facts work best if they are additive
    Dimensions allow us to “slice & dice” the facts into meaningful groups. They provide context.
  • There are some changes where it is valid to overwrite history. When someone gets married and changes their name, they may want to carry the history of their previous purchases over to their new name rather than see a split history.
  • This makes inserts into your fact table more expensive, as you always need to match on the effective dates as well as the business key. Some people keep a “Current” flag. Another approach, rather than putting nulls in the End date, is to put an arbitrary date well in the future; this can make the join logic a bit simpler.
  • This type of change tracking is most useful when there is a one-off change, such as a change in sales regions, where you want to see history re-cast into the new regions but may also want to compare the old and new regions.

    1. Business Information Systems: Dimensional Analysis. Prithwis Mukerjee, Ph.D.
    2. Dimensional Models
        • A denormalized relational model
            – Made up of tables with attributes
            – Relationships defined by keys and foreign keys
        • Organized for understandability and ease of reporting rather than update
        • Queried and maintained by SQL or special purpose management tools
    3. From Relational to Dimensional
        • Relational Model
            – Designed from the perspective of process efficiency
            – “Normalised” data structures
            – Entity Relationship Model
            – Used for transactional, or operational systems
            – OLTP: OnLine Transaction Processing
            – Based on data that is current and non-redundant
        • Dimensional Model
            – Designed from the perspective of subject (Sales, Marketing, Customers)
            – “De-normalised” data structures in blatant violation of normalisation
            – Used for analysis of aggregated data
            – OLAP: OnLine Analytical Processing
            – Based on data that is historical and may be redundant
    4. ER vs. Dimensional Models
        • ER: the transaction processing model
            – One table per entity
            – Minimize data redundancy
            – Optimize update
        • Dimensional: the data warehousing model
            – One fact table for data organization
            – Maximize understandability
            – Optimized for retrieval
    5. Strengths of the Dimensional Model
        • Predictable, standard framework
        • Responds well to changes in user reporting needs
        • Relatively easy to add data without reloading tables
        • Standard design approaches have been developed
        • There exist a number of products supporting the dimensional model
        • “The Data Warehouse Toolkit” by Ralph Kimball & Margy Ross
        • “The Data Warehouse Lifecycle Toolkit” by Ralph Kimball & Margy Ross
    6. A Transactional Database
        • Countries: CountryID, Description
        • States: StateID, CountryID, Desc
        • Addresses: AddressID, StateID, Street
        • Customers: CustomerID, AddressID, Name
        • OrderHeader: OrderHeaderID, CustomerID, OrderDate, FreightAmount
        • OrderDetails: OrderHeaderID, ProductID, Amount
        • Products: ProductID, Description, Size
    7. A Dimensional Model
        • Customers: CustomerID, Name, Street, State, Country
        • Time: TimeID, Date, Month, Quarter, Year
        • Products: ProductID, Description, Size, Subcategory, Category
        • FactSales: CustomerID, ProductID, TimeID, SalesAmount
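As a rough illustration of the star schema on slide 7, the sketch below builds the four tables in an in-memory SQLite database and answers one "by" question: sales by year and by product category. Table and column names follow the slide; the sample rows, the column types, and the choice of SQLite are assumptions made only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Three dimension tables and one fact table, named as on the slide.
# "Time" and "Date" are quoted to keep them unambiguous as identifiers.
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT, Street TEXT,
                        State TEXT, Country TEXT);
CREATE TABLE "Time"    (TimeID INTEGER PRIMARY KEY, "Date" TEXT, Month INTEGER,
                        Quarter INTEGER, Year INTEGER);
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Description TEXT, Size TEXT,
                        Subcategory TEXT, Category TEXT);
CREATE TABLE FactSales (CustomerID INTEGER REFERENCES Customers,
                        ProductID INTEGER REFERENCES Products,
                        TimeID INTEGER REFERENCES "Time",
                        SalesAmount REAL);
""")

# Made-up sample rows (not from the slides).
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?, ?)", [
    (1, "Customer A", "1 First St", "NSW", "Australia"),
    (2, "Customer B", "2 Second St", "VIC", "Australia"),
])
conn.executemany('INSERT INTO "Time" VALUES (?, ?, ?, ?, ?)', [
    (1, "2009-01-15", 1, 1, 2009),
    (2, "2010-02-20", 2, 1, 2010),
])
conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?, ?)", [
    (1, "Widget", "S", "Gadgets", "Hardware"),
    (2, "Manual", "L", "Guides", "Books"),
])
# Fact rows: (CustomerID, ProductID, TimeID, SalesAmount)
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100.0),
    (2, 1, 1, 50.0),
    (1, 2, 2, 30.0),
    (2, 1, 2, 20.0),
])

# "Sales by year, by category": join the fact to two dimensions and aggregate.
rows = conn.execute("""
    SELECT t.Year, p.Category, SUM(f.SalesAmount)
    FROM FactSales f
    JOIN "Time" t ON t.TimeID = f.TimeID
    JOIN Products p ON p.ProductID = f.ProductID
    GROUP BY t.Year, p.Category
    ORDER BY t.Year, p.Category
""").fetchall()
print(rows)
```

Every analytical query has the same shape: the fact table joined to whichever dimensions supply the "by" columns, followed by an aggregate over the additive measure.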
    8. Extract Transform Load
        • Relational model: process oriented, transactional, current
        • Dimensional model: subject oriented, aggregate, historic
    9. Facts & Dimensions
        • There are two main types of objects in a dimensional model
            – Facts are quantitative measures that we wish to analyse and report on.
            – Dimensions contain textual descriptors of the business. They provide context for the facts.
    10. Fact & Dimension Tables
        • Facts
            – Contain two or more foreign keys
            – Tend to have huge numbers of records
            – Useful facts tend to be numeric and additive
        • Dimensions
            – Contain text and descriptive information
            – The “1” in a 1-M relationship
            – Generally the source of interesting constraints
            – Typically contain the attributes for the SQL answer set
    11. Fact Table
        • Measurements associated with a specific business process
        • Grain: level of detail of the table
        • Process events produce fact records
        • Facts (attributes) are usually
            – Numeric
            – Additive
        • Derived facts included
        • Foreign (surrogate) keys refer to dimension tables (entities)
        • Classification values help define subsets
    12. Dimension Tables
        • Entities describing the objects of the process
        • Conformed dimensions cross processes
        • Attributes are descriptive
            – Text
            – Numeric
        • Surrogate keys
        • Less volatile than facts (1:m with the fact table)
        • Null entries
        • Date dimensions
        • Produce “by” questions
    13. The Bus Matrix
        • Dimensions (columns): Date, Product, Store, Promotion, Warehouse, Vendor, Contract, Shipper
        • Business processes (rows): Retail Sales, Retail Inventory, Retail Deliveries, Warehouse Inventory, Warehouse Deliveries, Purchase Orders
        • An X marks each dimension that a process uses
    14. Business Model
        • As always in life, there are some disadvantages to 3NF:
            – Performance can be truly awful. Most of the work that is performed on denormalizing a data model is an attempt to reach performance objectives.
            – The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data.
    15. The 4 Step Design Process
        • Choose the Data Mart
        • Declare the Grain
        • Choose the Dimensions
        • Choose the Facts
    16. Structural Dimensions
        • The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database.
        • The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions.
    17. Steps in dimensional modeling
        • Select an associative entity for a fact table
        • Determine granularity
        • Replace operational keys with surrogate keys
        • Promote the keys from all hierarchies to the fact table
        • Add date dimension
        • Split all compound attributes
        • Add necessary categorical dimensions
        • Fact (varies with time) / Attribute (constant)
    18. The Big Picture
        • OLTP
            – Customer table: Customer ID, Cust Name, Cust Address
            – Order table: Order ID, Customer ID (FK), Date
            – Order item table: Order ID (FK), Item ID, Product ID (FK), Quantity, Value
            – Product table: Product ID, Product Name, Product Desc, Unit Price
        • OLAP
            – Customer dimension: Customer ID, Cust Name, Cust Address
            – Fact table: Transaction ID, Product ID (FK), Client ID (FK), Date, Quantity, Value
            – Product dimension: Product ID, Product Name, Product Desc, Unit Price
    19. Converting an E-R Diagram
        • Determine the purpose of the mart
        • Identify an association table as the central fact table
        • Determine facts to be included
        • Replace all keys with surrogate keys
        • Promote foreign keys in related tables to the fact table
        • Add time dimension
        • Refine the dimension tables
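The step "replace all keys with surrogate keys" can be sketched roughly as follows. This is a minimal illustration, not a prescribed method: the helper `build_key_map`, the field names, and the sample order lines are all hypothetical.

```python
from itertools import count

def build_key_map(operational_keys):
    """Assign a generated integer surrogate key to each distinct operational key."""
    next_key = count(1)
    key_map = {}
    for op_key in operational_keys:
        if op_key not in key_map:
            key_map[op_key] = next(next_key)
    return key_map

# Hypothetical order lines identified by operational keys (SKU, customer code).
order_lines = [
    {"sku": "LP2105", "customer": "K001", "units_sold": 2},
    {"sku": "AB1000", "customer": "K001", "units_sold": 1},
    {"sku": "LP2105", "customer": "K002", "units_sold": 5},
]

product_keys = build_key_map(line["sku"] for line in order_lines)
customer_keys = build_key_map(line["customer"] for line in order_lines)

# Fact rows carry only surrogate keys plus the measures.
fact_rows = [
    {
        "product_key": product_keys[line["sku"]],
        "customer_key": customer_keys[line["customer"]],
        "units_sold": line["units_sold"],
    }
    for line in order_lines
]
print(fact_rows)
```

The operational keys stay in the dimension tables as ordinary attributes, so the fact table is insulated from changes in the source systems' identifiers.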
    20. Fact Tables
        • Represent a process or reporting environment that is of value to the organization
        • It is important to determine the identity of the fact table and specify exactly what it represents
        • Typically correspond to an associative entity in the E-R model
    21. Grain (unit of analysis)
        • The grain determines what each fact record represents: the level of detail
        • For example
            – Individual transactions
            – Snapshots (points in time)
            – Line items on a document
        • Generally better to focus on the smallest grain
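One reason to prefer the smallest grain is that fine-grained facts can always be rolled up to a coarser level, while facts stored at a coarse grain cannot be broken back down. The sketch below illustrates this with made-up line-item rows; the field names and the `roll_up` helper are assumptions for the example.

```python
from collections import defaultdict

# Hypothetical fact rows at line-item grain: one row per product per order.
line_items = [
    {"order_id": 1, "product": "A", "amount": 10.0},
    {"order_id": 1, "product": "B", "amount": 5.0},
    {"order_id": 2, "product": "A", "amount": 7.5},
]

def roll_up(rows, key):
    """Sum the additive 'amount' measure by the given attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

# Line-item grain supports both roll-ups.
by_order = roll_up(line_items, "order_id")
by_product = roll_up(line_items, "product")
print(by_order, by_product)
```

Had the fact table been stored at order grain (only `by_order`), the per-product totals could not be recovered from it.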
    22. Facts
        • Measurements associated with fact table records at fact table granularity
        • Normally numeric and additive
        • Non-key attributes in the fact table
        • Attributes in dimension tables are constants. Facts vary with the granularity of the fact table
    23. Dimensions
        • A table (or hierarchy of tables) connected with the fact table with keys and foreign keys
        • Preferably single valued for each fact record (1:m)
        • Connected with surrogate (generated) keys, not operational keys
        • Dimension tables contain text or numeric attributes
    24. ERD
        • CUSTOMER: customer_ID (PK), customer_name, purchase_profile, credit_profile, address
        • STORE: store_ID (PK), store_name, address, district, floor_type
        • CLERK: clerk_id (PK), clerk_name, clerk_grade
        • ORDER: order_num (PK), customer_ID (FK), store_ID (FK), clerk_ID (FK), date
        • PRODUCT: SKU (PK), description, brand, category
        • ORDER-LINE: order_num (PK) (FK), SKU (PK) (FK), promotion_key (FK), dollars_sold, units_sold, dollars_cost
        • PROMOTION: promotion_NUM (PK), promotion_name, price_type, ad_type
    25. DIMENSIONAL MODEL
        • TIME: time_key (PK), SQL_date, day_of_week, month
        • STORE: store_key (PK), store_ID, store_name, address, district, floor_type
        • CLERK: clerk_key (PK), clerk_id, clerk_name, clerk_grade
        • PRODUCT: product_key (PK), SKU, description, brand, category
        • CUSTOMER: customer_key (PK), customer_name, purchase_profile, credit_profile, address
        • PROMOTION: promotion_key (PK), promotion_name, price_type, ad_type
        • FACT: time_key (FK), store_key (FK), clerk_key (FK), product_key (FK), customer_key (FK), promotion_key (FK), dollars_sold, units_sold, dollars_cost
    26. Date Dimensions
        • Fiscal: Fiscal Year, Fiscal Quarter, Fiscal Month, Fiscal Week
        • Calendar: Calendar Year, Calendar Quarter, Calendar Month, Calendar Week
        • Day: Type of Day, Day of Week, Day, Holiday
    27. Date dimension attributes (Attribute Name: Attribute Description. Sample Values)
        • Day: The specific day that an activity took place. Sample values: 06/04/1998; 06/05/1998
        • Day of Week: The specific name of the day. Sample values: Monday; Tuesday
        • Holiday: Identifies that this day is a holiday. Sample values: Easter; Thanksgiving
        • Type of Day: Indicates whether or not this day is a weekday or a weekend day. Sample values: Weekend; Weekday
        • Calendar Week: The week ending date, always a Saturday. Note that WE denotes … Sample values: WE 06/06/1998; WE 06/13/1998
        • Calendar Month: The calendar month. Sample values: January, 1998; February, 1998
        • Calendar Quarter: The calendar quarter. Sample values: 1998Q1; 1998Q4
        • Calendar Year: The calendar year. Sample values: 1998
        • Fiscal Week: The week that represents the corporate calendar. Note that the F … Sample values: F Week 1 1998; F Week 46 1998
        • Fiscal Month: The fiscal period comprised of 4 or 5 weeks. Note that the F in the data … Sample values: F January, 1998; F February, 1998
        • Fiscal Quarter: The grouping of 3 fiscal months. Sample values: F 1998Q1; F 1998Q2
        • Fiscal Year: The grouping of 52 fiscal weeks / 12 fiscal months that comprise the financial year. Sample values: F 1998; F 1999
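The calendar attributes in the table above can be derived from a single date. The sketch below is one possible way to do it, with output formats chosen to resemble the sample values; the function name and dictionary keys are hypothetical. The fiscal attributes are left out because the slides do not define the corporate fiscal calendar.

```python
from datetime import date, timedelta

def date_dimension_row(day, holidays=None):
    """Build the calendar attributes of one date dimension row."""
    holidays = holidays or {}  # optional mapping of date -> holiday name
    # date.weekday() returns 0 for Monday, so Saturday is 5.
    week_ending = day + timedelta(days=(5 - day.weekday()) % 7)
    quarter = (day.month - 1) // 3 + 1
    return {
        "day": day.strftime("%m/%d/%Y"),
        "day_of_week": day.strftime("%A"),
        "holiday": holidays.get(day),
        "type_of_day": "Weekend" if day.weekday() >= 5 else "Weekday",
        "calendar_week": "WE " + week_ending.strftime("%m/%d/%Y"),
        "calendar_month": day.strftime("%B, %Y"),
        "calendar_quarter": f"{day.year}Q{quarter}",
        "calendar_year": day.year,
    }

row = date_dimension_row(date(1998, 6, 4))
print(row)
```

In practice a date dimension is usually generated once for a range of years and loaded as a table, so that queries can group by any of these attributes without date arithmetic.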
    28. Snowflaking & Hierarchies
        • Efficiency vs Space
        • Understandability
        • M:N relationships
    29. Star Schema
        • dimTime
        • dimCustomer
        • dimProduct: ProductID, ProductName, CategoryName, SubCategoryName, …
        • factSales: ProductID, TimeID, CustomerID, SalesAmount, …
    30. Snowflake Schema
        • dimSubCategory: SubCategoryID, Description
        • dimCategory: CategoryID, subCategoryID, Description
        • dimProduct: ProductID, CategoryID, Description
        • factSales: ProductID, TimeID, CustomerID, SalesAmount
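The difference between the two schemas can be shown by flattening a snowflaked product hierarchy into the single wide dimension a star schema uses. The sketch below assumes a simple hierarchy in which each product belongs to one subcategory and each subcategory to one category; the key layout and sample rows are hypothetical and do not reproduce the slide's exact columns.

```python
# Hypothetical snowflaked tables: product -> subcategory -> category.
dim_category = {1: "Bikes", 2: "Clothing"}
dim_subcategory = {
    10: {"description": "Mountain Bikes", "category_id": 1},
    20: {"description": "Jerseys", "category_id": 2},
}
dim_product_snowflake = [
    {"product_id": 100, "product_name": "Trail 500", "subcategory_id": 10},
    {"product_id": 200, "product_name": "Team Jersey", "subcategory_id": 20},
]

def flatten_to_star(products, subcategories, categories):
    """Denormalize the hierarchy so each product row carries its category names."""
    star_rows = []
    for product in products:
        sub = subcategories[product["subcategory_id"]]
        star_rows.append({
            "ProductID": product["product_id"],
            "ProductName": product["product_name"],
            "SubCategoryName": sub["description"],
            "CategoryName": categories[sub["category_id"]],
        })
    return star_rows

dim_product_star = flatten_to_star(dim_product_snowflake, dim_subcategory, dim_category)
print(dim_product_star)
```

The star version repeats category names on every product row, which costs space but removes two joins from every query that groups by category.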
    31. Slowly Changing Dimensions (Addresses, Managers, etc.)
        • Type 1: Store only the current value, overwrite previous value
        • Type 2: Create a dimension record for each value (with or without date stamps)
        • Type 3: Create an attribute in the dimension record for previous value
    32. Examples
        • Original
            ProductKey  Description  Category   SKU
            21553       LeapPad      Education  LP2105
        • Type 1
            ProductKey  Description  Category  SKU
            21553       LeapPad      Toy       LP2105
        • Type 2
            ProductKey  Description  Category   SKU
            21553       LeapPad      Education  LP2105
            44631       LeapPad      Toy        LP2105
        • Type 3
            ProductKey  Description  Category  OldCat     SKU
            21553       LeapPad      Toy       Education  LP2105
        • Hybrid
            ProductKey  Description  Category     OldCat       SKU
            21335       LeapPad      Electronics  Education    LP2105
            44631       LeapPad      Electronics  Toy          LP2105
            68122       LeapPad      Education    Electronics  LP2105
    33. Type 1 Slowly Changing Dimension
        • The simplest form
        • Only updates existing records
        • Overwrites history
    34. Type 1 Slowly Changing Dimension
            CustomerID  Code  Name          State                      Gender
            1           K001  Miranda Kerr  NSW, overwritten with VIC  F
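A Type 1 change can be sketched as an in-place update. The example uses the customer row from the slide and assumes the change is from NSW to VIC (the order implied by the dates on the Type 2 slide); the helper name `apply_type1` is hypothetical.

```python
def apply_type1(dimension, business_key, changes):
    """Overwrite attributes in place; the previous values are lost."""
    for row in dimension:
        if row["Code"] == business_key:
            row.update(changes)
    return dimension

dim_customer = [
    {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr", "State": "NSW", "Gender": "F"},
]

# Assumed change: the customer moves from NSW to VIC.
apply_type1(dim_customer, "K001", {"State": "VIC"})
print(dim_customer)
```

Because the surrogate key does not change, every existing fact row now reports under VIC, including sales made while the customer lived in NSW.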
    35. Type 2 Slowly Changing Dimension
        • Allows the recording of changes of state over time
        • Generates a new record each time the state changes
        • Usually requires the use of effective dates when joining to facts
    36. Type 2 Slowly Changing Dimension
            CustomerID  Code  Name          State  Gender  Start    End
            1           K001  Miranda Kerr  NSW    F       1/1/09   23/2/09 (previously <NULL>)
            2           K001  Miranda Kerr  VIC    F       24/2/09  <NULL>
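The Type 2 mechanics, closing the current row, adding a new one, and matching facts on effective dates, might look roughly like this. The data mirrors the slide (dates read as day/month/year); the helper names and the "maximum key plus one" surrogate key generation are assumptions made for the sketch.

```python
from datetime import date, timedelta

dim_customer = [
    {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr", "State": "NSW",
     "Gender": "F", "Start": date(2009, 1, 1), "End": None},
]

def apply_type2(dimension, business_key, changes, effective):
    """Close the current row and append a new row carrying the changed values."""
    current = next(r for r in dimension
                   if r["Code"] == business_key and r["End"] is None)
    current["End"] = effective - timedelta(days=1)
    new_row = {**current, **changes,
               "CustomerID": max(r["CustomerID"] for r in dimension) + 1,
               "Start": effective, "End": None}
    dimension.append(new_row)
    return new_row

def lookup_surrogate_key(dimension, business_key, fact_date):
    """Match on the business key and the effective date range."""
    for row in dimension:
        if (row["Code"] == business_key and row["Start"] <= fact_date
                and (row["End"] is None or fact_date <= row["End"])):
            return row["CustomerID"]
    raise KeyError((business_key, fact_date))

apply_type2(dim_customer, "K001", {"State": "VIC"}, date(2009, 2, 24))

key_before = lookup_surrogate_key(dim_customer, "K001", date(2009, 2, 10))
key_after = lookup_surrogate_key(dim_customer, "K001", date(2009, 3, 1))
print(key_before, key_after)
```

The `End is None` check is the extra join logic that a far-future end date would remove, at the cost of storing an arbitrary sentinel value.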
    37. Type 3 Slowly Changing Dimension
        • De-normalized change tracking
        • Only keeps a limited history
        • Stores changes in separate columns
    38. Type 3 Slowly Changing Dimension
            CustomerID  Code  Name          Current State  Gender  Prev State
            1           K001  Miranda Kerr  NSW            F       <NULL>
        • New State value: VIC
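A Type 3 change keeps the old value in a separate column of the same row. The sketch below assumes the customer moves from NSW to VIC, as on the Type 2 slide, and then adds a second, purely hypothetical move to QLD to show why the history is limited; the column names `CurrentState` and `PrevState` and the helper `apply_type3` are illustrative.

```python
def apply_type3(row, new_state):
    """Shift the current value into the 'previous' column, then overwrite it."""
    row["PrevState"] = row["CurrentState"]
    row["CurrentState"] = new_state
    return row

customer = {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr",
            "CurrentState": "NSW", "Gender": "F", "PrevState": None}

apply_type3(customer, "VIC")
after_first_change = dict(customer)   # CurrentState VIC, PrevState NSW

# Hypothetical second move: only one previous value fits, so NSW is lost.
apply_type3(customer, "QLD")
print(after_first_change, customer)
```

With both columns on the same row, facts can be grouped by either the current or the previous state, which is what makes this approach suitable for comparing old and new sales regions.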