Successfully reported this slideshow.

Dimensional Modelling


Published on

Published in: Education, Technology
  • Be the first to comment

Dimensional Modelling

  1. 1. Business Information Systems Dimensional Analysis Prithwis Mukerjee, Ph.D.
  2. 2. Dimensional Models <ul><li>A denormalized relational model </li><ul><li>Made up of tables with attributes
  3. 3. Relationships defined by keys and foreign keys </li></ul><li>Organized for understandability and ease of reporting rather than update
  4. 4. Queried and maintained by SQL or special purpose management tools. </li></ul>
  5. 5. From Relational to Dimensional <ul><li>Relational Model </li><ul><li>Designed from the perspective of process efficiency </li><ul><li>Marketing
  6. 6. Sales </li></ul><li>“Normalised” data structures </li><ul><li>Entity Relationship Model </li></ul><li>Used for transactional, or operational systems </li><ul><li>OLTP : OnLine Transaction Processing </li></ul><li>Based on data that is </li><ul><li>Current
  7. 7. Non Redundant </li></ul></ul></ul><ul><li>Dimensional Model </li><ul><li>Designed from the perspective of subject </li><ul><li>Sales
  8. 8. Customers </li></ul><li>“De-normalised” data structures in blatant violation of normalisation
  9. 9. Used for analysis of aggregated data </li><ul><li>OLAP : OnLine Analytical Processing </li></ul><li>Based on data that is </li><ul><li>Historical
  10. 10. May be redundant </li></ul></ul></ul>
  11. 11. ER vs. Dimensional Models <ul><li>One table per entity
  12. 12. Minimize data redundancy
  13. 13. Optimize update
  14. 14. The Transaction Processing Model </li></ul><ul><li>One fact table for data organization
  15. 15. Maximize understandability
  16. 16. Optimized for retrieval
  17. 17. The data warehousing model </li></ul>
  18. 18. Strengths of the Dimensional Model <ul><li>Predictable, standard framework
  19. 19. Respond well to changes in user reporting needs
  20. 20. Relatively easy to add data without reloading tables
  21. 21. Standard design approaches have been developed
  22. 22. There exist a number of products supporting the dimensional model </li></ul>“ The Data Warehouse Toolkit” by Ralph Kimball & Margy Ross “ The Data Warehouse Lifecycle Toolkit” by Ralph Kimball & Margy Ross
  23. 23. A Transactional Database OrderDetails OrderHeaderID ProductID Amount OrderHeader OrderHeaderID CustomerID OrderDate FreightAmount Products ProductID Description Size Customers CustomerID AddressID Name Addresses AddressID StateID Street States StateID CountryID Desc Countries CountryID Description
  24. 24. A Dimensional Model FactSales CustomerID ProductID TimeID SalesAmount Products ProductID Description Size Subcategory Category Customers CustomerID Name Street State Country Time TimeID Date Month Quarter Year
  25. 25. Extract Transform Load Relational Dimensional Model Process Oriented Subject Oriented Transactional Aggregate Current Historic
  26. 26. Facts & Dimensions <ul><li>There are two main types of objects in a dimensional model </li></ul><ul><ul><li>Facts are quantitative measures that we wish to analyse and report on.
  27. 27. Dimensions contain textual descriptors of the business. They provide context for the facts. </li></ul></ul>
  28. 28. Fact & Dimension Tables <ul><li>FACTS </li></ul><ul><li>Contains two or more foreign keys
  29. 29. Tend to have huge numbers of records
  30. 30. Useful facts tend to be numeric and additive </li></ul><ul><li>DIMENSIONS </li></ul><ul><li>Contain text and descriptive information
  31. 31. 1 in a 1-M relationship
  32. 32. Generally the source of interesting constraints
  33. 33. Typically contain the attributes for the SQL answer set. </li></ul>
  34. 34. GB Video E-R Diagram Customer #Cust No F Name L Name Ads1 Ads2 City State Zip Tel No CC No Expire Rental #Rental No Date Clerk No Pay Type CC No Expire CC Approval Line #Line No Due Date Return Date OD charge Pay type Requestor of Owner of Video #Video No One-day fee Extra days Weekend Title #Title No Name Vendor No Cost Name for Holder of
  35. 35. GB Video Data Mart Customer CustID Cust No F Name L Name Rental RentalID Rental No Clerk No Store Pay Type Line LineID OD Charge OneDayCharge ExtraDaysCharge WeekendCharge DaysReserved DaysOverdue CustID AddressID RentalId VideoID TitleID RentalDateID DueDateID ReturnDateID Video VideoID Video No Title TitleID TitleNo Name Cost Vendor Name Rental Date RentalDateID SQLDate Day Week Quarter Holiday Due Date DueDateID SQLDate Day Week Quarter Holiday Return Date ReturnDateID SQLDate Day Week Quarter Holiday Address AddressID Adddress1 Address2 City State Zip AreaCode Phone
  36. 36. Fact Table Measurements associated with a specific business process <ul><li>Grain: level of detail of the table
  37. 37. Process events produce fact records
  38. 38. Facts (attributes) are usually </li><ul><li>Numeric
  39. 39. Additive </li></ul><li>Derived facts included
  40. 40. Foreign (surrogate) keys refer to dimension tables (entities)
  41. 41. Classification values help define subsets </li></ul>
  42. 42. Dimension Tables Entities describing the objects of the process <ul><li>Conformed dimensions cross processes
  43. 43. Attributes are descriptive </li><ul><li>Text
  44. 44. Numeric </li></ul><li>Surrogate keys
  45. 45. Less volatile than facts (1:m with the fact table)
  46. 46. Null entries
  47. 47. Date dimensions
  48. 48. Produce “by” questions </li></ul>
  49. 49. Design Issues Relational and Multidimensional Models <ul><li>Denormalized and indexed relational models more flexible
  50. 50. Multidimensional models simpler to use and more efficient </li></ul>
  51. 51. The Bus Matrix Process Date Product Store Promotion Warehouse Vendor Contract Shipper Retail Sales X X X X Retail Inventory X X X Retail Deliveries X X X Warehouse Inventory X X X X Warehouse Deliveries X X X X Purchase Orders X X X X X X
  52. 52. The Business Model Identify the data structure, attributes and constraints for the client’s data warehousing environment. <ul><li>Stable
  53. 53. Optimized for update
  54. 54. Flexible </li></ul>
  55. 55. Business Model As always in life, there are some disadvantages to 3NF: <ul><li>Performance can be truly awful. Most of the work that is performed on denormalizing a data model is an attempt to reach performance objectives.
  56. 56. The structure can be overwhelmingly complex. We may wind up creating many small relations which the user might think of as a single relation or group of data. </li></ul>
  57. 57. The 4 Step Design Process <ul><li>Choose the Data Mart
  58. 58. Declare the Grain
  59. 59. Choose the Dimensions
  60. 60. Choose the Facts </li></ul>
  61. 61. Building a Data Warehouse from a Normalized Database The steps <ul><li>Develop a normalized entity-relationship business model of the data warehouse.
  62. 62. Translate this into a dimensional model. This step reflects the information and analytical characteristics of the data warehouse.
  63. 63. Translate this into the physical model. This reflects the changes necessary to reach the stated performance objectives. </li></ul>
  64. 64. Structural Dimensions <ul><li>The first step is the development of the structural dimensions. This step corresponds very closely to what we normally do in a relational database.
  65. 65. The star architecture that we will develop here depends upon taking the central intersection entities as the fact tables and building the foreign key => primary key relations as dimensions. </li></ul>
  66. 66. Steps in dimensional modeling <ul><li>Select an associative entity for a fact table
  67. 67. Determine granularity
  68. 68. Replace operational keys with surrogate keys
  69. 69. Promote the keys from all hierarchies to the fact table
  70. 70. Add date dimension
  71. 71. Split all compound attributes
  72. 72. Add necessary categorical dimensions
  73. 73. Fact (varies with time) / Attribute (constant) </li></ul>
  74. 74. The Big Picture OL T P OL A P
  75. 75. Converting an E-R Diagram <ul><li>Determine the purpose of the mart
  76. 76. Identify an association table as the central fact table
  77. 77. Determine facts to be included
  78. 78. Replace all keys with surrogate keys
  79. 79. Promote foreign keys in related tables to the fact table
  80. 80. Add time dimension
  81. 81. Refine the dimension tables </li></ul>
  82. 82. Choosing the Mart <ul><li>A set of related fact and dimension tables
  83. 83. Single source or multiple source
  84. 84. Conformed dimensions
  85. 85. Typically have a fact table for each process </li></ul>
  86. 86. Fact Tables Represent a process or reporting environment that is of value to the organization <ul><li>It is important to determine the identity of the fact table and specify exactly what it represents.
  87. 87. Typically correspond to an associative entity in the E-R model </li></ul>
  88. 88. Grain (unit of analysis) The grain determines what each fact record represents: the level of detail. <ul><li>For example </li><ul><li>Individual transactions
  89. 89. Snapshots (points in time)
  90. 90. Line items on a document </li></ul><li>Generally better to focus on the smallest grain </li></ul>
  91. 91. Facts Measurements associated with fact table records at fact table granularity <ul><li>Normally numeric and additive
  92. 92. Non-key attributes in the fact table </li></ul>Attributes in dimension tables are constants. Facts vary with the granularity of the fact table
  93. 93. Dimensions A table (or hierarchy of tables) connected with the fact table with keys and foreign keys <ul><li>Preferably single valued for each fact record (1:m)
  94. 94. Connected with surrogate (generated) keys, not operational keys
  95. 95. Dimension tables contain text or numeric attributes </li></ul>
  96. 96. ORDER order_num (PK) customer_ID (FK) store_ID (FK) clerk_ID (FK) date STORE store_ID (PK) store_name address district floor_type CLERK clerk_id (PK) clerk_name clerk_grade PRODUCT SKU (PK) description brand category CUSTOMER customer_ID (PK) customer_name purchase_profile credit_profile address PROMOTION promotion_NUM (PK) promotion_name price_type ad_type ORDER-LINE order_num (PK) (FK) SKU (PK) (FK) promotion_key (FK) dollars_sold units_sold dollars_cost ERD
  97. 97. TIME time_key (PK) SQL_date day_of_week month STORE store_key (PK) store_ID store_name address district floor_type CLERK clerk_key (PK) clerk_id clerk_name clerk_grade PRODUCT product_key (PK) SKU description brand category CUSTOMER customer_key (PK) customer_name purchase_profile credit_profile address PROMOTION promotion_key (PK) promotion_name price_type ad_type FACT time_key (FK) store_key (FK) clerk_key (FK) product_key (FK) customer_key (FK) promotion_key (FK) dollars_sold units_sold dollars_cost DIMENSONAL MODEL
  98. 98. Good Attributes <ul><li>Verbose
  99. 99. Descriptive
  100. 100. Complete
  101. 101. Quality assured
  102. 102. Indexed (b-tree vs bitmap)
  103. 103. Equally available
  104. 104. Documented </li></ul>
  105. 105. Date Dimensions
  106. 107. Snowflaking & Hierarchies <ul><li>Efficiency vs Space
  107. 108. Understandability
  108. 109. M:N relationships </li></ul>
  109. 110. Star Schema ProductID TimeID CustomerID SalesAmount factSales ProductID ProductName CategoryName SubCategoryName dimProduct … dimCustomer … dimTime
  110. 111. Snowflake Schema ProductID TimeID CustomerID SalesAmount factSales ProductID CategoryID Description dimProduct CategoryID subCategoryID Description dimCategory SubCategoryID Description dimSubCategory
  111. 112. Simple DW hierarchy pattern.
  112. 113. If you don’t consider changes over time your model will start out like this…
  113. 114. … but ending up like this!
  114. 115. Slowly Changing Dimensions (Addresses, Managers, etc.) <ul><li>Type 1: Store only the current value, overwrite previous value
  115. 116. Type 2: Create a dimension record for each value (with or without date stamps)
  116. 117. Type 3: Create an attribute in the dimension record for previous value </li></ul>
  117. 118. Examples Type 3 Type 2 Type 1 Original Hybrid ProductKey Description Category SKU 21553 LeapPad Education LP2105 ProductKey Description Category SKU 21553 LeapPad Toy LP2105 ProductKey Description Category SKU 21553 LeapPad Education LP2105 44631 LeapPad Toy LP2105 ProductKey Description Category OldCat SKU 21553 LeapPad Toy Education LP2105 ProductKey Description Category OldCat SKU 21335 LeapPad Electronics Education LP2105 44631 LeapPad Electronics Toy LP2105 68122 LeapPad Education Electronics LP2105
  118. 119. Type 1 Slowly Changing Dimension <ul><li>The simplest form
  119. 120. Only updates existing records
  120. 121. Overwrites history </li></ul>
  121. 122. Type 1 Slowly Changing Dimension CustomerID Code Name State Gender 1 K001 Miranda Kerr NSW F CustomerID Code Name State Gender 1 K001 Miranda Kerr VIC F
  122. 123. Type 2 Slowly Changing Dimension <ul><li>Allows the recording of changes of state over time
  123. 124. Generates a new record each time the state changes
  124. 125. Usually requires the use of effective dates when joining to facts. </li></ul>
  125. 126. Type 2 Slowly Changing Dimension 23/2/09 CustomerID Code Name State Gender Start End 1 K001 Miranda Kerr NSW F 1/1/09 <NULL> CustomerID Code Name State Gender Start End 1 K001 Miranda Kerr NSW F 1/1/09 23/2/09 2 K001 Miranda Kerr VIC F 24/2/09 <NULL>
  126. 127. Type 3 Slowly Changing Dimension <ul><li>De-normalized change tracking
  127. 128. Only keeps a limited history
  128. 129. Stores changes in separate columns </li></ul>
  129. 130. Type 3 Slowly Changing Dimension VIC NSW CustomerID Code Name Current State Gender Prev State 1 K001 Miranda Kerr F <NULL>