Dimensional modeling primer
Introduction to Dimne

Published in: Technology
Transcript

  • 1. Dimensional Data Modeling – A Primer (Terry Bunio)
  • 2. Dimensional Data Modeling – A Primer
  • 3. @tbunio, tbunio@protegra.com, bornagainagilist.wordpress.com, www.protegra.com
  • 4. Agenda
      • Data Modeling
      • Relational vs Dimensional
      • Dimensional concepts
          – Facts
          – Dimensions
      • Complex Concept Introduction
      • Why and How?
      • My Top 10 Dimensional Modeling Recommendations
  • 5. What is Data Modeling?
  • 6. Definition
      • “A database model is a specification describing how a database is structured and used” – Wikipedia
  • 7. Definition
      • “A database model is a specification describing how a database is structured and used” – Wikipedia
      • “A data model describes how the data entities are related to each other in the real world” – Terry
  • 8. Data Model Characteristics
      • Organize/Structure like Data Elements
      • Define relationships between Data Entities
      • Highly Cohesive
      • Loosely Coupled
  • 9. Data Modeling - Chemistry
      • I like to think about the similarities between Data Modeling and Chemistry
  • 10. Data Modeling - Chemistry
      • Organize items that share the same characteristics
      • Create standard abstractions to represent characteristics
          – Solid
          – Liquid
          – Gas
  • 11. Data Modeling - Chemistry
      • Molecules
          – Define the relationships between and within the standard abstractions
          – Those relationships form patterns that can be re-used and describe the behaviour of the data in real life
  • 12. Data Modeling - Chemistry
      • Ultimately these abstractions, structure, and patterns allow for the creation of a model that:
          – Allows for predictability
          – Maximizes re-use and leverage
          – Allows for flexibility and adaptability
          – Describes reality
  • 13. Database Design
  • 14. Two design methods
      • Relational
          – “Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant) tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and modifications of a field can be made in just one table and then propagated through the rest of the database via the defined relationships.”
  • 15. Two design methods
      • Dimensional
          – “Dimensional modeling always uses the concepts of facts (measures), and dimensions (context). Facts are typically (but not always) numeric values that can be aggregated, and dimensions are groups of hierarchies and descriptors that define the facts.”
  • 16. Relational
  • 17. Relational
      • Relational Analysis
          – Database design is usually in Third Normal Form
          – Database is optimized for transaction processing (OLTP)
          – Normalized tables are optimized for modification rather than retrieval
  • 18. Normal forms
      • 1st - Under first normal form, all occurrences of a record type must contain the same number of fields.
      • 2nd - Second normal form is violated when a non-key field is a fact about a subset of a key. It is only relevant when the key is composite.
      • 3rd - Third normal form is violated when a non-key field is a fact about another non-key field.
      • Source: William Kent - 1982
  • 19. Dimensional
  • 20. Dimensional
      • Dimensional Analysis
          – Star Schema/Snowflake
          – Database is optimized for analytical processing (OLAP)
          – Facts and Dimensions optimized for retrieval
      • Facts – Business events – Transactions
      • Dimensions – context for Transactions
          – People
          – Accounts
          – Products
          – Date
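
The star layout described on slide 20 can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names (`dim_product`, `dim_date`, `fact_sale`), the columns, and the sample rows are all hypothetical and do not come from the deck. It shows one fact table keyed to two dimensions, and a retrieval query that aggregates the measure and labels it with dimension attributes.

```python
import sqlite3

# Hypothetical star schema: every table, column, and row here is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key      INTEGER PRIMARY KEY,
        calendar_date TEXT NOT NULL,
        month         TEXT NOT NULL
    );
    CREATE TABLE fact_sale (
        sale_key    INTEGER PRIMARY KEY,
        product_key INTEGER NOT NULL REFERENCES dim_product (product_key),
        date_key    INTEGER NOT NULL REFERENCES dim_date (date_key),
        amount      REAL NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO dim_date VALUES
        (20130410, '2013-04-10', '2013-04'),
        (20130511, '2013-05-11', '2013-05');
    INSERT INTO fact_sale VALUES
        (1, 1, 20130410, 100.0),
        (2, 1, 20130511, 50.0),
        (3, 2, 20130511, 25.0);
    """
)


def sales_by_product_and_month(connection):
    """Aggregate the fact measure, grouped and labelled by dimension attributes."""
    return connection.execute(
        """
        SELECT p.product_name, d.month, SUM(f.amount)
        FROM fact_sale AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY p.product_name, d.month
        ORDER BY p.product_name, d.month
        """
    ).fetchall()
```

Every analytical question follows the same shape: join the fact to whichever dimensions supply the context, filter or group on their attributes, and aggregate the measure.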
  • 21. Relational
      • 3 Dimensions
      • Spatial Model
          – No historical components except for transactional tables
      • Relational – Models the one truth of the data
          – One account '11'
          – One person 'Terry Bunio'
          – One transaction of '$100.00' on April 10th
  • 22. Dimensional
      • 4 Dimensions
      • Temporal Model
          – All tables have a time component
      • Dimensional – Models the data over time
          – Multiple versions of Accounts over time
          – Multiple versions of people over time
          – One transaction
              • Transactions are already temporal
  • 23. Kimball-lytes
      • Bottom-up - incremental
          – Operational systems feed the Data Warehouse
          – Data Warehouse is a corporate dimensional model that Data Marts are sourced from
          – Data Warehouse is the consolidation of Data Marts
          – Sometimes the Data Warehouse is generated from Subject area Data Marts
  • 24. Inmon-ians
      • Top-down
          – Corporate Information Factory
          – Operational systems feed the Data Warehouse
          – Enterprise Data Warehouse is a corporate relational model that Data Marts are sourced from
          – Enterprise Data Warehouse is the source of Data Marts
  • 25. The gist…
      • Kimball's approach is easier to implement as you are dealing with separate subject areas, but can be a nightmare to integrate
      • Inmon's approach has more upfront effort to avoid these consistency problems, but takes longer to implement
  • 26. Facts
  • 27. Fact Tables
      • Contain the measurements or facts about a business process
      • Are thin and deep
      • Usually represent:
          – A business transaction
          – A business event
      • The grain of a Fact table is the level of detail of the data recorded
  • 28. Fact Tables
      • Contain the following elements
          – Primary Key - Surrogate
          – Timestamp
          – Measures or Metrics
              • Transaction Amounts
          – Foreign Keys to Dimensions
          – Degenerate Dimensions
              • Transaction indicators or Flags
  • 29. Fact Tables
      • Types of Measures
          – Additive - Measures that can be added across any dimension.
              • Amounts
          – Non Additive - Measures that cannot be added across any dimension.
              • Rates
          – Semi Additive - Measures that can be added across some dimensions but not others.
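
A small sketch may make the three kinds of measure on slide 29 concrete. The deck gives amounts and rates as examples; the account balance used here as the semi-additive case is my assumption (it is a common textbook example), and the rows, names, and numbers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical periodic-snapshot rows: (day, account, balance, interest_rate).
snapshots = [
    ("2013-04-10", "A", 100.0, 0.02),
    ("2013-04-10", "B", 300.0, 0.04),
    ("2013-04-11", "A", 150.0, 0.02),
    ("2013-04-11", "B", 300.0, 0.04),
]


def total_balance_by_day(rows):
    """Semi-additive: balances can be summed across accounts within one day."""
    totals = defaultdict(float)
    for day, _account, balance, _rate in rows:
        totals[day] += balance
    return dict(totals)


def average_balance_by_account(rows):
    """Across days a balance is averaged (or the latest value taken), not summed."""
    by_account = defaultdict(list)
    for _day, account, balance, _rate in rows:
        by_account[account].append(balance)
    return {account: sum(values) / len(values) for account, values in by_account.items()}


def weighted_rate_by_day(rows):
    """Non-additive: a rate is recomputed from additive parts, never added up."""
    interest = defaultdict(float)
    balance_total = defaultdict(float)
    for day, _account, balance, rate in rows:
        interest[day] += balance * rate
        balance_total[day] += balance
    return {day: interest[day] / balance_total[day] for day in interest}
```

Summing the four balances would report 850.00, a figure that never existed in any account on any day, which is why the time dimension has to be treated differently for this measure.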
  • 30. Fact Tables
      • Types of Fact tables
          – Transactional - A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction".
          – Periodic snapshots - The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time.
          – Accumulating snapshots - This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated.
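
The accumulating snapshot is the only one of the three types where an existing fact row is updated, so a short sketch of that behaviour may help. The class name `OrderFulfilmentFact`, its milestone columns, and the dates are hypothetical; the deck only describes the general idea of an order moving through steps.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class OrderFulfilmentFact:
    """One accumulating-snapshot row per order; milestone dates fill in over time."""

    order_id: str
    ordered_on: date
    shipped_on: Optional[date] = None
    delivered_on: Optional[date] = None

    def record_milestone(self, milestone: str, when: date) -> None:
        """Update the existing row in place instead of inserting a new one."""
        if milestone not in ("shipped_on", "delivered_on"):
            raise ValueError(f"unknown milestone: {milestone}")
        setattr(self, milestone, when)

    def days_to_deliver(self) -> Optional[int]:
        """Lag measure, available only once the process has reached its end."""
        if self.delivered_on is None:
            return None
        return (self.delivered_on - self.ordered_on).days


order = OrderFulfilmentFact(order_id="SO-1", ordered_on=date(2013, 4, 10))
order.record_milestone("shipped_on", date(2013, 4, 12))
in_flight_lag = order.days_to_deliver()  # still None: the order is not delivered yet
order.record_milestone("delivered_on", date(2013, 4, 15))
```

A transactional fact table would instead insert three separate rows (ordered, shipped, delivered), leaving the lag between steps to be worked out at query time.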
  • 31. Special Fact Tables
      • Degenerate Dimensions
          – Degenerate Dimensions are Dimensions that can typically provide additional context about a Fact
              • For example, flags that describe a transaction
      • Degenerate Dimensions can either be a separate Dimension table or be collapsed onto the Fact table
          – My preference is the latter
  • 32. Special Fact Tables
      • If Degenerate Dimensions are not collapsed on a Fact table, they are called Junk Dimensions and remain a Dimension table
      • Junk Dimensions can also have attributes from different dimensions
          – Not recommended
  • 33. Dimensions
  • 34. Dimension Tables
      • Unlike fact tables, dimension tables contain descriptive attributes that are typically textual fields
      • These attributes are designed to serve two critical purposes:
          – query constraining and/or filtering
          – query result set labeling
      • Source: Wikipedia
  • 35. Dimension Tables
      • Shallow and Wide
      • Usually correspond to entities that the business interacts with
          – People
          – Locations
          – Products
          – Accounts
  • 36. Time Dimension
  • 37. Time Dimension
      • All Dimensional Models need a time component
      • This is either a:
          – Separate Time Dimension (recommended)
          – Time attributes on each Fact Table
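
A separate Time Dimension is usually prepopulated with one row per calendar day. The sketch below is a minimal generator under stated assumptions: the function name `build_date_dimension`, the `YYYYMMDD` integer key, and the chosen attribute columns are illustrative conventions, not something the deck prescribes.

```python
from datetime import date, timedelta
from typing import Dict, List

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def build_date_dimension(start: date, end: date) -> List[Dict[str, object]]:
    """Return one dimension row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append(
            {
                # Assumed convention: a readable YYYYMMDD integer as the key.
                "date_key": current.year * 10000 + current.month * 100 + current.day,
                "calendar_date": current.isoformat(),
                "year": current.year,
                "quarter": (current.month - 1) // 3 + 1,
                "month": current.month,
                "day_name": DAY_NAMES[current.weekday()],
                "is_weekend": current.weekday() >= 5,
            }
        )
        current += timedelta(days=1)
    return rows


date_dimension = build_date_dimension(date(2013, 4, 10), date(2013, 4, 14))
```

Because every attribute is precomputed, report writers can filter on `quarter` or `is_weekend` directly instead of repeating date arithmetic in each query.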
  • 38. Dimension Tables
      • Contain the following elements
          – Primary Key – Surrogate
          – Business Natural Key
              • Person ID
          – Effective and Expiry Dates
          – Descriptive Attributes
              • Includes de-normalized reference tables
  • 39. Behavioural Dimensions
      • A Dimension that is computed based on Facts is termed a behavioural dimension
  • 40. Junk Dimensions
      • A Junk Dimension can be a collection of attributes associated to a Fact – discussed earlier
      • It can also be a common location to store information for convenience
          – I wouldn't recommend this
  • 41. Mini-Dimensions
  • 42. Mini-Dimensions
      • Splitting a Dimension up because one set of attributes changes much more often than the rest
      • Helps to reduce the growth of the Dimension table
  • 43. Slowly Changing Dimensions
      • Type 1 – Overwrite the row with the new values and update the effective date
          – Pre-existing Facts now refer to the updated Dimension
          – May cause inconsistent reports
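
A minimal sketch of a Type 1 change, assuming a Person dimension held as a list of dictionaries. The column names (`person_key`, `person_id`, `job_title`) and the helper `apply_type1_change` are hypothetical. The point is the side effect described on slide 43: a fact recorded before the change now reports under the new value.

```python
from datetime import date

# Hypothetical Person dimension with a single row; names are illustrative only.
person_dimension = [
    {
        "person_key": 1,  # surrogate key referenced by facts
        "person_id": "P-1",  # business natural key
        "job_title": "Analyst",
        "effective_date": date(2010, 1, 1),
    }
]
sale_facts = [{"person_key": 1, "amount": 100.0, "sale_date": date(2012, 6, 1)}]


def apply_type1_change(rows, person_id, changes, effective_date):
    """Overwrite the matching row in place; no history is kept."""
    for row in rows:
        if row["person_id"] == person_id:
            row.update(changes)
            row["effective_date"] = effective_date
            return row
    raise KeyError(person_id)


apply_type1_change(person_dimension, "P-1", {"job_title": "Manager"}, date(2013, 4, 10))

# The 2012 sale now reports under the job title that only applied from 2013.
titles_on_facts = [
    next(r["job_title"] for r in person_dimension if r["person_key"] == f["person_key"])
    for f in sale_facts
]
```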
  • 44. Slowly Changing Dimensions
      • Type 2 – Insert a new Dimension row with the new data and new effective date
          – Update the expiry date on the prior row
      • Don't update old Facts that refer to the old row
          – Only new Facts will refer to this new Dimension row
      • Type 2 Slowly Changing Dimension maintains the historical context of the data
  • 45. Slowly Changing Dimensions
      • A type 2 change results in multiple dimension rows for a given natural key
  • 46. Slowly Changing Dimensions
      • No longer do I have one row to represent:
          – Account 10123
          – Terry Bunio
          – Sales Representative 11092
      • This changes the mindset and query syntax to retrieve data
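
The Type 2 mechanics on slides 44 to 46 can be sketched as follows. Several details are assumptions rather than the author's method: the `9999-12-31` open-ended expiry value, the half-open `[effective_date, expiry_date)` interval, and the helper names `apply_type2_change` and `version_as_of`. The second helper illustrates the changed query mindset: a natural key alone no longer identifies one row, so a date has to be supplied as well.

```python
from datetime import date

OPEN_ENDED = date(9999, 12, 31)  # assumed "still current" expiry value

# Hypothetical Person dimension; column names are illustrative only.
person_dimension = [
    {
        "person_key": 1,
        "person_id": "P-1",
        "job_title": "Analyst",
        "effective_date": date(2010, 1, 1),
        "expiry_date": OPEN_ENDED,
    }
]


def apply_type2_change(rows, person_id, changes, change_date):
    """Expire the current row and append a new version with a new surrogate key."""
    current = next(
        r for r in rows if r["person_id"] == person_id and r["expiry_date"] == OPEN_ENDED
    )
    new_row = dict(current)
    new_row.update(changes)
    new_row["person_key"] = max(r["person_key"] for r in rows) + 1
    new_row["effective_date"] = change_date
    current["expiry_date"] = change_date  # the prior row stops being current
    rows.append(new_row)
    return new_row


def version_as_of(rows, person_id, as_of):
    """Return the row whose [effective_date, expiry_date) interval covers as_of."""
    for row in rows:
        if row["person_id"] == person_id and row["effective_date"] <= as_of < row["expiry_date"]:
            return row
    return None


apply_type2_change(person_dimension, "P-1", {"job_title": "Manager"}, date(2013, 4, 10))
```

Old facts keep pointing at `person_key` 1 and new facts pick up `person_key` 2, so each fact retains the context that was true when it happened.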
  • 47. Slowly Changing Dimensions
      • Type 3 – The Dimension stores multiple versions for the attribute in question
      • This usually involves a current and previous value for the attribute
      • When a change occurs, no rows are added but both the current and previous attributes are updated
      • Like Type 1, Type 3 does not retain full historical context
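
A Type 3 change can be sketched in a few lines. The `current_` and `previous_` column prefixes and the helper `apply_type3_change` are hypothetical naming choices. Applying two changes in a row shows why full history is not retained: the original value falls off the end.

```python
def apply_type3_change(row, attribute, new_value):
    """Shift the current value into the 'previous' column, then overwrite it."""
    row["previous_" + attribute] = row["current_" + attribute]
    row["current_" + attribute] = new_value
    return row


# Hypothetical Sales Representative row with one Type 3 attribute.
rep = {"rep_key": 1, "current_region": "West", "previous_region": None}
apply_type3_change(rep, "region", "Central")
apply_type3_change(rep, "region", "East")  # "West" is now lost
```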
  • 48. Slowly Changing Dimensions
      • You can also create hybrid versions of Type 1, Type 2, and Type 3 based on your business requirements
  • 49. Type 1/Type 2 Hybrid
      • Most common hybrid
      • Used when you need history AND the current name for some types of statutory reporting
  • 50. Frozen Attributes
      • Sometimes it is required to freeze some attributes so that they are not Type 1, Type 2, or Type 3
      • Usually for audit or regulatory requirements
  • 51. Conformity
  • 52. Recall - Kimball-lytes
      • Bottom-up - incremental
          – Operational systems feed the Data Warehouse
          – Data Warehouse is a corporate dimensional model that Data Marts are sourced from
          – Data Warehouse is the consolidation of Data Marts
          – Sometimes the Data Warehouse is generated from Subject area Data Marts
  • 53. The problem
      • Kimball's approach can lead to Dimensions that are not conforming
      • This is due to the fact that separate departments define what a client or product is
          – Sometimes their definitions do not agree
  • 54. Conforming Dimension
      • A Dimension is said to be conforming if:
          – A conformed dimension is a set of data attributes that have been physically referenced in multiple database tables using the same key value to refer to the same structure, attributes, domain values, definitions and concepts. A conformed dimension cuts across many facts.
      • Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other.
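
The "exactly the same or a perfect subset" rule on slide 54 lends itself to a mechanical check. The sketch below assumes each dimension is a dictionary from surrogate key to attribute values, and it interprets "subset" as a subset of rows; a subset of columns would need a different comparison. The function `is_conformed` and the sample marts are hypothetical.

```python
def is_conformed(dim_a, dim_b):
    """True when the smaller dimension's rows all appear unchanged in the larger one."""
    smaller, larger = sorted((dim_a, dim_b), key=len)
    return all(
        key in larger and larger[key] == attributes
        for key, attributes in smaller.items()
    )


# Hypothetical Product dimensions held by the warehouse and two departmental marts.
warehouse_product = {
    1: {"product_id": "W-1", "name": "Widget"},
    2: {"product_id": "G-1", "name": "Gadget"},
}
sales_mart_product = {1: {"product_id": "W-1", "name": "Widget"}}
finance_mart_product = {1: {"product_id": "W-1", "name": "Widget (legacy)"}}
```

Here the sales mart conforms because its one row matches the warehouse row key for key and value for value, while the finance mart does not because it describes the same key differently.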
  • 55. If you take one thing away
      • Ensure that your Dimensions are conformed
  • 56. Complexity
  • 57. Complexity
      • Most textbooks stop here and only show the simplest Dimensional Models
      • Unfortunately, I've never run into a Dimensional Model like that
  • 58. Simple
  • 59. More Complex
  • 60. Real World
  • 61. Complex Concept Introduction• Snowflake vs Star Schema• Multi-Valued Dimensions and Bridges• Multi-Valued Attributes• Factless Facts• Recursive Hierarchies
  • 62. Snowflake vs Star Schema
  • 63. Snowflake vs Star Schema
  • 64. Snowflake vs Star Schema
      • These extra tables are termed outriggers
      • They are used to address real world complexities with the data
          – Excessive row length
          – Repeating groups of data within the Dimension
      • I will use outriggers in a limited way for repeating data
  • 65. Multi-Valued Dimensions
      • Multi-Valued Dimensions are when a Fact needs to connect more than once to a Dimension
          – Primary Sales Representative
          – Secondary Sales Representative
  • 66. Multi-Valued Dimensions
      • Two possible solutions
          – Create copies of the Dimensions for each role
          – Create a Bridge table to resolve the many-to-many relationship
  • 67. Multi-Valued Dimensions
  • 68. Bridge Tables
  • 69. Bridge Tables
      • Bridge Tables can be used to resolve any many-to-many relationship
      • This is frequently required with more complex data areas
      • These bridge tables need to be considered a Dimension and they need to use the same Slowly Changing Dimension Design as the base Dimension
          – My Recommendation
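
A bridge table for the Primary/Secondary Sales Representative case might look like the sketch below. The `rep_group_key` indirection, the `role` column, and the helper names are assumptions made for illustration. The second helper also demonstrates a caveat worth knowing before reporting through a bridge: summing a measure per representative counts a shared sale once for each representative.

```python
from collections import defaultdict

# Hypothetical Sales Representative dimension: surrogate key -> name.
dim_sales_rep = {1: "Rep One", 2: "Rep Two"}

# Bridge rows: one per (group, rep) pair. A fact points at a group, not at a rep.
rep_group_bridge = [
    {"rep_group_key": 10, "rep_key": 1, "role": "Primary"},
    {"rep_group_key": 10, "rep_key": 2, "role": "Secondary"},
    {"rep_group_key": 11, "rep_key": 2, "role": "Primary"},
]

sale_facts = [
    {"sale_key": 1, "rep_group_key": 10, "amount": 100.0},
    {"sale_key": 2, "rep_group_key": 11, "amount": 60.0},
]


def reps_for_sale(sale):
    """Resolve the many-to-many link from one fact row to its representatives."""
    return [
        (dim_sales_rep[b["rep_key"]], b["role"])
        for b in rep_group_bridge
        if b["rep_group_key"] == sale["rep_group_key"]
    ]


def amount_by_rep(facts):
    """Sum the sale amount per rep; shared sales are counted once per rep."""
    totals = defaultdict(float)
    for sale in facts:
        for bridge_row in rep_group_bridge:
            if bridge_row["rep_group_key"] == sale["rep_group_key"]:
                totals[dim_sales_rep[bridge_row["rep_key"]]] += sale["amount"]
    return dict(totals)
```

The per-rep totals add up to 260.00 even though only 160.00 was sold, so grand totals should come from the fact table directly rather than from the bridged result.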
  • 70. Multi-Valued Attributes
      • In some cases, you will need to keep multiple values for an attribute or sets of attributes
      • Three solutions
          – Outriggers or Snowflake (1:M)
          – Bridge Table (M:M)
          – Repeat attributes on the Dimension
              • Simplest solution but can be hard to query and causes long record length
  • 71. Factless Facts
      • Fact table with no metrics or measures
      • Used for two purposes:
          – Records the occurrence of activities. Although no facts are stored explicitly, these events can be counted, producing meaningful process measurements.
          – Records significant information that is not part of a business activity. Examples of conditions include eligibility of people for programs and the assignment of Sales Representatives to Clients
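
A factless fact table holds nothing but foreign keys, yet it still answers questions by counting rows or by comparing against what is absent. The sketch uses the Sales Representative to Client assignment example from slide 71; the keys, column names, and helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical factless fact: each row only records that an assignment exists.
rep_client_assignment_facts = [
    {"date_key": 20130410, "rep_key": 1, "client_key": 100},
    {"date_key": 20130410, "rep_key": 1, "client_key": 101},
    {"date_key": 20130410, "rep_key": 2, "client_key": 102},
]


def clients_per_rep(facts):
    """Counting rows is the measurement: number of clients assigned to each rep."""
    return dict(Counter(row["rep_key"] for row in facts))


def unassigned_clients(facts, all_client_keys):
    """Coverage question: which clients have no assignment row at all?"""
    assigned = {row["client_key"] for row in facts}
    return sorted(set(all_client_keys) - assigned)
```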
  • 72. Hierarchies and Recursive Hierarchies
  • 73. Hierarchies and Recursive Hierarchies
      • We would need a separate session to cover this topic
      • Solution involves defining Dimension tables to record the Hierarchy with a special solution to address the Slowly Changing Dimension Hierarchy
      • Any change in the Hierarchy can result in needing to duplicate the Hierarchy downstream
  • 74. Why?
      • Why Dimensional Model?
      • Allows for a concise representation of data for reporting. This is especially important for Self-Service Reporting
          – We reduced from 300+ tables in our Operational Data Store to 40+ tables in our Data Warehouse
          – Aligns with real world business concepts
  • 75. Why?
      • The most important reason:
          – Requires detailed understanding of the data
          – Validates the solution
          – Uncovers inconsistencies and errors in the Normalized Model
              • Easy for inconsistencies and errors to hide in 300+ tables
              • No place to hide when those tables are reduced down
  • 76. Why?
      • Ultimately there must be a business requirement for a temporal data model and not just a spatial one.
      • You could still go through the exercise to validate your understanding without implementing the Dimensional Data Model
  • 77. How?
  • 78. How?
      • Start with your simplest Dimension and Fact tables and define the Natural Keys for them
          – i.e. People, Product, Transaction, Time
      • De-Normalize Reference tables to Dimensions (and possibly Facts, based on how large the Fact tables will be)
          – I place both codes and descriptions on the Dimension and Fact tables
      • Look to De-normalize other tables with the same Cardinality into one Dimension
          – Validate the Natural Keys still define one row
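
The de-normalization step on slide 78 can be sketched as a small load routine: fold a reference table's code and description onto each dimension row, then confirm the natural key still identifies exactly one row. The source tables, the province example, the "Unknown" fallback, and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical source tables from a normalized operational store.
person_source = [
    {"person_id": "P-1", "name": "Example Person", "province_code": "MB"},
    {"person_id": "P-2", "name": "Another Person", "province_code": "ON"},
]
province_reference = {"MB": "Manitoba", "ON": "Ontario"}


def build_person_dimension(people, reference):
    """Place both the code and its description on every dimension row."""
    rows = []
    for person_key, person in enumerate(people, start=1):
        code = person["province_code"]
        rows.append(
            {
                "person_key": person_key,  # surrogate key
                "person_id": person["person_id"],  # natural key
                "name": person["name"],
                "province_code": code,
                "province_desc": reference.get(code, "Unknown"),
            }
        )
    return rows


def duplicate_natural_keys(rows, key_column):
    """List natural keys that appear on more than one row after de-normalizing."""
    counts = Counter(row[key_column] for row in rows)
    return sorted(key for key, count in counts.items() if count > 1)


person_dimension = build_person_dimension(person_source, province_reference)
```

An empty result from `duplicate_natural_keys` is the validation mentioned on the slide. A non-empty result usually means a table with a different cardinality was folded in (or, under Type 2, that the check also needs to include the effective date).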
  • 79. How?
      • Don't force entities on the same Dimension
          – Tempting, but you will find it doesn't represent the data and will cause issues for loading or retrieval
          – Bridge tables or mini-snowflakes are not bad
              • I don't like a deep snowflake, but shallow snowflakes can be appropriate
      • Don't fall into the Star-Schema/Snowflake Holy War – Let your data define the solution
  • 80. How?
      • Iterate, Iterate, Iterate
          – Your initial solution will be wrong
          – Create it and start to define the load process and reports
          – You will learn more by using the data than from months of analysis trying to get the model right
      • Come to SDEC 13 if you want to hear how our project technically did that
          – Star Trek Theme
  • 81. Top 10
  • 82. Top 10
      1. Copy the design for the Time Dimension from the Web. Lots of good solutions with scripts to prepopulate the dimension
      2. Make all your attributes Not-Null. This makes Self-Service Report writing easy
      3. Create a single Surrogate Primary Key for Dimensions – This will help to simplify the design and table width
          – These FKs get created on Fact tables!
  • 83. Top 10
      4. Never reject a record
          – Create a Dummy Invalid record on each Dimension. Allows you to store a Fact record when the relationship is missing
      5. Choose a Type 2 Slowly Changing Dimension as your default
      6. Use Effective and Expiry dates on your Dimensions to allow for maximum historical information
          – If they are Type 2!
  • 84. Top 10
      7. SSIS 2012 has some built-in functionality for processing Slowly Changing Dimensions – Check it out!
      8. Add “Current_ind” and “Dummy_ind” attributes to each Dimension to assist in Report writing
      9. Iterate, Iterate, Iterate
      10. Read this book
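
Recommendations 4 and 8 work together, and a short sketch may show how. It assumes a dummy row with surrogate key `-1` and `Y`/`N` indicator values; those conventions, the Client dimension, and the helper `build_fact_row` are illustrative choices, not taken from the deck. A fact whose client cannot be resolved is still loaded, pointing at the dummy row instead of being rejected.

```python
DUMMY_KEY = -1  # assumed surrogate key for the "invalid / missing" member

# Hypothetical Client dimension: surrogate key -> attributes.
dim_client = {
    DUMMY_KEY: {"client_id": "N/A", "name": "Unknown", "current_ind": "Y", "dummy_ind": "Y"},
    1: {"client_id": "C-100", "name": "Example Client", "current_ind": "Y", "dummy_ind": "N"},
}

# Lookup from natural key to the current, non-dummy surrogate key.
client_key_by_id = {
    row["client_id"]: key
    for key, row in dim_client.items()
    if row["dummy_ind"] == "N" and row["current_ind"] == "Y"
}


def build_fact_row(source_row):
    """Never reject: fall back to the dummy member when the client is unknown."""
    client_key = client_key_by_id.get(source_row.get("client_id"), DUMMY_KEY)
    return {"client_key": client_key, "amount": source_row["amount"]}


loaded_facts = [
    build_fact_row(row)
    for row in [
        {"client_id": "C-100", "amount": 100.0},
        {"client_id": "C-999", "amount": 40.0},  # client not in the dimension
        {"amount": 5.0},  # client missing from the source row entirely
    ]
]
```

All three source rows load, the foreign key is never null (which also supports recommendation 2), and a report can exclude or isolate the unresolved rows by filtering on `dummy_ind`.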
  • 85. Want More?
  • 86. Whew! Questions?
