Advanced Dimensional Modelling
SQLBits 8, 9th April 2011, Brighton
Vincent Rainardi
vrainardi@gmail.com
Blog: dwbi1.wordpress.com
Advanced Dimensional Modelling

1. Dimensions - Structure
  SCD Type 6
  1 or 2 Dimensions
  When To Snowflake
  A Dimension with Only 1 Attribute
  Transaction Level Dimension
2. Fact Tables
  Fact Table Primary Key
  Snapshotting Transaction Fact Tables
  Aggregate Fact Tables
  Vertical Fact Tables
3. Dimensions - Behavior
  Rapidly Changing Dimension
  Very Large Dimensions
  Banding Dimension Rows
  Stamping Dimension Rows
  Dimensions with Multi-Valued Attributes
4. Combinations
  Real Time Fact Table
  Dealing with Currency Rates
  Dealing with Status
4 sections: 2 on dimensions, 1 on fact tables, 1 on combinations. Lots of material; we may not be able to finish.
44 slides; some of them we may have to touch on only lightly.
Questions between sections; also available after.

SCD Type 6 (1/2)
SCD Type 6 is a combination of Types 1, 2 and 3: 6 = 1 + 2 + 3
e.g. type 2 + type 1: DimAccount (telco example), with a Business/Natural Key
Ref: Ross & Kimball 2005, Wikipedia
http://www.rkimball.com/html/articles_search/articles%202005/0503IE.html
http://en.wikipedia.org/wiki/Slowly_changing_dimension
SCD Type 6 (2/2)
Used for “As Was” reporting
e.g. balances by tariff (price plan) at the end of last year, if the customers were on today’s tariff.
(diagram: Fact → “Type 12” Dim, with the Natural Key)
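A minimal sketch of a type 6 (type 2 + type 1) account dim, assuming a hypothetical schema: `tariff_historic` is versioned type 2 style, while `tariff_current` is a type 1 column overwritten on every version of the account. Grouping last year's balances by the type 1 column gives the slide's example (as if the customers were on today's tariff); grouping by the type 2 column gives the tariff each account was actually on at the time. It uses Python's built-in `sqlite3` with an in-memory database; all table, column and account names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_account (
    account_key     INTEGER PRIMARY KEY,  -- surrogate key, one per version
    account_number  TEXT,                 -- business/natural key
    tariff_historic TEXT,                 -- type 2: tariff while this version was valid
    tariff_current  TEXT,                 -- type 1: today's tariff, same on every version
    valid_from      TEXT,
    valid_to        TEXT
);
CREATE TABLE fact_balance (snapshot_date TEXT, account_key INTEGER, balance REAL);
INSERT INTO dim_account VALUES
    (1, 'A1', 'Basic', 'Basic', '2009-01-01', '9999-12-31'),
    (2, 'A2', 'Basic', 'Basic', '2009-01-01', '9999-12-31');
INSERT INTO fact_balance VALUES ('2010-12-31', 1, 100.0), ('2010-12-31', 2, 50.0);
""")

def change_tariff(conn, account_number, new_tariff, change_date):
    """Type 6 change: expire the current version and add a new one (type 2),
    then overwrite tariff_current on every version of the account (type 1)."""
    conn.execute("UPDATE dim_account SET valid_to = ? "
                 "WHERE account_number = ? AND valid_to = '9999-12-31'",
                 (change_date, account_number))
    conn.execute("INSERT INTO dim_account (account_number, tariff_historic, "
                 "tariff_current, valid_from, valid_to) VALUES (?, ?, ?, ?, '9999-12-31')",
                 (account_number, new_tariff, new_tariff, change_date))
    conn.execute("UPDATE dim_account SET tariff_current = ? WHERE account_number = ?",
                 (new_tariff, account_number))

def balances_by(conn, column, snapshot_date):
    """Sum the balances on one snapshot date, grouped by one of the tariff columns."""
    if column not in ("tariff_historic", "tariff_current"):
        raise ValueError("unexpected column")  # column is interpolated into the SQL
    return conn.execute(
        f"SELECT d.{column}, SUM(f.balance) FROM fact_balance f "
        "JOIN dim_account d ON f.account_key = d.account_key "
        f"WHERE f.snapshot_date = ? GROUP BY d.{column} ORDER BY d.{column}",
        (snapshot_date,)).fetchall()

change_tariff(conn, "A1", "Plus", "2011-01-01")
print(balances_by(conn, "tariff_historic", "2010-12-31"))  # [('Basic', 150.0)]
print(balances_by(conn, "tariff_current", "2010-12-31"))   # [('Basic', 50.0), ('Plus', 100.0)]
```

The fact rows never change: the old fact row still points at version 1, and it is only the overwrite of `tariff_current` that lets the same row be reported on today's tariff.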
1 or 2 dimensions (1/4)
a) One dimension: FactTable → DimAccount, with the customer attributes held in DimAccount.
b) Two dimensions: FactTable → DimAccount and FactTable → DimCustomer.
b) Advantage: we can get the customer attributes without knowing the account key.
b) Disadvantage: we can’t go from account to customer without going through the fact table, which hurts performance.
a) Advantages: simplicity (only 1 dim), and a hierarchy running from the customer attributes down to the account attributes.
a) Use when we don’t have fact tables requiring customer grain.

1 or 2 dimensions (2/4)
c) Snowflake: FactTable → DimAccount → DimCustomer
Dim Customer is needed by another fact table.
Modular: 2 separate dim tables but we can combine them easily to create a bigger dimension
Getting the breakdown of a measure by a customer attribute is a bit more complicated than in a):
    select c.attribute, sum(f.measure1)
    from fact1 f
    inner join dim_account a on f.account_key = a.account_key
    inner join dim_customer c on a.customer_key = c.customer_key
    group by c.attribute
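The snowflake query above can be run end to end as a small sketch, assuming Python's built-in `sqlite3` and an in-memory database. The table and column names (`fact1`, `dim_account`, `dim_customer`, `attribute`, `measure1`) come from the slide; the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account  (account_key INTEGER PRIMARY KEY,
                           customer_key INTEGER REFERENCES dim_customer(customer_key));
CREATE TABLE fact1        (account_key INTEGER REFERENCES dim_account(account_key),
                           measure1 REAL);
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account  VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO fact1        VALUES (10, 5.0), (11, 7.0), (20, 3.0), (20, 4.0);
""")

# Snowflake: the fact table only knows account_key, so the customer attribute
# is reached by hopping through dim_account.
rows = conn.execute("""
    SELECT c.attribute, SUM(f.measure1)
    FROM fact1 f
    INNER JOIN dim_account a  ON f.account_key = a.account_key
    INNER JOIN dim_customer c ON a.customer_key = c.customer_key
    GROUP BY c.attribute
    ORDER BY c.attribute
""").fetchall()
print(rows)  # [('Corporate', 7.0), ('Retail', 12.0)]
```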
1 or 2 dimensions (3/4)
d) Two dimensions with an inter-dimension link
Tries to fix the weaknesses of b) and c):
We can “go” directly from the account dim to the customer dim.
We can access the customer dim directly from the fact table.
(diagram: FactTable → DimAccount, FactTable → DimCustomer, DimAccount → DimCustomer)
Weakness: the customer key is maintained in 2 places, the fact table and the account dim.
a.k.a. “Star with a Back Door”
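A minimal sketch of the “Star with a Back Door”, assuming Python's built-in `sqlite3`, hypothetical rows, and a customer-to-account relationship with no history. It shows both direct paths and the price paid for them: the two copies of the customer key can drift apart, so some reconciliation check is needed. One fact row is deliberately inconsistent to show what such a check finds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, attribute TEXT);
CREATE TABLE dim_account  (account_key INTEGER PRIMARY KEY, customer_key INTEGER);
-- The fact carries BOTH keys, so customer_key is maintained in 2 places.
CREATE TABLE fact1 (account_key INTEGER, customer_key INTEGER, measure1 REAL);
INSERT INTO dim_customer VALUES (1, 'Retail'), (2, 'Corporate');
INSERT INTO dim_account  VALUES (10, 1), (20, 2);
-- The last fact row is deliberately inconsistent with dim_account.
INSERT INTO fact1 VALUES (10, 1, 5.0), (20, 2, 3.0), (20, 1, 4.0);
""")

# Fact -> customer directly, without going through dim_account.
by_customer = conn.execute("""
    SELECT c.attribute, SUM(f.measure1) FROM fact1 f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    GROUP BY c.attribute ORDER BY c.attribute""").fetchall()

# Account -> customer directly, without going through the fact table.
account_to_customer = conn.execute("""
    SELECT a.account_key, c.attribute FROM dim_account a
    JOIN dim_customer c ON a.customer_key = c.customer_key
    ORDER BY a.account_key""").fetchall()

# Reconciliation: fact rows whose customer_key disagrees with dim_account.
mismatches = conn.execute("""
    SELECT f.account_key, f.customer_key, a.customer_key FROM fact1 f
    JOIN dim_account a ON f.account_key = a.account_key
    WHERE f.customer_key <> a.customer_key""").fetchall()

print(by_customer)          # [('Corporate', 3.0), ('Retail', 9.0)]
print(account_to_customer)  # [(10, 'Retail'), (20, 'Corporate')]
print(mismatches)           # [(20, 1, 2)]
```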
1 or 2 dimensions (4/4)
e) One dimension with a customer key
(diagram: 2 fact tables and DimAccount)
Tries to fix the weakness of a): being unable to build a fact table with grain = customer.
Add a column to the account dim: customer key.
Not as popular as c) and d) for solving the Dim Customer issue. It is “indecisive”: it tries to create a Dim Customer without actually creating a Dim Customer.
Disadvantage: Dim Customer is hidden inside Dim Account, which makes it:
a) more difficult to maintain (especially for a type 2), and
b) less modular/flexible.
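A sketch of option e), assuming Python's built-in `sqlite3` and a hypothetical schema in which the customer attributes live in `dim_account` next to a `customer_key` column. A customer-grain fact table has to join to a derived, de-duplicated “hidden” customer dim; joining to `dim_account` directly double counts customers who have several accounts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option e): customer attributes live in dim_account, plus a customer_key column.
CREATE TABLE dim_account (
    account_key   INTEGER PRIMARY KEY,
    account_type  TEXT,
    customer_key  INTEGER,
    customer_name TEXT,
    segment       TEXT
);
INSERT INTO dim_account VALUES
    (10, 'Current', 1, 'Alice', 'Retail'),
    (11, 'Savings', 1, 'Alice', 'Retail'),
    (20, 'Current', 2, 'Acme',  'Corporate');
-- A second fact table at customer grain, keyed on customer_key.
CREATE TABLE fact_customer (customer_key INTEGER, credit_limit REAL);
INSERT INTO fact_customer VALUES (1, 1000.0), (2, 5000.0);
""")

# The hidden Dim Customer has to be derived. DISTINCT gives one row per customer
# only while the customer attributes are identical on all of that customer's
# accounts; with type 2 history on them this breaks down (the maintenance issue).
hidden_dim_customer = "SELECT DISTINCT customer_key, customer_name, segment FROM dim_account"

def limit_by_segment(dim_sql):
    """Sum the customer-grain measure by segment, via the given dim query."""
    return conn.execute(f"""
        SELECT d.segment, SUM(f.credit_limit) FROM fact_customer f
        JOIN ({dim_sql}) d ON f.customer_key = d.customer_key
        GROUP BY d.segment ORDER BY d.segment""").fetchall()

print(limit_by_segment(hidden_dim_customer))          # [('Corporate', 5000.0), ('Retail', 1000.0)]
# Joining to dim_account directly double counts Alice, who has 2 accounts.
print(limit_by_segment("SELECT * FROM dim_account"))  # [('Corporate', 5000.0), ('Retail', 2000.0)]
```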
When to Snowflake (1/3)
1. When the sub dim is used by several dims
City-Country-Region columns exist in DimBroker, DimPolicy, DimOffice and DimInsured.
They are replaced by a Location/Geo key pointing to DimLocation / DimGeography.
Advantage: a consistent hierarchy, i.e. the relationship between City, Country and Region is the same in every dim.
Weakness: we lose flexibility. City to Country is more or less fixed, but the grouping of countries might differ between dimensions.
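A minimal sketch of a shared location sub dim, assuming Python's built-in `sqlite3` and hypothetical broker, office and location rows (only two of the four main dims are shown). Regrouping a country touches one row in `dim_location`, and every main dim sees the same City-Country-Region hierarchy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared sub dim holds the City -> Country -> Region hierarchy.
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY,
                           city TEXT, country TEXT, region TEXT);
-- The main dims keep only a location key instead of their own City/Country/Region columns.
CREATE TABLE dim_broker (broker_key INTEGER PRIMARY KEY, broker_name TEXT, location_key INTEGER);
CREATE TABLE dim_office (office_key INTEGER PRIMARY KEY, office_name TEXT, location_key INTEGER);
INSERT INTO dim_location VALUES (1, 'Brighton', 'UK', 'Europe'), (2, 'Paris', 'France', 'Europe');
INSERT INTO dim_broker VALUES (100, 'Broker X', 1);
INSERT INTO dim_office VALUES (200, 'Office Y', 1), (201, 'Office Z', 2);
""")

def regions(table, name_col):
    """Region of each member of a main dim (table/name_col are trusted literals)."""
    return conn.execute(f"""
        SELECT m.{name_col}, l.region FROM {table} m
        JOIN dim_location l ON m.location_key = l.location_key
        ORDER BY m.{name_col}""").fetchall()

# Regroup one country: a single row changes and both main dims pick it up.
conn.execute("UPDATE dim_location SET region = 'EMEA' WHERE country = 'UK'")
print(regions("dim_broker", "broker_name"))  # [('Broker X', 'EMEA')]
print(regions("dim_office", "office_name"))  # [('Office Y', 'EMEA'), ('Office Z', 'Europe')]
```

The same sharing is the weakness mentioned above: if DimOffice needed a different grouping of countries from DimBroker, the single `region` column could not express both.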
When to Snowflake (2/3)
2. When the sub dim is used by both the main dim and the fact table(s)
DimCustomer is used in DimAccount, and is also used in the fact table.
DimManufacturer is used in DimProduct, and is also used in the fact table.
DimProductGroup is used in DimProduct, and is also used in some fact tables.
The alternative is maintaining two full dimensions (classic star).
When to Snowflake (3/3)
3. To make a “base dim” and “detail dims”
e.g. insurance classes, account types (banking), product lines, diagnosis and treatment (health care).
Policies for the marine, aviation and property classes have different attributes. Pull the common attributes into 1 dim, DimBasePolicy, and put the class-specific attributes into DimMarine, DimProperty and DimAviation.
Ref: Kimball, The Data Warehouse Toolkit, 2nd edition, page 213
4. To enrich a date attribute
Month, Quarter, Year, etc.
Like #1, a sub dim used by several dims.
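A sketch of the base dim / detail dim pattern, assuming Python's built-in `sqlite3`, hypothetical attributes (`broker`, `vessel_type`, `aircraft_type`) and made-up premiums. The detail dims share the base dim's key, so cross-class analysis uses only the base dim and class-specific analysis adds the relevant detail dim (DimProperty would follow the same shape).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base dim: attributes common to every policy class.
CREATE TABLE dim_base_policy (policy_key INTEGER PRIMARY KEY, policy_class TEXT, broker TEXT);
-- Detail dims: class-specific attributes, keyed on the base dim's key.
CREATE TABLE dim_marine   (policy_key INTEGER PRIMARY KEY REFERENCES dim_base_policy(policy_key),
                           vessel_type TEXT);
CREATE TABLE dim_aviation (policy_key INTEGER PRIMARY KEY REFERENCES dim_base_policy(policy_key),
                           aircraft_type TEXT);
CREATE TABLE fact_premium (policy_key INTEGER, premium REAL);
INSERT INTO dim_base_policy VALUES (1, 'Marine', 'B1'), (2, 'Marine', 'B2'), (3, 'Aviation', 'B1');
INSERT INTO dim_marine   VALUES (1, 'Tanker'), (2, 'Container');
INSERT INTO dim_aviation VALUES (3, 'Jet');
INSERT INTO fact_premium VALUES (1, 100.0), (2, 200.0), (3, 300.0);
""")

# Cross-class analysis only needs the base dim.
by_class = conn.execute("""
    SELECT b.policy_class, SUM(f.premium) FROM fact_premium f
    JOIN dim_base_policy b ON f.policy_key = b.policy_key
    GROUP BY b.policy_class ORDER BY b.policy_class""").fetchall()

# Class-specific analysis uses the detail dim; the inner join keeps marine policies only.
marine = conn.execute("""
    SELECT m.vessel_type, SUM(f.premium) FROM fact_premium f
    JOIN dim_marine m ON f.policy_key = m.policy_key
    GROUP BY m.vessel_type ORDER BY m.vessel_type""").fetchall()

print(by_class)  # [('Aviation', 300.0), ('Marine', 300.0)]
print(marine)    # [('Container', 200.0), ('Tanker', 100.0)]
```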
A dimension with only 1 attribute (1/2)
Should we put the attribute in the fact table (like a DD = Degenerate Dimension)?
Probably, if its grain = the fact table’s grain, and it’s short or it’s a number.
Reasons for putting a single attribute in its own dim:
Keep the fact table slim (a 4-byte int, not a 100-byte varchar).
When the value changes, we don’t have to update the BIG fact table (ETL performance).
Its grain is much lower than the fact table’s, so it is a small dim.
Yes, it’s only 1 attribute today, but in the future there could be another attribute. It could become a junk dim.
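The ETL argument (“we don’t have to update the BIG fact table”) can be illustrated with a small sketch, assuming Python's built-in `sqlite3`, a hypothetical single-attribute `dim_channel`, and 1,000 made-up fact rows. Renaming a value touches 1 dim row in the keyed design but every matching fact row when the attribute is kept in the fact table as a DD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_channel (channel_key INTEGER PRIMARY KEY, channel_name TEXT);
INSERT INTO dim_channel VALUES (1, 'Web'), (2, 'Store'), (3, 'Phone');
-- Design A: the fact table holds a small int key to the single-attribute dim.
CREATE TABLE fact_sales_keyed (channel_key INTEGER, amount REAL);
-- Design B: the attribute sits in the fact table as a degenerate dimension.
CREATE TABLE fact_sales_dd (channel_name TEXT, amount REAL);
""")

names = {1: "Web", 2: "Store", 3: "Phone"}
keys = [i % 3 + 1 for i in range(1000)]  # 1000 hypothetical fact rows
conn.executemany("INSERT INTO fact_sales_keyed VALUES (?, 10.0)", [(k,) for k in keys])
conn.executemany("INSERT INTO fact_sales_dd VALUES (?, 10.0)", [(names[k],) for k in keys])

# The value changes: 'Web' is renamed to 'Online'. Count the rows each design rewrites.
dim_rows = conn.execute(
    "UPDATE dim_channel SET channel_name = 'Online' WHERE channel_key = 1").rowcount
fact_rows = conn.execute(
    "UPDATE fact_sales_dd SET channel_name = 'Online' WHERE channel_name = 'Web'").rowcount
print(dim_rows, fact_rows)  # 1 334
```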
A dimension with only 1 attribute (2/2)
Exception: snapshot month (or day/week/quarter)
Snapshot month is used in periodic snapshot fact tables. It is in the form of an integer (201104 for April 2011), so it doesn’t violate the 3 points above:
It is an integer, not a char(6).
The value never changes: April 2011 will be April 2011 forever.
There will not be other attributes in the dim.
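A small sketch of working with the integer snapshot month (YYYYMM) directly in ETL code; the function names are hypothetical. It encodes and decodes the key and shifts it by a number of months, rolling over year boundaries.

```python
from __future__ import annotations


def snapshot_month(year: int, month: int) -> int:
    """Encode a snapshot month as a YYYYMM integer, e.g. April 2011 -> 201104."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1..12, got {month}")
    return year * 100 + month


def split_snapshot_month(key: int) -> tuple[int, int]:
    """Decode a YYYYMM integer back into (year, month)."""
    return divmod(key, 100)


def add_months(key: int, n: int) -> int:
    """Shift a YYYYMM key by n months (n may be negative), rolling over years."""
    year, month = split_snapshot_month(key)
    total = year * 12 + (month - 1) + n
    new_year, month_index = divmod(total, 12)
    return snapshot_month(new_year, month_index + 1)


print(snapshot_month(2011, 4))       # 201104
print(split_snapshot_month(201104))  # (2011, 4)
print(add_months(201112, 1))         # 201201
print(add_months(201101, -1))        # 201012
```

Because the keys sort in chronological order (201012 < 201101), they can be used for range filters on the fact table without a join to a month dim.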
Transaction Level Dimension (1/5)
A dim with grain = the transaction fact table
(a transaction fact table, not an accumulating or periodic snapshot)
Examples:
IT Helpdesk DW: Dim Ticket
Telco DW: Dim Call
Banking/Asset Mgt DW: Dim Trade
Insurance DW: Dim Premium
Transaction level dim: the most granular event in any business process
Transaction Level Dimension (2/5)
Advantages:
Query performance
DD columns are moved to a dim, away from the heavy traffic in the fact tables. DW queries don’t touch those DD columns unless they need to, which helps performance. DD attributes totalling 30 bytes are replaced by a 4-byte int column: a slimmer fact table, better for queries.
Periodic snapshot fact table
For a periodic snapshot fact table, the saving is even greater. Take a monthly snapshot fact covering 10 years, i.e. 120 months. Rather than storing the DDs repeatedly 120 times, they are stored once in the transaction dim. All that is left on the fact table is a slim single int column: the transaction key.
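The arithmetic above can be made explicit with a rough sketch, assuming 30 bytes of DD attributes, a 4-byte key, and ignoring row overhead, indexes and compression. The function name is hypothetical; it counts the bytes spent on one transaction's DDs with and without a trans dim.

```python
def dd_bytes_per_transaction(dd_bytes, key_bytes, snapshots, use_trans_dim):
    """Bytes spent on the DD attributes of one transaction.

    Without a trans dim the DDs are repeated on every snapshot row.
    With a trans dim each snapshot row carries only the key, and the DDs
    (plus the dim row's own key) are stored once in the trans dim.
    """
    if use_trans_dim:
        return snapshots * key_bytes + (key_bytes + dd_bytes)
    return snapshots * dd_bytes


# Monthly snapshot over 10 years = 120 fact rows per transaction.
print(dd_bytes_per_transaction(30, 4, 120, use_trans_dim=False))  # 3600
print(dd_bytes_per_transaction(30, 4, 120, use_trans_dim=True))   # 514
# Transaction fact table (1 fact row per transaction).
print(dd_bytes_per_transaction(30, 4, 1, use_trans_dim=False))    # 30
print(dd_bytes_per_transaction(30, 4, 1, use_trans_dim=True))     # 38
```

For the transaction fact table the total storage does not shrink; the gain there is a narrower fact row for queries that don’t need the DDs, which is why the saving is “even greater” only for the periodic snapshot.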
Transaction Level Dimension (3/5)
Some fact tables have a finer grain than the transaction (more fact rows than transactions).
A payment from a customer is posted to 4 accounts in the GL fact table. That single financial transaction becomes 4 fact rows but has only 1 row in the trans dim. e.g. a fact table with 10m rows, but a trans dim with only 3m rows.
Related transactions
Some transactions are related, e.g. in retail, the purchase of a kitchen might need to be created as 2 related orders, because the worktop is made to order. Rather than creating a ‘related order’ column on the fact tables, it might be better (depending on how it’s used) to create it on the trans dim because:
a) an order can consist of many fact rows (1 row per item), so the “related order number” would be duplicated across those fact rows
b) it keeps the fact table slimmer
c) the transaction could be on many fact tables, not only one.
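A minimal sketch of the related-order case, assuming Python's built-in `sqlite3` and hypothetical order numbers and amounts. The related order number is stored once per order in the trans dim rather than on each of the 4 fact rows, and it is still easy to total the whole kitchen purchase across both orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trans dim: one row per order; the related order number is stored once here.
CREATE TABLE dim_transaction (
    transaction_key      INTEGER PRIMARY KEY,
    order_number         TEXT,
    related_order_number TEXT
);
-- Fact: one row per item, so an order spans several fact rows.
CREATE TABLE fact_order_line (transaction_key INTEGER, product TEXT, amount REAL);
INSERT INTO dim_transaction VALUES
    (1, 'ORD-1', 'ORD-2'),  -- kitchen units
    (2, 'ORD-2', 'ORD-1');  -- made-to-order worktop
INSERT INTO fact_order_line VALUES
    (1, 'Base unit', 300.0), (1, 'Wall unit', 200.0), (1, 'Sink', 150.0),
    (2, 'Worktop', 400.0);
""")

fact_rows = conn.execute("SELECT COUNT(*) FROM fact_order_line").fetchone()[0]
dim_rows = conn.execute("SELECT COUNT(*) FROM dim_transaction").fetchone()[0]

# Total value of the whole kitchen purchase: ORD-1 plus the order related to it.
kitchen_total = conn.execute("""
    SELECT SUM(f.amount) FROM fact_order_line f
    JOIN dim_transaction t ON f.transaction_key = t.transaction_key
    WHERE t.order_number = ? OR t.related_order_number = ?""",
    ("ORD-1", "ORD-1")).fetchone()[0]

print(fact_rows, dim_rows, kitchen_total)  # 4 2 1050.0
```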
Transaction Level Dimension (4/5)
Disadvantages / not suitable:
A transaction fact table where the grain of the trans dim = the grain of the fact table, and with only 1 DD column: it is perhaps better to leave the DD in the fact table. There is not a lot of space/speed gain from putting it in a trans dim.
Mart/DW only used for SSAS: there is little point in having a physical trans dim. In SSAS we can create the transaction dimension “on the fly” from the fact table (a “fact dimension”).
Using the trans dim to hold attributes, as opposed to putting them in the main dimensions, on the argument that “that’s the value of the attribute when the transaction happened”: this is not right. Use a type 2 SCD for this.
(diagram labels: Main, Acct type, Trans, Location)
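The type 2 alternative can be sketched as the usual surrogate key lookup done when loading the fact table, assuming hypothetical versions of one account and an inclusive `valid_to` convention. The fact row stores the key of the version valid on the transaction date, so the account type “when the transaction happened” comes from the main dim and no trans dim is needed for it.

```python
from datetime import date

# Hypothetical type 2 versions of account A1 in the main account dim:
# (surrogate_key, account_number, account_type, valid_from, valid_to), valid_to inclusive.
DIM_ACCOUNT = [
    (1, "A1", "Standard", date(2009, 1, 1), date(2010, 6, 30)),
    (2, "A1", "Premium",  date(2010, 7, 1), date(9999, 12, 31)),
]


def lookup_account_key(account_number, txn_date, dim_rows=DIM_ACCOUNT):
    """Return the surrogate key of the dim version valid on txn_date."""
    for key, number, _account_type, valid_from, valid_to in dim_rows:
        if number == account_number and valid_from <= txn_date <= valid_to:
            return key
    raise LookupError(f"no version of {account_number} valid on {txn_date}")


print(lookup_account_key("A1", date(2010, 3, 15)))  # 1 (Standard at that time)
print(lookup_account_key("A1", date(2011, 4, 9)))   # 2 (Premium)
```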
Transaction Level Dimension (5/5)
Disadvantages / not suitable:
Any dim with grain = the fact table (like a trans dim) is questionable. Do we really need this dim at this grain? Perhaps it should be divided into several dims instead?
A dim with grain = the fact table is a potential performance issue (unless the fact table is small). e.g. fact table = 10m rows, trans dim = 10m rows: joining 10m rows to 10m rows is potentially slow, especially if the trans dim is not physically ordered on the joining column.
