Recommendation: Include it in the fact table even though it can be calculated. This eliminates the possibility of user error.
For non-additive measurements such as percentages and ratios (e.g., gross margin), store the numerator (gross profit) and the denominator (revenue dollars) in the fact table. The ratio can then be calculated in a data access tool for any slice of the fact table. Caution: Calculate the ratio of the sums, not the sum of the ratios.
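The caution above can be illustrated with a minimal sketch. The two fact rows and the function names are made up for illustration; only the rule itself (ratio of the sums, not sum of the ratios) comes from the notes.

```python
# Hypothetical fact rows: (product, gross_profit_dollars, revenue_dollars).
fact_rows = [
    ("A", 25.0, 100.0),   # row-level margin 0.25 on small revenue
    ("B", 450.0, 900.0),  # row-level margin 0.50 on large revenue
]


def gross_margin(rows):
    """Ratio of the sums: total gross profit over total revenue."""
    total_profit = sum(profit for _, profit, _ in rows)
    total_revenue = sum(revenue for _, _, revenue in rows)
    return total_profit / total_revenue


def sum_of_ratios(rows):
    """The approach to avoid: adding up the row-level margins."""
    return sum(profit / revenue for _, profit, revenue in rows)


correct = gross_margin(fact_rows)
wrong = sum_of_ratios(fact_rows)
print(correct, wrong)
```

With these made-up numbers the ratio of the sums is 475 / 1000 = 0.475, while the row-level margins add up to 0.75, which is not a meaningful margin for any slice. Storing the numerator and denominator lets the same function work for any subset of rows.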
Found in virtually every data mart
See Figure 2.4, p. 39
Use verbose, self-explanatory values rather than coded values. They are used as column headers in reports. By decoding in the database, we ensure consistency across different application environments.
E.g., Holiday Indicator – use values: Holiday, Nonholiday; as opposed to Y/N
Date Key should be an integer rather than a date data type
Data warehouses need an explicit date dimension table to describe fiscal periods, seasons, holidays, weekends, and other calendar calculations that are not supported by SQL date functions.
If transaction time is of interest, we may need a separate Time Dimension table
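As a rough sketch of how such a date dimension might be populated, the code below generates one row per calendar day with an integer key and verbose indicator values. The column names, the sequential key assignment, and the holiday set are assumptions for illustration, not the schema from the case study.

```python
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def build_date_dimension(start, end, holidays):
    """Return one dimension row per day from start to end, inclusive."""
    rows = []
    current = start
    date_key = 1  # assumption: integer keys assigned sequentially
    while current <= end:
        weekday_index = current.weekday()  # Monday is 0, Sunday is 6
        rows.append({
            "date_key": date_key,
            "full_date": current.isoformat(),
            "day_of_week": DAY_NAMES[weekday_index],
            # Verbose values instead of Y/N codes
            "weekday_indicator": "Weekend" if weekday_index >= 5 else "Weekday",
            "holiday_indicator": "Holiday" if current in holidays else "Nonholiday",
        })
        current += timedelta(days=1)
        date_key += 1
    return rows


# Hypothetical holiday list covering a five-day window
date_dim = build_date_dimension(
    date(2024, 12, 24), date(2024, 12, 28), holidays={date(2024, 12, 25)}
)
for row in date_dim:
    print(row)
```

Fiscal periods and seasons would be added as further columns in the same way; they are left out here because their definitions depend on the organization.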
Describes every SKU in the store
Fill this dimension with as many descriptive attributes as possible.
Query Promotion Coverage table: products under promotion on given date
From POS Sales Fact table: products sold
Answer is the set difference of the two results above (products under promotion minus products sold)
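The two-step query above can be sketched with SQLite's EXCEPT operator, presumably to find products that were on promotion on a given date but did not sell. The table layouts, column names, and sample rows below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified table layouts
    CREATE TABLE promotion_coverage (
        date_key INTEGER, product_key INTEGER,
        store_key INTEGER, promotion_key INTEGER
    );
    CREATE TABLE pos_sales_fact (
        date_key INTEGER, product_key INTEGER,
        store_key INTEGER, promotion_key INTEGER, sales_quantity INTEGER
    );
    INSERT INTO promotion_coverage VALUES
        (1, 101, 1, 7), (1, 102, 1, 7), (1, 103, 1, 7);
    INSERT INTO pos_sales_fact VALUES
        (1, 101, 1, 7, 3), (1, 103, 1, 7, 1), (1, 104, 1, 0, 2);
""")

# Products under promotion on the date, minus products sold on the date
query = """
    SELECT product_key FROM promotion_coverage WHERE date_key = ?
    EXCEPT
    SELECT product_key FROM pos_sales_fact WHERE date_key = ?
    ORDER BY product_key
"""
promoted_but_unsold = [row[0] for row in conn.execute(query, (1, 1))]
print(promoted_but_unsold)
```

In this sample data, product 102 is covered by a promotion on date key 1 but has no sales row, so it is the only key returned.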
Degenerate Dimension (DD)
Dimension keys used in fact table without corresponding dimension tables
In case study: POS Transaction #
Still useful for grouping by transaction
Common DDs: order numbers, invoice numbers
Fact table primary key: Product Key and POS Transaction Number
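A small sketch of how a degenerate dimension remains useful: the transaction number has no dimension table, yet it still groups line items into baskets and, together with the product key, identifies each fact row. The row layout and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows:
# (pos_transaction_number, product_key, sales_quantity, sales_dollars)
fact_rows = [
    ("T1001", 101, 2, 5.00),
    ("T1001", 102, 1, 3.50),
    ("T1002", 101, 1, 2.50),
]

# Group by the degenerate dimension to get totals per transaction
basket_totals = defaultdict(float)
for transaction_number, _product_key, _quantity, dollars in fact_rows:
    basket_totals[transaction_number] += dollars

# Check that (product key, transaction number) is unique per row
primary_keys = [(product_key, txn) for txn, product_key, _, _ in fact_rows]
keys_are_unique = len(primary_keys) == len(set(primary_keys))

print(dict(basket_totals), keys_are_unique)
```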
Retail Schema Extensibility
Original schema extends gracefully because POS transaction data was modeled at its most granular level.
Premature aggregation limits the ability to extend the schema when new dimensions do not apply at the higher (aggregated) grain
Case study new dimensions:
Time of Day
Dimensional models can handle extensions without invalidating existing applications:
New dimension attributes – simply add columns to the dimension table. If the new attribute is only available after a certain point in time, populate older dimension records with a value such as “Not Available”
New dimensions – add foreign key fields to the fact table
New measured facts – add to fact table. If not at the same grain, then need separate fact table
Dimension becoming more granular – create new dimension. May imply more granular fact table, in which case, may have to rebuild the fact table.
Addition of a completely new data source involving existing and new dimensions – usually needs new fact table
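The first two kinds of extension listed above can be sketched in SQLite. The tables, columns, sample rows, and the choice of key 0 for a "Not Available" dimension row are assumptions made for this example; the point is only that a query written before the change returns the same result afterward.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical starting schema
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY, description TEXT, brand TEXT
    );
    CREATE TABLE pos_sales_fact (
        product_key INTEGER, pos_transaction_number TEXT, sales_dollars REAL
    );
    INSERT INTO product_dim VALUES
        (1, 'Cola 12oz', 'Fizz'), (2, 'Chips 8oz', 'Crunch');
    INSERT INTO pos_sales_fact VALUES
        (1, 'T1', 2.0), (2, 'T1', 3.0), (1, 'T2', 2.0);
""")

# A query that an existing application might already depend on
existing_query = """
    SELECT p.brand, SUM(f.sales_dollars)
    FROM pos_sales_fact f
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY p.brand
    ORDER BY p.brand
"""
before = conn.execute(existing_query).fetchall()

conn.executescript("""
    -- New dimension attribute: old rows receive 'Not Available'
    ALTER TABLE product_dim
        ADD COLUMN package_type TEXT DEFAULT 'Not Available';

    -- New dimension: add a table plus a foreign key column on the fact table
    CREATE TABLE time_of_day_dim (
        time_of_day_key INTEGER PRIMARY KEY, hour_band TEXT
    );
    INSERT INTO time_of_day_dim VALUES (0, 'Not Available'), (1, 'Morning');
    ALTER TABLE pos_sales_fact
        ADD COLUMN time_of_day_key INTEGER DEFAULT 0;
""")
after = conn.execute(existing_query).fetchall()
package_types = conn.execute(
    "SELECT DISTINCT package_type FROM product_dim"
).fetchall()

print(before == after, package_types)
```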
Resisting Dimension Normalization
Snowflaking = Dimension table normalization
Redundant attributes are removed from the denormalized dimension table and are placed in normalized secondary dimension tables
Fully snowflaked schema = 3NF ER diagram
The dimension tables must not be normalized, and should remain as flat tables.
Numerous tables and joins usually translate into slower query performance.
Efforts to normalize any of the tables in a dimensional database solely in order to save disk space are a waste of time. Disk space savings gained by normalizing the dimension tables are typically less than one percent of the total disk space needed for the overall schema.
Normalized dimension tables destroy the ability to browse within a dimension or across dimensions (e.g., list package types for each brand in a category). SQL needed becomes too complex.
The fact table is naturally normalized.
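To illustrate the browsing argument above, the sketch below lists package types for each brand in a category against a flat (denormalized) product dimension. The table, columns, and rows are hypothetical; with a snowflaked design the same question would need joins across separate brand, category, and package tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flat product dimension
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY,
        description TEXT, brand TEXT, category TEXT, package_type TEXT
    );
    INSERT INTO product_dim VALUES
        (1, 'Cola 12oz',  'Fizz',   'Beverages', 'Can'),
        (2, 'Cola 2L',    'Fizz',   'Beverages', 'Bottle'),
        (3, 'Lemon 12oz', 'Zest',   'Beverages', 'Can'),
        (4, 'Chips 8oz',  'Crunch', 'Snacks',    'Bag');
""")

# Browse within the dimension: one table, no joins
browse_query = """
    SELECT DISTINCT brand, package_type
    FROM product_dim
    WHERE category = ?
    ORDER BY brand, package_type
"""
beverage_packages = conn.execute(browse_query, ("Beverages",)).fetchall()
print(beverage_packages)
```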
Too Many Dimensions
Too many dimensions increase space requirements for the fact table.
A very large number of dimensions typically means that several dimensions are not completely independent and should be combined.
A single hierarchy should not be captured in separate dimensions.
Surrogate keys are integers assigned sequentially as needed to populate a dimension. They serve to join dimension tables to the fact table.
Avoid embedding intelligence in the data warehouse keys.
Surrogate keys buffer the DW environment from operational changes. What happens when operations decide to recycle account numbers after some period of inactivity? Fine for operational systems, but problematic for DW if it is using account numbers as a PK.
Can more easily integrate data from multiple operational systems, even if they lack consistent source keys.
Performance advantages because small size of surrogate keys leads to smaller fact tables
Surrogate keys are used to support one of the primary techniques for handling changes in dimension table attributes (Chapter 4).
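A minimal sketch of sequential surrogate key assignment follows. The class name, the source system labels, and the idea of keying the lookup on a (source system, natural key) pair are illustrative assumptions, not a prescribed design.

```python
class SurrogateKeyMap:
    """Assigns sequential integer keys to natural keys from source systems."""

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def key_for(self, source_system, natural_key):
        # The pair keeps identical natural keys from different
        # source systems from colliding.
        lookup = (source_system, natural_key)
        if lookup not in self._keys:
            self._keys[lookup] = self._next_key
            self._next_key += 1
        return self._keys[lookup]


# Hypothetical source systems and account numbers
key_map = SurrogateKeyMap()
billing_key = key_map.key_for("billing", "ACCT-001")
crm_key = key_map.key_for("crm", "ACCT-001")
billing_again = key_map.key_for("billing", "ACCT-001")
print(billing_key, crm_key, billing_again)
```

The surrogate keys carry no embedded meaning, and the fact table stores only the small integers. Handling a recycled account number would additionally require deciding when a natural key refers to a new entity, which this sketch does not attempt.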