Business Information Systems

Dimensional Analysis
Prithwis Mukerjee, Ph.D.
Dimensional Models
A denormalized relational model


Made up of tables with attributes



Relationships defined by keys and foreign keys

Organized for understandability and ease of reporting
rather than update
Queried and maintained by SQL or special purpose
management tools.

2
From Relational to Dimensional
Relational Model
• Designed from the perspective of process efficiency
– Sales
– Marketing
• “Normalised” data structures
– Entity Relationship Model
• Used for transactional, or operational, systems
– OLTP: OnLine Transaction Processing
• Based on data that is
– Current
– Non Redundant

Dimensional Model
• Designed from the perspective of subject
– Sales
– Customers
• “De-normalised” data structures in blatant violation of normalisation
• Used for analysis of aggregated data
– OLAP: OnLine Analytical Processing
• Based on data that is
– Historical
– May be redundant

3
ER vs. Dimensional Models
ER model: the transaction processing model
• One table per entity
• Minimize data redundancy
• Optimize update

Dimensional model: the data warehousing model
• One fact table for data organization
• Maximize understandability
• Optimized for retrieval

4
Strengths of the Dimensional Model
Predictable, standard framework
Responds well to changes in user reporting needs
Relatively easy to add data without reloading tables
Standard design approaches have been developed
A number of products support the dimensional model

“The Data Warehouse Toolkit” by Ralph Kimball & Margy Ross
“The Data Warehouse Lifecycle Toolkit” by Ralph Kimball & Margy Ross

5
A Transactional Database
Countries: CountryID, Description
States: StateID, CountryID, Desc
Addresses: AddressID, StateID, Street
Customers: CustomerID, AddressID, Name
OrderHeader: OrderHeaderID, CustomerID, OrderDate, FreightAmount
OrderDetails: OrderHeaderID, ProductID, Amount
Products: ProductID, Description, Size

6
A Dimensional Model
Customers: CustomerID, Name, Street, State, Country
Time: TimeID, Date, Month, Quarter, Year
Products: ProductID, Description, Size, Subcategory, Category
FactSales: CustomerID, ProductID, TimeID, SalesAmount
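
As a rough illustration, the four tables on this slide can be built in an in-memory SQLite database and queried "by" a dimension attribute. This is only a sketch: the table and column names follow the slide, but the column types and every sample row are assumptions made up for the example.

```python
import sqlite3

# Table and column names follow the slide; types and sample rows are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT,
                        Street TEXT, State TEXT, Country TEXT);
CREATE TABLE Products  (ProductID INTEGER PRIMARY KEY, Description TEXT,
                        Size TEXT, Subcategory TEXT, Category TEXT);
CREATE TABLE Time      (TimeID INTEGER PRIMARY KEY, Date TEXT,
                        Month INTEGER, Quarter INTEGER, Year INTEGER);
CREATE TABLE FactSales (CustomerID INTEGER REFERENCES Customers(CustomerID),
                        ProductID  INTEGER REFERENCES Products(ProductID),
                        TimeID     INTEGER REFERENCES Time(TimeID),
                        SalesAmount REAL);
""")
conn.executemany("INSERT INTO Customers VALUES (?,?,?,?,?)", [
    (1, "Asha", "1 Park St", "WB", "India"),
    (2, "Ben", "2 High St", "NSW", "Australia"),
])
conn.executemany("INSERT INTO Products VALUES (?,?,?,?,?)", [
    (1, "Widget", "S", "Gadgets", "Hardware"),
    (2, "Gizmo", "L", "Gadgets", "Hardware"),
])
conn.executemany("INSERT INTO Time VALUES (?,?,?,?,?)", [
    (1, "2009-01-15", 1, 1, 2009),
    (2, "2009-04-20", 4, 2, 2009),
])
conn.executemany("INSERT INTO FactSales VALUES (?,?,?,?)", [
    (1, 1, 1, 100.0),
    (1, 2, 2, 50.0),
    (2, 1, 2, 25.0),
])

# Sales "by" Country and Quarter: one join per dimension, then aggregate.
rows = conn.execute("""
    SELECT c.Country, t.Quarter, SUM(f.SalesAmount)
    FROM FactSales f
    JOIN Customers c ON c.CustomerID = f.CustomerID
    JOIN Time t      ON t.TimeID = f.TimeID
    GROUP BY c.Country, t.Quarter
    ORDER BY c.Country, t.Quarter
""").fetchall()
```

Because State and Country sit directly on Customers, the query needs no joins through Addresses, States and Countries as the transactional schema would.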

7
Extract Transform Load

Relational → Dimensional Model
Process Oriented → Subject Oriented
Transactional → Aggregate
Current → Historic

8
Facts & Dimensions
• There are two main types of objects in a dimensional
model
– Facts are quantitative measures that we wish to analyse and
report on.
– Dimensions contain textual descriptors of the business. They
provide context for the facts.

9
Fact & Dimension Tables
FACTS
• Contain two or more foreign keys
• Tend to have huge numbers of records
• Useful facts tend to be numeric and additive

DIMENSIONS
• Contain text and descriptive information
• 1 in a 1-M relationship
• Generally the source of interesting constraints
• Typically contain the attributes for the SQL answer set.

10
Fact Table
Measurements associated with a specific business
process
Grain: level of detail of the table
Process events produce fact records
Facts (attributes) are usually
– Numeric
– Additive

Derived facts included
Foreign (surrogate) keys refer to dimension tables
(entities)
Classification values help define subsets

11
Dimension Tables
Entities describing the objects of the process
Conformed dimensions cross processes
Attributes are descriptive
– Text
– Numeric

Surrogate keys
Less volatile than facts (1:m with the fact table)
Null entries
Date dimensions
Produce “by” questions

12
The Bus Matrix

Date

Product

Store

Promotion

Warehouse

Vendor

Retail Sales

X

X

X

X

Retail Inventory

X

X

X

Retail
Deliveries

X

X

X

Warehouse
Inventory

X

X

X

Warehouse
Deliveries

X

X

X

X

Purchase Orders

X

X

X

X

X

Contract

Shipper

X

X

Process

13
Business Model
As always in life, there are some disadvantages
to 3NF:
Performance can be truly awful. Most of the
work that is performed on denormalizing a data
model is an attempt to reach performance
objectives.
The structure can be overwhelmingly complex.
We may wind up creating many small relations
which the user might think of as a single
relation or group of data.

14
The 4 Step Design Process
Choose the Data Mart
Declare the Grain
Choose the Dimensions
Choose the Facts

15
Structural Dimensions
The first step is the development of the
structural dimensions. This step corresponds
very closely to what we normally do in a
relational database.
The star architecture that we will develop here
depends upon taking the central intersection
entities as the fact tables and building the
foreign key => primary key relations as
dimensions.

16
Steps in dimensional modeling
Select an associative entity for a fact table
Determine granularity
Replace operational keys with surrogate keys
Promote the keys from all hierarchies to the fact table
Add date dimension
Split all compound attributes
Add necessary categorical dimensions
Fact (varies with time) / Attribute (constant)

17
The Big Picture

OLTP (one table per line)
• Customer ID, Cust Name, Cust Address
• Order ID, Customer ID (FK), Date
• Order ID (FK), Item ID, Product ID (FK), Quantity, Value
• Product ID, Product Name, Product Desc, Unit Price

OLAP (one table per line)
• Customer ID, Cust Name, Cust Address
• Transaction ID, Product ID (FK), Client ID (FK), Date, Quantity, Value
• Product ID, Product Name, Product Desc, Unit Price

18
Converting an E-R Diagram
Determine the purpose of the mart
Identify an association table as the central fact
table
Determine facts to be included
Replace all keys with surrogate keys
Promote foreign keys in related tables to the
fact table
Add time dimension
Refine the dimension tables
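
A minimal sketch of three of these steps (surrogate keys, promoting foreign keys to the fact table, adding a time dimension), written in plain Python. The input records, the field names and the `surrogate_key` helper are all hypothetical, chosen only to make the steps concrete.

```python
from datetime import date

# Hypothetical OLTP extracts: one order header and its order lines,
# identified by operational keys (customer code, SKU, order number).
order_headers = {101: {"customer_id": "C-7", "order_date": date(2009, 2, 24)}}
order_lines = [
    {"order_id": 101, "sku": "LP2105", "units_sold": 2, "dollars_sold": 80.0},
    {"order_id": 101, "sku": "XK0001", "units_sold": 1, "dollars_sold": 15.0},
]

def surrogate_key(dimension, operational_key):
    """Return the generated key for operational_key, assigning the next integer if it is new."""
    if operational_key not in dimension:
        dimension[operational_key] = len(dimension) + 1
    return dimension[operational_key]

customer_dim, product_dim, time_dim = {}, {}, {}
fact_rows = []
for line in order_lines:
    header = order_headers[line["order_id"]]
    fact_rows.append({
        # Keys promoted from the header up to the fact row.
        "customer_key": surrogate_key(customer_dim, header["customer_id"]),
        "time_key": surrogate_key(time_dim, header["order_date"]),
        # Key that already lived on the association (order line) table.
        "product_key": surrogate_key(product_dim, line["sku"]),
        # Facts to be included.
        "units_sold": line["units_sold"],
        "dollars_sold": line["dollars_sold"],
    })
```

Each fact row now refers to every dimension directly, so no join through the order header is needed at query time.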

19
Fact Tables
Represent a process or reporting environment that is of
value to the organization
It is important to determine the identity of the fact table
and specify exactly what it represents.
Typically correspond to an associative entity in the E-R
model

20
Grain (unit of analysis)
The grain determines what each fact record represents:
the level of detail.
For example
– Individual transactions
– Snapshots (points in time)
– Line items on a document

Generally better to focus on the smallest grain
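
One way to see why the smallest grain is usually preferred: facts stored at line-item grain can always be rolled up to a coarser level, but totals stored per order cannot be split back into products. The rows and the `roll_up` helper below are hypothetical, intended only as a sketch.

```python
from collections import defaultdict

# Hypothetical fact rows at line-item grain:
# (order_num, sku, date, dollars_sold)
line_items = [
    (1, "A", "2009-02-23", 10.0),
    (1, "B", "2009-02-23", 5.0),
    (2, "A", "2009-02-24", 20.0),
]

def roll_up(rows, key_index):
    """Sum dollars_sold (column 3) grouped by the column at key_index."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)

by_order = roll_up(line_items, 0)  # order-level grain
by_sku = roll_up(line_items, 1)    # product-level view
by_day = roll_up(line_items, 2)    # daily view
```

Had the table been stored at order grain, `by_sku` could not be derived from it.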

21
Facts
Measurements associated with fact table records at fact
table granularity
Normally numeric and additive
Non-key attributes in the fact table
Attributes in dimension tables are constants. Facts vary with
the granularity of the fact table

22
Dimensions
A table (or hierarchy of tables) connected with the
fact table with keys and foreign keys
Preferably single valued for each fact record
(1:m)
Connected with surrogate (generated) keys, not
operational keys
Dimension tables contain text or numeric
attributes

23
ERD

CUSTOMER
customer_ID (PK)
customer_name
purchase_profile
credit_profile
address

STORE
store_ID (PK)
store_name
address
district
floor_type

CLERK
clerk_id (PK)
clerk_name
clerk_grade

ORDER
order_num (PK)
customer_ID (FK)
store_ID (FK)
clerk_ID (FK)
date

PRODUCT
SKU (PK)
description
brand
category

ORDER-LINE
order_num (PK) (FK)
SKU (PK) (FK)
promotion_key (FK)
dollars_sold
units_sold
dollars_cost

PROMOTION
promotion_NUM (PK)
promotion_name
price_type
ad_type

24
DIMENSIONAL MODEL

FACT
time_key (FK)
store_key (FK)
clerk_key (FK)
product_key (FK)
customer_key (FK)
promotion_key (FK)
dollars_sold
units_sold
dollars_cost

TIME
time_key (PK)
SQL_date
day_of_week
month

STORE
store_key (PK)
store_ID
store_name
address
district
floor_type

CLERK
clerk_key (PK)
clerk_id
clerk_name
clerk_grade

PRODUCT
product_key (PK)
SKU
description
brand
category

CUSTOMER
customer_key (PK)
customer_name
purchase_profile
credit_profile
address

PROMOTION
promotion_key (PK)
promotion_name
price_type
ad_type

25
Date Dimensions
Fiscal Year
Calendar Year
Fiscal Quarter
Calendar Quarter
Fiscal Month
Calendar Month
Fiscal Week
Calendar Week
Type of Day
Day of Week
Day
Holiday

26
Attribute Name | Attribute Description | Sample Values
Day | The specific day that an activity took place. | 06/04/1998; 06/05/1998
Day of Week | The specific name of the day. | Monday; Tuesday
Holiday | Identifies that this day is a holiday. | Easter; Thanksgiving
Type of Day | Indicates whether or not this day is a weekday or a weekend day. | Weekend; Weekday
Calendar Week | The week ending date, always a Saturday. Note that WE denotes … | WE 06/06/1998; WE 06/13/1998
Calendar Month | The calendar month. | January, 1998; February, 1998
Calendar Quarter | The calendar quarter. | 1998Q1; 1998Q4
Calendar Year | The calendar year. | 1998
Fiscal Week | The week that represents the corporate calendar. Note that the F … | F Week 1 1998; F Week 46 1998
Fiscal Month | The fiscal period comprised of 4 or 5 weeks. Note that the F in the data … | F January, 1998; F February, 1998
Fiscal Quarter | The grouping of 3 fiscal months. | F 1998Q1; F1998Q2
Fiscal Year | The grouping of 52 fiscal weeks / 12 fiscal months that comprise the financial year. | F 1998; F 1999
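
The calendar attributes above can be derived mechanically from a date. The following is a sketch using only Python's standard library; the function name and dictionary keys are assumptions, and the Holiday and fiscal attributes are left out because they depend on a corporate calendar that the slide does not define.

```python
from datetime import date, timedelta

def date_dimension_row(day):
    """Build the calendar attributes of one date dimension row (illustrative names)."""
    # Python's weekday(): Monday=0 ... Saturday=5, Sunday=6.
    days_to_saturday = (5 - day.weekday()) % 7
    week_ending = day + timedelta(days=days_to_saturday)
    quarter = (day.month - 1) // 3 + 1
    return {
        "day": day.strftime("%m/%d/%Y"),
        "day_of_week": day.strftime("%A"),
        "type_of_day": "Weekend" if day.weekday() >= 5 else "Weekday",
        # Week ending date, always a Saturday, prefixed with WE as in the samples.
        "calendar_week": "WE " + week_ending.strftime("%m/%d/%Y"),
        "calendar_month": day.strftime("%B, %Y"),
        "calendar_quarter": f"{day.year}Q{quarter}",
        "calendar_year": day.year,
    }

row = date_dimension_row(date(1998, 6, 4))
sunday_row = date_dimension_row(date(1998, 6, 7))
```

A date dimension is typically populated once, one row per day for the whole reporting horizon, by calling a function like this in a loop.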
27
Snowflaking & Hierarchies
Efficiency vs Space
Understandability
M:N relationships

28
Star Schema

dimTime: …
dimProduct: ProductID, ProductName, CategoryName, SubCategoryName
dimCustomer: …
factSales: ProductID, TimeID, CustomerID, SalesAmount

29
Snowflake Schema

dimSubCategory: SubCategoryID, Description
dimCategory: CategoryID, SubCategoryID, Description
dimProduct: ProductID, CategoryID, Description
factSales: ProductID, TimeID, CustomerID, SalesAmount

30
Slowly Changing Dimensions
(Addresses, Managers, etc.)
Type 1: Store only the current value, overwrite
previous value
Type 2: Create a dimension record for each value (with
or without date stamps)
Type 3: Create an attribute in the dimension record for
previous value

31
Examples
Original
ProductKey | Description | Category | SKU
21553 | LeapPad | Education | LP2105

Type 1
ProductKey | Description | Category | SKU
21553 | LeapPad | Toy | LP2105

Type 2
ProductKey | Description | Category | SKU
21553 | LeapPad | Education | LP2105
44631 | LeapPad | Toy | LP2105

Type 3
ProductKey | Description | Category | OldCat | SKU
21553 | LeapPad | Toy | Education | LP2105

Hybrid
ProductKey | Description | Category | OldCat | SKU
21335 | LeapPad | Electronics | Education | LP2105
44631 | LeapPad | Electronics | Toy | LP2105
68122 | LeapPad | Education | Electronics | LP2105

32
Type 1 Slowly Changing Dimension
The simplest form
Only updates existing records
Overwrites history
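
A minimal sketch of a Type 1 change, using the customer record from this deck's example. The in-memory dictionary and the `apply_type1` helper are assumptions for illustration, not a prescribed implementation.

```python
# Customer dimension keyed by the surrogate CustomerID (illustrative structure).
customer_dim = {
    1: {"Code": "K001", "Name": "Miranda Kerr", "State": "NSW", "Gender": "F"},
}

def apply_type1(dim, code, **changes):
    """Overwrite attributes in place on every row with this business key."""
    for row in dim.values():
        if row["Code"] == code:
            row.update(changes)

apply_type1(customer_dim, "K001", State="VIC")
```

After the call there is still exactly one row for K001, and nothing records that the state was ever NSW.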

33
Type 1 Slowly Changing Dimension

CustomerID | Code | Name | State | Gender
1 | K001 | Miranda Kerr | VIC (overwrites NSW) | F

34
Type 2 Slowly Changing Dimension
Allows the recording of changes of state over time
Generates a new record each time the state changes
Usually requires the use of effective dates when joining
to facts.
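
The sketch below shows one possible way to apply a Type 2 change and to pick the right dimension key for a fact by its date, using the customer record from this deck's example. The list-of-dictionaries layout and the helper names `apply_type2` and `key_as_of` are assumptions; an open-ended row is marked here with `End = None`.

```python
from datetime import date, timedelta

customer_dim = [
    {"CustomerID": 1, "Code": "K001", "Name": "Miranda Kerr", "State": "NSW",
     "Gender": "F", "Start": date(2009, 1, 1), "End": None},
]

def apply_type2(dim, code, change_date, **changes):
    """Close the current row for this business key and append a new version."""
    current = next(r for r in dim if r["Code"] == code and r["End"] is None)
    current["End"] = change_date - timedelta(days=1)
    new_row = {**current, **changes,
               "CustomerID": max(r["CustomerID"] for r in dim) + 1,
               "Start": change_date, "End": None}
    dim.append(new_row)
    return new_row

def key_as_of(dim, code, on_date):
    """Return the surrogate key whose effective dates cover on_date, or None."""
    for r in dim:
        if (r["Code"] == code and r["Start"] <= on_date
                and (r["End"] is None or on_date <= r["End"])):
            return r["CustomerID"]
    return None

apply_type2(customer_dim, "K001", date(2009, 2, 24), State="VIC")
```

A fact dated 1 February 2009 would be loaded with key 1 (NSW) and a fact dated 1 March 2009 with key 2 (VIC), which is the effective-date match the slide refers to.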

35
Type 2 Slowly Changing Dimension

CustomerID | Code | Name | State | Gender | Start | End
1 | K001 | Miranda Kerr | NSW | F | 1/1/09 | 23/2/09 (was <NULL>)
2 | K001 | Miranda Kerr | VIC | F | 24/2/09 | <NULL>

36
Type 3 Slowly Changing Dimension
De-normalized change tracking
Only keeps a limited history
Stores changes in separate columns
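
A minimal sketch of a Type 3 change, again using the customer record from this deck's example. The column names `CurrentState` and `PrevState` and the `apply_type3` helper are assumptions for illustration.

```python
customer_dim = {
    1: {"Code": "K001", "Name": "Miranda Kerr", "Gender": "F",
        "CurrentState": "NSW", "PrevState": None},
}

def apply_type3(row, new_state):
    """Shift the current value into the 'previous' column, then overwrite it."""
    row["PrevState"] = row["CurrentState"]
    row["CurrentState"] = new_state

apply_type3(customer_dim[1], "VIC")
after_first_change = dict(customer_dim[1])

# A second change keeps only the most recent previous value: NSW is lost.
apply_type3(customer_dim[1], "QLD")
```

The second call shows the limited history: only one prior value survives per tracked column.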

37
Type 3 Slowly Changing Dimension

CustomerID | Code | Name | Current State | Gender | Prev State
1 | K001 | Miranda Kerr | VIC (was NSW) | F | NSW (was <NULL>)

38

04 Dimensional Analysis - v6

Editor's Notes

  • #7 A simplistic transactional schema showing 7 tables relating to sales orders
  • #8 This is a star schema (we will discuss snowflake schemas later) showing 4 tables that relate to the previous transactional schema. State and Country have been denormalized under Customer. Dimensions are in blue: these are the things that we analyse “by” (e.g. by Time, by Customer, by Region). The fact is in yellow: facts are usually quantitative things that we are interested in.
  • #9 We already have all the data in the OLTP system, so why replicate it in a dimensional model? What is currently called “Data Warehousing” or “Business Intelligence” was originally often called “Decision Support Systems”. The two models differ: atomic vs. summary; supports transaction throughput vs. supports aggregate queries; current vs. historic.
  • #10 Facts work best if they are additive. Dimensions allow us to “slice & dice” the facts into meaningful groups. They provide context.
  • #35 There are some changes where it is valid to overwrite history. When someone gets married and changes their name, they may want to carry the history of their previous purchases over to their new name rather than see a split history.
  • #37 This makes inserts into your fact table more expensive, as you always need to match on the effective dates as well as the business key. Sometimes people keep a “Current” flag. Another approach, rather than putting nulls in the End date, is to put an arbitrary date well in the future; this can make the join logic a bit simpler.
  • #39 This type of change tracking is more useful when there is a one-off change, like a change in sales regions, where you want to see history re-cast into the new regions but may also want to compare the old and new regions.