The document discusses dimensional modeling techniques for data warehousing and business intelligence. It covers fundamental concepts like gathering requirements, dimensional design processes, and fact and dimension table techniques. Specific topics include grain, surrogate keys, additive/non-additive facts, dimension hierarchies, and consolidating facts from multiple sources. The goal is to deliver data that is understandable, flexible for querying, and provides fast performance.
2. The Data Warehouse
1. Think Dimensional
2. Tasks and tactics of dimensional modeling
3. Big Data Analytics
4. 1. Think Dimensional
1. DW/BI : Big Picture
2. Dimensional Modeling Techniques
3. Real cases
5. 1. DW/BI : Big Picture
Business-driven goals of data warehousing and business
intelligence
Publishing Metaphor for DW/BI Managers
Dimensional Modeling Introduction
1. Think Dimensional
6. Business-driven goals of d...
make information easily accessible
present information consistently
must adapt to change
present information in a timely way
be a secure bastion that protects the information assets
7. Business-driven goals of d...
must adapt to change
present information in a timely way
be a secure bastion that protects the information assets
the authoritative and trustworthy foundation for
improved decision making
8. Dimensional Modeling Introduction
Fact Tables for Measurements
Star Schemas Versus OLAP Cubes
Dimension Tables for Descriptive Context
Kimball’s DW/BI Architecture
Alternative DW/BI Architectures
9. Dimensional Modeling Introduction
Star Schemas Versus OLAP Cubes
Dimension Tables for Descriptive Context
Kimball’s DW/BI Architecture
Alternative DW/BI Architectures
Dimensional Modeling Myths
10. Fact Tables for Measurements
The most useful facts are numeric and additive
Each measurement event in the physical world has a one-to-one relationship
to a single row in the corresponding fact table.
Fact tables express many-to-many relationships. All
others are dimension tables.
14. Dimension Tables for
Descriptive Context
Dimensions provide the entry points to the data, and the
final labels and groupings on all DW/BI analyses
15. Dimension Tables for Desc...
Dimension tables are integral companions to a fact table.
They describe the “who, what, where, when, how, and why” associated with the event.
It is not uncommon for a dimension table to have 50 to 100 attributes.
Dimension tables tend to have fewer rows than fact tables, but can be wide with many large text columns.
16. Dimension Tables for Desc...
Normalizing a dimension's repeated low-cardinality attributes into separate tables is called snowflaking.
Facts and Dimensions Joined in a Star Schema
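As a rough illustration of facts and dimensions joined in a star schema, the sketch below builds a tiny in-memory SQLite database. The table names (`sales_fact`, `date_dim`, `product_dim`), the columns, and the sample rows are all hypothetical and are not taken from the slides; the point is only that the fact table carries foreign keys plus a numeric measure, while the dimensions supply the labels used for grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, month_name TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
-- The fact table holds one row per measurement event:
-- foreign keys to each dimension plus an additive numeric fact.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    sales_amount REAL
);
INSERT INTO date_dim VALUES (1, 'January'), (2, 'February');
INSERT INTO product_dim VALUES (10, 'Acme'), (11, 'Zenith');
INSERT INTO sales_fact VALUES (1, 10, 5.0), (1, 11, 7.0), (2, 10, 3.0);
""")

# Dimension attributes act as row headers; the fact is aggregated.
rows = conn.execute("""
    SELECT d.month_name, p.brand, SUM(f.sales_amount)
    FROM sales_fact f
    JOIN date_dim d    ON d.date_key = f.date_key
    JOIN product_dim p ON p.product_key = f.product_key
    GROUP BY d.month_name, p.brand
    ORDER BY d.month_name, p.brand
""").fetchall()

for row in rows:
    print(row)
```

Every join goes directly from the fact table to a dimension, which is what gives the schema its star shape.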
21. Kimball’s DW/BI Architecture
Operational Source Systems
Extract, Transformation, and Load System
Presentation Area to Support Business Intelligence
Business Intelligence Applications
24. Alternative DW/BI Architectures
Independent Data Mart Architecture
Hub-and-Spoke Corporate Information Factory Inmon
Architecture
Hybrid Hub-and-Spoke and Kimball Architecture
28. Dimensional Modeling Myths
Myth 1: Dimensional Models are Only for Summary Data
Myth 2: Dimensional Models are Departmental, Not Enterprise
Myth 3: Dimensional Models are Not Scalable
Myth 4: Dimensional Models are Only for Predictable Usage
29. Dimensional Modeling Myths
Myth 2: Dimensional Models are Departmental, Not Enterprise
Myth 3: Dimensional Models are Not Scalable
Myth 4: Dimensional Models are Only for Predictable Usage
Myth 5: Dimensional Models Can’t Be Integrated
30. Myth 4: Dimensional
Models are Only...
The secret to query flexibility is building
fact tables at the most granular level.
31. Myth 5: Dimensional
Models Can’t Be Int...
Dimensional modeling concepts link the
business and technical communities together
as they jointly design the DW/BI deliverables.
42. Fact Table Structure
Fact tables are the primary target of computations and
dynamic aggregations arising from queries.
A fact table contains the numeric measures produced by
an operational measurement event in the real world
43. Additive, Semi-Additive,
...
Additive measures can be summed across any of the dimensions associated with the fact table.
Semi-additive measures can be summed across some dimensions, but not all; balance amounts are common semi-additive facts because they are additive across all dimensions except time.
Some measures are completely non-additive, such as ratios.
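The three kinds of additivity can be sketched in a few lines of Python. The account balances, the revenue and cost figures, and the function names below are invented for illustration. Balances are summed across accounts but averaged across days, and the ratio is recomputed from its additive components rather than summed or averaged row by row.

```python
from collections import defaultdict

# Hypothetical daily balance snapshot rows: (day, account, balance)
balances = [
    ("2024-01-01", "A", 100.0), ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0), ("2024-01-02", "B", 50.0),
]

def total_balance_by_day(rows):
    """Semi-additive: balances may be summed across accounts within one day."""
    totals = defaultdict(float)
    for day, _account, balance in rows:
        totals[day] += balance
    return dict(totals)

def average_daily_balance(rows, account):
    """Across time, a balance is averaged rather than summed."""
    values = [balance for _day, acct, balance in rows if acct == account]
    return sum(values) / len(values)

# Hypothetical sales rows: (revenue, cost). Both are fully additive.
sales = [(100.0, 60.0), (300.0, 240.0)]

def gross_margin_ratio(rows):
    """Non-additive: sum the additive components first, then take the ratio."""
    revenue = sum(r for r, _c in rows)
    cost = sum(c for _r, c in rows)
    return (revenue - cost) / revenue

daily_totals = total_balance_by_day(balances)
avg_balance_a = average_daily_balance(balances, "A")
margin = gross_margin_ratio(sales)
```

Averaging the two row-level margins (0.4 and 0.2) would give 0.3, whereas the ratio computed from the summed components is 0.25, which is why ratios are stored as their additive parts.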
44. Nulls in Fact Tables
Nulls must be avoided in the fact table’s foreign keys because these nulls
would automatically cause a referential integrity violation.
Null-valued facts are acceptable: the aggregate functions (SUM, COUNT, MIN, MAX, and AVG)
all do the “right thing” with null facts.
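A minimal sketch of this behavior, using SQLite as a stand-in for the warehouse database. The `reading_fact` table and its values are hypothetical, and declaring the key column `NOT NULL` is just one assumed way to keep nulls out of fact table foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reading_fact (sensor_key INTEGER NOT NULL, temperature REAL)"
)
# The middle row has a null fact: the measurement was not captured.
conn.executemany(
    "INSERT INTO reading_fact VALUES (?, ?)",
    [(1, 10.0), (1, None), (1, 20.0)],
)

# Aggregates skip the null fact; COUNT(*) still counts the row itself.
result = conn.execute("""
    SELECT SUM(temperature), COUNT(temperature), COUNT(*),
           MIN(temperature), MAX(temperature), AVG(temperature)
    FROM reading_fact
""").fetchone()

# A null in the key column is rejected by the NOT NULL constraint.
try:
    conn.execute("INSERT INTO reading_fact VALUES (NULL, 5.0)")
    null_key_rejected = False
except sqlite3.IntegrityError:
    null_key_rejected = True
```

Note that the average is taken over the two non-null readings, not over all three rows.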
45. Conformed Facts
Be disciplined in your data naming practices.
If it is impossible to conform a fact exactly, you should
give different names to the different interpretations so t...
46. If it is impossible to conform a fact exactly,
you should give different names to the
different interpretations so that business
users do not combine these incompatible
facts in calculations.
47. Transaction Fact Tables
A row in a transaction fact table corresponds to a
measurement event at a point in space and time.
Atomic transaction grain fact tables are the most
dimensional and expressive fact tables; this robust dimensionality
enables the maximum slicing and dicing of transaction
data.
48. Transaction Fact Tables
Atomic transaction grain fact tables are the most
dimensional and expressive fact tables; this robust dimensionality
enables the maximum slicing and dicing of transaction
data.
Rows exist in a transaction fact table only if measurements take place.
49. Periodic Snapshot Fact Tables
A row in a periodic snapshot fact table summarizes many measurement events
occurring over a standard period, such as a day, a week, or a month.
The grain is the period, not the individual transaction.
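To make the change of grain concrete, the sketch below rolls hypothetical account transactions up to one row per account per month. The sample data and the `monthly_snapshot` helper are assumptions made for illustration; a real snapshot would usually carry several facts and is typically built by the ETL system.

```python
from collections import defaultdict

# Hypothetical transaction-grain rows: (date, account, amount)
transactions = [
    ("2024-01-03", "A", 50.0),
    ("2024-01-20", "A", -20.0),
    ("2024-02-02", "A", 10.0),
    ("2024-01-05", "B", 7.0),
]

def monthly_snapshot(rows):
    """Summarize transactions at the grain of one row per (month, account)."""
    totals = defaultdict(float)
    for date, account, amount in rows:
        month = date[:7]  # 'YYYY-MM' prefix of an ISO date string
        totals[(month, account)] += amount
    return dict(totals)

snapshot = monthly_snapshot(transactions)
```

Four transaction rows collapse into three snapshot rows, because the two January transactions for account A share one period.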
50. Accumulating Snapshot Fa...
A row in an accumulating snapshot fact table summarizes the measurement
events occurring at predictable steps between the beginning and the end of a
process.
initially inserted when the order line is created. As
pipeline progress occurs, the accumulating fact table ro...
In addition to the date foreign keys associated with each
51. Accumulating Snapshot Fa...
predictable steps between the beginning and the end of a
process.
initially inserted when the order line is created. As
pipeline progress occurs, the accumulating fact table ro...
In addition to the date foreign keys associated with each
critical process step, accumulating snapshot fact tables ...
52. An accumulating snapshot row is initially inserted when the order line is created. As
pipeline progress occurs, the accumulating fact
table row is revisited and updated. This consistent
updating of accumulating snapshot fact rows is
unique among the three types of fact tables.
53. In addition to the date foreign keys
associated with each critical process step,
accumulating snapshot fact tables contain
foreign keys for other dimensions and
optionally contain degenerate dimensions.
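The insert-then-update life cycle of an accumulating snapshot row might look like the sketch below. The milestone names (order, ship, delivery), the `TBD_DATE_KEY` placeholder, and the in-memory dictionary standing in for the fact table are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical key of a placeholder date dimension row meaning "not yet
# happened", used so that milestone foreign keys are never null.
TBD_DATE_KEY = 0

@dataclass
class OrderLineSnapshot:
    order_number: str                    # degenerate dimension
    product_key: int                     # foreign key to another dimension
    order_date_key: int                  # first milestone, known at insert
    ship_date_key: int = TBD_DATE_KEY
    delivery_date_key: int = TBD_DATE_KEY

fact_table = {}  # order_number -> OrderLineSnapshot

def insert_order_line(order_number, product_key, order_date_key):
    """Insert the row once, when the order line is created."""
    fact_table[order_number] = OrderLineSnapshot(
        order_number, product_key, order_date_key
    )

def record_milestone(order_number, milestone, date_key):
    """Revisit the existing row and fill in one milestone date key."""
    column = f"{milestone}_date_key"
    row = fact_table[order_number]
    if not hasattr(row, column):
        raise ValueError(f"unknown milestone: {milestone}")
    setattr(row, column, date_key)

insert_order_line("SO-1001", product_key=42, order_date_key=20240105)
record_milestone("SO-1001", "ship", 20240107)
```

After the shipment is recorded there is still exactly one row for the order line; only its ship date key has changed.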
54. Factless Fact Tables
event of a student attending a class on a given day may
not have a recorded numeric fact, but a fact row with f...
be used to analyze what didn’t happen.
55. The event of a student attending a class on a
given day may not have a recorded numeric
fact, but a fact row with foreign keys for
calendar day, student, teacher, location,
and class is well-defined. Likewise
56. be used to analyze what d...
These queries have two parts:
a factless coverage table that contains all the possibilities
of events that might happen, and
an activity table that contains the events that did happen.
58. When the activity
is subtracted from the coverage,
the result is the set of events that
did not happen.
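The subtraction of activity from coverage is a set difference. In the sketch below, the promotion and sales scenario and the key values are hypothetical; each tuple stands for the foreign keys of one factless fact row.

```python
# Coverage: every (day, product) combination that was on promotion.
coverage = {
    ("2024-03-01", "cola"),
    ("2024-03-01", "chips"),
    ("2024-03-02", "cola"),
}

# Activity: the (day, product) combinations that actually sold.
activity = {
    ("2024-03-01", "cola"),
    ("2024-03-02", "cola"),
}

# Coverage minus activity: promoted products that did not sell.
did_not_happen = coverage - activity
```

In SQL the same result is commonly obtained with `EXCEPT` or an anti-join between the two tables.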
59. Aggregate Fact Tables
or OLAP Cubes
Aggregate fact tables are simple numeric
rollups of atomic fact table data built solely
to accelerate query performance.
60. Consolidated Fact Tables
It is often convenient to combine facts from multiple
processes together into a single...
For example, sales actuals can be consolidated with sales
forecasts in a single fact table to make the task of anal...
61. It is often convenient to combine facts
from multiple processes together into a
single consolidated fact table if they can be
expressed at the same grain.
62. For example, sales actuals can be consolidated
with sales forecasts in a single fact table to make
the task of analyzing actuals versus forecasts
simple and fast, as compared to assembling a
drill-across application using separate fact tables.
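A small sketch of the actuals-versus-forecast case, assuming both sources are already expressed at the same (month, region) grain. The figures and the `consolidate` helper are invented for illustration; a missing value is left as `None` rather than guessed.

```python
# Hypothetical facts from two processes, both keyed by (month, region).
actuals = {("2024-01", "East"): 120.0, ("2024-01", "West"): 80.0}
forecasts = {
    ("2024-01", "East"): 100.0,
    ("2024-01", "West"): 90.0,
    ("2024-02", "East"): 110.0,
}

def consolidate(actuals, forecasts):
    """Place both facts side by side in one row per shared-grain key."""
    rows = []
    for key in sorted(set(actuals) | set(forecasts)):
        month, region = key
        actual = actuals.get(key)
        forecast = forecasts.get(key)
        if actual is not None and forecast is not None:
            variance = actual - forecast
        else:
            variance = None  # one side is missing, so no variance yet
        rows.append((month, region, actual, forecast, variance))
    return rows

consolidated = consolidate(actuals, forecasts)
```

With both facts in one row, the comparison is a simple column subtraction instead of a merge of two separate query results.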
63. Basic Dimension Table Techniques
Dimension Table Structure
Dimension Surrogate Keys
Natural, Durable, and Supernatural Keys
Drilling Down
drill-up
67. Dimension Table
Structure
Dimension table attributes are the primary
target of constraints and grouping
specifications from queries and BI applications.
68. Dimension Surrogate Keys
A dimension table is designed with one column serving as a unique
primary key.
This primary key cannot be the operational system’s natural key:
natural keys for a dimension may be created by more than one source
system, and these natural keys may be incompatible or poorly
administered.
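One simple way to picture surrogate key assignment is a lookup that hands out sequential integers per (source system, natural key) pair. The class name, the source system names, and the key formats below are hypothetical, and the sketch deliberately ignores the separate problem of matching the same real-world entity across systems.

```python
class SurrogateKeyGenerator:
    """Assign meaningless sequential integers to natural keys."""

    def __init__(self):
        self._keys = {}  # (source_system, natural_key) -> surrogate key

    def key_for(self, source_system, natural_key):
        pair = (source_system, natural_key)
        if pair not in self._keys:
            # Next integer in sequence, beginning with 1.
            self._keys[pair] = len(self._keys) + 1
        return self._keys[pair]

generator = SurrogateKeyGenerator()
first = generator.key_for("crm", "C-001")       # new natural key
second = generator.key_for("billing", "0001")   # incompatible format, still fine
repeat = generator.key_for("crm", "C-001")      # same pair, same surrogate key
```

Because the surrogate key carries no business meaning, differences in format between the source systems' natural keys do not matter to the dimension's primary key.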
69. Natural, Durable, and Sup...
Surrogate keys have no business meaning.
A natural key is a value that represents a real-world object.
The best durable keys have a format that is independent of the original
business process and thus should be simple integers assigned in sequence
beginning with 1.
70. Drilling Down
Drilling down means adding a row header to an existing query; the new row header is a
dimension attribute appended to the GROUP BY expression in an SQL
query.
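Drilling down can be sketched as the same query run with one more dimension attribute in the GROUP BY clause. The tables, columns, sample rows, and the `sales_by` helper below are hypothetical; the helper only accepts attribute names from a fixed whitelist because they are interpolated into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT, brand TEXT);
CREATE TABLE sales_fact  (product_key INTEGER NOT NULL, sales_amount REAL);
INSERT INTO product_dim VALUES
    (1, 'Snacks', 'Acme'), (2, 'Snacks', 'Zenith'), (3, 'Drinks', 'Acme');
INSERT INTO sales_fact VALUES (1, 10.0), (2, 5.0), (3, 8.0), (1, 2.0);
""")

ALLOWED_ROW_HEADERS = {"category", "brand"}

def sales_by(*row_headers):
    """Total sales grouped by the given dimension attributes."""
    if not row_headers or not set(row_headers) <= ALLOWED_ROW_HEADERS:
        raise ValueError("row headers must come from the whitelist")
    columns = ", ".join(row_headers)
    sql = (
        f"SELECT {columns}, SUM(f.sales_amount) "
        "FROM sales_fact f "
        "JOIN product_dim p ON p.product_key = f.product_key "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(sql).fetchall()

by_category = sales_by("category")
# Drill down: append one more attribute as a row header.
by_category_and_brand = sales_by("category", "brand")
```

Removing the extra attribute again is the reverse operation, drilling up.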
77. Denormalized
Flattened Dimensions
Keeping the repeated low cardinality values in the primary
dimension table is a fundamental dimensional modeling technique.
Normalizing these values into separate tables defeats the primary
goals of simplicity and performance
78. Resisting Normalization Urges
Snowflake Schemas with Normalized Dimensions
Outriggers
Centipede Fact Tables with Too Many Dimensions
79. Snowflake Schemas with ...
Snowflaking negatively impacts the users’ ability to browse within a
dimension
Most database optimizers also struggle with the snowflaked schema’s
complexity. Numerous tables and joins usually translate into slower query
performance.
81. Outriggers
Fixed depth hierarchies should be flattened in dimension
tables.
Normalized, snowflaked dimension tables penalize cross-attribute
browsing and prohibit the use of bitmapped
indexes.
82. Outriggers
Normalized, snowflaked dimension tables penalize cross-attribute
browsing and prohibit the use of bitmapped
indexes.
You should knowingly sacrifice this dimension table space
in the spirit of performance and ease of use advantages.
84. Centipede Fact Tables wit...
A very large number of dimensions is typically a sign that several
dimensions are not completely independent and should be combined into
a single dimension.
It is a dimensional modeling mistake to represent elements of a single
hierarchy as separate dimensions in the fact table.
92. 3. Big Data
Analytics
Management Best Practices for Big Data
Architecture Best Practices for Big Data
Data Modeling Best Practices for Big Data
93. 3. Big Data
Analytics
Architecture Best Practices for Big Data
Data Modeling Best Practices for Big Data
Data Governance Best Practices for Big Data
94. Management Best Practic...
Delay Building Legacy Environments
Build From Sandbox Results
Try Simple Applications First
3. Big Data Analytics
95. Architecture Best Practice...
Plan a Data Highway
Build a Fact Extractor from Big Data
Build Comprehensive Ecosystems
Plan for Data Quality
Add Value to Data as Soon as Possible
96. Architecture Best Practice...
Add Value to Data as Soon as Possible
Implement Backflow to Earlier Caches
Implement Streaming Data
Avoid Boundary Crashes
Move Prototypes to a Private Cloud
97. Architecture Best Practice...
Avoid Boundary Crashes
Move Prototypes to a Private Cloud
Strive for Performance Improvements
Monitor Compute Resources
Exploit In-Database Analytics
98. Data Modeling Best Practi...
Think Dimensionally
Integrate Separate Data Sources with Conformed Dimensions
Anchor Dimensions with Durable Surrogate Keys
Expect to Integrate Structured and Unstructured Data
99. Data Modeling Best Practi...
Anchor Dimensions with Durable Surrogate Keys
Expect to Integrate Structured and Unstructured Data
Use Slowly Changing Dimensions
Declare Data Structure at Analysis Time
Rapidly Prototype Using Data Virtualization
100. Data Governance Best Pra...
There is No Such Thing as Big Data Governance
Dimensionalize the Data before Applying Governance
Privacy is the Most Important Governance Perspective
Don’t Choose Big Data over Governance