The Data Warehouse
1. Think Dimensional
2. Tasks and tactics of dimensional modeling
3. Big Data Analytics
1. Think Dimensional
1. DW/BI : Big Picture
2. Dimensional Modeling Techniques
3. Real cases
1. DW/BI: Big Picture
Business-driven goals of data warehousing and business intelligence
Publishing Metaphor for DW/BI Managers
Dimensional Modeling Introduction
Business-driven goals of data warehousing and business intelligence
make information easily accessible
present information consistently
must adapt to change
present information in a timely way
be a secure bastion that protects the information assets
serve as the authoritative and trustworthy foundation for improved decision making
Dimensional Modeling Introduction
Fact Tables for Measurements
Star Schemas Versus OLAP Cubes
Dimension Tables for Descriptive Context
Kimball’s DW/BI Architecture
Alternative DW/BI Architectures
Dimensional Modeling Myths
Fact Tables for Measurements
The most useful facts are numeric and additive.
Each measurement event in the physical world has a one-to-one relationship to a single row in the corresponding fact table.
Fact tables express many-to-many relationships; all other tables are dimension tables.
Star Schemas Versus OLAP Cubes
Dimension Tables for Descriptive Context
Dimensions provide the entry points to the data and the final labels and groupings on all DW/BI analyses.
Dimension tables are integral companions to a fact table.
They describe the “who, what, where, when, how, and why” associated with the event.
It is not uncommon for a dimension table to have 50 to 100 attributes.
Dimension tables tend to have fewer rows than fact tables, but can be wide with many large text columns.
Normalizing a dimension by moving its repeated descriptive values into separate lookup tables is called snowflaking.
Facts and Dimensions Joined in a Star Schema
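A star schema join can be sketched with Python's built-in sqlite3 module. The table names, column names, and rows below (date_dim, product_dim, sales_fact) are hypothetical, chosen only to illustrate one fact table surrounded by two dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, calendar_month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT);
-- One fact row per sales measurement event; foreign keys point at the dimensions.
CREATE TABLE sales_fact (
    date_key     INTEGER NOT NULL REFERENCES date_dim(date_key),
    product_key  INTEGER NOT NULL REFERENCES product_dim(product_key),
    sales_amount REAL
);
INSERT INTO date_dim VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO product_dim VALUES (10, 'Acme'), (11, 'Zenith');
INSERT INTO sales_fact VALUES (1, 10, 5.0), (1, 11, 7.5), (2, 10, 2.5);
""")

# Dimension attributes label and group the result; the additive fact is summed.
star_query = """
SELECT p.brand, d.calendar_month, SUM(f.sales_amount)
FROM sales_fact f
JOIN date_dim d    ON d.date_key = f.date_key
JOIN product_dim p ON p.product_key = f.product_key
GROUP BY p.brand, d.calendar_month
ORDER BY p.brand, d.calendar_month
"""
rows = conn.execute(star_query).fetchall()
```

Every dimension is one join away from the fact table, which is what gives the schema its star shape.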
Kimball’s DW/BI Architecture
Operational Source Systems
Extract, Transformation, and Load System
Presentation Area to Support Business Intelligence
Business Intelligence Applications
Alternative DW/BI Architectures
Independent Data Mart Architecture
Hub-and-Spoke Corporate Information Factory Inmon Architecture
Hybrid Hub-and-Spoke and Kimball Architecture
Dimensional Modeling Myths
Myth 1: Dimensional Models are Only for Summary Data
Myth 2: Dimensional Models are Departmental, Not Enterprise
Myth 3: Dimensional Models are Not Scalable
Myth 4: Dimensional Models are Only for Predictable Usage
Myth 5: Dimensional Models Can’t Be Integrated
Myth 4: Dimensional Models are Only for Predictable Usage
The secret to query flexibility is building fact tables at the most granular level.
Myth 5: Dimensional Models Can’t Be Integrated
Dimensional modeling concepts link the business and technical communities together as they jointly design the DW/BI deliverables.
Deliver data that’s understandable to the business users. Deliver fast query performance.
2. Dimensional Modeling Techniques
Fundamental Concepts
Resisting Normalization Urges
Fundamental Concepts
Gather Business Requirements and Data Realities
Collaborative Dimensional Modeling Workshops
Four-Step Dimensional Design Process
Basic Fact Table Techniques
Basic Dimension Table Techniques
Four-Step Dimensional Design Process
Select the business process.
Declare the grain.
Identify the dimensions.
Identify the facts.
Declare the grain.
Atomic grain refers to the lowest level at which data is captured by a given business process.
Basic Fact Table Techniques
Fact Table Structure
Additive, Semi-Additive, Non-Additive Facts
Nulls in Fact Tables
Conformed Facts
Transaction Fact Tables
Periodic Snapshot Fact Tables
Accumulating Snapshot Fact Tables
Factless Fact Tables
Aggregate Fact Tables or OLAP Cubes
Consolidated Fact Tables
Fact Table Structure
Fact tables are the primary target of computations and dynamic aggregations arising from queries.
A fact table contains the numeric measures produced by an operational measurement event in the real world.
Additive, Semi-Additive, Non-Additive Facts
Additive measures can be summed across any of the dimensions associated with the fact table.
Semi-additive measures can be summed across some dimensions, but not all; balance amounts are common semi-additive facts because they are additive across all dimensions except time.
Some measures are completely non-additive, such as ratios.
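The three kinds of additivity can be illustrated with a minimal Python sketch. The balances, the (margin, revenue) rows, and the function names are all hypothetical. Averaging a balance over time is one common way to handle a semi-additive fact; it is not the only one.

```python
# Hypothetical month-end balances: (month, account, balance).
balances = [
    ("2024-01", "A", 100.0), ("2024-01", "B", 50.0),
    ("2024-02", "A", 120.0), ("2024-02", "B", 30.0),
]

def total_balance_for_month(rows, month):
    """Balances are additive across accounts within a single period."""
    return sum(balance for m, _, balance in rows if m == month)

def average_balance_over_time(rows, account):
    """Across time a balance is averaged here, never summed."""
    values = [balance for _, a, balance in rows if a == account]
    return sum(values) / len(values)

# Hypothetical fact rows holding the additive components of a ratio: (margin, revenue).
sales = [(80.0, 100.0), (10.0, 50.0)]

def margin_ratio(rows):
    """A ratio is non-additive: sum the components first, then divide."""
    return sum(margin for margin, _ in rows) / sum(revenue for _, revenue in rows)
```

Summing the two per-row ratios (0.8 and 0.2) would give 1.0, while the ratio of the sums is 0.6, which is why the additive components are stored rather than the ratio itself.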
Nulls in Fact Tables
Nulls must be avoided in the fact table’s foreign keys, because these nulls would automatically cause a referential integrity violation.
The aggregate functions (SUM, COUNT, MIN, MAX, and AVG) all do the “right thing” with null facts.
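The behaviour of aggregates over null facts can be checked with a small sqlite3 sketch. The sales_fact table and its rows are hypothetical, and a NOT NULL constraint stands in here for full referential integrity on the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The foreign key column is declared NOT NULL; the fact column may be null.
conn.execute("CREATE TABLE sales_fact (product_key INTEGER NOT NULL, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(1, 10), (1, None), (2, 20)],
)

# Aggregates skip the null fact; COUNT(*) still counts the row itself.
null_safe = conn.execute(
    "SELECT SUM(quantity), COUNT(quantity), COUNT(*), "
    "MIN(quantity), MAX(quantity), AVG(quantity) FROM sales_fact"
).fetchone()

# A null foreign key is rejected by the constraint.
fk_null_rejected = False
try:
    conn.execute("INSERT INTO sales_fact VALUES (NULL, 5)")
except sqlite3.IntegrityError:
    fk_null_rejected = True
```

The average is taken over the two non-null quantities, not over all three rows.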
Conformed Facts
Be disciplined in your data naming practices.
If it is impossible to conform a fact exactly, you should give different names to the different interpretations so that business users do not combine these incompatible facts in calculations.
Transaction Fact Tables
A row in a transaction fact table corresponds to a measurement event at a point in space and time.
Atomic transaction grain fact tables are the most dimensional and expressive fact tables; they enable the maximum slicing and dicing of transaction data.
Rows exist only if measurements take place.
Periodic Snapshot Fact Tables
A row in a periodic snapshot fact table summarizes many measurement events occurring over a standard period, such as a day, a week, or a month.
The grain is the period, not the individual transaction.
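As a rough illustration, transaction-grain rows can be rolled up into a daily snapshot in plain Python. The transactions and the build_daily_snapshot helper are hypothetical. As a simplification, this sketch only emits rows for periods that had activity.

```python
from collections import defaultdict

# Hypothetical transaction-grain rows: (date, account, amount).
transactions = [
    ("2024-03-01", "A", 25.0),
    ("2024-03-01", "A", -5.0),
    ("2024-03-01", "B", 40.0),
    ("2024-03-02", "A", 10.0),
]

def build_daily_snapshot(rows):
    """Collapse many transactions into one row per (day, account)."""
    snapshot = defaultdict(lambda: {"transaction_count": 0, "net_amount": 0.0})
    for day, account, amount in rows:
        cell = snapshot[(day, account)]
        cell["transaction_count"] += 1
        cell["net_amount"] += amount
    return dict(snapshot)

daily = build_daily_snapshot(transactions)
```

Four transactions become three snapshot rows, each keyed by the period and the account rather than by an individual event.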
Accumulating Snapshot Fact Tables
A row in an accumulating snapshot fact table summarizes the measurement events occurring at predictable steps between the beginning and the end of a process.
The row is initially inserted when the order line is created. As pipeline progress occurs, the accumulating fact table row is revisited and updated. This consistent updating of accumulating snapshot fact rows is unique among the three types of fact tables.
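The insert-then-update pattern can be sketched with sqlite3. The order_line_fact table, its milestone columns, and the record_milestone helper are hypothetical. The placeholder key 0 for a milestone that has not occurred yet is an assumption of this sketch, used so that the date foreign keys are never null.

```python
import sqlite3

NOT_YET_OCCURRED = 0  # hypothetical placeholder date key for a pending milestone

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE order_line_fact (
    order_line_id     INTEGER PRIMARY KEY,
    order_date_key    INTEGER NOT NULL,
    ship_date_key     INTEGER NOT NULL DEFAULT 0,
    delivery_date_key INTEGER NOT NULL DEFAULT 0,
    quantity          INTEGER
)""")
# The row is inserted once, when the order line is created.
conn.execute(
    "INSERT INTO order_line_fact (order_line_id, order_date_key, quantity) "
    "VALUES (1, 20240501, 3)"
)

MILESTONE_COLUMNS = {"ship_date_key", "delivery_date_key"}

def record_milestone(conn, order_line_id, column, date_key):
    """Revisit the existing row and fill in one milestone date key."""
    if column not in MILESTONE_COLUMNS:
        raise ValueError(f"unknown milestone column: {column}")
    conn.execute(
        f"UPDATE order_line_fact SET {column} = ? WHERE order_line_id = ?",
        (date_key, order_line_id),
    )

initial = conn.execute(
    "SELECT ship_date_key, delivery_date_key FROM order_line_fact"
).fetchone()
record_milestone(conn, 1, "ship_date_key", 20240503)
record_milestone(conn, 1, "delivery_date_key", 20240506)
final_rows = conn.execute("SELECT * FROM order_line_fact").fetchall()
```

The table still holds a single row for the order line at the end; only its milestone columns have changed.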
In addition to the date foreign keys associated with each critical process step, accumulating snapshot fact tables contain foreign keys for other dimensions and optionally contain degenerate dimensions.
Factless Fact Tables
The event of a student attending a class on a given day may not have a recorded numeric fact, but a fact row with foreign keys for calendar day, student, teacher, location, and class is well-defined.
Factless fact tables can also be used to analyze what didn’t happen, using two tables:
a factless coverage table that contains all the possibilities of events that might happen
an activity table that contains the events that did happen
When the activity is subtracted from the coverage, the result is the set of events that did not happen.
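The subtraction can be sketched as a set difference in Python. The enrollment (coverage) and attendance (activity) rows are hypothetical and reuse the student attendance example.

```python
# Coverage: every (day, student, class) combination that might happen.
coverage = {
    ("2024-06-03", "alice", "math"),
    ("2024-06-03", "bob", "math"),
    ("2024-06-03", "alice", "art"),
}

# Activity: the attendance events that did happen.
activity = {
    ("2024-06-03", "alice", "math"),
    ("2024-06-03", "alice", "art"),
}

# Coverage minus activity: the events that did not happen (absences).
did_not_happen = sorted(coverage - activity)
```

In SQL the same result is typically obtained with EXCEPT or an anti-join between the two factless tables.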
Aggregate Fact Tables or OLAP Cubes
Aggregate fact tables are simple numeric rollups of atomic fact table data built solely to accelerate query performance.
Consolidated Fact Tables
It is often convenient to combine facts from multiple processes together into a single consolidated fact table if they can be expressed at the same grain.
For example, sales actuals can be consolidated with sales forecasts in a single fact table to make the task of analyzing actuals versus forecasts simple and fast, as compared to assembling a drill-across application using separate fact tables.
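A consolidated actuals-versus-forecast table can be sketched in Python, assuming both inputs are already at the same (month, region) grain. The figures and the consolidate helper are hypothetical.

```python
# Hypothetical facts from two processes, both at (month, region) grain.
actuals = {("2024-01", "East"): 120.0, ("2024-01", "West"): 80.0}
forecasts = {
    ("2024-01", "East"): 100.0,
    ("2024-01", "West"): 90.0,
    ("2024-02", "East"): 110.0,
}

def consolidate(actuals, forecasts):
    """Return one row per grain key with both facts side by side."""
    rows = []
    for key in sorted(set(actuals) | set(forecasts)):
        actual = actuals.get(key)
        forecast = forecasts.get(key)
        # The variance is only defined when both facts are present.
        if actual is not None and forecast is not None:
            variance = actual - forecast
        else:
            variance = None
        rows.append((*key, actual, forecast, variance))
    return rows

consolidated = consolidate(actuals, forecasts)
```

With both facts in one row, the comparison becomes a simple column subtraction instead of a merge of two separate query results.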
Basic Dimension Table Techniques
Dimension Table Structure
Dimension Surrogate Keys
Natural, Durable, and Supernatural Keys
Drilling Down
Drill-up
Slice
Dice
Pivot
Degenerate Dimensions
Denormalized Flattened Dimensions
Multiple Hierarchies in Dimensions
Dimension Table Structure
Dimension table attributes are the primary target of constraints and grouping specifications from queries and BI applications.
Dimension Surrogate Keys
A dimension table is designed with one column serving as a unique primary key.
This primary key cannot be the operational system’s natural key.
Natural keys for a dimension may be created by more than one source system, and these natural keys may be incompatible or poorly administered.
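One way to picture surrogate key assignment is a small Python lookup that hands out sequential integers. The SurrogateKeyAssigner class and the source system names are hypothetical. The sketch does not try to recognize the same real-world entity arriving from two different sources.

```python
from itertools import count

class SurrogateKeyAssigner:
    """Map (source system, natural key) pairs to meaningless integer keys."""

    def __init__(self):
        self._next_key = count(1)      # simple integers assigned in sequence
        self._by_natural_key = {}

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._by_natural_key:
            self._by_natural_key[lookup] = next(self._next_key)
        return self._by_natural_key[lookup]

assigner = SurrogateKeyAssigner()
crm_key = assigner.key_for("crm", "C-001")
billing_key = assigner.key_for("billing", "C-001")  # same natural key, other source
crm_key_again = assigner.key_for("crm", "C-001")
```

The two source systems reuse the natural key "C-001", yet each gets its own surrogate key, so the collision never reaches the dimension table.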
Natural, Durable, and Supernatural Keys
Surrogate keys have no business meaning.
A natural key is a value that represents a real-world object.
The best durable keys have a format that is independent of the original business process and thus should be simple integers assigned in sequence beginning with 1.
Drilling Down
Drilling down means adding a row header to an existing query; the new row header is a dimension attribute appended to the GROUP BY expression in an SQL query.
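Drilling down can be sketched with sqlite3 by running the same query twice, the second time with one more dimension attribute in the GROUP BY. The tables, rows, and the sales_by helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT, brand TEXT);
CREATE TABLE sales_fact (product_key INTEGER NOT NULL, sales_amount REAL);
INSERT INTO product_dim VALUES (1, 'Snacks', 'Acme'), (2, 'Snacks', 'Zenith'), (3, 'Drinks', 'Acme');
INSERT INTO sales_fact VALUES (1, 10.0), (2, 4.0), (3, 6.0), (1, 2.0);
""")

ALLOWED_ATTRIBUTES = {"category", "brand"}

def sales_by(conn, *attributes):
    """Sum sales grouped by the given dimension attributes (the row headers)."""
    if not attributes or not set(attributes) <= ALLOWED_ATTRIBUTES:
        raise ValueError("unknown dimension attribute")
    columns = ", ".join(attributes)
    sql = (
        f"SELECT {columns}, SUM(f.sales_amount) "
        "FROM sales_fact f JOIN product_dim p ON p.product_key = f.product_key "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return conn.execute(sql).fetchall()

by_category = sales_by(conn, "category")
drilled_down = sales_by(conn, "category", "brand")  # brand appended to GROUP BY
```

Removing the attribute again is the reverse operation, drill-up.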
Degenerate Dimensions
Sometimes a dimension is defined that has no content except for its primary key.
Denormalized Flattened Dimensions
Keeping the repeated low-cardinality values in the primary dimension table is a fundamental dimensional modeling technique.
Normalizing these values into separate tables defeats the primary goals of simplicity and performance.
Resisting Normalization Urges
Snowflake Schemas with Normalized Dimensions
Outriggers
Centipede Fact Tables with Too Many Dimensions
Snowflake Schemas with Normalized Dimensions
Snowflaking negatively impacts the users’ ability to browse within a dimension.
Most database optimizers also struggle with the snowflaked schema’s complexity. Numerous tables and joins usually translate into slower query performance.
Outriggers
Fixed-depth hierarchies should be flattened in dimension tables.
Normalized, snowflaked dimension tables penalize cross-attribute browsing and prohibit the use of bitmapped indexes.
You should knowingly sacrifice this dimension table space in the spirit of performance and ease-of-use advantages.
Centipede Fact Tables with Too Many Dimensions
A very large number of dimensions is typically a sign that several dimensions are not completely independent and should be combined into a single dimension.
It is a dimensional modeling mistake to represent elements of a single hierarchy as separate dimensions in the fact table.
3. Real cases
Retail Sales
Inventory
Procurement
Order Management
Customer Relationship Management
Human Resources Management
Financial Services
Telecommunications
Transportation
Education
Electronic Commerce
Insurance
2. Tasks and tactics of dimensional modeling
Kimball DW/BI Lifecycle Overview
Dimensional Modeling Process and Tasks
3. Big Data Analytics
Management Best Practices for Big Data
Architecture Best Practices for Big Data
Data Modeling Best Practices for Big Data
Data Governance Best Practices for Big Data
Management Best Practices for Big Data
Delay Building Legacy Environments
Build From Sandbox Results
Try Simple Applications First
Architecture Best Practices for Big Data
Plan a Data Highway
Build a Fact Extractor from Big Data
Build Comprehensive Ecosystems
Plan for Data Quality
Add Value to Data as Soon as Possible
Implement Backflow to Earlier Caches
Implement Streaming Data
Avoid Boundary Crashes
Move Prototypes to a Private Cloud
Strive for Performance Improvements
Monitor Compute Resources
Exploit In-Database Analytics
Data Modeling Best Practices for Big Data
Think Dimensionally
Integrate Separate Data Sources with Conformed Dimensions
Anchor Dimensions with Durable Surrogate Keys
Expect to Integrate Structured and Unstructured Data
Use Slowly Changing Dimensions
Declare Data Structure at Analysis Time
Rapidly Prototype Using Data Virtualization
Data Governance Best Practices for Big Data
There is No Such Thing as Big Data Governance
Dimensionalize the Data before Applying Governance
Privacy is the Most Important Governance Perspective
Don’t Choose Big Data over Governance
The Definitive Guide to Dimensional Modeling
Thank you

The Data Warehouse .pdf

  • 1.
  • 2.
    The Data Warehouse 1. ThinkDimensional 2. tasks and tactics of the dimensional modeling 3. Big Data Analytics 1. Think Dimensional 2. tasks and tactics of the dimensional modeling 3. Big Data Analytics
  • 3.
  • 4.
    1. Think Dimensional 1. DW/BI: Big Picture 2. Dimensional Modeling Techniques 3. real cases 1. DW/BI : Big Picture 2. Dimensional Modeling Techniques 3. real cases
  • 5.
    1. DW/BI :Big Picture Business-driven goals of data warehousing and business intelligence Publishing Metaphor for DW/BI Managers Dimensional Modeling Introduction 1. Think Dimensional
  • 6.
    Business-driven goals ofd... make information easily accessible present information consistently must adapt to change present information in a timely way be a secure bastion that protects the information assets make information easily accessible present information consistently must adapt to change present information in a timely way be a secure bastion that protects the information assets 1. Think Dimensional
  • 7.
    Business-driven goals ofd... must adapt to change present information in a timely way be a secure bastion that protects the information assets the authoritative and trustworthy foundation for improved decision making must adapt to change present information in a timely way be a secure bastion that protects the information assets the authoritative and trustworthy foundation for improved decision making 1. Think Dimensional
  • 8.
    Dimensional Modeling Introduction FactTables for Measurements Star Schemas Versus OLAP Cubes Dimension Tables for Descriptive Context Kimball’s DW/BI Architecture Alternative DW/BI Architectures Fact Tables for Measurements Star Schemas Versus OLAP Cubes Dimension Tables for Descriptive Context Kimball’s DW/BI Architecture Alternative DW/BI Architectures 1. Think Dimensional
  • 9.
    Dimensional Modeling Introduction StarSchemas Versus OLAP Cubes Dimension Tables for Descriptive Context Kimball’s DW/BI Architecture Alternative DW/BI Architectures Dimensional Modeling Myths Star Schemas Versus OLAP Cubes Dimension Tables for Descriptive Context Kimball’s DW/BI Architecture Alternative DW/BI Architectures Dimensional Modeling Myths 1. Think Dimensional
  • 10.
    Fact Tables forMeasurements The most useful facts are numeric and additive event in the physical world has a one-to-one relationship to a single row in the corresponding fact table Fact tables express many-to-many relationships. All others are dimension tables. The most useful facts are numeric and additive event in the physical world has a one-to-one relationship to a single row in the corresponding fact table Fact tables express many-to-many relationships. All others are dimension tables. 1. Think Dimensional
  • 11.
    Fact Tables forMeasurements The most useful facts are numeric and additive event in the physical world has a one-to-one relationship to a single row in the corresponding fact table Fact tables express many-to-many relationships. All others are dimension tables. The most useful facts are numeric and additive event in the physical world has a one-to-one relationship to a single row in the corresponding fact table Fact tables express many-to-many relationships. All others are dimension tables. 1. Think Dimensional
  • 12.
  • 13.
  • 14.
    Dimension Tables for DescriptiveContext Dimensions provide the entry points to the data, and the final labels and groupings on all DW/BI analyses 1. Think Dimensional
  • 15.
    Dimension Tables forDesc... Dimension tables are integral companions to a fact table “who, what, where, when, how, and why” associated with the event fact table to have 50 to 100 attributes Dimension tables tend to have fewer rows than fact tables, but can be wide with many large text columns. Dimension tables are integral companions to a fact table “who, what, where, when, how, and why” associated with the event fact table to have 50 to 100 attributes Dimension tables tend to have fewer rows than fact tables, but can be wide with many large text columns. 1. Think Dimensional
  • 16.
    Dimension Tables forDesc... many large text columns. This normalization is called snowflaking. Facts and Dimensions Joined in a Star Schema many large text columns. This normalization is called snowflaking. Facts and Dimensions Joined in a Star Schema 1. Think Dimensional
  • 17.
  • 18.
    This normalization iscalled snowflaking. This normalization is called snowflaking. 1. Think Dimensional
  • 19.
    This normalization iscalled snowflaking. This normalization is called snowflaking. 1. Think Dimensional
  • 20.
  • 21.
    Kimball’s DW/BI Architecture OperationalSource Systems Extract, Transformation, and Load System Presentation Area to Support Business Intelligence Business Intelligence Applications Operational Source Systems Extract, Transformation, and Load System Presentation Area to Support Business Intelligence Business Intelligence Applications 1. Think Dimensional
  • 22.
    Kimball’s DW/BI Architecture OperationalSource Systems Extract, Transformation, and Load System Presentation Area to Support Business Intelligence Business Intelligence Applications Operational Source Systems Extract, Transformation, and Load System Presentation Area to Support Business Intelligence Business Intelligence Applications 1. Think Dimensional
  • 23.
  • 24.
    Alternative DW/BI Architectures IndependentData Mart Architecture Hub-and-Spoke Corporate Information Factory Inmon Architecture Hybrid Hub-and-Spoke and Kimball Architecture 1. Think Dimensional
  • 25.
  • 26.
  • 27.
  • 28.
    Dimensional Modeling Myths Myth1: Dimensional Models are Only for Summary Data Myth 2: Dimensional Models are Departmental,Not Enterprise Myth 3: Dimensional Models are Not Scalable Myth 4: Dimensional Models are Only Myth 1: Dimensional Models are Only for Summary Data Myth 2: Dimensional Models are Departmental,Not Enterprise Myth 3: Dimensional Models are Not Scalable Myth 4: Dimensional Models are Only 1. Think Dimensional
  • 29.
    Dimensional Modeling Myths Myth2: Dimensional Models are Departmental,Not Enterprise Myth 3: Dimensional Models are Not Scalable Myth 4: Dimensional Models are Only for Predictable Usage Myth 5: Dimensional Models Can’t Be Integrated Myth 2: Dimensional Models are Departmental,Not Enterprise Myth 3: Dimensional Models are Not Scalable Myth 4: Dimensional Models are Only for Predictable Usage Myth 5: Dimensional Models Can’t Be Integrated 1. Think Dimensional
  • 30.
    Myth 4: Dimensional Modelsare Only... The secret to query flexibility is building act tables at the most granular level 1. Think Dimensional
  • 31.
    Myth 5: Dimensional ModelsCan’t Be Int... Dimensional modeling concepts link the business and technical communities together as they jointly design the DW/BI deliverables. 1. Think Dimensional
  • 32.
    Deliver data that’s understandable to thebusiness users.Deliver ... Dimensional Model... 1. Think Dimensional
  • 33.
    Deliver data that’sunderstandable to the business users.Deliver fast query performance. 1. Think Dimensional
  • 34.
    2. Dimensional ModelingT... Fundamental Concepts Resisting Normalization Urges 1. Think Dimensional
  • 35.
    Fundamental Concepts Gather BusinessRequirements and Data Realities Collaborative Dimensional Modeling Workshops Four-Step Dimensional Design Process Basic Fact Table Techniques Basic Dimension Table Techniques Gather Business Requirements and Data Realities Collaborative Dimensional Modeling Workshops Four-Step Dimensional Design Process Basic Fact Table Techniques Basic Dimension Table Techniques 1. Think Dimensional
  • 36.
    Fundamental Concepts Gather BusinessRequirements and Data Realities Collaborative Dimensional Modeling Workshops Four-Step Dimensional Design Process Basic Fact Table Techniques Basic Dimension Table Techniques Gather Business Requirements and Data Realities Collaborative Dimensional Modeling Workshops Four-Step Dimensional Design Process Basic Fact Table Techniques Basic Dimension Table Techniques 1. Think Dimensional
  • 37.
    Four-Step Dimensional De... Selectthe business process. Declare the grain. Identify the dimensions. Identify the facts. 1. Think Dimensional
  • 38.
    Declare the grain. Atomicgrain refers to the lowest level at which data is captured by a given business process. 1. Think Dimensional
  • 39.
    Basic Fact TableTechniques Fact Table Structure Additive, Semi-Additive, Non-Additive Facts Nulls in Fact Tables Conformed Facts Transaction Fact Tables Fact Table Structure Additive, Semi-Additive, Non-Additive Facts Nulls in Fact Tables Conformed Facts Transaction Fact Tables 1. Think Dimensional
  • 40.
    Basic Fact TableTechniques Transaction Fact Tables Periodic Snapshot Fact Tables Accumulating Snapshot Fact Tables Factless Fact Tables Aggregate Fact Tables or OLAP Cubes Transaction Fact Tables Periodic Snapshot Fact Tables Accumulating Snapshot Fact Tables Factless Fact Tables Aggregate Fact Tables or OLAP Cubes 1. Think Dimensional
  • 41.
    Basic Fact Table Techniques: Periodic Snapshot Fact Tables; Accumulating Snapshot Fact Tables; Factless Fact Tables; Aggregate Fact Tables or OLAP Cubes; Consolidated Fact Tables
  • 42.
    Fact Table Structure: Fact tables are the primary target of computations and dynamic aggregations arising from queries. A fact table contains the numeric measures produced by an operational measurement event in the real world.
  • 43.
    Additive, Semi-Additive, ...: Additive measures can be summed across any of the dimensions associated with the fact table. Semi-additive measures can be summed across some dimensions, but not all; balance amounts are common semi-additive facts because they are additive across all dimensions except time. Some measures are completely non-additive, such as ratios.
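A minimal sketch of the semi-additive case, assuming hypothetical month-end balance snapshots (the account names and amounts are invented for illustration). Summing balances across accounts within one month is meaningful; summing the same balances across months is not, so the sketch averages over the periods instead, which is one common alternative.

```python
from collections import defaultdict

# Hypothetical month-end balance snapshots: (month, account, balance).
balances = [
    ("2024-01", "A", 100.0),
    ("2024-01", "B", 50.0),
    ("2024-02", "A", 120.0),
    ("2024-02", "B", 30.0),
]

# Additive across the account dimension: total balance per month.
total_by_month = defaultdict(float)
for month, _account, balance in balances:
    total_by_month[month] += balance

# Not additive across time: collect each account's balances per period
# and average them rather than summing.
balances_by_account = defaultdict(list)
for _month, account, balance in balances:
    balances_by_account[account].append(balance)

avg_by_account = {
    account: sum(values) / len(values)
    for account, values in balances_by_account.items()
}
```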
  • 44.
    Nulls in Fact Tables: Nulls must be avoided in the fact table’s foreign keys, because these nulls would automatically cause a referential integrity violation. The aggregate functions (SUM, COUNT, MIN, MAX, and AVG) all do the “right thing” with null facts.
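The aggregate behavior can be checked with a small sketch using Python's built-in sqlite3 module. The table name `sales_fact` and its columns are hypothetical; the point is only that a null measure is skipped by the aggregates rather than treated as zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table: the key is NOT NULL, the measure may be null.
conn.execute(
    "CREATE TABLE sales_fact (date_key INTEGER NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 30.0)],
)

# Each aggregate ignores the row whose amount is null.
row = conn.execute(
    "SELECT SUM(amount), COUNT(amount), MIN(amount), MAX(amount), AVG(amount) "
    "FROM sales_fact"
).fetchone()
conn.close()
```

Note that AVG divides by the two non-null rows, not by all three.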
  • 45.
    Conformed Facts: Be disciplined in your data naming practices. If it is impossible to conform a fact exactly, you should give different names to the different interpretations so t...
  • 46.
    If it is impossible to conform a fact exactly, you should give different names to the different interpretations so that business users do not combine these incompatible facts in calculations.
  • 47.
    Transaction Fact Tables: A row in a transaction fact table corresponds to a measurement event at a point in space and time. Atomic transaction grain fact tables are the most dimensional and expressive fact tables; this enables the maximum slicing and dicing of transaction data.
  • 48.
    Transaction Fact Tables: Atomic transaction grain fact tables are the most dimensional and expressive fact tables; this enables the maximum slicing and dicing of transaction data. Rows exist only if measurements take place.
  • 49.
    Periodic Snapshot Fact Tables: A row summarizes many measurement events occurring over a standard period, such as a day, a week, or a month. The grain is the period, not the individual transaction.
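As a rough illustration of the change in grain, the sketch below rolls hypothetical transaction rows up to one row per day and store. The field names `sales_amount` and `transaction_count` and the sample data are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical transaction-grain rows: (day, store, amount).
transactions = [
    ("2024-03-01", "store-1", 20.0),
    ("2024-03-01", "store-1", 5.0),
    ("2024-03-02", "store-1", 12.0),
    ("2024-03-01", "store-2", 8.0),
]

# One snapshot row per (day, store): the grain is the period.
snapshot = defaultdict(lambda: {"sales_amount": 0.0, "transaction_count": 0})
for day, store, amount in transactions:
    row = snapshot[(day, store)]
    row["sales_amount"] += amount
    row["transaction_count"] += 1
```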
  • 50.
    Accumulating Snapshot Fa...: A row summarizes the measurement events occurring at predictable steps between the beginning and the end of a process. Initially inserted when the order line is created. As pipeline progress occurs, the accumulating fact table ro... In addition to the date foreign keys associated with each
  • 51.
    Accumulating Snapshot Fa...: ... predictable steps between the beginning and the end of a process. Initially inserted when the order line is created. As pipeline progress occurs, the accumulating fact table ro... In addition to the date foreign keys associated with each critical process step, accumulating snapshot fact tables ...
  • 52.
    The row is initially inserted when the order line is created. As pipeline progress occurs, the accumulating fact table row is revisited and updated. This consistent updating of accumulating snapshot fact rows is unique among the three types of fact tables.
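The insert-then-update pattern can be sketched in a few lines. Everything named here is hypothetical: the milestone columns, the order line identifier, and the `NOT_YET` placeholder key, which stands in for a "not yet occurred" date row so that the foreign keys never hold nulls.

```python
NOT_YET = 0  # hypothetical surrogate key of a "not yet occurred" date row

# One row per order line, keyed by a hypothetical order line number.
fact = {}

def insert_order_line(order_line, order_date_key, quantity):
    """Insert the row once, when the order line is created."""
    fact[order_line] = {
        "order_date_key": order_date_key,
        "ship_date_key": NOT_YET,
        "delivery_date_key": NOT_YET,
        "quantity": quantity,
    }

def record_milestone(order_line, milestone, date_key):
    """Revisit the existing row and overwrite one milestone date key."""
    fact[order_line][f"{milestone}_date_key"] = date_key

insert_order_line("SO-1001-1", 20240105, 3)
record_milestone("SO-1001-1", "ship", 20240108)
```

The row count stays at one per order line; only the milestone columns change as the process advances.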
  • 53.
    In addition to the date foreign keys associated with each critical process step, accumulating snapshot fact tables contain foreign keys for other dimensions and optionally contain degenerate dimensions.
  • 54.
    Factless Fact Tables: The event of a student attending a class on a given day may not have a recorded numeric fact, but a fact row with f... be used to analyze what didn’t happen.
  • 55.
    The event of a student attending a class on a given day may not have a recorded numeric fact, but a fact row with foreign keys for calendar day, student, teacher, location, and class is well-defined. Likewise ...
  • 56.
    ... be used to analyze what d...: a factless coverage table that contains all the possibilities of events that might happen; an activity table that contains the events that did happen.
  • 57.
    When the activity is subtracted from the coverage, the... factless coverage tabl... activity table that con...
  • 58.
    When the activity is subtracted from the coverage, the result is the set of events that did not happen.
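A minimal sketch of that subtraction using Python sets, with each row reduced to a tuple of hypothetical keys (day, student, class). The sample rows are invented for illustration.

```python
# Coverage: every (day, student, class) combination that might happen.
coverage = {
    ("2024-03-01", "ana", "math"),
    ("2024-03-01", "ben", "math"),
    ("2024-03-01", "ana", "art"),
}

# Activity: the attendance events that did happen.
activity = {
    ("2024-03-01", "ana", "math"),
    ("2024-03-01", "ana", "art"),
}

# Coverage minus activity: the events that did not happen.
did_not_happen = coverage - activity
```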
  • 59.
    Aggregate Fact Tables or OLAP Cubes: Aggregate fact tables are simple numeric rollups of atomic fact table data built solely to accelerate query performance.
  • 60.
    Consolidated Fact Tables: It is often convenient to combine facts from multiple processes together into a single... For example, sales actuals can be consolidated with sales forecasts in a single fact table to make the task of anal...
  • 61.
    It is often convenient to combine facts from multiple processes together into a single consolidated fact table if they can be expressed at the same grain.
  • 62.
    For example, sales actuals can be consolidated with sales forecasts in a single fact table to make the task of analyzing actuals versus forecasts simple and fast, as compared to assembling a drill-across application using separate fact tables.
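A rough sketch of the consolidation, assuming both processes are already expressed at the same hypothetical grain of (month, product). The sample figures are invented, and filling a missing side with 0.0 is an assumption of this example rather than a rule from the source.

```python
# Two hypothetical fact sets at the same grain: (month, product).
actuals = {("2024-01", "widget"): 120.0, ("2024-01", "gadget"): 80.0}
forecasts = {("2024-01", "widget"): 100.0, ("2024-02", "widget"): 110.0}

# One consolidated row per grain value, with both facts side by side.
consolidated = [
    {
        "month": month,
        "product": product,
        "actual": actuals.get((month, product), 0.0),      # assumed default
        "forecast": forecasts.get((month, product), 0.0),  # assumed default
    }
    for month, product in sorted(actuals.keys() | forecasts.keys())
]

# Actual-versus-forecast comparison becomes simple row arithmetic.
variances = [row["actual"] - row["forecast"] for row in consolidated]
```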
  • 63.
    Basic Dimension Table Techniques: Dimension Table Structure; Dimension Surrogate Keys; Natural, Durable, and Supernatural Keys; Drilling Down; Drill-Up
  • 64.
    Basic Dimension Table Techniques: Drill-Up; Slice; Dice; Pivot; Degenerate Dimensions
  • 65.
    Basic Dimension Table Techniques: Denormalized Flattened Dimensions; Multiple Hierarchies in Dimensions; Subtopic 12; Subtopic 13; Subtopic 14
  • 66.
    Basic Dimension Table Techniques: Subtopic 12; Subtopic 13; Subtopic 14; Subtopic 15; Subtopic 16
  • 67.
    Dimension Table Structure: Dimension table attributes are the primary target of constraints and grouping specifications from queries and BI applications.
  • 68.
    Dimension Surrogate Keys: A dimension table is designed with one column serving as a unique primary key. This primary key cannot be the operational system’s natural key. Natural keys for a dimension may be created by more than one source system, and these natural keys may be incompatible or poorly administered.
  • 69.
    Natural, Durable, and Sup...: Surrogate keys have no business meaning. A value that represents a real world object. The best durable keys have a format that is independent of the original business process and thus should be simple integers assigned in sequence beginning with 1.
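One way to picture keys that are simple integers assigned in sequence beginning with 1 is the sketch below. The `KeyAssigner` class, the source system names, and the natural key values are all hypothetical; a real warehouse would persist this mapping rather than hold it in memory.

```python
from itertools import count

class KeyAssigner:
    """Hands out sequential integer keys, one per (source system, natural key)."""

    def __init__(self):
        self._next_key = count(1)  # sequence begins with 1
        self._assigned = {}

    def key_for(self, source_system, natural_key):
        identity = (source_system, natural_key)
        if identity not in self._assigned:
            self._assigned[identity] = next(self._next_key)
        return self._assigned[identity]

assigner = KeyAssigner()
k1 = assigner.key_for("crm", "C-17")   # first identity seen
k2 = assigner.key_for("erp", "0017")   # different source, incompatible format
k3 = assigner.key_for("crm", "C-17")   # same identity, same key as before
```

The assigned integers carry no business meaning and do not depend on the format of either source system's key.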
  • 70.
    Drilling Down: Adding a row header to an existing query. The new row header is a dimension attribute appended to the GROUP BY expression in an SQL query.
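The sketch below shows a drill-down as two queries run through Python's built-in sqlite3 module: the second query differs from the first only by the extra `brand` attribute in the select list and the GROUP BY. The tables `product_dim` and `sales_fact` and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT, brand TEXT);
CREATE TABLE sales_fact (product_key INTEGER REFERENCES product_dim, amount REAL);
INSERT INTO product_dim VALUES (1, 'Snacks', 'Crunchy'), (2, 'Snacks', 'Salty'), (3, 'Drinks', 'Fizz');
INSERT INTO sales_fact VALUES (1, 10.0), (2, 5.0), (1, 2.5), (3, 7.0);
""")

# Starting query: one row header (category).
by_category = conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM sales_fact f JOIN product_dim d ON d.product_key = f.product_key
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

# Drill down: append the brand attribute as a second row header.
by_category_brand = conn.execute("""
    SELECT d.category, d.brand, SUM(f.amount)
    FROM sales_fact f JOIN product_dim d ON d.product_key = f.product_key
    GROUP BY d.category, d.brand
    ORDER BY d.category, d.brand
""").fetchall()
conn.close()
```

Removing `brand` from the second query again would be the corresponding drill-up.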
  • 71.
  • 72.
  • 73.
  • 74.
  • 75.
  • 76.
    Degenerate Dimensions: Sometimes a dimension is defined that has no content except for its primary key.
  • 77.
    Denormalized Flattened Dimensions: Keeping the repeated low cardinality values in the primary dimension table is a fundamental dimensional modeling technique. Normalizing these values into separate tables defeats the primary goals of simplicity and performance.
  • 78.
    Resisting Normalization Urges: Snowflake Schemas with Normalized Dimensions; Outriggers; Centipede Fact Tables with Too Many Dimensions
  • 79.
    Snowflake Schemas with...: Snowflaking negatively impacts the users’ ability to browse within a dimension. Most database optimizers also struggle with the snowflaked schema’s complexity. Numerous tables and joins usually translate into slower query performance.
  • 80.
  • 81.
    Outriggers: Fixed depth hierarchies should be flattened in dimension tables. Normalized, snowflaked dimension tables penalize cross-attribute browsing and prohibit the use of bitmapped indexes.
  • 82.
    Outriggers: ... tables. Normalized, snowflaked dimension tables penalize cross-attribute browsing and prohibit the use of bitmapped indexes. You should knowingly sacrifice this dimension table space in the spirit of performance and ease of use advantages.
  • 83.
  • 84.
    Centipede Fact Tables wit...: A very large number of dimensions typically is a sign that several dimensions are not completely independent and should be combined into a single dimension. It is a dimensional modeling mistake to represent elements of a single hierarchy as separate dimensions in the fact table.
  • 85.
  • 86.
    3. real cases: Retail Sales; Inventory; Procurement; Order Management; Customer Relationship Management
  • 87.
    3. real cases: Customer Relationship Management; Human Resources Management; Financial Services; Telecommunications; Transportation
  • 88.
    3. real cases: Telecommunications; Transportation; Education; Electronic Commerce; Insurance
  • 89.
    2. tasks and tactics of the dimensional modeling
  • 90.
    2. tasks and tactics of the dimensional modeling: Kimball DW/BI Lifecycle Overview; Dimensional Modeling Process and Tasks
  • 91.
    3. Big Data Analytics
  • 92.
    3. Big Data Analytics: Management Best Practices for Big Data; Architecture Best Practices for Big Data; Data Modeling Best Practices for Big Data
  • 93.
    3. Big Data Analytics: ... Practices for Big Data; Architecture Best Practices for Big Data; Data Modeling Best Practices for Big Data; Data Governance Best Practices for Big Data
  • 94.
    Management Best Practic...: Delay Building Legacy Environments; Build From Sandbox Results; Try Simple Applications First
  • 95.
    Architecture Best Practice...: Plan a Data Highway; Build a Fact Extractor from Big Data; Build Comprehensive Ecosystems; Plan for Data Quality; Add Value to Data as Soon as Possible
  • 96.
    Architecture Best Practice...: Add Value to Data as Soon as Possible; Implement Backflow to Earlier Caches; Implement Streaming Data; Avoid Boundary Crashes; Move Prototypes to a Private Cloud
  • 97.
    Architecture Best Practice...: Avoid Boundary Crashes; Move Prototypes to a Private Cloud; Strive for Performance Improvements; Monitor Compute Resources; Exploit In-Database Analytics
  • 98.
    Data Modeling Best Practi...: Think Dimensionally; Integrate Separate Data Sources with Conformed Dimensions; Anchor Dimensions with Durable Surrogate Keys; Expect to Integrate Structured and Unstructured Data
  • 99.
    Data Modeling Best Practi...: Anchor Dimensions with Durable Surrogate Keys; Expect to Integrate Structured and Unstructured Data; Use Slowly Changing Dimensions; Declare Data Structure at Analysis Time; Rapidly Prototype Using Data Virtualization
  • 100.
    Data Governance Best Pra...: There is No Such Thing as Big Data Governance; Dimensionalize the Data before Applying Governance; Privacy is the Most Important Governance Perspective; Don’t Choose Big Data over Governance
  • 101.
    The Definitive Guide to Dimensional Modeling
  • 102.