SlideShare a Scribd company logo
1 of 47
Chapter 33
Data Warehousing Design
Transparencies
2
Chapter Objectives
The activities associated with initiating a
data warehouse project.
The two main methodologies, which
incorporate the development of a data
warehouse: namely Inmon’s Corporate
Information Factory (CIF) and Kimball’s
Business Dimensional Lifecycle.
The main principles and stages associated
with Kimball’s Business Dimensional
Lifecycle.
3
Chapter Objectives
The concepts associated with dimensionality
modeling, which is a core technique of
Kimball’s Business Dimensional Lifecycle.
The Dimensional Modeling stage of
Kimball’s Business Dimensional Lifecycle.
The step-by-step creation of a dimensional
model (DM) using the DreamHome case
study.
The issues associated with the development of
a data warehouse.
4
Designing Data Warehouses
To begin a data warehouse project, we
need to find answers for questions such as:
– Which user requirements are most
important and which data should be
considered first?
– Which data should be considered first?
– Should the project be scaled down into
something more manageable?
– Should the infrastructure for a scaled
down project be capable of ultimately
delivering a full-scale enterprise-wide
data warehouse?
5
Designing Data Warehouses
For many enterprises the way to avoid the
complexities associated with designing a
data warehouse is to start by building one
or more data marts.
Data marts allow designers to build
something that is far simpler and
achievable for a specific group of users.
6
Designing Data Warehouses
Few designers are willing to commit to an
enterprise-wide design that must meet all
user requirements at one time.
Despite the interim solution of building
data marts, the goal remains the same:
that is, the ultimate creation of a data
warehouse that supports the requirements
of the enterprise.
7
Designing Data Warehouses
The requirements collection and analysis
stage of a data warehouse project involves
interviewing appropriate members of staff
(such as marketing users, finance users,
and sales users) to enable the
identification of a prioritized set of
requirements that the data warehouse
must meet.
8
Designing Data Warehouses
At the same time, interviews are
conducted with members of staff
responsible for operational systems to
identify, which data sources can provide
clean, valid, and consistent data that will
remain supported over the next few years.
9
Designing Data Warehouses
Interviews provide the necessary
information for the top-down view (user
requirements) and the bottom-up view
(which data sources are available) of the
data warehouse.
The database component of a data
warehouse is described using a technique
called dimensionality modeling.
10
Data Warehouse Development Methodologies
There are two main methodologies that
incorporate the development of an enterprise
data warehouse (EDW) and these are
proposed by the two key players in the data
warehouse arena.
– Kimball’s Business Dimensional Lifecycle
(Kimball, 2008)
– Inmon’s Corporate Information Factory
(CIF) methodology (Inmon, 2001).
11
Data Warehouse Development Methodologies
Kimballs’ Business Dimensional Lifecycle
Ralph Kimball is a key player in DW.
About creation of an infrastructure
capable of supporting all the information
needs of an enterprise.
Uses new methods and techniques in the
development of an enterprise data
warehouse (EDW).
12
Kimballs’ Business Dimensional Lifecycle
Starts by identifying the information
requirements (referred to as analytical
themes) and associated business processes
of the enterprise.
This activity results in the creation of a
critical document called a Data
Warehouse Bus Matrix.
13
Kimballs’ Business Dimensional Lifecycle
The matrix lists all of the key business
processes of an enterprise together with
an indication of how these processes are to
be analysed.
The matrix is used to facilitate the
selection and development of the first
database (data mart) to meet the
information requirements of a particular
department of the enterprise.
14
Kimballs’ Business Dimensional Lifecycle
This first data mart is critical in setting
the scene for the later integration of other
data marts as they come online.
The integration of data marts ultimately
leads to the development of an EDW.
Uses dimensionality modeling to establish
the data model (called star schema) for
each data mart.
15
Kimballs’ Business Dimensional Lifecycle
Guiding principle is to meet the
information requirements of the enterprise
by building a single, integrated, easy-to-use,
high-performance information
infrastructure, which is delivered in
meaningful increments of 6 to 12 month
timeframes.
Goal is to deliver the entire solution
including the data warehouse, ad hoc query
tools, reporting applications, advanced
analytics and all the necessary training and 16
Kimballs’ Business Dimensional Lifecycle
Has three tracks:
– technology (top track),
– data (middle track),
– business intelligence (BI) applications
(bottom track).
Uses incremental and iterative approach
that involves the development of data marts
that are eventually integrated into an
enterprise data warehouse (EDW).
17
Kimballs’ Business Dimensional Lifecycle
18
19
Dimensionality modeling
A logical design technique that aims to
present the data in a standard, intuitive
form that allows for high-performance
access
Every dimensional model (DM) is
composed of one table with a composite
primary key, called the fact table, and a set
of smaller tables called dimension tables.
20
Dimensionality modeling
Each dimension table has a simple (non-
composite) primary key that corresponds
exactly to one of the components of the
composite key in the fact table.
Forms ‘star-like’ structure, which is called
a star schema or star join.
21
Dimensionality modeling
All natural keys are replaced with
surrogate keys. Means that every join
between fact and dimension tables is based
on surrogate keys, not natural keys.
Surrogate keys allows the data in the
warehouse to have some independence
from the data used and produced by the
OLTP systems.
22
Star schema (dimensional model) for property
sales of DreamHome
23
Dimensionality modeling
Star schema is a logical structure that has
a fact table (containing factual data) in the
center, surrounded by denormalized
dimension tables (containing reference
data).
Facts are generated by events that
occurred in the past, and are unlikely to
change, regardless of how they are
analyzed.
24
Dimensionality modeling
Bulk of data in data warehouse is in fact
tables, which can be extremely large.
Important to treat fact data as read-only
reference data that will not change over
time.
Most useful fact tables contain one or
more numerical measures, or ‘facts’ that
occur for each record and are numeric
and additive.
25
Dimensionality modeling
Dimension tables usually contain
descriptive textual information.
Dimension attributes are used as the
constraints in data warehouse queries.
Star schemas can be used to speed up
query performance by denormalizing
reference information into a single
dimension table.
26
Dimensionality modeling
Snowflake schema is a variant of the star
schema that has a fact table in the center,
surrounded by normalized dimension
tables .
Starflake schema is a hybrid structure that
contains a mixture of star (denormalized)
and snowflake (normalized) dimension
tables.
27
Property sales with normalized version of
Branch dimension table
28
Dimensionality modeling
Predictable and standard form of the
underlying dimensional model offers
important advantages:
– Efficiency
– Ability to handle changing requirements
– Extensibility
– Ability to model common business
situations
– Predictable query processing
29
Comparison of DM and ER models
A single ER model normally decomposes
into multiple DMs.
Multiple DMs are then associated through
‘shared’ dimension tables.
30
Dimensional Modeling Stage of Kimball’s
Business Dimensional Lifecycle
Begins by defining a high-level dimension
model (DM), which progressively gains
more detail and this is achieved using a
two-phased approach.
The first phase is the creation of the high-
level DM and the second phase involves
adding detail to the model through the
identification of dimensional attributes
for the model.
31
Dimensional Modeling Stage of Kimball’s
Business Dimensional Lifecycle
Phase 1 involves the creation of a high-
level dimensional model (DM) using a
four-step process.
32
Step 1: Select business process
The process (function) refers to the
subject matter of a particular data mart.
First data mart built should be the one
that is most likely to be delivered on time,
within budget, and to answer the most
commercially important business
questions.
33
ER model of an extended version of DreamHome
34
ER model of property sales business process of
DreamHome
35
Step 2: Declare grain
Decide what a record of the fact table is to
represents.
Identify dimensions of the fact table. The
grain decision for the fact table also
determines the grain of each dimension
table.
Also include time as a core dimension,
which is always present in star schemas.
36
Step 3: Choose dimensions
Dimensions set the context for asking
questions about the facts in the fact table.
If any dimension occurs in two data marts,
they must be exactly the same dimension,
or one must be a mathematical subset of
the other.
A dimension used in more than one data
mart is referred to as being conformed.
37
Star schemas for property sales and
property advertising
38
Step 4: Identify facts
The grain of the fact table determines
which facts can be used in the data mart.
Facts should be numeric and additive.
Unusable facts include:
– non-numeric facts
– non-additive facts
– fact at different granularity from other
facts in table
39
Step 4: Identify facts
Once the facts have been selected each should
be re-examined to determine whether there
are opportunities to use pre-calculations.
40
Dimensional Modeling Stage of Kimball’s
Business Dimensional Lifecycle
Phase 2 involves the rounding out of the
dimensional tables.
Text descriptions are added to the
dimension tables and be as intuitive and
understandable to the users as possible.
Usefulness of a data mart is determined
by the scope and nature of the attributes
of the dimension tables.
41
Additional design issues
Duration measures how far back in time the
fact table goes.
Very large fact tables raise at least two very
significant data warehouse design issues.
– Often difficult to source increasing old
data.
– It is mandatory that the old versions of
the important dimensions be used, not
the most current versions. Known as the
‘Slowly Changing Dimension’ problem.
42
Additional design issues
Slowly changing dimension problem
means that the proper description of the
old dimension data must be used with the
old fact data.
Often, a generalized key must be assigned
to important dimensions in order to
distinguish multiple snapshots of
dimensions over a period of time.
43
Additional design issues
There are three basic types of slowly
changing dimensions:
– Type 1, where a changed dimension
attribute is overwritten
– Type 2, where a changed dimension
attribute causes a new dimension record to
be created
– Type 3, where a changed dimension
attribute causes an alternate attribute to
be created so that both the old and new
values of the attribute are simultaneously
accessible in the same dimension record
44
Kimball’s Business Dimensional lifecycle
Lifecycle produces a data mart that supports
the requirements of a particular business
process and allows the easy integration with
other related data marts to form the
enterprise-wide data warehouse.
A dimensional model, which contains more
than one fact table sharing one or more
conformed dimension tables, is referred to as
a fact constellation.
45
Dimensional model (fact constellation) for the
DreamHome data warehouse
46
Data Warehouse Development Issues
Selection development methodology.
Identification of key decision-makers to be
supported their analytical requirements.
Identification of data sources and assess the
quality of the data.
Selection of the ETL tool.
Identification of strategy for meta-data be
management.
47
Data Warehouse Development Issues
Establishment of characteristics of the data e.g.
granularity, latency, duration and data lineage.
Establish storage capacity requirements for the
database.
Establishment of the data refresh requirements.
Identification of analytical tools.
Establishing an appropriate architecture for the
DW/DM environment .
Deal with the organisational, cultural and
political issues associated with data ownership.

More Related Content

What's hot

INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...ijscai
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional modelGersiton Pila Challco
 
An ontological approach to handle multidimensional schema evolution for data ...
An ontological approach to handle multidimensional schema evolution for data ...An ontological approach to handle multidimensional schema evolution for data ...
An ontological approach to handle multidimensional schema evolution for data ...ijdms
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodellingmeghu123
 
Information architecture overview
Information architecture overviewInformation architecture overview
Information architecture overviewJames M. Dey
 
Data Warehouse Design & Dimensional Modeling
Data Warehouse Design & Dimensional ModelingData Warehouse Design & Dimensional Modeling
Data Warehouse Design & Dimensional ModelingCode Mastery
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designSarita Kataria
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousingShahed Khalili
 
Mc leod9e ch08 information in action
Mc leod9e ch08 information in actionMc leod9e ch08 information in action
Mc leod9e ch08 information in actionsellyhood
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report Tom Donoghue
 
Corporate-data-warehousing-training
Corporate-data-warehousing-trainingCorporate-data-warehousing-training
Corporate-data-warehousing-trainingUnmesh Baile
 
Chapter8
Chapter8Chapter8
Chapter8AG RD
 
Application of mis in textile industry
Application of mis in textile industryApplication of mis in textile industry
Application of mis in textile industryMajharul Islam
 
Competing with-it-systems
Competing with-it-systemsCompeting with-it-systems
Competing with-it-systemsDominic Tellis
 
Survey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data WarehousesSurvey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data WarehousesEtisalat
 

What's hot (20)

INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
INNOVATIVE BI APPROACHES AND METHODOLOGIES IMPLEMENTING A MULTILEVEL ANALYTIC...
 
Designing the business process dimensional model
Designing the business process dimensional modelDesigning the business process dimensional model
Designing the business process dimensional model
 
An ontological approach to handle multidimensional schema evolution for data ...
An ontological approach to handle multidimensional schema evolution for data ...An ontological approach to handle multidimensional schema evolution for data ...
An ontological approach to handle multidimensional schema evolution for data ...
 
Dataware house multidimensionalmodelling
Dataware house multidimensionalmodellingDataware house multidimensionalmodelling
Dataware house multidimensionalmodelling
 
Information architecture overview
Information architecture overviewInformation architecture overview
Information architecture overview
 
Data Warehouse Design & Dimensional Modeling
Data Warehouse Design & Dimensional ModelingData Warehouse Design & Dimensional Modeling
Data Warehouse Design & Dimensional Modeling
 
Data warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-designData warehouse-dimensional-modeling-and-design
Data warehouse-dimensional-modeling-and-design
 
Modules in mis
Modules in misModules in mis
Modules in mis
 
Chapter 9
Chapter 9Chapter 9
Chapter 9
 
Mis
MisMis
Mis
 
An introduction to data warehousing
An introduction to data warehousingAn introduction to data warehousing
An introduction to data warehousing
 
Mc leod9e ch08 information in action
Mc leod9e ch08 information in actionMc leod9e ch08 information in action
Mc leod9e ch08 information in action
 
2. olap warehouse
2. olap warehouse2. olap warehouse
2. olap warehouse
 
Data Warehouse Project Report
Data Warehouse Project Report Data Warehouse Project Report
Data Warehouse Project Report
 
Corporate-data-warehousing-training
Corporate-data-warehousing-trainingCorporate-data-warehousing-training
Corporate-data-warehousing-training
 
Chapter8
Chapter8Chapter8
Chapter8
 
Application of mis in textile industry
Application of mis in textile industryApplication of mis in textile industry
Application of mis in textile industry
 
Competing with-it-systems
Competing with-it-systemsCompeting with-it-systems
Competing with-it-systems
 
Planning Data Warehouse
Planning Data WarehousePlanning Data Warehouse
Planning Data Warehouse
 
Survey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data WarehousesSurvey On Temporal Data And Change Management in Data Warehouses
Survey On Temporal Data And Change Management in Data Warehouses
 

Similar to Ch33 - Dim Modelling

Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationSunderland City Council
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an OverviewZahra Mansoori
 
Accelerating Machine Learning as a Service with Automated Feature Engineering
Accelerating Machine Learning as a Service with Automated Feature EngineeringAccelerating Machine Learning as a Service with Automated Feature Engineering
Accelerating Machine Learning as a Service with Automated Feature EngineeringCognizant
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.pptBsMath3rdsem
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookJames Serra
 
Master data management and data warehousing
Master data management and data warehousingMaster data management and data warehousing
Master data management and data warehousingZahra Mansoori
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseRussel Chowdhury
 
BI Architecture in support of data quality
BI Architecture in support of data qualityBI Architecture in support of data quality
BI Architecture in support of data qualityTom Breur
 
BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...
BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...
BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...aliramezani30
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDenodo
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project reportsonalighai
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureJames Serra
 
Methodology conceptual databases design roll no. 99 & 111
Methodology conceptual databases design roll no. 99 & 111Methodology conceptual databases design roll no. 99 & 111
Methodology conceptual databases design roll no. 99 & 111Manoj Nolkha
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationDenodo
 
SOLIDWORKS reseller Whitepaper by Promedia Systems
SOLIDWORKS reseller Whitepaper by Promedia Systems SOLIDWORKS reseller Whitepaper by Promedia Systems
SOLIDWORKS reseller Whitepaper by Promedia Systems Cavien Clever
 

Similar to Ch33 - Dim Modelling (20)

Data Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data VisualisationData Warehousing, Data Mining & Data Visualisation
Data Warehousing, Data Mining & Data Visualisation
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an Overview
 
Accelerating Machine Learning as a Service with Automated Feature Engineering
Accelerating Machine Learning as a Service with Automated Feature EngineeringAccelerating Machine Learning as a Service with Automated Feature Engineering
Accelerating Machine Learning as a Service with Automated Feature Engineering
 
Unit 3 part 2
Unit  3 part 2Unit  3 part 2
Unit 3 part 2
 
3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt3._DWH_Architecture__Components.ppt
3._DWH_Architecture__Components.ppt
 
Data Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future OutlookData Warehousing Trends, Best Practices, and Future Outlook
Data Warehousing Trends, Best Practices, and Future Outlook
 
Master data management and data warehousing
Master data management and data warehousingMaster data management and data warehousing
Master data management and data warehousing
 
Ch03
Ch03Ch03
Ch03
 
Business Intelligence and Multidimensional Database
Business Intelligence and Multidimensional DatabaseBusiness Intelligence and Multidimensional Database
Business Intelligence and Multidimensional Database
 
BI Architecture in support of data quality
BI Architecture in support of data qualityBI Architecture in support of data quality
BI Architecture in support of data quality
 
Chapter 2
Chapter 2Chapter 2
Chapter 2
 
BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...
BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...
BoUl2000_-_Developing_Data_Warehouse_Structures_from_Business_Process_Models_...
 
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data VirtualizationDAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
DAMA Webinar: Turn Grand Designs into a Reality with Data Virtualization
 
Data warehousing and business intelligence project report
Data warehousing and business intelligence project reportData warehousing and business intelligence project report
Data warehousing and business intelligence project report
 
Building an Effective Data Warehouse Architecture
Building an Effective Data Warehouse ArchitectureBuilding an Effective Data Warehouse Architecture
Building an Effective Data Warehouse Architecture
 
Methodology conceptual databases design roll no. 99 & 111
Methodology conceptual databases design roll no. 99 & 111Methodology conceptual databases design roll no. 99 & 111
Methodology conceptual databases design roll no. 99 & 111
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Course Outline Ch 2
Course Outline Ch 2Course Outline Ch 2
Course Outline Ch 2
 
Connecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data VirtualizationConnecting Silos in Real Time with Data Virtualization
Connecting Silos in Real Time with Data Virtualization
 
SOLIDWORKS reseller Whitepaper by Promedia Systems
SOLIDWORKS reseller Whitepaper by Promedia Systems SOLIDWORKS reseller Whitepaper by Promedia Systems
SOLIDWORKS reseller Whitepaper by Promedia Systems
 

Recently uploaded

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Ch33 - Dim Modelling

  • 1. Chapter 33 Data Warehousing Design Transparencies
  • 2. 2 Chapter Objectives The activities associated with initiating a data warehouse project. The two main methodologies, which incorporate the development of a data warehouse: namely Inmon’s Corporate Information Factory (CIF) and Kimball’s Business Dimensional Lifecycle. The main principles and stages associated with Kimball’s Business Dimensional Lifecycle.
  • 3. 3 Chapter Objectives The concepts associated with dimensionality modeling, which is a core technique of Kimball’s Business Dimensional Lifecycle. The Dimensional Modeling stage of Kimball’s Business Dimensional Lifecycle. The step-by-step creation of a dimensional model (DM) using the DreamHome case study. The issues associated with the development of a data warehouse.
  • 4. 4 Designing Data Warehouses To begin a data warehouse project, we need to find answers for questions such as: – Which user requirements are most important and which data should be considered first? – Which data should be considered first? – Should the project be scaled down into something more manageable? – Should the infrastructure for a scaled down project be capable of ultimately delivering a full-scale enterprise-wide data warehouse?
  • 5. 5 Designing Data Warehouses For many enterprises the way to avoid the complexities associated with designing a data warehouse is to start by building one or more data marts. Data marts allow designers to build something that is far simpler and achievable for a specific group of users.
  • 6. 6 Designing Data Warehouses Few designers are willing to commit to an enterprise-wide design that must meet all user requirements at one time. Despite the interim solution of building data marts, the goal remains the same: that is, the ultimate creation of a data warehouse that supports the requirements of the enterprise.
  • 7. 7 Designing Data Warehouses The requirements collection and analysis stage of a data warehouse project involves interviewing appropriate members of staff (such as marketing users, finance users, and sales users) to enable the identification of a prioritized set of requirements that the data warehouse must meet.
  • 8. 8 Designing Data Warehouses At the same time, interviews are conducted with members of staff responsible for operational systems to identify, which data sources can provide clean, valid, and consistent data that will remain supported over the next few years.
  • 9. 9 Designing Data Warehouses Interviews provide the necessary information for the top-down view (user requirements) and the bottom-up view (which data sources are available) of the data warehouse. The database component of a data warehouse is described using a technique called dimensionality modeling.
  • 10. 10 Data Warehouse Development Methodologies There are two main methodologies that incorporate the development of an enterprise data warehouse (EDW) and these are proposed by the two key players in the data warehouse arena. – Kimball’s Business Dimensional Lifecycle (Kimball, 2008) – Inmon’s Corporate Information Factory (CIF) methodology (Inmon, 2001).
  • 12. Kimballs’ Business Dimensional Lifecycle Ralph Kimball is a key player in DW. About creation of an infrastructure capable of supporting all the information needs of an enterprise. Uses new methods and techniques in the development of an enterprise data warehouse (EDW). 12
  • 13. Kimballs’ Business Dimensional Lifecycle Starts by identifying the information requirements (referred to as analytical themes) and associated business processes of the enterprise. This activity results in the creation of a critical document called a Data Warehouse Bus Matrix. 13
  • 14. Kimballs’ Business Dimensional Lifecycle The matrix lists all of the key business processes of an enterprise together with an indication of how these processes are to be analysed. The matrix is used to facilitate the selection and development of the first database (data mart) to meet the information requirements of a particular department of the enterprise. 14
  • 15. Kimballs’ Business Dimensional Lifecycle This first data mart is critical in setting the scene for the later integration of other data marts as they come online. The integration of data marts ultimately leads to the development of an EDW. Uses dimensionality modeling to establish the data model (called star schema) for each data mart. 15
  • 16. Kimballs’ Business Dimensional Lifecycle Guiding principle is to meet the information requirements of the enterprise by building a single, integrated, easy-to-use, high-performance information infrastructure, which is delivered in meaningful increments of 6 to 12 month timeframes. Goal is to deliver the entire solution including the data warehouse, ad hoc query tools, reporting applications, advanced analytics and all the necessary training and 16
  • 17. Kimballs’ Business Dimensional Lifecycle Has three tracks: – technology (top track), – data (middle track), – business intelligence (BI) applications (bottom track). Uses incremental and iterative approach that involves the development of data marts that are eventually integrated into an enterprise data warehouse (EDW). 17
  • 19. 19 Dimensionality modeling A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.
  • 20. 20 Dimensionality modeling Each dimension table has a simple (non- composite) primary key that corresponds exactly to one of the components of the composite key in the fact table. Forms ‘star-like’ structure, which is called a star schema or star join.
  • 21. 21 Dimensionality modeling All natural keys are replaced with surrogate keys. Means that every join between fact and dimension tables is based on surrogate keys, not natural keys. Surrogate keys allows the data in the warehouse to have some independence from the data used and produced by the OLTP systems.
  • 22. 22 Star schema (dimensional model) for property sales of DreamHome
  • 23. 23 Dimensionality modeling Star schema is a logical structure that has a fact table (containing factual data) in the center, surrounded by denormalized dimension tables (containing reference data). Facts are generated by events that occurred in the past, and are unlikely to change, regardless of how they are analyzed.
  • 24. 24 Dimensionality modeling Bulk of data in data warehouse is in fact tables, which can be extremely large. Important to treat fact data as read-only reference data that will not change over time. Most useful fact tables contain one or more numerical measures, or ‘facts’ that occur for each record and are numeric and additive.
  • 25. 25 Dimensionality modeling Dimension tables usually contain descriptive textual information. Dimension attributes are used as the constraints in data warehouse queries. Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table.
  • 26. 26 Dimensionality modeling Snowflake schema is a variant of the star schema that has a fact table in the center, surrounded by normalized dimension tables . Starflake schema is a hybrid structure that contains a mixture of star (denormalized) and snowflake (normalized) dimension tables.
  • 27. 27 Property sales with normalized version of Branch dimension table
  • 28. 28 Dimensionality modeling Predictable and standard form of the underlying dimensional model offers important advantages: – Efficiency – Ability to handle changing requirements – Extensibility – Ability to model common business situations – Predictable query processing
  • 29. 29 Comparison of DM and ER models A single ER model normally decomposes into multiple DMs. Multiple DMs are then associated through ‘shared’ dimension tables.
  • 30. 30 Dimensional Modeling Stage of Kimball’s Business Dimensional Lifecycle Begins by defining a high-level dimension model (DM), which progressively gains more detail and this is achieved using a two-phased approach. The first phase is the creation of the high- level DM and the second phase involves adding detail to the model through the identification of dimensional attributes for the model.
  • 31. 31 Dimensional Modeling Stage of Kimball’s Business Dimensional Lifecycle Phase 1 involves the creation of a high- level dimensional model (DM) using a four-step process.
  • 32. 32 Step 1: Select business process The process (function) refers to the subject matter of a particular data mart. First data mart built should be the one that is most likely to be delivered on time, within budget, and to answer the most commercially important business questions.
  • 33. 33 ER model of an extended version of DreamHome
  • 34. 34 ER model of property sales business process of DreamHome
  • 35. 35 Step 2: Declare grain Decide what a record of the fact table is to represents. Identify dimensions of the fact table. The grain decision for the fact table also determines the grain of each dimension table. Also include time as a core dimension, which is always present in star schemas.
  • 36. 36 Step 3: Choose dimensions Dimensions set the context for asking questions about the facts in the fact table. If any dimension occurs in two data marts, they must be exactly the same dimension, or one must be a mathematical subset of the other. A dimension used in more than one data mart is referred to as being conformed.
  • 37. 37 Star schemas for property sales and property advertising
  • 38. 38 Step 4: Identify facts The grain of the fact table determines which facts can be used in the data mart. Facts should be numeric and additive. Unusable facts include: – non-numeric facts – non-additive facts – fact at different granularity from other facts in table
  • 39. 39 Step 4: Identify facts Once the facts have been selected each should be re-examined to determine whether there are opportunities to use pre-calculations.
  • 40. 40 Dimensional Modeling Stage of Kimball’s Business Dimensional Lifecycle Phase 2 involves the rounding out of the dimensional tables. Text descriptions are added to the dimension tables and be as intuitive and understandable to the users as possible. Usefulness of a data mart is determined by the scope and nature of the attributes of the dimension tables.
  • 41. 41 Additional design issues Duration measures how far back in time the fact table goes. Very large fact tables raise at least two very significant data warehouse design issues. – Often difficult to source increasing old data. – It is mandatory that the old versions of the important dimensions be used, not the most current versions. Known as the ‘Slowly Changing Dimension’ problem.
  • 42. 42 Additional design issues Slowly changing dimension problem means that the proper description of the old dimension data must be used with the old fact data. Often, a generalized key must be assigned to important dimensions in order to distinguish multiple snapshots of dimensions over a period of time.
  • 43. 43 Additional design issues There are three basic types of slowly changing dimensions: – Type 1, where a changed dimension attribute is overwritten – Type 2, where a changed dimension attribute causes a new dimension record to be created – Type 3, where a changed dimension attribute causes an alternate attribute to be created so that both the old and new values of the attribute are simultaneously accessible in the same dimension record
  • 44. 44 Kimball’s Business Dimensional lifecycle Lifecycle produces a data mart that supports the requirements of a particular business process and allows the easy integration with other related data marts to form the enterprise-wide data warehouse. A dimensional model, which contains more than one fact table sharing one or more conformed dimension tables, is referred to as a fact constellation.
  • 45. 45 Dimensional model (fact constellation) for the DreamHome data warehouse
  • 46. 46 Data Warehouse Development Issues Selection development methodology. Identification of key decision-makers to be supported their analytical requirements. Identification of data sources and assess the quality of the data. Selection of the ETL tool. Identification of strategy for meta-data be management.
  • 47. 47 Data Warehouse Development Issues Establishment of characteristics of the data e.g. granularity, latency, duration and data lineage. Establish storage capacity requirements for the database. Establishment of the data refresh requirements. Identification of analytical tools. Establishing an appropriate architecture for the DW/DM environment . Deal with the organisational, cultural and political issues associated with data ownership.