SlideShare a Scribd company logo
1 of 14
Download to read offline
Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 1
Data Warehouse
Logical Modeling and Design
(6)
Bernard ESPINASSE
Professeur à Aix-Marseille Université (AMU)
Ecole Polytechnique Universitaire de Marseille
October 2013
Methodological framework
Logical Modeling: The Multidimensional Model
Logical Design : From Fact schema to Logical schema
in ROLAP context
Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 2
1. Methodological Framework
• Conceptual Design & Logical Design
• Design Phases and schemata derivations
2. Logical Modelling: The Multidimensionnal Model
• Problematic of the Logical Design
• The Multidimensional Model: fact, measures, dimensions
•
3. Implementing a Dimensional Model in ROLAP
• Star schema
• Snowflake schema
• Aggregates and views
•
4. Logical Design: From Fact schema to Logical schema in ROLAP context
• From fact schema to relational star-schema: basic rules
• Examples towards Relational Star Schema
• Examples towards Relational Snowflake Schema
• Advanced logical modelling
Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 3
• Livres
• Golfarelli M., Rizzi S., « Data Warehouse Design : Modern Principles and
Methodologies », McGrawHill, 2009.
• Kimball R., Ross, M., « Entrepôts de données : guide pratique de
modélisation dimensionnelle », 2°édition, Ed. Vuibert, 2003, ISBN : 2-
7117-4811-1.
• Franco J-M., « Le Data Warehouse ». Ed. Eyrolles, Paris, 1997. ISBN 2-
212-08956-2.
• Cours
• Course of M. Golfarelli M. and S. Rizzi, University of Bologna
• Course of M. Böhlen and J. Gamper J., Free University of Bolzano
• …
Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 4
• Conceptual Design & Logical Design
• Life-Cycle
• Design Phases
• Schemata derivations
Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 5
• Entite-Relation models are not very useful in modeling DWs
• Is now universally recognized that a DW is based on a
multidimensional view of data :
! But there is still no agreement on HOW to implement its
conceptual design !
• Most of the time, DW design is at the logical level : a
multidimensional model (star/snowflake schema) is directly designed :
! But a star schema is nothing but a relational schema: it
contains only the definition of a set of relations and integrity
constraints !
• A better approach:
! 1) Design first a conceptual model : Conceptual Design
! 2) Then translate this conceptual model into a logical
model : Logical Design
Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 6
J. Gamper, Free University of Bolzano, DWDM 2012/13 40
%

0



3

0
'



-2





$

business user
designer
db administrator
!0


Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 7
The Logical Design transforms the Conceptual Schema for a DM into a Logical
Schema :
• Choice of the type of logical schema
• Translation of conceptual schema to logical schema
• Optimization (view materialization, fragmentation)
Different principles from the one used in operational databases :
• data redundancy
• denormalization of tables
J. Gamper, Free University of Bolzano, DWDM 2012-13 10
A
6
7
6
1
4
1
5
1
3
4

More Related Content

What's hot

KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationDenodo
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016Kent Graziano
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Denodo
 
Data fabric and VMware
Data fabric and VMwareData fabric and VMware
Data fabric and VMwareVMware vFabric
 
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?Denodo
 
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEEDTHE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEEDwebwinkelvakdag
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleVasu S
 
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)Denodo
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesDATAVERSITY
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An IntroductionDenodo
 
Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)Denodo
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big DataDATAVERSITY
 
Where does Fast Data Strategy Fit within IT Projects
Where does Fast Data Strategy Fit within IT ProjectsWhere does Fast Data Strategy Fit within IT Projects
Where does Fast Data Strategy Fit within IT ProjectsDenodo
 
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Denodo
 
In Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data ScenariosIn Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data ScenariosDenodo
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationDenodo
 
Introduction to Modern Data Virtualization (US)
Introduction to Modern Data Virtualization (US)Introduction to Modern Data Virtualization (US)
Introduction to Modern Data Virtualization (US)Denodo
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business IntelligencePrithwis Mukerjee
 

What's hot (20)

KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
Analyst View of Data Virtualization: Conversations with Boulder Business Inte...
 
Data fabric and VMware
Data fabric and VMwareData fabric and VMware
Data fabric and VMware
 
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
SAP Analytics Cloud: Haben Sie schon alle Datenquellen im Live-Zugriff?
 
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEEDTHE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
THE FUTURE OF DATA: PROVISIONING ANALYTICS-READY DATA AT SPEED
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Modern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | QuboleModern Integrated Data Environment - Whitepaper | Qubole
Modern Integrated Data Environment - Whitepaper | Qubole
 
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)
Empowering your Enterprise with a Self-Service Data Marketplace (ASEAN)
 
Data-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse StrategiesData-Ed Online Presents: Data Warehouse Strategies
Data-Ed Online Presents: Data Warehouse Strategies
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
 
Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)Enabling Cloud Data Integration (EMEA)
Enabling Cloud Data Integration (EMEA)
 
Data Modeling for Big Data
Data Modeling for Big DataData Modeling for Big Data
Data Modeling for Big Data
 
Where does Fast Data Strategy Fit within IT Projects
Where does Fast Data Strategy Fit within IT ProjectsWhere does Fast Data Strategy Fit within IT Projects
Where does Fast Data Strategy Fit within IT Projects
 
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with OktopusDenodo Data Virtualization - IT Days in Luxembourg with Oktopus
Denodo Data Virtualization - IT Days in Luxembourg with Oktopus
 
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
Analyst Webinar: Discover how a logical data fabric helps organizations avoid...
 
In Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data ScenariosIn Memory Parallel Processing for Big Data Scenarios
In Memory Parallel Processing for Big Data Scenarios
 
Fast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow PresentationFast Data Strategy Houston Roadshow Presentation
Fast Data Strategy Houston Roadshow Presentation
 
Introduction to Modern Data Virtualization (US)
Introduction to Modern Data Virtualization (US)Introduction to Modern Data Virtualization (US)
Introduction to Modern Data Virtualization (US)
 
Datawarehousing and Business Intelligence
Datawarehousing and Business IntelligenceDatawarehousing and Business Intelligence
Datawarehousing and Business Intelligence
 

Similar to Data Warehouse Logical Design Guide

The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3Terry Bunio
 
The final frontier
The final frontierThe final frontier
The final frontierTerry Bunio
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsVivastream
 
Organization and DSS. Approaches, paths and results
Organization and DSS. Approaches, paths and resultsOrganization and DSS. Approaches, paths and results
Organization and DSS. Approaches, paths and resultsDecision Science Community
 
Lesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptxLesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptxcalf_ville86
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Muhammad Fahad
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consultingadivasoft
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...Fabio Fumarola
 
201407 MIT CDO IQ conceptual data modeling, big data, and information quality
201407 MIT CDO IQ conceptual data modeling, big data, and information quality201407 MIT CDO IQ conceptual data modeling, big data, and information quality
201407 MIT CDO IQ conceptual data modeling, big data, and information qualityPeter O'Kelly
 
Data Warehouse approaches with Dynamics AX
Data Warehouse  approaches with Dynamics AXData Warehouse  approaches with Dynamics AX
Data Warehouse approaches with Dynamics AXAlvin You
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)tafosepsdfasg
 
Wbs, estimation and scheduling
Wbs, estimation and schedulingWbs, estimation and scheduling
Wbs, estimation and schedulingSulman Ahmed
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an OverviewZahra Mansoori
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupSri Kanajan
 

Similar to Data Warehouse Logical Design Guide (20)

Big Data Modeling
Big Data ModelingBig Data Modeling
Big Data Modeling
 
The final frontier v3
The final frontier v3The final frontier v3
The final frontier v3
 
The final frontier
The final frontierThe final frontier
The final frontier
 
Data Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisionsData Refinement: The missing link between data collection and decisions
Data Refinement: The missing link between data collection and decisions
 
2. data warehouse 2nd unit
2. data warehouse 2nd unit2. data warehouse 2nd unit
2. data warehouse 2nd unit
 
Data Vault Introduction
Data Vault IntroductionData Vault Introduction
Data Vault Introduction
 
Organization and DSS. Approaches, paths and results
Organization and DSS. Approaches, paths and resultsOrganization and DSS. Approaches, paths and results
Organization and DSS. Approaches, paths and results
 
Lesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptxLesson 3 - The Kimbal Lifecycle.pptx
Lesson 3 - The Kimbal Lifecycle.pptx
 
Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)Business Intelligence Presentation 1 (15th March'16)
Business Intelligence Presentation 1 (15th March'16)
 
Basic Introduction of Data Warehousing from Adiva Consulting
Basic Introduction of  Data Warehousing from Adiva ConsultingBasic Introduction of  Data Warehousing from Adiva Consulting
Basic Introduction of Data Warehousing from Adiva Consulting
 
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
1. Introduction to the Course "Designing Data Bases with Advanced Data Models...
 
201407 MIT CDO IQ conceptual data modeling, big data, and information quality
201407 MIT CDO IQ conceptual data modeling, big data, and information quality201407 MIT CDO IQ conceptual data modeling, big data, and information quality
201407 MIT CDO IQ conceptual data modeling, big data, and information quality
 
Data Warehouse approaches with Dynamics AX
Data Warehouse  approaches with Dynamics AXData Warehouse  approaches with Dynamics AX
Data Warehouse approaches with Dynamics AX
 
1 introductory slides (1)
1 introductory slides (1)1 introductory slides (1)
1 introductory slides (1)
 
Wbs
WbsWbs
Wbs
 
Wbs, estimation and scheduling
Wbs, estimation and schedulingWbs, estimation and scheduling
Wbs, estimation and scheduling
 
BI Introduction
BI IntroductionBI Introduction
BI Introduction
 
Business intelligence an Overview
Business intelligence an OverviewBusiness intelligence an Overview
Business intelligence an Overview
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
Big Data Pitfalls
Big Data PitfallsBig Data Pitfalls
Big Data Pitfalls
 

Recently uploaded

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 

Recently uploaded (20)

DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 

Data Warehouse Logical Design Guide

  • 1. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 1 Data Warehouse Logical Modeling and Design (6) Bernard ESPINASSE Professeur à Aix-Marseille Université (AMU) Ecole Polytechnique Universitaire de Marseille October 2013 Methodological framework Logical Modeling: The Multidimensional Model Logical Design : From Fact schema to Logical schema in ROLAP context Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 2 1. Methodological Framework • Conceptual Design & Logical Design • Design Phases and schemata derivations 2. Logical Modelling: The Multidimensionnal Model • Problematic of the Logical Design • The Multidimensional Model: fact, measures, dimensions • 3. Implementing a Dimensional Model in ROLAP • Star schema • Snowflake schema • Aggregates and views • 4. Logical Design: From Fact schema to Logical schema in ROLAP context • From fact schema to relational star-schema: basic rules • Examples towards Relational Star Schema • Examples towards Relational Snowflake Schema • Advanced logical modelling Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 3 • Livres • Golfarelli M., Rizzi S., « Data Warehouse Design : Modern Principles and Methodologies », McGrawHill, 2009. • Kimball R., Ross, M., « Entrepôts de données : guide pratique de modélisation dimensionnelle », 2°édition, Ed. Vuibert, 2003, ISBN : 2- 7117-4811-1. • Franco J-M., « Le Data Warehouse ». Ed. Eyrolles, Paris, 1997. ISBN 2- 212-08956-2. • Cours • Course of M. Golfarelli M. and S. Rizzi, University of Bologna • Course of M. Böhlen and J. Gamper J., Free University of Bolzano • … Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 4 • Conceptual Design & Logical Design • Life-Cycle • Design Phases • Schemata derivations
  • 2. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 5 • Entite-Relation models are not very useful in modeling DWs • Is now universally recognized that a DW is based on a multidimensional view of data : ! But there is still no agreement on HOW to implement its conceptual design ! • Most of the time, DW design is at the logical level : a multidimensional model (star/snowflake schema) is directly designed : ! But a star schema is nothing but a relational schema: it contains only the definition of a set of relations and integrity constraints ! • A better approach: ! 1) Design first a conceptual model : Conceptual Design ! 2) Then translate this conceptual model into a logical model : Logical Design Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 6 J. Gamper, Free University of Bolzano, DWDM 2012/13 40
  • 3. % 0 3 0 ' -2 $ business user designer db administrator !0 Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 7 The Logical Design transforms the Conceptual Schema for a DM into a Logical Schema : • Choice of the type of logical schema • Translation of conceptual schema to logical schema • Optimization (view materialization, fragmentation) Different principles from the one used in operational databases : • data redundancy • denormalization of tables J. Gamper, Free University of Bolzano, DWDM 2012-13 10
  • 4. A
  • 5. 6
  • 6. 7
  • 7. 6
  • 8. 1
  • 9. 4
  • 10. 1
  • 11. 5
  • 12. 1
  • 13. 3
  • 14. 4
  • 15. 5 Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 8 design workload refinement logical design physical design final user DWs are based on a pre-existing information system Methodological frameworkMethodological framework (2)(2) LogicalLogical SchemeScheme LOGICAL DESIGN Workload Target logical model PhysicalPhysical SchemeScheme PHYSICAL DESIGN Workload Target DBMS E/RE/R SchemeScheme chiavenegozio negozio città regione indirizzo resp.vendite N1 …. …. …. …… ……… N2 chiavetempo chiavenegozio chiave_prodotto quantvenduta incasso num_clienti T1 N1 P1 10 10000002 T1 N1 P2 8 12000008 T1 N2 P5 15 15000005 … ….. …… ……. RelationalRelational SchemeScheme ConceptualConceptual SchemeScheme CONCEPTUAL DESIGN Facts Preliminary workload
  • 16. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 9 • The Multidimensional Model: ! Fact, ! Measures, ! Dimensions • Star and Snowflake Schemata • Aggregates and Views Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 10 Multidimensional Model : • is a logical model • has one purpose: Data analysis • is the most popular data model for DW • is more built in “meaning” : ! What is important ! What describes the important ! What we want to optimize ! Automatic aggregations means easy querying • is recognized by OLAP/BI tools : Tools offer powerful query facilities based on MD design Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 11 • Data is divided into facts (with measures) and dimensions • Facts ! are the important entity, e.g., a sale ! have measures that can be aggregated, e.g., sales price • Dimensions : ! describe facts ! Ex : a sale has the dimensions Product, Store and Time • Goal for dimensional modeling: ! Surround facts with as much context/dimensions as possible (redundancy may be ok in well-chosen places) ! But you should not try to model all relationships in the data (unlike E/R and OO modeling!) Facts (data) “live” in a multidimensional « cube » Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 12 Facts represent the subject of the desired analysis : the ”important” in the business that should be analyzed : • A fact is most often identified via its dimension values : ! A fact is a non-empty cell ! Some models give facts an explicit identity • Generally, a fact should : ! be attached to exactly one dimension value in each dimension; ! only be attached to dimension values in the bottom levels ! Ex : if the lowest time granularity is day, for each fact the exact day should be specified ! some models do not require this.
  • 17. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 13 Different types of facts are distinguished : • Event facts (transaction) : a fact for every business event (Ex : sale) • « Fact-less » facts : ! A fact per event (Ex : customer contact) ! No numerical measures ! An event happened for a dimension value combination • Snapshot fact : ! A fact for every dimension combination at given time interval ! Captures current status (Ex : inventory) • Cumulative snapshot facts : ! A fact for every dimension combination at given time interval ! Captures cumulative status up to now (Ex : sales to date) Every type of facts answers different questions : often event facts and snapshot facts exist Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 14 • Dimensions are the core of multidimensional databases • Other types of databases do not support dimensions • Dimensions are used for : ! Selection of data ! Grouping of data at the right level of detail • Dimensions consist of dimension values Ex : • Product dimension has values ”milk”, ”cream”, ... • Time dimension has values ”1/1/2001”, ”2/1/2001”,.. • Dimension values may have an ordering : ! Used for comparing cube data across values ! Especially used for Time dimension Ex : percentage of sales increase compared with last month Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 15 • Dimensions encode hierarchies with levels : Typically 3-5 levels (of detail) • Dimension values are organized in a tree structure or lattice : ! Ex : Product: Product - Type - Category Store: Store - Area - City County Time: Day - Month - Quarter - Year • Dimensions have a bottom level and a top level (ALL) • Levels may have attributes simple, non-hierarchical information ! Ex : Day has Workday as attribute • General rule: dimensions should contain much information : ! Ex : Time dimensions may contain holiday, season, events,... Good dimensions have 50-100 or more attributes/levels Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 16 • A location dimension with attributes street, city, province_or_state, and country encodes implicitly the following hierarchy : Instance country city Province_or_state All Canada USA British Colombia Quebec Montréal Québec CityVancouver Victoria... ... New York Buffalo... New York Illinois Chicago Urbana...
  • 18. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 17 • A cube may have many dimensions : ! More than 3 – the term ”hypercube” is sometimes used ! Theoretically no limit for the number of dimensions ! Typical cubes have 4-12 dimensions • But only 2-3 dimensions can be viewed at a time : ! Dimensionality reduced by queries via projection/aggregation • A cube consists of cells : ! A given combination of dimension values ! A cell can be empty (no data for this combination) ! A sparse cube has many empty cells ! A dense cube has few empty cells ! Cubes become sparser for many/large dimensions. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 18 • Granularity of facts is important : ! What does a single fact mean? ! Determines the level of detail ! Given by the combination of bottom levels ! Ex : ”total sales per store per day per product” • Important for number of facts : Scalability • Often the granularity is a single business transaction : ! Ex : sale ! Sometimes the data is aggregated (total sales per store per day per product) ! Aggregation might be necessary for scalability Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 19 • Measures represent the fact property that users want to study and analyze : ! Ex : the total sales price • A measure has 2 components : ! Numerical value (ex : sales price) ! Aggregation formula (ex : SUM): used for aggregating / combining a number of measure values into one • Additivity is an important property for measures : ! Single fact table rows are (almost) never retrieved, but aggregations over millions of fact rows • Measure value determined by the combination of dimension values : ! Measure value is important for all aggregation levels. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 20 3 types of measures are distinguished : • Additive measures : ! Can be aggregated over all dimensions using SUM operator ! Ex : sales price, gross profit computed from sales and cost ! Often occur in event facts • Semi-additive measures : ! Cannot be aggregated over some dimensions – typically time ! Ex : inventory: additive across time or store, non-additive accross product ! Often occur in snapshot facts • Non-additive measures : ! Cannot be aggregated over any dimensions ! Ex : unit cost ! Occur in all types of facts
  • 19. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 21 • 1. Choose the business process(es) to model : ! Ex : Sales • 2. Choose the granularity of the business process : ! Ex : Items by Store by Promotion by Day ! Low granularity is needed ? ! Are individual transactions necessary/feasible ? • 3. Choose the dimensions : ! Ex : Time, Store, ... • 4. Choose the measures : ! Ex : Dollar_sales, unit_sales, dollar_cost, customer_count Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 22 • Star schema • Snowflake schema • Aggregates and views Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 23 • Is a common approach to draw a dimensional model • Consists of : one fact table and many dimension tables : SalesFT ProductId StoreId DateId Quantity ProductDT ProductId Product Type Category StoreDT StoreId Store City Country DateDT DateId Day DateFull DayOfWeek CalMonth CalYear Holiday dimension tables fact table Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 24 J. Gamper, Free University of Bolzano, DWDM 2012-13 16
  • 20. 3
  • 21.
  • 22. !
  • 23. #
  • 25. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 25 Forces : • Simple - ease-of-use • Relatively flexible • Fact table is normalized (dimensions tables are not necessary normalized) • Dimension tables often relatively small • “Recognized” by many RDBMS - good performance Weakness : • Hierarchies are ”hidden” in the columns • Dimension tables are de-normalized = From relational Star schema to relational « Snowflake » schema Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 26 • Merge several star schemata, which use common dimensions • Has consequently several facts and dimensions common or not Ex : Drugs sales in pharmacies • Make up of 2 star-schemas : ! first concerning sales in pharmacies (SALE) ! second concerning medecin prescription (PRESCRIPTION) • DATE and STORE dimensions are shared by PRESCRIPTION and SALE facts. DATE DateId Day Month Year SALE DateId ProductId StoreId Quantity Amount STORE StoreId Store City Country PRODUCT ProductId Gamme ProductName Color PRESCRIPTION DateId DrugId StoreId DrugNumber Honorary DRUG DrugId Category Molecule SecondaryEffect Posology dimension tables fact tables Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 27 • Is obtained from a star schema by breaking down one or more dimension tables into smaller tables to remove transitive functional dependencies • Dimension table referenced in the fact table are « primary dimension tables » and other are « secondary dimension tables » SALES ProductId StoreId DateId Sale TYPE TypeId Type CategoryId STORE StoreId Store CityId DATE DateId Day MonthId PRODUCT ProductId Product TypeId MonthYearDescription MonthId Month YearId Primary dimension tables Fact table Secondary dimension tables CATEGORY CategoryId category CityId StoreCity State Country CategoryId CITY Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 28 Forces : • Hierarchies are made explicit/visible • Very flexible • Dimension tables use less space : ! However this is a minor saving ! Disk space of dimensions is typically less than 5 percent of disk for DW Weakness : • Harder to use due to many joins • Worse performance : Ex : efficient bitmap indexes are not applicable
  • 26. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 29 View = Fact table that include Aggregate Data • Primary view : defined in the fact schema dimensions (primary group-by sets), populated by operational data • Secondary views : new fact tables including aggregate(s) (secondary group-by sets), populated by others views (not by operational data) star schema some sales schema views • v1 = primary view and v2 … v5 = secondary views • if vi - vj then vj can be calculated by aggregating vi data SALES keyStore keyDate keyProduct quantity receips unitPrice nbOfCustomer PRODUCT ProductId Product Type Category STORE keyStore store storeCity state country salesManager salesDistrict DATE keyDate date month quarter year day week holiday v1 = {product, date, store} v2 = {type, date, storeCity} v4 = {type, month, state} v3 = {category, month, storeCity} v5 = {quarter, state} Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 30 Solution 1 : the easier solution in a star schema, is to store both primary view data and secondary view data in the same fact table : SALES keyStore keyDate keyProduct quantity receipts … (fact table) 1 1 1 170 85 … 2 1 1 300 150 … 3 1 1 1700 850 … … … … … … … STORE keyStore store storeCity state … (dimension table) 1 Coop1 Colombus Ohio … 2 - Austin Texas … (- = NULL) 3 - - Texas … … … … … … Aggregation level of fact table tuples specified by corresponding tuples in dimension table : • the dimension table STORE related to aggregate data will have NULL values in all attributes whose aggregation level is finer than the current one • in the fact table SALES : ! first tuple is related to one sale, ! second tuple store aggregate data for every sale in Austin, ! third tuple sums up all the sales in Texas. only fact table is used for request, but performance are bad because table is too big Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 31 Solution 2: a common solution is to store different group-by sets into separate fact tables in a constellation schema : • here a fact table concerne primary view V1 and an other V5 secondary view • dimension tables can be merged as in solution 1, or replicate for each aggregate view • the fact tables that correspond to the group-by sets where one or more dimensions are completely aggregated do not have foreign key referencing these dimensions the size of fact table size is far larger than size of the dimension tables, then performance depends of the fact table optimisation V5 view keyStore keyDate quantity receips unitPrice nbOfCustomer PRODUCT ProductId Product Type Category STORE keyStore store storeCity state country salesManager salesDistrict DATE keyDate date month quarter year day week holiday V1 view keyStore keyDate keyProduct quantity receips unitPrice nbOfCustomer Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 32 Solution 3: dimension tables are replicated for every view including only the set of attributes that are valide for the aggregation level at which that dimension table is created : • here are V1 primary view and V5 secondary view with their specific dimension tables • note that the key of the fact table for secondary view V5 has not attribute related to store hierarchy, which is completely aggregated this solution is the best in performance because the table acces are optimized, however replicate dimension tables need disk space V5 view keyStore keyQuarter quantity receips unitPrice nbOfCustomer PRODUCT ProductId Product Type Category STORE keyStore store storeCity state country salesManager salesDistrict DATE keyDate date month quarter year day week holiday V1 view keyStore keyDate keyProduct quantity receips unitPrice nbOfCustomer QUARTER keyQuarter quarter year STATE keyState state country
  • 27. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 33 Solution 4: a compromise, where aggregate views are materialized in a snowflake schema : • take advantage of the optimization achieved with aggregate data by aggregation level without replicating dimension table the sales fact is modeled by a snowfake schema V5 view keyStore keyQuarter quantity receips unitPrice nbOfCustomer PRODUCT ProductId Product Type Category STORE keyStore store storeCity keyState DATE keyDate date month keyQuarter year day week holiday V1 view keyStore keyDate keyProduct quantity receips unitPrice nbOfCustomer QUARTER keyQuarter quarter year STATE keyState state country Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 34 • From fact schema to relational star-schema: basic rules • Examples towards Relational Star Schema • Examples towards Relational Snowflake Schema • Advanced logical modelling Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 35 The Logical Design transforms the Conceptual Schema for a DM into a Logical Schema : • Choice of the type of logical schema • Translation of conceptual schema to logical schema • Optimization (view materialization, fragmentation) Different principles from the one used in operational databases : • data redundancy • denormalization of tables J. Gamper, Free University of Bolzano, DWDM 2012-13 10
  • 28. A
  • 29. 6
  • 30. 7
  • 31. 6
  • 32. 1
  • 33. 4
  • 34. 1
  • 35. 5
  • 36. 1
  • 37. 3
  • 38. 4
  • 39. 5 Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 36 design workload refinement logical design physical design final user DWs are based on a pre-existing information system Methodological frameworkMethodological framework (2)(2) LogicalLogical SchemeScheme LOGICAL DESIGN Workload Target logical model PhysicalPhysical SchemeScheme PHYSICAL DESIGN Workload Target DBMS E/RE/R SchemeScheme chiavenegozio negozio città regione indirizzo resp.vendite N1 …. …. …. …… ……… N2 chiavetempo chiavenegozio chiave_prodotto quantvenduta incasso num_clienti T1 N1 P1 10 10000002 T1 N1 P2 8 12000008 T1 N2 P5 15 15000005 … ….. …… ……. RelationalRelational SchemeScheme ConceptualConceptual SchemeScheme CONCEPTUAL DESIGN Facts Preliminary workload
  • 40. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 37 Conceptual Design : Step to derive Dimensional Fact (DF) schemata from Relational schema of operational sources (DB) : • 1. Finding and defining FACT from Relational/E-R schema • 2. Building the Attribute Tree from Relational/E-R schema • 3. Building the Fact Schemata from Attribute Tree Logical Design : • transforms the Fact Schemata into a Logical Schema, a Relational Schema (in ROLAP strategy) Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 38 Fact Schema Relational Schema !#$%'()*% +'% ()', -%. /. /'%0).# ()*%1 2.* 34%1 56678% /7'# 9''% /)4'.#:.% .%07;'.'7) *.))6678=4 ?@A:B :$)4' +7;8)4' +4.'7) (71%; 78=4 !!!#$%'#()*+ !#$%'()*#$'##$+($',,$(-$!#$%$(.$/#0(1#/$)#2,+3 !#$% '() $*+) !*,(%-.()+/01) $-2) !)34(5 67'(*03 $*89)(/01) 26785'.) /0.( :,,*8)/01) ;'+)117).. =0*3(. :,,*8)/01)=6785'.)'()=7047)..*?);6+@)7 26785'.) =7047)..*?);6+@)7 =6785'.)$*+) A0B1)7;'+) *.8063( $-2)/01) ).87*2(*03 '-. =0*3(. /0.( 4,#$%5$(-$#$#/(#/$'##5$#$6*-1(,-%2$/#0#-/#-17$6',8$!'(%)*(+,-./+$,$0.1(*$%.$)##- '#8,9#/$,$8%:#$0.1(*$%$1(2/$,6$#$',,5$.,$%$($1%-$)#$1,.#-$%.$%$8#%.*'#3$!#$.%8#$(.$/,-#$6,' 2345(.13$!#$(1:#$%-/$#$.:(0%..$;'%-*2%'((#.$%'#$'#8,9#/5$%-/$%$%675**84$96+($62%;$(.$%//#/$, FT_RENTAL pickupOffice dropoffOffice idCar pickupDate PaymentMode Amount Discount Duration Miles DT_CAR idCar Car Fuel Model Brand Category RegistrationDate DT_OFFICE idOffice Office City State Country Area DT_DATE idDate Date Month Year Pickup dropoff Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 44 Logical schema obtained from prevous rental fact schema is the following snowflake schema : • DT_CAR (idCar, Car, Fuel, Model, Brand, Category, registrationDate:DT_DATE) • DT_OFFICE (idOffice, Office, City, State, Country, Area) • DT_DATE (idDate, Date, Month, Year) • FT_RENTAL (pickupOffice:DT_OFFICE, dropoffOffice:DT_OFFICE, idCar:DT_CAR, pickupDate:DT_DATE, PaymentMode, Amount, Discount, Duration, Miles) Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 39 R1. From « Fact schema » to « Tables » : • Fact box of the fact schema leads to create a Fact Table (FT) • dimension leads to create a Dimension Table (DT) R2. Dimension table (DT) attributes : • dimension attribute of fact schema become DT attribute • the first dimension attribute (first level) become DT key attribute R3. Fact table (FT) attributes : • FT key attribute of each dimension become FT foreign key • measure attributes of fact schema become FT measure Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 40 Fact schema : week store product quantity SALE fact dimensions measures type category city countrymonthyear
  • 41. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 41 Relational schema : • SaleFT (ProductId, StoreId, DateId, Quantity) • WeekDT (ProductId, Product, Type, Category) • StoreDT (StoreId, Store, City, State, Country) • DateDT (DateId, Day, Month, Year) SalesFT ProductId StoreId DateId Quantity ProductDT ProductId Product Type Category StoreDT StoreId Store City Country DateDT DateId Day Month Year dimension tables fact table Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 42 Instances : J. Gamper, Free University of Bolzano, DWDM 2012-13 52 $=2 -;-$$$ 6.
  • 42. 2
  • 43. . %
  • 44. %$ ,
  • 45. ,
  • 48. ( $ ,
  • 49. = 2
  • 50. 2
  • 51. * )
  • 52. ( Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 43 OLAP Query on Star schema : Total quantity sold for each product type, week, and city, only for food products : SELECT City, Week, Type, SUM(Quantity) FROM WeekDT, StoreDT, ProductDT, SaleFT WHERE WeekDT.WeekID = SaleFT.WeekID AND StoreDT.StoreID = SaleFT.StoreID AND ProductDT.ProductID = SaleFT.ProductID AND ProductDT.Category = ‘Food’ GROUP BY City, Week, Type; Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 44 Relational schema : • SaleFT (ProductId, StoreId, DateId, Sale) • ProductDT (ProductId, Product, TypeId) • ProductType (TypeId, Type, CategoryId) • StoreDT (StoreId, Store, City, Country) • DateDT (DateId, Day, MonthId) • MonthYearDescription (MonthId, Month, YearId) SaleFT ProductId StoreId DateId Sale ProductType TypeId Type CategoryId StoreDT StoreId Store City Country DateDT DateId Day MonthId ProductDT ProductId Product TypeId MonthYearDescription MonthId Month YearId Primary dimension tables Fact table Secondary dimension tables
  • 53. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 45 Instances : J. Gamper, Free University of Bolzano, DWDM 2012-13 56 $=2 $(-$
  • 55. $ $ ,
  • 56. = =
  • 58. 2
  • 60. %$ ,
  • 61. ,
  • 62. #
  • 64. 2
  • 65. $* $)
  • 66. ( Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 46 OLAP Query on Snowflake schema : Total quantity sold for each product type, week, and city, only for food products : SELECT City, Week, Type, SUM(Quantity) FROM WeekDT, StoreDT, ProductDT, CityDT, TypeDT, SaleFT WHERE WeekDT.WeekID = SaleFT.WeekID AND StoreDT.StoreID = SaleFT.StoreID AND ProductDT.ProductID = SaleFT.ProductID AND StoreDT.CityID = CityDT.CityID AND ProductDT.TypeID = TypeDT.TypeID AND ProductDT.Category = ‘Food’ GROUP BY City, Week, Type; Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 47 « RENTAL » Fact schema obtained by the Conceptual Design : !#$%'()*% +'% ()', -%. /. /'%0).# ()*%1 2.* 34%1 56678% /7'# 9''% /)4'.#:.% .%07;'.'7) *.))6678=4 ?@A:B :$)4' +7;8)4' +4.'7) (71%; 78=4 !!!#$%'#()*+ !#$%'()*#$'##$+($',,$(-$!#$%$(.$/#0(1#/$)#2,+3 A0B1)7;'+) Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 48 Logical schema obtained from prevous rental fact schema is the following snowflake schema : • DT_CAR (idCar, Car, Fuel, Model, Brand, Category, registrationDate:DT_DATE) • DT_OFFICE (idOffice, Office, City, State, Country, Area) • DT_DATE (idDate, Date, Month, Year) • FT_RENTAL (pickupOffice:DT_OFFICE, dropoffOffice:DT_OFFICE, idCar:DT_CAR, pickupDate:DT_DATE, PaymentMode, Amount, Discount, Duration, Miles)
  • 67. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 49 Snowflake schema : FT_RENTAL pickupOffice dropoffOffice idCar pickupDate PaymentMode Amount Discount Duration Miles DT_CAR idCar Car Fuel Model Brand Category RegistrationDate DT_OFFICE idOffice Office City State Country Area DT_DATE idDate Date Month Year Pickup dropoff Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 50 Conceptual Modelling: • A descriptive attribute contains additional information about an attribute of the hierarchy • It cannot be used for aggregation Logical Modelling: • If a descriptive attribute is linked to a dimensional attribute, it have to be included in the dimension table for the hierarchy that contains it Ex : the size of the product have to be included in PRODUCT table • If a descriptive attribute is linked to a fact attribute, it have to be included in the fact table with measures Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 51 Conceptual Modelling: A cross-dimensional attribute b is a dimensionnal or descriptive attribute whose value is defined by the combination of 2 or more dimensional attributes a1, …am, possibly belonging to different hierarchies. Logical Modelling: translation leads to a new table that includes the b cross- dimensional attribute and has the a1, …am attributes as primary key Ex : a product Value Added Tax (VAT) depends both on the product category and on the country where the product is sold salesDistrict store storeCity state product quantity receips unitPrice numberOfCustomer SALE type brand brandCity department category salesManager marketingGroup country size diet cross-dimensionnal attributes VAT SALES productId storeId dateId quantity ... PRODUCT productId product type category ... STORE storeId store city country VAT country category VAT Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 52 Conceptual Modelling: Entire portion of hierarchies are frequently replicated 2 or more time in fact schemata. Ex: calling and called phone numbers … Logical Modelling: we have to avoid multiples dimension tables, which contain all or part of the same data. Ex: the hierarchies contain all the same data, we choose to insert 2 foreign keys referencing the one table that models the telephone number in the fact table: roles hour number duration CALL shared hierarchy date month year O calling called telNumber type district CALLS KeyCalled KeyCalling KeyDate KeyHour number duration NUMBERS KeyNumber telNumber type dictrict
  • 68. Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 53 Conceptual Modelling: A multiple arc models a many-to-many association between the 2 dimensional attributes it connects Ex : in a fact schema modeling the sales of books, book is a dimension, but it would not be relevant to model author as a dimensional child attribute of book because many different authors can write many books = the relationship between books and authors is modeled as a multiple arc: Logical Modelling: solution is to insert a bridge-table to model multiple arcs: datemonth book author quantity receips unitPrice numberOfCustomer SALE quarteryear genre holiday day week multiple arc SALES KeyBook KeyDate number receipts BOOK KeyBook book genre AUTHOR KeyAuthor author BRIDGE_AUTHOR KeyBook KeyAuthor weight Bernard ESPINASSE - Data Warehouse Logical Modelling and Design 54 Conceptual Modelling: Non-additive measure can be explicitely specify with its operator(s) used for aggregation – other that SUM Ex: AVG and MIN for inventory level: Logical Modelling: set a new mesure for each aggregation operator. • Count is a support measure necessary to calculate average level • minLevel is required to calculate the minimum level for each month and for each product type datemonth warehouse city level incomingQuantity INVENTORY quarteryear address week non additive measure country AVG, MIN product MONTH KeyMonth nomth year TYPE KeyType type ... INVENTORY KeyMonth KeyType avgLevel minLevel count incomingQuantity