The Data Warehouse Lifecycle
Bart Lowe, Decision Source Inc.

Agenda
  • Discuss high-level concepts related to data warehousing

The Data Warehouse Must…
  • Make information easily accessible
  • Present information consistently
  • Be adaptive and resilient to change
  • Be secure
  • Serve as the foundation for decision making

The Business Community Must…
  • Accept and trust the data warehouse if it is to be successful
Data Warehouse Lifecycle
Data Warehouse Components

Source Systems
  • Data schemas optimized for transactions, not queries
  • Difficult to share data
  • Typically do not maintain historical data

  • This is typically the most difficult and labor-intensive component
  • Data is cleansed and conformed
  • Typically a normalized data schema
  • No direct querying is allowed against this component

  • Consists of a series of conformed dimensional data marts
  • Each data mart represents a different business process
  • Dimensional modeling emphasizes simplicity and query performance

  • In general, only a small subset of users will need true ad-hoc query capability
  • 80-90% of users will use a parameterized analytic system

Fact Table
  • This is the primary table in a dimensional model
  • The measurements of the dimensional model are stored here
  • Each measurement is tracked at the intersection of several dimensions
  • This is the “grain” of the model
  • The most useful facts are additive

Dimension Table
  • Descriptors of each fact
  • Tend to have many attributes but fewer rows
  • Tend to be used as query constraints
  • The better the attribute descriptions, the better the warehouse
  • Typically highly denormalized

Star Schema
  • This is a fact table joined to a set of dimensions
  • Relates data in a manner that is familiar to business users
  • Its symmetrical nature allows for answering many different business questions
  • One dimensional model will exist for each business process; a single data warehouse can have dozens of these models
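
A star schema can be sketched with an in-memory SQLite database. The table and column names below (fact_sales, dim_date, dim_product, sales_amount) and all the data are hypothetical, chosen only for illustration. The point is the symmetry: the same fact table answers different questions just by joining a different dimension.

```python
import sqlite3

# Hypothetical star schema: one fact table joined to two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    sales_amount REAL  -- additive measurement
);
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_product VALUES (10, 'Bikes'), (11, 'Helmets');
INSERT INTO fact_sales VALUES (1, 10, 500.0), (1, 11, 40.0), (2, 10, 300.0);
""")


def total_by(dim_table, key, attribute):
    """Sum the additive fact, grouped by one attribute of one dimension."""
    sql = f"""SELECT d.{attribute}, SUM(f.sales_amount)
              FROM fact_sales f JOIN {dim_table} d ON f.{key} = d.{key}
              GROUP BY d.{attribute} ORDER BY d.{attribute}"""
    return conn.execute(sql).fetchall()


by_month = total_by("dim_date", "date_key", "month")
by_category = total_by("dim_product", "product_key", "category")
print(by_month)
print(by_category)
```

Both queries have the same shape; only the dimension used as the constraint changes.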
Dimensional Modeling Key Concepts

Store the Most Atomic Data
  • By storing the most detailed data possible, you can ensure that users can drill to the level they need
  • It's OK to provide aggregate facts as well to improve performance
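
As a minimal sketch of this idea, the aggregate below is derived from the atomic rows rather than stored in their place, so users can still drill down to the detail. The row layout and the function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical atomic facts: (day, product category, amount).
atomic_sales = [
    ("2024-01-05", "Bikes", 500.0),
    ("2024-01-20", "Bikes", 250.0),
    ("2024-02-02", "Helmets", 40.0),
]


def build_monthly_aggregate(rows):
    """Pre-compute a month/category aggregate to speed up common queries."""
    totals = defaultdict(float)
    for day, category, amount in rows:
        totals[(day[:7], category)] += amount  # day[:7] is 'YYYY-MM'
    return dict(totals)


def drill_down(rows, month, category):
    """Return the atomic rows behind one aggregate cell."""
    return [r for r in rows if r[0].startswith(month) and r[1] == category]


monthly = build_monthly_aggregate(atomic_sales)
detail = drill_down(atomic_sales, "2024-01", "Bikes")
print(monthly[("2024-01", "Bikes")], detail)
```

Had only the monthly totals been loaded, the two January rows could not be recovered.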

Conformed Dimensions
  • By conforming your dimensions, you can correlate performance across business processes
  • Can be very painful (but worth it) if combining data from disparate systems
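
One way to picture the benefit: two business processes (here, hypothetical sales and returns fact tables) share one conformed product dimension, so their results can be lined up on the same attribute. All names and numbers are illustrative assumptions.

```python
# Hypothetical conformed dimension shared by both business processes.
dim_product = {1: "Bikes", 2: "Helmets"}

# Two fact tables from different processes: (product_key, amount).
sales_facts = [(1, 500.0), (2, 40.0), (1, 300.0)]
returns_facts = [(1, 100.0)]


def rollup(facts):
    """Total a fact table by the conformed 'category' attribute."""
    totals = {}
    for product_key, amount in facts:
        category = dim_product[product_key]
        totals[category] = totals.get(category, 0.0) + amount
    return totals


def drill_across(left, right):
    """Line up two rollups on the shared attribute values."""
    categories = sorted(set(left) | set(right))
    return {c: (left.get(c, 0.0), right.get(c, 0.0)) for c in categories}


report = drill_across(rollup(sales_facts), rollup(returns_facts))
print(report)
```

The merge only works because both fact tables use the same keys and the same category labels; without conformance, the two rollups would not share a join attribute.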

Use Surrogate Keys
  • Always use an artificial key as the primary key
  • Surrogate keys allow you to:
  • Protect your model from changes in the source system
  • Integrate data from multiple sources
  • Add rows that do not exist in the source system
  • Track changes to dimensions over time
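
A minimal sketch of surrogate key assignment, assuming a simple in-memory lookup keyed on (source system, natural key). The class name, the source system labels, and the reserved key 0 for an "unknown" row are all assumptions for illustration.

```python
import itertools


class SurrogateKeyMap:
    """Hands out warehouse-owned integer keys, independent of source keys."""

    def __init__(self):
        self._next = itertools.count(1)  # 0 is reserved below
        self._keys = {}

    def key_for(self, source_system, natural_key):
        lookup = (source_system, natural_key)
        if lookup not in self._keys:
            self._keys[lookup] = next(self._next)
        return self._keys[lookup]


# A row that does not exist in any source system (assumed convention).
UNKNOWN_CUSTOMER_KEY = 0

keys = SurrogateKeyMap()
crm_key = keys.key_for("crm", "C-100")
erp_key = keys.key_for("erp", "C-100")  # same natural key, different source
again = keys.key_for("crm", "C-100")    # repeat lookup returns the same key
print(crm_key, erp_key, again)
```

Because the warehouse owns the key, two sources that happen to reuse the identifier "C-100" do not collide, and a source system renumbering its records does not ripple into the fact tables.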

Slowly Changing Dimensions
  • A key design consideration is what to do when dimension values change
  • A change may or may not have business meaning
  • There are three ways to handle changes

Slowly Changing Dimension Types
  • Type I: simply overwrite the old values. The simplest case, used when you don’t care about changes to data.
  • Type II: create a new dimension row for the new values. Existing facts still relate to the old dimension value. Used when you do care about the historical changes.
  • Type III: add a new column to the table to store the new value. Rarely used.
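
The three types can be sketched on a dimension held as a list of dictionaries. The column names (surrogate_key, natural_key, start_date, end_date, is_current) and the "previous_" prefix are assumptions for this example; real designs vary in how they mark the current row.

```python
def scd_type1(rows, natural_key, attr, new_value):
    """Type I: overwrite the value in place; history is lost."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row[attr] = new_value


def scd_type2(rows, natural_key, attr, new_value, change_date):
    """Type II: expire the current row and append a new one."""
    current = next(r for r in rows
                   if r["natural_key"] == natural_key and r["is_current"])
    current["is_current"] = False
    current["end_date"] = change_date
    new_row = dict(current)
    new_row.update({
        attr: new_value,
        "surrogate_key": max(r["surrogate_key"] for r in rows) + 1,
        "start_date": change_date,
        "end_date": None,
        "is_current": True,
    })
    rows.append(new_row)
    return new_row["surrogate_key"]


def scd_type3(rows, natural_key, attr, new_value):
    """Type III: keep the prior value in an extra column."""
    for row in rows:
        if row["natural_key"] == natural_key:
            row["previous_" + attr] = row[attr]
            row[attr] = new_value


customers = [{"surrogate_key": 1, "natural_key": "C-100", "city": "Austin",
              "start_date": "2020-01-01", "end_date": None, "is_current": True}]
new_key = scd_type2(customers, "C-100", "city", "Denver", "2024-06-01")
print(new_key, customers)
```

After the Type II change, facts loaded earlier still point at surrogate key 1 (Austin), while new facts would use the new key, which is what preserves history.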

Date Dimensions
  • Dates are a fundamental business concept, and nearly every DW has a date dimension
  • The date dimension is the classic role-playing dimension
  • Allows rollups and filters on any date-related attribute, such as month, quarter, or year
  • Date dimension records still use a surrogate key to handle unknown dates
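
A small sketch of generating a date dimension follows. The YYYYMMDD integer key, the reserved key 0 for unknown dates, and the attribute names are assumptions of this example, not something the slides prescribe. The order record at the end shows role playing: one dimension referenced twice, once as the order date and once as the ship date.

```python
from datetime import date, timedelta

UNKNOWN_DATE_KEY = 0  # assumed convention for missing or unknown dates


def date_key(day):
    """Map a date (or None) to its surrogate key."""
    if day is None:
        return UNKNOWN_DATE_KEY
    return day.year * 10000 + day.month * 100 + day.day


def build_date_dimension(start, end):
    """Build one row per calendar day, plus a row for unknown dates."""
    rows = {UNKNOWN_DATE_KEY: {"date": None, "month": "Unknown",
                               "quarter": "Unknown", "year": "Unknown"}}
    day = start
    while day <= end:
        rows[date_key(day)] = {
            "date": day,
            "month": day.strftime("%Y-%m"),
            "quarter": f"{day.year}-Q{(day.month - 1) // 3 + 1}",
            "year": str(day.year),
        }
        day += timedelta(days=1)
    return rows


dim_date = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))

# Role playing: the same dimension serves as order date and ship date.
order = {"order_date_key": date_key(date(2024, 3, 15)),
         "ship_date_key": date_key(None)}  # not shipped yet
print(dim_date[order["order_date_key"]]["quarter"],
      dim_date[order["ship_date_key"]]["month"])
```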

Snowflaking
  • Snowflaking is the process of hooking up lookup tables to a dimension
  • This is, in a way, re-normalizing the data
  • Snowflaking is in general discouraged since it adds complexity to the model

Many-to-Many Relationships
  • Most relationships are one-to-many; this is the simplest case
  • Real-world scenarios are often more complex
  • Many-to-many relationships between facts and dimensions are represented by creating a bridge table between the facts and the dimension
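
A bridge table can be sketched as follows, using an in-memory SQLite database. The scenario (a sale credited to a group of sales reps), the table names, and especially the weight column are assumptions for illustration; the weighting factor is one common way to keep totals from double counting, not something the slides specify.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_sales_rep (rep_key INTEGER PRIMARY KEY, rep_name TEXT);
-- Bridge: one row per (group, rep); weights within a group sum to 1.
CREATE TABLE bridge_rep_group (rep_group_key INTEGER, rep_key INTEGER, weight REAL);
CREATE TABLE fact_sales (sale_id INTEGER PRIMARY KEY, rep_group_key INTEGER, amount REAL);
INSERT INTO dim_sales_rep VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO bridge_rep_group VALUES (100, 1, 1.0), (101, 1, 0.5), (101, 2, 0.5);
INSERT INTO fact_sales VALUES (1, 100, 200.0), (2, 101, 100.0);
""")

# Each fact row reaches its reps through the bridge table.
weighted_by_rep = conn.execute("""
    SELECT r.rep_name, SUM(f.amount * b.weight)
    FROM fact_sales f
    JOIN bridge_rep_group b ON b.rep_group_key = f.rep_group_key
    JOIN dim_sales_rep r ON r.rep_key = b.rep_key
    GROUP BY r.rep_name ORDER BY r.rep_name
""").fetchall()
print(weighted_by_rep)
```

Sale 2 is shared by both reps, so each is credited half; the weighted totals still add up to the 300.0 stored in the fact table.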

Hierarchies
  • Hierarchies summarize or group the data within the dimension
  • Typically are denormalized into the dimension table
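
As a rough illustration, the hierarchy below (product, subcategory, category) is stored as plain columns on each dimension row, so rolling up to any level needs no extra lookup tables. The level names and data are hypothetical.

```python
# Hypothetical product dimension with the hierarchy denormalized into each row.
dim_product = {
    1: {"product": "Road Bike", "subcategory": "Bikes", "category": "Cycling"},
    2: {"product": "Helmet", "subcategory": "Safety Gear", "category": "Cycling"},
    3: {"product": "Tent", "subcategory": "Shelter", "category": "Camping"},
}

# Facts: (product_key, sales amount).
facts = [(1, 500.0), (2, 40.0), (3, 120.0), (1, 300.0)]


def rollup(level):
    """Summarize the facts at any level of the hierarchy."""
    totals = {}
    for product_key, amount in facts:
        group = dim_product[product_key][level]
        totals[group] = totals.get(group, 0.0) + amount
    return totals


print(rollup("category"))
print(rollup("subcategory"))
```

A snowflaked design would instead keep category in its own table, adding a join for every rollup.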

Types of Fact Tables
  • There are three types of fact tables
  • Transaction: tracks each transaction as it occurs
  • Periodic snapshot: captures cumulative performance over a specific period of time; often used for periodic rollups
  • Accumulating snapshot: updated over time
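
The difference between the three types can be sketched by deriving the two snapshot styles from a small transaction table. The event names, field names, and monthly period are assumptions for this example.

```python
# Transaction facts: one row per event, as it occurs (hypothetical data).
transactions = [
    {"order": "O-1", "date": "2024-01-03", "event": "ordered", "amount": 100.0},
    {"order": "O-1", "date": "2024-01-09", "event": "shipped", "amount": 0.0},
    {"order": "O-2", "date": "2024-02-11", "event": "ordered", "amount": 60.0},
]


def periodic_snapshot(rows):
    """One row per period (here, month) with the cumulative amount."""
    totals = {}
    for row in rows:
        month = row["date"][:7]
        totals[month] = totals.get(month, 0.0) + row["amount"]
    return totals


def accumulating_snapshot(rows):
    """One row per order, updated as each milestone is reached."""
    orders = {}
    for row in rows:
        record = orders.setdefault(
            row["order"], {"ordered_date": None, "shipped_date": None})
        record[row["event"] + "_date"] = row["date"]
    return orders


monthly = periodic_snapshot(transactions)
pipeline = accumulating_snapshot(transactions)
print(monthly)
print(pipeline)
```

Order O-2 keeps an empty shipped_date until a later load updates the same row, which is what distinguishes the accumulating snapshot from the other two.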


System Sizing Considerations
  • Storage: the fact table volumes will drive storage requirements. Don’t forget to account for staging storage needs.
  • Performance: understand the usage complexity of your community. Predefined reports and queries can be cached or pre-aggregated. The more ad-hoc analysis is used, the greater the impact on hardware requirements. You must understand how many simultaneous users the DW will be asked to support.
  • Memory: all the BI components love RAM. Use 64-bit hardware to address more memory space.
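
A back-of-the-envelope storage estimate might look like the sketch below. Every number here is a made-up assumption (row counts, bytes per row, and the index and staging overhead ratios); substitute measurements from your own environment.

```python
def estimate_storage_gb(rows_per_day, bytes_per_row, years_retained,
                        index_overhead=0.5, staging_overhead=0.25):
    """Rough sizing: fact rows drive the total; overheads are assumed ratios."""
    raw_gb = rows_per_day * bytes_per_row * 365 * years_retained / 1e9
    return {
        "fact_gb": raw_gb,
        "index_gb": raw_gb * index_overhead,
        "staging_gb": raw_gb * staging_overhead,
        "total_gb": raw_gb * (1 + index_overhead + staging_overhead),
    }


# Hypothetical workload: 1 million fact rows per day, 100 bytes each, 3 years.
estimate = estimate_storage_gb(rows_per_day=1_000_000, bytes_per_row=100,
                               years_retained=3)
print(estimate)
```

Dimension tables are left out on purpose: they tend to have far fewer rows than the fact tables, so they rarely change the order of magnitude.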

System Configuration Considerations
  • All-in-one configuration: all components hosted on a single server; appropriate for small deployments or POCs
  • Separate reporting server: the reporting server is scaled out; appropriate for mid-sized deployments
  • Scale-out deployment: both Report Services and Analysis Services have their own servers; appropriate for larger deployments; can be scaled massively from here

Common Pitfalls
  • Becoming overly focused on technology rather than business requirements and goals
  • Failing to embrace an influential management visionary as the business sponsor
  • Tackling a huge multiyear project rather than smaller iterative development efforts
  • Paying more attention to back-end issues and ease of development than to front-end performance and simplicity

Common Pitfalls Continued
  • Making the queryable data overly complex
  • Populating the model without properly conforming your dimensions
  • Loading only summary data into your models
  • Presuming that the business requirements are static
  • Neglecting to understand that data warehouse success is tied to user acceptance

Editor's Notes

  1. These may seem simple, but these principles are the foundation for the design methodology. For business users to be able to navigate the system, the tools and, most importantly, the data must be simple and easy to use. Consistency requires a thorough ETL process to cleanse and conform the data. Change is inevitable. We need a design that is resilient to change. Security … Must have the right data in order to support decisions; this means up-front analysis focuses on the business need.
  2. Ultimately, if any system doesn’t satisfy some business need, it is of no value and is a failure.
  3. Go through each component.
  4. Discuss each bullet point.
  5. Discuss each bullet point. Examples of cleansing activities: misspellings, formatting, capitalization conformance. Emphasize that users are forbidden from executing queries on these data. 3NF data is too complex for most users. 3NF is not optimized for query performance.
  6. Discuss each bullet point. Discuss what it means to be a conformed data mart. Point out that dimensional modeling will be discussed in detail later on.
  7. Discuss each bullet point. Specify the examples in this diagram and what role they play.
  8. Discuss why additive facts are most useful. Describe semi-additive facts. Note that the primary key is the combination of all the foreign keys. A ROWID adds little value, and the index probably would not be of any use either.
  9. Attribute descriptions should avoid cryptic abbreviations. Minimize the use of codes. Show the denormalized nature of one of these dimensions. Denormalized dimensions provide the following benefits: a simplified structure for non-technical users and better query performance. Since dimensions typically have relatively few rows, the impact of reduced storage efficiency is minimal.
  10. Walk through the SCD2 example using the dimensional model above.
  11. Walk through the date dimension in the POC example.
  12. Point out that the POCGLTransaction fact table is a transaction fact table and the budget table is a periodic snapshot.