Introduction To Data Warehousing
Triangle MySQL Users Group
March 24, 2017
Alex Meadows

Principal Consultant (Data and Analytics),
CSpring Inc.

Business Analytics Adjunct Professor, Wake
Tech

MS in Business Intelligence

Passionate about developing BI solutions that give
end users easy access to the data they need to
find the answers they demand (even the ones
they don’t know yet!)
Twitter: @OpenDataAlex LinkedIn: alexmeadows
GitHub: OpenDataAlex Email: ameadows@cspring.com
About Alex
Agenda

Why Data Warehousing

Use Case Modeling Decisions

Building Your Data Warehouse

Data Warehouse Gotchas

Q&A
Please feel free to ask questions throughout the presentation!
Why Data Warehouses?

First discussed in the 1970s

While databases existed, they were not relational/normalized
− Network/hierarchical in nature
− Design for query, not for data model

Reporting was hard
− System/application queries were not the same as management reporting queries
Design For Query
[Diagram: hierarchical data model with the nodes Shopping Order, Widgets, Thingys, Odds and Ends, Customers, Country, State, City]
Bill Inmon
Data warehouses: subject-oriented, integrated, time-variant
and non-volatile collection of data in support of
management's decision making process
Business
Requirements
DATA
How Many Customers
Like Animals?
Dogs?
Dogs with short hair?
Traditional Model
[Diagram: Customers, Sales, Marketing, Employee; Enterprise Data Warehouse; star schemas]
Business Intelligence Use Cases

Traditional Data Warehousing focuses on Descriptive and
Diagnostic questions

Predictive and Prescriptive questions require other types of tooling
(e.g., simulation modeling, statistics, etc.)
Use Case
Modeling Decisions
-or-
Do You Need A Data Warehouse?
Case 1: Operational Data Store
Holds All Data
● Classic use-case
● System bogged down with historic ‘valuable’
data
● Applications may try to take advantage of
hosting the data for features
Case 2: Adding Unnecessary
Objects
● Adding objects that meet more reporting needs
than application requirements
● Impacting maintainability through duplication of
data
● Database/application handling data movement
to these objects
Case 3: Complex Relationships
Based On Filtering Factors
● Storing data by clear delimiters
– Time – Quarters, Months, etc.
– Geography – Region, State, City, etc.
– Business logic
● Makes querying very complicated
● Also can impact high availability architecture
Do You Need A DWH?
● Case 1 – Data volume/historical data
● Case 2/3 – Transactional database not matching
reporting/analysis requirements
● If performance isn’t an issue (yet) then you have
some time
● If data volume is tiny (under ~50 GB) then maybe not
● We’re going to assume that the DWH is needed ;)
Building Your Data Warehouse
Traditional
Iterations On Existing Architecture
Inmon: 3rd Normal Form
● Normalize on Objects, Relationships
● Focus on all data stores
– Join data sets as necessary
– Look into Master Data Management practices for
true store merging
Classroom Transaction System

Student First Name   Student Last Name   Student System ID
Bob                  Young               1
Robert               Young               2
Jennifer             Owens               3
Andrew               Collins             4

Student ID   Class ID   Student Grade
1            1          A
2            1          B
3            1          B
4            1          C
2            2          A

Class ID   Class Name                  Class Program    Class Credits
1          Intro to Computer Science   Business Admin   4
2          Clay Sculpting 101          Art              2
Classroom 3NF Data Warehouse

Student First Name   Student Last Name   Student System ID   Student ID
Bob                  Young               1                   100
Robert               Young               2                   101
Jennifer             Owens               3                   102
Andrew               Collins             4                   103

Class ID   Class System ID   Class Name                  Class Program    Class Credits
200        1                 Intro to Computer Science   Business Admin   4
201        2                 Clay Sculpting 101          Art              2

Student ID   Class ID   Student Grade
100          200        A
101          200        B
102          200        B
103          200        C
101          201        A
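
The move from the transaction system to the 3NF warehouse can be sketched in code. The following is a minimal sketch, not the presenter's implementation: it uses Python's built-in sqlite3 module, and the table names, column names, and the choice to start surrogate keys at 100 and 200 are assumptions made to mirror the slide. The point it illustrates is that the warehouse issues its own keys while keeping the source system's key alongside.

```python
import sqlite3

# Source rows as they appear in the transaction system on the slide.
source_students = [("Bob", "Young", 1), ("Robert", "Young", 2),
                   ("Jennifer", "Owens", 3), ("Andrew", "Collins", 4)]
source_classes = [(1, "Intro to Computer Science", "Business Admin", 4),
                  (2, "Clay Sculpting 101", "Art", 2)]
source_enrollments = [(1, 1, "A"), (2, 1, "B"), (3, 1, "B"), (4, 1, "C"), (2, 2, "A")]

dwh = sqlite3.connect(":memory:")
dwh.executescript("""
CREATE TABLE student (
    student_id        INTEGER PRIMARY KEY,  -- warehouse surrogate key (assumed naming)
    student_system_id INTEGER NOT NULL,     -- key carried over from the source system
    first_name TEXT, last_name TEXT
);
CREATE TABLE class (
    class_id        INTEGER PRIMARY KEY,
    class_system_id INTEGER NOT NULL,
    class_name TEXT, class_program TEXT, class_credits INTEGER
);
CREATE TABLE student_class (
    student_id INTEGER REFERENCES student(student_id),
    class_id   INTEGER REFERENCES class(class_id),
    student_grade TEXT
);
""")

# Assign surrogate keys starting at 100 and 200 to mirror the slide.
for offset, (first, last, system_id) in enumerate(source_students):
    dwh.execute("INSERT INTO student VALUES (?, ?, ?, ?)",
                (100 + offset, system_id, first, last))
for offset, (system_id, name, program, credits) in enumerate(source_classes):
    dwh.execute("INSERT INTO class VALUES (?, ?, ?, ?, ?)",
                (200 + offset, system_id, name, program, credits))

# Translate source keys into warehouse keys while loading the relationship.
for student_sys, class_sys, grade in source_enrollments:
    dwh.execute("""
        INSERT INTO student_class
        SELECT s.student_id, c.class_id, ?
        FROM student s, class c
        WHERE s.student_system_id = ? AND c.class_system_id = ?
    """, (grade, student_sys, class_sys))

enrollments = dwh.execute(
    "SELECT student_id, class_id, student_grade FROM student_class ORDER BY rowid"
).fetchall()
print(enrollments)
```

The loaded relationship rows use the warehouse keys (100 to 103, 200 to 201) rather than the source system IDs, matching the third table above.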
DWH = Version Control

Class ID   Class System ID   Class Name                  Class Program    Class Credits   Create Date   Update Date   Version
200        1                 Intro to Computer Science   Business Admin   4               01/01/17      02/01/17      1
202        1                 Intro to Computer Science   Business Admin   6               02/01/17      02/01/17      2
201        2                 Clay Sculpting 101          Art              2               01/01/17      01/01/17      1
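
One way to produce versioned rows like these is to insert a new row whenever a tracked attribute changes, instead of updating in place. The sketch below is an assumption-laden illustration using sqlite3: the function name load_class is hypothetical, the caller supplies the new warehouse Class ID, and dates are written in ISO format rather than the slide's MM/DD/YY.

```python
import sqlite3

dwh = sqlite3.connect(":memory:")
dwh.execute("""
CREATE TABLE class (
    class_id INTEGER PRIMARY KEY,
    class_system_id INTEGER, class_name TEXT, class_program TEXT,
    class_credits INTEGER, create_date TEXT, update_date TEXT, version INTEGER
)""")

def load_class(conn, new_id, system_id, name, program, credits, load_date):
    """Insert a new version when tracked attributes changed; return the version."""
    latest = conn.execute(
        "SELECT class_id, class_name, class_program, class_credits, version "
        "FROM class WHERE class_system_id = ? ORDER BY version DESC LIMIT 1",
        (system_id,)).fetchone()
    if latest and latest[1:4] == (name, program, credits):
        return latest[4]  # nothing changed, keep the existing version
    if latest:
        # Stamp the superseded version with the date it was replaced.
        conn.execute("UPDATE class SET update_date = ? WHERE class_id = ?",
                     (load_date, latest[0]))
    version = latest[4] + 1 if latest else 1
    conn.execute("INSERT INTO class VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
                 (new_id, system_id, name, program, credits,
                  load_date, load_date, version))
    return version

load_class(dwh, 200, 1, "Intro to Computer Science", "Business Admin", 4, "2017-01-01")
load_class(dwh, 201, 2, "Clay Sculpting 101", "Art", 2, "2017-01-01")
# Credits change from 4 to 6, so a second version of source class 1 is created.
load_class(dwh, 202, 1, "Intro to Computer Science", "Business Admin", 6, "2017-02-01")

versions = dwh.execute(
    "SELECT class_id, class_credits, create_date, update_date, version "
    "FROM class ORDER BY class_id").fetchall()
print(versions)
```

The original row for source class 1 stays in the warehouse with its old credit value, so reports can still see what the class looked like before the change.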
Also For Relationships!
● Historical vs current relationships
● Different ways of handling version control (dimensionality)

Student ID   Class ID   Student Grade   Create Date   Update Date
100          200        A               01/01/17      01/01/17
101          200        B               01/01/17      01/01/17
102          200        B               01/01/17      01/01/17
103          200        C               02/01/17      02/01/17
104          200        D               02/01/17      02/01/17
103          202        C               02/01/17      02/01/17
104          202        D               02/01/17      02/01/17
104          201        A               01/01/17      01/01/17
101          201        A               01/01/17      01/01/17
Slowly Changing
Dimensions/Managing Changes
SCD 1
SCD 2
SCD 3
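
The three types named above are commonly defined as follows: Type 1 overwrites the old value, Type 2 adds a new row and retires the old one, and Type 3 keeps the previous value in an extra column. The sketch below illustrates that difference with plain Python dictionaries. The function names and the column names (system_id, is_current, start_date, end_date, previous_credits) are assumptions for illustration, and a real Type 2 load would normally also issue a new surrogate key, which is omitted here.

```python
from copy import deepcopy

def scd1(rows, key, attr, value):
    """Type 1: overwrite the attribute in place; no history is kept."""
    rows = deepcopy(rows)
    for row in rows:
        if row["system_id"] == key:
            row[attr] = value
    return rows

def scd2(rows, key, attr, value, change_date):
    """Type 2: retire the current row and append a new current row."""
    rows = deepcopy(rows)
    for row in rows:
        if row["system_id"] == key and row["is_current"]:
            row["is_current"] = False
            row["end_date"] = change_date
            rows.append({**row, attr: value, "start_date": change_date,
                         "end_date": None, "is_current": True})
            break  # stop before reaching the row that was just appended
    return rows

def scd3(rows, key, attr, value):
    """Type 3: keep only the previous value, in an extra column."""
    rows = deepcopy(rows)
    for row in rows:
        if row["system_id"] == key:
            row["previous_" + attr] = row[attr]
            row[attr] = value
    return rows

class_dim = [{"system_id": 1, "credits": 4, "start_date": "2017-01-01",
              "end_date": None, "is_current": True}]

type1 = scd1(class_dim, 1, "credits", 6)
type2 = scd2(class_dim, 1, "credits", 6, "2017-02-01")
type3 = scd3(class_dim, 1, "credits", 6)
print(len(type1), len(type2), len(type3))
```

Which type fits depends on whether reports need the full change history (Type 2), only the prior value (Type 3), or just the latest state (Type 1).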
Dimensionality
● Concept originated with star schema
● Store data changes based on what is being
done with the data/long term utilization
● Build models based on objects and
relationships
Dimensionality
BUS Architecture
Star Schema Example
[Diagram: Student Fact at the center, joined to Student Dim, Class Dim, and Professor Dim]
Dimension Example

Class ID   Class Name                  Class Program    Topic
200        Intro to Computer Science   Business Admin   Computer Science
202        Intro to Computer Science   Business Admin   Computer Science
201        Clay Sculpting 101          Art              Sculpture
Fact Table Example

Student ID   Class ID   Credit Earned   Credit Maximum   Date ID
100          200        26              120              20170101
101          200        37              120              20170101
102          200        12              120              20170101
103          200        42              120              20170101
104          200        16              120              20170101
103          202        80              120              20170101
104          202        120             120              20170101
104          201        90              120              20170101
101          201        26              120              20170101
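
A typical star schema query joins the fact table to a dimension and aggregates a measure by one of the dimension's attributes. The sketch below loads the dimension and fact rows shown above into an in-memory SQLite database and totals Credit Earned by Topic. The table and column names (class_dim, student_fact, and so on) are assumed for illustration, and summing credits is only meant to show the query shape.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE class_dim (
    class_id INTEGER PRIMARY KEY, class_name TEXT, class_program TEXT, topic TEXT
);
CREATE TABLE student_fact (
    student_id INTEGER,
    class_id INTEGER REFERENCES class_dim(class_id),
    credit_earned INTEGER, credit_maximum INTEGER, date_id INTEGER
);
""")

conn.executemany("INSERT INTO class_dim VALUES (?, ?, ?, ?)", [
    (200, "Intro to Computer Science", "Business Admin", "Computer Science"),
    (202, "Intro to Computer Science", "Business Admin", "Computer Science"),
    (201, "Clay Sculpting 101", "Art", "Sculpture"),
])

# Credit Maximum (120) and Date ID (20170101) are the same on every slide row.
conn.executemany("INSERT INTO student_fact VALUES (?, ?, ?, 120, 20170101)", [
    (100, 200, 26), (101, 200, 37), (102, 200, 12), (103, 200, 42), (104, 200, 16),
    (103, 202, 80), (104, 202, 120), (104, 201, 90), (101, 201, 26),
])

# Join the fact to the dimension, then aggregate the measure by a dimension attribute.
credits_by_topic = conn.execute("""
    SELECT d.topic, SUM(f.credit_earned)
    FROM student_fact f
    JOIN class_dim d ON d.class_id = f.class_id
    GROUP BY d.topic
    ORDER BY d.topic
""").fetchall()
print(credits_by_topic)
```

Because both versions of the class (200 and 202) share the same Topic, the query rolls them up together without the report needing to know about versioning.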
Data Vault

Hybrid between 3NF and star schema

Created by Dan Linstedt

Persistent data layer – keep everything

Bring data over as needed
− Once touching an object, bring it all over

Can be hybrid between relational databases and Hadoop

Massive parallel loading, eventual consistency (with Hadoop)

1.0 documentation found at:

TDAN Article

2.0 documentation ->

Certification/training:

http://learndatavault.com/
Data Warehouse
Gotchas
Resetting The Data Warehouse
● Especially Star Schema
– Business Logic changes
– Missing requirements
● What to do?
– Reload from permanent
storage (3NF DWH/Data
Vault)
Performance Issues*
● Hitting the same/similar performance bottlenecks as
transactional system
● What to do?
– Check for proper indexing (can get complicated with star schema)
– Volume too high for platform? Consider alternatives (other data
stores, NoSQL for large static data sets)
– Matching too closely to transactional model? Look at tuning the
model for purpose
– Composite keys in star schema?
– Too many joins?
*This is a huge area, and trying to generalize it is difficult. There are other solutions we can
discuss :)
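
The indexing check in the list above can be tried on a small scale. This is a rough sketch using SQLite's EXPLAIN QUERY PLAN, not a tuning recipe for any particular platform: the table name, the index name idx_fact_class, and the 1,000 synthetic rows are all assumptions, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student_fact (
    student_id INTEGER, class_id INTEGER, credit_earned INTEGER, date_id INTEGER
)""")

# Synthetic rows, only so the table has something to scan.
conn.executemany(
    "INSERT INTO student_fact VALUES (?, ?, ?, ?)",
    [(100 + i % 5, 200 + i % 3, i % 120, 20170101) for i in range(1000)])

query = "SELECT SUM(credit_earned) FROM student_fact WHERE class_id = 200"

def plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(conn, query)  # no index yet, so the whole table is scanned
conn.execute("CREATE INDEX idx_fact_class ON student_fact (class_id)")
plan_after = plan(conn, query)   # the planner can now search via the index

print(plan_before)
print(plan_after)
```

Comparing the two plans shows whether a filter on a fact table's foreign key is served by an index or by a full scan; other platforms offer their own equivalent of this check.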
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 

Introduction To Data Warehousing

  • 1. Introduction To Data Warehousing Triangle MySQL Users Group March 24, 2017 Alex Meadows
  • 2. About Alex ● Principal Consultant (Data and Analytics), CSpring Inc. ● Business Analytics Adjunct Professor, Wake Tech ● MS in Business Intelligence ● Passion in developing BI solutions that provide end users easy access to necessary data to find the answers they demand (even the ones they don’t know yet!) Twitter: @OpenDataAlex LinkedIn: alexmeadows GitHub: OpenDataAlex Email: ameadows@cspring.com
  • 5. Agenda ● Why Data Warehousing ● Use Case Modeling Decisions ● Building Your Data Warehouse ● Data Warehouse Gotchas ● Q&A Please feel free to ask questions throughout the presentation!
  • 6. Why Data Warehouses? ● Started being discussed in 1970 ● While databases existed, they were not relational/normalized – Network/hierarchical in nature – Design for query, not for data model ● Reporting was hard – System/application queries were not the same as management reporting queries
  • 7. Design For Query (diagram labels: Shopping Order; Widgets; Thingys; Odds and Ends; Customers; Country; State; City)
  • 8. Bill Inmon Data warehouses: subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process
  • 10. How Many Customers Like Animals? Dogs? Dogs with short hair? (diagram labels: Customers; Sales; Marketing; Employee; Enterprise Data Warehouse; Star Schemas)
  • 12. Business Intelligence Use Cases ● Traditional Data Warehousing focuses on Descriptive and Diagnostic questions ● Predictive and Prescriptive questions require other types of tooling (e.g. simulation modeling, statistics, etc.)
  • 13. Use Case Modeling Decisions -or- Do You Need A Data Warehouse?
  • 14. Case 1: Operational Data Store Holds All Data ● Classic use-case ● System bogged down with historic ‘valuable’ data ● Applications may try to take advantage of hosting the data for features
  • 15. Case 2: Adding Unnecessary Objects ● Adding objects that meet more reporting needs than application requirements ● Impacting maintainability through duplication of data ● Database/application handling data movement to these objects
  • 16. Case 3: Complex Relationships Based On Filtering Factors ● Storing data by clear delimiters – Time – Quarters, Months, etc. – Geography – Region, State, City, etc. – Business logic ● Makes querying very complicated ● Also can impact high availability architecture
  • 17. Do You Need A DWH? ● Case 1 – Data volume/historical data ● Case 2/3 – Transactional database not matching reporting/analysis requirements ● If performance isn’t an issue (yet) then you have some time ● If data volume is tiny (under ~50 GB) then maybe not ● We’re going to assume that the DWH is needed ;)
  • 18. Building Your Data Warehouse
  • 20. Inmon: 3rd Normal Form ● Normalize on Objects, Relationships ● Focus on all data stores – Join data sets as necessary – Look into Master Data Management practices for true store merging
  • 22. Classroom Transaction System
      Student First Name  Student Last Name  Student System ID
      Bob                 Young              1
      Robert              Young              2
      Jennifer            Owens              3
      Andrew              Collins            4

      Student ID  Class ID  Student Grade
      1           1         A
      2           1         B
      3           1         B
      4           1         C
      2           2         A

      Class ID  Class Name                 Class Program   Class Credits
      1         Intro to Computer Science  Business Admin  4
      2         Clay Sculpting 101         Art             2
  • 23. Classroom 3NF Data Warehouse
      Student First Name  Student Last Name  Student System ID  Student ID
      Bob                 Young              1                  100
      Robert              Young              2                  101
      Jennifer            Owens              3                  102
      Andrew              Collins            4                  103

      Class ID  Class System ID  Class Name                 Class Program   Class Credits
      200       1                Intro to Computer Science  Business Admin  4
      201       2                Clay Sculpting 101         Art             2

      Student ID  Class ID  Student Grade
      100         200       A
      101         200       B
      102         200       B
      103         200       C
      101         201       A
  • 24. DWH = Version Control
      Class ID  Class System ID  Class Name                 Class Program   Class Credits  Create Date  Update Date  Version
      200       1                Intro to Computer Science  Business Admin  4              01/01/17     02/01/17     1
      202       1                Intro to Computer Science  Business Admin  6              02/01/17     02/01/17     2
      201       2                Clay Sculpting 101         Art             2              01/01/17     01/01/17     1
  • 25. Also For Relationships! ● Historical vs current relationships ● Different ways of handling version control (dimensionality)
      Student ID  Class ID  Student Grade  Create Date  Update Date
      100         200       A              01/01/17     01/01/17
      101         200       B              01/01/17     01/01/17
      102         200       B              01/01/17     01/01/17
      103         200       C              02/01/17     02/01/17
      104         200       D              02/01/17     02/01/17
      103         202       C              02/01/17     02/01/17
      104         202       D              02/01/17     02/01/17
      104         201       A              01/01/17     01/01/17
      101         201       A              01/01/17     01/01/17
  • 27. SCD 1
  • 28. SCD 2
  • 29. SCD 3
  • 30. Dimensionality ● Concept originated with star schema ● Store data changes based on what is being done with the data/long term utilization ● Build models based on objects and relationships
  • 33. Star Schema Example (diagram labels: Student Dim; Class Dim; Student Fact; Professor Dim)
  • 34. Dimension Example
      Class ID  Class Name                 Class Program   Topic
      200       Intro to Computer Science  Business Admin  Computer Science
      202       Intro to Computer Science  Business Admin  Computer Science
      201       Clay Sculpting 101         Art             Sculpture
  • 35. Fact Table Example
      Student ID  Class ID  Credit Earned  Credit Maximum  Date ID
      100         200       26             120             20170101
      101         200       37             120             20170101
      102         200       12             120             20170101
      103         200       42             120             20170101
      104         200       16             120             20170101
      103         202       80             120             20170101
      104         202       120            120             20170101
      104         201       90             120             20170101
      101         201       26             120             20170101
  • 36. Data Vault ● Hybrid between 3NF and star schema ● Created by Dan Linstedt ● Persistent data layer – keep everything ● Bring data over as needed – Once touching an object, bring it all over ● Can be hybrid between relational databases and Hadoop ● Massive parallel loading, eventual consistency (with Hadoop)
  • 38. ● 1.0 documentation found at: ● TDAN Article ● 2.0 documentation -> ● Certification/training: ● http://learndatavault.com/
  • 40. Resetting The Data Warehouse ● Especially Star Schema – Business Logic changes – Missing requirements ● What to do? – Reload from permanent storage (3NF DWH/Data Vault)
  • 41. Performance Issues* ● Hitting the same/similar performance bottlenecks as transactional system ● What to do? – Check for proper indexing (can get complicated with star schema) – Volume too high for platform? Consider alternatives (other data stores, NoSQL for large static data sets) – Matching too closely to transactional model? Look at tuning the model for purpose – Composite keys in star schema? – Too many joins? *This is a huge area, and trying to generalize it is difficult. There are other solutions we can discuss :)
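
The "DWH = Version Control" slide (slide 24) can be illustrated with a small sketch. This is only one possible reading of that slide, not the presenter's implementation: the `ClassVersion` type, the `apply_change` helper, and the rule "a change to name or credits creates a new row with a new warehouse Class ID" are assumptions made for illustration, and the slide's dates are assumed to be in MM/DD/YY form.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ClassVersion:
    """One versioned row of the class table (hypothetical structure)."""
    class_id: int          # warehouse key, new for every version
    class_system_id: int   # key carried over from the source system
    class_name: str
    class_credits: int
    create_date: date
    update_date: date
    version: int


def current_version(history: List[ClassVersion],
                    class_system_id: int) -> Optional[ClassVersion]:
    """Return the highest-version row for a source key, or None."""
    rows = [r for r in history if r.class_system_id == class_system_id]
    return max(rows, key=lambda r: r.version) if rows else None


def apply_change(history: List[ClassVersion], class_system_id: int,
                 class_name: str, class_credits: int,
                 load_date: date, next_class_id: int) -> List[ClassVersion]:
    """Append a new version if the tracked attributes changed (assumed rule)."""
    current = current_version(history, class_system_id)
    if current and (current.class_name, current.class_credits) == (class_name, class_credits):
        return history  # nothing changed, so no new version is written
    result = list(history)
    if current:
        # Stamp the superseded row with the date it was replaced.
        result[result.index(current)] = replace(current, update_date=load_date)
    result.append(ClassVersion(
        class_id=next_class_id,
        class_system_id=class_system_id,
        class_name=class_name,
        class_credits=class_credits,
        create_date=load_date,
        update_date=load_date,
        version=current.version + 1 if current else 1,
    ))
    return result


# Starting rows taken from the slide; dates assumed to be MM/DD/YY.
history = [
    ClassVersion(200, 1, "Intro to Computer Science", 4, date(2017, 1, 1), date(2017, 1, 1), 1),
    ClassVersion(201, 2, "Clay Sculpting 101", 2, date(2017, 1, 1), date(2017, 1, 1), 1),
]

# The source system raises the credits for class 1 from 4 to 6.
updated = apply_change(history, 1, "Intro to Computer Science", 6,
                       date(2017, 2, 1), next_class_id=202)
for row in updated:
    print(row)
```

Under these assumptions the original list is left untouched and the returned list holds both versions of class 1, so relationship rows can keep pointing at either the historical or the current class row.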

Editor's Notes

  1. So here’s a bit about me. There are three things I’m going to ask of you, the first being – please feel free to reach out! I love talking and learning about what folks are using out in the wild and sharing. If you want to know more or chat more about any topic within data science/business intelligence just message me via one of the above methods.
  2. The second thing I’ll ask is to be aware that some of these solutions may fix your particular problems. You’ll iterate on them, find them super-awesome, and maybe be able to give back by talking about your experiences at a conference or in a trade paper. Note that the business side might not realize the size of the undertaking or the super-awesome things being done; these solutions are designed to be seamless and make users’ lives easier.
  3. The final ask before we get fully started is please don’t be the pointy-haired boss! We’re covering a lot of topics at a very high level and a lot of nuances aren’t being discussed (it’s only a 40-minute presentation after all). Please dig further and ask plenty of questions.
  4. By the end of this presentation, you will know where traditional data warehousing is failing and have a basic understanding of what technologies and methodologies are helping to address the needs of more data-savvy customer bases.
  5. The concept of the data warehouse started in the 1970s and fully came into its own during the late 80s and well into the 90s. Before relational databases, data was stored based on query usage and not necessarily based on the data itself. As a result, reporting was hard. Data would either have to be merged piecemeal or stored again based on the specific query requirements.
  6. Into that mess, a gentleman named Bill Inmon created the initial concept of separating reporting and analysis needs away from the OLTP layer.
  7. With that said, here is a typical model/workflow. Data from OLTP systems, Excel files, etc. is moved into a 3NF model. On top of the 3NF model, star schemas are built to handle all the reporting/analytics requirements. This model has worked very well, but several problems have emerged with it. While I don’t have an exact number, a high proportion of data warehouse projects are considered failures due to these issues. What are they? Glad you asked!
  8. There are distinct groups of requirements that business intelligence tries to answer. Traditional data warehousing can answer the first two: what happened and why it happened. Where it starts to fail is in the predictive analytics space, where data scientists want data that is not cleansed and conformed but is still easy to access. Then there is prescriptive analytics: applying the predictions found and making automated decisions based on them. Graph Source: http://www.odoscope.com/technology/prescriptive-analysis/
  9. Here is our basic example that we’ll be using through the rest of this presentation. It’s a simple student/teacher/class model that, while not modeled 100% ‘correct’, will provide a good example going forward.
  10. Of the newer architectures, Data Vault is one of the easier ones to implement because it is a combination of both the Kimball and Inmon methods. Data is only brought over from source systems as needed, as opposed to bringing everything from the source all at once. The other really cool thing about Data Vault is that data can be offloaded into Hadoop as it ages and becomes non-volatile. Image Source: https://pixabay.com/en/vault-strongbox-security-container-154023/
  11. Here is that same model in data vault form. Business entities become hub tables. Relationships between hubs get stored in many-to-many relationship tables called links. Off both hubs and links are dimension-like tables called satellites that store all relevant information of their related hub or link. Satellites version data as changes occur.
  12. There’s not a large amount of information publicly available outside the book, shown above. The original series of articles can be found on TDAN. There is also certification through the learn data vault website.
  13. Image Source: https://pixabay.com/p-1014060/?no_redirect
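
The hub, link, and satellite structure described in note 11 can be sketched with Python's built-in `sqlite3` module. This is a simplified illustration rather than a reference Data Vault design: the table and column names (`hub_student`, `hub_class`, `link_enrollment`, `sat_class`, the `_hk` key columns) are hypothetical, integer keys stand in for whatever key scheme a real implementation would use, and the load dates are assumed values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hubs hold business entities, the link holds the many-to-many
# relationship, and the satellite holds versioned descriptive data.
conn.executescript("""
CREATE TABLE hub_student (
    student_hk        INTEGER PRIMARY KEY,
    student_system_id INTEGER NOT NULL UNIQUE,
    load_date         TEXT NOT NULL
);
CREATE TABLE hub_class (
    class_hk        INTEGER PRIMARY KEY,
    class_system_id INTEGER NOT NULL UNIQUE,
    load_date       TEXT NOT NULL
);
CREATE TABLE link_enrollment (
    enrollment_hk INTEGER PRIMARY KEY,
    student_hk    INTEGER NOT NULL REFERENCES hub_student (student_hk),
    class_hk      INTEGER NOT NULL REFERENCES hub_class (class_hk),
    load_date     TEXT NOT NULL,
    UNIQUE (student_hk, class_hk)
);
CREATE TABLE sat_class (
    class_hk      INTEGER NOT NULL REFERENCES hub_class (class_hk),
    load_date     TEXT NOT NULL,
    class_name    TEXT NOT NULL,
    class_credits INTEGER NOT NULL,
    PRIMARY KEY (class_hk, load_date)
);
""")

conn.execute("INSERT INTO hub_student VALUES (1, 1, '2017-01-01')")
conn.execute("INSERT INTO hub_class VALUES (1, 1, '2017-01-01')")
conn.execute("INSERT INTO link_enrollment VALUES (1, 1, 1, '2017-01-01')")

# A change in the source adds a satellite row instead of overwriting one.
conn.executemany(
    "INSERT INTO sat_class VALUES (?, ?, ?, ?)",
    [
        (1, "2017-01-01", "Intro to Computer Science", 4),
        (1, "2017-02-01", "Intro to Computer Science", 6),
    ],
)

# Current view: the most recent satellite row for each class hub.
latest = conn.execute("""
    SELECT h.class_system_id, s.class_name, s.class_credits
    FROM hub_class AS h
    JOIN sat_class AS s ON s.class_hk = h.class_hk
    WHERE s.load_date = (
        SELECT MAX(load_date) FROM sat_class WHERE class_hk = h.class_hk
    )
    ORDER BY h.class_system_id
""").fetchall()

version_count = conn.execute(
    "SELECT COUNT(*) FROM sat_class WHERE class_hk = 1"
).fetchone()[0]

print(latest)
print(version_count)
```

Because the satellite keeps every version, the current state is derived at query time while the full history stays available for reloading downstream star schemas.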