SlideShare a Scribd company logo
1 of 21
Download to read offline
Worst Practices in Data Warehouse 
Design 
Kent Graziano 
Data Warrior LLC 
Twitter @KentGraziano
Agenda 
 My Bio 
 My Book 
 Survey 
 Backstory 
 What’s wrong with this picture? 
 The fallacy of the unconstrained data 
warehouse 
 Moral of the Story 
© Data Warrior LLC
My Bio 
 Kent Graziano 
● Oracle ACE Director (BI/DW) 
● Data Architecture and Data Warehouse Specialist 
● 30+ years in IT 
● 20+ years of Oracle-related work 
● 15+ years of data warehousing experience 
● Member: Boulder BI Brain Trust 
(http://www.boulderbibraintrust.org/ ) 
● Co-Author of 
● The Business of Data Vault Modeling 
● The Data Model Resource Book (1st Edition) 
● Past-President of Oracle Development Tools User Group and 
Rocky Mountain Oracle User Group 
© Data Warrior LLC
Most recent book: 
http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
Survey 
 Who are you? 
● Data Modeler or Architect 
● Project Managers 
● IT Managers 
● DBA 
● Developer 
 Experience 
● Data Warehousing? 
● Less than 1 yr? 
● 1-5 yrs? 
● Over 5 years? 
© Data Warrior LLC
The Backstory 
 Metrics data mart 
 Outsourced 
 POC worked great 
● 500 records loaded! 
 Real world: 100K ++ rows 
● 1st run – DBA cancelled after 8 hours 
● Filled up 665GB temp space 
 Something wrong? 
© Data Warrior LLC
Next step 
 DBA says 
● Too many parallel sessions 
● Too many partitions on fact table 
● Load includes 
● Select * 
● Select distinct 
 Me 
● Reverse engineer the tables first 
● Look at the design 
● Yikes! 
© Data Warrior LLC
My email to management 
“In general, the designs of both the source star schema 
and the target reporting table do not conform to best 
practices from either an Oracle tuning or data warehouse 
design perspective. “ 
“My only conclusion is that the folks who did the design 
were not well versed or experienced in designing high 
performance, high volume data warehouse databases on 
Oracle.” 
“Some of the omissions are so basic as it is hard to 
comprehend how this could have been considered a 
completed system. “ 
© Data Warrior LLC
What’s wrong with this picture? 
● All optional 
columns 
● The 
measure is 
optional! 
● Even meta 
data! 
● Extra 
Varchar 
columns 
● No PK 
● No UK 
● No FKs 
● No 
Indexes! 
© Data Warrior LLC
So what? 
 Works fine for 500 rows 
● Full table scans 
 No clues for the optimizer 
 No clues for customer! 
● Design intent? 
● Data profile? 
 No PK/UK – could get duplicates in load 
 No FK – could be missing dimension keys 
 Lazy design! 
© Data Warrior LLC
What’s wrong with this picture? 
● All 
optional 
columns 
● Even the 
PK and 
meta 
data! 
● No UK 
● PK on an 
optional 
column? 
© Data Warrior LLC
So what? 
 No clue on business key 
 SCD Type 1 or 2? 
 There is a CRC Key and CRC Attr 
● But which date is the Type 2 date? 
 Again no clues in the indexes or NOT NULL 
 Have to look at data to see if 
DW_REC_CREATED_DT and 
DW_REC_UPDATED_DT are different 
 Can’t discern the intent 
© Data Warrior LLC
How about the Date Dimension? 
● All 
optional 
columns 
● Assume 
1st column 
is PK? 
● No PK 
● No UK 
● No Indexes 
© Data Warrior LLC
More examples 
 Let’s look into the data model…. 
© Data Warrior LLC
Other Stuff 
 Untested partitioning scheme 
● Target report table partitioning and sub-partition is 
non-standard – not on date field 
● Pre-created 200 list-based partitions 
● But the domain only had 37 values! 
 Did not use partition-aware loading approach 
 No indexes on partitions or sub partition 
© Data Warrior LLC
Load approach 
 Uses a “select *” from source in a view 
 UPPER function in predicate 
● Not needed 
● Cancels index usage 
 Degree of parallelism hardcoded into view 
 Dummy columns coded into view 
 No documentation on why 
 NEVER TESTED with real data! 
© Data Warrior LLC
The Fallacy of the Unconstrained Data 
Warehouse 
 Rationale 
● Fast to load – no constraints 
● All the validation is in the code 
 Reality 
● May be fast load, but slow query 
● Not tuned for extract! 
● Code may not have been QA’d well 
● No model to tell the programmers the rules 
● What columns are required? 
● What are the FKs to check? 
● What defines a duplicate row? 
 Cost 
● Slow query response 
● Bad data loaded 
● Few clues to help tune 
© Data Warrior LLC
Moral of the story? 
 Be careful who you outsource to 
 Have someone independent do touch point 
reviews of design 
● Costs extra, but we have spent MONTHS fixing this 
 Insist on documentation 
 Insist on knowledge transfer with internal DBA 
 Require load testing with performance criteria 
Trust but Verify! 
© Data Warrior LLC
SUBMIT YOUR ABSTRACTS TODAY! 
Kscope15.com
Contact Information 
Kent Graziano 
The Oracle Data Warrior 
Data Warrior LLC 
Kent.graziano@att.net 
On Twitter @KentGraziano 
Visit my blog at 
http://kentgraziano.com

More Related Content

What's hot

Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachDatabricks
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Kent Graziano
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsKent Graziano
 
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...Kent Graziano
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Kent Graziano
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on ReadKent Graziano
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Kent Graziano
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault ModelingKent Graziano
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Kent Graziano
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 
Conceptional Data Vault
Conceptional Data VaultConceptional Data Vault
Conceptional Data VaultTorsten Glunde
 
Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Michael Olschimke
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoalarsgeorge
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland Bouman
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsQuontra Solutions
 

What's hot (20)

Speeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT ApproachSpeeding Time to Insight with a Modern ELT Approach
Speeding Time to Insight with a Modern ELT Approach
 
Visual Data Vault
Visual Data VaultVisual Data Vault
Visual Data Vault
 
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
Agile Data Engineering: Introduction to Data Vault 2.0 (2018)
 
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 DimensionsExtreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
Extreme BI: Creating Virtualized Hybrid Type 1+2 Dimensions
 
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...HOW TO SAVE  PILEs of $$$BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
HOW TO SAVE PILEs of $$$ BY CREATING THE BEST DATA MODEL THE FIRST TIME (Ksc...
 
Operational Data Vault
Operational Data VaultOperational Data Vault
Operational Data Vault
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
 
Making Sense of Schema on Read
Making Sense of Schema on ReadMaking Sense of Schema on Read
Making Sense of Schema on Read
 
Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)Demystifying Data Warehousing as a Service (GLOC 2019)
Demystifying Data Warehousing as a Service (GLOC 2019)
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Introduction to Data Vault Modeling
Introduction to Data Vault ModelingIntroduction to Data Vault Modeling
Introduction to Data Vault Modeling
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 
Conceptional Data Vault
Conceptional Data VaultConceptional Data Vault
Conceptional Data Vault
 
Data vault what's Next: Part 2
Data vault what's Next: Part 2Data vault what's Next: Part 2
Data vault what's Next: Part 2
 
Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)Agile Data Mining with Data Vault 2.0 (english)
Agile Data Mining with Data Vault 2.0 (english)
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
 
Data Vault Overview
Data Vault OverviewData Vault Overview
Data Vault Overview
 
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 
Dataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra SolutionsDataware house Introduction By Quontra Solutions
Dataware house Introduction By Quontra Solutions
 

Viewers also liked

Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureKent Graziano
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016Kent Graziano
 
Heli data modeler wildcard2013
Heli data modeler wildcard2013Heli data modeler wildcard2013
Heli data modeler wildcard2013Andrejs Vorobjovs
 
Pimping SQL Developer and Data Modeler
Pimping SQL Developer and Data ModelerPimping SQL Developer and Data Modeler
Pimping SQL Developer and Data ModelerKris Rice
 
Your favorite data modeling tool, your partner in crime for Data Warehouse Au...
Your favorite data modeling tool, your partner in crime for Data Warehouse Au...Your favorite data modeling tool, your partner in crime for Data Warehouse Au...
Your favorite data modeling tool, your partner in crime for Data Warehouse Au...FrederikN
 
Oracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new featuresOracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new featuresPhilip Stoyanov
 
My Favorite Oracle SQL Developer Data Modeler Features
My Favorite Oracle SQL Developer Data Modeler FeaturesMy Favorite Oracle SQL Developer Data Modeler Features
My Favorite Oracle SQL Developer Data Modeler FeaturesJeff Smith
 
Generating Code with Oracle SQL Developer Data Modeler
Generating Code with Oracle SQL Developer Data ModelerGenerating Code with Oracle SQL Developer Data Modeler
Generating Code with Oracle SQL Developer Data ModelerRob van den Berg
 
Dimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developerDimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developerJeff Smith
 
Oracle SQL Developer Data Modeler - Version Control Your Designs
Oracle SQL Developer Data Modeler - Version Control Your DesignsOracle SQL Developer Data Modeler - Version Control Your Designs
Oracle SQL Developer Data Modeler - Version Control Your DesignsJeff Smith
 
Oracle SQL Developer for SQL Server?
Oracle SQL Developer for SQL Server?Oracle SQL Developer for SQL Server?
Oracle SQL Developer for SQL Server?Jeff Smith
 
PL/SQL All the Things in Oracle SQL Developer
PL/SQL All the Things in Oracle SQL DeveloperPL/SQL All the Things in Oracle SQL Developer
PL/SQL All the Things in Oracle SQL DeveloperJeff Smith
 
Dan Norris: Exadata security
Dan Norris: Exadata securityDan Norris: Exadata security
Dan Norris: Exadata securityKyle Hailey
 
Mark Farnam : Minimizing the Concurrency Footprint of Transactions
Mark Farnam  : Minimizing the Concurrency Footprint of TransactionsMark Farnam  : Minimizing the Concurrency Footprint of Transactions
Mark Farnam : Minimizing the Concurrency Footprint of TransactionsKyle Hailey
 

Viewers also liked (16)

Data Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data CaptureData Vault 2.0: Using MD5 Hashes for Change Data Capture
Data Vault 2.0: Using MD5 Hashes for Change Data Capture
 
Data Warehousing 2016
Data Warehousing 2016Data Warehousing 2016
Data Warehousing 2016
 
Data control
Data controlData control
Data control
 
Why Data Vault?
Why Data Vault? Why Data Vault?
Why Data Vault?
 
Heli data modeler wildcard2013
Heli data modeler wildcard2013Heli data modeler wildcard2013
Heli data modeler wildcard2013
 
Pimping SQL Developer and Data Modeler
Pimping SQL Developer and Data ModelerPimping SQL Developer and Data Modeler
Pimping SQL Developer and Data Modeler
 
Your favorite data modeling tool, your partner in crime for Data Warehouse Au...
Your favorite data modeling tool, your partner in crime for Data Warehouse Au...Your favorite data modeling tool, your partner in crime for Data Warehouse Au...
Your favorite data modeling tool, your partner in crime for Data Warehouse Au...
 
Oracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new featuresOracle Sql Developer Data Modeler 3 3 new features
Oracle Sql Developer Data Modeler 3 3 new features
 
My Favorite Oracle SQL Developer Data Modeler Features
My Favorite Oracle SQL Developer Data Modeler FeaturesMy Favorite Oracle SQL Developer Data Modeler Features
My Favorite Oracle SQL Developer Data Modeler Features
 
Generating Code with Oracle SQL Developer Data Modeler
Generating Code with Oracle SQL Developer Data ModelerGenerating Code with Oracle SQL Developer Data Modeler
Generating Code with Oracle SQL Developer Data Modeler
 
Dimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developerDimensional modeling in oracle sql developer
Dimensional modeling in oracle sql developer
 
Oracle SQL Developer Data Modeler - Version Control Your Designs
Oracle SQL Developer Data Modeler - Version Control Your DesignsOracle SQL Developer Data Modeler - Version Control Your Designs
Oracle SQL Developer Data Modeler - Version Control Your Designs
 
Oracle SQL Developer for SQL Server?
Oracle SQL Developer for SQL Server?Oracle SQL Developer for SQL Server?
Oracle SQL Developer for SQL Server?
 
PL/SQL All the Things in Oracle SQL Developer
PL/SQL All the Things in Oracle SQL DeveloperPL/SQL All the Things in Oracle SQL Developer
PL/SQL All the Things in Oracle SQL Developer
 
Dan Norris: Exadata security
Dan Norris: Exadata securityDan Norris: Exadata security
Dan Norris: Exadata security
 
Mark Farnam : Minimizing the Concurrency Footprint of Transactions
Mark Farnam  : Minimizing the Concurrency Footprint of TransactionsMark Farnam  : Minimizing the Concurrency Footprint of Transactions
Mark Farnam : Minimizing the Concurrency Footprint of Transactions
 

Similar to Worst Practices in Data Warehouse Design

Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkartStories
 
Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...Marco Tusa
 
SCUG.DK: Visualizing Your Data, April 2015
SCUG.DK: Visualizing Your Data, April 2015SCUG.DK: Visualizing Your Data, April 2015
SCUG.DK: Visualizing Your Data, April 2015Ronni Pedersen
 
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cIntegrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cEdelweiss Kammermann
 
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...Databricks
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowKevin Kline
 
Talend Open Studio Data Integration
Talend Open Studio Data IntegrationTalend Open Studio Data Integration
Talend Open Studio Data IntegrationRoberto Marchetto
 
Scaling Magento
Scaling MagentoScaling Magento
Scaling MagentoCopious
 
Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Hiral Patel
 
Top 10 DBA Mistakes on Microsoft SQL Server
Top 10 DBA Mistakes on Microsoft SQL ServerTop 10 DBA Mistakes on Microsoft SQL Server
Top 10 DBA Mistakes on Microsoft SQL ServerKevin Kline
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityCaserta
 
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big DataVoxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big DataVoxxed Athens
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013
What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013
What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013J.D. Wade
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
AMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionAMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionDaniel Norman
 
Easier and faster tagging with Kermit
Easier and faster tagging with KermitEasier and faster tagging with Kermit
Easier and faster tagging with KermitAlban Gérôme
 
Group 3 slide presentation
Group 3 slide presentationGroup 3 slide presentation
Group 3 slide presentationMichael Young
 
Agile methods and dw mha
Agile methods and dw mhaAgile methods and dw mha
Agile methods and dw mhaAgileDenver
 

Similar to Worst Practices in Data Warehouse Design (20)

Flipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 repriseFlipkart Data Platform @ Scale - slash n 2018 reprise
Flipkart Data Platform @ Scale - slash n 2018 reprise
 
Cloud dwh
Cloud dwhCloud dwh
Cloud dwh
 
Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...Are we there Yet?? (The long journey of Migrating from close source to opens...
Are we there Yet?? (The long journey of Migrating from close source to opens...
 
SCUG.DK: Visualizing Your Data, April 2015
SCUG.DK: Visualizing Your Data, April 2015SCUG.DK: Visualizing Your Data, April 2015
SCUG.DK: Visualizing Your Data, April 2015
 
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12cIntegrating Oracle Data Integrator with Oracle GoldenGate 12c
Integrating Oracle Data Integrator with Oracle GoldenGate 12c
 
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
Behavior-Driven Development (BDD) Testing with Apache Spark with Aaron Colcor...
 
Ten query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should knowTen query tuning techniques every SQL Server programmer should know
Ten query tuning techniques every SQL Server programmer should know
 
Talend Open Studio Data Integration
Talend Open Studio Data IntegrationTalend Open Studio Data Integration
Talend Open Studio Data Integration
 
Scaling Magento
Scaling MagentoScaling Magento
Scaling Magento
 
Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]Query generation across multiple data stores [SBTB 2016]
Query generation across multiple data stores [SBTB 2016]
 
Top 10 DBA Mistakes on Microsoft SQL Server
Top 10 DBA Mistakes on Microsoft SQL ServerTop 10 DBA Mistakes on Microsoft SQL Server
Top 10 DBA Mistakes on Microsoft SQL Server
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data Quality
 
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big DataVoxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
Voxxed Athens 2018 - Methods and Practices for Guaranteed Failure in Big Data
 
Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013
What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013
What SQL DBAs need to know about SharePoint-Kansas City, Sept 2013
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
AMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interactionAMW43 - Unba.se, Distributed database for human interaction
AMW43 - Unba.se, Distributed database for human interaction
 
Easier and faster tagging with Kermit
Easier and faster tagging with KermitEasier and faster tagging with Kermit
Easier and faster tagging with Kermit
 
Group 3 slide presentation
Group 3 slide presentationGroup 3 slide presentation
Group 3 slide presentation
 
Agile methods and dw mha
Agile methods and dw mhaAgile methods and dw mha
Agile methods and dw mha
 

Recently uploaded

Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in collegessuser7a7cd61
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 

Recently uploaded (20)

Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
While-For-loop in python used in college
While-For-loop in python used in collegeWhile-For-loop in python used in college
While-For-loop in python used in college
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 

Worst Practices in Data Warehouse Design

  • 1. Worst Practices in Data Warehouse Design Kent Graziano Data Warrior LLC Twitter @KentGraziano
  • 2. Agenda  My Bio  My Book  Survey  Backstory  What’s wrong with this picture?  The fallacy of the unconstrained data warehouse  Moral of the Story © Data Warrior LLC
  • 3. My Bio  Kent Graziano ● Oracle ACE Director (BI/DW) ● Data Architecture and Data Warehouse Specialist ● 30+ years in IT ● 20+ years of Oracle-related work ● 15+ years of data warehousing experience ● Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/ ) ● Co-Author of ● The Business of Data Vault Modeling ● The Data Model Resource Book (1st Edition) ● Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group © Data Warrior LLC
  • 4. Most recent book: http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
  • 5. Survey  Who are you? ● Data Modeler or Architect ● Project Managers ● IT Managers ● DBA ● Developer  Experience ● Data Warehousing? ● Less than 1 yr? ● 1-5 yrs? ● Over 5 years? © Data Warrior LLC
  • 6. The Backstory  Metrics data mart  Outsourced  POC worked great ● 500 records loaded!  Real world: 100K ++ rows ● 1st run – DBA cancelled after 8 hours ● Filled up 665GB temp space  Something wrong? © Data Warrior LLC
  • 7. Next step  DBA says ● Too many parallel sessions ● Too many partitions on fact table ● Load includes ● Select * ● Select distinct  Me ● Reverse engineer the tables first ● Look at the design ● Yikes! © Data Warrior LLC
  • 8. My email to management “In general, the designs of both the source star schema and the target reporting table do not conform to best practices from either an Oracle tuning or data warehouse design perspective. “ “My only conclusion is that the folks who did the design were not well versed or experienced in designing high performance, high volume data warehouse databases on Oracle.” “Some of the omissions are so basic as it is hard to comprehend how this could have been considered a completed system. “ © Data Warrior LLC
  • 9. What’s wrong with this picture? ● All optional columns ● The measure is optional! ● Even meta data! ● Extra Varchar columns ● No PK ● No UK ● No FKs ● No Indexes! © Data Warrior LLC
  • 10. So what?  Works fine for 500 rows ● Full table scans  No clues for the optimizer  No clues for customer! ● Design intent? ● Data profile?  No PK/UK – could get duplicates in load  No FK – could be missing dimension keys  Lazy design! © Data Warrior LLC
  • 11. What’s wrong with this picture? ● All optional columns ● Even the PK and meta data! ● No UK ● PK on an optional column? © Data Warrior LLC
  • 12. So what?  No clue on business key  SCD Type 1 or 2?  There is a CRC Key and CRC Attr ● But which date is the Type 2 date?  Again no clues in the indexes or NOT NULL  Have to look at data to see if DW_REC_CREATED_DT and DW_REC_UPDATED_DT are different  Can’t discern the intent © Data Warrior LLC
  • 13. How about the Date Dimension? ● All optional columns ● Assume 1st column is PK? ● No PK ● No UK ● No Indexes © Data Warrior LLC
  • 14. More examples  Let’s look into the data model…. © Data Warrior LLC
  • 15. Other Stuff  Untested partitioning scheme ● Target report table partitioning and sub-partition is non-standard – not on date field ● Pre-created 200 list-based partitions ● But the domain only had 37 values!  Did not use partition-aware loading approach  No indexes on partitions or sub partition © Data Warrior LLC
  • 16. Load approach  Uses a “select *” from source in a view  UPPER function in predicate ● Not needed ● Cancels index usage  Degree of parallelism hardcoded into view  Dummy columns coded into view  No documentation on why  NEVER TESTED with real data! © Data Warrior LLC
  • 17. The Fallacy of the Unconstrained Data Warehouse  Rationale ● Fast to load – no constraints ● All the validation is in the code  Reality ● May be fast load, but slow query ● Not tuned for extract! ● Code may not have been QA’d well ● No model to tell the programmers the rules ● What columns are required? ● What are the FKs to check? ● What defines a duplicate row?  Cost ● Slow query response ● Bad data loaded ● Few clues to help tune © Data Warrior LLC
  • 18. Moral of the story?  Be careful who you outsource to  Have someone independent do touch point reviews of design ● Costs extra, but we have spent MONTHS fixing this  Insist on documentation  Insist on knowledge transfer with internal DBA  Require load testing with performance criteria Trust but Verify! © Data Warrior LLC
  • 19.
  • 20. SUBMIT YOUR ABSTRACTS TODAY! Kscope15.com
  • 21. Contact Information Kent Graziano The Oracle Data Warrior Data Warrior LLC Kent.graziano@att.net On Twitter @KentGraziano Visit my blog at http://kentgraziano.com