Worst Practices in Data Warehouse Design
Kent Graziano 
Data Warrior LLC 
Twitter @KentGraziano
Agenda 
 My Bio 
 My Book 
 Survey 
 Backstory 
 What’s wrong with this picture? 
 The fallacy of the unconstrained data warehouse
 Moral of the Story 
© Data Warrior LLC
My Bio 
 Kent Graziano 
● Oracle ACE Director (BI/DW) 
● Data Architecture and Data Warehouse Specialist 
● 30+ years in IT 
● 20+ years of Oracle-related work 
● 15+ years of data warehousing experience 
● Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/)
● Co-Author of 
● The Business of Data Vault Modeling 
● The Data Model Resource Book (1st Edition) 
● Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group
Most recent book: 
http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
Survey 
 Who are you? 
● Data Modeler or Architect 
● Project Managers 
● IT Managers 
● DBA 
● Developer 
 Experience 
● Data Warehousing? 
● Less than 1 yr? 
● 1-5 yrs? 
● Over 5 years? 
The Backstory 
 Metrics data mart 
 Outsourced 
 POC worked great 
● 500 records loaded! 
 Real world: 100K+ rows
● 1st run – DBA cancelled after 8 hours 
● Filled up 665GB temp space 
 Something wrong? 
Next step 
 DBA says 
● Too many parallel sessions 
● Too many partitions on fact table 
● Load includes 
● Select * 
● Select distinct 
 Me 
● Reverse engineer the tables first 
● Look at the design 
● Yikes! 
My email to management 
“In general, the designs of both the source star schema 
and the target reporting table do not conform to best 
practices from either an Oracle tuning or data warehouse 
design perspective.”
“My only conclusion is that the folks who did the design 
were not well versed or experienced in designing high 
performance, high volume data warehouse databases on 
Oracle.” 
“Some of the omissions are so basic as it is hard to 
comprehend how this could have been considered a 
completed system.”
What’s wrong with this picture? 
● All optional columns
● The measure is optional!
● Even metadata!
● Extra Varchar columns
● No PK
● No UK
● No FKs
● No indexes!
So what? 
 Works fine for 500 rows 
● Full table scans 
 No clues for the optimizer 
 No clues for customer! 
● Design intent? 
● Data profile? 
 No PK/UK – could get duplicates in load 
 No FK – could be missing dimension keys 
 Lazy design! 
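
A minimal sketch of the duplicate and missing-key risks, using Python's built-in sqlite3 as a stand-in for Oracle. The table and column names (dim_date, fact_loose, fact_strict, date_key, metric_key, amount) are hypothetical and are not taken from the reviewed system.

```python
import sqlite3

def load_rows(conn, table, rows):
    """Insert rows one at a time; return how many the database rejected."""
    rejected = 0
    for row in rows:
        try:
            conn.execute(
                f"INSERT INTO {table} (date_key, metric_key, amount) VALUES (?, ?, ?)",
                row,
            )
        except sqlite3.IntegrityError:
            rejected += 1
    return rejected

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY);
    INSERT INTO dim_date VALUES (20140101);

    -- Unconstrained, like the reviewed design: all columns optional, no keys
    CREATE TABLE fact_loose (date_key INTEGER, metric_key INTEGER, amount NUMERIC);

    -- Constrained: required columns, a primary key, and a foreign key
    CREATE TABLE fact_strict (
        date_key   INTEGER NOT NULL REFERENCES dim_date (date_key),
        metric_key INTEGER NOT NULL,
        amount     NUMERIC NOT NULL,
        PRIMARY KEY (date_key, metric_key)
    );
""")

batch = [
    (20140101, 1, 10.0),  # good row
    (20140101, 1, 10.0),  # duplicate of the first row
    (20149999, 2, 5.0),   # date_key that is missing from dim_date
    (20140101, 3, None),  # measure is NULL
]

loose_rejected = load_rows(conn, "fact_loose", batch)
strict_rejected = load_rows(conn, "fact_strict", batch)
loose_count = conn.execute("SELECT COUNT(*) FROM fact_loose").fetchone()[0]
strict_count = conn.execute("SELECT COUNT(*) FROM fact_strict").fetchone()[0]
print(loose_rejected, loose_count, strict_rejected, strict_count)
```

With this sample batch the unconstrained table accepts all four rows, while the constrained table keeps only the first and rejects the duplicate, the orphan date key, and the NULL measure.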
What’s wrong with this picture? 
● All optional columns
● Even the PK and metadata!
● No UK
● PK on an optional column?
So what? 
 No clue on business key 
 SCD Type 1 or 2? 
 There is a CRC Key and CRC Attr 
● But which date is the Type 2 date? 
 Again, no clues in the indexes or NOT NULL constraints
 Have to look at the data to see if DW_REC_CREATED_DT and DW_REC_UPDATED_DT are different
 Can’t discern the intent 
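
One way to "look at the data" is a quick profile of the dimension, sketched here with Python's sqlite3 as a stand-in for Oracle. Only DW_REC_CREATED_DT and DW_REC_UPDATED_DT come from the reviewed design; the table name dim_customer, the columns customer_key and customer_bk, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (
        customer_key      INTEGER,
        customer_bk       TEXT,   -- hypothetical business key
        DW_REC_CREATED_DT TEXT,
        DW_REC_UPDATED_DT TEXT
    );
    INSERT INTO dim_customer VALUES
        (1, 'C-100', '2014-01-01', '2014-01-01'),
        (2, 'C-100', '2014-03-15', '2014-03-15'),
        (3, 'C-200', '2014-01-01', '2014-02-10');
""")

# How many rows were touched after they were created?
updated_in_place = conn.execute("""
    SELECT COUNT(*) FROM dim_customer
    WHERE DW_REC_UPDATED_DT <> DW_REC_CREATED_DT
""").fetchone()[0]

# How many business keys appear on more than one row?
multi_version_keys = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT customer_bk FROM dim_customer
        GROUP BY customer_bk HAVING COUNT(*) > 1
    )
""").fetchone()[0]

print(updated_in_place, multi_version_keys)
```

Rows whose two dates differ hint at in-place updates (Type 1 style), and business keys with several rows hint at kept history (Type 2 style). Neither count is proof of intent, which is the point: the design itself should have said so.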
How about the Date Dimension? 
● All optional columns
● Assume 1st column is PK?
● No PK
● No UK
● No indexes
More examples 
 Let’s look into the data model…. 
Other Stuff 
 Untested partitioning scheme 
● Target report table partitioning and sub-partitioning is non-standard – not on a date field
● Pre-created 200 list-based partitions 
● But the domain only had 37 values! 
 Did not use partition-aware loading approach 
 No indexes on partitions or sub-partitions
Load approach 
 Uses a “select *” from source in a view 
 UPPER function in predicate 
● Not needed 
● Prevents index usage
 Degree of parallelism hardcoded into view 
 Dummy columns coded into view 
 No documentation on why 
 NEVER TESTED with real data! 
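
The effect of wrapping an indexed column in UPPER can be sketched with Python's sqlite3, whose query planner stands in here for Oracle's optimizer. The table src_metric, its columns, and the index src_metric_region_ix are hypothetical names chosen for illustration.

```python
import sqlite3

def plan(conn, sql):
    """Return the detail text of each step in SQLite's query plan."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_metric (
        id     INTEGER PRIMARY KEY,
        region TEXT    NOT NULL,
        amount NUMERIC NOT NULL
    );
    CREATE INDEX src_metric_region_ix ON src_metric (region);
""")

# Function applied to the indexed column: the plain index cannot be used
wrapped = plan(
    conn,
    "SELECT id, region, amount FROM src_metric WHERE UPPER(region) = 'WEST'",
)

# Bare column in the predicate: the index is usable
bare = plan(
    conn,
    "SELECT id, region, amount FROM src_metric WHERE region = 'WEST'",
)

print(wrapped)
print(bare)
```

The first plan reports a scan of the whole table and the second a search through the index. Oracle behaves analogously for an ordinary index on the column, although its plan output looks different. Note that both queries also name their columns instead of using "select *".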
The Fallacy of the Unconstrained Data Warehouse
 Rationale 
● Fast to load – no constraints 
● All the validation is in the code 
 Reality 
● May be fast to load, but slow to query
● Not tuned for extract! 
● Code may not have been QA’d well 
● No model to tell the programmers the rules 
● What columns are required? 
● What are the FKs to check? 
● What defines a duplicate row? 
 Cost 
● Slow query response 
● Bad data loaded 
● Few clues to help tune 
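
When the tables carry no constraints, the three questions above have to be answered by auditing the loaded data. A minimal sketch of such an audit, using Python's sqlite3 as a stand-in for Oracle; the tables dim_date and fact_metric, their columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Unconstrained tables, as in the reviewed design
    CREATE TABLE dim_date (date_key INTEGER);
    CREATE TABLE fact_metric (date_key INTEGER, metric_key INTEGER, amount NUMERIC);
    INSERT INTO dim_date VALUES (20140101), (20140102);
    INSERT INTO fact_metric VALUES
        (20140101, 1, 10.0),
        (20140101, 1, 10.0),
        (20140103, 2, 7.5),
        (20140102, 3, NULL);
""")

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

audit = {
    # What columns are required? Count NULLs in a column that should be mandatory.
    "null_measures": scalar(
        "SELECT COUNT(*) FROM fact_metric WHERE amount IS NULL"
    ),
    # What are the FKs to check? Count fact rows with no matching dimension row.
    "orphan_date_keys": scalar("""
        SELECT COUNT(*) FROM fact_metric f
        WHERE NOT EXISTS (SELECT 1 FROM dim_date d WHERE d.date_key = f.date_key)
    """),
    # What defines a duplicate row? Count key combinations that occur more than once.
    "duplicate_key_groups": scalar("""
        SELECT COUNT(*) FROM (
            SELECT date_key, metric_key FROM fact_metric
            GROUP BY date_key, metric_key HAVING COUNT(*) > 1
        )
    """),
}
print(audit)
```

Each query encodes a guess about a rule (the measure is required, date_key references the date dimension, date_key plus metric_key is unique). In a constrained model those rules would be declared once in the schema instead of being rediscovered after the load.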
Moral of the story? 
 Be careful who you outsource to 
 Have someone independent do touch point reviews of the design
● Costs extra, but we have spent MONTHS fixing this 
 Insist on documentation 
 Insist on knowledge transfer with internal DBA 
 Require load testing with performance criteria 
Trust but Verify! 
SUBMIT YOUR ABSTRACTS TODAY! 
Kscope15.com
Contact Information 
Kent Graziano 
The Oracle Data Warrior 
Data Warrior LLC 
Kent.graziano@att.net 
On Twitter @KentGraziano 
Visit my blog at 
http://kentgraziano.com
