Worst Practices in Data Warehouse Design

This presentation was given at OakTable World 2014 (#OTW14) in San Francisco. After many years of designing data warehouses and consulting on data warehouse architectures, I have seen a lot of bad design choices by supposedly experienced professionals. A sense of professionalism, confidentiality agreements, and common decency have prevented me from calling people out on some of this. No more! In this session I will walk you through a typical bad design, like many I have seen. I will show you what I see when I reverse engineer a supposedly complete design, walk through what is wrong with it, and discuss options to correct it. This will be a test of your knowledge of data warehouse best practices: see if you can recognize these worst practices.

Published in: Data & Analytics

  1. Worst Practices in Data Warehouse Design
     Kent Graziano, Data Warrior LLC
     Twitter: @KentGraziano
  2. Agenda
     ▪ My Bio
     ▪ My Book
     ▪ Survey
     ▪ Backstory
     ▪ What’s wrong with this picture?
     ▪ The fallacy of the unconstrained data warehouse
     ▪ Moral of the Story
     © Data Warrior LLC
  3. My Bio
     ▪ Kent Graziano
       ● Oracle ACE Director (BI/DW)
       ● Data Architecture and Data Warehouse Specialist
       ● 30+ years in IT
       ● 20+ years of Oracle-related work
       ● 15+ years of data warehousing experience
       ● Member: Boulder BI Brain Trust (http://www.boulderbibraintrust.org/)
       ● Co-Author of
         ● The Business of Data Vault Modeling
         ● The Data Model Resource Book (1st Edition)
       ● Past-President of Oracle Development Tools User Group and Rocky Mountain Oracle User Group
  4. Most recent book: http://www.amazon.com/Check-Doing-Design-Reviews-ebook/dp/B008RG9L5E/
  5. Survey
     ▪ Who are you?
       ● Data Modeler or Architect
       ● Project Managers
       ● IT Managers
       ● DBA
       ● Developer
     ▪ Experience
       ● Data Warehousing?
         ● Less than 1 yr?
         ● 1-5 yrs?
         ● Over 5 years?
  6. The Backstory
     ▪ Metrics data mart
     ▪ Outsourced
     ▪ POC worked great
       ● 500 records loaded!
     ▪ Real world: 100K++ rows
       ● 1st run – DBA cancelled after 8 hours
       ● Filled up 665GB temp space
     ▪ Something wrong?
  7. Next step
     ▪ DBA says
       ● Too many parallel sessions
       ● Too many partitions on fact table
       ● Load includes
         ● Select *
         ● Select distinct
     ▪ Me
       ● Reverse engineer the tables first
       ● Look at the design
       ● Yikes!
  8. My email to management
     “In general, the designs of both the source star schema and the target reporting table do not conform to best practices from either an Oracle tuning or data warehouse design perspective.”
     “My only conclusion is that the folks who did the design were not well versed or experienced in designing high performance, high volume data warehouse databases on Oracle.”
     “Some of the omissions are so basic as it is hard to comprehend how this could have been considered a completed system.”
  9. What’s wrong with this picture?
     ● All optional columns
     ● The measure is optional!
     ● Even meta data!
     ● Extra Varchar columns
     ● No PK
     ● No UK
     ● No FKs
     ● No Indexes!
  10. So what?
      ▪ Works fine for 500 rows
        ● Full table scans
      ▪ No clues for the optimizer
      ▪ No clues for customer!
        ● Design intent?
        ● Data profile?
      ▪ No PK/UK – could get duplicates in load
      ▪ No FK – could be missing dimension keys
      ▪ Lazy design!
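The two load risks on slide 10 (duplicates without a PK/UK, missing dimension keys without an FK) are easy to reproduce. The following is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for Oracle; the table and column names (`dim_date`, `fact_unconstrained`, `fact_constrained`, `metric_code`, and so on) are hypothetical and are not taken from the reviewed design.

```python
import sqlite3


def try_insert(conn, sql, params):
    """Run one INSERT; return True if accepted, False if a constraint rejected it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    -- Hypothetical tables, for illustration only.
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,
        cal_date  TEXT NOT NULL UNIQUE
    );
    -- Same shape as the reviewed fact table: every column optional, no keys.
    CREATE TABLE fact_unconstrained (
        date_key     INTEGER,
        metric_code  TEXT,
        metric_value REAL
    );
    -- The same columns with NOT NULL, a PK, and an FK declared.
    CREATE TABLE fact_constrained (
        date_key     INTEGER NOT NULL REFERENCES dim_date (date_key),
        metric_code  TEXT    NOT NULL,
        metric_value REAL    NOT NULL,
        PRIMARY KEY (date_key, metric_code)
    );
    INSERT INTO dim_date VALUES (20140929, '2014-09-29');
""")

rows = [
    (20140929, "CPU_HOURS", 12.5),  # valid row
    (20140929, "CPU_HOURS", 12.5),  # duplicate of the first row
    (19000101, "CPU_HOURS", 3.0),   # date_key that is not in dim_date
    (20140929, "DISK_GB", None),    # the measure is NULL
]

results = {}
for table in ("fact_unconstrained", "fact_constrained"):
    sql = f"INSERT INTO {table} VALUES (?, ?, ?)"
    results[table] = [try_insert(conn, sql, row) for row in rows]

for table, accepted in results.items():
    print(table, accepted)
```

In this sketch the unconstrained table accepts all four rows, while the constrained table accepts only the first. The declared constraints also document the design intent that slide 10 says is missing.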
  11. What’s wrong with this picture?
      ● All optional columns
      ● Even the PK and meta data!
      ● No UK
      ● PK on an optional column?
  12. So what?
      ▪ No clue on business key
      ▪ SCD Type 1 or 2?
      ▪ There is a CRC Key and CRC Attr
        ● But which date is the Type 2 date?
      ▪ Again no clues in the indexes or NOT NULL
      ▪ Have to look at data to see if DW_REC_CREATED_DT and DW_REC_UPDATED_DT are different
      ▪ Can’t discern the intent
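"Looking at the data" on slide 12 can be a pair of profiling queries. The sketch below again assumes `sqlite3` as a stand-in for Oracle. Only the `DW_REC_CREATED_DT` and `DW_REC_UPDATED_DT` column names come from the slide; the table name `dim_customer`, the `crc_key` column name, and the sample rows are hypothetical, and treating the CRC key as a hash of the business key is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension; only the two DW_REC_* column names come from the slide.
    CREATE TABLE dim_customer (
        customer_key       INTEGER,
        crc_key            TEXT,   -- assumed to be a hash of the business key
        dw_rec_created_dt  TEXT,
        dw_rec_updated_dt  TEXT
    );
    INSERT INTO dim_customer VALUES
        (1, 'A1', '2014-01-01', '2014-01-01'),
        (2, 'B2', '2014-01-01', '2014-03-15'),
        (3, 'B2', '2014-03-15', '2014-03-15');
""")


def profile_scd(conn):
    """Count rows touched after creation and CRC keys that occur on several rows."""
    total, changed = conn.execute("""
        SELECT COUNT(*),
               COALESCE(SUM(dw_rec_created_dt <> dw_rec_updated_dt), 0)
        FROM dim_customer
    """).fetchone()
    (repeated_keys,) = conn.execute("""
        SELECT COUNT(*) FROM (
            SELECT crc_key
            FROM dim_customer
            GROUP BY crc_key
            HAVING COUNT(*) > 1
        )
    """).fetchone()
    return {
        "rows": total,
        "rows_updated_after_create": changed,
        "keys_with_multiple_rows": repeated_keys,
    }


summary = profile_scd(conn)
print(summary)
```

Keys that appear on several rows hint at Type 2 history, and rows whose two dates differ hint at in-place updates. Neither count proves the intent, which is the slide's point: the model itself should say so.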
  13. How about the Date Dimension?
      ● All optional columns
      ● Assume 1st column is PK?
      ● No PK
      ● No UK
      ● No Indexes
  14. More examples
      ▪ Let’s look into the data model…
  15. Other Stuff
      ▪ Untested partitioning scheme
        ● Target report table partitioning and sub-partition is non-standard – not on date field
        ● Pre-created 200 list-based partitions
        ● But the domain only had 37 values!
      ▪ Did not use partition-aware loading approach
      ▪ No indexes on partitions or sub partition
  16. Load approach
      ▪ Uses a “select *” from source in a view
      ▪ UPPER function in predicate
        ● Not needed
        ● Cancels index usage
      ▪ Degree of parallelism hardcoded into view
      ▪ Dummy columns coded into view
      ▪ No documentation on why
      ▪ NEVER TESTED with real data!
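The "cancels index usage" point on slide 16 can be seen in a query plan. This is a sketch under stated assumptions: it uses SQLite's `EXPLAIN QUERY PLAN` rather than Oracle's optimizer, and the table `src_metric` and index `ix_src_metric_code` are hypothetical. On Oracle, a function-based index on the `UPPER` expression would change the outcome, so treat this as an illustration of the general effect only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source table with an ordinary index on the filter column.
    CREATE TABLE src_metric (metric_code TEXT, metric_value REAL);
    CREATE INDEX ix_src_metric_code ON src_metric (metric_code);
""")


def plan(sql):
    """Return the planner's step descriptions for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail


# Predicate wraps the indexed column in a function call.
wrapped = plan("SELECT * FROM src_metric WHERE UPPER(metric_code) = 'CPU_HOURS'")

# Predicate compares the indexed column directly.
plain = plan("SELECT * FROM src_metric WHERE metric_code = 'CPU_HOURS'")

print("UPPER(column):", wrapped)
print("bare column:  ", plain)
```

Here the bare-column predicate is planned as an index search, while the wrapped predicate falls back to scanning the table. If the stored codes are already upper case, dropping the function costs nothing and lets the index work.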
  17. The Fallacy of the Unconstrained Data Warehouse
      ▪ Rationale
        ● Fast to load – no constraints
        ● All the validation is in the code
      ▪ Reality
        ● May be fast load, but slow query
        ● Not tuned for extract!
        ● Code may not have been QA’d well
        ● No model to tell the programmers the rules
          ● What columns are required?
          ● What are the FKs to check?
          ● What defines a duplicate row?
      ▪ Cost
        ● Slow query response
        ● Bad data loaded
        ● Few clues to help tune
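When the rules live only in load code, the three questions on slide 17 have to be answered after the fact by profiling what was loaded. A minimal sketch of such an audit follows, assuming `sqlite3` as a stand-in for Oracle. The tables, the sample rows, and the guess that (`date_key`, `metric_code`) defines a duplicate are all hypothetical; in a real review those rules would have to come from the business.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables with no constraints, as in an unconstrained warehouse.
    CREATE TABLE dim_date (date_key INTEGER, cal_date TEXT);
    CREATE TABLE fact_metric (date_key INTEGER, metric_code TEXT, metric_value REAL);
    INSERT INTO dim_date VALUES (20140929, '2014-09-29');
    INSERT INTO fact_metric VALUES
        (20140929, 'CPU_HOURS', 12.5),
        (20140929, 'CPU_HOURS', 12.5),
        (19000101, 'CPU_HOURS', 3.0),
        (20140929, 'DISK_GB',   NULL);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


report = {
    # What columns are required? Here: assume the measure must be present.
    "null_measures": scalar(
        "SELECT COUNT(*) FROM fact_metric WHERE metric_value IS NULL"
    ),
    # What are the FKs to check? Here: fact rows with no matching date row.
    "orphan_date_keys": scalar("""
        SELECT COUNT(*) FROM fact_metric f
        WHERE NOT EXISTS (SELECT 1 FROM dim_date d WHERE d.date_key = f.date_key)
    """),
    # What defines a duplicate row? Here: assume (date_key, metric_code).
    "duplicate_groups": scalar("""
        SELECT COUNT(*) FROM (
            SELECT date_key, metric_code
            FROM fact_metric
            GROUP BY date_key, metric_code
            HAVING COUNT(*) > 1
        )
    """),
}
print(report)
```

Each query encodes a rule that a declared NOT NULL, FK, or PK/UK constraint would have stated once in the model.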
  18. Moral of the story?
      ▪ Be careful who you outsource to
      ▪ Have someone independent do touch point reviews of design
        ● Costs extra, but we have spent MONTHS fixing this
      ▪ Insist on documentation
      ▪ Insist on knowledge transfer with internal DBA
      ▪ Require load testing with performance criteria
      Trust but Verify!
  19. SUBMIT YOUR ABSTRACTS TODAY! Kscope15.com
  20. Contact Information
      Kent Graziano, The Oracle Data Warrior
      Data Warrior LLC
      Kent.graziano@att.net
      On Twitter @KentGraziano
      Visit my blog at http://kentgraziano.com
