Open source data_warehousing_overview


Published in: Technology

  1. Open Source Data Warehousing: MySQL and Beyond
      Alex Meadows
      Twitter: @DBA_Alex
      Percona MySQL University, Raleigh, NC, 1/29/2013
  2. What Is Data Warehousing?
      ● Central repository
      ● Oriented toward reporting and analysis
      ● Integrates multiple sources
      ● Core to Business Intelligence and Advanced Analytics
      ● Helps keep source systems clean and lean
  3. Warehouse Methodologies
      ● Inmon’s 3NF/Hub and Spoke Model
      ● Kimball’s Conformed Dimension Model
      ● Linstedt’s Data Vault Model
      ● Rönnbäck’s Anchor Model/6NF
  4. Source:
  5. Common DW Challenges
      ● Data storage increases significantly
          ● Time-based snapshots
          ● Storing source changes
      ● Massive queries
          ● Joining many tables, from multiple sources
          ● Exploratory vs. reporting
      ● Source issues magnified
      ● Scalability
  6. Inmon’s 3NF Model
      ● Original data warehouse model
      ● Move historical data into its own data store
      ● Data transformed to 3NF
          ● Entities and relationships
  7. Open Source Software
      ● MySQL
      ● PostgreSQL
      ● Greenplum (PostgreSQL derivative)
      ● Any other traditional RDBMS
  8. Cautions
      ● Indexing
      ● Replication
      ● Partitioning
  9. Kimball’s Conformed Dimensions
      ● Normal database modeling does not meet the needs of reporting and analysis
      ● Denormalize data
      ● Dimensions
          ● How does the data need to be filtered?
      ● Facts
          ● What do we want to analyze/measure?
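The split between dimensions (how to filter) and facts (what to measure) can be sketched as a tiny star schema. This is only an illustration using Python's built-in `sqlite3`; the table names (`dim_date`, `dim_product`, `fact_sales`), columns, and sample rows are all hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE fact_sales  (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity    INTEGER,
        amount      REAL
    );
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20130101, 2013, 1), (20130201, 2013, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Manual", "Books")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20130101, 1, 2, 20.0), (20130101, 2, 1, 5.0), (20130201, 1, 3, 30.0)])


def sales_by_category(conn, year):
    """Filter on one dimension (year), group by another (category), sum a fact (amount)."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (year,)).fetchall()


result = sales_by_category(conn, 2013)
```

Every reporting question here takes the same shape: join the fact table to the dimensions it needs, filter on dimension columns, and aggregate fact columns.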
  10. Source:
  11. Open Source Software
      ● Greenplum (PostgreSQL derivative)
      ● InfiniDB (MySQL derivative)
      ● Infobright (MySQL derivative)
      ● Other columnar data stores
  12. Columnar Data Stores
      ● Designed for conformed dimensions
      ● High performance
          ● Self-indexing based on usage
          ● High compression of data
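One way to see why a column layout compresses well is to pivot a few rows into columns and run-length encode one of them. This is a toy sketch: run-length encoding is a common technique chosen here for illustration, not a description of how InfiniDB, Infobright, or any other engine stores data, and the sample rows and helper names are hypothetical.

```python
from itertools import groupby

# Hypothetical sales rows: (sale_date, state, amount).
rows = [
    ("2013-01-01", "NC", 10),
    ("2013-01-01", "NC", 12),
    ("2013-01-02", "NC", 7),
    ("2013-01-02", "VA", 5),
]


def to_columns(rows, names):
    """Pivot row tuples into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows, ["sale_date", "state", "amount"])

# Values within one column tend to repeat, so they encode compactly.
encoded_state = run_length_encode(columns["state"])

# An aggregate over one column never has to touch the other columns.
total_amount = sum(columns["amount"])
```

In a row layout the same aggregate would read every field of every row, which is the cost the columnar engines listed above are designed to avoid.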
  13. Row vs Columnar Databases
      Source:
  14. Cautions
      ● Traditional RDBMS
          ● Not built for conformed dimensions!
          ● Performance will become an issue
  15. Inmon’s Hub and Spoke
      ● Combines
          ● 3NF central data warehouse
          ● Conformed dimensions
      ● Becomes foundation for further variants
  16. Linstedt’s Data Vault Model
      ● Mixes 3NF and conformed dimensions
      ● Model data per business entities and their relationships
      ● Hubs
          ● Store unique business entity identifiers (keys)
      ● Links
          ● Relate hubs and other links to form relationships
      ● Satellites
          ● Store unique information regarding entity or relationship
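The hub, link, and satellite roles can be sketched with a few tables in `sqlite3`. This is a simplified illustration, not a full Data Vault design: the table and column names are hypothetical, and versioning the satellite by a `load_date` column is an assumption added here to show how descriptive history might accumulate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hubs: one row per unique business key.
    CREATE TABLE hub_customer (customer_id INTEGER PRIMARY KEY, customer_number TEXT UNIQUE);
    CREATE TABLE hub_product  (product_id INTEGER PRIMARY KEY, sku TEXT UNIQUE);
    -- Link: relates hubs to each other.
    CREATE TABLE link_purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES hub_customer(customer_id),
        product_id  INTEGER REFERENCES hub_product(product_id)
    );
    -- Satellite: descriptive details about a hub, one row per load (assumed).
    CREATE TABLE sat_customer (
        customer_id INTEGER REFERENCES hub_customer(customer_id),
        load_date   TEXT,
        name        TEXT,
        city        TEXT,
        PRIMARY KEY (customer_id, load_date)
    );
""")
conn.execute("INSERT INTO hub_customer VALUES (1, 'C-100')")
conn.execute("INSERT INTO hub_product VALUES (1, 'SKU-1')")
conn.execute("INSERT INTO link_purchase VALUES (1, 1, 1)")
conn.executemany("INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
                 [(1, "2013-01-01", "Ann", "Raleigh"),
                  (1, "2013-01-20", "Ann", "Durham")])


def current_customer(conn, customer_number):
    """Return the most recently loaded satellite row for a business key."""
    return conn.execute("""
        SELECT h.customer_number, s.name, s.city
        FROM hub_customer h
        JOIN sat_customer s ON s.customer_id = h.customer_id
        WHERE h.customer_number = ?
        ORDER BY s.load_date DESC
        LIMIT 1
    """, (customer_number,)).fetchone()


latest = current_customer(conn, "C-100")
history_count = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
```

The hub row never changes when the customer moves; only a new satellite row is added, so earlier versions stay available for reporting.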
  17. Source:
  18. Cautions
      ● While you get the best mix of 3NF and conformed dimensions, data marts are still needed
      ● Issues seen with both 3NF and conformed dimensions can also appear here
  19. Open Source Software
      ● MySQL
      ● PostgreSQL
      ● Greenplum
      ● Other traditional RDBMS
      ● NoSQL
          ● Hadoop
  20. Rönnbäck’s Anchor Model/6NF
      ● Focus is on the data and its relationships
      ● Anchors
          ● Model entities and events
      ● Attributes
          ● Model properties of anchors
      ● Ties
          ● Model relationships between anchors
      ● Knots
          ● Model relationships between shared properties
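A minimal sketch of the four building blocks, again using `sqlite3`. The actor/play domain, all table names, and the choice to model gender as a knot are hypothetical examples, not taken from the slides; historization and metadata columns are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Anchors: identities only, no descriptive columns.
    CREATE TABLE anchor_actor (actor_id INTEGER PRIMARY KEY);
    CREATE TABLE anchor_play  (play_id INTEGER PRIMARY KEY);
    -- Knot: a small set of values shared by many rows.
    CREATE TABLE knot_gender (gender_id INTEGER PRIMARY KEY, gender TEXT UNIQUE);
    -- Attributes: one table per property of an anchor.
    CREATE TABLE attr_actor_name (
        actor_id INTEGER PRIMARY KEY REFERENCES anchor_actor(actor_id), name TEXT);
    CREATE TABLE attr_actor_gender (
        actor_id INTEGER PRIMARY KEY REFERENCES anchor_actor(actor_id),
        gender_id INTEGER REFERENCES knot_gender(gender_id));
    CREATE TABLE attr_play_title (
        play_id INTEGER PRIMARY KEY REFERENCES anchor_play(play_id), title TEXT);
    -- Tie: a relationship between anchors.
    CREATE TABLE tie_actor_play (
        actor_id INTEGER REFERENCES anchor_actor(actor_id),
        play_id  INTEGER REFERENCES anchor_play(play_id),
        PRIMARY KEY (actor_id, play_id));
""")
conn.execute("INSERT INTO anchor_actor VALUES (1)")
conn.execute("INSERT INTO anchor_play VALUES (1)")
conn.execute("INSERT INTO knot_gender VALUES (1, 'F')")
conn.execute("INSERT INTO attr_actor_name VALUES (1, 'Ann')")
conn.execute("INSERT INTO attr_actor_gender VALUES (1, 1)")
conn.execute("INSERT INTO attr_play_title VALUES (1, 'Hamlet')")
conn.execute("INSERT INTO tie_actor_play VALUES (1, 1)")


def actor_summary(conn, actor_id):
    """Reassemble name, gender, and play title for one actor from the 6NF tables."""
    return conn.execute("""
        SELECT n.name, k.gender, t.title
        FROM anchor_actor a
        JOIN attr_actor_name n   ON n.actor_id = a.actor_id
        JOIN attr_actor_gender g ON g.actor_id = a.actor_id
        JOIN knot_gender k       ON k.gender_id = g.gender_id
        JOIN tie_actor_play ap   ON ap.actor_id = a.actor_id
        JOIN attr_play_title t   ON t.play_id = ap.play_id
        WHERE a.actor_id = ?
    """, (actor_id,)).fetchone()


summary = actor_summary(conn, 1)
```

Even this three-column lookup needs five joins, which shows how quickly join counts grow once every property lives in its own table.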
  21. Source:
  22. Cautions
      ● Number of joins will be an issue for some databases
      ● Queries will become complex
          ● Joins
          ● Finding properties/valuable information
          ● Every column in a traditional table becomes its own table
  23. ?
  24. Open Source Software
      ● Anchor Modeling website
          ● Web-based design tools
      ● No databases built specifically for 6NF