Open Source Data Warehousing:
     MySQL and Beyond



             Alex Meadows
          Twitter: @DBA_Alex
        Percona MySQL University
               Raleigh, NC
                1/29/2013
What Is Data Warehousing?
●   Central repository
●   Oriented toward reporting and analysis
●   Integrates multiple sources
●   Core to Business Intelligence and Advanced
    Analytics
●   Helps keep source systems clean and lean
Warehouse Methodologies
●   Inmon’s 3NF/Hub and Spoke Model
●   Kimball’s Conformed Dimension Model
●   Linstedt’s Data Vault Model
●   Rönnbäck’s Anchor Model/6NF
Source: http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-GSE.pdf
Common DW Challenges
●   Data storage increases significantly
        ●   Time-based snapshots
        ●   Storing source changes
●   Massive queries
        ●   Joining many tables, from multiple sources
        ●   Exploratory vs reporting
●   Source Issues Magnified
●   Scalability
Inmon’s 3NF Model
●   Original data warehouse model
●   Move historical data into its own data store
●   Data transformed to 3NF
        ●   Entities and relationships
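As a rough illustration of "entities and relationships" in 3NF, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (customer, product, sales_order, order_line) and the sample rows are hypothetical and not taken from the talk; the point is only that each entity lives in its own table and a report has to join them back together.

```python
import sqlite3

# Hypothetical 3NF layout: one table per entity, relationships
# expressed through foreign keys. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES sales_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO product VALUES (10, 'Widget', 2.5);
INSERT INTO sales_order VALUES (100, 1, '2013-01-29');
INSERT INTO order_line VALUES (100, 10, 4);
""")

# Even a simple revenue report needs three joins in this layout.
revenue_by_customer = conn.execute("""
    SELECT c.name, SUM(l.quantity * p.unit_price)
    FROM order_line l
    JOIN sales_order o ON o.order_id = l.order_id
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN product p ON p.product_id = l.product_id
    GROUP BY c.name
""").fetchall()
print(revenue_by_customer)
```

The join count grows with every entity a report touches, which is one reason indexing and partitioning show up as cautions for this model.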
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum (PostgreSQL derivative)
●   Any other traditional RDBMS
Cautions
●   Indexing
●   Replication
●   Partitioning
Kimball’s Conformed Dimensions
●   Normalized database modeling does not meet the needs of
    reporting and analysis
●   Denormalize data
●   Dimensions
       ●    How does data need to be filtered?
●   Facts
       ●    What do we want to analyze or measure?
Source: http://blog-mstechnology.blogspot.com/2010/06/bi-dimensional-model-star-schema.html
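The split between dimensions (how to filter) and facts (what to measure) can be sketched as a small star schema. This is an illustrative example only, again using Python's sqlite3; the names dim_date, dim_product, and fact_sales and the sample rows are assumptions made for the sketch.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
INSERT INTO dim_date VALUES (20130129, 2013, 1);
INSERT INTO dim_date VALUES (20130201, 2013, 2);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware');
INSERT INTO dim_product VALUES (2, 'Manual', 'Books');
INSERT INTO fact_sales VALUES (20130129, 1, 4, 10.0);
INSERT INTO fact_sales VALUES (20130129, 2, 1, 15.0);
INSERT INTO fact_sales VALUES (20130201, 1, 2, 5.0);
""")

# Dimensions supply the filter and grouping; the fact supplies the measure.
hardware_by_month = conn.execute("""
    SELECT d.year, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_date d ON d.date_key = f.date_key
    JOIN dim_product p ON p.product_key = f.product_key
    WHERE p.category = 'Hardware'
    GROUP BY d.year, d.month
    ORDER BY d.year, d.month
""").fetchall()
print(hardware_by_month)
```

Every query follows the same shape: join the fact to the dimensions it needs, filter on dimension columns, aggregate fact columns.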
Open Source Software
●   Greenplum (PostgreSQL derivative)
●   InfiniDB (MySQL derivative)
●   Infobright (MySQL derivative)
●   Other columnar data stores
Columnar Data Stores
●   Designed for conformed dimensions
●   High Performance
       ●   Self-indexing based on usage
       ●   High compression of data
Row vs Columnar Databases




Source: http://dbbest.com/blog/column-oriented-database-technologies/
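To make the row versus column contrast concrete, here is a toy sketch in plain Python. It is not how InfiniDB or Infobright store data internally; the sample rows are made up, and run-length encoding stands in for whatever compression scheme a real engine uses.

```python
from itertools import groupby

# Row-oriented layout: each record is stored together.
rows = [
    ("2013-01-29", "NC", 10.0),
    ("2013-01-29", "NC", 15.0),
    ("2013-01-29", "VA", 5.0),
    ("2013-01-30", "VA", 7.5),
]

# Column-oriented layout: each column is stored together.
column_names = ("sale_date", "state", "amount")
columns = {
    name: [row[i] for row in rows]
    for i, name in enumerate(column_names)
}

def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Repetitive columns compress well once their values sit side by side.
encoded_state = run_length_encode(columns["state"])

# An aggregate over one column reads only that column, not whole rows.
total_amount = sum(columns["amount"])

print(encoded_state)
print(total_amount)
```

Analytical queries usually touch a few columns across many rows, which is the access pattern this layout favors.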
Cautions
●   Traditional RDBMS
       ●   Not built for conformed dimensions!
       ●   Performance will become an issue
Inmon’s Hub and Spoke
●   Combines
        ●   3NF central data warehouse
        ●   Conformed dimensions
●   Becomes foundation for further variants
Linstedt’s Data Vault Model
●   Mixes 3NF and Conformed Dimensions
●   Model data per business entities and their
    relationships
●   Hubs
        ●   Store unique business entity identifiers (keys)
●   Links
        ●   Relate hubs and other links to form relationships
●   Satellites
        ●   Store unique information regarding entity or
              relationship
Source: http://danlinstedt.com/about/data-vault-basics/
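The hub, link, and satellite roles can be sketched as tables. This is a simplified illustration using Python's sqlite3, not a complete Data Vault implementation: the table names, the integer surrogate keys, and the load_date and record_source columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per unique business key.
CREATE TABLE hub_customer (
    customer_hk   INTEGER PRIMARY KEY,
    customer_bk   TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    INTEGER PRIMARY KEY,
    product_bk    TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: relates hubs to each other.
CREATE TABLE link_purchase (
    purchase_hk   INTEGER PRIMARY KEY,
    customer_hk   INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk    INTEGER NOT NULL REFERENCES hub_product(product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Satellite: descriptive details, versioned by load date.
CREATE TABLE sat_customer (
    customer_hk INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    load_date   TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
INSERT INTO hub_customer VALUES (1, 'CUST-001', '2013-01-01', 'crm');
INSERT INTO hub_product VALUES (1, 'PROD-010', '2013-01-01', 'erp');
INSERT INTO link_purchase VALUES (1, 1, 1, '2013-01-29', 'orders');
INSERT INTO sat_customer VALUES (1, '2013-01-01', 'Acme', 'Raleigh');
INSERT INTO sat_customer VALUES (1, '2013-01-15', 'Acme', 'Durham');
""")

# Current view: pick the latest satellite row for each hub key.
current_customers = conn.execute("""
    SELECT h.customer_bk, s.name, s.city
    FROM hub_customer h
    JOIN sat_customer s ON s.customer_hk = h.customer_hk
    WHERE s.load_date = (
        SELECT MAX(load_date) FROM sat_customer
        WHERE customer_hk = h.customer_hk
    )
""").fetchall()

# Earlier versions stay in the satellite as history.
history_rows = conn.execute("SELECT COUNT(*) FROM sat_customer").fetchone()[0]
print(current_customers, history_rows)
```

Source changes arrive as new satellite rows rather than updates, so history is kept while the hub key stays stable.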
Cautions
●   While you get the best mix between 3NF and
    conformed dimensions, data marts are still needed
●   Issues seen with both 3NF and conformed
    dimensions can also appear here
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum
●   Other Traditional RDBMS
●   NoSQL
       ●   Hadoop
Rönnbäck’s Anchor Model/6NF
●   Focus is on the data and its relationships.
●   Anchors
        ●   Model entities and events
●   Attributes
        ●   Model properties of anchors
●   Ties
        ●   Model relationships between anchors
●   Knots
        ●   Model relationships between shared properties
Source: http://en.wikipedia.org/wiki/Anchor_Modeling
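The four building blocks can be sketched as tables to show how finely the model splits data. This is a hedged illustration using Python's sqlite3: the table names are hypothetical and do not follow the Anchor Modeling naming convention, and the knot is shown in its common role of holding a small set of shared values that attributes reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchors: identities only, no descriptive columns.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE product (product_id INTEGER PRIMARY KEY);

-- Attribute: one table per property of an anchor.
CREATE TABLE customer_name (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    name        TEXT NOT NULL
);

-- Knot: a small set of shared values.
CREATE TABLE tier (
    tier_id INTEGER PRIMARY KEY,
    tier    TEXT NOT NULL UNIQUE
);

-- Knotted attribute: a property whose value points at a knot.
CREATE TABLE customer_tier (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    tier_id     INTEGER NOT NULL REFERENCES tier(tier_id)
);

-- Tie: relationship between anchors.
CREATE TABLE customer_product (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    PRIMARY KEY (customer_id, product_id)
);

INSERT INTO customer VALUES (1);
INSERT INTO product VALUES (10);
INSERT INTO customer_name VALUES (1, 'Acme');
INSERT INTO tier VALUES (1, 'Gold');
INSERT INTO customer_tier VALUES (1, 1);
INSERT INTO customer_product VALUES (1, 10);
""")

# Rebuilding one "traditional" customer row takes a join per attribute.
customer_rows = conn.execute("""
    SELECT c.customer_id, n.name, t.tier
    FROM customer c
    LEFT JOIN customer_name n ON n.customer_id = c.customer_id
    LEFT JOIN customer_tier ct ON ct.customer_id = c.customer_id
    LEFT JOIN tier t ON t.tier_id = ct.tier_id
""").fetchall()
print(customer_rows)
```

Two descriptive columns already cost three joins here, so a wide entity quickly turns into a long join chain.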
Cautions
●   Number of joins will be an issue for some databases
●   Queries will become complex
        ●   Joins
        ●   Finding properties/valuable information
        ●   Every column in a traditional table becomes its
             own table
?
Open Source Software
●   Anchor Modeling website
        ●   http://www.anchormodeling.com
        ●   Web-based design tools
●   No databases built specifically for 6NF
