Open source data_warehousing_overview

Open Source Data Warehousing:
MySQL and Beyond

Alex Meadows
Twitter: @DBA_Alex
Percona MySQL University
Raleigh, NC
1/29/2013

What Is Data Warehousing?
● Central repository
● Oriented on Reporting and Analysis
● Integrates multiple sources
● Core to Business Intelligence and Advanced Analytics
● Helps keep source systems clean and lean

Warehouse Methodologies
● Inmon’s 3NF/Hub and Spoke Model
● Kimball’s Conformed Dimension Model
● Linstedt’s Data Vault Model
● Rönnbäck’s Anchor Model/6NF

Source: http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-GSE.pdf

Common DW Challenges
● Data storage increases significantly
● Time based snapshots
● Storing source changes
● Massive queries
● Joining many tables, from multiple sources
● Exploratory vs reporting
● Source Issues Magnified
● Scalability

Inmon’s 3NF Model
● Original data warehouse model
● Move historical data into own data store
● Data transformed to 3NF
● Entities and relationships
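
A minimal sketch of what "data transformed to 3NF" can look like, using Python's built-in sqlite3 module as a stand-in for any traditional RDBMS. The table and column names (customer, product, sales_order, order_line) and the sample rows are illustrative assumptions, not taken from the talk.

```python
import sqlite3

# Hypothetical 3NF schema: one table per entity, relationships as foreign keys.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES sales_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
INSERT INTO customer VALUES (1, 'Acme');
INSERT INTO product VALUES (10, 'Widget', 2.5), (11, 'Gadget', 4.0);
INSERT INTO sales_order VALUES (100, 1, '2013-01-29');
INSERT INTO order_line VALUES (100, 10, 4), (100, 11, 1);
""")

# Reporting on a 3NF model means walking the relationships with joins.
revenue_by_customer = conn.execute("""
SELECT c.name, SUM(ol.quantity * p.unit_price)
FROM customer c
JOIN sales_order o ON o.customer_id = c.customer_id
JOIN order_line ol ON ol.order_id = o.order_id
JOIN product p ON p.product_id = ol.product_id
GROUP BY c.name
""").fetchall()
print(revenue_by_customer)
```

Even this small report needs three joins, which is one reason indexing matters so much for a 3NF warehouse.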

Open Source Software
● MySQL
● PostgreSQL
● Greenplum (PostgreSQL derivative)
● Any other traditional RDBMS

Cautions
● Indexing
● Replication
● Partitioning

Kimball’s Conformed Dimensions
● Normal database modeling does not meet needs of reporting and analysis
● Denormalize data
● Dimensions
  ● How does data need to be filtered?
● Facts
  ● What are we wanting to analyze/measure?
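
The dimension/fact split above can be sketched as a small star schema. This is an illustration only, again using sqlite3; the names dim_date, dim_product, and fact_sales and the sample data are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive, denormalized attributes used for filtering.
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT,
    year      INTEGER,
    month     INTEGER
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- Fact: the measurements, keyed by the dimensions.
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    quantity     INTEGER,
    sales_amount REAL
);
INSERT INTO dim_date VALUES
    (20130129, '2013-01-29', 2013, 1),
    (20130201, '2013-02-01', 2013, 2);
INSERT INTO dim_product VALUES
    (1, 'Widget', 'Hardware'),
    (2, 'Gadget', 'Hardware'),
    (3, 'Manual', 'Books');
INSERT INTO fact_sales VALUES
    (20130129, 1, 4, 10.0),
    (20130129, 3, 1, 15.0),
    (20130201, 2, 2, 8.0);
""")

# Filter by dimension attributes, aggregate the fact measure.
sales_by_category = conn.execute("""
SELECT p.category, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_date d ON d.date_key = f.date_key
JOIN dim_product p ON p.product_key = f.product_key
WHERE d.year = 2013 AND d.month = 1
GROUP BY p.category
ORDER BY p.category
""").fetchall()
print(sales_by_category)
```

Every join goes from the fact table straight to a dimension, so queries stay one hop deep no matter how many dimensions are added.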

Source: http://blog-mstechnology.blogspot.com/2010/06/bi-dimensional-model-star-schema.html

Open Source Software
● Greenplum (PostgreSQL derivative)
● InfiniDB (MySQL derivative)
● Infobright (MySQL derivative)
● Other columnar data stores

Columnar Data Stores
● Designed for conformed dimensions
● High Performance
● Self-indexing based on usage
● High compression of data
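
A rough sketch of why a column layout helps with compression and analytic scans. Run-length encoding is used here as one common, simple technique; it is an assumption for illustration and not a claim about how InfiniDB, Infobright, or any other specific engine stores data. The sample rows are made up.

```python
from itertools import groupby

# Row layout: each record is stored together.
rows = [
    ("2013-01-29", "NC", 10.0),
    ("2013-01-29", "NC", 15.0),
    ("2013-01-29", "VA", 8.0),
    ("2013-01-30", "VA", 12.0),
]

# Column layout: each column is stored together.
columns = {
    "sale_date": [r[0] for r in rows],
    "state": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

# Similar values sit next to each other in a column, so they compress well.
encoded_state = run_length_encode(columns["state"])

# An aggregate touches only the one column it needs.
total_amount = sum(columns["amount"])

print(encoded_state, total_amount)
```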

Row vs Columnar Databases

Source: http://dbbest.com/blog/column-oriented-database-technologies/

Cautions
● Traditional RDBMS
  ● Not built for conformed dimensions!
  ● Performance will become an issue

Inmon’s Hub and Spoke
● Combines
  ● 3NF central data warehouse
  ● Conformed dimensions
● Becomes foundation for further variants

Linstedt’s Data Vault Model
● Mixes 3NF and Conformed Dimensions
● Model data per business entities and their relationships
● Hubs
  ● Store unique business entity identifiers (keys)
● Links
  ● Relate hubs and other links to form relationships
● Satellites
  ● Store unique information regarding entity or relationship
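
The hub/link/satellite roles can be sketched in sqlite3 as follows. All table and column names are hypothetical. The load_date and record_source columns follow a common Data Vault convention and are an assumption here, and plain integer surrogate keys are used as a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per unique business key.
CREATE TABLE hub_customer (
    customer_key    INTEGER PRIMARY KEY,
    customer_number TEXT NOT NULL UNIQUE,
    load_date       TEXT NOT NULL,
    record_source   TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_key   INTEGER PRIMARY KEY,
    product_code  TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: relates hubs to record a relationship.
CREATE TABLE link_purchase (
    purchase_key  INTEGER PRIMARY KEY,
    customer_key  INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    product_key   INTEGER NOT NULL REFERENCES hub_product(product_key),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Satellite: descriptive attributes of a hub, one row per change.
CREATE TABLE sat_customer (
    customer_key INTEGER NOT NULL REFERENCES hub_customer(customer_key),
    load_date    TEXT NOT NULL,
    name         TEXT,
    city         TEXT,
    PRIMARY KEY (customer_key, load_date)
);
INSERT INTO hub_customer VALUES (1, 'C-001', '2013-01-01', 'crm');
INSERT INTO hub_product VALUES (1, 'P-010', '2013-01-01', 'erp');
INSERT INTO link_purchase VALUES (1, 1, 1, '2013-01-29', 'erp');
INSERT INTO sat_customer VALUES
    (1, '2013-01-01', 'Acme', 'Durham'),
    (1, '2013-01-15', 'Acme', 'Raleigh');
""")

# Current view: pick the most recent satellite row for each hub key.
current_customers = conn.execute("""
SELECT h.customer_number, s.name, s.city
FROM hub_customer h
JOIN sat_customer s ON s.customer_key = h.customer_key
WHERE s.load_date = (
    SELECT MAX(s2.load_date)
    FROM sat_customer s2
    WHERE s2.customer_key = h.customer_key
)
""").fetchall()

# History is kept: earlier satellite rows are never overwritten.
history_rows = conn.execute(
    "SELECT COUNT(*) FROM sat_customer WHERE customer_key = 1"
).fetchone()[0]

print(current_customers, history_rows)
```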

Source: http://danlinstedt.com/about/data-vault-basics/

Cautions
● While you get the best mix between 3NF and conformed dimensions, data marts are still needed
● Issues seen with both 3NF and conformed dimensions can be found here

Open Source Software
● MySQL
● PostgreSQL
● Greenplum
● Other Traditional RDBMS
● NoSQL
● Hadoop

Rönnbäck’s Anchor Model/6NF
● Focus is on the data and its relationships.
● Anchors
  ● Model entities and events
● Attributes
  ● Model properties of anchors
● Ties
  ● Model relationships between anchors
● Knots
  ● Model relationships between shared properties
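
A small sketch of the four building blocks in sqlite3. The table names and sample data are hypothetical and do not follow the official Anchor Modeling naming convention. History (time-stamped attributes) is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchors: identities only, no descriptive columns.
CREATE TABLE customer_anchor (customer_id INTEGER PRIMARY KEY);
CREATE TABLE product_anchor (product_id INTEGER PRIMARY KEY);

-- Attributes: one table per property of an anchor.
CREATE TABLE customer_name (
    customer_id INTEGER PRIMARY KEY REFERENCES customer_anchor(customer_id),
    name        TEXT NOT NULL
);
CREATE TABLE product_name (
    product_id INTEGER PRIMARY KEY REFERENCES product_anchor(product_id),
    name       TEXT NOT NULL
);

-- Knot: a small set of shared values, referenced by attributes.
CREATE TABLE status_knot (
    status_id INTEGER PRIMARY KEY,
    status    TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_status (
    customer_id INTEGER PRIMARY KEY REFERENCES customer_anchor(customer_id),
    status_id   INTEGER NOT NULL REFERENCES status_knot(status_id)
);

-- Tie: a relationship between anchors.
CREATE TABLE customer_product_tie (
    customer_id INTEGER NOT NULL REFERENCES customer_anchor(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product_anchor(product_id),
    PRIMARY KEY (customer_id, product_id)
);

INSERT INTO customer_anchor VALUES (1);
INSERT INTO product_anchor VALUES (10), (11);
INSERT INTO customer_name VALUES (1, 'Acme');
INSERT INTO product_name VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO status_knot VALUES (1, 'Active'), (2, 'Inactive');
INSERT INTO customer_status VALUES (1, 1);
INSERT INTO customer_product_tie VALUES (1, 10), (1, 11);
""")

# Rebuilding one readable row takes a join per attribute, knot, and tie.
customer_view = conn.execute("""
SELECT n.name, k.status, pn.name
FROM customer_anchor a
JOIN customer_name n ON n.customer_id = a.customer_id
JOIN customer_status s ON s.customer_id = a.customer_id
JOIN status_knot k ON k.status_id = s.status_id
JOIN customer_product_tie t ON t.customer_id = a.customer_id
JOIN product_name pn ON pn.product_id = t.product_id
ORDER BY pn.name
""").fetchall()
print(customer_view)
```

Three output columns already cost five joins here, which gives a feel for how quickly query complexity grows in a 6NF model.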

Source: http://en.wikipedia.org/wiki/Anchor_Modeling

Cautions
● Number of joins will be an issue for some databases
● Queries will become complex
  ● Joins
  ● Finding properties/valuable information
● Every column in a traditional table becomes its own table

● Anchor Modeling website
  ● http://www.anchormodeling.com
  ● Web-based design tools
● No databases built specifically for 6NF
