SlideShare a Scribd company logo
1 of 24
Open Source Data Warehousing:
     MySQL and Beyond



             Alex Meadows
          Twitter: @DBA_Alex
        Percona MySQL University
               Raleigh, NC
                1/29/2013
What Is Data Warehousing?
●   Central repository
●   Oriented on Reporting and Analysis
●   Integrates multiple sources
●   Core to Business Intelligence and Advanced
    Analytics
●   Helps keep source systems clean and lean
Warehouse Methodologies
●   Inmon’s 3NF/Hub and Spoke Model
●   Kimball’s Conformed Dimension Model
●   Linstedt’s Data Vault Model
●   Rönnbäck’s Anchor Model/6NF
Source: http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-GSE.pdf
Common DW Challenges
●   Data storage increases significantly
        ●   Time based snapshots
        ●   Storing source changes
●   Massive queries
        ●   Joining many tables, from multiple sources
        ●   Exploratory vs reporting
●   Source Issues Magnified
●   Scalability
Inmon’s 3NF Model
●   Original data warehouse model
●   Move historical data into own data store
●   Data transformed to 3NF
        ●   Entities and relationships
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum (PostgreSQL derivative)
●   Any other traditional RDBMS
Cautions
●   Indexing
●   Replication
●   Partitioning
Kimball’s Conformed Dimensions
●   Normal database modeling does not meet needs of
    reporting and analysis
●   Denormalize data
●   Dimensions
       ●    How does data need to be filtered?
●   Facts
       ●    What are we wanting to analyze/measure?
Source: http://blog-mstechnology.blogspot.com/2010/06/bi-dimensional-model-star-schema.html
Open Source Software
●   Greenplum (PostgreSQL derivative)
●   InfiniDB (MySQL derivative)
●   Infobright (MySQL derivative)
●   Other columnar data stores
Columnar Data Stores
●   Designed for conformed dimensions
●   High Performance
       ●   Self-indexing based on usage
       ●   High compression of data
Row vs Columnar Databases




Source: http://dbbest.com/blog/column-oriented-database-technologies/
Cautions
●   Traditional RDBMS
       ●   Not built for conformed dimensions!
       ●   Performance will become issue
Inmon’s Hub and Spoke
●   Combines
        ●   3NF central data warehouse
        ●   Conformed dimensions
●   Becomes foundation for further variants
●   Linstedt’s Data Vault Model
●   Mixes 3NF and Conformed Dimensions
●   Model data per business entities and their
    relationships
●   Hubs
        ●   Store unique business entity identifiers (keys)
●   Links
        ●   Relate hubs and other links to form relationships
●   Satellites
        ●   Store unique information regarding entity or
              relationship
Source: http://danlinstedt.com/about/data-vault-basics/
Cautions
●   While you get the best mix between 3NF and
    conformed dimensions, data marts are still needed
●   Issues seen with both 3NF and conformed
    dimensions can be found here
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum
●   Other Traditional RDBMS
●   NoSQL
       ●   Hadoop
●   Rönnbäck’s Anchor Model/6NF
●   Focus is on the data and it’s relationships.
●   Anchors
        ●   Model entities and events
●   Attributes
        ●   Model properties of anchors
●   Ties
        ●   Model relationships between anchors
●   Knots
        ●   Model relationships between shared properties
Source: http://en.wikipedia.org/wiki/Anchor_Modeling
Cautions
●   Number of joins will be an issue for some databases
●   Queries will become complex
        ●   Joins
        ●   Finding properties/valuable information
        ●   Every column in traditional tables becomes own
             unique table
?
Open Source Software
●   Anchor Modeling website
        ●   http://www.anchormodeling.com
        ●   Web based design tools
●   No databases built specifically for 6NF

More Related Content

What's hot

Graphing Your Data
Graphing Your DataGraphing Your Data
Graphing Your DataAlex Meadows
 
How Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryHow Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryAlex Meadows
 
Introduction to mongo db by zain
Introduction to mongo db by zainIntroduction to mongo db by zain
Introduction to mongo db by zainKenAndTea
 
Apache Marmotta (incubating)
Apache Marmotta (incubating)Apache Marmotta (incubating)
Apache Marmotta (incubating)Sergio Fernández
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.jsMax Neunhöffer
 
What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?FlyData Inc.
 
Semantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaSemantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaThomas Kurz
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsShawn Zhu
 
Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2Tsendsuren Munkhdalai
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model databaseMahdi Atawneh
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar PresentationMuntazir Mehdi
 

What's hot (20)

NoSQL
NoSQLNoSQL
NoSQL
 
CSCi226PPT1
CSCi226PPT1CSCi226PPT1
CSCi226PPT1
 
Database
DatabaseDatabase
Database
 
Graphing Your Data
Graphing Your DataGraphing Your Data
Graphing Your Data
 
How Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information DiscoveryHow Linked Data Can Speed Information Discovery
How Linked Data Can Speed Information Discovery
 
NoSQL
NoSQLNoSQL
NoSQL
 
Introduction to mongo db by zain
Introduction to mongo db by zainIntroduction to mongo db by zain
Introduction to mongo db by zain
 
Apache Marmotta (incubating)
Apache Marmotta (incubating)Apache Marmotta (incubating)
Apache Marmotta (incubating)
 
Multi-model databases and node.js
Multi-model databases and node.jsMulti-model databases and node.js
Multi-model databases and node.js
 
No sql
No sqlNo sql
No sql
 
Apache Arrow
Apache ArrowApache Arrow
Apache Arrow
 
What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?What's So Unique About a Columnar Database?
What's So Unique About a Columnar Database?
 
Semantic Media Management with Apache Marmotta
Semantic Media Management with Apache MarmottaSemantic Media Management with Apache Marmotta
Semantic Media Management with Apache Marmotta
 
Oslo bekk2014
Oslo bekk2014Oslo bekk2014
Oslo bekk2014
 
Build an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data ScientistsBuild an Open Source Data Lake For Data Scientists
Build an Open Source Data Lake For Data Scientists
 
Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2Improvement of no sql technology for relational databases v2
Improvement of no sql technology for relational databases v2
 
01 nosql and multi model database
01   nosql and multi model database01   nosql and multi model database
01 nosql and multi model database
 
Multi model-databases
Multi model-databasesMulti model-databases
Multi model-databases
 
RDF Seminar Presentation
RDF Seminar PresentationRDF Seminar Presentation
RDF Seminar Presentation
 
Core Data
Core DataCore Data
Core Data
 

Similar to Open source data_warehousing_overview

Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland Bouman
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBKnoldus Inc.
 
Heterogenous Persistence
Heterogenous PersistenceHeterogenous Persistence
Heterogenous PersistenceJervin Real
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBAhmed Farag
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLdatamantra
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresqlZaid Shabbir
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageBethmi Gunasekara
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Don Demcsak
 
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...ScyllaDB
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Denodo
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesnicolascombin1
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL DatabasesAbiral Gautam
 
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...adeel8937
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 

Similar to Open source data_warehousing_overview (20)

Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
Roland bouman modern_data_warehouse_architectures_data_vault_and_anchor_model...
 
Big Data Pitfalls
Big Data PitfallsBig Data Pitfalls
Big Data Pitfalls
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Heterogenous Persistence
Heterogenous PersistenceHeterogenous Persistence
Heterogenous Persistence
 
Introduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDBIntroduction to NoSQL and MongoDB
Introduction to NoSQL and MongoDB
 
Introduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQLIntroduction to Structured Data Processing with Spark SQL
Introduction to Structured Data Processing with Spark SQL
 
No sql bigdata and postgresql
No sql bigdata and postgresqlNo sql bigdata and postgresql
No sql bigdata and postgresql
 
Datastore PPT.pptx
Datastore PPT.pptxDatastore PPT.pptx
Datastore PPT.pptx
 
No SQL- The Future Of Data Storage
No SQL- The Future Of Data StorageNo SQL- The Future Of Data Storage
No SQL- The Future Of Data Storage
 
Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)Big Data (NJ SQL Server User Group)
Big Data (NJ SQL Server User Group)
 
MongoDB
MongoDBMongoDB
MongoDB
 
NoSql
NoSqlNoSql
NoSql
 
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
Scylla Summit 2022: Migrating SQL Schemas for ScyllaDB: Data Modeling Best Pr...
 
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?Data Lake Acceleration vs. Data Virtualization - What’s the difference?
Data Lake Acceleration vs. Data Virtualization - What’s the difference?
 
Complete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examplesComplete+dbt+Bootcamp+slides-plus examples
Complete+dbt+Bootcamp+slides-plus examples
 
Presentation On NoSQL Databases
Presentation On NoSQL DatabasesPresentation On NoSQL Databases
Presentation On NoSQL Databases
 
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
vdocuments.mx_chapter-2-database-environment-thomas-connolly-carolyn-begg-dat...
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
NoSQL.pptx
NoSQL.pptxNoSQL.pptx
NoSQL.pptx
 

More from Alex Meadows

Ethics In A Data Driven World
Ethics In A Data Driven WorldEthics In A Data Driven World
Ethics In A Data Driven WorldAlex Meadows
 
SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?Alex Meadows
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data WarehousingAlex Meadows
 
Continuous Integration As A Service
Continuous Integration As A ServiceContinuous Integration As A Service
Continuous Integration As A ServiceAlex Meadows
 
Building next generation data warehouses
Building next generation data warehousesBuilding next generation data warehouses
Building next generation data warehousesAlex Meadows
 
Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To AnalyticsAlex Meadows
 
Continuous integration with business intelligence and analytics
Continuous integration with business intelligence and analyticsContinuous integration with business intelligence and analytics
Continuous integration with business intelligence and analyticsAlex Meadows
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - IntroductionAlex Meadows
 
Open Source BI Overview
Open Source BI Overview Open Source BI Overview
Open Source BI Overview Alex Meadows
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business IntelligenceAlex Meadows
 
Data quality overview
Data quality overviewData quality overview
Data quality overviewAlex Meadows
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP OverviewAlex Meadows
 
Open Source Business Intelligence Overview
Open Source Business Intelligence OverviewOpen Source Business Intelligence Overview
Open Source Business Intelligence OverviewAlex Meadows
 
Choosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettleChoosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettleAlex Meadows
 

More from Alex Meadows (14)

Ethics In A Data Driven World
Ethics In A Data Driven WorldEthics In A Data Driven World
Ethics In A Data Driven World
 
SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?SIM RTP Meeting - So Who's Using Open Source Anyway?
SIM RTP Meeting - So Who's Using Open Source Anyway?
 
Introduction To Data Warehousing
Introduction To Data WarehousingIntroduction To Data Warehousing
Introduction To Data Warehousing
 
Continuous Integration As A Service
Continuous Integration As A ServiceContinuous Integration As A Service
Continuous Integration As A Service
 
Building next generation data warehouses
Building next generation data warehousesBuilding next generation data warehouses
Building next generation data warehouses
 
Introduction To Analytics
Introduction To AnalyticsIntroduction To Analytics
Introduction To Analytics
 
Continuous integration with business intelligence and analytics
Continuous integration with business intelligence and analyticsContinuous integration with business intelligence and analytics
Continuous integration with business intelligence and analytics
 
Big Data Analytics - Introduction
Big Data Analytics - IntroductionBig Data Analytics - Introduction
Big Data Analytics - Introduction
 
Open Source BI Overview
Open Source BI Overview Open Source BI Overview
Open Source BI Overview
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
 
Data quality overview
Data quality overviewData quality overview
Data quality overview
 
Mondrian and OLAP Overview
Mondrian and OLAP OverviewMondrian and OLAP Overview
Mondrian and OLAP Overview
 
Open Source Business Intelligence Overview
Open Source Business Intelligence OverviewOpen Source Business Intelligence Overview
Open Source Business Intelligence Overview
 
Choosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettleChoosing the right steps in pentaho kettle
Choosing the right steps in pentaho kettle
 

Recently uploaded

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfdanishmna97
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityVictorSzoltysek
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 

Recently uploaded (20)

Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
How to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cfHow to Check CNIC Information Online with Pakdata cf
How to Check CNIC Information Online with Pakdata cf
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps ProductivityChatGPT and Beyond - Elevating DevOps Productivity
ChatGPT and Beyond - Elevating DevOps Productivity
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 

Open source data_warehousing_overview

  • 1. Open Source Data Warehousing: MySQL and Beyond Alex Meadows Twitter: @DBA_Alex Percona MySQL University Raleigh, NC 1/29/2013
  • 2. What Is Data Warehousing? ● Central repository ● Oriented on Reporting and Analysis ● Integrates multiple sources ● Core to Business Intelligence and Advanced Analytics ● Helps keep source systems clean and lean
  • 3. Warehouse Methodologies ● Inmon’s 3NF/Hub and Spoke Model ● Kimball’s Conformed Dimension Model ● Linstedt’s Data Vault Model ● Rönnbäck’s Anchor Model/6NF
  • 5. Common DW Challenges ● Data storage increases significantly ● Time based snapshots ● Storing source changes ● Massive queries ● Joining many tables, from multiple sources ● Exploratory vs reporting ● Source Issues Magnified ● Scalability
  • 6. Inmon’s 3NF Model ● Original data warehouse model ● Move historical data into own data store ● Data transformed to 3NF ● Entities and relationships
  • 7. Open Source Software ● MySQL ● PostgreSQL ● Greenplum (PostgreSQL derivative) ● Any other traditional RDBMS
  • 8. Cautions ● Indexing ● Replication ● Partitioning
  • 9. Kimball’s Conformed Dimensions ● Normal database modeling does not meet needs of reporting and analysis ● Denormalize data ● Dimensions ● How does data need to be filtered? ● Facts ● What are we wanting to analyze/measure?
  • 11. Open Source Software ● Greenplum (PostgreSQL derivative) ● InfiniDB (MySQL derivative) ● Infobright (MySQL derivative) ● Other columnar data stores
  • 12. Columnar Data Stores ● Designed for conformed dimensions ● High Performance ● Self-indexing based on usage ● High compression of data
  • 13. Row vs Columnar Databases Source: http://dbbest.com/blog/column-oriented-database-technologies/
  • 14. Cautions ● Traditional RDBMS ● Not built for conformed dimensions! ● Performance will become issue
  • 15. Inmon’s Hub and Spoke ● Combines ● 3NF central data warehouse ● Conformed dimensions ● Becomes foundation for further variants
  • 16. Linstedt’s Data Vault Model ● Mixes 3NF and Conformed Dimensions ● Model data per business entities and their relationships ● Hubs ● Store unique business entity identifiers (keys) ● Links ● Relate hubs and other links to form relationships ● Satellites ● Store unique information regarding entity or relationship
  • 18. Cautions ● While you get the best mix between 3NF and conformed dimensions, data marts are still needed ● Issues seen with both 3NF and conformed dimensions can be found here
  • 19. Open Source Software ● MySQL ● PostgreSQL ● Greenplum ● Other Traditional RDBMS ● NoSQL ● Hadoop
  • 20. Rönnbäck’s Anchor Model/6NF ● Focus is on the data and it’s relationships. ● Anchors ● Model entities and events ● Attributes ● Model properties of anchors ● Ties ● Model relationships between anchors ● Knots ● Model relationships between shared properties
  • 22. Cautions ● Number of joins will be an issue for some databases ● Queries will become complex ● Joins ● Finding properties/valuable information ● Every column in traditional tables becomes own unique table
  • 23. ?
  • 24. Open Source Software ● Anchor Modeling website ● http://www.anchormodeling.com ● Web based design tools ● No databases built specifically for 6NF