Open Source Data Warehousing:
     MySQL and Beyond



             Alex Meadows
          Twitter: @DBA_Alex
        Percona MySQL University
               Raleigh, NC
                1/29/2013
What Is Data Warehousing?
●   Central repository
●   Oriented on Reporting and Analysis
●   Integrates multiple sources
●   Core to Business Intelligence and Advanced
    Analytics
●   Helps keep source systems clean and lean
Warehouse Methodologies
●   Inmon’s 3NF/Hub and Spoke Model
●   Kimball’s Conformed Dimension Model
●   Linstedt’s Data Vault Model
●   Rönnbäck’s Anchor Model/6NF
Source: http://www.anchormodeling.com/wp-content/uploads/2011/05/Anchor-Modeling-GSE.pdf
Common DW Challenges
●   Data storage increases significantly
        ●   Time based snapshots
        ●   Storing source changes
●   Massive queries
        ●   Joining many tables, from multiple sources
        ●   Exploratory vs reporting
●   Source Issues Magnified
●   Scalability
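As a rough illustration of why storage grows so quickly, the arithmetic below compares keeping a full snapshot every day with keeping one initial copy plus only the rows that changed. The function names and all of the figures (one million source rows, 10,000 changed rows per day) are hypothetical and chosen only to make the comparison concrete.

```python
def full_snapshot_rows(source_rows, days):
    """Rows stored when the whole source table is copied once per day."""
    return source_rows * days


def change_log_rows(source_rows, days, changed_per_day):
    """Rows stored when only changed rows are appended after the first load."""
    return source_rows + changed_per_day * (days - 1)


# Hypothetical workload: 1,000,000 source rows tracked for one year.
snapshot_total = full_snapshot_rows(1_000_000, 365)
change_total = change_log_rows(1_000_000, 365, 10_000)

print(f"daily snapshots: {snapshot_total:,} rows")
print(f"change tracking: {change_total:,} rows")
```

Either way the warehouse ends up far larger than the source system, which is the point of the slide; the two strategies differ mainly in how fast that growth happens.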
Inmon’s 3NF Model
●   Original data warehouse model
●   Move historical data into its own data store
●   Data transformed to 3NF
        ●   Entities and relationships
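A minimal sketch of what "entities and relationships" in 3NF can look like, using Python's built-in sqlite3 module so it runs anywhere. The customer/product/order tables and their columns are hypothetical examples, not taken from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    ordered_on  TEXT NOT NULL
);
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES orders(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")

# Each fact is stored once: the customer's name and the product's price
# live only in their own entity tables.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(1, "Widget", 10.0), (2, "Manual", 5.0)])
conn.execute("INSERT INTO orders VALUES (1, 1, '2013-01-29')")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 2, 1)])

# Answering even a simple question means joining entities back together.
order_total = conn.execute("""
    SELECT SUM(l.quantity * p.unit_price)
    FROM order_line l
    JOIN product p ON p.product_id = l.product_id
    WHERE l.order_id = ?
""", (1,)).fetchone()[0]
print(order_total)
```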
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum (PostgreSQL derivative)
●   Any other traditional RDBMS
Cautions
●   Indexing
●   Replication
●   Partitioning
Kimball’s Conformed Dimensions
●   Normal database modeling does not meet needs of
    reporting and analysis
●   Denormalize data
●   Dimensions
       ●    How does data need to be filtered?
●   Facts
       ●    What do we want to analyze or measure?
Source: http://blog-mstechnology.blogspot.com/2010/06/bi-dimensional-model-star-schema.html
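The dimension/fact split can be sketched as a small star schema. This is an illustration only: the table names (dim_date, dim_product, fact_sales), the columns, and the sample rows are all hypothetical, and SQLite stands in for whichever database is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: how the data is filtered and grouped.
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    category    TEXT NOT NULL
);
-- Fact: what is measured, keyed by the dimensions.
CREATE TABLE fact_sales (
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    amount      REAL NOT NULL
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20130101, 2013, 1), (20130201, 2013, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Widget", "Hardware"), (2, "Manual", "Books")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(20130101, 1, 2, 20.0),
                  (20130101, 2, 1, 5.0),
                  (20130201, 1, 3, 30.0)])


def sales_by_category(conn, year):
    """Filter on a dimension (year) and aggregate a fact (amount)."""
    return conn.execute("""
        SELECT p.category, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_date d    ON d.date_key = f.date_key
        JOIN dim_product p ON p.product_key = f.product_key
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (year,)).fetchall()


star_result = sales_by_category(conn, 2013)
print(star_result)
```

Every reporting query follows the same shape: join the fact table to the dimensions it needs, filter on dimension columns, and aggregate the measures.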
Open Source Software
●   Greenplum (PostgreSQL derivative)
●   InfiniDB (MySQL derivative)
●   Infobright (MySQL derivative)
●   Other columnar data stores
Columnar Data Stores
●   Designed for conformed dimensions
●   High Performance
       ●   Self-indexing based on usage
       ●   High compression of data
Row vs Columnar Databases




Source: http://dbbest.com/blog/column-oriented-database-technologies/
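The difference between the two layouts can be sketched in plain Python. This is a toy model with hypothetical data, not how any of the products listed above store data internally: it only shows that an aggregate over one column needs to touch a single list in the columnar layout, and that repeated values within a column compress well (run-length encoding is used here as one simple example of a compression scheme).

```python
from itertools import groupby

# Row layout: each record is stored together.
rows = [
    ("2013-01-01", "NC", 20.0),
    ("2013-01-01", "NC", 5.0),
    ("2013-01-02", "NC", 30.0),
    ("2013-01-02", "VA", 12.5),
]

# Columnar layout: each column is stored together.
names = ("sale_date", "state", "amount")
columns = {name: [row[i] for row in rows] for i, name in enumerate(names)}

# Summing one measure: the row layout walks every full record,
# the columnar layout reads only the "amount" list.
total_from_rows = sum(row[2] for row in rows)
total_from_columns = sum(columns["amount"])


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_state = run_length_encode(columns["state"])
print(total_from_columns, encoded_state)
```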
Cautions
●   Traditional RDBMS
       ●   Not built for conformed dimensions!
       ●   Performance will become an issue
Inmon’s Hub and Spoke
●   Combines
        ●   3NF central data warehouse
        ●   Conformed dimensions
●   Becomes foundation for further variants
Linstedt’s Data Vault Model
●   Mixes 3NF and Conformed Dimensions
●   Model data per business entities and their
    relationships
●   Hubs
        ●   Store unique business entity identifiers (keys)
●   Links
        ●   Relate hubs and other links to form relationships
●   Satellites
        ●   Store unique information regarding entity or
              relationship
Source: http://danlinstedt.com/about/data-vault-basics/
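A minimal sketch of hubs, links, and satellites as tables, again using SQLite through Python. The table and column names are hypothetical. The load_date and record_source columns are not mentioned on the slide; they are included on the assumption that a Data Vault implementation tracks when and from where each row was loaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hubs: one row per unique business key.
CREATE TABLE hub_customer (
    customer_hk     INTEGER PRIMARY KEY,
    customer_number TEXT NOT NULL UNIQUE,
    load_date       TEXT NOT NULL,
    record_source   TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    INTEGER PRIMARY KEY,
    product_code  TEXT NOT NULL UNIQUE,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Link: relates hubs to record a relationship.
CREATE TABLE link_purchase (
    purchase_hk   INTEGER PRIMARY KEY,
    customer_hk   INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    product_hk    INTEGER NOT NULL REFERENCES hub_product(product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
-- Satellite: descriptive details for a hub, one row per change.
CREATE TABLE sat_customer (
    customer_hk INTEGER NOT NULL REFERENCES hub_customer(customer_hk),
    load_date   TEXT NOT NULL,
    name        TEXT,
    city        TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
""")
conn.execute("INSERT INTO hub_customer VALUES (1, 'C-100', '2013-01-01', 'crm')")
conn.execute("INSERT INTO hub_product VALUES (1, 'P-7', '2013-01-01', 'erp')")
conn.execute("INSERT INTO link_purchase VALUES (1, 1, 1, '2013-01-05', 'orders')")
# The customer moved: the satellite keeps both versions.
conn.executemany("INSERT INTO sat_customer VALUES (?, ?, ?, ?)",
                 [(1, "2013-01-01", "Ada", "Raleigh"),
                  (1, "2013-01-20", "Ada", "Durham")])


def current_city(conn, customer_number):
    """Return the most recently loaded city for a business key, or None."""
    row = conn.execute("""
        SELECT s.city
        FROM hub_customer h
        JOIN sat_customer s ON s.customer_hk = h.customer_hk
        WHERE h.customer_number = ?
        ORDER BY s.load_date DESC
        LIMIT 1
    """, (customer_number,)).fetchone()
    return row[0] if row else None


vault_city = current_city(conn, "C-100")
vault_history = conn.execute(
    "SELECT COUNT(*) FROM sat_customer WHERE customer_hk = 1").fetchone()[0]
print(vault_city, vault_history)
```

The hub never changes once a business key is loaded; all history accumulates in the satellite, which is why the latest row has to be selected explicitly.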
Cautions
●   While you get the best mix between 3NF and
    conformed dimensions, data marts are still needed
●   Issues seen with both 3NF and conformed
    dimensions can be found here
Open Source Software
●   MySQL
●   PostgreSQL
●   Greenplum
●   Other Traditional RDBMS
●   NoSQL
       ●   Hadoop
Rönnbäck’s Anchor Model/6NF
●   Focus is on the data and its relationships
●   Anchors
        ●   Model entities and events
●   Attributes
        ●   Model properties of anchors
●   Ties
        ●   Model relationships between anchors
●   Knots
        ●   Model relationships between shared properties
Source: http://en.wikipedia.org/wiki/Anchor_Modeling
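The four building blocks can be sketched as tables in SQLite. The names below are simplified and hypothetical (they do not follow the Anchor Modeling naming convention), and the knot is shown in one common role: a small shared set of values that an attribute refers to instead of storing the value itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Anchors: identity only, no descriptive columns.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE store    (store_id    INTEGER PRIMARY KEY);

-- Attributes: one table per property of an anchor.
CREATE TABLE customer_name (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    name        TEXT NOT NULL
);
CREATE TABLE store_city (
    store_id INTEGER PRIMARY KEY REFERENCES store(store_id),
    city     TEXT NOT NULL
);

-- Knot: a small, shared set of values.
CREATE TABLE tier (
    tier_id INTEGER PRIMARY KEY,
    tier    TEXT NOT NULL UNIQUE
);
-- Knotted attribute: points at the knot instead of holding a value.
CREATE TABLE customer_tier (
    customer_id INTEGER PRIMARY KEY REFERENCES customer(customer_id),
    tier_id     INTEGER NOT NULL REFERENCES tier(tier_id)
);

-- Tie: a relationship between anchors.
CREATE TABLE customer_shops_at_store (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    store_id    INTEGER NOT NULL REFERENCES store(store_id),
    PRIMARY KEY (customer_id, store_id)
);
""")
conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO store VALUES (1)")
conn.execute("INSERT INTO customer_name VALUES (1, 'Ada')")
conn.execute("INSERT INTO store_city VALUES (1, 'Raleigh')")
conn.execute("INSERT INTO tier VALUES (1, 'Gold')")
conn.execute("INSERT INTO customer_tier VALUES (1, 1)")
conn.execute("INSERT INTO customer_shops_at_store VALUES (1, 1)")

# Reassembling one readable row takes a join per attribute, knot, and tie.
anchor_result = conn.execute("""
    SELECT n.name, t.tier, sc.city
    FROM customer c
    JOIN customer_name n            ON n.customer_id  = c.customer_id
    JOIN customer_tier ct           ON ct.customer_id = c.customer_id
    JOIN tier t                     ON t.tier_id      = ct.tier_id
    JOIN customer_shops_at_store cs ON cs.customer_id = c.customer_id
    JOIN store_city sc              ON sc.store_id    = cs.store_id
""").fetchall()
print(anchor_result)
```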
Cautions
●   Number of joins will be an issue for some databases
●   Queries will become complex
        ●   Joins
        ●   Finding properties/valuable information
        ●   Every column in a traditional table becomes its own
             table
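To make the join growth concrete, the sketch below generates the query needed to rebuild one "wide" row when every column lives in its own table. The helper build_entity_query and the <anchor>_<attribute> table naming scheme are hypothetical, invented for this illustration.

```python
import sqlite3


def build_entity_query(anchor, attributes):
    """Build a SELECT that reassembles a wide row from per-attribute tables.

    Assumes each attribute table is named <anchor>_<attribute> and holds
    (<anchor>_id, value). Names are interpolated directly into the SQL,
    so they must come from trusted code, never from user input.
    """
    key = f"{anchor}.{anchor}_id"
    select_list = [key] + [f"{anchor}_{a}.value AS {a}" for a in attributes]
    lines = [f"SELECT {', '.join(select_list)}", f"FROM {anchor}"]
    for a in attributes:
        # One extra join for every column of the original wide table.
        lines.append(f"LEFT JOIN {anchor}_{a} ON {anchor}_{a}.{anchor}_id = {key}")
    return "\n".join(lines)


attributes = ["name", "email", "city"]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
for a in attributes:
    conn.execute(f"CREATE TABLE customer_{a} ("
                 "customer_id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO customer_name VALUES (1, 'Ada')")
conn.execute("INSERT INTO customer_city VALUES (1, 'Raleigh')")  # no email stored

query = build_entity_query("customer", attributes)
wide_rows = conn.execute(query).fetchall()
print(query)
print(wide_rows)
```

A table with dozens of columns therefore turns into a query with dozens of joins, which is where both the complexity and the database-specific join limits come from. LEFT JOIN is used so that an entity with a missing attribute still appears, with NULL in that position.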
?
Open Source Software
●   Anchor Modeling website
        ●   http://www.anchormodeling.com
        ●   Web based design tools
●   No databases built specifically for 6NF
