Data Warehousing
   Solutions
   with MySQL

   A Breakfast Seminar in London
   4th Feb 2010




Sunday, 7 February 2010
 9:00 - Welcome Coffee and Tea
 9:20 - Introduction
 9:30 - MySQL for Data Warehousing
10:00 - Infobright
10:30 - Coffee/Tea Break
10:45 - Talend
11:30 - Seminar Ends.

Introduction
MySQL Market Segments


      • Web / Web 2.0
      • OEM / ISVs
      • On Demand, SaaS, Hosting
      • Telecommunications
      • Enterprise 2.0

      Open Source Powers the Web & the Network

Timeline
      MAR 2008   Sun completes its acquisition of MySQL
                 A good acquisition: MySQL continues to grow
      APR 2009   Oracle (ORCL) agrees to acquire Sun
      JAN 2010   The EC gives full clearance to the acquisition
      FEB 2010   We continue to develop, maintain, market, sell and
                 support MySQL!


Oracle’s MySQL Strategy
      • Becomes part of the Open Source GBU
          > Independent sales organisation - retained from Sun
          > Independent development organisation - retained from Sun
      • Make MySQL better
          > Apply Oracle’s expertise and engineering processes
          > A natural extension of what Oracle has done with InnoDB
      • Make MySQL support better
          > Leverage Oracle’s award-winning global support infrastructure
      • Make MySQL part of the Oracle stack
          > Many customers use both MySQL and Oracle database
          > Integrate with Enterprise Manager, Secure Backup, Audit Vault

  http://www.oracle.com/ocom/groups/public/@ocom/documents/webcontent/044521.pdf

Enjoy the event!

Data Warehousing
  with MySQL
MySQL Data Warehousing Strategy
        • Strongly support common data warehouse use cases
        • Offer modern technology that adheres to MySQL’s
          software priorities (reliability, performance, ease-of-use)
        • Partner with major BI/ETL vendors
        • Offer highly attractive total cost of ownership




The MySQL DW Ecosystem
      [Figure: the MySQL DW stack, from top to bottom]
        • ETL, BI/Reporting Tools, Integration
        • RDBMS
        • Storage Engine
        • Platform

Common Use Cases
         1. Small, semi real-time data marts
         2. Continuous, real-time/query data warehousing
         3. Traditional, standard reporting warehouse
         4. Massive historical warehouse with ad hoc queries
         5. BI and analytics in OLTP applications (emerging…)

         [Figure: the five use cases, labelled Data Mart, Real-Time, Traditional, Historical and Analytical, all accessed through SQL]




MySQL Technical Strategy
    • Provide an open source architecture to maximize innovation
    • Offer a core data warehousing feature set
    • Provide specialised data warehouse engines for key use cases
    • Supply strategies for combating the mixed-workload challenge




Pluggable Storage Engine Architecture




MySQL Enterprise
      Server
          • MySQL Enterprise Server
          • Monthly Rapid Updates
          • Quarterly Service Packs
          • Hot Fix Program
          • Indemnification

      Monitor
          • Global Monitoring of All Servers
          • Web-Based Central Console
          • Built-in Advisors and Expert Advice
          • MySQL Query Analyzer
          • Replication Monitor

      Support
          • 24 x 7 x 365 Production Support
          • Web-Based Knowledge Base
          • Consultative Help
          • High Availability and Scale Out

      http://www.mysql.com/products/enterprise/

MySQL Enterprise Monitor
      “Your Virtual MySQL DBA” Assistant

      • Single, consolidated view into the entire MySQL environment
      • Auto discovery of MySQL Servers, Replication Topologies
      • New Query Analyzer
      • Customisable rules-based monitoring and alerts
      • Identifies problems before they occur
      • Reduces risk of downtime
      • Makes it easier to scale out without requiring more DBAs

      http://www.mysql.com/products/enterprise/advisors.html
MySQL Query Analyzer

      • Centralised monitoring of queries across all servers
      • No reliance on Slow Query Logs, SHOW PROCESSLIST, vmstat, etc.
      • Aggregated view of query execution counts, time, and rows
      • Saves time parsing atomic executions for total query expense

      “Finds code problems before your customers do.”
The MySQL Technology behind a DW Strategy
      [Figure: the building blocks of a MySQL DW strategy]
        • Sharding
        • Replication
        • MySQL Proxy
        • Memcached
        • Query Cache
        • Partitioning
        • Storage Engines




Warehouse use cases/mapping
    Data Mart             Real-Time      Traditional    Historical     Analytical

                                                            SQL




 •MyISAM                  •MyISAM        •MyISAM        •MyISAM        •MyISAM
 •InnoDB                  •InnoDB        •InnoDB        •InnoDB        •InnoDB
 •CSV                     •CSV           •CSV           •CSV           •CSV
 •Archive                 •Archive       •Archive       •Archive       •Archive
 •Federated               •Federated     •Federated     •Federated     •Federated
 •Query Cache             •Query Cache   •Query Cache   •Query Cache   •Query Cache
 •Replication             •Replication   •Replication   •Replication   •Replication
 •Sharding                •Sharding      •Sharding      •Sharding      •Sharding
 •Proxy                   •Proxy         •Proxy         •Proxy         •Proxy
 •Memcached               •Memcached     •Memcached     •Memcached     •Memcached
MySQL
   Data Warehouse
   Cookbook
Partitioning
   • Partition Pruning
   • Partitioning key must result in an INT
   • Watch for table-level locking with MyISAM
   • Check the number of open files (each partition has its own files)
   • Foreign keys, fulltext and spatial indexes are not supported
   • With MyISAM: no LOAD INDEX or INSERT DELAYED on partitioned tables
   • For DW, it is mainly limited to InnoDB and MyISAM
   [Figure: Vertical Partitioning (the columns Col1..Col5 of one table split across several tables) vs Horizontal Partitioning (the rows of one table split across partitions that keep all of Col1..Col5)]
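
To make partition pruning concrete, here is a minimal Python sketch of the idea, not MySQL's implementation. It assumes a table partitioned by RANGE on the year of a date column (an expression that yields an INT, as the slide requires). The partition names, bounds and helper functions are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical RANGE partitions, mirroring "VALUES LESS THAN (upper)" bounds.
# None stands for MAXVALUE. Names and bounds are illustrative assumptions.
PARTITIONS = [("p2007", 2008), ("p2008", 2009), ("p2009", 2010), ("pmax", None)]


def partition_key(d: date) -> int:
    """The partitioning expression: it must evaluate to an integer."""
    return d.year


def partition_for(d: date) -> str:
    """Return the single partition a row with this date is stored in."""
    key = partition_key(d)
    for name, upper in PARTITIONS:
        if upper is None or key < upper:
            return name
    raise ValueError("no partition accepts this key")


def prune(start: date, end: date) -> list:
    """Return only the partitions that can hold rows with start <= d <= end."""
    lo, hi = partition_key(start), partition_key(end)
    selected = []
    lower = None  # inclusive lower bound of the current partition
    for name, upper in PARTITIONS:
        below_upper = upper is None or lo < upper
        above_lower = lower is None or hi >= lower
        if below_upper and above_lower:
            selected.append(name)
        lower = upper
    return selected


print(prune(date(2008, 6, 1), date(2009, 3, 1)))
```

A query restricted to mid-2008 through early 2009 touches two of the four partitions here; the others are never opened, which is where the saving comes from.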




SQL Generation
      • Multipass SQL or Subqueries
      • Avoid complex queries
          > More efficient use of query cache, key buffer and buffer pool
          > More shard friendly
          > More scalable for the current version of MySQL
            – No parallel query
      • Use temp tables and stored procedures
      • Check with EXPLAIN
          > ALL (sequential scan)
          > Using filesort
          > Using temporary (for GROUP BY and ORDER BY)
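
The multipass approach with temp tables can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection, and the table names, columns and data are hypothetical. Each pass is a simple statement, which is the property the slide argues for.

```python
import sqlite3

# SQLite stands in for MySQL here; schema and data are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (product_id INTEGER, amount INTEGER);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'books'), (3, 'music');
INSERT INTO fact_sales VALUES (1, 10), (2, 20), (3, 40), (1, 5);
""")


def multipass_total(conn, category):
    """Total sales for a category, computed in two simple passes."""
    # Pass 1: resolve the dimension filter into a small temp table.
    conn.execute("DROP TABLE IF EXISTS tmp_products")
    conn.execute("CREATE TEMP TABLE tmp_products (product_id INTEGER PRIMARY KEY)")
    conn.execute(
        "INSERT INTO tmp_products "
        "SELECT product_id FROM dim_product WHERE category = ?",
        (category,),
    )
    # Pass 2: join the fact table against the temp table only.
    row = conn.execute(
        "SELECT SUM(f.amount) FROM fact_sales f "
        "JOIN tmp_products t ON t.product_id = f.product_id"
    ).fetchone()
    return row[0]


print(multipass_total(conn, "books"))
```

On MySQL, each pass would be checked separately with EXPLAIN for the warning signs listed above.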


Server Tuning
 Query Cache
   • SELECT SQL_NO_CACHE ...
   • query_cache_type
   • query_cache_limit
   • query_cache_size
   • No time functions (queries that call NOW() and similar are not cached)

 Temporary Tables
   • tmp_table_size
   • max_heap_table_size
   • Implicit tmp tables can be tricky to control
   • Store intermediate results
   • Connect > Query > Disconnect

 Thread Buffers
   • join_buffer_size
   • read_buffer_size
   • read_rnd_buffer_size
   • sort_buffer_size
   • For large result sets and high numbers of concurrent users,
     they should be set individually or by role
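
One way to set the thread buffers "by role" is to issue SET SESSION statements right after connecting. The sketch below only builds those statements. The role names and byte values are hypothetical placeholders, not tuning recommendations; only the four variable names come from the slide.

```python
# Hypothetical per-role session settings, in bytes. The roles and values are
# assumptions for illustration and should be replaced by measured numbers.
ROLE_BUFFERS = {
    "power_user": {
        "sort_buffer_size": 8 * 1024 * 1024,
        "join_buffer_size": 4 * 1024 * 1024,
        "read_buffer_size": 1024 * 1024,
        "read_rnd_buffer_size": 1024 * 1024,
    },
    "dashboard": {
        "sort_buffer_size": 256 * 1024,
        "join_buffer_size": 256 * 1024,
    },
}


def session_statements(role):
    """Build the SET SESSION statements to run after connecting as this role."""
    settings = ROLE_BUFFERS[role]
    return [
        "SET SESSION {} = {}".format(name, value)
        for name, value in sorted(settings.items())
    ]


for statement in session_statements("power_user"):
    print(statement)
```

Keeping the global defaults small and raising buffers only for the sessions that need them limits memory use when many users are connected at once.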




Modelling
 • Multidimensional, but with care
 • Snowflake vs Star Schema
   > Do not denormalise descriptions
   > Multiple fact tables with 1:1 relationships
 • Queries
   > Query on Dimension N > Temp Table
   > Query on Fact 1 > Temp Table
   > Query on Fact 2 Join Temp Table

 [Figure: star and snowflake schema diagrams. Dimension tables (Key, Desc), some snowflaked with extra keys, surround a fact table (PK, Keys, Metrics); a wide fact table is also shown split into two fact tables that share the same PK.]
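
The three query steps can be sketched with two fact tables in a 1:1 relationship on the same primary key. This is an assumption-laden illustration: sqlite3 stands in for MySQL, and every table, column and value is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; the schema below is an illustrative assumption.
# fact_revenue and fact_cost share the same pk (a 1:1 relationship).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_key INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE fact_revenue (pk INTEGER PRIMARY KEY, store_key INTEGER, revenue INTEGER);
CREATE TABLE fact_cost (pk INTEGER PRIMARY KEY, cost INTEGER);
INSERT INTO dim_store VALUES (1, 'north'), (2, 'south');
INSERT INTO fact_revenue VALUES (100, 1, 50), (101, 1, 70), (102, 2, 30);
INSERT INTO fact_cost VALUES (100, 20), (101, 25), (102, 10);
""")


def region_totals(conn, region):
    """Return (total revenue, total cost) for one region in three steps."""
    conn.executescript("""
    DROP TABLE IF EXISTS tmp_dim;
    DROP TABLE IF EXISTS tmp_fact1;
    CREATE TEMP TABLE tmp_dim (store_key INTEGER PRIMARY KEY);
    CREATE TEMP TABLE tmp_fact1 (pk INTEGER PRIMARY KEY, revenue INTEGER);
    """)
    # Step 1: query on the dimension, result into a temp table.
    conn.execute(
        "INSERT INTO tmp_dim SELECT store_key FROM dim_store WHERE region = ?",
        (region,),
    )
    # Step 2: query on fact 1, restricted by step 1, into a temp table.
    conn.execute(
        "INSERT INTO tmp_fact1 "
        "SELECT f.pk, f.revenue FROM fact_revenue f "
        "JOIN tmp_dim d ON d.store_key = f.store_key"
    )
    # Step 3: query on fact 2 joined to the temp table on the shared pk.
    return conn.execute(
        "SELECT SUM(t.revenue), SUM(c.cost) FROM tmp_fact1 t "
        "JOIN fact_cost c ON c.pk = t.pk"
    ).fetchone()


print(region_totals(conn, "north"))
```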




Storage Engines
 MyISAM
   • Compressed Tables
   • Use different spindles for data and indexes
   • Fast inserts - Insert already sorted data (when possible)
   • Key Buffers
      • Multiple Key Buffers
       • SET GLOBAL <key_cache_name>.key_buffer_size...
       • CACHE INDEX ... IN ...
       • key_cache_block_size
       • bulk_insert_buffer_size
   • Spatial and Fulltext indexes
   • All active shared disk cluster

 InnoDB
   • innodb_file_per_table
   • innodb_flush_log_at_trx_commit
   • innodb_buffer_pool_size
   • The new InnoDB plugin
      • Fast index creation
      • Data compression
   • Do not use FK or constraints

 CSV
   • Good ETL trick
   • No partitioning, no indexing, no nulls

 Archive
   • Data compression and fast retrieval
   • INSERT & SELECT
   • No index (autoincrement only)

 Federated
   • Limited indexing
   • Tips:
       • Queries can be executed on multiple servers + result collection
       • Use of stored procedures to consolidate results and
         control the access to the FEDERATED tables
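
The two MyISAM key cache statements named above can be filled in as in this small sketch, which only builds the SQL text. The cache name, size and table names are hypothetical; the statement shapes follow the `SET GLOBAL <key_cache_name>.key_buffer_size` and `CACHE INDEX ... IN ...` forms from the slide.

```python
def key_cache_statements(cache_name, size_bytes, tables):
    """Build the statements that create a named key cache and assign tables to it.

    cache_name, size_bytes and tables are caller-supplied; the values used
    below are illustrative assumptions, not recommendations.
    """
    if not tables:
        raise ValueError("at least one table is required")
    return [
        "SET GLOBAL {}.key_buffer_size = {}".format(cache_name, size_bytes),
        "CACHE INDEX {} IN {}".format(", ".join(tables), cache_name),
    ]


for statement in key_cache_statements(
    "hot_cache", 128 * 1024 * 1024, ["fact_sales", "dim_date"]
):
    print(statement)
```

A dedicated cache keeps the indexes of heavily queried tables from being evicted by scans on other tables.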


Replication

     • [For some] The easiest way to provide real-time data marts
     • Tips:
          > Delayed replication
          > Rotating servers
          > Support for more power users

     [Figure: a source master replicates to rotating slaves, which alternate between updating and querying. A second diagram shows BI/report servers reading from slaves that trail the source master by fixed delays: real time, -10 min, -30 min, -1 hour, -12 hours, yesterday.]
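
With slaves held at fixed delays, a report server has to choose which one to read from. The sketch below shows one possible routing policy, not something prescribed by the slides: send each report to the most delayed slave that is still fresh enough, which keeps the real-time slave free for the queries that need it. The slave names are hypothetical; the delays mirror the figure.

```python
from datetime import timedelta

# Hypothetical slaves and how far each trails the master. How the delay is
# enforced is outside the scope of this sketch.
SLAVE_DELAYS = {
    "slave_rt": timedelta(0),
    "slave_10m": timedelta(minutes=10),
    "slave_30m": timedelta(minutes=30),
    "slave_1h": timedelta(hours=1),
    "slave_12h": timedelta(hours=12),
}


def pick_slave(max_staleness):
    """Return the most delayed slave whose data is fresh enough for a report."""
    candidates = [
        (delay, name)
        for name, delay in SLAVE_DELAYS.items()
        if delay <= max_staleness
    ]
    if not candidates:
        raise ValueError("no slave satisfies the staleness limit")
    return max(candidates)[1]


print(pick_slave(timedelta(minutes=45)))
```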

Sharding
       • Sharding
            > Great to distribute the workload
            > Fantastic if the queries can be executed in parallel thanks to a middle-tier or
              client layer
            > Tips:
                 – Replicate the dimensions
                 – Specialise shards on facts
                      – Partition facts on shards

       [Figure: a dimensions master replicates the dimension tables to every shard (A1, A2, B, C1, C2, D); the BI/report servers read from the shards.]
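
The parallel execution that a middle or client layer provides can be sketched as a scatter/gather step. In this illustration each shard is an in-memory list of fact rows, and the dimension lookup is a dict assumed to be replicated to every shard; the names and data are hypothetical, and a real layer would send SQL to each shard instead.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Dimension data, assumed replicated to every shard (illustrative values).
DIM_PRODUCT = {1: "books", 2: "music"}

# Fact rows (product_id, amount) partitioned across hypothetical shards.
SHARDS = {
    "A": [(1, 10), (2, 5)],
    "B": [(1, 7)],
    "C": [(2, 3), (2, 4)],
}


def shard_totals(fact_rows):
    """Per-shard work: aggregate local facts using the local dimension copy."""
    totals = Counter()
    for product_id, amount in fact_rows:
        totals[DIM_PRODUCT[product_id]] += amount
    return totals


def query_all_shards():
    """Client layer: run the per-shard query in parallel and merge the results."""
    merged = Counter()
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        for partial in pool.map(shard_totals, SHARDS.values()):
            merged.update(partial)
    return dict(merged)


print(query_all_shards())
```

Because the dimensions are present on every shard, each shard can answer its part of the query without talking to the others; only the small partial aggregates travel back to the client layer.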


More Resources Available
                          • Webinars
                            http://www-it.mysql.com/news-and-events/web-seminars/

                          • Consulting
                            • MySQL Architecture & Design
                            • MySQL Performance Tuning
                            http://www.mysql.com/consulting/

                          • Training
                            • MySQL 5.1 for Developers
                            • MySQL 5.1 for DBAs
                            http://www.mysql.com/training/

                          • White Papers
                            http://www.mysql.com/why-mysql/white-papers/




Thank You!
   Data Warehouse Solutions
   with MySQL


   ivan@mysql.com
   http://izoratti.blogspot.com

