Making MySQL Great For Business Intelligence


This presentation describes how to make MySQL a great database for business intelligence, with a special focus on column databases and InfiniDB from Calpont.

Published in: Technology

  1. Making MySQL Great for Business Intelligence
     Robin Schumacher, VP Products, Calpont
  2. Agenda
     • Quick overview of BI
     • Looking at the right technology foundation
     • General physical MySQL design decisions that impact success
     • A look at row vs. column MySQL databases
     • Conclusions
  3. A Quick Overview of Business Intelligence
  4. What is Business Intelligence?
     Business Intelligence (BI) refers to skills, processes, technologies, applications and practices used to support decision making. BI technologies provide historical, current, and predictive views of business operations. Common functions of Business Intelligence technologies are reporting, online analytical processing, analytics, data mining, business performance management, benchmarking, text mining, and predictive analytics.
  5. Why Business Intelligence?
     • All companies now recognize the need for BI
     • Information is a weapon that both large and small companies use to better understand their customers, competitors, and marketplace
     • Making poorly informed decisions can be disastrous
  6. Overview of Most BI Frameworks
     [Diagram. Labels: Operational Source Data (OLTP, Files/XML, Log Files); ETL; Staging Area (Staging or ODS); Final ETL; Data Warehouse; Purge/Archive; Warehouse Archive; Reporting, BI, Notification Layer (Ad-Hoc, Dashboards, Reports, Notifications); Users; Data Warehouse and Metadata Management]
  7. Simple Reporting Databases
     [Diagram. Labels: End Users; Application Servers; OLTP Database; Read Shard One; Reporting Database; Data Archiving; ETL; Link; Replication]
  8. Building the Right Technical Foundation
  9. What is the Key Component for Success?
     In other words, what you do with your MySQL Server, in terms of physical design, schema design, and performance design, will be the biggest factor in whether a BI system hits the mark…
     * Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
  10. What Technology Decisions are Being Made?
     * Philip Russom, “Next Generation Data Warehouse Platforms”, TDWI, 2009.
  11. What General MySQL Design Decisions Help Success?
  12. First: Get/Use a Modeling Tool
  13. Horizontal Partitioning Model
  14. Read Sharding / Horizontal Partitioning
  15. Vertical Partitioning Model
  16. General List of Top BI Design Decisions
     • Storage Engine Selection
     • Physical Table/Index Partitioning
     • Index Creation and Placement
     • Proper sizing of memory caches, etc.
     • Row vs. Column Engine / Database
  17. Core BI Features for MySQL
     • No practical storage limits (1 tablespace=110TB)
     • Automatic storage management
     • ANSI-SQL support for all datatypes (including BLOB and XML)
     • Data/Index partitioning (range, hash, key, list, composite)
     • Built-in Replication
     • Main memory tables (for dimension tables)
     • Variety of indexes (b-tree, fulltext, clustered, hash, GIS)
     • Multiple-configurable data/index caches
     • Pre-loading of index data into index caches
     • Unique query cache (caches result set + query; not just data)
     • Parallel data load (5.1 and higher; multiple files)
     • Multi-insert DML
     • Data compression (depends on engine)
     • Read-only tables
     • Fast connection pooling
     • Cost-based optimizer
     • Wide platform support
  18. Storage Engines Internal to MySQL
     MyISAM
     • High-speed query/insert engine
     • Non-transactional, table locking
     • Good for data marts, small warehouses
     Archive
     • Compresses data by up to 80%
     • Fastest for data loads
     • Only allows inserts/selects
     • Good for seldom accessed data
     Memory
     • Main memory tables
     • Good for small dimension tables
     • B-tree and hash indexes
     CSV
     • Comma separated values
     • Allows both flat file access and editing as well as SQL query/DML
     • Allows instantaneous data loads
     Also: Merge for pre-5.1 partitioning
  19. Partitioning and Performance (5.1+)

     mysql> CREATE TABLE part_tab
         -> ( c1 int, c2 varchar(30), c3 date )
         -> PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
         -> PARTITION p1 VALUES LESS THAN (1996), PARTITION p2 VALUES LESS THAN (1997),
         -> PARTITION p3 VALUES LESS THAN (1998), PARTITION p4 VALUES LESS THAN (1999),
         -> PARTITION p5 VALUES LESS THAN (2000), PARTITION p6 VALUES LESS THAN (2001),
         -> PARTITION p7 VALUES LESS THAN (2002), PARTITION p8 VALUES LESS THAN (2003),
         -> PARTITION p9 VALUES LESS THAN (2004), PARTITION p10 VALUES LESS THAN (2010),
         -> PARTITION p11 VALUES LESS THAN MAXVALUE );

     mysql> create table no_part_tab (c1 int, c2 varchar(30), c3 date);

     *** Load 8 million rows of data into each table ***

     mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
     +----------+
     | count(*) |
     +----------+
     |   795181 |
     +----------+
     1 row in set (38.30 sec)

     mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
     +----------+
     | count(*) |
     +----------+
     |   795181 |
     +----------+
     1 row in set (3.88 sec)

     90% Response Time Reduction (38.30 sec unpartitioned vs. 3.88 sec partitioned)
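The speedup on the partitioning slide comes from the server reading only the partitions that can contain matching rows. The following is a minimal Python sketch of that idea, not MySQL code: the partition bounds are copied from the slide, while the function names and the toy data (one row per month) are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date

# Exclusive upper bounds of partitions p0..p10 from the slide.
# p11 (MAXVALUE) is modeled as the slot after the last bound.
BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]

def partition_for(d):
    """Return the index of the partition that a row with date d lands in."""
    return bisect_right(BOUNDS, d.year)

def partitions_to_scan(lo, hi):
    """Partitions that can hold rows with lo <= c3 <= hi; all others are pruned."""
    return list(range(partition_for(lo), partition_for(hi) + 1))

def count_between(partitions, lo, hi):
    """Count rows with lo < c3 < hi, reading only the unpruned partitions."""
    matched = scanned = 0
    for p in partitions_to_scan(lo, hi):
        for d in partitions[p]:
            scanned += 1
            if lo < d < hi:
                matched += 1
    return matched, scanned

# Toy data (hypothetical): one row per month for the years 1990..2011.
rows = [date(y, m, 1) for y in range(1990, 2012) for m in range(1, 13)]
partitions = [[] for _ in range(len(BOUNDS) + 1)]
for d in rows:
    partitions[partition_for(d)].append(d)

lo, hi = date(1995, 1, 1), date(1995, 12, 31)
matched, scanned = count_between(partitions, lo, hi)
full_scan_matched = sum(1 for d in rows if lo < d < hi)
print(partitions_to_scan(lo, hi), matched, scanned, len(rows))
```

In this toy model the pruned query inspects only the rows of one partition, while the unpartitioned equivalent has to inspect every row to return the same count. Real timings depend on the storage engine, caching, and data distribution.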
  20. Index Creation and Placement
     • If query patterns are known and predictable, and data is relatively static, then indexing isn’t that difficult
     • In a very ad-hoc environment, indexing becomes more difficult. You must analyze SQL traffic and index as best you can
     • Over-indexing a table that is frequently loaded / refreshed / updated can severely impact load and DML performance. Test dropping and re-creating indexes vs. doing in-place loads and DML. Realize, though, that queries run while the indexes are dropped will be affected
     • Index maintenance (rebuilds, etc.) can cause issues in MySQL (locking, etc.)
     • Remember some storage engines don’t support normal indexes (Archive, CSV)
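The "drop and re-create vs. in-place" trade-off from the indexing slide can be sketched with a toy model. Here a sorted Python list stands in for an index; this is an assumption for illustration only and says nothing about how MySQL storage engines implement indexes. The function names are hypothetical.

```python
import random
from bisect import insort

def load_in_place(index, new_keys):
    """Keep the index sorted after every inserted key (in-place load)."""
    for k in new_keys:
        insort(index, k)
    return index

def load_then_rebuild(index, new_keys):
    """Append everything unordered, then rebuild the ordering once at the end."""
    unsorted = list(index)   # while loading, this copy cannot serve ordered lookups
    unsorted.extend(new_keys)
    unsorted.sort()          # single rebuild after the load
    return unsorted

random.seed(0)
existing = sorted(random.sample(range(1_000_000), 1000))
batch = random.sample(range(1_000_000), 5000)

a = load_in_place(list(existing), batch)
b = load_then_rebuild(existing, batch)
print(a == b)
```

Both strategies end with the same index contents; they differ in how much work is done per inserted key and in whether the index is usable during the load. Which one is faster depends on the batch size relative to the table size, which is why the slide recommends testing both.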
  21. Row vs. Column Engines / Databases
  22. Column vs. Row Orientation
     A column-oriented architecture looks the same on the surface, but stores data differently than legacy/row-based databases…
  23. Why a Column Database?
     • Column databases only read the columns needed to satisfy a query vs. full rows
     • If you are only selecting a subset of columns from a table and / or are using very wide tables, column DBs are a great choice for BI
     • Column databases (most of them) remove the need for indexing because the column is the index
     • Column databases automatically eliminate unnecessary I/O both logically and physically, so they also remove the need for partitioning, materialized views, etc.
     • As a rule of thumb, column databases provide 5-10x (or more) the query performance of legacy RDBMSs
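The first point on the slide above, that a column database reads only the columns a query needs, can be made concrete with a back-of-the-envelope model. The table layout, column names, and byte widths below are hypothetical, and the model ignores compression, indexes, and caching.

```python
# Hypothetical wide fact table: column name -> bytes per stored value.
COLUMN_WIDTHS = {
    "order_id": 8, "customer_id": 8, "order_date": 4, "ship_date": 4,
    "status": 1, "amount": 8, "discount": 8, "comment": 80,
}

def row_store_bytes(n_rows, needed):
    """A row store reads whole rows, whichever columns the query needs."""
    return n_rows * sum(COLUMN_WIDTHS.values())

def column_store_bytes(n_rows, needed):
    """A column store reads only the columns named in the query."""
    return n_rows * sum(COLUMN_WIDTHS[c] for c in needed)

n = 1_000_000
needed = ["order_date", "amount"]   # e.g. total amount per order date
row_b = row_store_bytes(n, needed)
col_b = column_store_bytes(n, needed)
print(row_b, col_b, row_b / col_b)
```

The ratio printed here is a property of the assumed column widths, not a benchmark. Requesting every column makes the two byte counts equal, which matches the caveat about `SELECT *` queries.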
  24. Why a Column Database?
     "If you're bringing back all the columns, a column-store database isn't going to perform any better than a row-store DBMS, but analytic applications are typically looking at all rows and only a few columns. When you put that type of application on a column-store DBMS, it outperforms anything that doesn't take a column-store approach."
     - Donald Feinberg, Gartner Group
  25. Why Not a Column Database?
     • If you routinely have SELECT * queries or queries that request the majority of columns in a table
     • If you are constantly doing lots of singleton inserts and deletes. As these are row-based operations, they will normally run somewhat slower on a column DB than on a row-oriented DB (more block touches are needed). Updates tend to run OK as they are a column operation
     • If you want to do pure OLTP work. Some column DBs are transactional (so data integrity is ensured), but they are not suited for straight OLTP work
     • If you have a small database: at that size, the advantage column databases offer over row DBs is negligible
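The remark about block touches can be illustrated with a toy model in which each row (row store) or each column (column store) is a separate storage location. The classes and the `touches` counter below are hypothetical and only count writes; they do not model real block I/O.

```python
class RowStore:
    """Toy row-oriented table: each row is stored as one unit."""
    def __init__(self, columns):
        self.columns = list(columns)
        self.rows = []
        self.touches = 0

    def insert(self, row):
        self.rows.append(dict(row))
        self.touches += 1            # the whole row lands in one place

    def update(self, i, column, value):
        self.rows[i][column] = value
        self.touches += 1

class ColumnStore:
    """Toy column-oriented table: each column is stored separately."""
    def __init__(self, columns):
        self.data = {c: [] for c in columns}
        self.touches = 0

    def insert(self, row):
        for c, values in self.data.items():
            values.append(row[c])
            self.touches += 1        # one touch per column

    def update(self, i, column, value):
        self.data[column][i] = value
        self.touches += 1            # only the updated column is touched

cols = ["id", "name", "city", "amount"]
row = {"id": 1, "name": "a", "city": "x", "amount": 10}
rs, cs = RowStore(cols), ColumnStore(cols)
rs.insert(row)
cs.insert(row)
insert_touches = (rs.touches, cs.touches)
rs.update(0, "amount", 20)
cs.update(0, "amount", 20)
print(insert_touches, (rs.touches, cs.touches))
```

In this model a singleton insert costs the column store one write per column, while a single-column update costs the same in both layouts, which is the asymmetry the slide describes.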
  26. What is Calpont’s InfiniDB?
     InfiniDB is an open source, column-oriented database architected to handle data warehouses, data marts, analytic/BI systems, and other read-intensive applications. It delivers true scale up (more CPUs/cores, RAM) and massively parallel processing (MPP) scale out capabilities for MySQL users. Linear performance gains are achieved when adding either more capabilities to one box or using commodity machines in a scale out configuration.
     [Diagram. Labels: Scale Up; Scale Out]
  27. InfiniDB vs. a Leading Row RDBMS
     2 TB of raw data; 16 CPU, 16 GB RAM, 14 SAS 15K RPM RAID-0, 512 MB cache
  28. Percona’s Test of Column Databases
     610 GB of raw data; 8-core machine
  29. Calpont Solutions
     Calpont Analytic Database Server Editions
     • InfiniDB Community Server: Column-Oriented; Multi-threaded; Terabyte Capable; Single Server
     • InfiniDB Enterprise Server: Scale out / Parallel Processing; Automatic Failover
     Calpont Analytic Database Solutions
     • InfiniDB Enterprise Solution: Monitoring; 24x7 Support; Auto Patch Management; Alerts & SNMP Notifications; Hot Fix Builds; Consultative Help
  30. InfiniDB Community & Enterprise Server Comparison
     Core Database Server Features (Yes in both InfiniDB Community and InfiniDB Enterprise):
     • Column-oriented
     • No indexing necessary
     • Automatic vertical (column) and logical horizontal partitioning of data
     • MVCC support: snapshot read (readers don’t block writers)
     • Alter Table with online add column capability
     • High concurrency supported
     • Terabyte database capable
     • Crash-recovery
     • Multi-threaded engine (queries/writes will use all CPUs/cores on box)
     • High-Speed bulk loader w/ no blocking queries while loading
     • Logical data compression
     • MySQL front end
     • Transaction support (ACID compliant)
     • INSERT/UPDATE/DELETE (DML) support
     Differences between editions:
     • Support: InfiniDB Community: Forums Only; InfiniDB Enterprise: Formal Production Support
     • Multi-Node, MPP scale out capable w/ failover: InfiniDB Community: No; InfiniDB Enterprise: Yes
  31. For More Information
     • Download InfiniDB Community Edition
     • Download InfiniDB documentation
     • Read InfiniDB technical white papers
     • Read InfiniDB intro articles on MySQL dev zone
     • Visit InfiniDB online forums
     • Trial the InfiniDB Enterprise Edition: