Building High Performance MySql Query Systems And Analytic Applications

Building High-Performance MySQL Query Systems and Analytic Applications Robin Schumacher

Agenda ,[object Object],[object Object],[object Object],[object Object]

What are we talking about? ,[object Object],[object Object],[object Object]

Reporting and Business Intelligence DB’s ,[object Object],[object Object],[object Object]

Data Warehouses/Marts/Analytic DB’s OLTP Files/XML Log Files Operational Source Data Staging or ODS ETL Final ETL Reporting, BI, Notification Layer Ad-Hoc Dashboards Reports Notifications Users Staging Area Data Warehouse Warehouse Archive Purge/Archive Data Warehouse and Metadata Management

Reporting Databases OLTP Database Read Shard One Reporting Database Application Servers End Users ETL Data Archiving Link Replication

Application Sharding / Partitioning ,[object Object],[object Object],[object Object]

What are the core rules to follow in order to avoid anxiety over building fast read-intensive, reporting, and analytic databases?

#1 Only Read the Data You Need ,[object Object],[object Object],[object Object],[object Object]

#2 Exploit Modern Hardware ,[object Object],[object Object],[object Object],[object Object]

#3 Divide and Conquer ,[object Object],[object Object],[object Object],[object Object]

#4 Scale both I/O and User Connections ,[object Object],[object Object],[object Object]

#5 Provide Transparent Expansion and Failover ,[object Object],[object Object],[object Object]

#6 Load New Data with Minimal Impact ,[object Object],[object Object],[object Object],[object Object]

#7 Quickly Troubleshoot Poor Read Performance ,[object Object],[object Object],[object Object]

Good suggestions, but how can I practically do all these things…?

What is Calpont’s InfiniDB? InfiniDB is an open source, column-oriented database architected to handle data warehouses, data marts, analytic/BI systems, and other read-intensive applications. It delivers true scale up (more CPU’s/cores, RAM) and massive parallel processing (MPP) scale out capabilities for MySQL users. Linear performance gains are achieved when adding either more capabilities to one box or using commodity machines in a scale out configuration. Scale up Scale Out

#1 Only Read the Data You Need ,[object Object],[object Object],[object Object],[object Object],[object Object],Recommendation : Start using a column-oriented database Caveat : if you are reading all (select *) or most of the columns in a table, then a column database may not be right for your application.

Column vs. Row Orientation A column-oriented architecture looks the same on the surface, but stores data differently than legacy/row-based databases…

#2 Exploit Modern Hardware ,[object Object],[object Object],[object Object],[object Object],Recommendation : Use databases/storage engines that scale up (i.e. use available CPU’s/cores)

InfiniDB Community – Scale Up InfiniDB Community edition is a FOSS, multi-threaded database server that is capable of using a machine’s CPUs/cores to process queries 87% 22.14 164.12 Q3.2 83% 55.04 316.79 Q3.1 87% 15.94 121.33 Q2.3 87% 19.70 151.20 Q2.2 79% 44.65 210.21 Q2.1 Overall Percent Reduction with additional cores InfiniDB 8 cores (elapsed time in seconds) InfiniDB 1 Core (elapsed time in seconds) SSB Query (@100 scale)

#3 Divide and Conquer ,[object Object],[object Object],[object Object],[object Object],Recommendation : Use Scale-Out in addition to Scale-up

InfiniDB Enterprise – Scale Up and Out User Connections User Module 1 User Module n Performance Module 1 Performance Module n Performance Module 2 Shared Storage Database files, System Catalog

#3 Divide and Conquer ,[object Object],[object Object],[object Object],87% 77.74 148.49 297.46 597.97 Q3.2 84% 134.21 316.50 425.25 848.79 Q3.1 87% 51.36 96.03 192.03 386.66 Q2.3 87% 56.41 106.37 214.87 430.25 Q2.2 87% 68.21 129.90 261.35 531.34 Q2.1 Overall Percent Reduction from 1 – 8PM’s 8PM (elapsed time in seconds) 4PM (elapsed time in seconds) 2PM (elapsed time in seconds) 1PM (elapsed time in seconds) SSB Query @1000

#4 Scale both I/O and User Connections Recommendation : Use modular architecture User Connections User Module 1 User Module n Performance Module 1 Performance Module n Performance Module 2 Shared Storage Database files, System Catalog Add more Performance Modules to scale I/O Add more User Modules to scale concurrency

#5 Provide Transparent Expansion and Failover ,[object Object],[object Object],[object Object],[object Object],Recommendation : Use either replication or MPP

#5 Provide Transparent Expansion and Failover Cust_id 1-999 Cust_id 1000-1999 Cust_id 2000-2999 Sharding Architecture MySQL Replication Web/App Servers Browsers

#5 Provide Transparent Expansion and Failover User Connections User Module 1 User Module n Performance Module 1 Performance Module n Performance Module 2 Shared Storage Database files, System Catalog If one Performance Module fails, traffic resumes with the remaining nodes User queries can be redirected to other User Modules if one fails

#6 Load New Data with Minimal Impact ,[object Object],[object Object],[object Object],[object Object],Recommendation : Use two-step ETL feed with non-blocking load utilities and/or MVCC database engine

#6 Load New Data with Minimal Impact OLTP Files/XML Log Files Operational Source Data Staging or ODS ETL High-speed Load Utility Ad-Hoc Dashboards Reports Notifications Users Staging Area Data Warehouse Data Warehouse and Metadata Management

#7 Quickly Troubleshoot Poor Read Performance ,[object Object],[object Object],[object Object],[object Object],Recommendation : Proactively use load testing; reactively use SQL analysis and tracing

InfiniDB Extent Map – No Indexing Needed If a column WHERE filter of “COL1 BETWEEN 220 AND 250 AND COL2 < 10000” is specified, InfiniDB will eliminate extents 1, 2 and 4 from the first column filter, then, looking at just the matching extents for COL2 (i.e. just extent 3), it will determine that no extents match and return zero rows without doing any I/O at all. … Extent Map Also enables logical range partitioning of data… Ext 2 Min 101 Max 200 Ext 3 Min 201 Max 300 Ext 4 Min 301 Max 400 Col1 Ext 1 Min 1 Max 100 Ext 2 Min 10100 Max 20000 Ext 3 Min 20100 Max 30000 Ext 4 Min 30100 Max 40000 Col2 Ext 1 Min 100 Max 10000

Summary Provides both diagnostic and tracing tools; no major design tuning efforts Use load testing and SQL analysis tools Method for troubleshooting poor read performance Has high-speed loader with no blocking and MVCC Use two-step ETL and bulk load process Load data with minimal impact Does transparent failover for I/O and manual for connectivity Use replication and load balancers Provide transparent expansion and failover Modular architecture for scaling both concurrency and I/O Application partition Scale concurrency and I/O Supports MPP scale out Spread load via replication or MPP Divide and Conquer Is multi-threaded and uses multiple CPUs / Cores Use DB’s/storage engines that are multi-threaded Exploit modern hardware Is column-oriented Use column database Only read the data you need InfiniDB General Technique Recommendation

Calpont Solutions Calpont Analytic Database Server Editions Calpont Analytic Database Solutions InfiniDB Community Server Column-Oriented Multi-threaded Terabyte Capable Single Server InfiniDB Enterprise Server Scale out / Parallel Processing Automatic Failover InfiniDB Enterprise Solution Monitoring 24x7 Support Auto Patch Management Alerts & SNMP Notifications Hot Fix Builds Consultative Help

InfiniDB Community & Enterprise Server Comparison Yes No Multi-Node, MPP scale out capable w/ failover Formal Production Support Forums Only Support Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes InfiniDB Community Yes INSERT/UPDATE/DELETE (DML) support Yes Transaction support (ACID compliant) Yes MySQL front end Yes Logical data compression Yes High-Speed bulk loader w/ no blocking queries while loading Yes Multi-threaded engine (queries/writes will use all CPU’s/cores on box) Yes Crash-recovery Yes Terabyte database capable Yes High concurrency supported Yes Alter Table with online add column capability Yes MVCC support – snapshot read (readers don’t block writers) Yes Automatic vertical (column) and logical horizontal partitioning of data Yes No indexing necessary Yes Column-oriented InfiniDB Enterprise Core Database Server Features

For More Information ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],www.infinidb.org

Building High-Performance MySQL Query Systems and Analytic Applications Thanks…!

Building High Performance MySql Query Systems And Analytic Applications

Recommended

Recommended

More Related Content

What's hot

What's hot (19)

Viewers also liked

Viewers also liked (20)

Similar to Building High Performance MySql Query Systems And Analytic Applications

Similar to Building High Performance MySql Query Systems And Analytic Applications (20)

Recently uploaded

Recently uploaded (20)

Building High Performance MySql Query Systems And Analytic Applications