Shard-Query, an MPP database for the cloud using the LAMP stack
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

Shard-Query, an MPP database for the cloud using the LAMP stack

on

  • 355 views

This combined #SFMySQL and #SFPHP meetup talked about Shard-Query. You can find the video to accompany this set of slides here: https://www.youtube.com/watch?v=vC3mL_5DfEM

This combined #SFMySQL and #SFPHP meetup talked about Shard-Query. You can find the video to accompany this set of slides here: https://www.youtube.com/watch?v=vC3mL_5DfEM

Statistics

Views

Total Views
355
Views on SlideShare
348
Embed Views
7

Actions

Likes
1
Downloads
4
Comments
0

2 Embeds 7

http://www.slideee.com 6
https://www.linkedin.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Shard-Query, an MPP database for the cloud using the LAMP stack Presentation Transcript

  • 1. Shard-Query AN MPP DATABASE FOR THE CLOUD USING THE LAMP STACK
  • 2. Introduction Presenter • Justin Swanhart • Principal Support Engineer at Percona • Previously a trainer and consultant at Percona too Developer • Swanhart-tools • Shard-Query – MPP sharding middleware for MySQL • Flexviews – Materialized views (fast refresh) for MySQL • bcmath UDF – arbitrary precision math for MySQL
  • 3. Intended Audience • MySQL users with data too large to query efficiently using a single machine • Big Data • Analytics / OLAP • User generated content analysis • People interested in distributed database processing
  • 4. Terms
  • 5. MPP – Massively Parallel Processing • An MPP system is a system that can process a SQL statement in parallel on a single machine or even many machines • A collection of machines is often called a Grid • MPP is also sometimes called Grid Computing
  • 6. MPP (cont) • Not many open source databases (none?) support MPP • Community editions of closed source offerings are limited • Some closed source databases include Vertica, Greenplum, Redshift
  • 7. The Cloud • Managed collection of virtual servers • Easy to add servers on demand • Ideal for a federated, distributed database grid • Easy to “scale up” by moving to a VM with more cores • Easy to “scale out” by adding machines • Amazon is one of the most popular cloud environments
  • 8. LAMP stack • Linux • Amazon Linux • RHEL • Ubuntu LTS, etc. • Apache Web Server • Most popular web server on the planet • MySQL • The world’s most popular open source database • PHP • High level language makes development easier
  • 9. Database Middleware • A piece of software that sits between an end-user application and the database • Operates on the queries submitted by the application, then returns the results to the application • Usually a proxy of some sort • MySQL proxy is the open source user configurable proxy for MySQL • Supports Lua scripts which intercept queries • Shard-Query can use MySQL Proxy out of the box
  • 10. Message Queue / Job Server • Accepts jobs or messages and places them in a queue • A worker reads jobs/messages from the queue and acts on them • Offers support for asynchronous jobs • Gearman • My job server of choice for PHP • Has two different PHP interfaces (pear and pecl) • SQ comes bundled with a modified version* of the pear interface • Excellent integration with MySQL as well (UDF) * Removes warnings triggered by modern PHP strict mode
  • 11. Sharding • It is a short for Shared Nothing • Means splitting up your data onto more than one machine • Tables that are split up are called sharded tables • Lookup tables are not sharded. In other words, they must be duplicated on all nodes • Shard-Query supports directory based or hash based sharding
  • 12. Shard mapper • Shard-Query supports DIRECTORY and HASH mapping out of the box • DIRECTORY based sharding allows you to add or remove shards from the system, but lookups may go over the network, reducing performance* compared to HASH mapping • HASH based sharding uses a hash algorithm to balance rows over the sharded database. However, since a HASH algorithm is used, the number of database shards can not change after initial data loading. * But only for queries like “select count(*) from table where customer_id = 50”
  • 13. What is “big data” Most machine generated data • Line order information for a large organization like Wal-Mart™ • Any data so large that you can’t effectively operate on it on one machine • For example, an important query that needs to run daily executes in greater than 24 hours. It is impossible to meet the daily goal unless you can find a way to make the query execute faster. • These kind of problems can happen on relatively small amounts of data (tens of gigabytes)
  • 14. Analytics(OLAP) versus OLTP • OLTP is focused on short lived small transactions that read or write small amounts of data • OLAP is focused on bulk loading and reading large amounts of data in a single query. • Aggregation queries are OLAP queries • Shard-Query is designed for analytics (OLAP) not OLTP • must parse all commands sent to it (and make multiple round trips) • Minium query time of around 20ms
  • 15. PROBLEM: Single Threaded Queries THE BIGGEST BOTTLENECK IN ANALYTICAL QUERIES IS THE SPEED OF A SINGLE CORE
  • 16. Single thread queries in the database • MySQL, PostgreSQL, Firebird and all other major open source databases have single threaded queries • This means that a single query can only ever utilize the resources of a single core • As the data size grows, analytical queries get slower and slower • In memory, as the data grows the speed decreases because the data is accessed in a single query • As the number of rows to be examined increases, performance decreases
  • 17. Why single threaded • MySQL is optimized for getting small amounts of data quickly(OLTP) • It was created at a time when having more than one CPU was not common • Adding parallelism now is a very complex task, particularly since MySQL supports multiple storage engines • So adding parallel query is not a high priority (not even on the roadmap) • Designed to run LOTS of small queries simultaneously, not one big query
  • 18. Single Threading – bad for IO • If the data set is significantly larger than memory, single threaded queries often cause the buffer pool to "churn“ • For example, small lookup tables can easily be pushed out of the buffer pool, resulting in frequent IO to look up values • While SSD may helps somewhat, one database thread can not read from an SSD at maximum device capacity • While the disk may be capable of 1000+ MB/sec, a single thread is generally limited to <100MB/sec (usually 30-40) • This is because a single thread shares doing IO AND running the query on one CPU (MySQL does not use read threads for queries)
  • 19. The OLAP Example • A large company maintains a star schema of their sales history for analytics purposes • This company likes to present a sum total of orders for all time on the dashboard • In the beginning the query is very fast • It gets slower, though, as months of data are added and as the business grows, data increases too • Eventually the query takes more than 24 hours to run, which means it can no longer be updated daily • “Drill down” gets slower as data increases
  • 20. What can be done? • Caching? • Materialized views? • Partitioning? • Sharding?
  • 21. Making OLAP more like OLTP! • Shard-Query breaks on big query up into smaller queries that can access the database in parallel • Partitioning and sharding are used to keep data size for any single query to a minimum • If your table has 16 partitions, you can get up to 16 way parallelism • If you also have 2 nodes, you get 32 way parallelism, and so on • You can use multiple database schema on a single server instead (a form of sharding) if you don’t partition your data
  • 22. Shard-Query ADDING PARALLELISM TO QUERIES
  • 23. Sharding Reviewed • A sharded database contains multiple nodes or databases called shards • One physical machine might host many shards • Each shard has identical schema
  • 24. Sharding Reviewed (cont) • The multiple shards function together as one RDBMS system. • You can think of the shards as a big UNION ALL of the data, with only a portion of the data on any one machine • A mechanism must control which server on which to place particular pieces of data. • In Shard-Query a particular column controls data placement – this is called the shard key
  • 25. Sharding – Data distribution • There are usually one or two large tables that are sharded • These are usually called FACT tables • An example might be blogs, blog_posts and blog_comments. All three share a “shard key” of blog_id • Most common case is one big table with smaller lookup tables
  • 26. Sharding Reviewed (cont) • The shard key is very important! • Since a specific column acts as the “shard key”, all sharded tables must contain the shard key. • For example: blog_id might be the shard key. • The rows for a specific blog_id are then located on the same shard in any table that has the blog_id column
  • 27. Optimization - Shard Elimination • When Shard-Query sees an expression on the shard key it looks up* the shard that contains the appropriate data and only sends queries to the necessary shards. • Equality lookup is most efficient, but IN, BETWEEN and other operators are allowed as well • Lookups may not use subqueries (ie, blog_id IN (1,2,3) is okay, not blog_id in (select …)) • This is called “shard elimination” • Shard elimination is analogous to partition elimination. • where blog_id = 10, for example
  • 28. Can Shard-Query help on 1 machine? • Yes! - Use MySQL partitioning on a single machine • Shard-Query can access the partitions of a table in parallel! • This means that if you have many partitions, then Shard-Query can utilize many cores to answer the query Use partitions for parallelism
  • 29. How does that work? • Shard-Query executes an EXPLAIN PLAN on the query • This EXPLAIN PLAN shows the partitions that MySQL will access when running the query • Shard-Query uses the 5.6 PARTITION hint to generate one query per partition • These queries can execute in parallel
  • 30. Sharding can help too • How? • Shard-Query adds parallelism to queries by spreading them over nodes in parallel • Spread the data over four nodes and queries are 4x faster MySQL database shards Shard-Query
  • 31. Sharding + Partitioning is best • Why? • Partition the tables to add parallelism to each node • Use sharding to have multiple nodes working together • 4 nodes with 3 partitions each = 12 way parallelism Shard-Query MySQL database shards Partitions
  • 32. Shard-Query ARCHITECTURE
  • 33. Configuration Repository • Shard-Query stores all configuration information in a MySQL database called the configuration repository • This should be a highly available replication pair (or XtraDB cluster) for HA • Web interface can change the settings • Manual settings changes can be done via SQL • schemata_config table in Shard-Query repository • Makes using Shard-Query easier, especially when using more than one node
  • 34. PHP OO Apache Web Interface MySQL Proxy Gearman Message Queue Worker Worker Worker Worker MySQL database shards Shard-Query Architecture Interfaces Communication Workers Storage Config Repository Configuration Management
  • 35. PHP OO Apache Web Interface MySQL Proxy Gearman Message Queue Worker Worker Worker Worker MySQL database shards Shard-Query Architecture Gearman job server • Provides the parallel mechanism for Shard-Query • Multiple Gearman are supported for HA • Enables Shard-Query to use a map/reduce like architecture • Sends jobs to workers when they arrive at the queue • If all workers are busy the job waits
  • 36. Gearman at a glance Shard-Query OO Store-resultset Loader worker SQ run SQL worker
  • 37. PHP OO Apache Web Interface MySQL Proxy Gearman Message Queue Worker Worker Worker Worker MySQL database shards Shard-Query Architecture Three kinds of workers • loader_worker – Listens for loader jobs and executes them. Used by parallel loader. • shard_query_worker – Listens for SQL jobs, runs the job via Shard-Query and returns the results as JSON. Used by web and proxy interfaces. • store_resultset_worker – Main worker used by Shard-Query. It runs SQL and stores the result in a table.
  • 38. PHP OO Apache Web Interface MySQL Proxy Gearman Message Queue Worker Worker Worker Worker MySQL database shards Shard-Query Architecture PHP Object Oriented Interface • Very simple to use • Constructor parameters not even usually needed • Just one function to run a SQL query and get results back • Complete example comes with Shard-Query as: bin/run_query
  • 39. PHP OO Example (from bin/run_query): $shard_query = new ShardQuery(); $stime = microtime(true); $stmt = $shard_query->query($sql); $endtime = microtime(true); if(!empty($shard_query->errors)) { if(!empty($shard_query->errors)) { echo "ERRORS RETURNED BY OPERATION:n"; print_r($shard_query->errors); } } if(is_resource($stmt) || is_object($stmt)) { $count=0; while($row = $shard_query->DAL->my_fetch_assoc($stmt)) { print_r($row); ++$count; } echo "$count rows returnedn"; $shard_query->DAL->my_free_result($stmt); } else { if(!empty($shard_query->info)) print_r($shard_query->info); echo "no query resultsn"; } echo "Exec time: " . ($endtime - $stime) . "n"; Simple data access layer comes with Shard-Query Errors are returned as a member of the object Run the query
  • 40. PHP OO Apache Web Interface MySQL Proxy Gearman Message Queue Worker Worker Worker Worker MySQL database shards Shard-Query Architecture Apache web interface • GUI • Easy to set up • Run queries and get results • Serves as an example of using Shard-Query in a web app with asynchronous queries • Submits queries via Gearman • Simple HTTP authentication
  • 41. PHP OO Apache Web Interface MySQL Proxy Gearman Message Queue Worker Worker Worker Worker MySQL database shards Shard-Query Architecture MySQL Proxy Interface • LUA script for MySQL Proxy • Supports most SHOW commands • Intercepts queries, and sends them to Shard-Query using the MySQL Gearman UDF • Serves as another example of using Gearman to execute queries. • Behaves slightly differently than MySQL for some commands
  • 42. Query submitted SQL is parsed Query rewrite for parallelism yields multiple queries Gearman Jobs (map/combine) Final Aggregation (reduce) Return result Shard-Query Data Flow Map/reduce like workflow
  • 43. Query submitted SQL is parsed Query rewrite for parallelism yields multiple queries Gearman Jobs (map/combine) Final Aggregation (reduce) Return result Shard-Query Data Flow
  • 44. SQL Parser • Find it at http://github.com/greenlion/php-sql-parser • Supports • SELECT/INSERT/UPDATE/DELETE • REPLACE • RENAME • SHOW/SET • DROP/CREATE INDEX/CREATE TABLE • EXPLAIN/DESCRIBE Used by SugarCRM too, as well as other open source projects.
  • 45. Query submitted SQL is parsed Query rewrite for parallelism yields multiple queries Gearman Jobs (map/combine) Final Aggregation (reduce) Return result Shard-Query Data Flow
  • 46. Query Rewrite for parallelism • Shard-Query has to manipulate the SQL statement so that it can be executed over more than on partition or machine • COUNT() turns into SUM of COUNTs from each query • AVG turns into SUM and COUNT • SEMI-JOIN is turned into a materialized join • STDDEV/VARIANCE are rewritten as well use the sum of squares method • Push down LIMIT when possible
  • 47. Query Rewrite for parallelism (cont) • Because lookup tables are duplicated on all shards, the query executes in a shared-nothing way • All joins, filtering and aggregation are pushed down • Mean very little data must flow between nodes in most cases • High performance • Meets or beats Amazon Redshift in testing at 200GB of data
  • 48. Query submitted SQL is parsed Query rewrite for parallelism yields multiple queries Gearman Jobs (map/combine) Final Aggregation (reduce) Return result Shard-Query Data Flow
  • 49. Map/Combine • The store_resultset gearman worker runs SQL and stores the result in a table • To keep the number of rows in the table (and the time it takes to aggregate results in the end) small, an INSERT … ON DUPLICATE KEY UPDATE (ODKU) statement is used when inserting the rows • There is a UNIQUE KEY over the GROUP BY attributes to facilitate the upsert
  • 50. Query submitted SQL is parsed Query rewrite for parallelism yields multiple queries Gearman Jobs (map/combine) Final Aggregation (reduce) Return result Shard-Query Data Flow
  • 51. Final aggregation • Shard-Query has to return a proper result, combining the results in the result table together to return the correct answer • Again, for example COUNT must be rewritten as SUM to combine all the counts (from each shard) in the result table • Aggregated result is returned to the client
  • 52. Shard-Query Flow as SQL [justin@localhost bin]$ ./run_query --verbose select count(*) from lineorder; Shard-Query optimizer messages: SQL TO SEND TO SHARDS: Array ( [0] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 [1] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 [2] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 [3] => SELECT COUNT(*) AS expr_2913896658 FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 ) SQL TO SEND TO COORDINATOR NODE: SELECT SUM(expr_2913896658) AS ` count ` FROM `aggregation_tmp_58392079` Array ( [count ] => 0 ) 1 rows returned Exec time: 0.03083610534668 Initial query Query rewrite / map Final aggregation / reduce Final result
  • 53. Map/Combine example select LO_OrderDateKey, count(*) from lineorder group by LO_OrderDateKey; Shard-Query optimizer messages: * The following projections may be selected for a UNIQUE CHECK on the storage node operation: expr$0 * storage node result set merge optimization enabled: ON DUPLICATE KEY UPDATE expr_2445085448=expr_2445085448 + VALUES(expr_2445085448) SQL TO SEND TO SHARDS: Array ( [0] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p0) AS `lineorder` WHERE 1=1 GROUP BY expr$0 [1] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p1) AS `lineorder` WHERE 1=1 GROUP BY expr$0 [2] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p2) AS `lineorder` WHERE 1=1 GROUP BY expr$0 [3] => SELECT LO_OrderDateKey AS expr$0,COUNT(*) AS expr_2445085448 FROM lineorder PARTITION(p3) AS `lineorder` WHERE 1=1 GROUP BY expr$0 ) SQL TO SEND TO COORDINATOR NODE: SELECT expr$0 AS `LO_OrderDateKey`,SUM(expr_2445085448) AS ` count ` FROM `aggregation_tmp_12033903` GROUP BY expr$0 combine reduce
  • 54. Use cases
  • 55. Machine generated data • Sensor readings • Metrics • Logs • Any large table with short lookup tables Star schema are ideal
  • 56. Call detail records • Shard-Query is used in the billing system of a large cellular provider • CDRs generate a lot of data • Shard-Query includes a fast PERCENTILE function
  • 57. Green energy meter processing • High volume of data means sharding is necessary • With Shard-Query, reporting is possible over all the shards, making queries possible that would not work with Fabric or other sharding solutions • Used in India for reporting on a green power grid
  • 58. Log analysis • Performance logs from a web application for example • Aggregate many different statistics and shard if log volumes are high enough • Search text logs with regular expressions
  • 59. Performance
  • 60. Star Schema Benchmark – SF 20 • 119 million rows of data (12GB) • Infobright Community Database • Only 1st query from each “flight” selected • Unsharded compared to four shards (box has 4 cpu - Amazon m1.xlarge)
  • 61. COLD • MySQL – 35.39s • Shard-Query – 11.62s HOT • MySQL – 10.99s • Shard-Query – 2.95s Query 1 select sum(lo_extendedprice*lo_discount) as revenue from lineorder join dim_date on lo_orderdatekey = d_datekey where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;
  • 62. COLD • MySQL – 34.24s • Shard-Query – 12.74s HOT • MySQL – 12.74s • Shard-Query – 3.26s Query 2 select sum(lo_revenue), d_year, p_brand from lineorder join dim_date on lo_orderdatekey = d_datekey join part on lo_partkey = p_partkey join supplier on lo_suppkey = s_suppkey where p_category = 'MFGR#12' and s_region = 'AMERICA' group by d_year, p_brand order by d_year, p_brand;
  • 63. COLD • MySQL – 27.29s • Shard-Query – 7.97s HOT • MySQL – 18.89 • Shard-Query – 5.06s Query 3 select c_nation, s_nation, d_year, sum(lo_revenue) as revenue from customer join lineorder on lo_custkey = c_customerkey join supplier on lo_suppkey = s_suppkey join dim_date on lo_orderdatekey = d_datekey where c_region = 'ASIA' and s_region = 'ASIA' and d_year >= 1992 and d_year <= 1997 group by c_nation, s_nation, d_year order by d_year asc, revenue desc;
  • 64. COLD • MySQL – 23.02s • Shard-Query – 8.48s HOT • MySQL – 14.77 • Shard-Query – 4.29s Query 4 select d_year, c_nation, sum(lo_revenue - lo_supplycost) as profit from lineorder join dim_date on lo_orderdatekey = d_datekey join customer on lo_custkey = c_customerkey join supplier on lo_suppkey = s_suppkey join part on lo_partkey = p_partkey where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2') group by d_year, c_nation order by d_year, c_nation;