Conquering "big data": An introduction to shard query


This talk introduces Shard-Query, an MPP (massively parallel processing) distributed middleware solution for MySQL.

Shard-Query is a federation engine which provides a virtual "grid computing" layer on top of MySQL. It can be used to access data spread over many machines (sharded) and also data partitioned in MySQL tables using the MySQL partitioning option. This is similar to using partitions for parallelism with Oracle Parallel Query.

This talk focuses on why Shard-Query is needed, how it works (not detailed) and the best schema to use with it. Shard-Query is designed to scan massive amounts of data in parallel.

Published in: Technology

  1. 1. Conquering “big data”: An Introduction to Shard-Query. An MPP distributed middleware solution for MySQL databases
  2. 2. Big Data is a buzzword: Shard-Query works with big data, but it works with small data too. You don’t have to have big data to have big performance problems with queries.
  3. 3. Big performance problems: MySQL typically has performance problems on OLAP workloads for even tens of gigabytes* of data (analytics, reporting, data mining). MySQL is generally not scalable*^ for these workloads. (* By itself. The point of this talk is to show how Shard-Query fixes this :) ^ Another presentation goes into depth as to why MySQL doesn't scale for OLAP.)
  4. 4. Not only MySQL has these issues: All major open source databases have problems with these workloads. Why? Single-threaded queries. When all data is in memory, accessing X rows is generally X times as expensive as accessing ONE row, even when multiple CPUs could be used.
  5. 5. MySQL scalability model is good for OLTP: MySQL was created at a time when commodity machines had a small number of CPU cores (usually one), had small amounts of memory and limited disk IOPS, and managed a small amount of data. It did not make sense to code intra-query parallelism for these servers. They couldn’t take advantage of it anyway.
  6. 6. The new age of multi-core: “If your time to you is worth saving, then you better start swimming. Or you'll sink like a stone. For the times they are a-changing.” - Bob Dylan [Figure: four CPUs, each with eight cores]
  7. 7. It is 2013. Still only single-threaded queries: Building a multi-threaded query plan is a lot different than building a single-threaded query plan. The time investment to build a parallel query interface inside of MySQL would be very high. MySQL has continued to focus on excellence for OLTP workloads while leaving the OLAP market untapped. Just adding basic subquery options to the optimizer has taken many years.
  8. 8. MySQL scales great for OLTP because: MySQL has been improved significantly, especially in 5.5 and 5.6. Many small queries are “balanced” over many CPUs naturally. Large memories allow vast quantities of hot data. And very fast disk IO means that the penalty for a cache miss is lower. No seek penalty for SSD especially reduces the cost of concurrent misses from multiple threads (no head movement).
  9. 9. But not for OLAP: Big queries "peg" one CPU and can use no more CPU resources (low efficiency queries). Numerous large queries can "starve" smaller queries. This is often when innodb_thread_concurrency needs to be set > 0.
  10. 10. But not for OLAP (cont): When the data set is significantly larger than memory, single-threaded queries often cause the buffer pool to "churn". While SSD helps somewhat, one thread cannot read from an SSD at maximum device capacity. Disk may be capable of 1000s of MB/sec, but the single thread is generally limited to <100MB/sec. A multi-threaded workload could much better utilize the disk.
  11. 11. Response similar to the NoSQL movement: Rather than fix the database or build complex software, users just change the underlying database. Many closed source vendors have stepped in and provided OLAP SQL solutions. Hardware: IBM Netezza, Oracle Exadata. Software: HP Vertica, Vectorwise, Teradata, Greenplum.
  12. 12. Response similar to the NoSQL movement (cont): Or SQL => map/reduce interfaces: Apache Hadoop/Apache Hive, Impala, Map/R, Cloudera CDH. Google built a SQL interface to BigTable too… Limitations: no correlated subqueries, for example.
  13. 13. What do those map/reduce things do? Split data up over multiple servers (HDFS). During query processing: map (fetch/extract/select/etc) raw data from files or tables on HDFS; write the data into temporary areas; shuffle temporary data to reduce workers; final reduce written; return results.
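The map, shuffle and reduce steps listed on this slide can be sketched in a few lines of plain Python. This is only a toy, single-process illustration: the record layout (`host,bytes`), the function names and the routing rule are assumptions made for the example, and nothing here touches HDFS or a real Hadoop API.

```python
from collections import defaultdict

def map_phase(records):
    """Map: extract a (key, value) pair from each raw text record."""
    for line in records:
        host, bytes_sent = line.split(",")
        yield host, int(bytes_sent)

def shuffle_phase(pairs, num_reducers):
    """Shuffle: route every pair to a reducer bucket chosen by its key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Deterministic toy routing rule (hypothetical, not Hadoop's partitioner).
        buckets[sum(key.encode()) % num_reducers][key].append(value)
    return buckets

def reduce_phase(bucket):
    """Reduce: collapse the values collected for each key into one total."""
    return {key: sum(values) for key, values in bucket.items()}

def run_job(splits, num_reducers=2):
    # Every split (standing in for a file block on another server) is mapped,
    # then all intermediate pairs are shuffled and reduced.
    pairs = (pair for split in splits for pair in map_phase(split))
    result = {}
    for bucket in shuffle_phase(pairs, num_reducers):
        result.update(reduce_phase(bucket))
    return result

splits = [["a,10", "b,5"], ["a,7", "c,1"]]
totals = run_job(splits)
```

Note that the map step re-parses the raw records on every run, which is the repeated cost the next slide complains about.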
  14. 14. Those sound expensive… It is (in terms of dollars for closed solutions). It is (in terms of execution time for open solutions). The map is especially expensive when data is unstructured and it must be done repeatedly for each different query you run.
  15. 15. And complicated… You get a whole new toolchain: a new set of data management tools, a new set of high availability tools, and all new monitoring tools to learn!
  16. 16. Even if MySQL supported parallel query: MySQL* doesn’t do distributed queries. Those Map/Reduce solutions (and the closed source databases) can use more than one server! Building a query plan for queries that must execute over a sharded data set has additional challenges: SELECT AVG(expr) must be computed as: SUM(expr)/COUNT(expr) AS `AVG(expr)`. Probably the simplest example of a necessary rewrite. (* Again, Shard-Query does. Almost there.)
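To see why the AVG rewrite is necessary, here is a minimal sketch that uses two in-memory SQLite databases as stand-ins for two MySQL shards. The table name `lineorder` is borrowed from a later slide; the column `lo_quantity` and the row values are made up for the example, and this is not Shard-Query's own code.

```python
import sqlite3

def make_shard(values):
    """Create an in-memory database standing in for one shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lineorder (lo_quantity INTEGER)")
    conn.executemany("INSERT INTO lineorder VALUES (?)", [(v,) for v in values])
    return conn

# Deliberately uneven shards: three rows on one, a single row on the other.
shards = [make_shard([10, 20, 30]), make_shard([40])]

# Each shard returns partial aggregates instead of AVG.
SHARD_SQL = "SELECT SUM(lo_quantity), COUNT(lo_quantity) FROM lineorder"
partials = [conn.execute(SHARD_SQL).fetchone() for conn in shards]

# The coordinator combines the partials: SUM(expr) / COUNT(expr).
total_sum = sum(s for s, _ in partials)
total_count = sum(c for _, c in partials)
distributed_avg = total_sum / total_count

# Averaging the per-shard averages gives a different, wrong answer
# whenever the shards hold different numbers of rows.
per_shard_avgs = [
    conn.execute("SELECT AVG(lo_quantity) FROM lineorder").fetchone()[0]
    for conn in shards
]
naive_avg = sum(per_shard_avgs) / len(per_shard_avgs)
```

With these rows the combined SUM/COUNT gives 100 / 4 = 25.0, while the average of averages gives (20 + 40) / 2 = 30.0.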
  17. 17. MySQL network storage engines: Don't these engines claim to be parallel? Fetching of data from remote machines may be done in parallel, but query processing is coordinated by a serial query thread. A sum still has to examine each individual row from every server serially. Joins are still evaluated serially (in many cases). The engine is parallel, but the SQL layer using the engine is not.
  18. 18. NDB: NDB is bad for star schema. Dimension table rows are not usually co-located with fact rows. Engine condition pushdown may help somewhat to alleviate network traffic, but joins still have to traverse the network, which is expensive. Aggregation still serial.
  19. 19. SPIDER: SPIDER is bad for star schema too. Nested loops may be very bad for SPIDER and star schema if the fact table isn't scanned first (must use STRAIGHT_JOIN hint extensively). MRR/BKA in MariaDB might help? Still no parallel aggregation or join.
  20. 20. CONNECT: Has ECP. No ICP or ability to expose remote indexes. Always uses join buffer (BNLJ) or BKAJ. Fetches in parallel. No parallel join. No parallel aggregation.
  21. 21. Those are not parallel query solutions: Those engines are not OLAP parallel query. They are for OLTP lookup and/or filtering performance. Often can't sort in parallel. They can offer improved performance when large numbers of rows are filtered from many machines in parallel. When aggregating, a query must return a small resultset before aggregation for good performance. Star schema should be avoided.
  22. 22. Enter Shard-Query: Massively parallel query execution for MySQL variants
  23. 23. Enter Shard-Query: Keep using MySQL. Choose a row store like XtraDB, InnoDB or TokuDB*. Choose a column store like ICE*, Groonga**. Use CSV, TAB, XML, or other data with the CONNECT** engine in MariaDB 10. (** These engines have not been thoroughly tested. * These engines work, but with some limitations due to bugs.)
  24. 24. Shard-Query connects to 3306… Shard-Query can use any MySQL variant as a data source. You continue to use regular SQL, no map/reduce. Is built on MySQL, PHP and Gearman – well proven technologies. You probably already know these things.
  25. 25. Shard-Query re-writes SQL: Flexible. Does not have to re-implement complex SQL functionality because it uses SQL directly. Hundreds of MySQL functions and features available out of the box. Small subset* of functions not available (* last_insert_id(), get_lock(), etc.).
  26. 26. Shard-Query re-writes SQL: Familiar SQL. ORDER BY, GROUP BY, LIMIT, HAVING, subqueries, even WITH ROLLUP, all continue to work as normal. Support for all MySQL aggregate functions including count(distinct). Aggregation and join happen in parallel*.
  27. 27. You don't have to know PHP to use Shard-Query! Just use SQL
  28. 28. You can still connect to 3306 (and more)! Shard-Query has multiple ways of interacting with your application. The PHP OO API is the underlying interface. The other interfaces are built on it: MySQL Proxy Lua script (virtual database); HTTP or HTTPS web/REST interface (access the database directly from Javascript?); submit Gearman jobs (as SQL) directly from almost any programming language.
  29. 29. MySQL Proxy
  30. 30. Web Interface
  31. 31. Command line (with explain plan)
      echo "select * from (select count(*) from lineorder) sq;" | php run_query --verbose
      SQL SET TO SEND TO SHARDS:
      Array ( [0] => SELECT COUNT(*) AS expr_2942896428 FROM lineorder AS `lineorder` WHERE 1=1 ORDER BY NULL )
      SENDING GEARMAN SET to: 2 shards
      SQL FOR COORDINATOR NODE:
      SELECT SUM(expr_2942896428) AS `count(*)` FROM `aggregation_tmp_21498632`
      SQL SET TO SEND TO SHARDS:
      Array ( [0] => SELECT * FROM ( SELECT SUM(expr_2942896428) AS `count(*)` FROM `aggregation_tmp_21498632` ) AS `sq` WHERE 1=1 )
      SENDING GEARMAN SET to: 1 shards
      SQL TO SEND TO COORDINATOR NODE:
      SELECT * FROM `aggregation_tmp_88629847`
      [count(*)] => 119972104
      1 rows returned
      Exec time: 0.053546905517578
  32. 32. Shard-Query constructs parallel queries: MySQL can’t run a single query in multiple threads, but it can run multiple queries at once in multiple threads (with multiple cores). Shard-Query breaks one query into multiple smaller queries (aka tasks). Tasks can run in parallel on one or more servers.
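The idea of breaking one query into tasks and combining the partial results on a coordinator can be sketched as follows. This is an illustration under stated assumptions, not Shard-Query's implementation: in-memory SQLite databases stand in for MySQL shards, a Python thread pool stands in for Gearman workers, and the alias `expr_1` and table `aggregation_tmp` only mimic the naming seen in the verbose output above.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_shard(n_rows):
    """Create an in-memory database standing in for one shard."""
    # check_same_thread=False lets a worker thread use this connection;
    # each connection is still used by exactly one task.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE lineorder (lo_orderkey INTEGER)")
    conn.executemany("INSERT INTO lineorder VALUES (?)",
                     ((i,) for i in range(n_rows)))
    return conn

shards = [make_shard(1000), make_shard(2500)]

# The per-shard task: the original COUNT(*) with an alias for the partial result.
SHARD_SQL = "SELECT COUNT(*) AS expr_1 FROM lineorder"

def run_task(conn):
    """Run one task against one shard and return its partial count."""
    return conn.execute(SHARD_SQL).fetchone()[0]

# Tasks run concurrently, one per shard.
with ThreadPoolExecutor(max_workers=len(shards)) as pool:
    partial_counts = list(pool.map(run_task, shards))

# The coordinator stores the partial results and aggregates them with SUM.
coordinator = sqlite3.connect(":memory:")
coordinator.execute("CREATE TABLE aggregation_tmp (expr_1 INTEGER)")
coordinator.executemany("INSERT INTO aggregation_tmp VALUES (?)",
                        [(c,) for c in partial_counts])
total = coordinator.execute(
    "SELECT SUM(expr_1) FROM aggregation_tmp").fetchone()[0]
```

The coordinator only ever sees one row per task, which is why the final aggregation step stays cheap even when the shards scan many rows.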
  33. 33. OLAP into OLTP
  34. 34. Partitioning tables for parallelism: This is similar to Oracle Parallel Query
  35. 35. Partitioning splits queries on a single machine: Supports partitioning to divide up a table. RANGE, LIST and RANGE/LIST COLUMNS over a single column. Each partition can be accessed in parallel as an individual task.
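One plausible way to turn RANGE partition boundaries into independent tasks is to emit one query per partition, each restricted to that partition's value range. The sketch below assumes this approach for illustration only; the helper `partition_tasks`, the column `lo_orderdate` and the boundary values are hypothetical, and the slides do not show how Shard-Query itself generates its per-partition SQL.

```python
def partition_tasks(table, column, upper_bounds):
    """Build one SELECT per RANGE partition.

    upper_bounds lists each partition's exclusive VALUES LESS THAN bound in
    ascending order; None stands for MAXVALUE. NULL handling is ignored here.
    """
    tasks = []
    lower = None
    for upper in upper_bounds:
        conditions = []
        if lower is not None:
            conditions.append(f"{column} >= {lower}")
        if upper is not None:
            conditions.append(f"{column} < {upper}")
        where = " AND ".join(conditions) or "1=1"
        tasks.append(f"SELECT COUNT(*) FROM {table} WHERE {where}")
        lower = upper
    return tasks

tasks = partition_tasks("lineorder", "lo_orderdate",
                        [19930101, 19940101, None])
for sql in tasks:
    print(sql)
```

Because each predicate matches exactly one partition's range, the server's partition pruning can confine every task to a single partition, and the tasks can then be run concurrently.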
  36. 36. A different way to look at it: You get to move all the pieces at the same time. [Figure: SINGLE THREADED versus PARALLEL* execution of tasks T1, T4, T8, T32, T48, T64] * Small portion of execution is still serial, so speedup won't be quite linear (but should be close)
  37. 37. Sharding
  38. 38. Sharded tables split data over many servers: Works similarly to partitioning. You specify a "shard key". This is like a partitioning key, but it applies to ALL tables in the schema. If a table contains the "shard key", then the table is spread over the shards based on the values of that column. Pick a "shard key" with an even data distribution. Currently only a single column is supported.
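A minimal sketch of how a shard key can decide row placement, assuming a simple hash-modulo scheme. The CRC32 hash, the function names and the sample rows are illustrative choices, not a description of Shard-Query's mappers.

```python
import zlib

def shard_for(shard_key_value, num_shards):
    """Map a shard key value to a shard number with a stable hash."""
    digest = zlib.crc32(str(shard_key_value).encode("utf-8"))
    return digest % num_shards

def route_rows(rows, shard_key, num_shards):
    """Group rows by the shard chosen from their shard key column."""
    placement = {shard: [] for shard in range(num_shards)}
    for row in rows:
        placement[shard_for(row[shard_key], num_shards)].append(row)
    return placement

rows = [
    {"blog_id": 1, "title": "first post"},
    {"blog_id": 2, "title": "hello"},
    {"blog_id": 1, "title": "second post"},
    {"blog_id": 3, "title": "news"},
]
placement = route_rows(rows, "blog_id", num_shards=4)
```

Rows with the same key value always land on the same shard, which is what makes joins on the shard key safe. An even spread only follows if the key values themselves are well distributed, hence the slide's advice on picking the key.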
  39. 39. Unsharded Tables: Tables that don't contain the "shard key" are called "unsharded" tables. A copy of these tables is replicated on ALL nodes. It is a good idea to keep these tables relatively small and update them infrequently. You can freely join between sharded and unsharded tables. You can only join between sharded tables when the join includes the shard key*. (* A CONNECT or FEDERATED table to a Shard-Query proxy can be used to support cross-shard joins. Consider MySQL Cluster for cross-shard joins.)
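The join rules on this slide reduce to a small check. The sketch below is a simplification written for illustration: the function name is hypothetical, it assumes the join column carries the same name on both sides, and the columns `post_id` and `comment_type_id` are invented for the example.

```python
def join_is_shard_local(left, right, join_columns, sharded_tables, shard_key):
    """Return True if a two-table join can be answered inside each shard."""
    both_sharded = left in sharded_tables and right in sharded_tables
    if not both_sharded:
        # At least one side is unsharded, so a full copy of it exists on
        # every node and the join never has to leave the shard.
        return True
    # Two sharded tables: matching rows are only guaranteed to be
    # co-located when the join is on the shard key.
    return shard_key in join_columns

sharded = {"blog_info", "blog_posts", "blog_comments"}

ok_on_key = join_is_shard_local(
    "blog_posts", "blog_comments", {"blog_id"}, sharded, "blog_id")
ok_off_key = join_is_shard_local(
    "blog_posts", "blog_comments", {"post_id"}, sharded, "blog_id")
ok_lookup = join_is_shard_local(
    "blog_comments", "comment_type", {"comment_type_id"}, sharded, "blog_id")
```

Here the join on `blog_id` and the join to the unsharded lookup table pass, while the join between two sharded tables on `post_id` alone does not.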
  40. 40. [Diagram: Sharding and/or Partitioned Tables + Gearman + Shard-Query (REST, Proxy, PHP OO) = Parallel Execution. Data flow (SQL, DATA): Task1: Shard1 Partition 1; Task2: Shard1 Partition 2; Task3: Shard2 Partition 1; Task4: Shard2 Partition 2.]
  41. 41. Sharding for big data: Or how I stopped worrying and learned to scale out the database
  42. 42. You can only scale up so far: MySQL still caps out at between 24 and 48 cores, though it continues to improve (5.7 will be the best one ever?). If you are collecting enough data you will eventually need to use more than one machine to get good performance on queries over a large portion of the data set.
  43. 43. Scale Out – And Up: You could choose to use 4 servers with 16 cores or 2 servers with 32 cores. Usually depends on how large your data set is. Keep as much data in memory as possible.
  44. 44. Scale Out – And Up: In the cloud many small servers can leverage memory more efficiently than a few large ones. Run 8 smaller servers with (in aggregate): 16 cores (52 total ECU) [2/per], 136.8GB memory [17.1/per], 3360GB combined local HDD storage [420/per]. This is almost the same price as a single large SSD based machine: 16 cores, 64 GB of RAM (35 ECU), 2048GB local SSD storage.
  45. 45. The large machine had SSD though: If the workload is IO bound (working set >128GB), go with the large machine with 16 cores. Very fast IO. Getting data into memory so that the CPUs can work on it is more important. Downgrade to smaller machines if the working set shrinks. Still partition for parallelism.
  46. 46. Scale "in and out": Splitting a shard in Shard-Query is a manual (but easy) process. Only supported when the directory mapper is used. mysqldump the data from the shard with the -T option (or use mydumper). Truncate the tables on the old shard. Create the tables on the new shard. Update the mapping table to split the data. Use the Shard-Query loader to load data.
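The "update the mapping table" step can be pictured with a toy directory mapper: a lookup from each shard key value to the shard that holds it. The dictionary, the function `split_shard` and the even/odd split rule below are hypothetical and only illustrate the idea; the real mapping table's schema is not shown in the slides.

```python
def split_shard(directory, old_shard, new_shard, should_move):
    """Reassign selected keys from old_shard to new_shard in the directory.

    Returns the sorted list of keys that were moved.
    """
    moved = []
    for key, shard in directory.items():
        if shard == old_shard and should_move(key):
            directory[key] = new_shard  # value update only; safe while iterating
            moved.append(key)
    return sorted(moved)

# Directory before the split: key value -> shard name.
directory = {1: "shard1", 2: "shard1", 3: "shard2", 4: "shard1"}

# Move the even-numbered keys of shard1 to a newly added shard3.
moved = split_shard(directory, "shard1", "shard3", lambda key: key % 2 == 0)
```

Once the directory is updated, reloading the dumped rows through a loader that consults it sends each row to its new home, which is why the dump and truncate steps come first in the slide's list.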
  47. 47. Combine with Map/Reduce: Use Map/Reduce jobs to extract data from HDFS and write it into ICE. Execute high performance low latency MySQL queries over the data.
  48. 48. Combine with Map/Reduce (cont): Make fast insights into smaller amounts of data extracted from petabyte HDFS data stores. Extract a particular year of climate data. Or particular cultivars when comparing genomic plant data. Open source ETL tools can automate this process.
  49. 49. Performance Examples
  50. 50. Simple In-Memory COUNT(*) query performance on Wikipedia traffic stats. Working set: 128GB of data. [Chart: 8 Pawns versus The King, each with a linear trend line.]
      Days   8 Pawns    The King
      1      2.552858   40.84573
      2      5.090356   81.4457
      3      8.064888   129.0382
      4      10.74412   171.9059
      5      13.32697   213.2316
      6      16.0227    256.3633
      7      18.50571   296.0914
      8      21.02053   336.3285
      9      25.3414    405.4624
      10     29.69324   475.0918
      11     32.93455   526.9529
      12     36.5517    584.8272
      13     40.19016   643.0426
      14     42.75      699.1011
      15     44.69      750.4571
      Shard-Query is scanning about 1B rows/sec
  51. 51. Star Schema Benchmark – Scale 10. 6 cores. Partitioning for single node scaleup. 6 worker threads. XTRADB
  52. 52. Star Schema Benchmark – Scale 10. 6 cores. Partitioning for single node scaleup. 6 worker threads. XTRADB
  53. 53. Schema Design for Big Data
  54. 54. Best schema – flat tables (no joins): Scale to hundreds of machines with tens to hundreds of terabytes each. Dozens or hundreds of columns per table. Can use map/reduce when you need to join between sharded tables (Map/R or something other than Shard-Query is used for this). Joins to lookup tables can still be done but do so with care.
  55. 55. One table model (flat table, no joins): Great for machine generated data - quintessential big data. Call data records (billing mediation and call analysis). Sensor data (Internet of Things). Web logs (Know thy own self before all others). Hit/click information for advertising. Energy metering. Almost any large open data set.
  56. 56. Ideal schema – flat tables (no joins): Why one big table? ICE/IEE: ICE and IEE engines are append-only (or append mostly). ICE/IEE knowledge grid can filter out data more effectively when all of the filters are placed on a single table. No indexes means that only hash joins or sort/merge joins can be performed when joining tables.
  57. 57. Ideal schema – flat tables (no joins): Insert-only tables are the easiest on which to build summary tables. Querying is very easy as all attributes are always available. But all attributes can be overwhelming. Views can be created in this case. When named properly the views can be accessed in parallel too.
  58. 58. Special view support: Shard-Query has special support for treating views as partitioned tables* when the views have the prefix v_ followed by the actual table name. select * from v_mysql_metrics from all_metrics where host_id = 33 and collect_date = 2013-05-27; Joins to these views are supported too. Make sure you only use the MERGE algorithm or this will not work. (* Shard-Query does not currently parse the underlying SQL for views, so this naming is necessary to allow Shard-Query to find the partition metadata for the underlying table.)
  59. 59. Schema Design for Analytics/BI and Data Visualization: See better results through faster queries
  60. 60. Star Schema: Most common BI/analytics table is star schema or a denormalized table (see prev slides). "Fact" (measurement) table is sharded. Dimension (lookup) tables are unsharded. JOINs between the fact and dimension tables are freely supported.
  61. 61. Star Schema: In some cases a dimension might be sharded. Sharding by date to spread data around evenly by date, for example: date_id is in the fact table and in the date dimension table. This is safe because you JOIN by the date_id column. Sharding by customer (SaaS) is also common: customer_id in FACT and in dim_customer. Safe because join is by customer_id.
  62. 62. Star Schema (cont): Shard-Query has experimental STAR optimizer support. Scan dimension tables. Push FACT table IN predicates to SQL WHERE clause. Eliminate JOIN to dimension tables without projected columns.
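The three steps of the STAR optimization (scan the dimension, push an IN predicate to the fact table, drop the join) can be demonstrated on a tiny in-memory SQLite schema. This is a sketch of the general technique under assumed table and column names (`dim_date`, `d_datekey`, `d_year`, `lo_orderdate`, `lo_revenue`); it is not the SQL that Shard-Query's experimental optimizer actually emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (d_datekey INTEGER PRIMARY KEY, d_year INTEGER);
CREATE TABLE lineorder (lo_orderdate INTEGER, lo_revenue INTEGER);
INSERT INTO dim_date VALUES (19930101, 1993), (19930102, 1993), (19940101, 1994);
INSERT INTO lineorder VALUES (19930101, 100), (19930102, 50), (19940101, 999);
""")

# Original star-schema query: the dimension only filters rows and
# contributes no projected columns.
joined = conn.execute("""
    SELECT SUM(lo_revenue)
    FROM lineorder JOIN dim_date ON lo_orderdate = d_datekey
    WHERE d_year = 1993
""").fetchone()[0]

# Step 1: scan the (small) dimension table for the matching keys.
keys = [row[0] for row in conn.execute(
    "SELECT d_datekey FROM dim_date WHERE d_year = ?", (1993,))]

# Steps 2 and 3: push the keys into the fact query as an IN predicate,
# which removes the JOIN to the dimension table altogether.
placeholders = ", ".join("?" for _ in keys)
rewritten_sql = (
    f"SELECT SUM(lo_revenue) FROM lineorder "
    f"WHERE lo_orderdate IN ({placeholders})"
)
rewritten = conn.execute(rewritten_sql, keys).fetchone()[0]
```

Both forms return the same total here. The rewritten fact query touches a single table, so it is easy to split into per-shard or per-partition tasks; an empty key list would need separate handling.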
  63. 63. Other schema types can work too: Master/detail relationship. Unsharded small lookup tables: comment_type, mood_type, etc. The main tables are sharded by blog_id: blog_info, blog_posts, blog_comments. These all must contain the "shard key" (blog_id) because they are joined by blog_id, thus blog metadata, comments and posts must be stored in the same shard for the same blog. Table relationships cannot currently be defined. Some tables (like comments) require minor de-normalization to include the blog_id column.
  64. 64. Snowflake schema: Shard-Query STAR optimizer not yet extended to snowflake. Consider using star schema or flat table instead.
  65. 65. Links and other info
  66. 66. Shard-Query
  67. 67. Percona: The high performance MySQL and LAMP experts. Training - Support - MySQL, MariaDB, and Percona Server too. Remote DBA - We wake up so you don't have to. Consulting – Is your site slow? We can help. Development services – Something's broke? We can fix it. We can add or improve features to fit your use case.
  68. 68. Gearman: Job process and concurrent workload management. Run one worker per physical CPU (or more if you are IO bound). Add extra loader workers and exec workers if needed.
  69. 69. Infobright: Infobright Community Edition: append only. Infobright Enterprise Edition. They are both column stores but they are architecturally different. IEE offers intra-query parallelism natively, which Shard-Query benefits from because Infobright does not support partitioning.
  70. 70. TokuDB: Compressing row store for big data. Doesn't suffer IO penalty when updating secondary indexes. Variable compression level by library. New, so prepare to test thoroughly.
  71. 71. Groonga/Mroonga: Column store and text search system. Supports text and geospatial search. Native (column store) or fulltext wrapper around InnoDB/MyISAM.
  72. 72. Network Engines: NDB (MySQL Cluster). SPIDER storage engine. CONNECT engine for MariaDB 10.x alpha.