M|18 How MariaDB Server Scales with Spider

Published in: Data & Analytics

  1. How MariaDB Server Scales with Spider
     Jacob Mathew, Senior Software Engineer, MariaDB
     Kentoku Shiba, Author of Spider, Spiral Arms
  2. Spider
     ● What is Spider?
     ● Why should I use Spider?
     ● Sharding with Spider.
     ● Redundant data.
     ● Data consistency.
     ● Getting started with Spider.
     ● What’s new in Spider?
     ● What’s ahead for Spider?
  3. What is Spider?
  4. What is Spider?
     ● A storage engine plugin.
       ○ Spider does not store data itself.
     ● Manages the storage and retrieval of data held by other storage engines.
     ● A sharding solution that stores data remotely on other servers.
     ● Uses the Partition Engine to partition tables.
     ● Presents the remote data as if it were local.
  5. Why Should I Use Spider?
  6. Why Should I Use Spider?
     ● Very large tables.
     ● The volume of data is growing.
     ● Many concurrent operations on the data.
     ● Few or no application code changes required.
  7. Why Should I Use Spider?
     ● Spider pushes parts of each query down to the data nodes.
     ● This reduces the amount of result data returned by the data nodes.
     ● Parallel execution.
     ● Data consistency.
     [Diagram: an SQL client connects to the Spider node (MariaDB), which connects to four data nodes (MariaDB); each data node holds the table_a rows for one key range: A-F, G-L, M-R, S-Z.]
  8. Sharding with Spider
  9. Sharding with Spider
     1. Receive a request.
     2. Execute the request.
        a. Distribute SQL to the data nodes.
        b. Receive and consolidate the results from the data nodes.
     3. Send the reply.
     [Diagram: the same topology (SQL client, Spider node, four data nodes holding table_a ranges A-F, G-L, M-R, S-Z), with arrows labeled 1 (request to the Spider node), 2a (SQL sent to the data nodes), 2b (results returned to the Spider node) and 3 (reply to the client).]
  10. Sharding with Spider
     ● Partition Engine
       ○ Supports all partitioning rules:
         ■ Range.
         ■ Key.
         ■ Hash.
         ■ List.
     ● CREATE SERVER
       ○ Holds connection details that a table COMMENT can reference by server name.
       ○ Useful when each data node has different connection information.
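Because the Partition Engine accepts the standard MariaDB partitioning rules, a Spider table can be sharded by KEY instead of RANGE. A hedged sketch, assuming two data nodes reachable as the hypothetical hosts h1 and h2, each with a backing table r_table_a, written in the same COMMENT style as the range example on slide 21:

```sql
-- Hypothetical hosts h1/h2; credentials are the same placeholders as on the slides.
-- KEY partitioning hashes c1, so rows are spread over the two shards.
CREATE TABLE table_a (
  c1 INT PRIMARY KEY,
  c2 VARCHAR(100)
) ENGINE=spider DEFAULT CHARSET=utf8
  COMMENT 'table "r_table_a", database "test", port "3306", user "<user name for data node>", password "<password for user>"'
  PARTITION BY KEY (c1) (
    PARTITION p1 COMMENT 'host "h1"',
    PARTITION p2 COMMENT 'host "h2"'
  );
```

The usual trade-off applies: an equality lookup such as WHERE c1 = 42 can be routed to a single shard, while wide range scans on c1 generally have to visit every shard, which RANGE partitioning avoids.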
  11. Sharding with Spider
     Spider cluster pushdown:
     ● Engine condition.
     ● Index hints.
     ● Join.
     ● Aggregation.
     ● Direct update/delete.
  12. Redundant Data
  13. Redundant Data
     ● A full copy of the table on each data node.
     ● For SELECTs, Spider load-balances by choosing which data node to read from.
     ● INSERTs, UPDATEs and DELETEs are sent to the data nodes in parallel.
     [Diagram: an SQL client connects to the Spider node (MariaDB), which connects to four data nodes (MariaDB), each holding a full copy of table_a (A-Z).]
  14. Data Consistency
  15. Data Consistency
     ● A write may have to be applied on multiple data nodes.
     ● Spider uses two-phase commit, so either all of those nodes commit the write or none do.
     [Diagram: an SQL client connects to the Spider node (MariaDB), which connects to four data nodes (MariaDB) holding table_a ranges A-F, G-L, M-R, S-Z.]
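In two-phase commit every participant first prepares (durably records that it is able to commit) and only then commits. As a hedged illustration, not Spider's internal code, here is that protocol from a single data node's point of view, written with MariaDB's standard XA statements. The transaction id 'trx_1' and the updated row are hypothetical; Spider is expected to coordinate the equivalent steps across the data nodes itself.

```sql
-- Do the work inside an XA transaction on this data node
XA START 'trx_1';
UPDATE r_table_a SET c2 = 'new value' WHERE c1 = 42;
XA END 'trx_1';

-- Phase 1: prepare; the node persists the change and is now ready to commit
XA PREPARE 'trx_1';

-- Phase 2: once every participating node has prepared, the coordinator commits each one.
-- Had any node failed to prepare, the coordinator would run XA ROLLBACK 'trx_1' on all of them instead.
XA COMMIT 'trx_1';
```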
  16. Getting Started with Spider
  17. Getting Started with Spider
     1. Get MariaDB.
        a. Spider is bundled with MariaDB.
     2. Install the database.
        a. mysql_install_db
     3. Start the MariaDB server.
     4. Install the Spider engine.
        a. mysql < scripts/install_spider.sql
     5. CREATE TABLE with options to use Spider.
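A quick check that step 4 worked, run on the Spider node. On some newer MariaDB releases the plugin can also be loaded from its library with INSTALL SONAME; treat that line as an assumed alternative to try only if the script was not used (check your version's documentation), not as the method from the slides.

```sql
-- Assumed alternative to install_spider.sql; skip it if the script already ran,
-- otherwise it fails because the plugin is already installed.
INSTALL SONAME 'ha_spider';

-- SUPPORT should read YES once the Spider engine is loaded
SELECT ENGINE, SUPPORT
FROM information_schema.ENGINES
WHERE ENGINE = 'SPIDER';
```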
  18. Getting Started with Spider
     On the data node:
       CREATE TABLE r_table_a (
         c1 INT PRIMARY KEY,
         c2 VARCHAR(100)
       ) ENGINE=innodb DEFAULT CHARSET=UTF8;
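The Spider node logs in to the data node as an ordinary MariaDB user, so that account must exist on the data node with access to the test database. A minimal sketch; the user name spider_user, the '%' host pattern and the password are placeholders, and you may prefer narrower privileges than ALL:

```sql
-- On the data node: account the Spider node will connect as (placeholder values)
CREATE USER 'spider_user'@'%' IDENTIFIED BY 'change_me';
GRANT ALL PRIVILEGES ON test.* TO 'spider_user'@'%';
```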
  19. Getting Started with Spider
     On the Spider node:
       CREATE TABLE table_a (
         c1 INT PRIMARY KEY,
         c2 VARCHAR(100)
       ) ENGINE=spider DEFAULT CHARSET=UTF8
       COMMENT 'table "r_table_a", database "test", port "3306", host "<host name of data node>", user "<user name for data node>", password "<password for user>"';
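Once both tables exist, the Spider table is used like a local one. A short usage sketch with made-up sample rows:

```sql
-- On the Spider node: rows are written through Spider into r_table_a on the data node
INSERT INTO table_a (c1, c2) VALUES (1, 'alpha'), (2, 'beta');
SELECT c1, c2 FROM table_a WHERE c1 = 2;

-- On the data node: the same rows are stored in the InnoDB table
SELECT c1, c2 FROM test.r_table_a ORDER BY c1;
```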
  20. Getting Started with Spider
     Omit the column definitions on the Spider node:
       CREATE TABLE table_a ENGINE=spider DEFAULT CHARSET=UTF8
       COMMENT 'table "r_table_a", database "test", port "3306", host "<host name of data node>", user "<user name for data node>", password "<password for user>"';
  21. Getting Started with Spider
     Sharding on the Spider node:
       CREATE TABLE table_a (
         c1 INT PRIMARY KEY,
         c2 VARCHAR(100)
       ) ENGINE=spider DEFAULT CHARSET=UTF8
       COMMENT 'table "r_table_a", database "test", port "3306", user "<user name for data node>", password "<password for user>"'
       PARTITION BY RANGE(c1) (
         PARTITION p1 VALUES LESS THAN (100000) COMMENT 'host "h1"',
         PARTITION p2 VALUES LESS THAN (200000) COMMENT 'host "h2"',
         PARTITION p3 VALUES LESS THAN (300000) COMMENT 'host "h3"',
         PARTITION p4 VALUES LESS THAN MAXVALUE COMMENT 'host "h4"'
       );
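With these range rules, a query whose WHERE clause pins down c1 only needs the shard that owns that range. One hedged way to see which partitions the optimizer will touch is EXPLAIN PARTITIONS on the Spider node; the exact output layout varies by MariaDB version.

```sql
-- c1 = 150000 falls in p2 (100000 <= c1 < 200000), so only host h2 should be involved
EXPLAIN PARTITIONS SELECT c1, c2 FROM table_a WHERE c1 = 150000;

-- This range crosses the p1/p2 boundary, so p1 (h1) and p2 (h2) are both needed
EXPLAIN PARTITIONS SELECT c1, c2 FROM table_a WHERE c1 BETWEEN 90000 AND 110000;
```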
  22. Getting Started with Spider
     CREATE SERVER for connection information on the Spider node:
       CREATE SERVER server_1 FOREIGN DATA WRAPPER mysql OPTIONS (
         HOST 'host name of data node',
         DATABASE 'test',
         USER 'user name for data node',
         PASSWORD 'password for data node',
         PORT 3306
       );
       CREATE TABLE table_a (
         c1 INT PRIMARY KEY,
         c2 VARCHAR(100)
       ) ENGINE=spider DEFAULT CHARSET=UTF8
       COMMENT 'table "r_table_a", server "server_1"';
  23. Getting Started with Spider
     CREATE SERVER for shard connection information on the Spider node:
       CREATE SERVER server_1 FOREIGN DATA WRAPPER mysql OPTIONS (
         HOST 'host name of data node 1',
         DATABASE 'test',
         USER 'user name for data node 1',
         PASSWORD 'password for data node 1',
         PORT 3306
       );
       CREATE SERVER server_2 FOREIGN DATA WRAPPER mysql OPTIONS (
         HOST 'host name of data node 2',
         DATABASE 'test',
         USER 'user name for data node 2',
         PASSWORD 'password for data node 2',
         PORT 3306
       );
       CREATE TABLE table_a (
         c1 INT PRIMARY KEY,
         c2 VARCHAR(100)
       ) ENGINE=spider DEFAULT CHARSET=UTF8
       COMMENT 'table "r_table_a"'
       PARTITION BY RANGE(c1) (
         PARTITION p1 VALUES LESS THAN (200000) COMMENT 'server "server_1"',
         PARTITION p2 VALUES LESS THAN MAXVALUE COMMENT 'server "server_2"'
       );
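CREATE SERVER definitions are stored in the mysql.servers system table, so they can be inspected and later edited with ALTER SERVER. A sketch; the new host name is a placeholder, and whether already-open Spider connections pick up the change immediately is an assumption to verify on your version.

```sql
-- List the server definitions that the table and partition COMMENTs refer to
SELECT Server_name, Host, Db, Username, Port
FROM mysql.servers;

-- Point server_2 at a different host (placeholder value).
-- Assumption: open Spider tables may need to be reopened (e.g. FLUSH TABLES) to use it.
ALTER SERVER server_2 OPTIONS (HOST 'new host name of data node 2');
```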
  24. What’s New in Spider?
  25. What’s New in Spider?
     ● Support in the Partition Engine for additional features:
       ○ Engine condition pushdown to the data nodes.
       ○ Multi-range read.
       ○ Full-text search.
       ○ Auto-increment columns.
     ● Direct aggregation of MIN, MAX, AVG, COUNT and SUM.
     ● Direct update/delete.
     ● Direct join.
     ● Logging options for:
       ○ Result errors.
       ○ Stored procedure queries.
     ● Contributions from Tencent.
  26. What’s New in Spider? Direct Aggregation
     ● Aggregation (MIN, MAX, AVG, COUNT, SUM) is pushed down to the data nodes.
     ● The data nodes return aggregation results instead of rows.
     [Diagram: an SQL client connects to the Spider node (MariaDB), which connects to four data nodes (MariaDB) holding table_a ranges A-F, G-L, M-R, S-Z.]
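The kind of query that benefits is a plain aggregate over a sharded table, such as table_a from the range example on slide 21. A hedged sketch; whether a given statement is actually pushed down depends on the Spider version and settings.

```sql
-- Each data node can compute these over its own rows and return a single row;
-- the Spider node then combines the partial results instead of fetching every row.
SELECT COUNT(*), MIN(c1), MAX(c1), SUM(c1), AVG(c1)
FROM table_a;
```

Combining partial results is straightforward for COUNT, MIN, MAX and SUM. An overall AVG, however, cannot be obtained by averaging the per-node averages when nodes hold different numbers of rows; a correct combination needs each node's SUM and COUNT.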
  27. What’s New in Spider? Direct Update/Delete
     ● The entire update/delete operation is pushed down to the data nodes.
     ● The update/delete runs as a single cluster operation instead of one row at a time.
     [Diagram: an SQL client connects to the Spider node (MariaDB), which connects to four data nodes (MariaDB) holding table_a ranges A-F, G-L, M-R, S-Z.]
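Statements like the following, against the range-sharded table_a from slide 21, are the target of direct update/delete. A hedged sketch; the values are made up, and the exact conditions under which Spider pushes a statement down are version-dependent.

```sql
-- Can be sent to the owning data node(s) as whole statements, rather than
-- Spider fetching each matching row and modifying it individually.
UPDATE table_a SET c2 = 'archived' WHERE c1 < 100000;
DELETE FROM table_a WHERE c1 >= 300000;
```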
  28. What’s New in Spider? Direct Join
     ● The join is pushed down to the data nodes.
     ● The Spider node consolidates the join results.
     [Diagram: an SQL client connects to the Spider node (MariaDB), which connects to four data nodes (MariaDB) holding table_a ranges A-F, G-L, M-R, S-Z.]
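A join between two Spider tables whose backing tables live on the same data node is the natural candidate for direct join. A sketch with a hypothetical second table table_b, backed by an assumed r_table_b that already exists on the data node behind server_1, with table_a defined as on slide 22. Whether the join is really pushed down depends on the Spider version and on both tables resolving to the same server.

```sql
-- Hypothetical second Spider table on the same data node as table_a
CREATE TABLE table_b (
  c1 INT PRIMARY KEY,
  note VARCHAR(100)
) ENGINE=spider DEFAULT CHARSET=utf8
  COMMENT 'table "r_table_b", server "server_1"';

-- Candidate for direct join: both tables are on server_1
SELECT a.c1, a.c2, b.note
FROM table_a AS a
JOIN table_b AS b ON b.c1 = a.c1
WHERE a.c1 < 1000;
```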
  29. What’s New in Spider?
     Contributions from Tencent:
     ● Forced pushdown of index hints.
     ● Optimization for LIMIT.
     ● A configurable maximum connection pool size.
     ● Bug fixes.
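Two query shapes these features are aimed at, sketched against table_a. This is hedged: the exact conditions for each optimization are version-dependent.

```sql
-- Index hint that can be passed on to the data nodes
SELECT c1, c2
FROM table_a FORCE INDEX (PRIMARY)
WHERE c1 > 500;

-- Each shard can return at most its own first 10 rows by c1;
-- the Spider node merges those and keeps the overall first 10.
SELECT c1, c2
FROM table_a
ORDER BY c1
LIMIT 10;
```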
  30. What’s Ahead for Spider?
  31. What’s Ahead for Spider?
     ● Vertical Partition (VP) Engine.
       ○ Multi-dimensional sharding.
       ○ VP merges multiple child tables into a single view.
       ○ VP efficiently chooses child tables for each query.
  32. Vertical Partitioning with VP
     [Diagram: an SQL client connects to the Spider / VP node (MariaDB), where table_a is built from the child tables table_a_ca and table_a_cb; one child table is partitioned by column col_a and the other by column col_b.]
       CREATE TABLE table_a_ca (
         col_a int,
         col_b date,
         col_c int,
         primary key(col_a)
       ) ENGINE=innodb partition by ...
       CREATE TABLE table_a_cb (
         col_a int,
         col_b date,
         col_c int,
         key idx1(col_a),
         key idx2(col_b)
       ) ENGINE=innodb partition by ...
  33. Vertical Partitioning with VP
     [Diagram: the same Spider / VP node with table_a, table_a_ca and table_a_cb, answering the query below.]
       SELECT … FROM table_a WHERE col_a = 1
  34. Vertical Partitioning with VP
     [Diagram: the same Spider / VP node with table_a, table_a_ca and table_a_cb, answering the query below.]
       SELECT … FROM table_a WHERE col_b = '2016-01-01'
  35. Vertical Partitioning with VP
     ● When the VP child tables are sharded Spider tables with different partitioning rules, VP efficiently chooses which sharded Spider table to query.
  36. Vertical Partitioning with VP
       SELECT … FROM table_a WHERE col_a = 1
     [Diagram: an SQL client connects to the Spider / VP node (MariaDB) holding table_a and its child tables table_a_ca and table_a_cb (one partitioned by col_a, the other by col_b); table_a_ca is sharded across two data nodes (A-L, M-Z), and so is table_a_cb.]
  37. Vertical Partitioning with VP
       SELECT … FROM table_a WHERE col_b = '2016-01-01'
     [Diagram: the same layout: the Spider / VP node holding table_a, table_a_ca and table_a_cb, with each child table sharded across two data nodes (A-L, M-Z).]
  38. Thank you!
