1. How MariaDB Server Scales with Spider
Jacob Mathew
Senior Software Engineer, MariaDB
Kentoku Shiba
Author of Spider, Spiral Arms
2. Spider
● What is Spider?
● Why should I use Spider?
● Sharding with Spider.
● Redundant Data.
● Data Consistency.
● Getting Started with Spider.
● What’s New in Spider?
● What’s Ahead for Spider?
4. What is Spider?
● Storage engine plugin.
○ Spider doesn’t itself store data.
● Manage storage and retrieval of data stored using other storage engines.
● Sharding solution that stores data remotely on other servers.
● Partition tables using the Partition Engine.
● View the data as if it is local.
6. Why Should I Use Spider?
● Very large tables.
● Volume of data is growing.
● Lots of concurrent operations on the data.
● Few or no application code changes required.
7. Why Should I Use Spider?
● Spider pushes down query information.
● Reduces amount of result data returned by data nodes.
● Parallel execution.
● Data consistency.
[Diagram: an SQL client connects to a Spider node (MariaDB), which fronts four MariaDB data nodes, each holding one shard of table_a: A-F, G-L, M-R, S-Z.]
9. Sharding with Spider
1. Receive a request.
2. Execute the request.
a. Distribute SQL to data nodes.
b. Receive and consolidate results from data nodes.
3. Send reply.
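[Diagram: the SQL client sends a request to the Spider node (step 1) and receives the reply (step 3). The Spider node distributes SQL to (step 2a) and collects results from (step 2b) four MariaDB data nodes, each holding one shard of table_a: A-F, G-L, M-R, S-Z.]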
10. Sharding with Spider
● Partition Engine
○ Supports all partitioning rules.
■ Range.
■ Key.
■ Hash.
■ List.
● CREATE SERVER
○ Comment for connection details.
○ Useful when each data node has different connection information.
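As an illustration of combining a partitioning rule with CREATE SERVER, the sketch below shards a Spider table by HASH across two data nodes. The table names table_h and r_table_h and the server names server_1 and server_2 are hypothetical; the servers are assumed to have been defined already with CREATE SERVER, and the remote table is assumed to exist on both data nodes.

```sql
-- Sketch only: table_h, r_table_h, server_1 and server_2 are hypothetical names.
-- Each partition's COMMENT names the server that stores that partition.
CREATE TABLE table_h (c1 INT PRIMARY KEY, c2 VARCHAR(100))
ENGINE=spider DEFAULT CHARSET=UTF8
COMMENT 'table "r_table_h"'
PARTITION BY HASH(c1)
(PARTITION p1 COMMENT 'server "server_1"',
 PARTITION p2 COMMENT 'server "server_2"');
```

HASH spreads rows evenly without choosing range boundaries; RANGE or LIST may be a better fit when queries usually target a known subset of key values.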
11. Sharding with Spider
Spider cluster pushdown
● Engine condition.
● Index hints.
● Join.
● Aggregation.
● Direct update/delete.
13. Redundant Data
● Full copy of the table on each data node.
● For SELECTs, Spider performs load balancing and chooses the data node.
● INSERTs, UPDATEs and DELETEs are parallelized to the data nodes.
[Diagram: an SQL client connects to a Spider node (MariaDB), which fronts four MariaDB data nodes, each holding a full copy of table_a (A-Z).]
15. Data Consistency
● Data needs to be written to multiple data nodes.
● Spider uses 2-phase commit.
[Diagram: an SQL client connects to a Spider node (MariaDB), which fronts four MariaDB data nodes, each holding one shard of table_a: A-F, G-L, M-R, S-Z.]
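For readers unfamiliar with 2-phase commit, the sketch below shows the two phases using MariaDB's generic XA statements, as they might be run against one data node. This is an illustration of the protocol only: the transaction identifier 'trx1' is hypothetical, r_table_a is the data-node table from the Getting Started slides, and the exact statements Spider sends to the data nodes are an assumption, not taken from the slides.

```sql
-- Sketch of 2-phase commit with generic XA statements; 'trx1' is a hypothetical id.
XA START 'trx1';
INSERT INTO r_table_a (c1, c2) VALUES (1, 'a');
XA END 'trx1';
XA PREPARE 'trx1';  -- phase 1: the node promises it can commit
XA COMMIT 'trx1';   -- phase 2: issued only after every node has prepared
```

If any node fails to prepare, the coordinator would issue XA ROLLBACK on the nodes that did prepare, so the write lands on all data nodes or on none.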
17. Getting Started with Spider
1. Get MariaDB.
a. Spider is bundled with MariaDB.
2. Install the database.
a. mysql_install_db
3. Start MariaDB server.
4. Install Spider engine.
a. mysql < scripts/install_spider.sql
5. CREATE TABLE with options to use Spider.
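After step 4, it can be useful to confirm that the engine is registered before creating any Spider tables. A minimal check, assuming the plugin registers itself under the engine name SPIDER:

```sql
-- Should return one row for Spider once the install script has run.
SELECT ENGINE, SUPPORT
FROM information_schema.ENGINES
WHERE ENGINE = 'SPIDER';
```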
18. Getting Started with Spider
On the Data Node:
CREATE TABLE r_table_a
(c1 INT PRIMARY KEY,
c2 VARCHAR(100))
ENGINE=innodb
DEFAULT CHARSET=UTF8;
19. Getting Started with Spider
On the Spider Node:
CREATE TABLE table_a
(c1 INT PRIMARY KEY,
c2 VARCHAR(100))
ENGINE=spider
DEFAULT CHARSET=UTF8
COMMENT
'table "r_table_a", database "test",
port "3306",
host "<host name of data node>",
user "<user name for data node>",
password "<password for user>"';
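Once both tables exist, the Spider table can be used like a local table. The statements below are a small usage sketch; the inserted values are made up for illustration.

```sql
-- On the Spider node: the row is stored remotely in r_table_a.
INSERT INTO table_a (c1, c2) VALUES (1, 'first row');
SELECT c1, c2 FROM table_a WHERE c1 = 1;

-- On the data node: the same row should be visible in the underlying table.
SELECT c1, c2 FROM r_table_a WHERE c1 = 1;
```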
20. Getting Started with Spider
Omit column definitions on the Spider Node:
CREATE TABLE table_a
ENGINE=spider
DEFAULT CHARSET=UTF8
COMMENT
'table "r_table_a", database "test",
port "3306",
host "<host name of data node>",
user "<user name for data node>",
password "<password for user>"';
21. Getting Started with Spider
CREATE TABLE table_a (c1 INT PRIMARY KEY, c2 VARCHAR(100))
ENGINE=spider DEFAULT CHARSET=UTF8
COMMENT
'table "r_table_a", database "test", port "3306",
user "<user name for data node>",
password "<password for user>"'
PARTITION BY RANGE(c1)
(PARTITION p1 VALUES LESS THAN (100000) COMMENT 'host "h1"',
PARTITION p2 VALUES LESS THAN (200000) COMMENT 'host "h2"',
PARTITION p3 VALUES LESS THAN (300000) COMMENT 'host "h3"',
PARTITION p4 VALUES LESS THAN MAXVALUE COMMENT 'host "h4"');
Sharding on the Spider Node
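To see which shard a query would touch under the ranges above, the partition list can be inspected with EXPLAIN PARTITIONS. This is a sketch; the comments describe what partition pruning should select given the boundaries 100000, 200000 and 300000.

```sql
-- 150000 falls in [100000, 200000), so only p2 (host h2) should be listed.
EXPLAIN PARTITIONS SELECT c1, c2 FROM table_a WHERE c1 = 150000;

-- This range straddles the 100000 boundary, so p1 and p2 should both be listed.
EXPLAIN PARTITIONS SELECT c1, c2 FROM table_a WHERE c1 BETWEEN 90000 AND 110000;
```

Queries that filter on the partitioning column let the Spider node contact only the relevant data nodes; queries without such a filter go to all four.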
22. Getting Started with Spider
CREATE SERVER server_1
FOREIGN DATA WRAPPER mysql OPTIONS (
HOST 'host name of data node',
DATABASE 'test',
USER 'user name for data node',
PASSWORD 'password for data node',
PORT 3306);
CREATE TABLE table_a (c1 INT PRIMARY KEY, c2 VARCHAR(100))
ENGINE=spider DEFAULT CHARSET=UTF8
COMMENT 'table "r_table_a", server "server_1"';
CREATE SERVER for connection information on the Spider Node
23. Getting Started with Spider
CREATE SERVER server_1 FOREIGN DATA WRAPPER mysql OPTIONS (
HOST 'host name of data node 1', DATABASE 'test',
USER 'user name for data node 1', PASSWORD 'password for data node 1', PORT 3306);
CREATE SERVER server_2 FOREIGN DATA WRAPPER mysql OPTIONS (
HOST 'host name of data node 2', DATABASE 'test',
USER 'user name for data node 2', PASSWORD 'password for data node 2', PORT 3306);
CREATE TABLE table_a (c1 INT PRIMARY KEY, c2 VARCHAR(100))
ENGINE=spider DEFAULT CHARSET=UTF8
COMMENT 'table "r_table_a"'
PARTITION BY RANGE(c1)
(PARTITION p1 VALUES LESS THAN (200000) COMMENT 'server "server_1"',
PARTITION p2 VALUES LESS THAN MAXVALUE COMMENT 'server "server_2"');
CREATE SERVER for shard connection information on the Spider Node
25. What’s New in Spider?
● Support in the Partition Engine for additional features.
○ Engine Condition pushdown pushes down to the data nodes.
○ Multi range read.
○ Full Text search.
○ Auto-Increment data type.
● Direct aggregation of min, max, avg, count, sum.
● Direct update/delete.
● Direct join.
● Options to log
○ Result errors.
○ Stored Procedure Queries.
● Contributions from Tencent.
26. What’s New in Spider?
Direct Aggregation
● Aggregation is pushed down to the data nodes: min, max, avg, count, sum.
● Aggregation results are returned by the data nodes.
[Diagram: an SQL client connects to a Spider node (MariaDB), which fronts four MariaDB data nodes, each holding one shard of table_a: A-F, G-L, M-R, S-Z.]
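A query of the following shape is a candidate for direct aggregation, using table_a from the Getting Started slides. Whether it is actually pushed down depends on the server version and settings. The status variable name Spider_direct_aggregate is an assumption about how Spider reports this and should be checked against the installed version.

```sql
-- Each data node can compute its own partial aggregates instead of returning rows.
SELECT COUNT(*), MIN(c1), MAX(c1) FROM table_a;

-- Assumed status counter; it should increase if the aggregation was pushed down.
SHOW STATUS LIKE 'Spider_direct_aggregate';
```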
27. What’s New in Spider?
Direct Update/Delete
● Entire update/delete operation is pushed down to the data nodes.
● Update/delete executed as a single cluster operation instead of one row at a time.
[Diagram: an SQL client connects to a Spider node (MariaDB), which fronts four MariaDB data nodes, each holding one shard of table_a: A-F, G-L, M-R, S-Z.]
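Statements like the ones below are candidates for direct update/delete. This is a sketch using the range-sharded table_a from the Getting Started slides; the values are illustrative, and whether a given statement is pushed down depends on the server version and settings.

```sql
-- Touches only rows below 100000, which live in partition p1 (host h1).
UPDATE table_a SET c2 = 'archived' WHERE c1 < 100000;

-- Touches only rows at or above 300000, which live in partition p4 (host h4).
DELETE FROM table_a WHERE c1 >= 300000;
```

With pushdown, each statement is sent to the data node as a whole, rather than the Spider node fetching the matching rows and modifying them one at a time.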
28. What’s New in Spider?
Direct Join
● Join is pushed down to the data nodes.
● Join results are consolidated by the Spider node.
[Diagram: an SQL client connects to a Spider node (MariaDB), which fronts four MariaDB data nodes, each holding one shard of table_a: A-F, G-L, M-R, S-Z.]
29. What’s New in Spider?
Contributions from Tencent
● Force pushdown of index hints.
● Optimization for LIMIT.
● Added max connection pool size feature to Spider.
● Bug fixes.
33. Vertical Partitioning with VP
[Diagram: an SQL client queries table_a on the Spider / VP node (MariaDB). table_a is split into two child tables, table_a_ca and table_a_cb, one partitioned by column col_a and the other by column col_b.]
SELECT … FROM table_a WHERE col_a = 1
34. Vertical Partitioning with VP
[Diagram: an SQL client queries table_a on the Spider / VP node (MariaDB). table_a is split into two child tables, table_a_ca and table_a_cb, one partitioned by column col_a and the other by column col_b.]
SELECT … FROM table_a WHERE col_b = '2016-01-01'
35. Vertical Partitioning with VP
● When the VP child tables are sharded Spider tables with different partitioning rules, VP efficiently chooses which sharded Spider table to use.
36. Vertical Partitioning with VP
SELECT … FROM table_a WHERE col_a = 1
[Diagram: an SQL client queries table_a on the Spider / VP node (MariaDB). table_a is split into two child tables, table_a_ca and table_a_cb, one partitioned by column col_a and the other by column col_b. Each child table is sharded across two MariaDB data nodes: A-L and M-Z.]
37. Vertical Partitioning with VP
SELECT … FROM table_a WHERE col_b = '2016-01-01'
[Diagram: an SQL client queries table_a on the Spider / VP node (MariaDB). table_a is split into two child tables, table_a_ca and table_a_cb, one partitioned by column col_a and the other by column col_b. Each child table is sharded across two MariaDB data nodes: A-L and M-Z.]