POLARDB for MySQL - Parallel Query

POLARDB for MySQL
Parallel Query
Øystein Grøvlen
Alibaba Cloud

Agenda
• What is Parallel Query
• How to use Parallel Query
• Parallel Query Internals
• Parallel Query Performance
• Future Work

What is Parallel Query?
Parallel Query is a method from Alibaba Cloud for accelerating MySQL
queries.
• Traditionally, a MySQL query runs on a single thread and cannot take
advantage of the multiple cores in modern processors.
• Parallel Query distributes the work of a query across many or all
available cores:
• 8 parallel threads can be up to 8 times faster.
• 32 parallel threads can be up to 32 times faster.

Why Parallel Query?
• 2003: CPUs stopped getting faster.
• 2004-2019: focus on more cores and sockets.
• Parallel Query (PQ) lets MySQL take advantage of the last 15 years of progress.

How to Use Parallel Query
Parallel Query runs against your existing InnoDB data.
• No data extraction to another system is required.
• No query modifications are required.
Parallel Query within InnoDB (no extraction needed) is an amazing
feature exclusive to Alibaba Cloud

Query with Parallelism
SELECT count(*) FROM production.product;
Serial execution plan:
• Stream Aggregate: for each row returned by the index scan, do the aggregation.
For the above query, the Stream Aggregate operator counts the rows it receives
from the Index Scan operator.
[Diagram: SQL client and Thread 1 (Scan, Count); 1 active thread, 63 idle threads]

Parallel Execution Plan
[Diagram: SQL client, Sum operator, and the parallel threads feeding it]
With 64 parallel threads, each thread does < 2% of the work.
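The count query above can be mimicked with a small Python sketch: each worker counts the rows in its own slice of the scan, and a final Sum step adds the partial counts. This is only an illustration of the plan shape, not POLARDB code; the names `count_chunk` and `parallel_count` are hypothetical, a Python list stands in for the index, and Python threads will not show a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_chunk(rows):
    """Worker: scan one slice of the data and return its partial count."""
    return sum(1 for _ in rows)


def parallel_count(rows, workers):
    """Coordinator: split the scan, run the workers, sum the partial counts."""
    chunk = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_chunk, chunks))
    return sum(partial_counts), partial_counts


# 6400 stand-in rows scanned by 64 workers.
total, partials = parallel_count(list(range(6400)), workers=64)
largest_share = max(partials) / total  # fraction of the scan done by one worker
```

With 64 equal slices, `largest_share` is 1/64, which matches the "< 2% of the work" figure on the slide.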

How Parallel Query Works
1. The parallel coordinator can split a table or index scan into equal-size
pieces
2. Each worker can execute part of the query plan
3. The gather stream operator is responsible for collecting the
intermediate results from the workers

How Parallel Query Works
• Each worker writes results to its own buffer
➢ threads run without interruption
• Pointers are passed to the Merge step
➢ optimized method to hand off data
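A minimal sketch of the coordinator, worker, and gather roles described on these two slides, under the assumption that a Python list can stand in for a worker's result buffer. Each worker appends only to its own buffer, so the workers never contend with each other, and the gather step receives references to the buffers rather than copies. The function names are hypothetical.

```python
import threading


def worker(rows, predicate, out_buffer):
    """Run part of the plan: filter one piece into this worker's own buffer."""
    for row in rows:
        if predicate(row):
            out_buffer.append(row)  # no lock needed: the buffer is private


def run_parallel(rows, predicate, n_workers):
    """Coordinator: split the scan into equal-size pieces and start workers."""
    size = -(-len(rows) // n_workers)  # ceiling division
    pieces = [rows[i * size:(i + 1) * size] for i in range(n_workers)]
    buffers = [[] for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(piece, predicate, buf))
        for piece, buf in zip(pieces, buffers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Hand off the buffers themselves (references), not copies of their rows.
    return buffers


buffers = run_parallel(list(range(20)), lambda r: r % 2 == 0, n_workers=4)
gathered = [row for buf in buffers for row in buf]  # gather step
```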

Parallel Query Internals
Parallel Query uses multiple methods to distribute work among the parallel
threads, including:
• Parallel sequential scan: the data pages of the table are divided among
the cooperating threads.
• Parallel index operation: each cooperating thread reads a single index
block and scans and returns all records referenced by that block; at the
same time, other threads can return records from different index pages.
The results of a parallel B-tree scan are returned in sorted order within
each worker thread.

Partitioning
[Figure: B-tree with root keys 11, 17, 25; second-level nodes (5, 8), (14), (20, 22), (28, 31); leaf records 1 to 32]

Partitioning
[Figure: the same B-tree divided into 2 partitions (Partition 1, Partition 2)]
InnoDB partitions the B-tree

Partitioning
[Figure: the same B-tree divided into 2 partitions (Partition 1, Partition 2)]
Workers see only one partition (at a time)

Partitioning
[Figure: the same B-tree divided into 6 partitions (Part. 1 to Part. 6)]

Partitioning
• The server will normally request 100 partitions per worker thread
• “Fast” workers may process more partitions than “slow” workers
• Requesting many partitions yields partitions of more equal size
• When finished with one partition, a worker may be automatically
attached to a new partition.
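The bullets above describe dynamic assignment: there are many more partitions than workers, and a worker picks up a new partition whenever it finishes one. A rough sketch, assuming a shared queue of partitions (the queue, the function name, and the row counts are illustrative assumptions, not the server's actual mechanism):

```python
import queue
import threading

PARTITIONS_PER_WORKER = 100  # figure quoted on the slide


def scan_with_dynamic_assignment(rows, n_workers):
    """Split rows into many small partitions; workers pull them on demand."""
    n_parts = PARTITIONS_PER_WORKER * n_workers
    size = max(1, -(-len(rows) // n_parts))  # ceiling division
    todo = queue.Queue()
    for start in range(0, len(rows), size):
        todo.put(rows[start:start + size])

    # counts[w] holds the row count of every partition worker w processed.
    counts = [[] for _ in range(n_workers)]

    def worker(worker_id):
        while True:
            try:
                partition = todo.get_nowait()  # attach to the next partition
            except queue.Empty:
                return  # nothing left to scan
            counts[worker_id].append(len(partition))

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


counts = scan_with_dynamic_assignment(list(range(4000)), n_workers=4)
partitions_done = sum(len(c) for c in counts)
rows_scanned = sum(sum(c) for c in counts)
```

How many partitions each worker ends up with depends on scheduling, which is the point: a fast worker simply takes more of them.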

Parallel Query SORT
SELECT col1, col2, col3 FROM t1 ORDER BY 1,2;
1. Parallel data access (table scan or index)
2. Parallel order by of the data handled by each worker
3. Final merge sort of the results and return to client.
[Diagram: SQL client, Merge Sort, and Thread 1 . . . (each: Scan, Sort); parallel threads run local sort]
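The three steps of the parallel sort can be sketched as follows: each worker sorts its own piece, and a final k-way merge combines the sorted runs. This is an illustrative sketch only; the sample rows, `order_key`, and `parallel_order_by` are made up for the example, and `heapq.merge` stands in for the server's merge sort.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def order_key(row):
    """ORDER BY 1,2: sort on the first two columns."""
    return (row[0], row[1])


def local_sort(piece):
    """Worker: sort only the rows this worker scanned."""
    return sorted(piece, key=order_key)


def parallel_order_by(rows, n_workers):
    size = max(1, -(-len(rows) // n_workers))  # ceiling division
    pieces = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        sorted_runs = list(pool.map(local_sort, pieces))
    # Final merge sort of the already sorted runs.
    return list(heapq.merge(*sorted_runs, key=order_key))


t1 = [(3, "b", 10), (1, "z", 20), (2, "a", 30), (1, "a", 40), (3, "a", 50)]
result = parallel_order_by(t1, n_workers=2)
```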

Parallel Query GROUP BY
SELECT col1, col2, SUM(col3) FROM t1 GROUP BY 1,2;
1. Parallel data access (table scan or index)
2. Parallel group by of the data handled by each worker
3. Final merge of the local group by and return results
DISTINCT operation will be similar to GROUP BY.
[Diagram: SQL client, Merge Groups, and Thread 1 . . . (each: Scan, Group); parallel threads run local group]
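The GROUP BY steps follow the same pattern: each worker builds partial sums for the groups it sees, and the final step merges partial sums that belong to the same group. A small sketch with made-up rows and hypothetical function names:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def local_group_by(rows):
    """Worker: SUM(col3) per (col1, col2) over this worker's rows only."""
    sums = defaultdict(int)
    for col1, col2, col3 in rows:
        sums[(col1, col2)] += col3
    return sums


def parallel_group_by(rows, n_workers):
    size = max(1, -(-len(rows) // n_workers))  # ceiling division
    pieces = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(local_group_by, pieces))
    # Final merge: the same group may appear in several workers' results.
    merged = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            merged[group] += subtotal
    return dict(merged)


t1 = [("a", 1, 10), ("b", 1, 5), ("a", 1, 7), ("a", 2, 1), ("b", 1, 3), ("a", 2, 4)]
result = parallel_group_by(t1, n_workers=3)
```

The merge step is needed because a group such as `("a", 1)` can be split across workers; summing the partial sums gives the same answer as a serial GROUP BY.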

Parallel Query Nested-Loops JOIN
SELECT * FROM t1 JOIN t3 ON t1.id = t3.id;
1. Parallel data access (table scan or index) of the driving
table
2. Parallel join of the local data handled by each worker
3. Final merge of the join results and return to client
[Diagram: SQL client, Merge, and Thread 1 . . . (each: Scan, Join); parallel scan and join]
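A sketch of the nested-loops join steps, assuming t1 is the driving table and that a dictionary keyed on `id` can stand in for an index lookup on t3. Only the scan of the driving table is split among workers; every worker probes the same t3 lookup structure. The row layouts and names are invented for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def build_index(t3_rows):
    """Stand-in for an index on t3.id: id -> list of matching t3 rows."""
    index = defaultdict(list)
    for row in t3_rows:
        index[row[0]].append(row)
    return index


def join_piece(piece, t3_index):
    """Worker: nested-loops join of one piece of the driving table."""
    joined = []
    for t1_row in piece:                              # outer loop: driving table
        for t3_row in t3_index.get(t1_row[0], []):    # inner loop: index probe
            joined.append(t1_row + t3_row)
    return joined


def parallel_join(t1_rows, t3_rows, n_workers):
    t3_index = build_index(t3_rows)
    size = max(1, -(-len(t1_rows) // n_workers))  # ceiling division
    pieces = [t1_rows[i:i + size] for i in range(0, len(t1_rows), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(lambda p: join_piece(p, t3_index), pieces))
    # Final merge: concatenate the workers' local join results.
    return [row for part in partial_results for row in part]


t1 = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
t3 = [(2, "x"), (4, "y"), (4, "z"), (5, "w")]
result = parallel_join(t1, t3, n_workers=2)
```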

Parallel Query Usage
• To enable parallel execution for a session:
set max_parallel_degree = n
At most n worker threads will be used
• MySQL may still decide not to use parallelization. If so, parallel
execution may be forced with
set force_parallel_mode = on

Parallel Query Usage: Hint
• To force parallel query execution for a single query:
SELECT /*+ PARALLEL() */ * FROM ...
• To force the use of a specific number of worker threads, n :
SELECT /*+ PARALLEL(n) */ * FROM ...
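For applications that build SQL text in code, the settings and hints above could be generated with small helpers like the ones below. This is a hedged sketch: the helper names are hypothetical, and the only SQL syntax used is what appears on these two slides (`max_parallel_degree`, `force_parallel_mode`, and the `PARALLEL` hint).

```python
import re


def session_settings(max_degree, force=False):
    """Return the SET statements from the usage slide for one session."""
    statements = [f"set max_parallel_degree = {int(max_degree)}"]
    if force:
        statements.append("set force_parallel_mode = on")
    return statements


def with_parallel_hint(query, workers=None):
    """Insert /*+ PARALLEL() */ or /*+ PARALLEL(n) */ after the leading SELECT."""
    hint = "/*+ PARALLEL() */" if workers is None else f"/*+ PARALLEL({int(workers)}) */"
    hinted, replaced = re.subn(
        r"^\s*SELECT\b", f"SELECT {hint}", query, count=1, flags=re.IGNORECASE
    )
    if replaced == 0:
        raise ValueError("the hint applies to SELECT queries only")
    return hinted


settings = session_settings(8, force=True)
hinted = with_parallel_hint("SELECT count(*) FROM production.product;", workers=8)
```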

Parallel Query Performance
Parallel Query delivers near-perfect linear acceleration for DBT3 Query 6:
select sum(l_extendedprice * l_discount) as revenue
from lineitem where l_shipdate >= date '1994-01-01'
and l_shipdate < date '1995-01-01'
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24
Tested at 30, 60, 120, and 240 million rows.
Examples:
89 seconds to 3.4 seconds.
177 seconds to 6.3 seconds.

Parallel Query Performance
DBT3 Query 1:
• Scales 29x with 32 worker threads
• Close to linear scalability (the dashed line in the chart)
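The timings quoted for Query 6 and the scaling figure for Query 1 can be turned into speedup and parallel-efficiency numbers with simple arithmetic. The sketch below uses only figures from the slides; the function names are illustrative, and the slides do not state how many worker threads the two Query 6 examples used, so only their speedup is computed.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial run."""
    return serial_seconds / parallel_seconds


def efficiency(measured_speedup, workers):
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return measured_speedup / workers


# DBT3 Query 6 examples: (serial seconds, parallel seconds)
q6_examples = [(89, 3.4), (177, 6.3)]
q6_speedups = [round(speedup(s, p), 1) for s, p in q6_examples]

# DBT3 Query 1: 29x speedup with 32 worker threads
q1_efficiency = efficiency(29, 32)
```

Here `q6_speedups` comes out at roughly 26x and 28x, and `q1_efficiency` is about 0.91, that is, Query 1 reaches about 91% of ideal linear scaling.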

Why do users care about linear scalability?
Users care about
• Business growth. The DB must deliver stable performance as the
business grows.
• Faster decisions. Faster analysis drives faster action.
Faster: 85 seconds to 6 seconds
Stable as data grows:
• 1x data size - 22.6 seconds
• 2x data size - 21.6 seconds
• 4x data size - 21.6 seconds

Linear scalability also for join (DBT3 Q12)

DBT3 Query Performance
• Measured speedup with 32 worker threads
• 9 DBT3 queries can be executed in parallel (with default query plans)
• 7 queries show a speedup above 16x
[Chart: speedup per query, axis from 0x to 35x, for Q1, Q3, Q5, Q6, Q9, Q10, Q12, Q14, Q19]

Parallel Query – Current Limitations
Parallel query currently supports only:
• SELECT queries
• Parallel scan on driving table of nested-loops join
• InnoDB
Parallel query does not currently execute in parallel:
• JSON
• GIS
• UDFs
• Full text indexes
• Subqueries & CTEs
• Window functions
• WITH ROLLUP
• Procedures
• SELECT … FOR UPDATE etc.
• SERIALIZABLE isolation level

Parallel Query – Future Work
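• Support all query plans with nested-loop join
• Support subqueries
• Performance optimizations, tuning of existing functionality
• Improve diagnostics support
• Exchange operators, multiple gather operations
• Parallel hash join
• Modify query plans to support more efficient parallelization
• A rewritten optimizer that takes parallelization into account
• Distributed query execution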
