POLARDB for MySQL
Parallel Query
Øystein Grøvlen
Alibaba Cloud
Agenda
• What is Parallel Query
• How to use Parallel Query
• Parallel Query Internals
• Parallel Query Performance
• Future Work
What is Parallel Query?
Parallel Query is a method from Alibaba Cloud for accelerating MySQL queries.
• Traditionally, one MySQL query runs on a single thread and cannot take
advantage of the multiple cores of modern processors.
• Parallel Query distributes the work of a query across many or all
available cores:
• 8 parallel threads can be up to 8 times faster.
• 32 parallel threads can be up to 32 times faster.
Why Parallel Query?
• 2003: CPUs stopped getting
faster
• 2004-2019 focus on more
cores, sockets.
• PQ lets MySQL take advantage
of the last 15 years of progress.
How to Use Parallel Query
Parallel Query runs against your existing InnoDB data.
• No data extraction to another system is required.
• No query modifications are required.
Parallel Query within InnoDB (no extraction needed) is an amazing
feature exclusive to Alibaba Cloud
Query with Parallelism
SELECT count(*) FROM production.product;
Serial execution plan:
1. Stream Aggregate: for each of the rows returned by the index scan, do the
aggregation.
For the above query, Stream Aggregate operator counts the rows it receives
from the Index Scan operator.
1 active thread
63 idle threads
Thread 1: Scan, Count
SQL
Client
Parallel Execution Plan
Sum
Thread 1: Scan, Count
Thread 2: Scan, Count
Thread 3: Scan, Count
Thread 4: Scan, Count
Thread 5: Scan, Count
Thread 6: Scan, Count
Thread 7: Scan, Count
. . .
Thread 64: Scan, Count
With 64 parallel
threads, each thread
does < 2% of the work.
SQL
Client
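The serial and parallel plans for SELECT count(*) can be sketched in Python as follows. This is a minimal illustration, not POLARDB's implementation: the function names, the in-memory `rows` list standing in for the table, and the chunking scheme are all assumptions made for the example. Python threads show only the structure (partial counts followed by a final Sum), not the real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_chunk(chunk):
    """Worker: scan one slice of the table and count its rows."""
    return sum(1 for _ in chunk)


def parallel_count(rows, workers):
    """Coordinator: split the scan, run the workers, sum the partial counts."""
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_chunk, chunks))
    return sum(partial_counts)               # the "Sum" step of the plan


rows = list(range(1000))          # hypothetical stand-in for production.product
total = parallel_count(rows, 64)  # with 64 workers, each slice is < 2% of the rows
```

The only data that crosses thread boundaries is one integer per worker, which is why this kind of aggregate parallelizes so well.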
How Parallel Query Works
1. The parallel coordinator can split a table or index scan into equal-size
pieces
2. Each worker can execute part of the query plan
3. The gather stream operator is responsible for collecting the
intermediate results from the workers
How Parallel Query Works
• Each worker writes results to its own buffer
→ threads run without interruption
• Pointers are passed for the Merge step
→ optimized method to hand off data
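A rough Python sketch of the coordinator, worker, and gather roles described above. All names here are hypothetical, and the even-number filter is only a stand-in for a WHERE condition. Each worker appends to its own private list, so no locking is needed while it runs, and the gather step receives references to those lists rather than copies.

```python
import threading


def worker(piece, buffer):
    """Run part of the plan on one piece; write only to this worker's buffer."""
    for row in piece:
        if row % 2 == 0:          # stand-in for a WHERE condition
            buffer.append(row)


def run_parallel(rows, n_workers):
    """Coordinator: split the scan into near-equal pieces, then gather."""
    size = max(1, -(-len(rows) // n_workers))  # ceiling division
    pieces = [rows[i:i + size] for i in range(0, len(rows), size)]
    buffers = [[] for _ in pieces]             # one private buffer per worker
    threads = [threading.Thread(target=worker, args=(p, b))
               for p, b in zip(pieces, buffers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Gather: the buffers are handed over by reference; rows are only
    # copied when the final result list is assembled.
    return [row for buf in buffers for row in buf]


result = run_parallel(list(range(20)), 4)
```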
Parallel Query Internals
Parallel Query uses multiple methods to distribute work among the parallel
threads, including:
• Parallel sequential scan: the data pages of the table are divided
among the cooperating threads.
• Parallel index operation: each cooperating thread reads a single index
block and scans and returns all records referenced by that block; other
threads can at the same time return records from different index
pages. The results of a parallel B-tree scan are returned in sorted order within
each worker thread.
Partitioning
B-tree index:
Root: 11 17 25
  Node 5 8: leaf keys 1 2 3 4 5 6 7 8 9 10
  Node 14: leaf keys 11 12 13 14 15 16
  Node 20 22: leaf keys 17 18 19 20 21 22 23 24
  Node 28 31: leaf keys 25 26 27 28 29 30 31 32
Partitioning
(Same B-tree as above, divided into Partition 1 and Partition 2)
2 partitions
InnoDB partitions the B-tree
Partitioning
(Same B-tree as above, divided into Partition 1 and Partition 2)
2 partitions
Workers see only one partition (at a time)
Partitioning
(Same B-tree as above, divided into Part. 1 through Part. 6)
6 partitions
Partitioning
• Server will normally request 100 partitions per worker thread
• “Fast” workers may process more partitions than “slow” workers
• Partitions of more equal size
• When finished with one partition, a worker may be automatically
attached to a new partition.
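The effect of requesting many more partitions than workers can be sketched as a shared work queue. This is an assumed model, not InnoDB's code: `make_partitions`, `scan`, and the list of integer keys are hypothetical. Each worker attaches to the next free partition as soon as it finishes the previous one, so a fast worker simply ends up processing more partitions.

```python
import queue
import threading


def make_partitions(keys, n_workers, per_worker=100):
    """Split sorted keys into up to n_workers * per_worker contiguous ranges."""
    wanted = max(1, min(len(keys), n_workers * per_worker))
    size = -(-len(keys) // wanted) if keys else 1   # ceiling division
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def scan(keys, n_workers):
    """Workers attach to the next free partition until none are left."""
    todo = queue.Queue()
    for part in make_partitions(keys, n_workers):
        todo.put(part)
    processed = [0] * n_workers      # partitions handled by each worker
    rows_seen = [0] * n_workers      # rows handled by each worker

    def worker(wid):
        while True:
            try:
                part = todo.get_nowait()
            except queue.Empty:
                return               # no partitions left
            processed[wid] += 1      # each worker touches only its own slot
            rows_seen[wid] += len(part)

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed, rows_seen


processed, rows_seen = scan(list(range(10_000)), 4)
```

With 4 workers and 100 partitions per worker, the 10,000 keys are cut into 400 ranges of 25 keys; how many each worker ends up with depends on scheduling.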
Parallel Query SORT
SELECT col1, col2, col3 FROM t1 ORDER BY 1,2;
1. Parallel data access (table scan or index)
2. Parallel order by of the data handled by each worker
3. Final merge sort of the results and return to client.
Parallel threads
run local sort
SQL
Client
Merge
Sort
Thread 1: Scan, Sort
Thread 2: Scan, Sort
Thread 3: Scan, Sort
Thread 4: Scan, Sort
Thread 5: Scan, Sort
Thread 6: Scan, Sort
Thread 7: Scan, Sort
. . .
Thread 64: Scan, Sort
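The three steps of the parallel sort can be sketched as below. This is an illustrative model only: the tuples in `t1` and the function names are made up for the example. Each worker sorts its own slice, and the final step is a k-way merge of the sorted runs, here done with the standard library's `heapq.merge`.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sort_key(row):
    """ORDER BY 1,2: order on the first two columns."""
    return (row[0], row[1])


def local_sort(chunk):
    """Worker: sort only the rows it scanned."""
    return sorted(chunk, key=sort_key)


def parallel_order_by(rows, workers):
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_runs = list(pool.map(local_sort, chunks))
    # Final merge sort of the already-sorted runs.
    return list(heapq.merge(*sorted_runs, key=sort_key))


t1 = [(3, 1, "c"), (1, 2, "a"), (2, 9, "x"), (1, 1, "b"), (3, 0, "d")]
ordered = parallel_order_by(t1, 2)
```

The merge step only compares the heads of the runs, so its cost grows with the number of rows times the logarithm of the number of workers rather than requiring a full re-sort.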
Parallel Query GROUP BY
SELECT col1, col2, SUM(col3) FROM t1 GROUP BY 1,2;
1. Parallel data access (table scan or index)
2. Parallel group by of the data handled by each worker
3. Final merge of the local group by and return results
DISTINCT operation will be similar to GROUP BY.
Parallel threads
run local group
Merge
Groups
Thread 1: Scan, Group
Thread 2: Scan, Group
Thread 3: Scan, Group
Thread 4: Scan, Group
Thread 5: Scan, Group
Thread 6: Scan, Group
Thread 7: Scan, Group
. . .
Thread 64: Scan, Group
SQL
Client
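A minimal sketch of the local group by followed by the final merge, under the assumption that rows are `(col1, col2, col3)` tuples held in memory; the function names are hypothetical. SUM can be merged by adding the per-worker subtotals of each group.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def local_group(chunk):
    """Worker: SUM(col3) per (col1, col2) for its own rows only."""
    sums = defaultdict(int)
    for col1, col2, col3 in chunk:
        sums[(col1, col2)] += col3
    return sums


def parallel_group_by(rows, workers):
    size = max(1, -(-len(rows) // workers))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_group, chunks))
    # Final merge: the same group may appear in several workers' results.
    merged = defaultdict(int)
    for partial in partials:
        for group, subtotal in partial.items():
            merged[group] += subtotal
    return dict(merged)


t1 = [("a", 1, 10), ("b", 1, 5), ("a", 1, 7), ("b", 2, 1), ("a", 1, 3)]
totals = parallel_group_by(t1, 2)
```

In this example the group `("a", 1)` is seen by both workers, which is exactly the case the final merge has to handle.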
Parallel Query Nested-Loops JOIN
SELECT * FROM t1 JOIN t3 ON t1.id = t3.id;
1. Parallel data access (table scan or index) of driving
table
2. Parallel join of the local data handled by each worker
3. Final merge of the results and return to client
Parallel scan
and join
Merge
Thread 1: Scan, Join
Thread 2: Scan, Join
Thread 3: Scan, Join
Thread 4: Scan, Join
Thread 5: Scan, Join
Thread 6: Scan, Join
Thread 7: Scan, Join
. . .
Thread 64: Scan, Join
SQL
Client
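The parallel nested-loops join can be modeled as below. This is a sketch under stated assumptions: the rows are small dictionaries, a Python dict plays the role of an index on `t3.id`, and all function names are invented for the example. Only the driving table `t1` is split among workers; every worker probes the same shared index.

```python
from concurrent.futures import ThreadPoolExecutor


def local_join(t1_chunk, t3_index):
    """Worker: nested-loops join of its slice of the driving table t1."""
    out = []
    for t1_row in t1_chunk:
        for t3_row in t3_index.get(t1_row["id"], []):  # index lookup on t3.id
            out.append((t1_row, t3_row))
    return out


def parallel_nl_join(t1, t3, workers):
    t3_index = {}
    for row in t3:
        t3_index.setdefault(row["id"], []).append(row)
    size = max(1, -(-len(t1) // workers))  # ceiling division
    chunks = [t1[i:i + size] for i in range(0, len(t1), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: local_join(c, t3_index), chunks))
    # Final merge: concatenate the workers' join results.
    return [pair for part in parts for pair in part]


t1 = [{"id": 1, "name": "x"}, {"id": 2, "name": "y"}, {"id": 3, "name": "z"}]
t3 = [{"id": 2, "qty": 5}, {"id": 3, "qty": 7}, {"id": 3, "qty": 9}]
joined = parallel_nl_join(t1, t3, 2)
```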
Parallel Query Usage
• To enable parallel execution for a session:
set max_parallel_degree = n
At most n worker threads will be used
• MySQL may still decide to not use parallelization. If so, parallel
execution may be forced with
set force_parallel_mode = on
Parallel Query Usage: Hint
• To force parallel query execution for a single query:
SELECT /*+ PARALLEL() */ * FROM ...
• To force the use of a specific number of worker threads, n :
SELECT /*+ PARALLEL(n) */ * FROM ...
Parallel Query Usage: EXPLAIN
mysql> EXPLAIN SELECT SUM(l_quantity) FROM lineitem where l_returnflag = 'A';
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
|  1 | SIMPLE      | <gather2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 5938499 |    10.00 | NULL                                   |
|  2 | SIMPLE      | lineitem  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  742312 |    10.00 | Parallel scan (8 workers); Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
2 rows in set, 1 warning (0.00 sec)
Parallel Query Performance
Parallel Query delivers near-perfect
linear acceleration for DBT3 Query 6:
select sum(l_extendedprice * l_discount) as revenue
from lineitem where l_shipdate >= date '1994-01-01'
and l_shipdate < date '1995-01-01'
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24;
Tested at 30, 60, 120, and 240 million rows.
Examples:
89 seconds to 3.4 seconds.
177 seconds to 6.3 seconds.
Parallel Query Performance
DBT3 Query 1:
• Scales 29x with 32
worker threads
• Close to linear
scalability
(dashed line)
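As a worked check of these figures, speedup is serial time divided by parallel time, and parallel efficiency is speedup divided by the number of worker threads. The helper names below are made up for the example. The slides do not state how many threads the Query 6 runs used, so only their speedup is computed, not their efficiency.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is."""
    return serial_seconds / parallel_seconds


def efficiency(speedup_factor, workers):
    """Fraction of ideal linear scaling achieved (1.0 is perfect)."""
    return speedup_factor / workers


# DBT3 Query 1: 29x with 32 worker threads.
q1_efficiency = efficiency(29, 32)

# DBT3 Query 6 examples: 89 s to 3.4 s and 177 s to 6.3 s.
q6_speedups = [speedup(89, 3.4), speedup(177, 6.3)]

print(f"Q1 efficiency: {q1_efficiency:.1%}")
print("Q6 speedups:", [f"{s:.1f}x" for s in q6_speedups])
```

An efficiency of 29/32, a little over 90%, is what "close to linear" means here; the two Query 6 examples work out to roughly 26x and 28x.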
Why do users care about linear scalability?
Users care about
• Business growth. DB must
deliver stable performance as
business grows
• Faster decisions. Faster
analysis driving faster action.
Faster:
85 seconds to
6 seconds
22.6 seconds
2x data size - 21.6 seconds
4x data size - 21.6 seconds
Linear scalability also for join (DBT3 Q12)
DBT3 Query Performance
• Measured speedup with 32
worker threads
• 9 DBT3 queries can be
executed in parallel (with
default query plans)
• 7 queries show speedup
above 16x
[Bar chart: speedup (0x to 35x) for Q1, Q3, Q5, Q6, Q9, Q10, Q12, Q14, Q19]
Parallel Query – Current Limitations
Parallel query currently only supports:
• SELECT queries
• Parallel scan on driving table of nested-loops join
• InnoDB
Parallel query does not currently execute in parallel:
• JSON
• GIS
• UDFs
• Full text indexes
• Subqueries & CTEs
• Window functions
• WITH ROLLUP
• Procedures
• SELECT … FOR UPDATE etc.
• SERIALIZABLE isolation level
Parallel Query – Future Work
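• Support all query plans with nested loop join
• Support subqueries
• Performance optimizations: tuning of existing functionality
• Improve diagnostics support
• Exchange operators to improve gather operations
• Parallel hash join
• Modify query plans to support more efficient parallelization
• A rewritten optimizer that takes parallelization into account
• Distributed query execution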
THANK YOU!