Praveen GR
Associate database consultant,
Mydbops
July 17th, 2021
Mydbops 10th meetup
Parallel query in AWS Aurora MySQL
About Me
Interested in databases
Cloud databases
Active tech speaker
Active learner
Mydbops Services
Consulting Services
Managed Services
Focuses on the top open-source databases: MySQL, MongoDB and PostgreSQL
Our Clients
500+ clients in 5 years of operations
Agenda
1. About Aurora.
2. Parallel query feature in Aurora and use cases.
3. Prerequisites.
4. Implementation.
5. Test cases.
6. Limitations.
About Aurora
Aurora architecture
[Diagram: a primary instance and two Aurora readers spread across Availability Zones a, b and c, with read and write paths]
About Aurora
1. Volume-level sync among the cluster nodes.
2. Keeps its own copy of the volume in a different Availability Zone.
3. The writer supports reads and writes. Readers support only reads.
4. Up to 15 read replicas are supported.
5. Automated failover when the writer is down.
6. Dynamic disk adjustment.
7. Aurora storage is also self-healing.
8. Auto-scaling of readers.
9. Low-latency read replicas.
10. Custom endpoints with autoscaling.
Prominent Aurora MySQL features
1. Hash join in MySQL 5.7.
2. Parallel query.
3. Storage auto-scaling.
Parallel query in Aurora
Three steps in parallel query
1. The SQL is executed in parallel.
2. The work is split among the nodes of the cluster.
3. Only the required data is sent over the network.
Architecture flow
[Diagram: Application; Node 1, Node 2, Node 3, Node 4]
Use cases
1. Works best for queries with equality, IN and range conditions.
2. Better optimised for analytical queries.
3. Reduced IOPS and CPU utilisation.
4. Uniform load sharing.
Prerequisites
1. Aurora MySQL version 2.09 or later, or 1.23 or later.
2. Instance class: R series.
3. Non-partitioned table.
4. Hash join optimisation enabled.
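The prerequisites above can be turned into a small pre-flight check. The following is only a sketch in Python: the function name, its arguments and the message texts are hypothetical, and the thresholds are simply the ones listed on the slide (version lines other than 1.x and 2.x are reported as not covered rather than guessed).

```python
# Minimum minor version per major line, as listed on the prerequisites slide.
MIN_MINOR = {1: 23, 2: 9}


def unmet_prerequisites(aurora_version, instance_class, partitioned, hash_join_enabled):
    """Return a list of prerequisite problems; an empty list means all checks pass.

    All argument names are illustrative; the values would come from your own inventory.
    """
    problems = []

    # "2.09" -> major 2, minor 9; any patch component ("2.09.1") is ignored.
    major_text, minor_text = aurora_version.split(".")[:2]
    major, minor = int(major_text), int(minor_text)
    if major not in MIN_MINOR:
        problems.append("version line %d.x is not covered by the slide" % major)
    elif minor < MIN_MINOR[major]:
        problems.append("Aurora version is too old")

    # Accept both "r3.large" and "db.r3.large".
    family = instance_class[3:] if instance_class.startswith("db.") else instance_class
    if not family.lower().startswith("r"):
        problems.append("instance class is not R series")

    if partitioned:
        problems.append("table is partitioned")
    if not hash_join_enabled:
        problems.append("hash join is disabled")
    return problems


# The lab environment from this deck passes every check.
print(unmet_prerequisites("2.09", "r3.large", False, True))
```

Returning a list of problems instead of a single boolean makes it easier to see which prerequisite to fix first.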
Implementation
Global variables
1. aurora_parallel_query = ON
2. aurora_disable_hash_join = OFF
Test case
Monitoring
Status variables
1. Aurora_pq_request_attempted
2. Aurora_pq_request_executed
3. Aurora_pq_request_not_chosen_below_min_rows
4. Aurora_pq_max_concurrent_requests
5. Aurora_pq_request_in_progress
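One simple way to use these counters is to compare how many parallel query requests were attempted with how many were executed. The sketch below assumes the counters have already been fetched (for example with SHOW GLOBAL STATUS) into a Python dict; the function name and the sample numbers are made up for illustration.

```python
def pq_execution_ratio(status):
    """Share of attempted parallel query requests that were executed.

    `status` maps status variable names to values. Values may be strings,
    as returned by a typical MySQL client, so they are converted to int.
    Returns None when nothing has been attempted yet.
    """
    attempted = int(status.get("Aurora_pq_request_attempted", 0))
    executed = int(status.get("Aurora_pq_request_executed", 0))
    if attempted == 0:
        return None
    return executed / attempted


# Hypothetical counter values, not taken from the test environment.
sample = {"Aurora_pq_request_attempted": "8", "Aurora_pq_request_executed": "6"}
print(pq_execution_ratio(sample))
```

A low ratio suggests checking the "not chosen" counters, such as Aurora_pq_request_not_chosen_below_min_rows, to see why requests were skipped.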
Lab Environment
Instance type: r3.large
Aurora version: 2.09
MySQL version: 5.7
Table size: 255 GB
Record creation: sysbench
Test case
Explain plan
Without parallel query
mysql> explain select count(id) from sbtest1 where id > 12345 and k < 6789;
+----+-------------+---------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
| 1 | SIMPLE | sbtest1 | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 654698147 | 33.33 | Using where |
+----+-------------+---------+------------+-------+---------------+---------+---------+------+-----------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
With parallel query
mysql> explain select count(id) from sbtest1 where id > 12345 and k < 6789;
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
| 1 | SIMPLE | sbtest1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1309396294 | 16.66 | Using where; Using parallel query (2 columns, 2 filters, 0 exprs; 0 extra) |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
1 row in set, 1 warning (0.00 sec)
Test case
Explain Extra option
Columns: number of columns referenced in the query.
Filters: number of WHERE-clause conditions that compare a column using equal, not-equal or range operators.
Exprs: number of expressions (columns with a function or operator) that can be processed by parallel query.
Extra: number of expressions that cannot be processed by parallel query.
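The four counters in the Extra column can be pulled out programmatically when comparing many plans. The following Python sketch parses the text format shown in the EXPLAIN outputs of this deck; the function name and the returned dict layout are illustrative choices, and the pattern is only known to match the wording seen in these slides.

```python
import re

# Matches the fragment as it appears in the EXPLAIN outputs in this deck,
# e.g. "Using parallel query (2 columns, 2 filters, 0 exprs; 0 extra)".
PQ_PATTERN = re.compile(
    r"Using parallel query \((\d+) columns, (\d+) filters, (\d+) exprs; (\d+) extra\)"
)


def parse_pq_extra(extra):
    """Return the parallel query counters from an EXPLAIN Extra string.

    Returns None when the plan does not use parallel query.
    """
    match = PQ_PATTERN.search(extra)
    if match is None:
        return None
    columns, filters, exprs, extra_count = (int(group) for group in match.groups())
    return {"columns": columns, "filters": filters, "exprs": exprs, "extra": extra_count}


print(parse_pq_extra(
    "Using where; Using parallel query (2 columns, 2 filters, 0 exprs; 0 extra)"
))
print(parse_pq_extra("Using where"))
```

A None result is a quick signal that the optimizer did not choose parallel query for that statement.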
Performance Analysis
With function
Without parallel query
mysql> select sql_no_cache sum(k) from sbtest1 where upper(k)=231212 and upper(c) is not null;
+--------+
| sum(k) |
+--------+
| NULL |
+--------+
1 row in set (2 hours 33 min 22.40 sec)
With parallel query
mysql> select sql_no_cache sum(k) from sbtest1 where upper(k)=231212 and upper(c) is not null;
+--------+
| sum(k) |
+--------+
| NULL |
+--------+
1 row in set (1 min 6.61 sec)
Performance Analysis
Without parallel query: CPU is completely saturated during query execution.
Without parallel query: read latency of the server is high.
With parallel query: a single spike during query processing.
Test case (using Eq Ref)
Explain plan
Without parallel query
mysql> explain select sql_no_cache count(*) from sbtest1 where k=7256238746;
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+-------------+
| 1 | SIMPLE | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 1205294616 | 10.00 | Using where |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
With parallel query
mysql> explain select count(*) from sbtest1 where k=7256238746;
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
| 1 | SIMPLE | sbtest1 | NULL | ALL | NULL | NULL | NULL | NULL | 1205294616 | 10.00 | Using where; Using parallel query (1 columns, 0 filters, 1 exprs; 0 extra) |
+----+-------------+---------+------------+------+---------------+------+---------+------+------------+----------+----------------------------------------------------------------------------+
1 row in set, 1 warning (0.01 sec)
Performance
Without parallel query
mysql> select sql_no_cache count(*) from sbtest1 where k=7256238746;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (2 hours 25 min 9.11 sec)
With parallel query
mysql> select count(*) from sbtest1 where k=7256238746;
+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (2 min 12.22 sec)
Performance Analysis
Without parallel query: CPU is completely saturated during query execution.
Without parallel query: read latency of the server is high.
With parallel query: a single spike during query processing.
Test case (using Join Cond)
Explain plan
Without parallel query
mysql> explain select count(t1.k) from sbtest1 t1 inner join sbtest3 t2 on t1.id=t2.id where t1.k=247423;
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+----------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+----------+----------+-------------+
| 1 | SIMPLE | t2 | NULL | index | PRIMARY | k_1 | 4 | NULL | 17588790 | 100.00 | Using index |
| 1 | SIMPLE | t1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | sbtest.t2.id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+--------------+----------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
With parallel query
mysql> explain select count(t1.k) from sbtest1 t1 inner join sbtest3 t2 on t1.id=t2.id where t1.k=247423;
+----+-------------+-------+------------+-------+---------------+------+---------+------+------------+----------+--------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------------+----------+--------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | t2 | NULL | index | PRIMARY | k_1 | 4 | NULL | 17588790 | 100.00 | Using index |
| 1 | SIMPLE | t1 | NULL | ALL | PRIMARY | NULL | NULL | NULL | 1205294616 | 0.00 | Using where; Using join buffer (Hash Join Outer table t1); Using parallel query (2 columns, 1 filters, 1 exprs; 0 extra) |
+----+-------------+-------+------------+-------+---------------+------+---------+------+------------+----------+--------------------------------------------------------------------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
Performance Analysis
Without parallel query
mysql> select sql_no_cache count(t1.k) from sbtest1 t1 inner join sbtest3 t2 on t1.id=t2.id where t1.k=247423;
+-------------+
| count(t1.k) |
+-------------+
| 0 |
+-------------+
1 row in set (10 min 57.72 sec)
With parallel query
mysql> select count(t1.k) from sbtest1 t1 inner join sbtest3 t2 on t1.id=t2.id where t1.k=247423;
+-------------+
| count(t1.k) |
+-------------+
| 0 |
+-------------+
1 row in set (2 min 38.48 sec)
[Charts: performance without parallel query and with parallel query]
Performance Improvement
[Charts: Efficiency by method (PQ vs. Normal); CPU (Normal vs. PQ)]
Performance Summary
Test case      | Without parallel query | With parallel query
Function       | 2 hrs 33 mins          | 1 min
Eq Ref         | 2 hrs 25 mins          | 2 mins
Join condition | 10 mins                | 2 mins
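The speedup for each test case follows directly from the run times reported by the mysql client in the earlier slides. A short Python sketch of that arithmetic is shown here; the helper names are illustrative, and the timings are the exact values from the query outputs rather than the rounded ones in the summary.

```python
def to_seconds(hours=0, minutes=0, seconds=0.0):
    """Convert an 'H hours M min S sec' client timing to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# (without parallel query, with parallel query), copied from the query outputs.
TIMINGS = {
    "Function": (to_seconds(2, 33, 22.40), to_seconds(0, 1, 6.61)),
    "Eq Ref": (to_seconds(2, 25, 9.11), to_seconds(0, 2, 12.22)),
    "Join condition": (to_seconds(0, 10, 57.72), to_seconds(0, 2, 38.48)),
}


def speedups(timings):
    """Return how many times faster each test ran with parallel query."""
    return {name: without / with_pq for name, (without, with_pq) in timings.items()}


for name, factor in speedups(TIMINGS).items():
    print(f"{name}: about {factor:.0f}x faster with parallel query")
```

On these numbers the function test improves by roughly 138x, the Eq Ref test by roughly 66x and the join test by roughly 4x.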
Limitations
1. The row format must not be compressed. Only dynamic is supported.
2. Won't work for smaller tables.
3. Only a limited number of parallel queries can execute at the same time.
4. Aurora version should be 2.09 or later.
Reach Us : Info@mydbops.com
Thank You
