POLARDB for MySQL
Parallel Query
Øystein Grøvlen
Alibaba Cloud
Agenda
• What is Parallel Query
• How to use Parallel Query
• Parallel Query Internals
• Parallel Query Performance
• Future Work
What is Parallel Query?
Parallel Query is an innovative method from Alibaba Cloud to accelerate MySQL queries.
• Traditionally, one MySQL query runs in just one thread and cannot take advantage of multiple cores on modern processors.
• Parallel Query takes advantage of modern processors to distribute work across many or all available cores:
• 8 parallel threads can be up to 8 times faster.
• 32 parallel threads can be up to 32 times faster.
Why Parallel Query?
• 2003: CPUs stopped getting faster
• 2004-2019: focus on more cores and sockets
• PQ lets MySQL take advantage of the last 15 years of progress
How to Use Parallel Query
Parallel Query runs against your existing InnoDB data.
• No data extraction to another system is required.
• No query modifications are required.
Parallel Query within InnoDB (no extraction needed) is an amazing
feature exclusive to Alibaba Cloud
Query with Parallelism
SELECT count(*) FROM production.product;
Serial execution plan:
• Stream Aggregate: for each of the rows returned by the index scan, do the aggregation.
For the above query, the Stream Aggregate operator counts the rows it receives from the Index Scan operator.
[Figure: SQL client connected to Thread 1 (Scan, Count); 1 active thread, 63 idle threads.]
Parallel Execution Plan
[Figure: SQL client, a Sum step, and Threads 1 to 64, each doing Scan, Count.]
With 64 parallel threads, each thread does < 2% of the work.
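The parallel plan for the count query can be sketched in a few lines of Python. This is only an illustration of the scan, count, sum structure and not POLARDB's implementation: the in-memory list `product` and the functions `count_rows` and `parallel_count` are hypothetical names, and Python threads show the shape of the plan rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_rows(chunk):
    """Worker step: scan one chunk of rows and count them."""
    return sum(1 for _ in chunk)


def parallel_count(rows, workers):
    """Split the rows into one chunk per worker, count each chunk, sum the counts."""
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_rows, chunks))
    return sum(partial_counts)  # the final "Sum" step of the plan


# Hypothetical stand-in for the production.product table.
product = list(range(1_000_000))
total = parallel_count(product, workers=64)
print(total)
```

With 64 workers each chunk holds 1/64 of the rows, about 1.6%, which matches the "< 2% of the work" figure on the slide.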
How Parallel Query Works
1. The parallel coordinator can split a table or index scan into equal-size pieces
2. Each worker can execute part of the query plan
3. The gather stream operator is responsible for collecting the intermediate results from the workers
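The three roles can be illustrated with a small sketch. It assumes a table keyed by consecutive integers and runs the workers one after another to keep the structure visible; `split_scan`, `worker`, and `gather` are hypothetical names, not POLARDB functions.

```python
def split_scan(first_key, last_key, pieces):
    """Coordinator: cut the inclusive key range into contiguous, near-equal pieces."""
    total = last_key - first_key + 1
    base, extra = divmod(total, pieces)
    ranges, start = [], first_key
    for i in range(pieces):
        length = base + (1 if i < extra else 0)
        if length == 0:
            continue  # more pieces requested than keys available
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def worker(table, key_range):
    """Worker: run its part of the plan (scan plus partial SUM) on one piece."""
    low, high = key_range
    return sum(table[key] for key in range(low, high + 1))


def gather(partials):
    """Gather: collect the intermediate results and combine them."""
    return sum(partials)


# Hypothetical table: key -> value.
table = {key: key * 2 for key in range(1, 33)}
ranges = split_scan(1, 32, 4)
result = gather(worker(table, r) for r in ranges)
print(ranges, result)
```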
How Parallel Query Works
• Each worker writes results to its own buffer
→ threads run without interruption
• Pointers are passed for the Merge step
→ optimized method to hand off data
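A minimal sketch of the buffer idea, assuming one private Python list per worker: no worker ever touches another worker's buffer, so no locking is needed, and the merge step receives references to the buffers rather than copies. The names `run_workers` and `merge` and the even-number filter are made up for the example.

```python
import threading


def run_workers(chunks):
    """Start one thread per chunk; each thread fills only its own buffer."""
    buffers = [[] for _ in chunks]  # one private buffer per worker

    def work(worker_id, chunk):
        out = buffers[worker_id]  # only this worker appends here
        for row in chunk:
            if row % 2 == 0:  # hypothetical filter standing in for the query
                out.append(row)

    threads = [
        threading.Thread(target=work, args=(i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffers  # references to the buffers are handed to the merge step


def merge(buffers):
    """Merge step: walk the buffers it was handed, in worker order."""
    merged = []
    for buf in buffers:
        merged.extend(buf)
    return merged


buffers = run_workers([[1, 2, 3, 4], [5, 6], [7, 8, 9, 10]])
print(merge(buffers))
```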
Parallel Query Internals
Parallel Query uses multiple methods to distribute work among the parallel threads, including:
• Parallel sequential scan: the data pages of the table are divided among the cooperating threads.
• Parallel index scan: each cooperating thread reads a single index block and scans and returns all records referenced by that block; other threads can at the same time be returning records from a different index page. The results of a parallel B-tree scan are returned in sorted order within each worker thread.
Partitioning
[Figure, shown in four steps: a B-tree with root keys 11, 17, 25; child nodes (5 8), (14), (20 22), (28 31); and leaf keys 1 to 32.]
• InnoDB partitions the B-tree (shown first with 2 partitions)
• Workers see only one partition (at a time)
• The same B-tree is then shown with 6 partitions (Part. 1 to Part. 6)
Partitioning
• Server will normally request 100 partitions per worker thread
• “Fast” workers may process more partitions than “slow” workers
• Partitions of more equal size
• When finished with one partition, a worker may be automatically
attached to a new partition.
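The effect of requesting many more partitions than workers can be sketched with a shared work queue: each worker takes the next free partition as soon as it finishes the previous one, so a fast worker simply ends up handling more partitions. This is an assumed scheduling model for illustration only; `make_partitions` and `scan_partitions` are hypothetical names.

```python
import queue
import threading


def make_partitions(rows, count):
    """Cut the rows into roughly `count` contiguous partitions."""
    size = max(1, -(-len(rows) // count))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def scan_partitions(partitions, workers):
    """Workers repeatedly attach to the next unprocessed partition."""
    todo = queue.Queue()
    for part in partitions:
        todo.put(part)
    rows_seen = [0] * workers   # rows scanned per worker
    parts_done = [0] * workers  # partitions handled per worker

    def work(worker_id):
        while True:
            try:
                part = todo.get_nowait()  # attach to a new partition
            except queue.Empty:
                return  # nothing left to scan
            rows_seen[worker_id] += len(part)
            parts_done[worker_id] += 1

    threads = [threading.Thread(target=work, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return rows_seen, parts_done


workers = 4
partitions = make_partitions(list(range(10_000)), 100 * workers)
rows_seen, parts_done = scan_partitions(partitions, workers)
print(parts_done)  # how the 400 partitions were shared varies from run to run
```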
Parallel Query SORT
SELECT col1, col2, col3 FROM t1 ORDER BY 1,2;
1. Parallel data access (table scan or index)
2. Parallel order by of the data handled by each worker
3. Final merge sort of the results and return to client.
[Figure: SQL client, a Merge Sort step, and Threads 1 to 64, each doing Scan, Sort; the parallel threads run a local sort.]
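The local sort followed by a final merge can be sketched as below, assuming rows are `(col1, col2, col3)` tuples held in memory. `heapq.merge` from the Python standard library plays the role of the final merge sort over the already sorted per-worker runs; `local_sort` and `parallel_order_by` are hypothetical names.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def sort_key(row):
    """ORDER BY 1,2: order on the first two columns."""
    return (row[0], row[1])


def local_sort(chunk):
    """Worker step: sort only the rows this worker scanned."""
    return sorted(chunk, key=sort_key)


def parallel_order_by(rows, workers):
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(local_sort, chunks))
    # Final step: merge the sorted runs into one sorted result.
    return list(heapq.merge(*runs, key=sort_key))


# Hypothetical contents of t1.
t1 = [(3, "b", 1), (1, "z", 2), (2, "a", 3), (1, "a", 4), (3, "a", 5), (2, "b", 6)]
ordered = parallel_order_by(t1, workers=3)
print(ordered)
```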
Parallel Query GROUP BY
SELECT col1, col2, SUM(col3) FROM t1 GROUP BY 1,2;
1. Parallel data access (table scan or index)
2. Parallel group by of the data handled by each worker
3. Final merge of the local group by and return results
DISTINCT operation will be similar to GROUP BY.
[Figure: SQL client, a Merge Groups step, and Threads 1 to 64, each doing Scan, Group; the parallel threads run a local group.]
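A sketch of the two-level aggregation, assuming rows are `(col1, col2, col3)` tuples: each worker builds partial sums for the groups it happens to see, and the final step adds up the partial sums per group. The same group may appear in several workers, which is why the merge is needed. `local_group` and `parallel_group_by` are hypothetical names.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def local_group(chunk):
    """Worker step: SUM(col3) per (col1, col2) for this worker's rows only."""
    sums = defaultdict(int)
    for col1, col2, col3 in chunk:
        sums[(col1, col2)] += col3
    return sums


def parallel_group_by(rows, workers):
    size = max(1, -(-len(rows) // workers))  # ceiling division, at least 1
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_group, chunks))
    # Final step: merge the local groups by adding their partial sums.
    merged = defaultdict(int)
    for partial in partials:
        for key, partial_sum in partial.items():
            merged[key] += partial_sum
    return dict(merged)


# Hypothetical contents of t1.
t1 = [("a", 1, 10), ("b", 1, 5), ("a", 1, 7), ("a", 2, 1), ("b", 1, 5)]
groups = parallel_group_by(t1, workers=2)
print(groups)
```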
Parallel Query Nested-Loops JOIN
SELECT * FROM t1 JOIN t3 ON t1.id = t3.id;
1. Parallel data access (table scan or index) of the driving table
2. Parallel join of the local data handled by each worker
3. Final merge of the results and return to client
[Figure: SQL client, a Merge step, and Threads 1 to 64, each doing Scan, Join; parallel scan and join.]
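The join can be sketched the same way, assuming `t1` is the driving table and that `t3` is reached through an index on `id` (modelled here as a Python dict). Only the scan of the driving table is split; every worker probes the same inner table for its own rows. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def build_index(t3):
    """Stand-in for an index on t3.id: id -> list of t3 rows."""
    index = {}
    for row in t3:
        index.setdefault(row[0], []).append(row)
    return index


def join_chunk(t1_chunk, t3_index):
    """Worker step: nested-loops join of this worker's driving rows."""
    joined = []
    for t1_id, t1_val in t1_chunk:                    # outer loop: driving table
        for _, t3_val in t3_index.get(t1_id, []):     # inner loop: matching t3 rows
            joined.append((t1_id, t1_val, t3_val))
    return joined


def parallel_nl_join(t1, t3, workers):
    t3_index = build_index(t3)
    size = max(1, -(-len(t1) // workers))  # ceiling division, at least 1
    chunks = [t1[i:i + size] for i in range(0, len(t1), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda chunk: join_chunk(chunk, t3_index), chunks))
    # Final step: concatenate the per-worker results.
    return [row for part in parts for row in part]


# Hypothetical table contents: (id, value).
t1 = [(1, "x"), (2, "y"), (3, "z"), (4, "w")]
t3 = [(1, "p"), (3, "q"), (3, "r"), (5, "s")]
joined = parallel_nl_join(t1, t3, workers=2)
print(joined)
```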
Parallel Query Usage
• To enable parallel execution for a session:
set max_parallel_degree = n
At most n worker threads will be used.
• MySQL may still decide not to use parallelization. If so, parallel execution may be forced with
set force_parallel_mode = on
Parallel Query Usage: Hint
• To force parallel query execution for a single query:
SELECT /*+ PARALLEL() */ * FROM ...
• To force the use of a specific number of worker threads, n :
SELECT /*+ PARALLEL(n) */ * FROM ...
Parallel Query Usage: EXPLAIN
mysql> EXPLAIN SELECT SUM(l_quantity) FROM lineitem where l_returnflag = 'A';
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                  |
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
|  1 | SIMPLE      | <gather2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 5938499 |    10.00 | NULL                                   |
|  2 | SIMPLE      | lineitem  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  742312 |    10.00 | Parallel scan (8 workers); Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
2 rows in set, 1 warning (0.00 sec)
Parallel Query Performance
Parallel Query delivers near-perfect
linear acceleration for DBT3 Query 6:
select sum(l_extendedprice * l_discount) as revenue
from lineitem where l_shipdate >= date '1994-01-01'
and l_shipdate < date '1995-01-01'
and l_discount between 0.06 - 0.01 and 0.06 + 0.01
and l_quantity < 24
Tested at 30, 60, 120, and 240 million rows.
Examples:
89 seconds to 3.4 seconds.
177 seconds to 6.3 seconds.
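The two examples translate into speedup factors with a one-line calculation. The slide does not say how many worker threads were used for these runs, so the sketch below reports only the ratio of serial to parallel time; `speedup` is a hypothetical helper.

```python
def speedup(serial_seconds, parallel_seconds):
    """Speedup factor: how many times faster the parallel run is."""
    return serial_seconds / parallel_seconds


# Timings quoted on the slide for DBT3 Query 6.
examples = [(89, 3.4), (177, 6.3)]
for serial, parallel in examples:
    print(f"{serial} s -> {parallel} s: {speedup(serial, parallel):.1f}x")
```

The factors come out at roughly 26x and 28x.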
Parallel Query Performance
DBT3 Query 1:
• Scales 29x with 32 worker threads
• Close to linear scalability (dashed line)
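How close 29x is to linear can be put into numbers. Parallel efficiency is simply the speedup divided by the number of workers. The second function goes one step further and assumes that Amdahl's law describes the query, which is an assumption of this sketch and not a claim from the slides; under that assumption it estimates the fraction of the work that stays serial.

```python
def parallel_efficiency(speedup, workers):
    """Share of the ideal (linear) speedup that was actually achieved."""
    return speedup / workers


def amdahl_serial_fraction(speedup, workers):
    """Serial fraction f implied by Amdahl's law:
    speedup = 1 / (f + (1 - f) / workers)."""
    return (workers / speedup - 1) / (workers - 1)


efficiency = parallel_efficiency(29, 32)
serial_fraction = amdahl_serial_fraction(29, 32)
print(f"efficiency: {efficiency:.1%}, implied serial fraction: {serial_fraction:.2%}")
```

For 29x on 32 workers the efficiency is about 91%, and the implied serial part is well under 1% of the work.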
Why do users care about linear scalability?
Users care about
• Business growth. The DB must deliver stable performance as the business grows.
• Faster decisions. Faster analysis driving faster action.
[Figure: Faster: 85 seconds to 6 seconds. Stable as data grows: 22.6 seconds; 2x data size: 21.6 seconds; 4x data size: 21.6 seconds.]
Linear scalability also for join (DBT3 Q12)
DBT3 Query Performance
• Measured speedup with 32 worker threads
• 9 DBT3 queries can be executed in parallel (with default query plans)
• 7 queries show speedup above 16x
[Figure: bar chart of speedup (axis from 0x to 35x) for Q1, Q3, Q5, Q6, Q9, Q10, Q12, Q14, Q19.]
Parallel Query – Current Limitations
Parallel query currently only supports:
• SELECT queries
• Parallel scan on driving table of nested-loops join
• InnoDB
Parallel query does not currently execute in parallel:
• JSON
• GIS
• UDFs
• Full text indexes
• Subqueries & CTEs
• Window functions
• WITH ROLLUP
• Procedures
• SELECT … FOR UPDATE etc.
• SERIALIZABLE isolation level
Parallel Query – Future Work
• [...] nested [...]
• [...] subqueries
• Performance optimizations, tuning of existing functionality
• [...] diagnostics [...]
• Exchange operators [...] gather operations
• Parallel hash join
• [...] plans [...] efficient parallelization
• [...] that takes parallelization into account
• Distributed [...] execution
THANK YOU!
POLARDB for MySQL - Parallel Query

More Related Content

What's hot

What's hot (20)

(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
(Berkeley CS186 guest lecture) Big Data Analytics Systems: What Goes Around C...
 
Taking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFramesTaking Spark Streaming to the Next Level with Datasets and DataFrames
Taking Spark Streaming to the Next Level with Datasets and DataFrames
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
Very Large Data Files, Object Stores, and Deep Learning—Lessons Learned While...
 
Top 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applicationsTop 5 mistakes when writing Streaming applications
Top 5 mistakes when writing Streaming applications
 
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin SeyfeSOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
SOS: Optimizing Shuffle I/O with Brian Cho and Ergin Seyfe
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
Scaling Apache Spark at Facebook
Scaling Apache Spark at FacebookScaling Apache Spark at Facebook
Scaling Apache Spark at Facebook
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case How to ensure Presto scalability 
in multi use case
How to ensure Presto scalability 
in multi use case
 
Hoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on SparkHoodie: How (And Why) We built an analytical datastore on Spark
Hoodie: How (And Why) We built an analytical datastore on Spark
 
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
Use of Spark MLib for Predicting the Offlining of Digital Media-(Christopher ...
 
Managing ADLS gen2 using Apache Spark
Managing ADLS gen2 using Apache SparkManaging ADLS gen2 using Apache Spark
Managing ADLS gen2 using Apache Spark
 
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
Deep Dive Into Apache Spark Multi-User Performance Michael Feiman, Mikhail Ge...
 
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.Apache Flink vs Apache Spark - Reproducible experiments on cloud.
Apache Flink vs Apache Spark - Reproducible experiments on cloud.
 
Operating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in ProductionOperating and Supporting Delta Lake in Production
Operating and Supporting Delta Lake in Production
 
Continuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert XueContinuous Application with FAIR Scheduler with Robert Xue
Continuous Application with FAIR Scheduler with Robert Xue
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad FeinbergSpark Summit EU talk by Ram Sriharsha and Vlad Feinberg
Spark Summit EU talk by Ram Sriharsha and Vlad Feinberg
 
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
Dynamic DDL: Adding Structure to Streaming Data on the Fly with David Winters...
 

Similar to POLARDB for MySQL - Parallel Query

2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview
Dimas Prasetyo
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Kristofferson A
 

Similar to POLARDB for MySQL - Parallel Query (20)

2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview2010 12 mysql_clusteroverview
2010 12 mysql_clusteroverview
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Sql query performance analysis
Sql query performance analysisSql query performance analysis
Sql query performance analysis
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Project Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptxProject Slides for Website 2020-22.pptx
Project Slides for Website 2020-22.pptx
 
query-optimization-techniques_talk.pdf
query-optimization-techniques_talk.pdfquery-optimization-techniques_talk.pdf
query-optimization-techniques_talk.pdf
 
2016 may-countdown-to-postgres-v96-parallel-query
2016 may-countdown-to-postgres-v96-parallel-query2016 may-countdown-to-postgres-v96-parallel-query
2016 may-countdown-to-postgres-v96-parallel-query
 
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
[DBA]_HiramFleitas_SQL_PASS_Summit_2017_Summary
 
PostgreSQL query planner's internals
PostgreSQL query planner's internalsPostgreSQL query planner's internals
PostgreSQL query planner's internals
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
MySQL performance tuning
MySQL performance tuningMySQL performance tuning
MySQL performance tuning
 
Sql query performance analysis
Sql query performance analysisSql query performance analysis
Sql query performance analysis
 
PostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability ImprovementsPostgreSQL 9.6 Performance-Scalability Improvements
PostgreSQL 9.6 Performance-Scalability Improvements
 
Oracle Database In-Memory Option in Action
Oracle Database In-Memory Option in ActionOracle Database In-Memory Option in Action
Oracle Database In-Memory Option in Action
 
In Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry OsborneIn Memory Database In Action by Tanel Poder and Kerry Osborne
In Memory Database In Action by Tanel Poder and Kerry Osborne
 
Database and application performance vivek sharma
Database and application performance vivek sharmaDatabase and application performance vivek sharma
Database and application performance vivek sharma
 
Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2Scaling with sync_replication using Galera and EC2
Scaling with sync_replication using Galera and EC2
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
 
PostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major FeaturesPostgreSQL 9.5 - Major Features
PostgreSQL 9.5 - Major Features
 
Enabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speedEnabling presto to handle massive scale at lightning speed
Enabling presto to handle massive scale at lightning speed
 

More from oysteing

How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinar
oysteing
 

More from oysteing (15)

The MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer TraceThe MySQL Query Optimizer Explained Through Optimizer Trace
The MySQL Query Optimizer Explained Through Optimizer Trace
 
JSON_TABLE -- The best of both worlds
JSON_TABLE -- The best of both worldsJSON_TABLE -- The best of both worlds
JSON_TABLE -- The best of both worlds
 
Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0Histogram Support in MySQL 8.0
Histogram Support in MySQL 8.0
 
MySQL Optimizer: What’s New in 8.0
MySQL Optimizer: What’s New in 8.0MySQL Optimizer: What’s New in 8.0
MySQL Optimizer: What’s New in 8.0
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0Common Table Expressions (CTE) & Window Functions in MySQL 8.0
Common Table Expressions (CTE) & Window Functions in MySQL 8.0
 
How to analyze and tune sql queries for better performance
How to analyze and tune sql queries for better performanceHow to analyze and tune sql queries for better performance
How to analyze and tune sql queries for better performance
 
Using Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query PerformanceUsing Optimizer Hints to Improve MySQL Query Performance
Using Optimizer Hints to Improve MySQL Query Performance
 
MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
MySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table ExpressionsMySQL 8.0: Common Table Expressions
MySQL 8.0: Common Table Expressions
 
How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016How to analyze and tune sql queries for better performance vts2016
How to analyze and tune sql queries for better performance vts2016
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
How to analyze and tune sql queries for better performance percona15
How to analyze and tune sql queries for better performance percona15How to analyze and tune sql queries for better performance percona15
How to analyze and tune sql queries for better performance percona15
 
How to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinarHow to analyze and tune sql queries for better performance webinar
How to analyze and tune sql queries for better performance webinar
 

Recently uploaded

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
VictoriaMetrics
 

Recently uploaded (20)

MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 

POLARDB for MySQL - Parallel Query

  • 1. POLARDB for MySQL Parallel Query Øystein Grøvlen Alibaba Cloud
  • 2. Agenda • What is Parallel Query • How to use Parallel Query • Parallel Query Internals • Parallel Query Performance • Future Work
  • 3. What is Parallel Query? Parallel Query is an innovative method to accelerate MySQL queries from Alibaba Cloud. • Traditionally, 1 MySQL query runs with just 1 thread, and can not take advantage of multiple cores on modern processors. • Parallel Query takes advantage of modern processors to distribute work across many or all available cores: • 8 parallel threads can be up to 8 times faster. • 32 parallel threads can be up to 32 times faster
  • 4. Why Parallel Query? • 2003: CPUs stopped getting faster • 2004-2019 focus on more cores, sockets. • PQ lets MySQL take advantage of last 15 years of progress.
  • 5. How to Use Parallel Query Parallel Query runs against your existing InnoDB data. • No data extraction to another system is required. • No query modifications are required. Parallel Query within InnoDB (no extraction needed) is an amazing feature exclusive to Alibaba Cloud
  • 6. Query with Parallelism SELECT count(*) FROM production.product; Serial execution plan: • 1 Stream Aggregate: For each of the rows returned by index scan, do the aggregation. For the above query, Stream Aggregate operator counts the rows it receives from the Index Scan operator. 1 active thread 63 idle threads Thread 1: Scan, Count SQL Client
  • 7. Parallel Execution Plan Sum Thread 1: Scan, Count Thread 2: Scan, Count Thread 3: Scan, Count Thread 4: Scan, Count Thread 5: Scan, Count Thread 6: Scan, Count Thread 7: Scan, Count . . . Thread 64: Scan, Count With 64 parallel threads, each thread does < 2% of the work. SQL Client
  • 8. How Parallel Query Works 1. Parallel coordinator can split a table or index scan into equal-size pieces 2. Each of the worker can execute part of the query plan 3. Gather stream operator is responsible for collecting the intermediate results from workers
  • 9. How Parallel Query Works • Each of the workers write results to their own buffer Ø threads run without interruption • Pointers are passed for Merge step Ø optimized method to hand off data
  • 10. Parallel Query Internals Parallel Query uses multiple methods to distribute work among the parallel threads, including: In a parallel sequential scan, the data pages for the table will be divided among the cooperating threads. In a parallel index operation, the cooperating threads will read a single index block and will scan and return all records referenced by that block; other threads can at the same time be returning records from a different index page. The results of a parallel btree scan are returned in sorted order within each worker thread.
  • 11. Partitioning 11 17 25 5 8 1 2 3 4 5 6 7 8 9 10 14 11 12 13 14 15 16 20 22 17 18 19 20 21 22 23 24 28 31 25 26 27 28 29 30 31 32
  • 12. Partitioning 11 17 25 5 8 1 2 3 4 5 6 7 8 9 10 14 11 12 13 14 15 16 20 22 17 18 19 20 21 22 23 24 28 31 25 26 27 28 29 30 31 32 Partition 1 Partition 2 2 partitionsInnoDB partitions the B-tree
  • 13. Partitioning 11 17 25 5 8 1 2 3 4 5 6 7 8 9 10 14 11 12 13 14 15 16 20 22 17 18 19 20 21 22 23 24 28 31 25 26 27 28 29 30 31 32 Partition 1 Partition 2 2 partitionsWorkers see only one partition (at a time)
  • 14. Partitioning 11 17 25 5 8 1 2 3 4 5 6 7 8 9 10 14 11 12 13 14 15 16 20 22 17 18 19 20 21 22 23 24 28 31 25 26 27 28 29 30 31 32 Part. 1 Part. 2 Part. 3 Part. 4 Part. 5 Part. 6 6 partitions
  • 15. Partitioning • Server will normally request 100 partitions per worker thread • “Fast” workers may process more partitions than “slow” workers • Partitions of more equal size • When finished with one partition, a worker may be automatically attached to a new partition.
  • 16. Parallel Query SORT SELECT col1, col2, col3 FROM t1 ORDER BY 1,2; 1. Parallel data access (table scan or index) 2. Parallel order by of the data handled by each worker 3. Final merge sort of the results and return to client. Parallel threads run local sort SQL Client Merge Sort Thread 1: Scan, Sort Thread 2: Scan, Sort Thread 3: Scan, Sort Thread 4: Scan, Sort Thread 5: Scan, Sort Thread 6: Scan, Sort Thread 7: Scan, Sort . . . Thread 64: Scan, Sort
  • 17. Parallel Query GROUP BY SELECT col1, col2, SUM(col3) FROM t1 GROUP BY 1,2; 1. Parallel data access (table scan or index) 2. Parallel group by of the data handled by each worker 3. Final merge of the local group by and return results DISTINCT operation will be similar to GROUP BY. Parallel threads run local group Merge Groups Thread 1: Scan, Group Thread 2: Scan, Group Thread 3: Scan, Group Thread 4: Scan, Group Thread 5: Scan, Group Thread 6: Scan, Group Thread 7: Scan, Group . . . Thread 64: Scan, Group SQL Client
  • 18. Parallel Query Nested-Loops JOIN
    SELECT * FROM t1 JOIN t3 ON t1.id = t3.id;
    1. Parallel data access (table scan or index) of driving table
    2. Parallel join of the local data handled by each worker
    3. Final merge of the results and return to client
    [figure: Threads 1 to 64 each run Scan, Join (parallel scan and join); a Merge step returns the result to the SQL client]
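The join case can be sketched the same way. In this illustrative Python model only the driving table `t1` is split among workers; every worker probes the full inner table `t3` through a dictionary that stands in for an index lookup on `t3.id`. The helper names and the dictionary-based lookup are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def join_chunk(t1_chunk, t3_by_id):
    """One worker: nested-loops join of its share of t1 against all of t3."""
    out = []
    for t1_row in t1_chunk:                              # outer loop: driving table
        for t3_row in t3_by_id.get(t1_row["id"], []):    # inner loop: lookup by id
            out.append((t1_row, t3_row))
    return out

def parallel_nl_join(t1, t3, num_workers):
    # Stand-in for an index on t3.id, shared read-only by all workers.
    t3_by_id = {}
    for row in t3:
        t3_by_id.setdefault(row["id"], []).append(row)
    chunks = [t1[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(lambda c: join_chunk(c, t3_by_id), chunks))
    # Final merge: concatenate the workers' partial results.
    return [pair for part in parts for pair in part]

t1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
t3 = [{"id": 1, "b": "p"}, {"id": 1, "b": "q"}, {"id": 3, "b": "r"}]
joined = parallel_nl_join(t1, t3, num_workers=2)
```

Since each row of the driving table belongs to exactly one worker, no join result is produced twice and the merge needs no deduplication.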
  • 19. Parallel Query Usage
    • To enable parallel execution for a session:
        set max_parallel_degree = n
      At most n worker threads will be used
    • MySQL may still decide not to use parallelization. If so, parallel execution may be forced with
        set force_parallel_mode = on
  • 20. Parallel Query Usage: Hint
    • To force parallel query execution for a single query:
        SELECT /*+ PARALLEL() */ * FROM ...
    • To force the use of a specific number of worker threads, n:
        SELECT /*+ PARALLEL(n) */ * FROM ...
  • 21. Parallel Query Usage: EXPLAIN
    mysql> EXPLAIN SELECT SUM(l_quantity) FROM lineitem where l_returnflag = 'A';
    +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
    | id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                  |
    +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
    |  1 | SIMPLE      | <gather2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 5938499 |    10.00 | NULL                                   |
    |  2 | SIMPLE      | lineitem  | NULL       | ALL  | NULL          | NULL | NULL    | NULL |  742312 |    10.00 | Parallel scan (8 workers); Using where |
    +----+-------------+-----------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------+
    2 rows in set, 1 warning (0.00 sec)
  • 22. Parallel Query Performance
    Parallel Query delivers near-perfect linear acceleration for DBT3 Query 6:
        select sum(l_extendedprice * l_discount) as revenue
        from lineitem
        where l_shipdate >= date '1994-01-01'
          and l_shipdate < date '1995-01-01'
          and l_discount between 0.06 - 0.01 and 0.06 + 0.01
          and l_quantity < 24
    Tested at 30, 60, 120, and 240 million rows.
    Examples: 89 seconds to 3.4 seconds. 177 seconds to 6.3 seconds.
  • 23. Parallel Query Performance
    DBT3 Query 1:
    • Scales 29x with 32 worker threads
    • Close to linear scalability (dashed line in the chart)
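The quoted figures can be turned into speedup and efficiency numbers with a few lines of Python. The timings and the 29x figure come from the slides; the definitions (speedup as serial time divided by parallel time, efficiency as speedup divided by worker count) are the usual ones and are assumed here, and the helper names are hypothetical. The slides do not state the worker count for the Query 6 timings, so only the speedup is computed for those.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is."""
    return serial_seconds / parallel_seconds

def efficiency(observed_speedup, num_workers):
    """Fraction of ideal linear scaling achieved (1.0 would be perfectly linear)."""
    return observed_speedup / num_workers

q6_small = speedup(89, 3.4)    # first Query 6 example
q6_large = speedup(177, 6.3)   # second Query 6 example
q1_eff = efficiency(29, 32)    # Query 1: 29x with 32 worker threads

print(f"Q6: {q6_small:.1f}x and {q6_large:.1f}x; Q1 efficiency: {q1_eff:.1%}")
```

On these numbers the Query 6 examples come out at roughly 26x and 28x, and Query 1 reaches about 91% of ideal linear scaling.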
  • 24. Why do users care about linear scalability?
    Users care about
    • Business growth. DB must deliver stable performance as business grows
    • Faster decisions. Faster analysis driving faster action.
    Faster: 85 seconds to 6 seconds
    [chart values: 22.6 seconds; 2x data size: 21.6 seconds; 4x data size: 21.6 seconds]
  • 25. Linear scalability also for join (DBT3 Q12)
  • 26. DBT3 Query Performance
    • Measured speedup with 32 worker threads
    • 9 DBT3 queries can be executed in parallel (with default query plans)
    • 7 queries show speedup above 16x
    [chart: speedup, 0x to 35x, for Q1, Q3, Q5, Q6, Q9, Q10, Q12, Q14, Q19]
  • 27. Parallel Query – Current Limitations
    Parallel query currently only supports:
    • SELECT queries
    • Parallel scan on driving table of nested-loops join
    • InnoDB
    The following are not currently executed in parallel:
    • JSON
    • GIS
    • UDFs
    • Full text indexes
    • Subqueries & CTEs
    • Window functions
    • WITH ROLLUP
    • Procedures
    • SELECT … FOR UPDATE etc.
    • SERIALIZABLE isolation level
  • 28. Parallel Query – Future Work [slide text garbled in extraction]