© Copyright EnterpriseDB Corporation, 2015. All Rights Reserved. 1
Query Optimizations For Partitioned
Tables
Ashutosh Bapat
@PGCONF INDIA 2018
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 2
• Partition pruning
• Run-time partition pruning
• Partition-wise join
• Partition-wise aggregation
• Partition-wise sorting/ordering
Query Optimization Techniques
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 3
Partitioned Table
Partition 1
FOR VALUES
FROM (0) TO (100)
Partition 4
FOR VALUES
FROM (300) TO (400)
Partition 3
FOR VALUES
FROM (200) TO (300)
Partition 2
FOR VALUES
FROM (100) TO (200)
Unpartitioned table
t1 (c1 int, c2 int, …)
10, …, ...
90, …, ...
80, …, ...
325, …, …,
375, …, …,
Partitioned table
t1 (c1 int, c2 int, …)
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 4
Partition Pruning
SELECT * FROM t1 WHERE c1 = 350;
Partition 1
FOR VALUES
FROM (0) TO (100)
Partitioned table
t1 (c1 int, c2 int, …)
Partition 4
FOR VALUES
FROM (300) TO (400)
Partition 3
FOR VALUES
FROM (200) TO (300)
Partition 2
FOR VALUES
FROM (100) TO (200)
SELECT * FROM t1
WHERE c1 BETWEEN 150 AND 250;
Partitionboundsbasedelimination
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 5
Run-time Partition Pruning
EXEC scan_t1(350);
Partition 1
FOR VALUES
FROM (0) TO (100)
Partitioned table
t1 (c1 int, c2 int, …)
Partition 4
FOR VALUES
FROM (300) TO (400)
Partition 3
FOR VALUES
FROM (200) TO (300)
Partition 2
FOR VALUES
FROM (100) TO (200)
Partitionboundsbasedelimination
PREPARE scan_t1(int) AS
SELECT * FROM t1 WHERE c1 = $1;
© Copyright EnterpriseDB Corporation, 2015. All Rights Reserved. 7
Partition-wise Join
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 8
Partition-wise Join
Partition 1
FOR VALUES
(0) TO (100)
Partitioned table
t1 (c1 int, ...)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
Partition 1
FOR VALUES
(0) TO (100)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
Partition 1
FOR VALUES
(0) TO (100)
Partitioned table
t2 (c1 int, ...)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
t1 JOIN t2
ON t1.c1 = t2.c1
Partitioned join
Partition 3
FOR VALUES
(200) TO (300)
Partition 1
FOR VALUES
(0) TO (100)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
Partition 3
FOR VALUES
(200) TO (300)
Partition 1
FOR VALUES
(0) TO (100)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 10
• Scale 20
• Schema changes
– lineitems PARTITION BY RANGE(l_orderkey)
– orders PARTITION BY RANGE(o_orderkey)
– Each with 17 partitions
• GUCs
– work_mem - 1GB
– effective_cache_size - 8GB
– shared_buffers - 8GB
– enable_partition_wise_join = on
TPCH Vs. Partition-wise Join (Scale 20)
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 11
TPCH Vs. Partition-wise Join (Scale 20)
(linear time scale)
Q3 Q4 Q5 Q7 Q10 Q12 Q18
0
100
200
300
400
500
600
700
800
Unpartitioned tables
Partitioned tables without PWJ
Partitioned tables with PWJ
TPCH queries (scale 20)
Queryexecutiontime(seconds)
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 12
TPCH Vs. Partition-wise Join (Scale 20)
(scaled execution time)
Q3 Q4 Q5 Q7 Q10 Q12 Q18
1
10
100
Unpartitioned tables
Partitioned tables without PWJ
Partitioned tables with PWJ
TPCH queries (scale 20)
scaledqueryexecutiontime
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 13
TPCH Vs. partition-wise join: observations
● Join strategy change from join between partitioned
tables to join between partitions
– MergeJoin to HashJoin
● Q3, Q5, Q10, Q12, Q18
– HashJoin to parameterized NestLoop join
● Q4
● Different join strategies for different partition-joins
● Q7
● Change in order of joining partitioned tables
– Q3, Q5, Q10, Q12, Q18, Q7
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 14
• Scale 300
• Schema changes
– lineitems PARTITION BY RANGE(l_orderkey)
– orders PARTITION BY RANGE(o_orderkey)
– Each with 106 partitions
• GUCs
– work_mem - 1GB
– effective_cache_size - 10GB
– shared_buffers - 10GB
– enable_partition_wise_join = on
– max_parallel_workers_per_gather = 4
TPCH Vs. Partition-wise Join (Scale 300)
Reported by Rafia Sabih
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 15
TPCH Vs. Partition-wise Join (Scale 300)
Q3 Q4 Q5 Q10 Q12
0
500
1000
1500
2000
2500
3000
3500
4000
4500
5000
Unpartitioned tables
Partitioned tables without PWJ
Partitioned tables with PWJ
TPCH queries (scale 20)
Queryexecutiontime(seconds)
Partitioning makes queries faster
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 16
Partition pruning and Partition-wise Join
Partition 1
FOR VALUES
(0) TO (100)
Partitioned table
t1 (c1 int, ...)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
Partition 1
FOR VALUES
(0) TO (100)
Partitioned table
t2 (c1 int, ...)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
t1 JOIN t2
ON t1.c1 = t2.c1
SELECT … FROM t1 JOIN t2 ON t1.c1 = t2.c1
WHERE t1.c1 > 100 AND t2.c1 < 200
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 17
Partition pruning and Partition-wise Join
Partitioned join
Partition 1
FOR VALUES
(0) TO (100)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
Partition 1
FOR VALUES
(0) TO (100)
Partition 3
FOR VALUES
(200) TO (300)
Partition 2
FOR VALUES
(100) TO (200)
Partitioned join
Partition 2
FOR VALUES
(100) TO (200)
Partition 2
FOR VALUES
(100) TO (200)
© Copyright EnterpriseDB Corporation, 2015. All Rights Reserved. 18
Partition-wise aggregation and
grouping
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 19
Full partition-wise aggregation
Partitioned table
t1 (c1 int, ...)
Partition 1
Partition 3
Partition 2
Partition 1
Partitioned table
t2 (c1 int, ...)
Partition 3
Partition 2
t1 JOIN t2
ON t1.c1 = t2.c1
Partition 1
Partition 3
Partition 2
Partition 1
Partition 3
Partition 2
Group by t1.c1
Group by t1.c1
Group by t1.c1
Group by t1.c1
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 20
Partial partition-wise aggregation
Partitioned table
t1 (c1 int, ...)
Partition 1
Partition 3
Partition 2
Partition 1
Partitioned table
t2 (c1 int, ...)
Partition 3
Partition 2
t1 JOIN t2
ON t1.c1 = t2.c1
Partition 1
Partition 3
Partition 2
Partition 1
Partition 3
Partition 2
Group by t1.c2
Partial Aggregation
Group by t1.c2
Partial Aggregation
Group by t1.c2
Partial Aggregation
Group by t1.c2
Full Aggregation
Group by t1.c2
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 22
Source: Jeevan Chalke's partition-wise aggregate proposal
Query: SELECT a, count(*) FROM plt1 GROUP BY a;
plt1: partitioned table with 3 foreign partitions, each with 1M rows
Query returns 30 rows, 10 rows per partition
enable_partition_wise_agg to false
QUERY PLAN
-------------------------------------
HashAggregate
Group Key: plt1.a
-> Append
-> Foreign Scan on fplt1_p1
-> Foreign Scan on fplt1_p2
-> Foreign Scan on fplt1_p3
Planning time: 0.251 ms
Execution time: 6499.018ms ~ 6.5s
enable_partition_wise_agg to true
QUERY PLAN
-------------------------------------------------------
Append
-> Foreign Scan: Aggregate on (public.fplt1_p1 plt1)
-> Foreign Scan: Aggregate on (public.fplt1_p2 plt1)
-> Foreign Scan: Aggregate on (public.fplt1_p3 plt1)
Planning time: 0.370ms
Execution time: 945.384ms ~ .9s
Example
7x faster
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 23
Partition-wise sorting
Partitioned table
t1 (c1 int, ...)
Partition 1
Partition 3
Partition 2
Partition 1
Partitioned table
t2 (c1 int, ...)
Partition 3
Partition 2
t1 JOIN t2
ON t1.c1 = t2.c1
Partition 1
Partition 3
Partition 2
Partition 1
Partition 3
Partition 2
Sort by c1
Sort by c1
Sort by c1
Sort by c1
© Copyright EnterpriseDB Corporation, 2015. All Rights Reserved. 24
Parallel Append and Partitioning
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 25
Parallel Append and Partitioning
Partition 1 Partition 3Partition 2Partition 1 Partition 3Partition 2
Group by t1.c1 Group by t1.c1 Group by t1.c1
Worker
Backend
1
Worker
Backend
2
Worker
Backend
1
Worker
Backend
3
Leader Backend
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 26
TPCH Vs. PWJ + PA
Q3 Q4 Q5 Q7 Q10 Q12 Q18
0
100
200
300
400
500
600
700
800
Unpartitioned tables
Partitioned tables without PWJ
Partitioned tables with PWJ
Partitioned tables with PWJ and
PA
TPCH queries (scale 20)
Queryexecutiontime(seconds)
© Copyright EnterpriseDB Corporation, 2018. All Rights Reserved. 27
• Committed patches
– Basic partition-wise join - Ashutosh Bapat (EDB)
– Parallel append – Amit Khandekar (EDB)
• Patches submitted on hackers and being reviewed
– Partition pruning – Amit Langote (NTT)
– Run-time partition pruning – Beena Emerson (EDB)
– Partition-wise aggregation – Jeevan Chalke (EDB)
– Partition-wise sorting/ordering – Ronan Dunklau, Julien
Rouhaud (Dalibo)
●
Benchmarking by Rafia Sabih (EDB)
Query Optimization Techniques - patches
Query optimization techniques for partitioned tables.

Query optimization techniques for partitioned tables.

  • 1.
    © Copyright EnterpriseDBCorporation, 2015. All Rights Reserved. 1 Query Optimizations For Partitioned Tables Ashutosh Bapat @PGCONF INDIA 2018
  • 2.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 2 • Partition pruning • Run-time partition pruning • Partition-wise join • Partition-wise aggregation • Partition-wise sorting/ordering Query Optimization Techniques
  • 3.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 3 Partitioned Table Partition 1 FOR VALUES FROM (0) TO (100) Partition 4 FOR VALUES FROM (300) TO (400) Partition 3 FOR VALUES FROM (200) TO (300) Partition 2 FOR VALUES FROM (100) TO (200) Unpartitioned table t1 (c1 int, c2 int, …) 10, …, ... 90, …, ... 80, …, ... 325, …, …, 375, …, …, Partitioned table t1 (c1 int, c2 int, …)
  • 4.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 4 Partition Pruning SELECT * FROM t1 WHERE c1 = 350; Partition 1 FOR VALUES FROM (0) TO (100) Partitioned table t1 (c1 int, c2 int, …) Partition 4 FOR VALUES FROM (300) TO (400) Partition 3 FOR VALUES FROM (200) TO (300) Partition 2 FOR VALUES FROM (100) TO (200) SELECT * FROM t1 WHERE c1 BETWEEN 150 AND 250; Partitionboundsbasedelimination
  • 5.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 5 Run-time Partition Pruning EXEC scan_t1(350); Partition 1 FOR VALUES FROM (0) TO (100) Partitioned table t1 (c1 int, c2 int, …) Partition 4 FOR VALUES FROM (300) TO (400) Partition 3 FOR VALUES FROM (200) TO (300) Partition 2 FOR VALUES FROM (100) TO (200) Partitionboundsbasedelimination PREPARE scan_t1(int) AS SELECT * FROM t1 WHERE c1 = $1;
  • 6.
    © Copyright EnterpriseDBCorporation, 2015. All Rights Reserved. 7 Partition-wise Join
  • 7.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 8 Partition-wise Join Partition 1 FOR VALUES (0) TO (100) Partitioned table t1 (c1 int, ...) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) Partition 1 FOR VALUES (0) TO (100) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) Partition 1 FOR VALUES (0) TO (100) Partitioned table t2 (c1 int, ...) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) t1 JOIN t2 ON t1.c1 = t2.c1 Partitioned join Partition 3 FOR VALUES (200) TO (300) Partition 1 FOR VALUES (0) TO (100) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) Partition 3 FOR VALUES (200) TO (300) Partition 1 FOR VALUES (0) TO (100) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200)
  • 8.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 10 • Scale 20 • Schema changes – lineitems PARTITION BY RANGE(l_orderkey) – orders PARTITION BY RANGE(o_orderkey) – Each with 17 partitions • GUCs – work_mem - 1GB – effective_cache_size - 8GB – shared_buffers - 8GB – enable_partition_wise_join = on TPCH Vs. Partition-wise Join (Scale 20)
  • 9.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 11 TPCH Vs. Partition-wise Join (Scale 20) (linear time scale) Q3 Q4 Q5 Q7 Q10 Q12 Q18 0 100 200 300 400 500 600 700 800 Unpartitioned tables Partitioned tables without PWJ Partitioned tables with PWJ TPCH queries (scale 20) Queryexecutiontime(seconds)
  • 10.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 12 TPCH Vs. Partition-wise Join (Scale 20) (scaled execution time) Q3 Q4 Q5 Q7 Q10 Q12 Q18 1 10 100 Unpartitioned tables Partitioned tables without PWJ Partitioned tables with PWJ TPCH queries (scale 20) scaledqueryexecutiontime
  • 11.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 13 TPCH Vs. partition-wise join: observations ● Join strategy change from join between partitioned tables to join between partitions – MergeJoin to HashJoin ● Q3, Q5, Q10, Q12, Q18 – HashJoin to parameterized NestLoop join ● Q4 ● Different join strategies for different partition-joins ● Q7 ● Change in order of joining partitioned tables – Q3, Q5, Q10, Q12, Q18, Q7
  • 12.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 14 • Scale 300 • Schema changes – lineitems PARTITION BY RANGE(l_orderkey) – orders PARTITION BY RANGE(o_orderkey) – Each with 106 partitions • GUCs – work_mem - 1GB – effective_cache_size - 10GB – shared_buffers - 10GB – enable_partition_wise_join = on – max_parallel_workers_per_gather = 4 TPCH Vs. Partition-wise Join (Scale 300) Reported by Rafia Sabih
  • 13.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 15 TPCH Vs. Partition-wise Join (Scale 300) Q3 Q4 Q5 Q10 Q12 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 Unpartitioned tables Partitioned tables without PWJ Partitioned tables with PWJ TPCH queries (scale 20) Queryexecutiontime(seconds) Partitioning makes queries faster
  • 14.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 16 Partition pruning and Partition-wise Join Partition 1 FOR VALUES (0) TO (100) Partitioned table t1 (c1 int, ...) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) Partition 1 FOR VALUES (0) TO (100) Partitioned table t2 (c1 int, ...) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) t1 JOIN t2 ON t1.c1 = t2.c1 SELECT … FROM t1 JOIN t2 ON t1.c1 = t2.c1 WHERE t1.c1 > 100 AND t2.c1 < 200
  • 15.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 17 Partition pruning and Partition-wise Join Partitioned join Partition 1 FOR VALUES (0) TO (100) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) Partition 1 FOR VALUES (0) TO (100) Partition 3 FOR VALUES (200) TO (300) Partition 2 FOR VALUES (100) TO (200) Partitioned join Partition 2 FOR VALUES (100) TO (200) Partition 2 FOR VALUES (100) TO (200)
  • 16.
    © Copyright EnterpriseDBCorporation, 2015. All Rights Reserved. 18 Partition-wise aggregation and grouping
  • 17.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 19 Full partition-wise aggregation Partitioned table t1 (c1 int, ...) Partition 1 Partition 3 Partition 2 Partition 1 Partitioned table t2 (c1 int, ...) Partition 3 Partition 2 t1 JOIN t2 ON t1.c1 = t2.c1 Partition 1 Partition 3 Partition 2 Partition 1 Partition 3 Partition 2 Group by t1.c1 Group by t1.c1 Group by t1.c1 Group by t1.c1
  • 18.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 20 Partial partition-wise aggregation Partitioned table t1 (c1 int, ...) Partition 1 Partition 3 Partition 2 Partition 1 Partitioned table t2 (c1 int, ...) Partition 3 Partition 2 t1 JOIN t2 ON t1.c1 = t2.c1 Partition 1 Partition 3 Partition 2 Partition 1 Partition 3 Partition 2 Group by t1.c2 Partial Aggregation Group by t1.c2 Partial Aggregation Group by t1.c2 Partial Aggregation Group by t1.c2 Full Aggregation Group by t1.c2
  • 19.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 22 Source: Jeevan Chalke's partition-wise aggregate proposal Query: SELECT a, count(*) FROM plt1 GROUP BY a; plt1: partitioned table with 3 foreign partitions, each with 1M rows Query returns 30 rows, 10 rows per partition enable_partition_wise_agg to false QUERY PLAN ------------------------------------- HashAggregate Group Key: plt1.a -> Append -> Foreign Scan on fplt1_p1 -> Foreign Scan on fplt1_p2 -> Foreign Scan on fplt1_p3 Planning time: 0.251 ms Execution time: 6499.018ms ~ 6.5s enable_partition_wise_agg to true QUERY PLAN ------------------------------------------------------- Append -> Foreign Scan: Aggregate on (public.fplt1_p1 plt1) -> Foreign Scan: Aggregate on (public.fplt1_p2 plt1) -> Foreign Scan: Aggregate on (public.fplt1_p3 plt1) Planning time: 0.370ms Execution time: 945.384ms ~ .9s Example 7x faster
  • 20.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 23 Partition-wise sorting Partitioned table t1 (c1 int, ...) Partition 1 Partition 3 Partition 2 Partition 1 Partitioned table t2 (c1 int, ...) Partition 3 Partition 2 t1 JOIN t2 ON t1.c1 = t2.c1 Partition 1 Partition 3 Partition 2 Partition 1 Partition 3 Partition 2 Sort by c1 Sort by c1 Sort by c1 Sort by c1
  • 21.
    © Copyright EnterpriseDBCorporation, 2015. All Rights Reserved. 24 Parallel Append and Partitioning
  • 22.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 25 Parallel Append and Partitioning Partition 1 Partition 3Partition 2Partition 1 Partition 3Partition 2 Group by t1.c1 Group by t1.c1 Group by t1.c1 Worker Backend 1 Worker Backend 2 Worker Backend 1 Worker Backend 3 Leader Backend
  • 23.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 26 TPCH Vs. PWJ + PA Q3 Q4 Q5 Q7 Q10 Q12 Q18 0 100 200 300 400 500 600 700 800 Unpartitioned tables Partitioned tables without PWJ Partitioned tables with PWJ Partitioned tables with PWJ and PA TPCH queries (scale 20) Queryexecutiontime(seconds)
  • 24.
    © Copyright EnterpriseDBCorporation, 2018. All Rights Reserved. 27 • Committed patches – Basic partition-wise join - Ashutosh Bapat (EDB) – Parallel append – Amit Khandekar (EDB) • Patches submitted on hackers and being reviewed – Partition pruning – Amit Langote (NTT) – Run-time partition pruning – Beena Emerson (EDB) – Partition-wise aggregation – Jeevan Chalke (EDB) – Partition-wise sorting/ordering – Ronan Dunklau, Julien Rouhaud (Dalibo) ● Benchmarking by Rafia Sabih (EDB) Query Optimization Techniques - patches