Common Table Expressions (CTEs) let a single SQL statement define named, temporary result sets and refer to them, possibly several times, within that statement. There are two kinds of CTEs: non-recursive and recursive. Non-recursive CTEs can refer to other CTEs, and MariaDB optimizes them with techniques such as CTE merging and condition pushdown. Recursive CTEs make it possible to query recursive relationships and compute transitive closures: their recursive part is executed repeatedly until it produces no new rows.
M|18 Taking Advantage of Common Table Expressions
Igor Babaev | Principal Software Engineer, MariaDB
Sergei Petrunia | Senior Software Engineer, MariaDB
{igor,sergey}@mariadb.com
Common Table Expressions
A standard SQL feature
Two kinds of CTEs:
● Recursive
● Non-recursive
Supported by Oracle, MS SQL Server, PostgreSQL, SQLite, …
Available in MariaDB 10.2 (stable since May 2017)
Available in MySQL 8.0 (RC since Sept 2017)
CTE syntax
with engineers as (          -- WITH, then the CTE name
  select *                   -- CTE body
  from employees
  where dept='Engineering'
)
select *                     -- CTE usage
from engineers
where ...
● Similar to DERIVED tables
● “Query-local VIEWs”
CTEs are like derived tables
Derived table:
select *
from
  (
    select *
    from employees
    where dept='Engineering'
  ) as engineers
where
  ...
The same query with a CTE:
with engineers as (
  select *
  from employees
  where dept='Engineering'
)
select *
from engineers
where
  ...
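To check that the two forms on this slide are interchangeable, here is a minimal sketch that uses Python's standard sqlite3 module (SQLite also supports CTEs). The sample rows, the `location` column and the `location = 'Amsterdam'` filter are assumptions that stand in for the elided `where ...`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows; the slide only shows the employees table and its dept column.
conn.executescript("""
    create table employees(name text, dept text, location text);
    insert into employees values
        ('Ann',  'Engineering', 'Amsterdam'),
        ('Bob',  'Engineering', 'Berlin'),
        ('Cleo', 'Sales',       'Amsterdam');
""")

# Derived table: the subquery is written inline in FROM.
derived_rows = conn.execute("""
    select name
    from (select * from employees where dept = 'Engineering') as engineers
    where location = 'Amsterdam'
""").fetchall()

# CTE: the same subquery, named up front with WITH.
cte_rows = conn.execute("""
    with engineers as (
        select * from employees where dept = 'Engineering'
    )
    select name
    from engineers
    where location = 'Amsterdam'
""").fetchall()

print(derived_rows, cte_rows)  # [('Ann',)] [('Ann',)]
```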
Use case #1: CTEs refer to CTEs
with engineers as (
  select * from employees
  where dept in ('Development','Support')
),
eu_engineers as (
  select * from engineers where country IN ('NL',...)
)
select
  ...
from
  eu_engineers;
● More readable than nested FROM(SELECT …)
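The chained definition can be tried in SQLite. In the sketch below, the employee rows are made up, and `('NL', 'DE')` is an assumed stand-in for the elided country list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; ('NL', 'DE') replaces the elided country list.
conn.executescript("""
    create table employees(name text, dept text, country text);
    insert into employees values
        ('Ann',  'Development', 'NL'),
        ('Bob',  'Support',     'US'),
        ('Cleo', 'Sales',       'NL'),
        ('Dirk', 'Support',     'DE');
""")

eu_names = [row[0] for row in conn.execute("""
    with engineers as (
        select * from employees
        where dept in ('Development', 'Support')
    ),
    eu_engineers as (
        -- the second CTE is defined in terms of the first one
        select * from engineers where country in ('NL', 'DE')
    )
    select name from eu_engineers order by name
""")]

print(eu_names)  # ['Ann', 'Dirk']
```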
Use case #2: Multiple uses of CTE
Anti-self-join: find engineers who are the only engineer in their country
with engineers as (
  select * from employees
  where dept in ('Development','Support')
)
select *
from
  engineers E1
where not exists (select 1
                  from engineers E2
                  where E2.country=E1.country
                    and E2.name <> E1.name);
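Here is a runnable version of the anti-self-join in SQLite, with a made-up employee list. The CTE is used twice: once as `E1` in the outer query and once as `E2` inside the `not exists` subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data.
conn.executescript("""
    create table employees(name text, dept text, country text);
    insert into employees values
        ('Ann',  'Development', 'NL'),
        ('Dirk', 'Support',     'NL'),
        ('Bob',  'Support',     'US'),
        ('Cleo', 'Sales',       'FR');
""")

# The CTE is referenced twice: as E1 in the outer query and as E2 in the subquery.
lonely = conn.execute("""
    with engineers as (
        select * from employees
        where dept in ('Development', 'Support')
    )
    select E1.name, E1.country
    from engineers E1
    where not exists (select 1
                      from engineers E2
                      where E2.country = E1.country
                        and E2.name <> E1.name)
""").fetchall()

print(lonely)  # [('Bob', 'US')]
```

Ann and Dirk share NL, so neither qualifies. Cleo is not an engineer and never enters the CTE.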
Use case #2: example 2
Year-over-year comparisons
with sales_product_year as (
  select
    product,
    year(ship_date) as year,
    sum(price) as total_amt
  from
    item_sales
  group by
    product, year
)
select *
from
  sales_product_year CUR,
  sales_product_year PREV
where
  CUR.product=PREV.product and
  CUR.year=PREV.year + 1 and
  CUR.total_amt > PREV.total_amt
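The year-over-year self-join can be sketched in SQLite. SQLite has no `year()` function, so this version assumes `cast(strftime('%Y', ship_date) as integer)` in its place. The cast matters: if years were left as text, `PREV.year + 1` would be an integer and would never compare equal to them. The sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data (prices as integers to keep sums exact).
conn.executescript("""
    create table item_sales(product text, ship_date text, price integer);
    insert into item_sales values
        ('A', '2016-03-01', 10),
        ('A', '2017-05-01', 15),
        ('B', '2016-01-01', 20),
        ('B', '2017-02-01',  5),
        ('B', '2017-03-01',  5);
""")

growth = conn.execute("""
    with sales_product_year as (
        select
            product,
            -- SQLite has no year(); extract it and cast so that year arithmetic works
            cast(strftime('%Y', ship_date) as integer) as year,
            sum(price) as total_amt
        from item_sales
        group by product, year
    )
    select CUR.product, CUR.year, PREV.total_amt, CUR.total_amt
    from sales_product_year CUR, sales_product_year PREV
    where CUR.product = PREV.product
      and CUR.year = PREV.year + 1
      and CUR.total_amt > PREV.total_amt
""").fetchall()

print(growth)  # [('A', 2017, 10, 15)]
```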
Use case #2: example 3
Compare individuals against their group
with sales_product_year as (
  select
    product,
    year(ship_date) as year,
    sum(price) as total_amt
  from
    item_sales
  group by
    product, year
)
select *
from
  sales_product_year S1
where
  total_amt > (select
                 0.1*sum(total_amt)
               from
                 sales_product_year S2
               where
                 S2.year=S1.year)
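The same idea in SQLite: list products whose yearly total exceeds 10% of that year's total across all products. As in the previous sketch, `strftime` replaces `year()`, and the sales rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: one year, three products.
conn.executescript("""
    create table item_sales(product text, ship_date text, price integer);
    insert into item_sales values
        ('A', '2017-01-10', 60),
        ('A', '2017-06-01', 40),
        ('B', '2017-02-01', 50),
        ('C', '2017-03-01',  5);
""")

big_sellers = conn.execute("""
    with sales_product_year as (
        select product,
               cast(strftime('%Y', ship_date) as integer) as year,
               sum(price) as total_amt
        from item_sales
        group by product, year
    )
    select product, year, total_amt
    from sales_product_year S1
    where total_amt > (select 0.1 * sum(total_amt)   -- 10% of that year's total
                       from sales_product_year S2
                       where S2.year = S1.year)
    order by product
""").fetchall()

print(big_sellers)  # [('A', 2017, 100), ('B', 2017, 50)]
```

The 2017 total is 155, so the threshold is 15.5, and product C (5) is filtered out.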
Conclusions so far
Non-recursive CTEs are “Query-local VIEWs”
● One CTE can refer to another
  ● Better than nested FROM (SELECT …)
● Can refer to a CTE from multiple places
  ● Better than copy-pasting FROM(SELECT …)
CTE adoption
● TPC-H (1999) - no CTEs
● SQL:1999 introduces CTEs
● TPC-DS (2011) - 38 of 100 queries use CTEs.
Base algorithm: materialize in a temporary table
with engineers as (
  select * from employees
  where
    dept='Engineering' or dept='Support'
)
select
  ...
from
  engineers,
  other_table, ...
● Always works
● Often not optimal
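Materialization can be done by hand: fill a temporary table with the CTE body once, then run the main query against it. In this SQLite sketch, `support_cases` plays the role of the slide's elided `other_table`. The table name `engineers_tmp` and the sample rows are hypothetical. The sketch only shows that the results agree; it makes no claim about how SQLite evaluates the CTE form internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; support_cases stands in for the slide's other_table.
conn.executescript("""
    create table employees(name text, dept text);
    create table support_cases(id integer, assignee text);
    insert into employees values
        ('Ann', 'Engineering'), ('Bob', 'Support'), ('Cleo', 'Sales');
    insert into support_cases values (1, 'Ann'), (2, 'Bob'), (3, 'Cleo');
""")

# Step 1: evaluate the CTE body once and store it in a temporary table.
conn.execute("""
    create temp table engineers_tmp as
    select * from employees
    where dept = 'Engineering' or dept = 'Support'
""")

# Step 2: run the main query against the temporary table.
materialized = conn.execute("""
    select E.name, SC.id
    from engineers_tmp E, support_cases SC
    where E.name = SC.assignee
    order by SC.id
""").fetchall()

# The CTE formulation returns the same rows.
with_cte = conn.execute("""
    with engineers as (
        select * from employees
        where dept = 'Engineering' or dept = 'Support'
    )
    select E.name, SC.id
    from engineers E, support_cases SC
    where E.name = SC.assignee
    order by SC.id
""").fetchall()

print(materialized)  # [('Ann', 1), ('Bob', 2)]
```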
Optimization #1: CTE Merging
with engineers as (
  select * from employees
  where
    dept='Development'
)
select
  ...
from
  engineers E,
  support_cases SC
where
  E.name=SC.assignee and
  SC.created='2016-09-30' and
  E.location='Amsterdam'
After merging, this becomes:
select
  ...
from
  employees E,
  support_cases SC
where
  E.dept='Development' and
  E.name=SC.assignee and
  SC.created='2016-09-30' and
  E.location='Amsterdam'
● Join optimizer can pick any plan
  ● e.g. support_cases → employees
Optimization #1: CTE Merging (2)
Requirement:
● CTE is just a JOIN: no GROUP BY, DISTINCT, etc.
Output:
● CTE is merged into the parent’s join
● Optimizer can pick the best query plan
This is the same as ALGORITHM=MERGE for VIEWs
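Merging is a rewrite that cannot change the result, which can be checked by running both queries from the merging slide in SQLite. The sample rows are hypothetical. The sketch only compares results; the choice of join order belongs to each engine's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data matching the columns used on the merging slide.
conn.executescript("""
    create table employees(name text, dept text, location text);
    create table support_cases(id integer, assignee text, created text);
    insert into employees values
        ('Ann',  'Development', 'Amsterdam'),
        ('Bob',  'Development', 'Berlin'),
        ('Cleo', 'Support',     'Amsterdam');
    insert into support_cases values
        (1, 'Ann',  '2016-09-30'),
        (2, 'Bob',  '2016-09-30'),
        (3, 'Cleo', '2016-09-30'),
        (4, 'Ann',  '2016-10-01');
""")

cte_query = """
    with engineers as (
        select * from employees where dept = 'Development'
    )
    select E.name, SC.id
    from engineers E, support_cases SC
    where E.name = SC.assignee
      and SC.created = '2016-09-30'
      and E.location = 'Amsterdam'
"""

# Hand-merged form: the CTE's WHERE condition joins the parent's WHERE clause,
# leaving a plain two-table join whose join order the optimizer is free to choose.
merged_query = """
    select E.name, SC.id
    from employees E, support_cases SC
    where E.dept = 'Development'
      and E.name = SC.assignee
      and SC.created = '2016-09-30'
      and E.location = 'Amsterdam'
"""

cte_rows = conn.execute(cte_query).fetchall()
merged_rows = conn.execute(merged_query).fetchall()
print(cte_rows, merged_rows)  # [('Ann', 1)] [('Ann', 1)]
```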
Optimization #2: condition pushdown
with sales_per_year as (
  select
    year(`order`.date) as year,
    sum(`order`.amount) as sales
  from
    `order`
  group by
    year
)
select *
from sales_per_year
where
  year in (2015, 2016)
The condition is pushed into the CTE:
with sales_per_year as (
  select
    year(`order`.date) as year,
    sum(`order`.amount) as sales
  from
    `order`
  where
    year(`order`.date) in (2015, 2016)
  group by
    year
)
select *
from sales_per_year
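The pushdown rewrite is valid here because the condition only involves the GROUP BY column, so it discards whole groups. Filtering before or after aggregation therefore gives the same rows. A SQLite sketch with made-up orders follows. It quotes the reserved word `order` with double quotes and uses `strftime` in place of `year()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; "order" is a reserved word, so it is quoted.
conn.executescript("""
    create table "order"(date text, amount integer);
    insert into "order" values
        ('2014-05-01', 10),
        ('2015-01-01', 20),
        ('2015-07-01',  5),
        ('2016-03-01',  7);
""")

# Filter applied after grouping (the query as written).
outer_filter = conn.execute("""
    with sales_per_year as (
        select cast(strftime('%Y', date) as integer) as year,
               sum(amount) as sales
        from "order"
        group by year
    )
    select * from sales_per_year
    where year in (2015, 2016)
    order by year
""").fetchall()

# Filter pushed into the CTE: unwanted rows are removed before aggregation.
pushed_down = conn.execute("""
    with sales_per_year as (
        select cast(strftime('%Y', date) as integer) as year,
               sum(amount) as sales
        from "order"
        where cast(strftime('%Y', date) as integer) in (2015, 2016)
        group by year
    )
    select * from sales_per_year
    order by year
""").fetchall()

print(outer_filter, pushed_down)  # [(2015, 25), (2016, 7)] [(2015, 25), (2016, 7)]
```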
Condition pushdown summary
● Used when merging is not possible
  ● e.g. the CTE has a GROUP BY
● Makes the temp. table smaller
● Allows filtering out whole GROUP BY groups
● Besides CTEs, works for derived tables and VIEWs
● Based on Galina Shalygina’s GSoC 2016 project: “Pushing conditions into non-mergeable views and derived tables in MariaDB”
Optimization #3: CTE reuse
with product_sales as (
  select
    product_name,
    year(sale_date) as year,
    count(*) as count
  from
    product_sales
  group by product_name, year)
select *
from
  product_sales P1,
  product_sales P2
where
  P1.year = 2010 AND
  P2.year = 2011 AND ...
The idea:
● Fill the CTE once
● Then use it multiple times
● Interferes with condition pushdown: P1 and P2 need different conditions (year = 2010, year = 2011), which cannot both be pushed into one shared copy
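CTE Optimizations summary
+---------------+-----------+--------------------+-----------+
|               | CTE merge | Condition pushdown | CTE reuse |
+---------------+-----------+--------------------+-----------+
| MariaDB 10.2  | ✔         | ✔                  | ✘         |
| MS SQL Server | ✔         | ✔                  | ✘         |
| PostgreSQL    | ✘         | ✘                  | ✔         |
| MySQL 8.0.0   | ✔         | ✘                  | ✔         |
+---------------+-----------+--------------------+-----------+
● Merge and condition pushdown are most important
  ● MariaDB supports them, like MS SQL.
● PostgreSQL’s approach is *weird*
  ● “CTEs are optimization barriers”
● MySQL: “try merging, otherwise reuse”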
Recursive CTE syntax
with recursive ancestors as (         -- “recursive” keyword
  select * from folks                 -- anchor part
  where name = 'Alex'
  union [all]
  select f.*                          -- recursive part
  from folks as f, ancestors AS a     -- recursive use of the CTE
  where
    f.id = a.father or f.id = a.mother
)
select * from ancestors;
Recursive CTE computation
Consider a dataset, the table folks
(Family tree: Alex and Sister Amy are the children of Dad and Mom; Grandpa Bill is Dad's father.)
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
| 20   | Dad          | 10     | NULL   |
| 30   | Mom          | NULL   | NULL   |
| 10   | Grandpa Bill | NULL   | NULL   |
| 98   | Sister Amy   | 20     | 30     |
+------+--------------+--------+--------+
Computation, step #1: execute the anchor part
with recursive ancestors as (
  select * from folks
  where name = 'Alex'
  union
  select f.*
  from folks as f, ancestors AS a
  where
    f.id = a.father or f.id = a.mother
)
select * from ancestors;
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
+------+--------------+--------+--------+
Computation, step #2: execute the recursive part
Computation: join folks with the result table
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
+------+--------------+--------+--------+
Computation, step #2: add the results to the result table
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
| 20   | Dad          | 10     | NULL   |
| 30   | Mom          | NULL   | NULL   |
+------+--------------+--------+--------+
Computation, step #3: execute the recursive part again
Computation: join folks with the result table
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
| 20   | Dad          | 10     | NULL   |
| 30   | Mom          | NULL   | NULL   |
+------+--------------+--------+--------+
Computation, step #3: add the results to the result table
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
| 20   | Dad          | 10     | NULL   |
| 30   | Mom          | NULL   | NULL   |
| 10   | Grandpa Bill | NULL   | NULL   |
+------+--------------+--------+--------+
● Dad and Mom are already there
Computation, step #4: execute the recursive part again
Computation: join folks with the result table
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
| 20   | Dad          | 10     | NULL   |
| 30   | Mom          | NULL   | NULL   |
| 10   | Grandpa Bill | NULL   | NULL   |
+------+--------------+--------+--------+
Computation, step #4: no [new] results
Result table:
+------+--------------+--------+--------+
| id   | name         | father | mother |
+------+--------------+--------+--------+
| 100  | Alex         | 20     | 30     |
| 20   | Dad          | 10     | NULL   |
| 30   | Mom          | NULL   | NULL   |
| 10   | Grandpa Bill | NULL   | NULL   |
+------+--------------+--------+--------+
● The process finishes.
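The walkthrough can be reproduced by running the ancestors query in SQLite on the folks table from the slides. Only the final `order by id` and the narrowed select list are additions, made to keep the output deterministic and short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The folks table from the slides.
conn.executescript("""
    create table folks(id integer, name text, father integer, mother integer);
    insert into folks values
        (100, 'Alex',         20,   30),
        (20,  'Dad',          10,   null),
        (30,  'Mom',          null, null),
        (10,  'Grandpa Bill', null, null),
        (98,  'Sister Amy',   20,   30);
""")

ancestors = conn.execute("""
    with recursive ancestors as (
        select * from folks
        where name = 'Alex'
        union
        select f.*
        from folks as f, ancestors as a
        where f.id = a.father or f.id = a.mother
    )
    select id, name from ancestors order by id
""").fetchall()

print(ancestors)  # [(10, 'Grandpa Bill'), (20, 'Dad'), (30, 'Mom'), (100, 'Alex')]
```

Sister Amy is not in the result: nothing in the recursion ever reaches her, because she is nobody's parent.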
Summary so far
with recursive R as (
  select anchor_part
  union [all]
  select recursive_part
  from R, …
)
select …
1. Compute anchor_part
2. Compute recursive_part to get the new data
3. if (new data is non-empty) goto 2;
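The three-step loop can be mimicked in plain Python on the folks data. This sketch is not MariaDB's executor. It feeds the recursive part only the rows added in the previous step, the "wave-front" described on the linear-recursion slide, and uses set difference for UNION. Evaluating the recursive part against the whole result table, as the "Dad and Mom are already there" note suggests, gives the same final result because UNION discards the repeats. The names `FOLKS` and `ancestors_of` are illustrative.

```python
# folks rows from the slides: id -> (name, father, mother)
FOLKS = {
    100: ("Alex", 20, 30),
    20: ("Dad", 10, None),
    30: ("Mom", None, None),
    10: ("Grandpa Bill", None, None),
    98: ("Sister Amy", 20, 30),
}


def ancestors_of(name):
    """Evaluate the ancestors CTE with the anchor / recursive-part loop.

    Returns the set of ids in the result table and how many times the
    recursive part was executed.
    """
    # 1. Compute the anchor part.
    result = {pid for pid, (n, _, _) in FOLKS.items() if n == name}
    new_rows = set(result)
    recursive_runs = 0
    # 3. Repeat while the previous step produced new data.
    while new_rows:
        # 2. Compute the recursive part: join folks with the rows added last time.
        produced = {
            parent
            for pid in new_rows
            for parent in FOLKS[pid][1:]  # (father, mother)
            if parent is not None and parent in FOLKS
        }
        recursive_runs += 1
        # UNION semantics: keep only rows that are not already in the result.
        new_rows = produced - result
        result |= new_rows
    return result, recursive_runs


ids, runs = ancestors_of("Alex")
print(sorted(FOLKS[i][0] for i in ids), runs)  # ['Alex', 'Dad', 'Grandpa Bill', 'Mom'] 3
```

The three recursive runs correspond to steps #2, #3 and #4 of the walkthrough. The last run produces nothing, which ends the loop.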
Transitive closure
bus_routes
+------------+------------+
| origin     | dst        |
+------------+------------+
| New York   | Boston     |
| Boston     | New York   |
| New York   | Washington |
| Washington | Boston     |
| Washington | Raleigh    |
+------------+------------+
(Route map: New York, Boston, Washington and Raleigh connected by the routes above.)
Transitive closure
with recursive bus_dst as
(
  select origin as dst
  from bus_routes
  where origin='New York'
  union
  select bus_routes.dst
  from bus_dst, bus_routes
  where
    bus_dst.dst= bus_routes.origin
)
select * from bus_dst
● bus_dst is the set of places one can reach
● Start from New York (with a datatype trick: selecting the origin column instead of the literal 'New York' gives dst the column's datatype)
Transitive closure
● Put into the work table
+------------+
| dst        |
+------------+
| New York   |
+------------+
Transitive closure
+------------+
| dst        |
+------------+
| New York   |
+------------+
● Join bus_dst with bus_routes.
● New destinations: Boston, Washington
Transitive closure
+------------+
| dst        |
+------------+
| New York   |
| Boston     |
| Washington |
+------------+
● Add new destinations to the temp. table
Transitive closure
+------------+
| dst        |
+------------+
| New York   |
| Boston     |
| Washington |
+------------+
● Join bus_dst with bus_routes: Raleigh, Boston, New York
Transitive closure
+------------+
| dst        |
+------------+
| New York   |
| Boston     |
| Washington |
| Raleigh    |
+------------+
● Add the new destination, Raleigh, to the temp. table; Boston and New York are already there
Summary so far
● Can compute transitive closure
● UNION prevents loops.
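Here is the bus_dst query run in SQLite on the bus_routes table from the slides. Because the recursive part uses UNION, the New York ↔ Boston cycle stops contributing rows once both cities are in the result. The final `order by` is an addition for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The bus_routes table from the slides.
conn.executescript("""
    create table bus_routes(origin text, dst text);
    insert into bus_routes values
        ('New York',   'Boston'),
        ('Boston',     'New York'),
        ('New York',   'Washington'),
        ('Washington', 'Boston'),
        ('Washington', 'Raleigh');
""")

reachable = [row[0] for row in conn.execute("""
    with recursive bus_dst as (
        select origin as dst
        from bus_routes
        where origin = 'New York'
        union                  -- UNION drops repeats, so the New York / Boston loop ends
        select bus_routes.dst
        from bus_dst, bus_routes
        where bus_dst.dst = bus_routes.origin
    )
    select dst from bus_dst order by dst
""")]

print(reachable)  # ['Boston', 'New York', 'Raleigh', 'Washington']
```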
Computing “Paths”
● Over the same bus_routes table, want paths like New York → Washington → Raleigh
Computing “Paths”
with recursive paths (cur_path, cur_dest) as
(
  select origin, origin
  from bus_routes
  where origin='New York'
  union
  select
    concat(paths.cur_path, ',', bus_routes.dst),    -- collect a path
    bus_routes.dst
  from paths, bus_routes
  where
    paths.cur_dest= bus_routes.origin and
    locate(bus_routes.dst, paths.cur_path)=0        -- don't construct loops
)
select * from paths
Computing “Paths”
The recursive part:
select
  concat(paths.cur_path, ',', bus_routes.dst),
  bus_routes.dst
from paths, bus_routes
where
  paths.cur_dest= bus_routes.origin and
  locate(bus_routes.dst, paths.cur_path)=0
Result:
+-----------------------------+------------+
| cur_path                    | cur_dest   |
+-----------------------------+------------+
| New York                    | New York   |
| New York,Boston             | Boston     |
| New York,Washington         | Washington |
| New York,Washington,Boston  | Boston     |
| New York,Washington,Raleigh | Raleigh    |
+-----------------------------+------------+
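The paths query can be sketched in SQLite with two dialect substitutions, both assumptions of this sketch: `||` instead of `concat()`, and `instr(cur_path, dst) = 0` instead of `locate(dst, cur_path) = 0`. SQLite is dynamically typed, so the growing `cur_path` string is never truncated. MySQL 8.0 documents that a recursive CTE's column types come from the anchor part only, so there, and possibly in MariaDB, the anchor may need something like `cast(origin as char(100))`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The bus_routes table from the slides.
conn.executescript("""
    create table bus_routes(origin text, dst text);
    insert into bus_routes values
        ('New York',   'Boston'),
        ('Boston',     'New York'),
        ('New York',   'Washington'),
        ('Washington', 'Boston'),
        ('Washington', 'Raleigh');
""")

paths = conn.execute("""
    with recursive paths(cur_path, cur_dest) as (
        select origin, origin
        from bus_routes
        where origin = 'New York'
        union
        select paths.cur_path || ',' || bus_routes.dst,   -- concat() in MariaDB
               bus_routes.dst
        from paths, bus_routes
        where paths.cur_dest = bus_routes.origin
          and instr(paths.cur_path, bus_routes.dst) = 0   -- locate(dst, cur_path) = 0
    )
    select * from paths order by cur_path
""").fetchall()

for cur_path, cur_dest in paths:
    print(f"{cur_path:30} {cur_dest}")
```

Like `locate()`, `instr()` is a plain substring test. It is therefore a safe loop check only when no city name occurs inside another one.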
How recursion stops
● Tree or Directed Acyclic Graph walking
  ● Execution is guaranteed to stop
● Computing transitive closure
  ● Use UNION
● Computing “Paths” over graph with loops
  ● Put condition into WHERE to stop loops/growth
● Safety measure: @@max_recursive_iterations
  ● Like the MAXRECURSION option in SQL Server
  ● MySQL-8.0: @@cte_max_recursion_depth
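Here is the "condition in WHERE" approach on the cyclic bus graph. With UNION ALL, repeats are kept, so New York → Boston → New York → … would recurse indefinitely. A hop counter checked in the recursive part's WHERE clause bounds the recursion instead. The CTE name `trips`, the `hops` column and the limit of 3 are illustrative choices. The sketch relies only on the WHERE condition, not on any server-side iteration limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The bus_routes table from the slides.
conn.executescript("""
    create table bus_routes(origin text, dst text);
    insert into bus_routes values
        ('New York',   'Boston'),
        ('Boston',     'New York'),
        ('New York',   'Washington'),
        ('Washington', 'Boston'),
        ('Washington', 'Raleigh');
""")

# UNION ALL keeps duplicates, so the New York <-> Boston cycle would never end
# on its own; the hops < 3 condition stops rows at depth 3 from being expanded.
trip_count, max_hops = conn.execute("""
    with recursive trips(dst, hops) as (
        select 'New York', 0
        union all
        select bus_routes.dst, trips.hops + 1
        from trips, bus_routes
        where trips.dst = bus_routes.origin
          and trips.hops < 3
    )
    select count(*), max(hops) from trips
""").fetchone()

print(trip_count, max_hops)  # 9 3
```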
[Non-]linear recursion
with recursive R as (
  select anchor_part
  union [all]
  select recursive_part
  from R, …
)
select …
The SQL standard requires that recursion is linear: recursive_part must refer to R only once
● No self-joins
● Not from subqueries
● Not from the inner side of an outer join
● ...
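SQLite enforces the standard's linearity rule, which makes the restriction easy to see. The slides state that MariaDB supports non-linear recursion; this sketch does not exercise MariaDB. The `edges` table and `reach` CTE are hypothetical. The linear version joins `reach` with the base table, while the non-linear version self-joins `reach` and is rejected by SQLite when the statement is prepared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table edges(a integer, b integer)")
conn.executemany("insert into edges values (?, ?)", [(1, 2), (2, 3), (3, 4)])

# Linear: the recursive part references reach exactly once.
linear = conn.execute("""
    with recursive reach(a, b) as (
        select a, b from edges
        union
        select reach.a, edges.b
        from reach, edges
        where reach.b = edges.a
    )
    select count(*) from reach
""").fetchone()[0]

# Non-linear: the recursive part self-joins reach, which SQLite rejects.
non_linear_error = None
try:
    conn.execute("""
        with recursive reach(a, b) as (
            select a, b from edges
            union
            select r1.a, r2.b
            from reach r1, reach r2
            where r1.b = r2.a
        )
        select count(*) from reach
    """).fetchone()
except sqlite3.Error as exc:
    non_linear_error = str(exc)

print(linear)            # 6
print(non_linear_error)
```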
Linear recursion
● New data is generated by “wave-front” elements
● Contents of R are always growing
Mutual recursion
with recursive C1 as (
  select …
  from anchor_table
  union
  select …
  from C2
),
C2 as (
  select …
  from C1
)
select ...
● Multiple CTEs refer to each other
● Useful for “bi-partite” graphs
● MariaDB supports it
● No other database does
Modules and objects
(Diagram: module M1 consumes objects v3 and v9 and produces object v4.)
A module consumes objects and produces other objects
objects (v): v3, v9, v4, ...
modules (m): m1, ...
module_arguments (m, v): (m1, v3), (m1, v9), ...
module_results (m, v): (m1, v4), ...
Modules and objects
(Diagram: a larger network of modules M1, M2, M3 and objects v1, v3, v4, v6, v7, v9, v10.)
A module consumes objects and produces other objects
Query part #1: objects produced from modules
with recursive
reached_objects as
(
  select v, "init"
  from objects
  where v in ('v3','v7','v9')
  union
  select module_results.v, module_results.m
  from module_results, applied_modules
  where module_results.m = applied_modules.m
),
Query part #2: modules ready to be applied
applied_modules as
(
  select * from modules where 1=0
  union
  select modules.m
  from
    modules
  where
    not exists (select * from module_arguments
                where module_arguments.m = modules.m and
                      module_arguments.v not in
                      (select v from reached_objects))
)
select * from reached_objects;
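SQLite cannot express mutual recursion, so this plain-Python sketch iterates the two definitions together until neither set changes, in the manner of a naive fixpoint loop. Only "m1 consumes v3 and v9 and produces v4" comes from the slides. Modules m2 and m3 and their objects are hypothetical, added so that the iteration has more than one round. The subset test `args <= reached_objects` plays the role of the `not exists … not in` condition in query part #2.

```python
# Hypothetical module graph: only "m1 consumes v3, v9 and produces v4" comes
# from the slides; m2 and m3 are made up for illustration.
MODULE_ARGUMENTS = {"m1": {"v3", "v9"}, "m2": {"v4", "v7"}, "m3": {"v1", "v6"}}
MODULE_RESULTS = {"m1": {"v4"}, "m2": {"v1"}, "m3": {"v10"}}


def evaluate(initial_objects):
    """Iterate the two mutually recursive definitions until neither changes."""
    reached_objects = set(initial_objects)  # anchor of reached_objects
    applied_modules = set()                 # anchor of applied_modules is empty (where 1=0)
    while True:
        # applied_modules: modules with no argument missing from reached_objects.
        next_modules = {
            m for m, args in MODULE_ARGUMENTS.items() if args <= reached_objects
        }
        # reached_objects: initial objects plus results of the applied modules.
        next_objects = reached_objects | {
            v for m in applied_modules for v in MODULE_RESULTS[m]
        }
        if next_modules == applied_modules and next_objects == reached_objects:
            return reached_objects, applied_modules
        reached_objects, applied_modules = next_objects, next_modules


objects, modules = evaluate({"v3", "v7", "v9"})
print(sorted(objects), sorted(modules))  # ['v1', 'v3', 'v4', 'v7', 'v9'] ['m1', 'm2']
```

Starting from v3, v7 and v9, m1 yields v4, which makes m2 applicable and yields v1. m3 never runs because nothing produces v6.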
Conclusions
● MariaDB 10.2 has Common Table Expressions
  ● Both Recursive and Non-recursive are supported
  ● MariaDB 10.3 adds EXCEPT and INTERSECT
    ● Recursive references in EXCEPT and INTERSECT are allowed
● Non-recursive
  ● “Query-local VIEWs”
  ● Competitive set of query optimizations
● Recursive
  ● Useful for tree/graph-walking queries
  ● Mutual and non-linear recursions are supported.