2. 2
● A standard SQL feature
● Two kinds of CTEs
•Recursive
•Non-recursive
● Supported by Oracle, MS SQL Server, PostgreSQL, SQLite, …
● MySQL world:
•MariaDB: 10.2 Beta (Sept, 27th)
•MySQL: MySQL-8.0.0-labs-optimizer tree
Common Table Expressions
5. 5
CTE Syntax
CTE name
CTE Body
CTE Usage
with engineers as (
select *
from employees
where dept='Engineering'
)
select *
from engineers
where ...
WITH
6. 6
select *
from
(
select *
from employees
where
dept='Engineering'
) as engineers
where
...
with engineers as (
select *
from employees
where dept='Engineering'
)
select *
from engineers
where ...
CTEs are like derived tables
7. 7
Use case #1: CTEs refer to CTEs
● More readable than nested FROM(SELECT …)
with engineers as (
select * from employees
where dept in ('Development','Support')
),
eu_engineers as (
select * from engineers where country IN ('NL',...)
)
select
...
from
eu_engineers;
8. 8
with engineers as (
select * from employees
where dept in ('Development','Support')
),
select *
from
engineers E1
where not exists (select 1
from engineers E2
where E2.country=E1.country
and E2.name <> E1.name);
Use case #2: Multiple use of CTE
● Anti-self-join
9. 9
select *
from
sales_product_year CUR,
sales_product_year PREV,
where
CUR.product=PREV.product and
CUR.year=PREV.year + 1 and
CUR.total_amt > PREV.total_amt
with sales_product_year as (
select
product,
year(ship_date) as year,
sum(price) as total_amt
from
item_sales
group by
product, year
)
Use case #2: example 2
● Year-over-year comparisons
10. 10
select *
from
sales_product_year S1
where
total_amt > (select
0.1*sum(total_amt)
from
sales_product_year S2
where
S2.year=S1.year)
with sales_product_year as (
select
product,
year(ship_date) as year,
sum(price) as total_amt
from
item_sales
group by
product, year
)
Use case #2: example 3
● Compare individuals against their group
11. 11
● Non-recursive CTES are “Query-local VIEWs”
● One CTE can refer to another
•Better than nested FROM (SELECT …)
● Can refer to a CTE from multiple places
•Better than copy-pasting FROM(SELECT …)
● CTE adoption
•TPC-H (1999) - no CTEs
•TPC-DS (2011) - 38 of 100 queries use CTEs.
Conclusions so far
13. 13
Base algorithm: materialize in a temp.table
● Always works
● Often not optimal
with engineers as (
select * from employees
where
dept='Engineering' or dept='Support'
)
select
...
from
engineers,
other_table, ...
14. 14
Optimization #1: CTE Merging
with engineers as (
select * from employees
where
dept='Development'
)
select
...
from
engineers E,
support_cases SC
where
E.name=SC.assignee and
SC.created='2016-09-30' and
E.location='Amsterdam'
select
...
from
employees E,
support_cases SC
where
E.dept='Development' and
E.name=SC.assignee and
SC.created='2016-09-30' and
E.location=’Amsterdam’
● for each support case
•lookup employee by name
15. 15
Optimization #1: CTE Merging (2)
● Requirement
•CTE is a JOIN : no GROUP BY, DISTINCT, etc
● Output
•CTE is merged into parent’s join
•Optimizer can pick the best query plan
● This is the same as ALGORITHM=MERGE for VIEWs
16. 16
with sales_per_year as (
select
year(order.date) as year
sum(order.amount) as sales
from
order
group by
year
)
select *
from sales_per_year
where
year in ('2015','2016')
with sales_per_year as (
select
year(order.date) as year
sum(order.amount) as sales
from
order
where
year in ('2015','2016')
group by
year
)
select *
from sales_per_year
Optimization #2: condition pushdown
17. 17
Optimization #2: condition pushdown (2)
Summary
● Used when merging is not possible
•CTE has a GROUP BY
● Makes temp. table smaller
● Allows to filter out whole groups
● Besides CTEs, works for derived tables and VIEWs
● MariaDB 10.2, Galina’s GSOC 2016 project:
“Pushing conditions into non-mergeable views and derived tables in
MariaDB”
18. 18
with product_sales as (
select
product_name,
year(sale_date),
count(*) as count
from
product_sales
group by product, year
)
select *
from
product_sales P1,
product_sales P2
where
P1.year = 2010 AND
P2.year = 2011 AND ...
Optimization #3: CTE reuse
● Idea
•Fill the CTE once
•Then use multiple times
● Doesn’t work together with
condition pushdown or
merging
19. 19
CTE Optimizations summary
● Merge and condition pushdown are most important
•MariaDB supports them, like MS SQL.
● PostgreSQL’s approach is *weird*
•“CTEs are optimization barriers”
● MySQL’s labs tree: “try merging, otherwise reuse”
CTE Merge Condition pushdown CTE reuse
MariaDB 10.2 ✔ ✔ ✘
MS SQL Server ✔ ✔ ✘
PostgreSQL ✘ ✘ ✔
MySQL 8.0.0-labs-optimizer ✔ ✘ ✔*
22. 22
Recursive CTEs
● First attempt: Oracle’s CONNECT BY syntax (80’s)
● Superseded by Recursive CTEs
•SQL:1999, implementations in 2007-2009
● SQL is poor at “recursive” data structures/algorithms
wheel
boltcapnut
tire valve
rimtirespokes
- Trees - Graphs
Chicago
Nashville Atlanta
Orlando
24. 24
Recursive CTE syntax
Recursive part
Anchor part
Recursive use of CTE
“recursive”
with recursive ancestors as (
select * from folks
where name = 'Alex'
union [all]
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
25. 25
Sister AmyAlex
Mom Dad
Grandpa Bill
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
Recursive CTE computation
26. 26
with recursive ancestors as (
select * from folks
where name = 'Alex'
union
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
+------+--------------+--------+--------+
Computation Result table
Step #1:
execution of the anchor part
27. 27
with recursive ancestors as (
select * from folks
where name = 'Alex'
union
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
+------+--------------+--------+--------+
Computation Result table
+------+--------------+--------+--------+
| id | name | father | moher |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
Step #2:
execution of the recursive part
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
28. 28
with recursive ancestors as (
select * from folks
where name = 'Alex'
union
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
+------+--------------+--------+--------+
Computation Result table
Step #2:
execution of the recursive part
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
29. 29
with recursive ancestors as (
select * from folks
where name = 'Alex'
union
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
Computation Result table
Step #3:
execution of the recursive part
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
+------+--------------+--------+--------+
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
30. 30
with recursive ancestors as (
select * from folks
where name = 'Alex'
union
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
+------+--------------+--------+--------+
Computation Result table
Step #3:
execution of the recursive part
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
31. 31
with recursive ancestors as (
select * from folks
where name = 'Alex'
union
select f.*
from folks as f, ancestors AS a
where
f.id = a.father or f.id = a.mother
)
select * from ancestors;
Computation Result table
Step #4:
execution of the recursive part
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
| 98 | Sister Amy | 20 | 30 |
+------+--------------+--------+--------+
+------+--------------+--------+--------+
| id | name | father | mother |
+------+--------------+--------+--------+
| 100 | Alex | 20 | 30 |
| 20 | Dad | 10 | NULL |
| 30 | Mom | NULL | NULL |
| 10 | Grandpa Bill | NULL | NULL |
+------+--------------+--------+--------+
No results!
32. 32
Summary so far
1. Compute anchor_data
2. Compute recursive_part to
get the new data
3. if (new data is non-empty)
goto 2;
with recursive R as (
select anchor_data
union [all]
select recursive_part
from R, …
)
select …
34. 34
Transitive closure
bus_routes
+------------+------------+
| origin | dst |
+------------+------------+
| New York | Boston |
| Boston | New York |
| New York | Washington |
| Washington | Boston |
| Washington | Raleigh |
+------------+------------+
New York
Boston Washington
Raleigh
35. 35
with recursive bus_dst as
(
select origin as dst
from bus_routes
where origin='New York'
union
select bus_routes.dst
from bus_routes, bus_dst
where
bus_dst.dst= bus_routes.origin
)
select * from bus_dst
● Start from New York
New York
Boston Washington
Raleigh
Transitive closure
36. 36
with recursive bus_dst as
(
select origin as dst
from bus_routes
where origin='New York'
union
select bus_routes.dst
from bus_routes, bus_dst
where
bus_dst.dst= bus_routes.origin
)
select * from bus_dst
● Step #1: add Boston and Washington
Transitive closure
+------------+
| dst |
+------------+
| New York |
+------------+
New York
Boston Washington
Raleigh
37. 37
with recursive bus_dst as
(
select origin as dst
from bus_routes
where origin='New York'
union
select bus_routes.dst
from bus_routes, bus_dst
where
bus_dst.dst= bus_routes.origin
)
select * from bus_dst
● Step #1: add Boston and Washington
New York
Boston Washington
Raleigh
Transitive closure
+------------+
| dst |
+------------+
| New York |
| Boston |
| Washington |
+------------+
38. 38
with recursive bus_dst as
(
select origin as dst
from bus_routes
where origin='New York'
union
select bus_routes.dst
from bus_routes, bus_dst
where
bus_dst.dst= bus_routes.origin
)
select * from bus_dst
● Step#2:
•Add Raleigh
•UNION excludes nodes that are already present.
New York
Boston Washington
Raleigh
Transitive closure
+------------+
| dst |
+------------+
| New York |
| Boston |
| Washington |
| Raleigh |
+------------+
39. 39
UNION and MySQL-8.0-labs
● UNION is useful for “transitive closure” queries
● Supported in all databases
● MySQL 8.0.0-labs
ERROR 42000: This version of MySQL doesn't yet support
'UNION DISTINCT in recursive Common Table Expression (use
UNION ALL)'
•No idea why they have this limitation.
41. 41
Computing “Paths”
bus_routes
+------------+------------+
| origin | dst |
+------------+------------+
| New York | Boston |
| Boston | New York |
| New York | Washington |
| Washington | Boston |
| Washington | Raleigh |
+------------+------------+
New York
Boston Washington
Raleigh
● Want paths like “New York -> Washington -> Raleigh”
42. 42
with recursive paths (cur_path, cur_dest) as
(
select origin, origin
from bus_routes
where origin='New York'
union
select
concat(paths.cur_path, ',',
bus_routes.dest),
bus_routes.dest
from paths, bus_routes
where
paths.cur_dest= bus_routes.origin and
locate(bus_routes.dest, paths.cur_path)=0
)
select * from paths
New York
Boston Washington
Raleigh
Computing “Paths”
43. 43
New York
Boston Washington
Raleigh
select
concat(paths.cur_path, ',',
bus_routes.dest),
bus_routes.dest
from paths, bus_routes
where
paths.cur_dest= bus_routes.origin and
locate(bus_routes.dest, paths.cur_path)=0
+-----------------------------+------------+
| cur_path | cur_dest |
+-----------------------------+------------+
| New York | New York |
| New York,Boston | Boston |
| New York,Washington | Washington |
| New York,Washington,Boston | Boston |
| New York,Washington,Raleigh | Raleigh |
+-----------------------------+------------+
Computing “Paths”
44. 44
Recursion stopping summary
● Tree or Directed Acyclic Graph walking
•Execution is guaranteed to stop
● Computing transitive closure
•Use UNION
● Computing “Paths” over graph with loops
•Put condition into WHERE to stop loops/growth
● Safety measure: @@max_recursive_iterations
•Like in SQL Server
•MySQL-8.0: @@max_execution_time ?
46. 46
[Non-]linear recursion
with recursive R as (
select anchor_data
union [all]
select recursive_part
from R, …
)
select …
The SQL standard requires that
recursion is linear:
● recursive_part must refer to R only
once
•No self-joins
•No subqueries
● Not from inner side of an outer join
● …
47. 47
Linearity of SELECT statements
● Recursive CTE is a “function”: R(x)=a
R
● Linear “function”: R(x+y)=a+b
•added rows in source -> added rows in the result
x a
+y
+b
49. 49
with recursive R as (
select anchor_data
union [all]
select recursive_part
from R, …
)
select …
Linear recursion
● New data is generated by “wave-front” elements
● Contents of R are always growing
50. 50
Non-linear recursion
● Next version of R is computed from the
entire R (not just “wavefront”)
● R may grow, shrink, or change
● It’s possible to screw things up
•R may flip-flop
•R may be non-deterministic
● Still, MariaDB supports it:
set @@standard_compliant_cte=0
•Be sure to know what you’re doing.
52. 52
● Multiple CTEs refer to each other
● Useful for “bipartite” graphs
•emp->dept->emp
● MariaDB supports it
● No other database does
with recursive C1 as (
select …
from anchor_table
union
select …
from C2
),
C2 as (
select …
from C1
)
select ...
Mutual recursion
53. 53
Conclusions
● MariaDB 10.2 has Common Table Expressions
● Both Recursive and Non-recursive are supported
● Non-recursive
•“Query-local VIEWs”
•Competitive set of query optimizations
● Recursive
•Useful for tree/graph-walking queries
•Mutual and non-linear recursion is supported.
55. 55
● Bill of materials
Tree-walking
Which part of bicycle
is nut situated in?
56. 56
+------+------------+-------+---------+
| id | name | count | part_of |
+------+------------+-------+---------+
| 10 | nut | 1 | 7 |
+------+------------+-------+---------+
Bill of materials
wheel
boltcapnut
tire valve
rimtirespokes
with recursive nut as (
select * from bicycle_parts
where name = 'nut'
union
select b.*
from bicycle_parts as b, nut AS n
where b.id = n.part_of
)
select * from nut
where part_of = NULL;
57. 57
+------+------------+-------+---------+
| id | name | count | part_of |
+------+------------+-------+---------+
| 10 | nut | 1 | 7 |
| 7 | tire valve | 2 | 4 |
+------+------------+-------+---------+
Bill of materials
wheel
boltcapnut
tire valve
rimtirespokes
with recursive nut as (
select * from bicycle_parts
where name = 'nut'
union
select b.*
from bicycle_parts as b, nut AS n
where b.id = n.part_of
)
select * from nut
where part_of = NULL;
58. 58
+------+------------+-------+---------+
| id | name | count | part_of |
+------+------------+-------+---------+
| 10 | nut | 1 | 7 |
| 7 | tire valve | 2 | 4 |
| 4 | tire | 2 | 1 |
+------+------------+-------+---------+
Bill of materials
wheel
boltcapnut
tire valve
rimtirespokes
with recursive nut as (
select * from bicycle_parts
where name = 'nut'
union
select b.*
from bicycle_parts as b, nut AS n
where b.id = n.part_of
)
select * from nut
where part_of = NULL;
59. 59
+------+------------+-------+---------+
| id | name | count | part_of |
+------+------------+-------+---------+
| 10 | nut | 1 | 7 |
| 7 | tire valve | 2 | 4 |
| 4 | tire | 2 | 1 |
| 1 | wheel | 2 | NULL |
+------+------------+-------+---------+
wheel
boltcapnut
tire valve
rimtirespokes
Bill of materials
with recursive nut as (
select * from bicycle_parts
where name = 'nut'
union
select b.*
from bicycle_parts as b, nut AS n
where b.id = n.part_of
)
select * from nut
where part_of = NULL;
60. 60
with recursive nut as (
select * from bicycle_parts
where name = 'nut'
union
select b.*
from bicycle_parts as b, nut AS n
where b.id = n.part_of
)
select * from nut
where part_of = NULL;
It is a part of wheel!
Bill of materials
+------+------------+-------+---------+
| id | name | count | part_of |
+------+------------+-------+---------+
| 1 | wheel | 2 | NULL |
+------+------------+-------+---------+