Recursive Query Throwdown in MySQL 8
BILL KARWIN
PERCONA LIVE OPEN SOURCE DATABASE CONFERENCE 2017
Bill Karwin
Software developer, consultant, trainer
Using MySQL since 2000
Senior Database Architect at SchoolMessenger
Author of SQL Antipatterns: Avoiding the Pitfalls of Database Programming
Oracle ACE Director
How to Query a Tree?
Hierarchical data
§ Organization charts
§ Categories and sub-categories
§ Parts explosion
§ Threaded discussions
https://commons.wikimedia.org/wiki/File:Staff_Organisation_Diagram,_1896.jpg
Example: Threaded Comments
Adjacency List Example Data
comment_id parent_id author comment
1 NULL Fran What’s the cause of this bug?
2 1 Ollie I think it’s a null pointer.
3 2 Fran No, I checked for that.
4 1 Kukla We need to check valid input.
5 4 Ollie Yes, that’s a bug.
6 4 Fran Yes, please add a check
7 6 Kukla That fixed it.
Can’t Easily Query Deep Trees
SELECT * FROM Comments c1
LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id)
LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id)
LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id)
LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id)
LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id)
LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id)
LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id)
LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id)
LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id)
...
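To make the limitation concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database (a stand-in for MySQL, chosen only so the example runs anywhere) and uses two self-joins. The `WHERE c1.parent_id IS NULL` filter is an addition, not part of the slide, so that the joins start at the root.

```python
import sqlite3

# Sample adjacency-list rows copied from the slide above.
ROWS = [
    (1, None, "Fran", "What's the cause of this bug?"),
    (2, 1, "Ollie", "I think it's a null pointer."),
    (3, 2, "Fran", "No, I checked for that."),
    (4, 1, "Kukla", "We need to check valid input."),
    (5, 4, "Ollie", "Yes, that's a bug."),
    (6, 4, "Fran", "Yes, please add a check"),
    (7, 6, "Kukla", "That fixed it."),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments ("
    " comment_id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES Comments(comment_id),"
    " author TEXT, comment TEXT)"
)
conn.executemany("INSERT INTO Comments VALUES (?, ?, ?, ?)", ROWS)

# Two self-joins can reach at most three levels of the tree.
three_levels = conn.execute(
    """
    SELECT c1.comment_id, c2.comment_id, c3.comment_id
    FROM Comments c1
    LEFT JOIN Comments c2 ON c2.parent_id = c1.comment_id
    LEFT JOIN Comments c3 ON c3.parent_id = c2.comment_id
    WHERE c1.parent_id IS NULL  -- assumption: start from the root comment
    """
).fetchall()

# Collect every comment id that appeared anywhere in the joined rows.
reached = {cid for row in three_levels for cid in row if cid is not None}
print(sorted(reached))  # comment 7 sits on level four and is not reached
```

Each extra level needs one more join written by hand, and the query still has a fixed maximum depth.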
MySQL Workarounds
MySQL lacked support for recursive queries, so workarounds were needed.
These are all denormalized designs, and most lack referential integrity:
§Path enumeration
§Nested sets
§Closure table
Path Enumeration Example Data
comment_id path author comment
1 1/ Fran What’s the cause of this bug?
2 1/2/ Ollie I think it’s a null pointer.
3 1/2/3/ Fran No, I checked for that.
4 1/4/ Kukla We need to check valid input.
5 1/4/5/ Ollie Yes, that’s a bug.
6 1/4/6/ Fran Yes, please add a check
7 1/4/6/7/ Kukla That fixed it.
Path Enumeration Example Queries
Query ancestors of comment #7:
SELECT * FROM Comments
WHERE '1/4/6/7/' LIKE CONCAT(path, '%');
Query descendants of comment #4:
SELECT * FROM Comments
WHERE path LIKE '1/4/%';
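The two queries can be tried on the sample data with a small sketch like the one below. It runs on in-memory SQLite as a stand-in for MySQL, so `CONCAT(path, '%')` is written as `path || '%'`; the `comment` column is omitted and the `ORDER BY` is added only to make the output stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, path TEXT, author TEXT)"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [
        (1, "1/", "Fran"),
        (2, "1/2/", "Ollie"),
        (3, "1/2/3/", "Fran"),
        (4, "1/4/", "Kukla"),
        (5, "1/4/5/", "Ollie"),
        (6, "1/4/6/", "Fran"),
        (7, "1/4/6/7/", "Kukla"),
    ],
)

# Ancestors of comment #7: every stored path that is a prefix of its path.
ancestors = [
    row[0]
    for row in conn.execute(
        "SELECT comment_id FROM Comments"
        " WHERE '1/4/6/7/' LIKE path || '%' ORDER BY comment_id"
    )
]

# Descendants of comment #4: every path that starts with its path.
descendants = [
    row[0]
    for row in conn.execute(
        "SELECT comment_id FROM Comments WHERE path LIKE '1/4/%' ORDER BY comment_id"
    )
]

print(ancestors)    # [1, 4, 6, 7]
print(descendants)  # [4, 5, 6, 7]
```

Both results include the starting comment itself. The trailing `/` in each path matters: without it, a pattern such as `'1/4%'` would also match a path like `1/40/`.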
Path Enumeration Pros and Cons
Pros:
§Single non-recursive query to get a tree or a subtree
Cons:
§Complex updates to add or remove a node
§Numbers are stored in a string—no referential integrity
Nested Sets
Each comment encodes its descendants using two numbers:
§ A comment’s left number is less than all numbers used by the comment’s descendants.
§ A comment’s right number is greater than all numbers used by the comment’s descendants.
§ A comment’s numbers are between all numbers used by the comment’s ancestors.
References:
§ “Recursive Hierarchies: The Relational Taboo!” Michael J. Kamfonas,
Relational Journal, Oct/Nov 1992
§ “Trees and Hierarchies in SQL For Smarties,” Joe Celko, 2004
§ “Managing Hierarchical Data in MySQL,” Mike Hillyer, 2005
Nested Sets Example
Nested Sets Example Data
comment_id nsleft nsright author comment
1 1 14 Fran What’s the cause of this bug?
2 2 5 Ollie I think it’s a null pointer.
3 3 4 Fran No, I checked for that.
4 6 13 Kukla We need to check valid input.
5 7 8 Ollie Yes, that’s a bug.
6 9 12 Fran Yes, please add a check
7 10 11 Kukla That fixed it.
Nested Sets Example Queries
Query ancestors of comment #7:
SELECT ancestor.* FROM Comments child
JOIN Comments ancestor
ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
WHERE child.comment_id = 7;
Query subtree under comment #4:
SELECT descendant.* FROM Comments parent
JOIN Comments descendant
ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
WHERE parent.comment_id = 4;
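As a quick check of the two queries against the sample numbering, here is a sketch using in-memory SQLite as a stand-in for MySQL. Only the id and the two nested-set columns are loaded, and an `ORDER BY` on `nsleft` is added for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, nsleft INTEGER, nsright INTEGER)"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)

# Ancestors of comment #7: rows whose interval encloses the child's left number.
ancestors = [
    row[0]
    for row in conn.execute(
        """
        SELECT ancestor.comment_id FROM Comments child
        JOIN Comments ancestor
          ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright
        WHERE child.comment_id = 7
        ORDER BY ancestor.nsleft
        """
    )
]

# Subtree under comment #4: rows whose left number falls inside the parent's interval.
subtree = [
    row[0]
    for row in conn.execute(
        """
        SELECT descendant.comment_id FROM Comments parent
        JOIN Comments descendant
          ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright
        WHERE parent.comment_id = 4
        ORDER BY descendant.nsleft
        """
    )
]

print(ancestors)  # [1, 4, 6, 7]
print(subtree)    # [4, 5, 6, 7]
```

Ordering by `nsleft` returns the rows in depth-first order, which is convenient for rendering a thread.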
Nested Sets Pros and Cons
Pros:
§Single non-recursive query to get a tree or a subtree
Cons:
§Complex updates to add or remove a node
§Numbers are not foreign keys—no referential integrity
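The "complex updates" point can be illustrated with a sketch of one common way to add a leaf: open a gap of two numbers at the parent's right edge, then insert the new node into the gap. The helper name `add_last_child` and the new comment #8 are hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, nsleft INTEGER, nsright INTEGER)"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, 1, 14), (2, 2, 5), (3, 3, 4), (4, 6, 13), (5, 7, 8), (6, 9, 12), (7, 10, 11)],
)


def add_last_child(conn, new_id, parent_id):
    """Insert new_id as the last child of parent_id (hypothetical helper)."""
    (parent_right,) = conn.execute(
        "SELECT nsright FROM Comments WHERE comment_id = ?", (parent_id,)
    ).fetchone()
    # Shift every number at or beyond the parent's right edge by two.
    shifted_right = conn.execute(
        "UPDATE Comments SET nsright = nsright + 2 WHERE nsright >= ?", (parent_right,)
    ).rowcount
    shifted_left = conn.execute(
        "UPDATE Comments SET nsleft = nsleft + 2 WHERE nsleft > ?", (parent_right,)
    ).rowcount
    # The new leaf takes the two numbers that were just freed.
    conn.execute(
        "INSERT INTO Comments VALUES (?, ?, ?)",
        (new_id, parent_right, parent_right + 1),
    )
    return shifted_right, shifted_left


shifted = add_last_child(conn, new_id=8, parent_id=5)
numbering = {
    cid: (left, right)
    for cid, left, right in conn.execute(
        "SELECT comment_id, nsleft, nsright FROM Comments"
    )
}
print(shifted)    # (5, 2): five right numbers and two left numbers were rewritten
print(numbering)
```

Adding one reply rewrote five of the seven existing rows. In a real application the three statements would need to run in a single transaction.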
Closure Table
Many-to-many table
Stores every path from each node to each of its descendants
A node even connects to itself
CREATE TABLE Closure (
ancestor INT NOT NULL,
descendant INT NOT NULL,
length INT NOT NULL,
PRIMARY KEY (ancestor, descendant),
FOREIGN KEY(ancestor) REFERENCES Comments(comment_id),
FOREIGN KEY(descendant) REFERENCES Comments(comment_id)
);
Closure Table Example
Closure Table Example Data
comment_id author comment
1 Fran What’s the cause of this bug?
2 Ollie I think it’s a null pointer.
3 Fran No, I checked for that.
4 Kukla We need to check valid input.
5 Ollie Yes, that’s a bug.
6 Fran Yes, please add a check
7 Kukla That fixed it.
ancestor descendant length
1 1 0
1 2 1
1 3 2
1 4 1
1 5 2
1 6 2
1 7 3
2 2 0
2 3 1
3 3 0
4 4 0
4 5 1
4 6 1
4 7 2
5 5 0
6 6 0
6 7 1
7 7 0
Closure Table Example Queries
Query ancestors of comment #7:
SELECT c.* FROM Comments c
JOIN Closure t
ON (c.comment_id = t.ancestor)
WHERE t.descendant = 7;
Query subtree under comment #4:
SELECT c.* FROM Comments c
JOIN Closure t
ON (c.comment_id = t.descendant)
WHERE t.ancestor = 4;
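A sketch of the two queries on the sample data follows, again on in-memory SQLite as a stand-in for MySQL. The third query, which uses `length = 1` to fetch only immediate children, and the `ORDER BY` clauses are additions for illustration and are not on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, author TEXT)")
conn.execute(
    "CREATE TABLE Closure ("
    " ancestor INT NOT NULL, descendant INT NOT NULL, length INT NOT NULL,"
    " PRIMARY KEY (ancestor, descendant))"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?)",
    [(1, "Fran"), (2, "Ollie"), (3, "Fran"), (4, "Kukla"),
     (5, "Ollie"), (6, "Fran"), (7, "Kukla")],
)
conn.executemany(
    "INSERT INTO Closure VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 3, 2), (1, 4, 1), (1, 5, 2), (1, 6, 2), (1, 7, 3),
     (2, 2, 0), (2, 3, 1), (3, 3, 0), (4, 4, 0), (4, 5, 1), (4, 6, 1), (4, 7, 2),
     (5, 5, 0), (6, 6, 0), (6, 7, 1), (7, 7, 0)],
)

# Ancestors of comment #7, root first (largest path length first).
ancestors = [
    row[0]
    for row in conn.execute(
        "SELECT c.comment_id FROM Comments c"
        " JOIN Closure t ON c.comment_id = t.ancestor"
        " WHERE t.descendant = 7 ORDER BY t.length DESC"
    )
]

# Subtree under comment #4.
subtree = [
    row[0]
    for row in conn.execute(
        "SELECT c.comment_id FROM Comments c"
        " JOIN Closure t ON c.comment_id = t.descendant"
        " WHERE t.ancestor = 4 ORDER BY c.comment_id"
    )
]

# Immediate children of comment #4 only (illustrative use of the length column).
children = [
    row[0]
    for row in conn.execute(
        "SELECT descendant FROM Closure"
        " WHERE ancestor = 4 AND length = 1 ORDER BY descendant"
    )
]

print(ancestors)  # [1, 4, 6, 7]
print(subtree)    # [4, 5, 6, 7]
print(children)   # [5, 6]
```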
Closure Table Pros and Cons
Pros:
§Single non-recursive query to get a tree or a subtree
§Referential integrity!
Cons:
§Extra table is required
§Hierarchy is stored redundantly, too easy to mess up
§Lots of joins to do most kinds of queries
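To show what "stored redundantly" costs on writes, here is a sketch of adding one leaf. It assumes a hypothetical new comment #8 replying to comment #7, omits the foreign keys for brevity, and runs on SQLite as a stand-in for MySQL. The new node needs one row per ancestor of its parent, plus a row connecting it to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Closure ("
    " ancestor INT NOT NULL, descendant INT NOT NULL, length INT NOT NULL,"
    " PRIMARY KEY (ancestor, descendant))"
)
conn.executemany(
    "INSERT INTO Closure VALUES (?, ?, ?)",
    [(1, 1, 0), (1, 2, 1), (1, 3, 2), (1, 4, 1), (1, 5, 2), (1, 6, 2), (1, 7, 3),
     (2, 2, 0), (2, 3, 1), (3, 3, 0), (4, 4, 0), (4, 5, 1), (4, 6, 1), (4, 7, 2),
     (5, 5, 0), (6, 6, 0), (6, 7, 1), (7, 7, 0)],
)

# Copy every path that ends at the parent, extend it by one step to the new
# node, and add the zero-length path from the new node to itself.
conn.execute(
    """
    INSERT INTO Closure (ancestor, descendant, length)
    SELECT ancestor, :new, length + 1 FROM Closure WHERE descendant = :parent
    UNION ALL
    SELECT :new, :new, 0
    """,
    {"new": 8, "parent": 7},
)

new_rows = sorted(
    conn.execute("SELECT ancestor, descendant, length FROM Closure WHERE descendant = 8")
)
print(new_rows)  # [(1, 8, 4), (4, 8, 3), (6, 8, 2), (7, 8, 1), (8, 8, 0)]
```

A single new comment produced five closure rows. If any of them is missed or miscalculated, the stored hierarchy silently disagrees with itself.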
ANSI SQL Recursive CTE
WITHer Recursive Queries in MySQL?
SQL vendors gradually implemented SQL-99 WITH syntax:
§ IBM DB2 UDB 8 (Dec. 2002)
§ Microsoft SQL Server 2005 (Oct. 2005)
§ Sybase SQL Anywhere 11 (Aug. 2008)
§ Firebird 2.1 (Sep. 2008)
§ PostgreSQL 8.4 (Jul. 2009)
§ Oracle 11g release 2 (Sep. 2009)
§ Teradata (date and version of support unknown, at least 2009)
§ HSQLDB 2.3 (Jul. 2013)
§ SQLite 3.8.3.1 (Feb. 2014)
§ H2 (date and version unknown)
https://www.percona.com/blog/2014/02/11/wither-recursive-queries/
ANSI SQL Recursive Common Table Expression
WITH RECURSIVE cte_name (col_name, col_name, col_name) AS
(
subquery base case
UNION ALL
subquery referencing cte_name
)
SELECT ... FROM cte_name ...
https://dev.mysql.com/doc/refman/8.0/en/with.html
Generating a Series of Numbers
WITH RECURSIVE MySeries (n) AS
(
SELECT 1 AS n
UNION ALL
SELECT 1+n FROM MySeries WHERE n < 10
)
SELECT * FROM MySeries;
+------+
| n |
+------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------+
Generating a Series of Dates
WITH RECURSIVE MyDates (d) AS
(
SELECT CURRENT_DATE() AS d
UNION ALL
SELECT d + INTERVAL 1 DAY FROM MyDates
WHERE d < CURRENT_DATE() + INTERVAL 7 DAY
)
SELECT * FROM MyDates;
+------------+
| d |
+------------+
| 2017-04-24 |
| 2017-04-25 |
| 2017-04-26 |
| 2017-04-27 |
| 2017-04-28 |
| 2017-04-29 |
| 2017-04-30 |
| 2017-05-01 |
+------------+
Query ancestors of comment #7
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment,
depth) AS
(
SELECT comment_id, parent_id, author, comment, 0 AS depth
FROM Comments
WHERE comment_id = 7
UNION ALL
SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
FROM CommentTree ct
JOIN Comments c ON (ct.parent_id = c.comment_id)
)
SELECT * FROM CommentTree;
Query subtree under comment #4
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment,
depth) AS
(
SELECT comment_id, parent_id, author, comment, 0 AS depth
FROM Comments
WHERE comment_id = 4
UNION ALL
SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
FROM CommentTree ct
JOIN Comments c ON (ct.comment_id = c.parent_id)
)
SELECT * FROM CommentTree;
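Both recursive queries can be tried without a MySQL 8 server. The sketch below runs them on in-memory SQLite, which also implements `WITH RECURSIVE`; the only changes are an explicit column list in the final `SELECT` and an `ORDER BY` added for stable output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments ("
    " comment_id INTEGER PRIMARY KEY,"
    " parent_id INTEGER REFERENCES Comments(comment_id),"
    " author TEXT, comment TEXT)"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?, ?)",
    [
        (1, None, "Fran", "What's the cause of this bug?"),
        (2, 1, "Ollie", "I think it's a null pointer."),
        (3, 2, "Fran", "No, I checked for that."),
        (4, 1, "Kukla", "We need to check valid input."),
        (5, 4, "Ollie", "Yes, that's a bug."),
        (6, 4, "Fran", "Yes, please add a check"),
        (7, 6, "Kukla", "That fixed it."),
    ],
)

# {start} is the starting comment id; {join} decides the direction of travel.
CTE = """
WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS
(
  SELECT comment_id, parent_id, author, comment, 0 AS depth
  FROM Comments
  WHERE comment_id = {start}
  UNION ALL
  SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1
  FROM CommentTree ct
  JOIN Comments c ON ({join})
)
SELECT comment_id, depth FROM CommentTree ORDER BY depth, comment_id
"""

# Walk upward: follow each row's parent_id to its parent.
ancestors = conn.execute(
    CTE.format(start=7, join="ct.parent_id = c.comment_id")
).fetchall()

# Walk downward: find rows whose parent_id points at the current row.
subtree = conn.execute(
    CTE.format(start=4, join="ct.comment_id = c.parent_id")
).fetchall()

print(ancestors)  # [(7, 0), (6, 1), (4, 2), (1, 3)]
print(subtree)    # [(4, 0), (5, 1), (6, 1), (7, 2)]
```

The two queries differ only in which side of the join condition refers to the CTE, which is why the template takes it as a parameter.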
Recursive CTE Pros and Cons
Pros:
§ ANSI SQL-99 Standard
§ Compatible with other SQL implementations
§ Works with Adjacency List (single source of authority)
§ Referential integrity!
Cons:
§ Not compatible with earlier MySQL versions
§ Use of materialized temporary tables may cause performance problems
MySQL CTE Implementation: 💯
Thanks to @MarkusWinand for his preview analysis based on 8.0.1-dmr
http://modern-sql.com/feature/with
Big Hierarchies
ITIS: Sample Hierarchical Data
Integrated Taxonomic Information System
(https://www.itis.gov/)
§Biological database of species of animals, plants, fungi
§One big tree of 544,954 nodes
§Data comes in adjacency list & path enumeration format
§I converted to closure table for query tests
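The slides do not show how the conversion to a closure table was done. One possible approach, sketched here on the small Comments tree rather than on ITIS, is to derive the closure rows from the adjacency list with a recursive CTE. The CTE name `Paths` is hypothetical, and SQLite stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, parent_id INTEGER)"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, 1), (5, 4), (6, 4), (7, 6)],
)

closure_rows = conn.execute(
    """
    WITH RECURSIVE Paths (ancestor, descendant, length) AS
    (
      -- Base case: every node connects to itself with length 0.
      SELECT comment_id, comment_id, 0 FROM Comments
      UNION ALL
      -- Recursive case: extend each known path by one child.
      SELECT p.ancestor, c.comment_id, p.length + 1
      FROM Paths p
      JOIN Comments c ON c.parent_id = p.descendant
    )
    SELECT ancestor, descendant, length FROM Paths
    ORDER BY ancestor, descendant
    """
).fetchall()

print(len(closure_rows))  # 18, the same row count as the closure table slide
```

The rows produced match the example closure data shown earlier for the seven comments.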
ITIS Data Model
mysql> select * from longnames
where completename = 'Eschscholzia californica';
+--------+---------------------------+
| tsn | completename |
+--------+---------------------------+
| 18956 | Eschscholzia californica |
+--------+---------------------------+
mysql> select * from hierarchy where TSN = '18956'\G
TSN: 18956
Parent_TSN: 18954
level: 11
ChildrenCount: 8
hierarchy_string: 202422-954898-846494-954900-846496-846504-18063-846547-18409-18880-18954-18956
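The `hierarchy_string` column is the path enumeration form of the same row. A small sketch of reading it is below; treating `level` as the number of ancestors is an inference from this one row, not something the slides state.

```python
# hierarchy_string copied from the row above: TSNs from the root down to the node.
hierarchy_string = (
    "202422-954898-846494-954900-846496-846504-"
    "18063-846547-18409-18880-18954-18956"
)

path = [int(tsn) for tsn in hierarchy_string.split("-")]

tsn = path[-1]          # the node itself
parent_tsn = path[-2]   # its immediate parent
level = len(path) - 1   # assumption: level counts the ancestors above the node

print(tsn, parent_tsn, level)  # 18956 18954 11
```

These values agree with the `TSN`, `Parent_TSN`, and `level` columns of the same row.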
Indexes
mysql> ALTER TABLE hierarchy ADD KEY (tsn, parent_tsn);
Query OK, 0 rows affected (1.30 sec)
Breadcrumbs Query
WITH RECURSIVE taxonomy AS
(
SELECT base.tsn, base.parent_tsn, 0 as depth
FROM hierarchy base
WHERE tsn = '18956'
UNION ALL
SELECT next.tsn, next.parent_tsn, t.depth+1
FROM hierarchy next JOIN taxonomy t
ON t.parent_tsn = next.tsn
)
SELECT * FROM taxonomy JOIN longnames USING (tsn)
ORDER BY depth DESC;
Breadcrumbs Query Result
+--------+------------+-------+--------------------------+
| tsn | parent_tsn | depth | completename |
+--------+------------+-------+--------------------------+
| 202422 | 0 | 11 | Plantae |
| 954898 | 202422 | 10 | Viridiplantae |
| 846494 | 954898 | 9 | Streptophyta |
| 954900 | 846494 | 8 | Embryophyta |
| 846496 | 954900 | 7 | Tracheophyta |
| 846504 | 846496 | 6 | Spermatophytina |
| 18063 | 846504 | 5 | Magnoliopsida |
| 846547 | 18063 | 4 | Ranunculanae |
| 18409 | 846547 | 3 | Ranunculales |
| 18880 | 18409 | 2 | Papaveraceae |
| 18954 | 18880 | 1 | Eschscholzia |
| 18956 | 18954 | 0 | Eschscholzia californica |
+--------+------------+-------+--------------------------+
12 rows in set (0.00 sec)
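The result can be reproduced on a tiny copy of the data. This sketch loads only the twelve rows shown above into in-memory SQLite (a stand-in for MySQL), uses the alias `nxt` instead of `next`, and writes the join with an explicit `ON` clause.

```python
import sqlite3

# (tsn, parent_tsn, completename) copied from the result table above.
TAXA = [
    (202422, 0, "Plantae"),
    (954898, 202422, "Viridiplantae"),
    (846494, 954898, "Streptophyta"),
    (954900, 846494, "Embryophyta"),
    (846496, 954900, "Tracheophyta"),
    (846504, 846496, "Spermatophytina"),
    (18063, 846504, "Magnoliopsida"),
    (846547, 18063, "Ranunculanae"),
    (18409, 846547, "Ranunculales"),
    (18880, 18409, "Papaveraceae"),
    (18954, 18880, "Eschscholzia"),
    (18956, 18954, "Eschscholzia californica"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hierarchy (tsn INTEGER PRIMARY KEY, parent_tsn INTEGER)")
conn.execute("CREATE TABLE longnames (tsn INTEGER PRIMARY KEY, completename TEXT)")
conn.executemany("INSERT INTO hierarchy VALUES (?, ?)", [(t, p) for t, p, _ in TAXA])
conn.executemany("INSERT INTO longnames VALUES (?, ?)", [(t, n) for t, _, n in TAXA])

rows = conn.execute(
    """
    WITH RECURSIVE taxonomy (tsn, parent_tsn, depth) AS
    (
      SELECT base.tsn, base.parent_tsn, 0
      FROM hierarchy base
      WHERE base.tsn = ?
      UNION ALL
      SELECT nxt.tsn, nxt.parent_tsn, t.depth + 1
      FROM hierarchy nxt JOIN taxonomy t ON t.parent_tsn = nxt.tsn
    )
    SELECT l.completename, t.depth
    FROM taxonomy t JOIN longnames l ON l.tsn = t.tsn
    ORDER BY t.depth DESC
    """,
    (18956,),
).fetchall()

# Render the rows as a breadcrumb trail from the root down to the species.
print(" > ".join(name for name, _ in rows))
```

The recursion stops at Plantae because its `parent_tsn` is 0 and no row has that `tsn`.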
Breadcrumbs Query EXPLAIN Plan
§New note in Extra: "Recursive"
§Using index (covering index) for both base case and recursive case
§I can eliminate the filesort if I allow natural order (base case first)
§No "Using Temporary"? Not so fast…
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where; Using filesort |
| 1 | PRIMARY | longnames | eq_ref | PRIMARY,tsn | PRIMARY | 4 | taxonomy.tsn | 1 | 100.00 | NULL |
| 2 | DERIVED | base | ref | TSN | TSN | 4 | const | 1 | 100.00 | Using index |
| 3 | UNION | t | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Recursive; Using where |
| 3 | UNION | next | ref | TSN | TSN | 4 | t.parent_tsn | 1 | 100.00 | Using index |
+----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
Breadcrumbs Query Performance
mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
query: WITH RECURSIVE `taxonomy` AS ( ...
`tsn` ) ORDER BY `depth` DESC
db: itis
exec_count: 1
total_latency: 10.05 ms
memory_tmp_tables: 1
disk_tmp_tables: 0
avg_tmp_tables_per_query: 1
tmp_tables_to_disk_pct: 0
first_seen: 2017-04-24 22:07:56
last_seen: 2017-04-24 22:07:56
digest: 8438633360bedce178823bb868589fd0
Breadcrumbs Query Stages
mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
+------+--------------------------------+-------+---------------+-------------+
| user | event_name | total | total_latency | avg_latency |
+------+--------------------------------+-------+---------------+-------------+
| root | stage/sql/System lock | 40 | 6.62 ms | 165.60 us |
| root | stage/sql/Opening tables | 191 | 3.16 ms | 16.52 us |
| root | stage/sql/checking permissions | 45 | 1.50 ms | 33.44 us |
| root | stage/sql/Creating sort index | 1 | 239.63 us | 239.63 us |
| root | stage/sql/closing tables | 191 | 191.03 us | 1.00 us |
| root | stage/sql/starting | 2 | 188.44 us | 94.22 us |
| root | stage/sql/Sending data | 6 | 138.96 us | 23.16 us |
| root | stage/sql/statistics | 4 | 122.42 us | 30.60 us |
| root | stage/sql/query end | 191 | 56.67 us | 296.00 ns |
| root | stage/sql/preparing | 4 | 33.57 us | 8.39 us |
| root | stage/sql/freeing items | 2 | 27.93 us | 13.96 us |
| root | stage/sql/optimizing | 5 | 20.03 us | 4.01 us |
| root | stage/sql/executing | 7 | 15.39 us | 2.20 us |
| root | stage/sql/removing tmp table | 4 | 9.35 us | 2.34 us |
| root | stage/sql/init | 3 | 8.76 us | 2.92 us |
| root | stage/sql/Sorting result | 2 | 4.16 us | 2.08 us |
| root | stage/sql/end | 3 | 1.93 us | 644.00 ns |
| root | stage/sql/cleaning up | 2 | 1.43 us | 715.00 ns |
+------+--------------------------------+-------+---------------+-------------+
Tree Expansion Query Result
See Demo
Tree Expansion Query
WITH RECURSIVE ancestors (tsn, parent_tsn) AS (
SELECT h.tsn, h.parent_tsn FROM hierarchy AS h WHERE h.tsn = %s
UNION ALL
SELECT h.tsn, h.parent_tsn FROM hierarchy AS h JOIN ancestors AS base ON h.tsn = base.parent_tsn
),
breadcrumbs (tsn, parent_tsn, depth, breadcrumbs) AS (
SELECT h.tsn, h.parent_tsn, 0 AS depth, CAST(LPAD(h.tsn, 8, '0') AS CHAR(255)) AS breadcrumbs
FROM hierarchy AS h WHERE h.parent_tsn = 0
UNION ALL
SELECT h.tsn, h.parent_tsn, base.depth+1 AS depth, CONCAT(base.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
FROM hierarchy AS h
JOIN ancestors AS a ON h.tsn = a.tsn
JOIN breadcrumbs AS base ON h.parent_tsn = base.tsn
)
SELECT l.tsn, l.completename, b.depth, b.breadcrumbs
FROM breadcrumbs AS b JOIN longnames AS l ON b.tsn = l.tsn
UNION
SELECT l.tsn, l.completename, b.depth+1, CONCAT(b.breadcrumbs, ',', LPAD(h.tsn, 8, '0'))
FROM breadcrumbs AS b
JOIN hierarchy AS h ON b.tsn = h.parent_tsn
JOIN longnames AS l ON l.tsn = h.tsn
ORDER BY breadcrumbs
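The key trick in this query is the `breadcrumbs` column: each row carries a comma-separated string of zero-padded ids from the root down, so a plain `ORDER BY` on that string yields depth-first order. The sketch below isolates that idea on the small Comments tree. It runs on SQLite as a stand-in, so `printf('%08d', ...)` replaces `LPAD(..., 8, '0')` and `||` replaces `CONCAT`; the CTE name `Tree` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Comments (comment_id INTEGER PRIMARY KEY, parent_id INTEGER, author TEXT)"
)
conn.executemany(
    "INSERT INTO Comments VALUES (?, ?, ?)",
    [(1, None, "Fran"), (2, 1, "Ollie"), (3, 2, "Fran"), (4, 1, "Kukla"),
     (5, 4, "Ollie"), (6, 4, "Fran"), (7, 6, "Kukla")],
)

rows = conn.execute(
    """
    WITH RECURSIVE Tree (comment_id, author, depth, breadcrumbs) AS
    (
      SELECT comment_id, author, 0, printf('%08d', comment_id)
      FROM Comments WHERE parent_id IS NULL
      UNION ALL
      -- Append this node's padded id to its parent's breadcrumb string.
      SELECT c.comment_id, c.author, t.depth + 1,
             t.breadcrumbs || ',' || printf('%08d', c.comment_id)
      FROM Tree t JOIN Comments c ON c.parent_id = t.comment_id
    )
    SELECT comment_id, author, depth, breadcrumbs FROM Tree
    ORDER BY breadcrumbs
    """
).fetchall()

# Indent each author by depth to display the thread as a tree.
for comment_id, author, depth, _ in rows:
    print("  " * depth + f"{author} (#{comment_id})")

order = [(comment_id, depth) for comment_id, _, depth, _ in rows]
```

Zero-padding keeps string comparison consistent with numeric comparison of the ids. In MySQL, the `CAST(... AS CHAR(255))` in the base case matters because the column type of a recursive CTE is taken from the non-recursive part, so the string needs room to grow.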
Tree Expansion Query EXPLAIN
--------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
select_type | table | type | key | key_len | ref | rows | filtered | Extra
--------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
PRIMARY | <derived2> | ALL | NULL | NULL | NULL | 250230 | 100.00 | Using where
PRIMARY | l | eq_ref | PRIMARY | 4 | b.tsn | 1 | 100.00 | NULL
DERIVED | h | index | TSN | 9 | NULL | 500466 | 10.00 | Using where; Using index
UNION | base | ALL | NULL | NULL | NULL | 50046 | 100.00 | Recursive; Using where
UNION | <derived4> | ALL | NULL | NULL | NULL | 4 | 100.00 | Using where; Using join buffer
UNION | h | ref | TSN | 9 | a.tsn,base.tsn | 1 | 100.00 | Using index
DERIVED | h | ref | TSN | 4 | const | 1 | 100.00 | Using index
UNION | base | ALL | NULL | NULL | NULL | 2 | 100.00 | Recursive; Using where
UNION | h | ref | TSN | 4 | base.parent_tsn | 1 | 100.00 | Using index
UNION | h | index | TSN | 9 | NULL | 500466 | 100.00 | Using where; Using index
UNION | l | eq_ref | PRIMARY | 4 | itis.h.TSN | 1 | 100.00 | NULL
UNION | <derived2> | ref | <auto_key0> | 5 | itis.h.Parent_TSN | 10 | 100.00 | NULL
UNION RESULT | <union1,8> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary; Using filesort
--------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
Maybe I need more indexes?
Unfortunately I ran out of time to analyze.
Tree Expansion Query Performance
mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
query: WITH RECURSIVE `ancestors` ( ` ... `l`
. `completename` , `b` .
db: itis
exec_count: 1
total_latency: 1.24 s
memory_tmp_tables: 3
disk_tmp_tables: 0
avg_tmp_tables_per_query: 3
tmp_tables_to_disk_pct: 0
first_seen: 2017-04-27 01:33:14
last_seen: 2017-04-27 01:33:14
digest: 86c1417d2ff3679863db754eff425e94
Tree Expansion Query Stages
mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
+------+--------------------------------+-------+---------------+-------------+
| user | event_name | total | total_latency | avg_latency |
+------+--------------------------------+-------+---------------+-------------+
| root | stage/sql/Sending data | 12 | 979.42 ms | 81.62 ms |
| root | stage/sql/System lock | 40 | 6.34 ms | 158.52 us |
| root | stage/sql/Opening tables | 191 | 3.34 ms | 17.51 us |
| root | stage/sql/checking permissions | 53 | 1.35 ms | 25.45 us |
| root | stage/sql/starting | 2 | 356.31 us | 178.16 us |
| root | stage/sql/statistics | 12 | 271.01 us | 22.58 us |
| root | stage/sql/closing tables | 191 | 179.15 us | 937.00 ns |
| root | stage/sql/preparing | 12 | 98.18 us | 8.18 us |
| root | stage/sql/query end | 191 | 57.60 us | 301.00 ns |
| root | stage/sql/freeing items | 2 | 47.93 us | 23.96 us |
| root | stage/sql/Creating sort index | 1 | 37.38 us | 37.38 us |
| root | stage/sql/optimizing | 13 | 30.60 us | 2.35 us |
| root | stage/sql/executing | 13 | 30.27 us | 2.33 us |
| root | stage/sql/removing tmp table | 14 | 24.44 us | 1.74 us |
| root | stage/sql/init | 3 | 14.78 us | 4.93 us |
| root | stage/sql/cleaning up | 2 | 11.66 us | 5.83 us |
| root | stage/sql/Sorting result | 2 | 3.67 us | 1.84 us |
| root | stage/sql/end | 3 | 3.04 us | 1.01 us |
+------+--------------------------------+-------+---------------+-------------+
Conclusions
§Overall, MySQL 8 support for recursive CTE queries is worth the wait.
§Exotic cases exist that are beyond any optimizer.
§I'm excited to upgrade to MySQL 8.0.x ASAP!
§Now that virtually all major SQL brands support recursive CTEs, we need developer tools and popular apps to use them!
License and Copyright
Copyright 2017 Bill Karwin
http://www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License:
http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share—to copy, distribute,
and transmit this work, under the following conditions:
Attribution. You must attribute this work to Bill Karwin.
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.

Recursive Query Throwdown

  • 1.
    Recursive Query Throwdown inMySQL 8 BILL KARWIN PERCONA LIVE OPEN SOURCE DATABASE CONFERENCE 2017
  • 2.
    Bill Karwin Software developer,consultant, trainer Using MySQL since 2000 Senior Database Architect at SchoolMessenger Author of SQL Antipatterns: Avoiding the Pitfalls of Database Programming Oracle ACE Director
  • 3.
    How to Querya Tree? Hierarchical data § Organization charts § Categories and sub-categories § Parts explosion § Threaded discussions https://commons.wikimedia.org/wiki/File:Staff_Organisation_Diagram,_1896.jpg
  • 4.
  • 5.
    Adjacency List ExampleData comment_id parent_id author comment 1 NULL Fran What’s the cause of this bug? 2 1 Ollie I think it’s a null pointer. 3 2 Fran No, I checked for that. 4 1 Kukla We need to check valid input. 5 4 Ollie Yes, that’s a bug. 6 4 Fran Yes, please add a check 7 6 Kukla That fixed it.
  • 6.
    Can’t Easily QueryDeep Trees SELECT * FROM Comments c1 LEFT JOIN Comments c2 ON (c2.parent_id = c1.comment_id) LEFT JOIN Comments c3 ON (c3.parent_id = c2.comment_id) LEFT JOIN Comments c4 ON (c4.parent_id = c3.comment_id) LEFT JOIN Comments c5 ON (c5.parent_id = c4.comment_id) LEFT JOIN Comments c6 ON (c6.parent_id = c5.comment_id) LEFT JOIN Comments c7 ON (c7.parent_id = c6.comment_id) LEFT JOIN Comments c8 ON (c8.parent_id = c7.comment_id) LEFT JOIN Comments c9 ON (c9.parent_id = c8.comment_id) LEFT JOIN Comments c10 ON (c10.parent_id = c9.comment_id) ...
  • 7.
  • 8.
    MySQL Workarounds MySQL lackedsupport for recursive queries, so workarounds were needed These are all denormalized designs, most don’t have referential integrity §Path enumeration §Nested sets §Closure table
  • 9.
    Path Enumeration ExampleData comment_id path author comment 1 1/ Fran What’s the cause of this bug? 2 1/2/ Ollie I think it’s a null pointer. 3 1/2/3/ Fran No, I checked for that. 4 1/4/ Kukla We need to check valid input. 5 1/4/5/ Ollie Yes, that’s a bug. 6 1/4/6/ Fran Yes, please add a check 7 1/4/6/7/ Kukla That fixed it.
  • 10.
    Path Enumeration ExampleQueries Query ancestors of comment #7: SELECT * FROM Comments WHERE '1/4/6/7/' LIKE CONCAT(path, '%'); Query descendants of comment #4: SELECT * FROM Comments WHERE path LIKE '1/4/%';
  • 11.
    Path Enumeration Prosand Cons Pros: §Single non-recursive query to get a tree or a subtree Cons: §Complex updates to add or remove a node §Numbers are stored in a string—no referential integrity
  • 12.
    Nested Sets Each commentencodes its descendants using two numbers: § A comment’s left number is less than all numbers used by the comment’s descendants. § A comment’s right number is greater than all numbers used by the comment’s descendants. § A comment’s numbers are between all numbers used by the comment’s ancestors. References: § “Recursive Hierarchies: The Relational Taboo!” Michael J. Kamfonas, Relational Journal, Oct/Nov 1992 § “Trees and Hierarchies in SQL For Smarties,” Joe Celko, 2004 § “Managing Hierarchical Data in MySQL,” Mike Hillyer, 2005
  • 13.
  • 14.
    Nested Sets ExampleData comment_id nsleft nsright author comment 1 1 14 Fran What’s the cause of this bug? 2 2 5 Ollie I think it’s a null pointer. 3 3 4 Fran No, I checked for that. 4 6 13 Kukla We need to check valid input. 5 7 8 Ollie Yes, that’s a bug. 6 9 12 Fran Yes, please add a check 7 10 11 Kukla That fixed it.
  • 15.
    Nested Sets ExampleQueries Query ancestors of comment #7: SELECT ancestor.* FROM Comments child JOIN Comments ancestor ON child.nsleft BETWEEN ancestor.nsleft AND ancestor.nsright WHERE child.comment_id = 7; Query subtree under comment #4: SELECT descendant.* FROM Comments parent JOIN Comments descendant ON descendant.nsleft BETWEEN parent.nsleft AND parent.nsright WHERE parent.comment_id = 4;
  • 16.
    Nested Sets Prosand Cons Pros: §Single non-recursive query to get a tree or a subtree Cons: §Complex updates to add or remove a node §Numbers are not foreign keys—no referential integrity
  • 17.
    Closure Table Many-to-many table Storesevery path from each node to each of its descendants A node even connects to itself CREATE TABLE Closure ( ancestor INT NOT NULL, descendant INT NOT NULL, length INT NOT NULL, PRIMARY KEY (ancestor, descendant), FOREIGN KEY(ancestor) REFERENCES Comments(comment_id), FOREIGN KEY(descendant) REFERENCES Comments(comment_id) );
  • 18.
  • 19.
    Closure Table ExampleData comment_id author comment 1 Fran What’s the cause of this bug? 2 Ollie I think it’s a null pointer. 3 Fran No, I checked for that. 4 Kukla We need to check valid input. 5 Ollie Yes, that’s a bug. 6 Fran Yes, please add a check 7 Kukla That fixed it. ancestor descendant length 1 1 0 1 2 1 1 3 2 1 4 1 1 5 2 1 6 2 1 7 3 2 2 0 2 3 1 3 3 0 4 4 0 4 5 1 4 6 1 4 7 2 5 5 0 6 6 0 6 7 1 7 7 0
  • 20.
    Closure Table ExampleQueries Query ancestors of comment #7: SELECT c.* FROM Comments c JOIN Closure t ON (c.comment_id = t.ancestor) WHERE t.descendant = 7; Query subtree under comment #4: SELECT c.* FROM Comments c JOIN Closure t ON (c.comment_id = t.descendant) WHERE t.ancestor = 4;
  • 21.
    Closure Table Prosand Cons Pros: §Single non-recursive query to get a tree or a subtree §Referential integrity! Cons: §Extra table is required §Hierarchy is stored redundantly, too easy to mess up §Lots of joins to do most kinds of queries
  • 22.
  • 23.
    WITHer Recursive Queriesin MySQL? SQL vendors gradually implemented SQL-99 WITH syntax: § IBM DB2 UDB 8 (Dec. 2002) § Microsoft SQL Server 2005 (Oct. 2005) § Sybase SQL Anywhere 11 (Aug. 2008) § Firebird 2.1 (Sep. 2008) § PostgreSQL 8.4 (Jul. 2009) § Oracle 11g release 2 (Sep. 2009) § Teradata (date and version of support unknown, at least 2009) § HSQLDB 2.3 (Jul. 2013) § SQLite 3.8.3.1 (Feb. 2014) § H2 (date and version unknown) https://www.percona.com/blog/2014/02/11/wither-recursive-queries/
  • 24.
    ANSI SQL RecursiveCommon Table Expression WITH RECURSIVE cte_name (col_name, col_name, col_name) AS ( subquery base case UNION ALL subquery referencing cte_name ) SELECT ... FROM cte_name ... https://dev.mysql.com/doc/refman/8.0/en/with.html
  • 25.
    Generating a Seriesof Numbers WITH RECURSIVE MySeries (n) AS ( SELECT 1 AS n UNION ALL SELECT 1+n FROM MySeries WHERE n < 10 ) SELECT * FROM MySeries; +------+ | n | +------+ | 1 | | 2 | | 3 | | 4 | | 5 | | 6 | | 7 | | 8 | | 9 | | 10 | +------+
  • 26.
    Generating a Seriesof Dates WITH RECURSIVE MyDates (d) AS ( SELECT CURRENT_DATE() AS d UNION ALL SELECT d + INTERVAL 1 DAY FROM MyDates WHERE d < CURRENT_DATE() + INTERVAL 7 DAY ) SELECT * FROM MyDates; +------------+ | d | +------------+ | 2017-04-24 | | 2017-04-25 | | 2017-04-26 | | 2017-04-27 | | 2017-04-28 | | 2017-04-29 | | 2017-04-30 | | 2017-05-01 | +------------+
  • 27.
    Query ancestors ofcomment #7 WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS ( SELECT comment_id, parent_id, author, comment, 0 AS depth FROM Comments WHERE comment_id = 7 UNION ALL SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1 FROM CommentTree ct JOIN Comments c ON (ct.parent_id = c.comment_id) ) SELECT * FROM CommentTree;
  • 29.
    Query subtree undercomment #4 WITH RECURSIVE CommentTree (comment_id, parent_id, author, comment, depth) AS ( SELECT comment_id, parent_id, author, comment, 0 AS depth FROM Comments WHERE comment_id = 4 UNION ALL SELECT c.comment_id, c.parent_id, c.author, c.comment, ct.depth+1 FROM CommentTree ct JOIN Comments c ON (ct.comment_id = c.parent_id) ) SELECT * FROM CommentTree;
  • 30.
    Recursive CTE Prosand Cons Pros: § ANSI SQL-99 Standard § Compatible with other SQL implementations § Works with Adjacency List (single source of authority) § Referential integrity! Cons: § Not compatible with earlier MySQL versions § Use of materialized temporary tables may cause performance problems
  • 31.
    MySQL CTE Implementation:💯 Thanks to @MarkusWinand for his preview analysis based on 8.0.1-dmr http://modern-sql.com/feature/with
  • 32.
  • 33.
    ITIS: Sample HierarchicalData Integrated Taxonomic Information System (https://www.itis.gov/) §Biological database of species of animals, plants, fungi §One big tree of 544,954 nodes §Data comes in adjacency list & path enumeration format §I converted to closure table for query tests
  • 34.
    ITIS Data Model mysql>select * from longnames where completename = 'Eschscholzia californica'; +--------+---------------------------+ | tsn | completename | +--------+---------------------------+ | 18956 | Eschscholzia californica | +--------+---------------------------+ mysql> select * from hierarchy where TSN = '18956'G TSN: 18956 Parent_TSN: 18954 level: 11 ChildrenCount: 8 hierarchy_string: 202422-954898-846494-954900-846496-846504-18063-846547-18409-18880-18954-18956
  • 35.
    Indexes mysql> ALTER TABLEhierarchy ADD KEY (tsn, parent_tsn); Query OK, 0 rows affected (1.30 sec)
  • 36.
    Breadcrumbs Query WITH RECURSIVEtaxonomy AS ( SELECT base.tsn, base.parent_tsn, 0 as depth FROM hierarchy base WHERE tsn = '18956' UNION ALL SELECT next.tsn, next.parent_tsn, t.depth+1 FROM hierarchy next JOIN taxonomy t WHERE t.parent_tsn = next.tsn ) SELECT * FROM taxonomy JOIN longnames USING (tsn) ORDER BY depth DESC;
  • 37.
    Breadcrumbs Query Result +--------+------------+-------+--------------------------+ |tsn | parent_tsn | depth | completename | +--------+------------+-------+--------------------------+ | 202422 | 0 | 11 | Plantae | | 954898 | 202422 | 10 | Viridiplantae | | 846494 | 954898 | 9 | Streptophyta | | 954900 | 846494 | 8 | Embryophyta | | 846496 | 954900 | 7 | Tracheophyta | | 846504 | 846496 | 6 | Spermatophytina | | 18063 | 846504 | 5 | Magnoliopsida | | 846547 | 18063 | 4 | Ranunculanae | | 18409 | 846547 | 3 | Ranunculales | | 18880 | 18409 | 2 | Papaveraceae | | 18954 | 18880 | 1 | Eschscholzia | | 18956 | 18954 | 0 | Eschscholzia californica | +--------+------------+-------+--------------------------+ 12 rows in set (0.00 sec)
  • 38.
    Breadcrumbs Query EXPLAINPlan §New note in Extra: "Recursive" §Using index (covering index) for both base case and recursive case §I can eliminate the filesort if I allow natural order (base case first) §No "Using Temporary"? Not so fast… +----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+ | 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 4 | 100.00 | Using where; Using filesort | | 1 | PRIMARY | longnames | eq_ref | PRIMARY,tsn | PRIMARY | 4 | taxonomy.tsn | 1 | 100.00 | NULL | | 2 | DERIVED | base | ref | TSN | TSN | 4 | const | 1 | 100.00 | Using index | | 3 | UNION | t | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | Recursive; Using where | | 3 | UNION | next | ref | TSN | TSN | 4 | t.parent_tsn | 1 | 100.00 | Using index | +----+-------------+------------+--------+---------------+---------+---------+--------------+------+----------+-----------------------------+
  • 39.
    Breadcrumbs Query Performance mysql>SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLESG query: WITH RECURSIVE `taxonomy` AS ( ... `tsn` ) ORDER BY `depth` DESC db: itis exec_count: 1 total_latency: 10.05 ms memory_tmp_tables: 1 disk_tmp_tables: 0 avg_tmp_tables_per_query: 1 tmp_tables_to_disk_pct: 0 first_seen: 2017-04-24 22:07:56 last_seen: 2017-04-24 22:07:56 digest: 8438633360bedce178823bb868589fd0
  • 40.
    Breadcrumbs Query Stages mysql>SELECT * FROM SYS.USER_SUMMARY_BY_STAGES; +------+--------------------------------+-------+---------------+-------------+ | user | event_name | total | total_latency | avg_latency | +------+--------------------------------+-------+---------------+-------------+ | root | stage/sql/System lock | 40 | 6.62 ms | 165.60 us | | root | stage/sql/Opening tables | 191 | 3.16 ms | 16.52 us | | root | stage/sql/checking permissions | 45 | 1.50 ms | 33.44 us | | root | stage/sql/Creating sort index | 1 | 239.63 us | 239.63 us | | root | stage/sql/closing tables | 191 | 191.03 us | 1.00 us | | root | stage/sql/starting | 2 | 188.44 us | 94.22 us | | root | stage/sql/Sending data | 6 | 138.96 us | 23.16 us | | root | stage/sql/statistics | 4 | 122.42 us | 30.60 us | | root | stage/sql/query end | 191 | 56.67 us | 296.00 ns | | root | stage/sql/preparing | 4 | 33.57 us | 8.39 us | | root | stage/sql/freeing items | 2 | 27.93 us | 13.96 us | | root | stage/sql/optimizing | 5 | 20.03 us | 4.01 us | | root | stage/sql/executing | 7 | 15.39 us | 2.20 us | | root | stage/sql/removing tmp table | 4 | 9.35 us | 2.34 us | | root | stage/sql/init | 3 | 8.76 us | 2.92 us | | root | stage/sql/Sorting result | 2 | 4.16 us | 2.08 us | | root | stage/sql/end | 3 | 1.93 us | 644.00 ns | | root | stage/sql/cleaning up | 2 | 1.43 us | 715.00 ns | +------+--------------------------------+-------+---------------+-------------+
  • 41.
    Tree Expansion QueryResult See Demo
  • 42.
    Tree Expansion Query WITHRECURSIVE ancestors (tsn, parent_tsn) AS ( SELECT h.tsn, h.parent_tsn FROM hierarchy AS h WHERE h.tsn = %s UNION ALL SELECT h.tsn, h.parent_tsn FROM hierarchy AS h JOIN ancestors AS base ON h.tsn = base.parent_tsn ), breadcrumbs (tsn, parent_tsn, depth, breadcrumbs) AS ( SELECT h.tsn, h.parent_tsn, 0 AS depth, CAST(LPAD(h.tsn, 8, '0') AS CHAR(255)) AS breadcrumbs FROM hierarchy AS h WHERE h.parent_tsn = 0 UNION ALL SELECT h.tsn, h.parent_tsn, base.depth+1 AS depth, CONCAT(base.breadcrumbs, ',', LPAD(h.tsn, 8, '0')) FROM hierarchy AS h JOIN ancestors AS a ON h.tsn = a.tsn JOIN breadcrumbs AS base ON h.parent_tsn = base.tsn ) SELECT l.tsn, l.completename, b.depth, b.breadcrumbs FROM breadcrumbs AS b JOIN longnames AS l ON b.tsn = l.tsn UNION SELECT l.tsn, l.completename, b.depth+1, CONCAT(b.breadcrumbs, ',', LPAD(h.tsn, 8, '0')) FROM breadcrumbs AS b JOIN hierarchy AS h ON b.tsn = h.parent_tsn JOIN longnames AS l ON l.tsn = h.tsn ORDER BY breadcrumbs
  • 43.
    Tree Expansion Query EXPLAIN
    --------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
     select_type  | table      | type   | key         | key_len | ref               | rows   | filtered | Extra
    --------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
     PRIMARY      | <derived2> | ALL    | NULL        | NULL    | NULL              | 250230 |   100.00 | Using where
     PRIMARY      | l          | eq_ref | PRIMARY     | 4       | b.tsn             |      1 |   100.00 | NULL
     DERIVED      | h          | index  | TSN         | 9       | NULL              | 500466 |    10.00 | Using where; Using index
     UNION        | base       | ALL    | NULL        | NULL    | NULL              |  50046 |   100.00 | Recursive; Using where
     UNION        | <derived4> | ALL    | NULL        | NULL    | NULL              |      4 |   100.00 | Using where; Using join buffer
     UNION        | h          | ref    | TSN         | 9       | a.tsn,base.tsn    |      1 |   100.00 | Using index
     DERIVED      | h          | ref    | TSN         | 4       | const             |      1 |   100.00 | Using index
     UNION        | base       | ALL    | NULL        | NULL    | NULL              |      2 |   100.00 | Recursive; Using where
     UNION        | h          | ref    | TSN         | 4       | base.parent_tsn   |      1 |   100.00 | Using index
     UNION        | h          | index  | TSN         | 9       | NULL              | 500466 |   100.00 | Using where; Using index
     UNION        | l          | eq_ref | PRIMARY     | 4       | itis.h.TSN        |      1 |   100.00 | NULL
     UNION        | <derived2> | ref    | <auto_key0> | 5       | itis.h.Parent_TSN |     10 |   100.00 | NULL
     UNION RESULT | <union1,8> | ALL    | NULL        | NULL    | NULL              |   NULL |     NULL | Using temporary; Using filesort
    --------------+------------+--------+-------------+---------+-------------------+--------+----------+--------------------------------
    Maybe I need more indexes? Unfortunately I ran out of time to analyze.
  • 44.
    Tree Expansion Query Performance
    mysql> SELECT * FROM SYS.STATEMENTS_WITH_TEMP_TABLES\G
                       query: WITH RECURSIVE `ancestors` ( ` ... `l` . `completename` , `b` .
                          db: itis
                  exec_count: 1
               total_latency: 1.24 s
           memory_tmp_tables: 3
             disk_tmp_tables: 0
    avg_tmp_tables_per_query: 3
      tmp_tables_to_disk_pct: 0
                  first_seen: 2017-04-27 01:33:14
                   last_seen: 2017-04-27 01:33:14
                      digest: 86c1417d2ff3679863db754eff425e94
  • 45.
    Tree Expansion Query Stages
    mysql> SELECT * FROM SYS.USER_SUMMARY_BY_STAGES;
    +------+--------------------------------+-------+---------------+-------------+
    | user | event_name                     | total | total_latency | avg_latency |
    +------+--------------------------------+-------+---------------+-------------+
    | root | stage/sql/Sending data         |    12 | 979.42 ms     | 81.62 ms    |
    | root | stage/sql/System lock          |    40 | 6.34 ms       | 158.52 us   |
    | root | stage/sql/Opening tables       |   191 | 3.34 ms       | 17.51 us    |
    | root | stage/sql/checking permissions |    53 | 1.35 ms       | 25.45 us    |
    | root | stage/sql/starting             |     2 | 356.31 us     | 178.16 us   |
    | root | stage/sql/statistics           |    12 | 271.01 us     | 22.58 us    |
    | root | stage/sql/closing tables       |   191 | 179.15 us     | 937.00 ns   |
    | root | stage/sql/preparing            |    12 | 98.18 us      | 8.18 us     |
    | root | stage/sql/query end            |   191 | 57.60 us      | 301.00 ns   |
    | root | stage/sql/freeing items        |     2 | 47.93 us      | 23.96 us    |
    | root | stage/sql/Creating sort index  |     1 | 37.38 us      | 37.38 us    |
    | root | stage/sql/optimizing           |    13 | 30.60 us      | 2.35 us     |
    | root | stage/sql/executing            |    13 | 30.27 us      | 2.33 us     |
    | root | stage/sql/removing tmp table   |    14 | 24.44 us      | 1.74 us     |
    | root | stage/sql/init                 |     3 | 14.78 us      | 4.93 us     |
    | root | stage/sql/cleaning up          |     2 | 11.66 us      | 5.83 us     |
    | root | stage/sql/Sorting result       |     2 | 3.67 us       | 1.84 us     |
    | root | stage/sql/end                  |     3 | 3.04 us       | 1.01 us     |
    +------+--------------------------------+-------+---------------+-------------+
  • 46.
  • 47.
    Conclusions
    § Overall, MySQL 8 support for recursive CTE queries is worth the wait.
    § Exotic cases exist that are beyond any optimizer.
    § I'm excited to upgrade to MySQL 8.0.x ASAP!
    § Now that virtually all major SQL brands support recursive CTEs, we need developer tools and popular apps to use them!
  • 48.
    License and Copyright
    Copyright 2017 Bill Karwin
    http://www.slideshare.net/billkarwin
    Released under a Creative Commons 3.0 License: http://creativecommons.org/licenses/by-nc-nd/3.0/
    You are free to share (to copy, distribute, and transmit) this work under the following conditions:
    Attribution. You must attribute this work to Bill Karwin.
    Noncommercial. You may not use this work for commercial purposes.
    No Derivative Works. You may not alter, transform, or build upon this work.