M|18 Taking Advantage of Common Table Expressions

Published in: Data & Analytics

  1. Common Table Expressions
     Igor Babaev | Principal Software Engineer, MariaDB
     Sergei Petrunia | Senior Software Engineer, MariaDB
     {igor,sergey}@mariadb.com
  2. Common Table Expressions
     ● A standard SQL feature
     ● Two kinds of CTEs:
        ● Recursive
        ● Non-recursive
     ● Supported by Oracle, MS SQL Server, PostgreSQL, SQLite, …
     ● Available in MariaDB 10.2 (stable since May 2017)
     ● Available in MySQL 8.0 (RC since Sept 2017)
  3. Plan
     ● Non-recursive CTEs
        ● Use cases
        ● Optimizations
     ● Recursive CTEs
        ● Basics
        ● Transitive closure
        ● Paths
        ● (Non-)linear recursion
        ● Mutual recursion
  4. Plan (next: Non-recursive CTEs → Use cases)
  5. WITH CTE syntax
         with engineers as (                  -- CTE name
           select * from employees            -- CTE body
           where dept='Engineering'
         )
         select * from engineers where ...    -- CTE usage
     ● Similar to DERIVED tables
     ● “Query-local VIEWs”
  6. CTEs are like derived tables
     ● Derived table:
         select * from (
           select * from employees where dept='Engineering'
         ) as engineers
         where ...
     ● Equivalent CTE:
         with engineers as (
           select * from employees where dept='Engineering'
         )
         select * from engineers where ...
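To check this equivalence concretely, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for MariaDB, and the employees rows are hypothetical. Both forms should return the same rows.

```python
import sqlite3

# SQLite (bundled with Python) stands in for MariaDB; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("create table employees (name text, dept text, country text)")
conn.executemany(
    "insert into employees values (?, ?, ?)",
    [("Ann", "Engineering", "NL"), ("Bob", "Sales", "US"), ("Cid", "Engineering", "FI")],
)

# Derived-table form: the subquery is written inline in FROM.
derived = conn.execute(
    "select * from (select * from employees where dept = 'Engineering') as engineers "
    "order by name"
).fetchall()

# CTE form: the same subquery, named up front with WITH.
cte = conn.execute(
    "with engineers as (select * from employees where dept = 'Engineering') "
    "select * from engineers order by name"
).fetchall()

print(derived == cte)  # True
print(cte)  # [('Ann', 'Engineering', 'NL'), ('Cid', 'Engineering', 'FI')]
```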
  7. Use case #1: CTEs refer to CTEs
         with engineers as (
           select * from employees
           where dept in ('Development','Support')
         ),
         eu_engineers as (
           select * from engineers where country IN ('NL',...)
         )
         select ... from eu_engineers;
     ● More readable than nested FROM (SELECT …)
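Here is a runnable sketch of use case #1, again using SQLite through Python's sqlite3 module as a stand-in for MariaDB. The employee rows are hypothetical. The slide elides its country list, so the concrete list ('NL', 'DE') is also an assumption.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table employees (name text, dept text, country text)")
conn.executemany("insert into employees values (?, ?, ?)", [
    ("Ann", "Development", "NL"),
    ("Bob", "Support", "US"),
    ("Cid", "Support", "DE"),
    ("Dee", "Sales", "NL"),
])

rows = conn.execute("""
    with engineers as (
      select * from employees where dept in ('Development', 'Support')
    ),
    eu_engineers as (            -- the second CTE reads from the first one
      select * from engineers where country in ('NL', 'DE')
    )
    select name from eu_engineers order by name
""").fetchall()
print(rows)  # [('Ann',), ('Cid',)]
```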
  8. Use case #2: Multiple uses of a CTE
         with engineers as (
           select * from employees
           where dept in ('Development','Support')
         )
         select * from engineers E1
         where not exists (select 1 from engineers E2
                           where E2.country=E1.country
                             and E2.name <> E1.name);
     ● Anti-self-join: returns each engineer who has no other engineer in the same country
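The anti-self-join can be illustrated with a sketch that assumes SQLite as the engine and uses made-up rows. Ann and Bob share NL, so neither qualifies. Dee is in DE but is not an engineer, so Cid still counts as the only engineer there.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table employees (name text, dept text, country text)")
conn.executemany("insert into employees values (?, ?, ?)", [
    ("Ann", "Development", "NL"),
    ("Bob", "Support", "NL"),
    ("Cid", "Support", "DE"),
    ("Dee", "Sales", "DE"),
    ("Eve", "Development", "FI"),
])

# Engineers with no other engineer in the same country.
# The CTE is referenced twice: as E1 in FROM and as E2 in the subquery.
rows = conn.execute("""
    with engineers as (
      select * from employees where dept in ('Development', 'Support')
    )
    select E1.name from engineers E1
    where not exists (select 1 from engineers E2
                      where E2.country = E1.country and E2.name <> E1.name)
    order by E1.name
""").fetchall()
print(rows)  # [('Cid',), ('Eve',)]
```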
  9. Use case #2, example 2: Year-over-year comparisons
         with sales_product_year as (
           select product, year(ship_date) as year, sum(price) as total_amt
           from item_sales
           group by product, year
         )
         select *
         from sales_product_year CUR, sales_product_year PREV
         where CUR.product=PREV.product
           and CUR.year=PREV.year + 1
           and CUR.total_amt > PREV.total_amt
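The following sketch runs the year-over-year query on SQLite against a hypothetical item_sales table. SQLite has no year() function, so the sketch substitutes cast(strftime('%Y', ship_date) as integer). It also repeats that expression in GROUP BY rather than relying on the alias.

```python
import sqlite3

# Hypothetical item_sales rows; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table item_sales (product text, ship_date text, price integer)")
conn.executemany("insert into item_sales values (?, ?, ?)", [
    ("widget", "2016-03-01", 10), ("widget", "2016-07-01", 5),
    ("widget", "2017-02-01", 20),
    ("gadget", "2016-05-01", 30), ("gadget", "2017-05-01", 25),
])

rows = conn.execute("""
    with sales_product_year as (
      select product,
             cast(strftime('%Y', ship_date) as integer) as year,
             sum(price) as total_amt
      from item_sales
      group by product, cast(strftime('%Y', ship_date) as integer)
    )
    select CUR.product, CUR.year, PREV.total_amt, CUR.total_amt
    from sales_product_year CUR, sales_product_year PREV   -- the CTE is used twice
    where CUR.product = PREV.product
      and CUR.year = PREV.year + 1
      and CUR.total_amt > PREV.total_amt
""").fetchall()
print(rows)  # [('widget', 2017, 15, 20)]
```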
  10. Use case #2, example 3: Compare individuals against their group
         with sales_product_year as (
           select product, year(ship_date) as year, sum(price) as total_amt
           from item_sales
           group by product, year
         )
         select * from sales_product_year S1
         where total_amt > (select 0.1*sum(total_amt)
                            from sales_product_year S2
                            where S2.year=S1.year)
     ● Returns the products whose sales exceed 10% of all sales in the same year
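Here is a sketch of the same comparison on SQLite, using hypothetical rows where each year's sales total 100, so the 10% threshold is 10. As before, year(ship_date) is replaced by SQLite's strftime.

```python
import sqlite3

# Hypothetical rows; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table item_sales (product text, ship_date text, price integer)")
conn.executemany("insert into item_sales values (?, ?, ?)", [
    ("widget", "2016-03-01", 50), ("gadget", "2016-04-01", 45), ("gizmo", "2016-05-01", 5),
    ("widget", "2017-01-01", 8), ("gadget", "2017-01-01", 92),
])

rows = conn.execute("""
    with sales_product_year as (
      select product,
             cast(strftime('%Y', ship_date) as integer) as year,
             sum(price) as total_amt
      from item_sales
      group by product, cast(strftime('%Y', ship_date) as integer)
    )
    select S1.product, S1.year from sales_product_year S1
    where S1.total_amt > (select 0.1 * sum(S2.total_amt)   -- the group: same year
                          from sales_product_year S2
                          where S2.year = S1.year)
    order by S1.year, S1.product
""").fetchall()
print(rows)  # [('gadget', 2016), ('widget', 2016), ('gadget', 2017)]
```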
  11. Conclusions so far
     ● Non-recursive CTEs are “Query-local VIEWs”
     ● One CTE can refer to another
        ● Better than nested FROM (SELECT …)
     ● A CTE can be referred to from multiple places
        ● Better than copy-pasting FROM (SELECT …)
     ● CTE adoption:
        ● TPC-H (1999): no CTEs
        ● SQL:1999 introduces CTEs
        ● TPC-DS (2011): 38 of 100 queries use CTEs
  12. Plan (next: Non-recursive CTEs → Optimizations)
  13. Base algorithm: materialize in a temporary table
         with engineers as (
           select * from employees
           where dept='Engineering' or dept='Support'
         )
         select ... from engineers, other_table, ...
     ● Always works
     ● Often not optimal
  14. Optimization #1: CTE merging
         with engineers as (
           select * from employees where dept='Development'
         )
         select ...
         from engineers E, support_cases SC
         where E.name=SC.assignee and SC.created='2016-09-30'
           and E.location='Amsterdam'
     is rewritten into
         select ...
         from employees E, support_cases SC
         where E.dept='Development' and E.name=SC.assignee
           and SC.created='2016-09-30' and E.location='Amsterdam'
     ● The join optimizer can pick any plan, e.g. the join order support_cases → employees
  15. Optimization #1: CTE merging (2)
     ● Requirement:
        ● The CTE is just a JOIN: no GROUP BY, DISTINCT, etc.
     ● Output:
        ● The CTE is merged into the parent’s join
        ● The optimizer can pick the best query plan
     ● This is the same as ALGORITHM=MERGE for VIEWs
  16. Optimization #2: condition pushdown
         with sales_per_year as (
           select year(`order`.date) as year, sum(`order`.amount) as sales
           from `order`
           group by year
         )
         select * from sales_per_year
         where year in ('2015','2016')
     is rewritten into
         with sales_per_year as (
           select year(`order`.date) as year, sum(`order`.amount) as sales
           from `order`
           where year(`order`.date) in ('2015','2016')
           group by year
         )
         select * from sales_per_year
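The pushdown is correct only because year is a GROUP BY column. Filtering before grouping removes whole groups and leaves the sums of the remaining groups unchanged. The sketch below checks this on SQLite with a hypothetical orders table, renamed from order to avoid quoting the keyword, and made-up rows. It demonstrates that the two query forms agree. It does not show how MariaDB's optimizer performs the rewrite.

```python
import sqlite3

# Hypothetical table "orders" (the slide's table is named order, a reserved word).
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (order_date text, amount integer)")
conn.executemany("insert into orders values (?, ?)", [
    ("2014-05-01", 10), ("2015-01-01", 20), ("2015-06-01", 5), ("2016-03-01", 7),
])

YEAR = "cast(strftime('%Y', order_date) as integer)"  # stands in for year(order.date)

# Condition applied outside the CTE (as written by the user).
outer_filter = conn.execute(f"""
    with sales_per_year as (
      select {YEAR} as year, sum(amount) as sales
      from orders group by {YEAR}
    )
    select * from sales_per_year where year in (2015, 2016) order by year
""").fetchall()

# Condition pushed into the CTE body, applied before GROUP BY.
pushed_down = conn.execute(f"""
    with sales_per_year as (
      select {YEAR} as year, sum(amount) as sales
      from orders where {YEAR} in (2015, 2016) group by {YEAR}
    )
    select * from sales_per_year order by year
""").fetchall()

print(outer_filter == pushed_down)  # True
print(pushed_down)  # [(2015, 25), (2016, 7)]
```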
  17. Condition pushdown summary
     ● Used when merging is not possible
        ● e.g. the CTE has a GROUP BY
     ● Makes the temporary table smaller
     ● Allows filtering out whole GROUP BY groups
     ● Besides CTEs, works for derived tables and VIEWs
     ● Based on Galina Shalygina’s GSoC 2016 project:
       “Pushing conditions into non-mergeable views and derived tables in MariaDB”
  18. Optimization #3: CTE reuse
         with product_year_sales as (
           select product, year(sale_date) as year, count(*) as count
           from product_sales
           group by product, year
         )
         select *
         from product_year_sales P1, product_year_sales P2
         where P1.year = 2010 AND P2.year = 2011 AND ...
     ● The idea:
        ● Fill the CTE once
        ● Then use it multiple times
     ● Interferes with condition pushdown: each use has a different condition (P1.year = 2010, P2.year = 2011), so no single condition can be pushed into the shared temporary table
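  19. CTE optimizations summary
         +---------------+-----------+--------------------+-----------+
         |               | CTE merge | Condition pushdown | CTE reuse |
         +---------------+-----------+--------------------+-----------+
         | MariaDB 10.2  | ✔         | ✔                  | ✘         |
         | MS SQL Server | ✔         | ✔                  | ✘         |
         | PostgreSQL    | ✘         | ✘                  | ✔         |
         | MySQL 8.0.0   | ✔         | ✘                  | ✔         |
         +---------------+-----------+--------------------+-----------+
     ● Merge and condition pushdown are the most important
        ● MariaDB supports them, like MS SQL Server
     ● PostgreSQL’s approach is *weird*
        ● “CTEs are optimization barriers”
     ● MySQL: “try merging, otherwise reuse”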
  20. Recursive CTEs
  21. Plan (next: Recursive CTEs → Basics)
  22. Recursive CTEs
     [Figures: a parts tree (wheel, bolt, cap, nut, tire, valve, rim, spokes) and a graph of cities (Chicago, Nashville, Atlanta, Orlando)]
     ● SQL is poor at “recursive” data structures/algorithms:
        ● Trees
        ● Graphs
     ● First attempt: Oracle’s CONNECT BY syntax (80’s)
     ● Superseded by recursive CTEs
        ● SQL:1999, implementations in 2007-2009
  23. Recursive CTE syntax
         with recursive ancestors as (               -- “recursive”
           select * from folks                       -- anchor part
           where name = 'Alex'
           union [all]
           select f.*                                -- recursive part
           from folks as f, ancestors AS a           -- recursive use of the CTE
           where f.id = a.father or f.id = a.mother
         )
         select * from ancestors;
  24. Recursive CTE computation
     ● Consider a dataset, the folks table:
     [Figure: family tree of Alex, Sister Amy, Mom, Dad and Grandpa Bill]
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         |   20 | Dad          |     10 |   NULL |
         |   30 | Mom          |   NULL |   NULL |
         |   10 | Grandpa Bill |   NULL |   NULL |
         |   98 | Sister Amy   |     20 |     30 |
         +------+--------------+--------+--------+
  25. Computation, step #1: execute the anchor part
         with recursive ancestors as (
           select * from folks where name = 'Alex'
           union
           select f.* from folks as f, ancestors AS a
           where f.id = a.father or f.id = a.mother
         )
         select * from ancestors;
     Result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         +------+--------------+--------+--------+
  26. Computation, step #2: execute the recursive part (same query)
     ● Join the result table with folks (the dataset from slide 24)
     Result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         +------+--------------+--------+--------+
     ● Alex’s father (id 20) and mother (id 30) are found: Dad and Mom
  27. Computation, step #2 (cont.): add the results to the result table
     Result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         |   20 | Dad          |     10 |   NULL |
         |   30 | Mom          |   NULL |   NULL |
         +------+--------------+--------+--------+
  28. Computation, step #3: execute the recursive part again
     ● Join the result table with folks once more
     Result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         |   20 | Dad          |     10 |   NULL |
         |   30 | Mom          |   NULL |   NULL |
         +------+--------------+--------+--------+
  29. Computation, step #3 (cont.): add the results to the result table
     ● Dad and Mom are already there; only Grandpa Bill is new
     Result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         |   20 | Dad          |     10 |   NULL |
         |   30 | Mom          |   NULL |   NULL |
         |   10 | Grandpa Bill |   NULL |   NULL |
         +------+--------------+--------+--------+
  30. Computation, step #4: execute the recursive part again
     Result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         |   20 | Dad          |     10 |   NULL |
         |   30 | Mom          |   NULL |   NULL |
         |   10 | Grandpa Bill |   NULL |   NULL |
         +------+--------------+--------+--------+
  31. Computation, step #4 (cont.): no [new] results
     ● The process finishes. Final result table:
         +------+--------------+--------+--------+
         | id   | name         | father | mother |
         +------+--------------+--------+--------+
         |  100 | Alex         |     20 |     30 |
         |   20 | Dad          |     10 |   NULL |
         |   30 | Mom          |   NULL |   NULL |
         |   10 | Grandpa Bill |   NULL |   NULL |
         +------+--------------+--------+--------+
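The whole walkthrough can be reproduced with a sketch on SQLite, called through Python's sqlite3 module, using the folks dataset from slide 24. SQLite stands in for MariaDB here. Its recursive CTE engine processes rows one at a time from a queue rather than in whole steps, but with UNION the final result is the same set of rows.

```python
import sqlite3

# The folks rows are the dataset from slide 24; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table folks (id integer, name text, father integer, mother integer)")
conn.executemany("insert into folks values (?, ?, ?, ?)", [
    (100, "Alex", 20, 30), (20, "Dad", 10, None), (30, "Mom", None, None),
    (10, "Grandpa Bill", None, None), (98, "Sister Amy", 20, 30),
])

rows = conn.execute("""
    with recursive ancestors as (
      select * from folks where name = 'Alex'           -- anchor part
      union
      select f.* from folks as f, ancestors as a        -- recursive part
      where f.id = a.father or f.id = a.mother
    )
    select * from ancestors order by id
""").fetchall()

# Sister Amy is not an ancestor of Alex, so she is not returned.
for row in rows:
    print(row)
```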
  32. Summary so far
         with recursive R as (
           select anchor_part
           union [all]
           select recursive_part from R, …
         )
         select …
     1. Compute anchor_part
     2. Compute recursive_part to get the new data
     3. If the new data is non-empty, go to 2
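The three-step loop can be sketched in plain Python. The function name eval_recursive_union and the max_iterations cap are hypothetical. The cap is only a rough analogue of @@max_recursive_iterations, not its exact semantics. In this sketch, only the rows added in the previous step (the "wave-front") are passed to the recursive part, which is valid for linear recursion. UNION is modeled by discarding rows that are already in R.

```python
from typing import Callable, Iterable, Set


def eval_recursive_union(anchor: Iterable[tuple],
                         recursive_part: Callable[[Set[tuple]], Iterable[tuple]],
                         max_iterations: int = 1000) -> Set[tuple]:
    """Sketch of WITH RECURSIVE R AS (anchor UNION recursive_part) SELECT * FROM R."""
    result = set(anchor)          # 1. compute anchor_part
    new = set(result)
    iterations = 0
    while new:                    # 3. repeat while the last step produced new data
        if iterations == max_iterations:
            raise RuntimeError("max_iterations exceeded")  # hypothetical safety cap
        # 2. run recursive_part on the wave-front (rows added by the previous step);
        #    UNION semantics: rows already in R are discarded
        new = set(recursive_part(new)) - result
        result |= new
        iterations += 1
    return result


# Usage: the ancestors example, with folks from slide 24 as id -> (name, father, mother).
folks = {100: ("Alex", 20, 30), 20: ("Dad", 10, None), 30: ("Mom", None, None),
         10: ("Grandpa Bill", None, None), 98: ("Sister Amy", 20, 30)}


def parents(wave):
    # recursive part: folks f joined with R a on f.id = a.father or f.id = a.mother
    for (_, _, father, mother) in wave:
        for parent_id in (father, mother):
            if parent_id in folks:
                yield (parent_id, *folks[parent_id])


ancestors = eval_recursive_union([(100, *folks[100])], parents)
names = sorted(name for (_, name, _, _) in ancestors)
print(names)  # ['Alex', 'Dad', 'Grandpa Bill', 'Mom']
```

With UNION ALL instead of UNION, the `- result` step would be dropped. On a graph with cycles, that is exactly where the iteration cap becomes necessary.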
  33. Plan (next: Recursive CTEs → Transitive closure)
  34. Transitive closure
     [Figure: bus route map of New York, Boston, Washington and Raleigh]
         bus_routes
         +------------+------------+
         | origin     | dst        |
         +------------+------------+
         | New York   | Boston     |
         | Boston     | New York   |
         | New York   | Washington |
         | Washington | Boston     |
         | Washington | Raleigh    |
         +------------+------------+
  35. Transitive closure
         with recursive bus_dst as (
           select origin as dst from bus_routes where origin='New York'
           union
           select bus_routes.dst from bus_dst, bus_routes
           where bus_dst.dst = bus_routes.origin
         )
         select * from bus_dst
     ● bus_dst is the set of places one can reach
     ● Start from New York (with a datatype trick: the anchor selects the origin column rather than the literal 'New York', so the CTE column takes its type from bus_routes)
  36. Transitive closure: put the anchor result into the work table
         +------------+
         | dst        |
         +------------+
         | New York   |
         +------------+
  37. Transitive closure: join bus_dst with bus_routes
         +------------+
         | dst        |
         +------------+
         | New York   |
         +------------+
     ● New destinations: Boston, Washington
  38. Transitive closure: add the new destinations to the temporary table
         +------------+
         | dst        |
         +------------+
         | New York   |
         | Boston     |
         | Washington |
         +------------+
  39. Transitive closure: join bus_dst with bus_routes again
         +------------+
         | dst        |
         +------------+
         | New York   |
         | Boston     |
         | Washington |
         +------------+
     ● Destinations produced: Raleigh, Boston, New York
  40. Transitive closure: add the new destinations
         +------------+
         | dst        |
         +------------+
         | New York   |
         | Boston     |
         | Washington |
         | Raleigh    |
         +------------+
     ● Only Raleigh is new; Boston and New York are already in the table, so UNION discards them
     ● Raleigh has no outgoing routes, so the next step produces nothing and the computation stops
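Here is a sketch of the same transitive-closure query on SQLite, which stands in for MariaDB, loaded with the bus_routes rows from slide 34.

```python
import sqlite3

# bus_routes rows from slide 34; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table bus_routes (origin text, dst text)")
conn.executemany("insert into bus_routes values (?, ?)", [
    ("New York", "Boston"), ("Boston", "New York"), ("New York", "Washington"),
    ("Washington", "Boston"), ("Washington", "Raleigh"),
])

rows = conn.execute("""
    with recursive bus_dst as (
      select origin as dst from bus_routes where origin = 'New York'
      union                                -- UNION drops repeats, so the cycle ends
      select bus_routes.dst from bus_dst, bus_routes
      where bus_dst.dst = bus_routes.origin
    )
    select * from bus_dst
""").fetchall()
destinations = sorted(dst for (dst,) in rows)
print(destinations)  # ['Boston', 'New York', 'Raleigh', 'Washington']
```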
  41. Summary so far
     ● Can compute transitive closure
     ● UNION prevents infinite loops: a destination already in the result is not added again, so cycles such as New York ↔ Boston end
  42. Plan (next: Recursive CTEs → Paths)
  43. Computing “Paths”
     ● Same bus_routes table as on slide 34 (New York, Boston, Washington, Raleigh)
     ● Want paths like New York → Washington → Raleigh
  44. Computing “Paths”
         with recursive paths (cur_path, cur_dest) as (
           select origin, origin from bus_routes where origin='New York'
           union
           select concat(paths.cur_path, ',', bus_routes.dst),   -- collect a path
                  bus_routes.dst
           from paths, bus_routes
           where paths.cur_dest = bus_routes.origin
             and locate(bus_routes.dst, paths.cur_path)=0        -- don’t construct loops
         )
         select * from paths
  45. Computing “Paths”: the recursive part and the result
         select concat(paths.cur_path, ',', bus_routes.dst), bus_routes.dst
         from paths, bus_routes
         where paths.cur_dest = bus_routes.origin
           and locate(bus_routes.dst, paths.cur_path)=0

         +-----------------------------+------------+
         | cur_path                    | cur_dest   |
         +-----------------------------+------------+
         | New York                    | New York   |
         | New York,Boston             | Boston     |
         | New York,Washington         | Washington |
         | New York,Washington,Boston  | Boston     |
         | New York,Washington,Raleigh | Raleigh    |
         +-----------------------------+------------+
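Here is a sketch of the paths query on SQLite, which stands in for MariaDB. SQLite has no locate(), and the concat() function is missing from older SQLite builds. The sketch therefore uses `||` for concatenation and instr(str, substr). Note that instr takes its arguments in the opposite order from locate(substr, str).

```python
import sqlite3

# bus_routes rows from slide 34; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table bus_routes (origin text, dst text)")
conn.executemany("insert into bus_routes values (?, ?)", [
    ("New York", "Boston"), ("Boston", "New York"), ("New York", "Washington"),
    ("Washington", "Boston"), ("Washington", "Raleigh"),
])

rows = conn.execute("""
    with recursive paths (cur_path, cur_dest) as (
      select origin, origin from bus_routes where origin = 'New York'
      union
      select paths.cur_path || ',' || bus_routes.dst,    -- collect a path
             bus_routes.dst
      from paths, bus_routes
      where paths.cur_dest = bus_routes.origin
        and instr(paths.cur_path, bus_routes.dst) = 0   -- don't construct loops
    )
    select * from paths order by cur_path
""").fetchall()
for row in rows:
    print(row)
```

The loop check is a substring test. It would wrongly treat a city as already visited if the city's name were contained in the name of another city on the path. For the four cities here, that cannot happen.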
  46. How recursion stops
     ● Tree or Directed Acyclic Graph walking:
        ● Execution is guaranteed to stop
     ● Computing transitive closure:
        ● Use UNION
     ● Computing “Paths” over a graph with loops:
        ● Put a condition into WHERE to stop loops/growth
     ● Safety measure: @@max_recursive_iterations
        ● Like in SQL Server
        ● MySQL 8.0: @@cte_max_recursion_depth
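A stop condition in WHERE can be sketched on SQLite with a hop counter. The walk CTE and its hops column are hypothetical and do not appear on the slides. Without `walk.hops < 3`, this query would never finish. The hops column makes every row distinct, so neither UNION nor UNION ALL could cut the New York ↔ Boston cycle.

```python
import sqlite3
from collections import Counter

# bus_routes rows from slide 34; SQLite stands in for MariaDB.
conn = sqlite3.connect(":memory:")
conn.execute("create table bus_routes (origin text, dst text)")
conn.executemany("insert into bus_routes values (?, ?)", [
    ("New York", "Boston"), ("Boston", "New York"), ("New York", "Washington"),
    ("Washington", "Boston"), ("Washington", "Raleigh"),
])

# Hypothetical CTE "walk": every city reachable from New York, with its hop count.
rows = conn.execute("""
    with recursive walk (city, hops) as (
      select 'New York', 0
      union all
      select bus_routes.dst, walk.hops + 1
      from walk, bus_routes
      where walk.city = bus_routes.origin
        and walk.hops < 3                  -- stop condition: at most 3 hops
    )
    select city, hops from walk
""").fetchall()

per_hop = Counter(hops for (_, hops) in rows)
print(len(rows), dict(sorted(per_hop.items())))  # 9 {0: 1, 1: 2, 2: 3, 3: 3}
```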
  47. Plan (next: Recursive CTEs → (Non-)linear recursion)
  48. [Non-]linear recursion
         with recursive R as (
           select anchor_part
           union [all]
           select recursive_part from R, …
         )
         select …
     The SQL standard requires that recursion is linear:
     ● recursive_part must refer to R only once
        ● No self-joins
        ● Not from subqueries
        ● Not from the inner side of an outer join
        ● ...
  49. Linearity of SELECT statements
     [Figure: diagram relating R (x, +y) to the SELECT output (a, +b)]
  50. Linear recursion
         with recursive R as (
           select anchor_part
           union [all]
           select recursive_part from R, …
         )
         select …
     ● New data is generated only by the “wave-front” elements (the rows added in the previous step)
     ● The contents of R are always growing
  51. Plan (next: Recursive CTEs → Mutual recursion)
  52. Mutual recursion
         with recursive C1 as (
           select … from anchor_table
           union
           select … from C2
         ),
         C2 as (
           select … from C1
         )
         select ...
     ● Multiple CTEs refer to each other
     ● Useful for “bi-partite” graphs
     ● MariaDB supports it
     ● No other database does
  53. Modules and objects
     [Figure: module M1 consumes objects v3 and v9 and produces v4]
     ● A module consumes objects and produces other objects
     ● Tables:
        ● objects (v): v3, v9, v4, ...
        ● modules (m): m1, ...
        ● module_arguments (m, v): (m1, v3), (m1, v9), ...
        ● module_results (m, v): (m1, v4), ...
  54. Modules and objects
     [Figure: modules M1, M2 and M3 linked to objects v3, v9, v4, v7, v1, v6 and v10]
     ● A module consumes objects and produces other objects
  55. Modules and objects
     ● What objects can be produced from objects v3, v9 and v7?
     [Figure: the same modules M1, M2, M3 and their objects]
  56. Query part #1: objects produced from modules
         with recursive reached_objects as (
           select v, "init" from objects
           where v in ('v3','v7','v9')
           union
           select module_results.v, module_results.m
           from module_results, applied_modules
           where module_results.m = applied_modules.m
         ),
  57. Query part #2: modules ready to be applied
         applied_modules as (
           select * from modules where 1=0
           union
           select modules.m from modules
           where not exists (select * from module_arguments
                             where module_arguments.m = modules.m
                               and module_arguments.v not in
                                   (select v from reached_objects))
         )
         select * from reached_objects;
  58. Query result
         +------+------+
         | v    | init |
         +------+------+
         | v3   | init |
         | v7   | init |
         | v9   | init |
         | v4   | m1   |
         | v1   | m2   |
         | v6   | m2   |
         | v10  | m3   |
         +------+------+
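According to the slides, only MariaDB supports mutual recursion, so this sketch evaluates the two CTEs as a plain-Python fixpoint instead of running SQL. The module data is partly HYPOTHETICAL. Slide 53 gives only m1's inputs (v3, v9) and output (v4). The arguments of m2 and m3 are invented so that the result matches the table on slide 58.

```python
# HYPOTHETICAL: only m1's arguments (v3, v9) appear on slide 53; the arguments of
# m2 and m3 are invented so that the result matches slide 58.
module_arguments = {"m1": {"v3", "v9"}, "m2": {"v4", "v7"}, "m3": {"v6"}}
# Outputs per module, as listed in the query result on slide 58.
module_results = {"m1": {"v4"}, "m2": {"v1", "v6"}, "m3": {"v10"}}


def reached_objects(initial):
    """Fixpoint of the two mutually recursive CTEs; maps object -> producer."""
    reached = {v: "init" for v in initial}      # anchor of reached_objects
    while True:
        # applied_modules: modules with no argument missing from reached_objects
        applied = {m for m, args in module_arguments.items()
                   if args.issubset(reached)}
        # recursive part of reached_objects: results of the applied modules
        new = {(v, m) for m in applied for v in module_results[m]
               if v not in reached}
        if not new:                              # neither CTE can grow any more
            return reached
        for v, m in sorted(new):
            reached.setdefault(v, m)


result = reached_objects(["v3", "v7", "v9"])
print(result)
# {'v3': 'init', 'v7': 'init', 'v9': 'init', 'v4': 'm1', 'v1': 'm2', 'v6': 'm2', 'v10': 'm3'}
```

Each round recomputes applied_modules from the current reached_objects and then extends reached_objects. The NOT IN test on reached_objects is what makes this recursion non-linear.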
  59. Conclusions
     ● MariaDB 10.2 has Common Table Expressions
        ● Both recursive and non-recursive CTEs are supported
     ● MariaDB 10.3 adds EXCEPT and INTERSECT
        ● Recursive references in EXCEPT and INTERSECT are allowed
     ● Non-recursive CTEs:
        ● “Query-local VIEWs”
        ● Competitive set of query optimizations
     ● Recursive CTEs:
        ● Useful for tree/graph-walking queries
        ● Mutual and non-linear recursion are supported
  60. Thanks! Q&A
