SQL Outer Joins for Fun and Profit

Many questions on database newsgroups and forums can be answered by using outer joins. Outer joins are part of standard SQL, and every major RDBMS brand supports them in some form. Many programmers are expected to use SQL in their work, but few know how to use outer joins effectively.
Learn to use this powerful feature of SQL, increase your employability, and amaze your friends!

Karwin will explain outer joins, show examples, and demonstrate a Sudoku puzzle solver implemented in a single SQL query.

  1. 1. SQL Outer Joins for Fun and Profit
     Bill Karwin, Proprietor/Chief Architect
     bill@karwin.com
     www.karwin.com
  2. 2. Introduction
     - Overview of SQL joins: inner and outer
     - Applications of outer joins
     - Solving Sudoku puzzles with outer joins
     (OSCON 2006, 2006-07-27)
  3. 3. Joins in SQL
     - Joins: the SQL way to express relations between data in tables.
     - A join forms a new row in the result set from matching rows in each joined table.
     - Joins are as fundamental to using a relational database as a loop is in other programming languages.
  4. 4. Inner joins refresher
     - ANSI SQL-89 syntax:
         SELECT ...
         FROM products p, orders o
         WHERE p.product_id = o.product_id;
     - ANSI SQL-92 syntax:
         SELECT ...
         FROM products p JOIN orders o
           ON p.product_id = o.product_id;
  5. 5. Inner join example
     Products        Orders
     product_id      product_id  order_id
     Abc             Abc         10
     Def             Abc         11
     Efg             Def         9
  6. 6. Inner join example
     Query result set:
     product_id  Product attributes  order_id  Order attributes
     Abc         $10.00              10        2006/2/1
     Abc         $10.00              11        2006/3/10
     Def         $5.00               9         2005/5/2
     Query:
       SELECT ...
       FROM products p JOIN orders o
         ON p.product_id = o.product_id;
  7. 7. Outer joins
     - Returns all rows in one table, but only matching rows in the joined table.
     - Returns NULL where no row matches.
     - Not supported in SQL-89.
     - SQL-92 syntax:
         SELECT ...
         FROM products p LEFT OUTER JOIN orders o
           ON p.product_id = o.product_id;
  8. 8. Types of outer joins
     - LEFT OUTER JOIN: returns all rows from the table on the left. Returns NULLs in columns of the right table where no row matches.
     - RIGHT OUTER JOIN: returns all rows from the table on the right. Returns NULLs in columns of the left table where no row matches.
     - FULL OUTER JOIN: returns all rows from both tables. Returns NULLs in columns of each where no row matches.
  9. 9. Support for OUTER JOIN
     Open-source RDBMS products compared: Hypersonic HSQLDB, PostgreSQL, SQLite, Ingres R3, MySQL, Firebird, Apache Derby.
     Join types compared: LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN.
     (Support matrix: the per-product check marks are not recoverable from this transcript.)
  10. 10. Outer join example
      Products        Orders
      product_id      product_id  order_id
      Abc             Abc         10
      Def             Abc         11
      Efg             Def         9
      Efg has no matching order, so it pairs with NULL, NULL.
  11. 11. Outer join example
      Query result set:
      product_id  Product attributes  order_id  Order attributes
      Abc         $10.00              10        2006/2/1
      Abc         $10.00              11        2006/3/10
      Def         $5.00               9         2005/5/2
      Efg         $17.00              NULL      NULL
      Query:
        SELECT ...
        FROM products p LEFT OUTER JOIN orders o
          ON p.product_id = o.product_id;
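The difference between the two result sets above can be reproduced with a small script. This is a minimal sketch, assuming Python's built-in `sqlite3` module as the database (the slides do not name one for this example); the `price` column and the ISO-formatted dates are illustrative choices, not part of the original slides.

```python
import sqlite3

# In-memory database holding the sample Products and Orders rows from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY, price REAL);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc', 10.00), ('Def', 5.00), ('Efg', 17.00);
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9,  'Def', '2005-05-02');
""")

# Inner join: only products that have at least one order appear.
inner_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p JOIN orders o ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

# Left outer join: every product appears; unmatched order columns are NULL (None).
outer_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id
""").fetchall()

print(inner_rows)
print(outer_rows)
```

The only difference between the two lists is the extra row for `Efg`, whose `order_id` comes back as `None`.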
  12. 12. So what?
      - Difference seems trivial and uninteresting.
      - SQL works with sets and relations.
      - Operations on sets combine in powerful ways (just like operations on numbers, strings, or booleans).
      (Diagrams: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN)
  13. 13. Solutions using outer joins
      - Extra join conditions
      - Subtotals per day
      - Localization
      - Mimic NOT IN (subquery)
      - Greatest row per group
      - Top three per group
      - Finding attributes in EAV (entity-attribute-value) tables
      - Sudoku puzzle solver
  14. 14. Extra join conditions
      - Problem: match only with orders created this year.
      - Put extra conditions on the outer-joined table (here, orders) into the ON clause. This applies the conditions before the join, so products without a qualifying order are still returned:
          SELECT ...
          FROM products p LEFT OUTER JOIN orders o
            ON p.product_id = o.product_id
            AND o.date >= '2006-01-01';
  15. 15. Extra join conditions
      Products        Orders
      product_id      product_id  order_id  date
      Abc             Abc         10        2006/2/1
      Def             Abc         11        2006/3/10
      Efg             Def         9         2005/5/2
      Def (whose only order predates 2006) and Efg (no orders) pair with NULL, NULL, NULL.
  16. 16. Extra join conditions
      Query result set:
      product_id  Product attributes  order_id  Order attributes
      Abc         $10.00              10        2006/2/1
      Abc         $10.00              11        2006/3/10
      Def         $5.00               NULL      NULL
      Efg         $17.00              NULL      NULL
      Query:
        SELECT ...
        FROM products p LEFT OUTER JOIN orders o
          ON p.product_id = o.product_id
          AND o.date >= '2006-01-01';
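Why the condition belongs in the ON clause becomes clearer when the same condition is moved to WHERE. The following is a minimal sketch, assuming Python's built-in `sqlite3` module and ISO-formatted date strings (both are assumptions made so that the example runs anywhere and the string comparison on dates is valid).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9,  'Def', '2005-05-02');
""")

# Condition in ON: it restricts which orders can match,
# but every product is still returned.
on_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id AND o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

# Condition in WHERE: it is evaluated after the join, where o.date is NULL
# for unmatched products, so those rows are filtered out.
where_rows = conn.execute("""
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id
""").fetchall()

print(on_rows)
print(where_rows)
```

With the condition in WHERE, the query behaves like an inner join: `Def` and `Efg` disappear from the result.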
  17. 17. Subtotals per day
      - Problem: show all days, and the subtotal of orders per day even when there are zero.
      - Requires an additional table containing all dates in the desired range.
          SELECT d.date, COUNT(o.order_id)
          FROM days d LEFT OUTER JOIN orders o
            ON o.date = d.date
          GROUP BY d.date;
  18. 18. Subtotals per day
      Days          Orders
      date          date        order_id
      2005/5/2      2005/5/2    9
      . . .
      2006/2/1      2006/2/1    10
      . . .
      2006/3/10     2006/3/10   11
      . . .
      Days with no matching order pair with NULL, NULL.
  19. 19. Subtotals per day
      Query result set:
      date        COUNT()
      2005/5/2    1
      . . .       0
      2006/2/1    1
      . . .       0
      2006/3/10   1
      . . .       0
      Query:
        SELECT d.date, COUNT(o.order_id)
        FROM days d LEFT OUTER JOIN orders o
          ON o.date = d.date
        GROUP BY d.date;
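The zero counts depend on counting a column from the outer-joined table rather than counting rows. A minimal sketch, assuming Python's built-in `sqlite3` module and a hypothetical three-day `days` range chosen for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (date TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, date TEXT);
    -- Hypothetical date range; a real days table would cover the whole report period.
    INSERT INTO days VALUES ('2006-01-31'), ('2006-02-01'), ('2006-02-02');
    INSERT INTO orders VALUES (9, '2005-05-02'), (10, '2006-02-01'), (11, '2006-03-10');
""")

# COUNT(o.order_id) skips NULLs, so days without orders report 0.
# COUNT(*) counts the NULL-extended row too, so it would report 1 for those days.
subtotals = conn.execute("""
    SELECT d.date, COUNT(o.order_id) AS orders_counted, COUNT(*) AS rows_counted
    FROM days d LEFT OUTER JOIN orders o ON o.date = d.date
    GROUP BY d.date
    ORDER BY d.date
""").fetchall()

for row in subtotals:
    print(row)
```

The third column is included only to show the pitfall: `COUNT(*)` never returns zero here, because the outer join always produces at least one row per day.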
  20. 20. Localization
      - Problem: show translated messages, or the default-language message if a translation is not available.
          SELECT en.message_id, COALESCE(sp.message, en.message)
          FROM messages AS sp RIGHT OUTER JOIN messages AS en
            ON sp.message_id = en.message_id
            AND sp.language = 'sp'
          WHERE en.language = 'en';
      - COALESCE() returns its first non-null argument.
      - The filter on en goes in the WHERE clause: en is the preserved side of the join, so a condition on it in the ON clause would not remove any en rows.
  21. 21. Localization
      messages
      message_id  language  message
      123         en        Thank you
      123         sp        Gracias
      456         en        Hello
      Message 456 has no 'sp' translation, so it pairs with NULL.
  22. 22. Localization
      Query result set:
      message_id  message
      123         Gracias
      456         Hello
      Query:
        SELECT en.message_id, COALESCE(sp.message, en.message)
        FROM messages AS sp RIGHT OUTER JOIN messages AS en
          ON sp.message_id = en.message_id
          AND sp.language = 'sp'
        WHERE en.language = 'en';
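The fallback behaviour can be checked with a short script. This is a sketch under a few assumptions: it uses Python's built-in `sqlite3` module, and it writes the join as a LEFT OUTER JOIN with the default-language table on the left (equivalent to the RIGHT OUTER JOIN form, and accepted by SQLite versions that lack RIGHT joins). The second query shows what happens when the default-language filter sits in the ON clause instead of WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language TEXT, message TEXT);
    INSERT INTO messages VALUES (123, 'en', 'Thank you'),
                                (123, 'sp', 'Gracias'),
                                (456, 'en', 'Hello');
""")

# Default-language rows are preserved; the translation is joined in when present.
translated = conn.execute("""
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id AND sp.language = 'sp'
    WHERE en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

# Pitfall: a condition on the preserved table inside ON does not filter it,
# so the 'sp' row itself also survives as an unmatched "en" row.
filter_in_on = conn.execute("""
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id AND sp.language = 'sp'
      AND en.language = 'en'
    ORDER BY en.message_id
""").fetchall()

print(translated)
print(len(filter_in_on))
```

The first query returns one row per message; the second returns an extra row for message 123.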
  23. 23. Mimic NOT IN subquery
      - Problem: find rows for which there is no match.
      - Often implemented using NOT IN (subquery):
          SELECT ...
          FROM products p
          WHERE p.product_id NOT IN
            (SELECT o.product_id FROM orders o);
  24. 24. Mimic NOT IN subquery
      - Can also be implemented using an outer join:
          SELECT ...
          FROM products p LEFT OUTER JOIN orders o
            ON p.product_id = o.product_id
          WHERE o.product_id IS NULL;
      - Useful when subqueries are not supported (e.g. MySQL 4.0).
  25. 25. Mimic NOT IN subquery
      Products        Orders
      product_id      product_id  order_id
      Abc             Abc         10
      Def             Abc         11
      Efg             Def         9
      Efg has no matching order, so it pairs with NULL, NULL.
  26. 26. Mimic NOT IN subquery
      Query result set:
      product_id  Product attributes  order_id  Order attributes
      Efg         $17.00              NULL      NULL
      Query:
        SELECT ...
        FROM products p LEFT OUTER JOIN orders o
          ON p.product_id = o.product_id
        WHERE o.product_id IS NULL;
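Both formulations can be run side by side to confirm they agree on the sample data. A minimal sketch, assuming Python's built-in `sqlite3` module (which supports subqueries, so both forms work):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def');
""")

# Subquery form: products whose id is not among the ordered product ids.
not_in_rows = conn.execute("""
    SELECT p.product_id
    FROM products p
    WHERE p.product_id NOT IN (SELECT o.product_id FROM orders o)
""").fetchall()

# Outer-join form: keep only the rows where the join found no order.
outer_join_rows = conn.execute("""
    SELECT p.product_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.product_id IS NULL
""").fetchall()

print(not_in_rows, outer_join_rows)
```

Testing `o.product_id IS NULL` works because the join column can only be NULL in the result when no order matched.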
  27. 27. Greatest row per group
      - Problem: find the row in each group with the greatest value in one column.
          SELECT ...
          FROM products p
            JOIN orders o1
              ON p.product_id = o1.product_id
            LEFT OUTER JOIN orders o2
              ON p.product_id = o2.product_id
              AND o1.date < o2.date
          WHERE o2.product_id IS NULL;
      - I.e., show the rows for which no other row exists with a greater date and the same product_id.
  28. 28. Greatest row per group
      Orders o1                          Products      Orders o2
      product_id  order_id  date         product_id    product_id  order_id  date
      Abc         10        2006/2/1     Abc           Abc         10        2006/2/1
      Abc         11        2006/3/10    Def           Abc         11        2006/3/10
      Def         9         2005/5/2     Efg           Def         9         2005/5/2
      An o1 row with no later o2 row for the same product pairs with NULL.
  29. 29. Greatest row per group
      Query result set:
      product_id  Product attributes  order_id  Order attributes
      Abc         $10.00              11        2006/3/10
      Def         $5.00               9         2005/5/2
      Query:
        SELECT ...
        FROM products p
          JOIN orders o1
            ON p.product_id = o1.product_id
          LEFT OUTER JOIN orders o2
            ON p.product_id = o2.product_id
            AND o1.date < o2.date
        WHERE o2.product_id IS NULL;
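The self-join can be exercised on the sample orders to confirm that it keeps only the latest order per product. A minimal sketch, assuming Python's built-in `sqlite3` module and ISO-formatted date strings so that `<` compares dates correctly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9,  'Def', '2005-05-02');
""")

# o2 looks for a later order of the same product; if none exists,
# the o2 columns are NULL and o1 is the most recent order.
latest_orders = conn.execute("""
    SELECT p.product_id, o1.order_id, o1.date
    FROM products p
      JOIN orders o1
        ON p.product_id = o1.product_id
      LEFT OUTER JOIN orders o2
        ON p.product_id = o2.product_id AND o1.date < o2.date
    WHERE o2.product_id IS NULL
    ORDER BY p.product_id
""").fetchall()

print(latest_orders)
```

`Efg` is absent because the first join is an inner join. If two orders of one product shared the same latest date, both would be returned.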
  30. 30. Top three per group
      - Problem: list the largest three cities per US state.
          SELECT c.state, c.city_name, c.population
          FROM cities AS c
            LEFT JOIN cities AS c2
              ON c.state = c2.state
              AND c.population <= c2.population
          GROUP BY c.state, c.city_name, c.population
          HAVING COUNT(*) <= 3
          ORDER BY c.state, c.population DESC;
      - I.e., show the cities for which the number of cities in the same state with an equal or greater population (counting the city itself) is less than or equal to three.
  31. 31. Top three per group
      Cities c                                Cities c2
      state  city_name      population       state  city_name      population
      CA     Los Angeles    3485K            CA     Los Angeles    3485K
      CA     San Diego      1110K            CA     San Diego      1110K
      CA     San Jose       782K             CA     San Jose       782K
      CA     San Francisco  724K             CA     San Francisco  724K
  32. 32. Top three per group
      Query result set:
      state  city_name    population
      CA     Los Angeles  3485K
      CA     San Diego    1110K
      CA     San Jose     782K
      Query:
        SELECT c.state, c.city_name, c.population
        FROM cities AS c
          LEFT JOIN cities AS c2
            ON c.state = c2.state
            AND c.population <= c2.population
        GROUP BY c.state, c.city_name, c.population
        HAVING COUNT(*) <= 3
        ORDER BY c.state, c.population DESC;
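The ranking logic can be verified on the four California cities shown above. A minimal sketch, assuming Python's built-in `sqlite3` module; the populations are stored as plain integers (3485K written as 3485000), which is an assumption about the underlying data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cities (state TEXT, city_name TEXT, population INTEGER);
    INSERT INTO cities VALUES ('CA', 'Los Angeles',   3485000),
                              ('CA', 'San Diego',     1110000),
                              ('CA', 'San Jose',       782000),
                              ('CA', 'San Francisco',  724000);
""")

# Each city joins to every city in its state that is at least as large,
# including itself, so COUNT(*) is the city's rank within the state.
top_three = conn.execute("""
    SELECT c.state, c.city_name, c.population
    FROM cities AS c
      LEFT JOIN cities AS c2
        ON c.state = c2.state AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC
""").fetchall()

for row in top_three:
    print(row)
```

San Francisco joins to four rows (the three larger cities and itself), so the HAVING clause drops it.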
  33. 33. Fetching EAV attributes
      - Entity-Attribute-Value table structure for dynamic attributes:
          - Not a normalized schema design
          - Lacks integrity enforcement
          - Not scalable
      - Nevertheless, EAV is used widely and is sometimes the only solution when attributes evolve quickly.
  34. 34. Fetching EAV attributes
      Products        Attributes
      product_id      product_id  attribute  value
      Abc             Abc         Media      DVD
      Def             Abc         Discs      2
      Efg             Abc         Format     Widescreen
                      Abc         Length     108 min.
  35. 35. Fetching EAV attributes
      - Need an outer join per attribute:
          SELECT p.product_id,
            media.value AS media,
            discs.value AS discs,
            format.value AS format,
            length.value AS length
          FROM products AS p
            LEFT OUTER JOIN attributes AS media
              ON p.product_id = media.product_id
              AND media.attribute = 'Media'
            LEFT OUTER JOIN attributes AS discs
              ON p.product_id = discs.product_id
              AND discs.attribute = 'Discs'
            LEFT OUTER JOIN attributes AS format
              ON p.product_id = format.product_id
              AND format.attribute = 'Format'
            LEFT OUTER JOIN attributes AS length
              ON p.product_id = length.product_id
              AND length.attribute = 'Length'
          WHERE p.product_id = 'Abc';
  36. 36. Fetching EAV attributes
      Query result set:
      product_id  media  discs  format      length
      Abc         DVD    2      Widescreen  108 min.
      Query:
        SELECT p.product_id,
          media.value AS media,
          discs.value AS discs,
          format.value AS format,
          length.value AS length
        FROM products AS p
          LEFT OUTER JOIN attributes AS media
            ON p.product_id = media.product_id
            AND media.attribute = 'Media'
          LEFT OUTER JOIN attributes AS discs
            ON p.product_id = discs.product_id
            AND discs.attribute = 'Discs'
          LEFT OUTER JOIN attributes AS format
            ON p.product_id = format.product_id
            AND format.attribute = 'Format'
          LEFT OUTER JOIN attributes AS length
            ON p.product_id = length.product_id
            AND length.attribute = 'Length'
        WHERE p.product_id = 'Abc';
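The reason each attribute needs an outer join rather than an inner join shows up for products that lack attributes. A minimal sketch, assuming Python's built-in `sqlite3` module; the table aliases `fmt` and `len` are renamed from the slides purely to avoid clashing with function names, and the helper `fetch_attributes` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE attributes (product_id TEXT, attribute TEXT, value TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO attributes VALUES ('Abc', 'Media',  'DVD'),
                                  ('Abc', 'Discs',  '2'),
                                  ('Abc', 'Format', 'Widescreen'),
                                  ('Abc', 'Length', '108 min.');
""")

PIVOT_QUERY = """
    SELECT p.product_id, media.value, discs.value, fmt.value, len.value
    FROM products AS p
      LEFT OUTER JOIN attributes AS media
        ON p.product_id = media.product_id AND media.attribute = 'Media'
      LEFT OUTER JOIN attributes AS discs
        ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
      LEFT OUTER JOIN attributes AS fmt
        ON p.product_id = fmt.product_id AND fmt.attribute = 'Format'
      LEFT OUTER JOIN attributes AS len
        ON p.product_id = len.product_id AND len.attribute = 'Length'
    WHERE p.product_id = ?
"""

def fetch_attributes(product_id):
    """Return one row with the product's attributes as columns (hypothetical helper)."""
    return conn.execute(PIVOT_QUERY, (product_id,)).fetchone()

abc_row = fetch_attributes("Abc")
def_row = fetch_attributes("Def")
print(abc_row)
print(def_row)
```

`Def` has no attribute rows at all, yet it still comes back as one row with `None` in every attribute column; with inner joins it would vanish from the result.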
  37. 37. Sudoku puzzles
      (Figure: a partially filled 9x9 Sudoku grid. The cell positions are not recoverable from this transcript.)
  38. 38. Sudoku schema
      -- `column` and `row` are backtick-quoted because COLUMN is a reserved word in MySQL
      CREATE TABLE one_to_nine (
        value INTEGER NOT NULL
      );
      INSERT INTO one_to_nine (value) VALUES
        (1), (2), (3), (4), (5), (6), (7), (8), (9);
      CREATE TABLE sudoku (
        `column` INTEGER NOT NULL,
        `row` INTEGER NOT NULL,
        value INTEGER NOT NULL
      );
      INSERT INTO sudoku (`column`, `row`, value) VALUES
        (6,1,3), (8,1,5), (9,1,1),
        (1,2,1), (2,2,4), (5,2,7), (7,2,6),
        (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
        (3,4,2), (4,4,3), (7,4,1), (9,4,7),
        (1,5,5), (2,5,3), (8,5,6),
        (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
        (2,7,5), (4,7,1), (6,7,2), (8,7,8),
        (1,8,6), (3,8,7), (4,8,5), (8,8,9),
        (6,9,7), (7,9,3), (8,9,1);
  39. 39. Showing puzzle state
      SELECT GROUP_CONCAT(COALESCE(s.value, '_')
        ORDER BY x.value SEPARATOR ' ') AS `Puzzle_state`
      FROM one_to_nine AS x
        INNER JOIN one_to_nine AS y
        LEFT OUTER JOIN sudoku AS s
          ON s.column = x.value
          AND s.row = y.value
      GROUP BY y.value;
      Output:
      +-------------------+
      | Puzzle_state      |
      +-------------------+
      | _ _ _ _ _ 3 _ 5 1 |
      | 1 4 _ _ 7 _ 6 _ _ |
      | _ 8 5 9 _ _ 4 _ 2 |
      | _ _ 2 3 _ _ 1 _ 7 |
      | 5 3 _ _ _ _ _ 6 _ |
      | 9 _ _ 8 6 4 _ 2 _ |
      | _ 5 _ 1 _ 2 _ 8 _ |
      | 6 _ 7 5 _ _ _ 9 _ |
      | _ _ _ _ _ 7 3 1 _ |
      +-------------------+
  40. 40. Revealing possible values
      SELECT x_loop.value AS x, y_loop.value AS y,
        GROUP_CONCAT(cell.value ORDER BY cell.value) AS possibilities
      -- Cartesian product: loop x over 1..9 columns, loop y over 1..9 rows,
      -- loop cell over 1..9 values
      FROM (one_to_nine AS x_loop
        INNER JOIN one_to_nine AS y_loop
        INNER JOIN one_to_nine AS cell)
      -- Is there any value already in the cell x, y?
      LEFT OUTER JOIN sudoku AS occupied
        ON (occupied.column = x_loop.value
        AND occupied.row = y_loop.value)
      -- Does the value appear in column x?
      LEFT OUTER JOIN sudoku AS num_in_col
        ON (num_in_col.column = x_loop.value
        AND num_in_col.value = cell.value)
      -- Does the value appear in row y?
      LEFT OUTER JOIN sudoku AS num_in_row
        ON (num_in_row.row = y_loop.value
        AND num_in_row.value = cell.value)
      -- Does the value appear in the sub-square containing x, y?
      LEFT OUTER JOIN sudoku AS num_in_box
        ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
        AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
        AND cell.value = num_in_box.value)
      -- Select for cases where all four outer joins find no matches
      WHERE COALESCE(occupied.value, num_in_col.value,
        num_in_row.value, num_in_box.value) IS NULL
      GROUP BY x_loop.value, y_loop.value;
  41. 41. Revealing singleton values
      SELECT x_loop.value AS x, y_loop.value AS y,
        cell.value AS possibilities
      FROM (one_to_nine AS x_loop
        INNER JOIN one_to_nine AS y_loop
        INNER JOIN one_to_nine AS cell)
      LEFT OUTER JOIN sudoku AS occupied
        ON (occupied.column = x_loop.value
        AND occupied.row = y_loop.value)
      LEFT OUTER JOIN sudoku AS num_in_col
        ON (num_in_col.column = x_loop.value
        AND num_in_col.value = cell.value)
      LEFT OUTER JOIN sudoku AS num_in_row
        ON (num_in_row.row = y_loop.value
        AND num_in_row.value = cell.value)
      LEFT OUTER JOIN sudoku AS num_in_box
        ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
        AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
        AND cell.value = num_in_box.value)
      WHERE COALESCE(occupied.value, num_in_col.value,
        num_in_row.value, num_in_box.value) IS NULL
      GROUP BY x_loop.value, y_loop.value
      -- Limit the groups only to those with one value remaining
      HAVING COUNT(*) = 1;
  42. 42. Updating the puzzle
      -- Insert these singletons back into the table, then we can try again
      -- (`column` and `row` are backtick-quoted because COLUMN is a reserved word in MySQL)
      INSERT INTO sudoku (`column`, `row`, value)
      SELECT x_loop.value AS x, y_loop.value AS y,
        cell.value AS possibilities
      FROM (one_to_nine AS x_loop
        INNER JOIN one_to_nine AS y_loop
        INNER JOIN one_to_nine AS cell)
      LEFT OUTER JOIN sudoku AS occupied
        ON (occupied.column = x_loop.value
        AND occupied.row = y_loop.value)
      LEFT OUTER JOIN sudoku AS num_in_col
        ON (num_in_col.column = x_loop.value
        AND num_in_col.value = cell.value)
      LEFT OUTER JOIN sudoku AS num_in_row
        ON (num_in_row.row = y_loop.value
        AND num_in_row.value = cell.value)
      LEFT OUTER JOIN sudoku AS num_in_box
        ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
        AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
        AND cell.value = num_in_box.value)
      WHERE COALESCE(occupied.value, num_in_col.value,
        num_in_row.value, num_in_box.value) IS NULL
      GROUP BY x_loop.value, y_loop.value
      HAVING COUNT(*) = 1;
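The "try again" step can be driven from a small loop that re-runs the insert until a pass adds nothing. This is a sketch under several assumptions, not the author's implementation: it uses Python's built-in `sqlite3` module rather than MySQL, renames the columns to `col` and `row_num` to sidestep keyword quoting, replaces `CEIL(n/3)` with integer division `(n - 1) / 3` (SQLite's math functions are optional), and uses `MIN(cell.value)` to pick the single remaining value in a portable way. The function names are hypothetical. Whether filling singletons alone completes a given puzzle depends on the puzzle.

```python
import sqlite3

# Givens from the schema slide, as (column, row, value).
GIVENS = [
    (6,1,3), (8,1,5), (9,1,1), (1,2,1), (2,2,4), (5,2,7), (7,2,6),
    (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2), (3,4,2), (4,4,3),
    (7,4,1), (9,4,7), (1,5,5), (2,5,3), (8,5,6), (1,6,9), (4,6,8),
    (5,6,6), (6,6,4), (8,6,2), (2,7,5), (4,7,1), (6,7,2), (8,7,8),
    (1,8,6), (3,8,7), (4,8,5), (8,8,9), (6,9,7), (7,9,3), (8,9,1),
]

# Same four outer joins as the slides; a cell/value pair survives the WHERE
# only if the cell is empty and the value is absent from its column, row, and box.
FILL_SINGLETONS = """
    INSERT INTO sudoku (col, row_num, value)
    SELECT x_loop.value, y_loop.value, MIN(cell.value)
    FROM one_to_nine AS x_loop
      CROSS JOIN one_to_nine AS y_loop
      CROSS JOIN one_to_nine AS cell
      LEFT OUTER JOIN sudoku AS occupied
        ON occupied.col = x_loop.value AND occupied.row_num = y_loop.value
      LEFT OUTER JOIN sudoku AS num_in_col
        ON num_in_col.col = x_loop.value AND num_in_col.value = cell.value
      LEFT OUTER JOIN sudoku AS num_in_row
        ON num_in_row.row_num = y_loop.value AND num_in_row.value = cell.value
      LEFT OUTER JOIN sudoku AS num_in_box
        ON (x_loop.value - 1) / 3 = (num_in_box.col - 1) / 3
       AND (y_loop.value - 1) / 3 = (num_in_box.row_num - 1) / 3
       AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1
"""

def count_filled(conn):
    return conn.execute("SELECT COUNT(*) FROM sudoku").fetchone()[0]

def fill_until_stuck(conn):
    """Re-run the singleton insert until no new cell is filled; return filled count."""
    filled = count_filled(conn)
    while True:
        conn.execute(FILL_SINGLETONS)
        now = count_filled(conn)
        if now == filled:
            return filled
        filled = now

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE one_to_nine (value INTEGER NOT NULL)")
conn.execute("CREATE TABLE sudoku (col INTEGER NOT NULL, row_num INTEGER NOT NULL, value INTEGER NOT NULL)")
conn.executemany("INSERT INTO one_to_nine VALUES (?)", [(n,) for n in range(1, 10)])
conn.executemany("INSERT INTO sudoku VALUES (?, ?, ?)", GIVENS)

filled_cells = fill_until_stuck(conn)

# Render the grid in Python; '_' marks cells that are still empty.
cells = {(c, r): v for c, r, v in conn.execute("SELECT col, row_num, value FROM sudoku")}
for r in range(1, 10):
    print(" ".join(str(cells.get((c, r), "_")) for c in range(1, 10)))
print(filled_cells, "cells filled")
```

The loop always terminates: occupied cells are excluded by the first outer join, so each pass either fills at least one new cell or changes nothing.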
  43. 43. Finish
      - Outer joins are an indispensable part of SQL programming.
      Thank you!
