SQL Joins


An explanation of the difference between inner and outer joins that should make more sense than the average "Learn SQL" book does.



  1. SQL Joins. Paul W. Harkins, 2009-02-01
  2. Purpose
     One of the most common interview questions for IT jobs is “Can you explain the difference between an inner join and an outer join?” The question is common for a simple reason: knowing the difference, and knowing which join to use for a particular purpose, is the key to writing the complex SQL queries needed to capture and analyze the data used in most large applications and many small ones. This knowledge clearly separates the employees who need no supervision from those who will constantly be asking for assistance.

     The purpose of this document is not just to give you a cheat sheet so you can get through an interview sounding like you actually know SQL, but to give you an understanding of joins that will allow you to excel in your current position even if you are not selected for a new one.

     That said, this is not an SQL training course. This guide is intended to fill a gap in popular SQL learning materials and courses. Nearly all instructional materials and instructors at least attempt to explain the difference between an inner and an outer join, but few do so in a way that students can understand if they do not already know the difference. My intention is to explain the difference in a way that the rest of us can understand. To this end, I will avoid “big words” and discussion of the underlying mathematical theory where possible.

     Assumptions
     Throughout this document we will assume the following:
       • You already understand SQL well enough to write single-table queries.
       • You are at least vaguely familiar with the purpose of the WHERE clause.
       • You have access to a relational database that supports SQL queries.
       • There is already data in your database.
       • You already have access to a query execution tool (even if the tool is just the command line interface to the database management system) and you know how to execute queries with it.
     If any of the assumptions above are incorrect, the information below may not be of much value to you. Most SQL manuals are clear enough to give the average person the required knowledge, and most of them also include either a database with data in it or instructions to create one. Please consult your manual and/or your database administrator for further information.
  3. Sample Tables
     The tables in the diagram below show a common database structure. Many of the fields that would normally be in these tables have been left out because they are not needed to explain the concepts. Below the diagram is a listing of the data in each of the tables.

     In the structure below, a customer can have many orders. An order can have many products, and a product can be on many orders. There is one customer in the database who does not have any orders. There are also several orders that do not have products.

     For the purpose of this exercise, I turned off referential integrity checks, which allowed me to create orders that did not have customers. It might seem like this should never happen in a production system, but quite often an initial data load is executed without referential integrity checks because the load can run much faster. We would all hope that all required data would be included, but sometimes the decision is made to load only part of the legacy data. Even when all data is expected to be loaded, someone has to write the SQL to verify that it was loaded correctly. I will therefore consider this a valid example even though I had to disable referential integrity to create it.
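For readers who want to build the sample tables themselves, the following is a minimal sketch of table definitions consistent with the description above. The table and column names are the ones used by the queries in this document; the data types, key definitions, and column lengths are assumptions, and the real sample tables may have differed.

```sql
-- Hypothetical DDL. Names come from the queries in this document;
-- every data type and constraint here is an assumption.
CREATE TABLE CUST (
    CUST_ID   INTEGER      NOT NULL PRIMARY KEY,
    CUST_NM   VARCHAR(50)
);

CREATE TABLE ORD (
    ORD_ID    INTEGER      NOT NULL PRIMARY KEY,
    CUST_ID   INTEGER,     -- deliberately no foreign key, so that
                           -- orders without customers can exist
    ORD_TS    TIMESTAMP,
    SHIP_TS   TIMESTAMP
);

CREATE TABLE PRD (
    PRD_ID    INTEGER      NOT NULL PRIMARY KEY,
    PRD_DESC  VARCHAR(100)
);

-- Link table: one row per product on an order.
CREATE TABLE ORD_PRD (
    ORD_ID    INTEGER      NOT NULL,
    PRD_ID    INTEGER      NOT NULL,
    PRIMARY KEY (ORD_ID, PRD_ID)
);
```

Leaving out the foreign key constraints is one simple way to imitate a load performed with referential integrity checks turned off.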
  4. Inner Join
     An inner join will output only the results from each table where the join conditions are met. It will not output any rows from either table where the conditions are not met. In the examples below, the join condition is CUST.CUST_ID = ORD.CUST_ID. Only rows where the CUST_ID field matches in both tables will be displayed in the output.

     There are two basic ways to implement an inner join. Most of you are probably already familiar with one of them.

     1) You can put your join conditions in the WHERE clause:

          SELECT *
          FROM CUST
             , ORD
          WHERE CUST.CUST_ID = ORD.CUST_ID
          ;

     2) You can use INNER JOIN in the FROM clause (note: in most relational database management systems, JOIN is equivalent to INNER JOIN):

          SELECT *
          FROM CUST
          INNER JOIN ORD
             ON CUST.CUST_ID = ORD.CUST_ID
          ;
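As a small variation on the inner join examples above, the sketch below names the output columns instead of using SELECT * and gives each table a short alias. The aliases C and O are illustrative choices, not part of the original examples; the columns CUST_NM, ORD_ID, and ORD_TS are the ones used by the multi-join query later in this document.

```sql
-- Inner join with an explicit column list.
-- C and O are hypothetical aliases chosen for this sketch.
SELECT C.CUST_ID
     , C.CUST_NM
     , O.ORD_ID
     , O.ORD_TS
FROM CUST C
INNER JOIN ORD O
   ON C.CUST_ID = O.CUST_ID
ORDER BY C.CUST_ID
       , O.ORD_ID
;
```

Listing the columns avoids showing CUST_ID twice (once from each table), which is what SELECT * does.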
  5. Left Outer Join
     Outer joins are used to display the results from one table where there are no corresponding results in another table (where the join conditions are not met). If you want to know which of your customers have not ordered anything, you need an outer join: if a customer has not ordered anything, there will be no entry in the ORD table to match the customer, and an inner join would not output that customer at all.

     Much of the confusion around outer joins is caused by the use of LEFT and RIGHT. Look at the query below:

          SELECT * FROM CUST LEFT OUTER JOIN ORD ON CUST.CUST_ID = ORD.CUST_ID;

     With the query written on a single line, the first table (CUST) is to the left of the second table (ORD). A left outer join will output all of the rows in the first (or left) table and only the rows from the second (or right) table that match the join conditions. This means that where the join conditions are not met, the output for the second (or right) table will be filled with nulls (some query tools will actually display <NULL>, and others will simply display a blank field).

     In the result set below, you can see that Nick has not ordered anything. If you go back and look at the result sets for the inner join examples, you will see that Nick was not displayed at all because there was no matching entry with his ID in the ORD table.
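The question raised above, which customers have not ordered anything, can be answered by adding a WHERE clause to the left outer join so that only the null-filled rows remain. This is a sketch that assumes ORD.ORD_ID is never null in a real order row, which is typical for a key column but is not stated in the original.

```sql
-- Customers with no orders: keep only the rows where the
-- right-hand (ORD) side was filled with nulls.
-- Assumes ORD.ORD_ID is never null for an actual order.
SELECT CUST.CUST_ID
     , CUST.CUST_NM
FROM CUST
LEFT OUTER JOIN ORD
   ON CUST.CUST_ID = ORD.CUST_ID
WHERE ORD.ORD_ID IS NULL
;
```

With the sample data described in this document, where Nick is the one customer without orders, this should list only Nick.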
  6. Right Outer Join
     Right outer joins are used for the same purpose as left outer joins, but they are much less common. The reason is that in most cases where you might want a right outer join, you can simply swap the order of the tables in the query and use a left outer join instead.

     If we take the query from the left outer join example above and change it to a right outer join without changing the table order, we will see all of the orders that have customers and all of the orders that do not have customers. The same query with a left outer join showed us all of the customers that have orders and all of the customers that do not have orders.

          SELECT * FROM CUST RIGHT OUTER JOIN ORD ON CUST.CUST_ID = ORD.CUST_ID;

     Once again, the non-matching results will be displayed as null. Notice in the output below that the null values are on the left, where they were previously on the right when we used a left outer join. Since we did not specify the order in which to output the fields (we used SELECT *), the fields from the left table are displayed on the left and the fields from the right table are displayed on the right.
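The table swap mentioned above can be sketched as follows. This query should return the same rows as the right outer join, written as a left outer join with ORD listed first.

```sql
-- Same rows as CUST RIGHT OUTER JOIN ORD, expressed as a left
-- outer join by putting ORD first.
SELECT *
FROM ORD
LEFT OUTER JOIN CUST
   ON CUST.CUST_ID = ORD.CUST_ID
;
```

Because of SELECT *, the ORD fields now appear on the left of the output and the CUST fields (including any nulls) on the right. Listing the columns explicitly makes the column order independent of the table order.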
  7. Full Outer Join
     Since I do not currently have access to a DB2 database that I can create the sample tables in and MySQL does not support full outer joins, I will describe the concept and provide simulated output below.

     Full outer joins are used where we need all rows from both the left and right tables but some rows in each table do not have corresponding entries in the other table. In our example, we will be looking at all customers and all orders. Where a customer matches an order, we want to display the results on one line. Where a customer does not have any orders, we want to display the customer and some null fields where the order should be. Where an order does not have a customer assigned to it, we want to display the order and some null fields where the customer should be.

          SELECT * FROM CUST FULL OUTER JOIN ORD ON CUST.CUST_ID = ORD.CUST_ID;

     Full outer joins are rare, but in any situation where one is needed, the full outer join is much less complex than the alternatives.
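To illustrate what one of those alternatives looks like, here is a common workaround for databases without FULL OUTER JOIN, such as MySQL: a left outer join combined with the unmatched rows of a right outer join. This is a sketch, not the author's method. The output alias ORD_CUST_ID is a hypothetical name added only to keep the two CUST_ID columns apart.

```sql
-- Emulated full outer join.
-- Part 1: every customer, with matching orders or nulls.
SELECT CUST.CUST_ID
     , CUST.CUST_NM
     , ORD.ORD_ID
     , ORD.CUST_ID AS ORD_CUST_ID   -- hypothetical alias
FROM CUST
LEFT OUTER JOIN ORD
   ON CUST.CUST_ID = ORD.CUST_ID
UNION ALL
-- Part 2: only the orders that matched no customer, so rows
-- already returned by part 1 are not repeated.
SELECT CUST.CUST_ID
     , CUST.CUST_NM
     , ORD.ORD_ID
     , ORD.CUST_ID
FROM CUST
RIGHT OUTER JOIN ORD
   ON CUST.CUST_ID = ORD.CUST_ID
WHERE CUST.CUST_ID IS NULL
;
```

The WHERE clause in the second part is what prevents matched rows from appearing twice. Compared with the single FULL OUTER JOIN statement, the extra length shows why the full outer join is the simpler choice where it is available.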
  8. Using Multiple Joins
     Looking through the dataset we started with, you might have noticed that not only do we have customers without orders and orders without customers, but we also have orders with no products. The customers without orders might be explained by a legacy data conversion that did not include outdated orders, or by an initial contact with a customer who has not yet decided whether to order anything at all. The orders without customers and the orders with no products on them, however, probably indicate problems with the software that created the database entries (whether that software is the user interface or a data conversion utility, the problem remains).

     Since solving problems is what we do best in IT (at least if we want to retain our jobs after someone asks us what we do all day), we might want to document the missing data so that the developer can look at the source code and correct it, and so that we can retroactively correct the data if possible.

     The query below will display all of the orders with their associated customers (or a null value if there is no customer) and the associated products on each order (or a null value if there are no products).

          SELECT CUST.CUST_NM
               , ORD.ORD_TS
               , ORD.SHIP_TS
               , ORD.ORD_ID
               , PRD.PRD_DESC
          FROM ORD
          LEFT OUTER JOIN CUST
             ON CUST.CUST_ID = ORD.CUST_ID
          LEFT OUTER JOIN ORD_PRD
             ON ORD.ORD_ID = ORD_PRD.ORD_ID
          LEFT OUTER JOIN PRD
             ON ORD_PRD.PRD_ID = PRD.PRD_ID
          ;
  9. The same query, modified to display only the entries where the customer or the product is null, is actually far more useful because you do not have to sort through all of the valid data to locate the invalid data:

          SELECT CUST.CUST_NM
               , ORD.ORD_TS
               , ORD.SHIP_TS
               , ORD.ORD_ID
               , PRD.PRD_DESC
          FROM ORD
          LEFT OUTER JOIN CUST
             ON CUST.CUST_ID = ORD.CUST_ID
          LEFT OUTER JOIN ORD_PRD
             ON ORD.ORD_ID = ORD_PRD.ORD_ID
          LEFT OUTER JOIN PRD
             ON ORD_PRD.PRD_ID = PRD.PRD_ID
          WHERE PRD.PRD_ID IS NULL
             OR CUST.CUST_ID IS NULL
          ;
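When documenting the missing data for a developer, it can help to state which problem each row has. The sketch below adds a label column to the filtered query using a CASE expression. The column name PROBLEM and the label texts are hypothetical choices made for this illustration.

```sql
-- Label each problem row. PROBLEM and its values are
-- hypothetical names chosen for this sketch.
SELECT ORD.ORD_ID
     , CUST.CUST_NM
     , PRD.PRD_DESC
     , CASE
          WHEN CUST.CUST_ID IS NULL AND PRD.PRD_ID IS NULL
             THEN 'no customer, no product'
          WHEN CUST.CUST_ID IS NULL
             THEN 'no customer'
          ELSE 'no product'
       END AS PROBLEM
FROM ORD
LEFT OUTER JOIN CUST
   ON CUST.CUST_ID = ORD.CUST_ID
LEFT OUTER JOIN ORD_PRD
   ON ORD.ORD_ID = ORD_PRD.ORD_ID
LEFT OUTER JOIN PRD
   ON ORD_PRD.PRD_ID = PRD.PRD_ID
WHERE PRD.PRD_ID IS NULL
   OR CUST.CUST_ID IS NULL
ORDER BY ORD.ORD_ID
;
```

The ELSE branch is safe because the WHERE clause only lets through rows where at least one of the two IDs is null. An order that has no customer but several products will appear once per product.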
  10. Exercises
      1. Modify the query with multiple joins to also include customers that do not have orders.
      2. Write a query to return all of the products that have never been ordered.
      3. Write a query to return only orders that have neither customers nor products associated with them.
      4. Start applying this knowledge at work.