Advanced SQL

Transcript

  • 1. Advanced SQL Programming Mark Holm Centerfield Technology
  • 2. Goals
    • Introduce some useful advanced SQL programming techniques
    • Show you how to let the database do more work to reduce programming effort
    • Go over some basic techniques and tips to improve performance
  • 3. Notes
    • V4R3 and higher syntax used in examples
    • Examples show only a small subset of what can be done!
  • 4. Agenda
    • Joining files - techniques, do’s and don’ts
    • Query within a query - Subqueries
    • Stacking data - Unions
    • Simplifying data with Views
    • Referential Integrity and constraints
    • Performance, performance, performance
  • 5. Joining files
    • Joins are used to relate data from different tables
    • Data can be retrieved with one “open file” rather than many
    • The concept is the same as a join logical file, but without an associated permanent object (unless the join is defined in an SQL view)
  • 6. Join types
    • Inner Join
      • Used to find related data
    • Left Outer (or simply Outer) Join
      • Used to find related data and ‘orphaned’ rows
    • Exception Join
      • Used to only find ‘orphaned’ rows
    • Cross Join
      • Join all rows to all rows
  • 7. Sample tables: Employee table and Department table
  • 8. Inner Join
    • Method #1 - Using the WHERE Clause
    SELECT LastName, Division FROM Employee, Department WHERE Employee.Dept = Department.Dept
    • Method #2 - Using the JOIN Clause
    SELECT LastName, Division FROM Employee INNER JOIN Department ON Employee.Dept = Department.Dept
    • NOTE: Method #2 is useful if you need to influence the order in which the tables are joined for performance reasons. This only works on releases prior to V4R4.
  • 9. Results
    • Return list of employees that are in a valid department.
    • Employee ‘Smith’ is not returned because she is not in a department listed in the ‘Department’ table
    Result table
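The two inner join methods can be compared side by side. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for DB2/400, and the table contents are hypothetical because the sample tables are not reproduced in this transcript.

```python
import sqlite3

def inner_join_both_ways():
    """Run the WHERE-clause and JOIN-clause forms; return both row lists."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data; 'Smith' points at a department that does not exist.
    con.executescript("""
        CREATE TABLE Department (Dept INTEGER PRIMARY KEY, Division TEXT);
        CREATE TABLE Employee (LastName TEXT, Dept INTEGER);
        INSERT INTO Department VALUES (10, 'Sales'), (20, 'Support');
        INSERT INTO Employee VALUES ('Jones', 10), ('Lee', 20), ('Smith', 99);
    """)
    where_form = con.execute(
        "SELECT LastName, Division FROM Employee, Department "
        "WHERE Employee.Dept = Department.Dept ORDER BY LastName").fetchall()
    join_form = con.execute(
        "SELECT LastName, Division FROM Employee INNER JOIN Department "
        "ON Employee.Dept = Department.Dept ORDER BY LastName").fetchall()
    con.close()
    return where_form, join_form

where_form, join_form = inner_join_both_ways()
print(where_form)  # 'Smith' is absent from both results
print(join_form)
```

Both forms return the same rows here; the difference between them is about syntax and, on some releases, join order, not about the result.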
  • 10. Left Outer Join SELECT LastName, Area FROM Employee LEFT OUTER JOIN Department ON Employee.Dept = Department.Dept
    • Must use Join Syntax
  • 11. Results
    • Return list of employees even if they are not in a valid department
    • Employee ‘Smith’ has a NULL Area because she could not be associated with a valid Dept
    Result table
  • 12. Exception Join SELECT LastName, Area FROM Employee EXCEPTION JOIN Department ON Employee.Dept = Department.Dept
    • Must use Join Syntax
  • 13. Results
    • Return list of employees only if they are NOT in a valid department
    • Employee ‘Smith’ is only one without a valid department
    Result table
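A rough way to see the difference between the outer and exception joins is to run them on a tiny data set. This sketch assumes SQLite (via Python's sqlite3) rather than DB2/400, and the data is hypothetical. SQLite has no EXCEPTION JOIN, so the second query emulates it with a LEFT OUTER JOIN that keeps only the unmatched rows.

```python
import sqlite3

def outer_and_exception_rows():
    """Return (left outer join rows, emulated exception join rows)."""
    con = sqlite3.connect(":memory:")
    # Hypothetical sample data; department 99 is not in Department.
    con.executescript("""
        CREATE TABLE Department (Dept INTEGER PRIMARY KEY, Area TEXT);
        CREATE TABLE Employee (LastName TEXT, Dept INTEGER);
        INSERT INTO Department VALUES (10, 'East'), (20, 'West');
        INSERT INTO Employee VALUES ('Jones', 10), ('Lee', 20), ('Smith', 99);
    """)
    left_outer = con.execute(
        "SELECT LastName, Area FROM Employee LEFT OUTER JOIN Department "
        "ON Employee.Dept = Department.Dept ORDER BY LastName").fetchall()
    # Emulation of EXCEPTION JOIN: keep only rows with no matching department.
    orphans = con.execute(
        "SELECT LastName, Area FROM Employee LEFT OUTER JOIN Department "
        "ON Employee.Dept = Department.Dept "
        "WHERE Department.Dept IS NULL ORDER BY LastName").fetchall()
    con.close()
    return left_outer, orphans

left_outer, orphans = outer_and_exception_rows()
print(left_outer)  # every employee; Smith's Area is None (NULL)
print(orphans)     # only Smith
```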
  • 14. WARNING!
    • The order in which tables are listed in the FROM clause is important
    • For OUTER and EXCEPTION joins, the database must join the tables in that order.
    • The result may be horrible performance…more on this topic later
  • 15. Observations
    • Joins provide one way to bury application logic in the database
    • Each join type has a purpose and can be used to not only get the data you want but identify “incomplete” information
    • With some exceptions, a properly written join should perform at least as well as doing the same matching in application code
  • 16. Subqueries
    • Subqueries are a powerful way to select only the data you need without separate statements.
    • Example: List employees making a higher than average salary
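  • 17. Subquery Example
    • Employees making more than the overall average salary:
    SELECT FNAME, LNAME FROM EMPLOYEE WHERE SALARY > ( SELECT AVG( SALARY ) FROM EMPLOYEE )
    • Employees making more than the average salary of employees named Jones:
    SELECT FNAME, LNAME FROM EMPLOYEE WHERE SALARY > ( SELECT AVG( SALARY ) FROM EMPLOYEE WHERE LNAME = 'JONES' )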
  • 18. Subqueries - types
    • Correlated
      • Inner select refers to part of the outer (parent) select (multiple evaluations)
    • Non-Correlated
      • Inner select does not relate to outer query (one evaluation)
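The two subquery types can be contrasted on one small table. This is a sketch under assumptions: SQLite through Python's sqlite3 stands in for DB2/400, and the DEPT column and the data are hypothetical. The first query compares each salary with the overall average (non-correlated); the second compares it with the average of the employee's own department (correlated).

```python
import sqlite3

def subquery_types():
    """Return (non-correlated result, correlated result) as lists of names."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE EMPLOYEE (FNAME TEXT, DEPT TEXT, SALARY INTEGER)")
    con.executemany(
        "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
        [("Ann", "A", 50000), ("Bob", "A", 70000),
         ("Cy", "B", 30000), ("Di", "B", 40000)])
    # Non-correlated: the inner select never mentions the outer row.
    non_correlated = [r[0] for r in con.execute(
        "SELECT FNAME FROM EMPLOYEE "
        "WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE) ORDER BY FNAME")]
    # Correlated: the inner select refers to the outer row's department (E.DEPT).
    correlated = [r[0] for r in con.execute(
        "SELECT FNAME FROM EMPLOYEE E "
        "WHERE SALARY > (SELECT AVG(SALARY) FROM EMPLOYEE I "
        "                WHERE I.DEPT = E.DEPT) ORDER BY FNAME")]
    con.close()
    return non_correlated, correlated

non_correlated, correlated = subquery_types()
print(non_correlated)  # above the overall average of 47500
print(correlated)      # above their own department's average
```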
  • 19. Subquery Tips 1
    • Subquery optimization (2nd statement will be faster)
      • SELECT name FROM employee WHERE salary > ALL (SELECT salary FROM salscale)
      • SELECT name FROM employee WHERE salary > (SELECT max(salary) FROM salscale)
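One caveat when swapping > ALL for > (SELECT max(...)): the two forms can disagree when the subquery returns no rows. The sketch below illustrates this with SQLite via Python's sqlite3, which is an assumption and not the deck's platform. SQLite has no ALL quantifier, so the first query uses a NOT EXISTS rewrite that behaves like > ALL when the salaries contain no NULLs. The data is hypothetical and no timing is measured.

```python
import sqlite3

def compare(salscale_rows):
    """Return (rows for the ALL-style form, rows for the MAX form)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (name TEXT, salary INTEGER)")
    con.execute("CREATE TABLE salscale (salary INTEGER)")
    con.executemany("INSERT INTO employee VALUES (?, ?)",
                    [("Ann", 50000), ("Bob", 90000)])
    con.executemany("INSERT INTO salscale VALUES (?)",
                    [(s,) for s in salscale_rows])
    # Stand-in for: salary > ALL (SELECT salary FROM salscale)
    all_form = con.execute(
        "SELECT name FROM employee WHERE NOT EXISTS "
        "(SELECT 1 FROM salscale WHERE employee.salary <= salscale.salary) "
        "ORDER BY name").fetchall()
    max_form = con.execute(
        "SELECT name FROM employee "
        "WHERE salary > (SELECT max(salary) FROM salscale) "
        "ORDER BY name").fetchall()
    con.close()
    return all_form, max_form

populated = compare([40000, 80000])
empty = compare([])
print(populated)  # both forms agree when salscale has rows
print(empty)      # ALL-style keeps everyone; max() is NULL, so no rows match
```

If salscale can be empty, the rewrite changes the result, so it is worth checking which behavior the application expects.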
  • 20. Subquery Tips 2
    • Subquery optimization (2nd statement will be faster)
      • SELECT name FROM employee WHERE salary IN (SELECT salary FROM salscale)
      • SELECT name FROM employee WHERE EXISTS (SELECT salary FROM salscale WHERE employee.salary = salscale.salary)
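For an EXISTS rewrite to be a safe replacement for IN, both statements have to match on the same column. The following sketch checks that on a hypothetical data set, using SQLite via Python's sqlite3 as a stand-in for DB2/400 and matching on salid in both forms. It only compares results; it does not measure which form is faster.

```python
import sqlite3

def in_versus_exists():
    """Return (IN result, EXISTS result); both match employee.salid to salscale.salid."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee (name TEXT, salid INTEGER)")
    con.execute("CREATE TABLE salscale (salid INTEGER)")
    con.executemany("INSERT INTO employee VALUES (?, ?)",
                    [("Ann", 1), ("Bob", 2), ("Cy", 9)])
    con.executemany("INSERT INTO salscale VALUES (?)", [(1,), (2,)])
    in_form = con.execute(
        "SELECT name FROM employee "
        "WHERE salid IN (SELECT salid FROM salscale) ORDER BY name").fetchall()
    exists_form = con.execute(
        "SELECT name FROM employee WHERE EXISTS "
        "(SELECT 1 FROM salscale WHERE employee.salid = salscale.salid) "
        "ORDER BY name").fetchall()
    con.close()
    return in_form, exists_form

in_form, exists_form = in_versus_exists()
print(in_form)
print(exists_form)
```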
  • 21. UNIONs
    • Unions provide a way to append multiple row sets (files) in one statement
    • Example: Process all of the orders from January and February
    SELECT * FROM JanOrders WHERE SKU = 199976 UNION SELECT * FROM FebOrders WHERE SKU = 199976
  • 22. Unions
    • Each SELECT statement that is UNIONed together must have the same number of result columns and have compatible types
    • Two forms of syntax
      • UNION ALL -- allow duplicate records
      • UNION -- return only distinct rows
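The difference between the two forms shows up as soon as the same row appears in both tables. A small sketch, assuming SQLite via Python's sqlite3 and hypothetical order data (the Qty column is invented for the illustration):

```python
import sqlite3

def union_forms():
    """Return (UNION rows, UNION ALL rows) for SKU 199976."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE JanOrders (SKU INTEGER, Qty INTEGER);
        CREATE TABLE FebOrders (SKU INTEGER, Qty INTEGER);
        INSERT INTO JanOrders VALUES (199976, 5), (100, 1);
        INSERT INTO FebOrders VALUES (199976, 5), (199976, 7);
    """)
    distinct_rows = con.execute(
        "SELECT * FROM JanOrders WHERE SKU = 199976 "
        "UNION SELECT * FROM FebOrders WHERE SKU = 199976 "
        "ORDER BY Qty").fetchall()
    all_rows = con.execute(
        "SELECT * FROM JanOrders WHERE SKU = 199976 "
        "UNION ALL SELECT * FROM FebOrders WHERE SKU = 199976 "
        "ORDER BY Qty").fetchall()
    con.close()
    return distinct_rows, all_rows

distinct_rows, all_rows = union_forms()
print(distinct_rows)  # the duplicate (199976, 5) appears once
print(all_rows)       # the duplicate is kept
```

When processing orders, UNION ALL is usually the form that preserves every order line; plain UNION would silently merge two identical orders.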
  • 23. Views
    • Views provide a convenient way to store SQL logic permanently in the database
    • Create once and use many times
    • Also make the database more understandable to users
    • Can put simple business rules into views to ensure consistency
  • 24. Views
    • Example: Make it easy for the human resources department to run a report that shows ‘new’ employees.
    CREATE VIEW HR/NEWBIES (EMPLOYEE_NAME, DEPARTMENT, HIRE_DATE) AS SELECT concat(concat(strip(last_name),','),strip(first_name)), department, hire_date FROM HR/EMPLOYEE WHERE (year(current date)-year(hire_date)) < 2
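A comparable view can be tried out in SQLite, which is used here only as a stand-in: the function names differ from DB2/400 (trim and || instead of strip and concat, strftime instead of year), and the library qualifier HR/ is dropped. The employee rows are hypothetical and are generated relative to the current year so the example keeps working over time.

```python
import sqlite3
from datetime import datetime, timezone

def newbie_names():
    """Create a NEWBIES-style view and return the employee names it exposes."""
    this_year = datetime.now(timezone.utc).year  # strftime('now') is UTC too
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE employee "
                "(last_name TEXT, first_name TEXT, department TEXT, hire_date TEXT)")
    con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
        ("Lee  ", "Kim ", "HR", f"{this_year}-01-15"),     # hired this year
        ("Jones", "Al", "IT", f"{this_year - 5}-03-01"),   # hired five years ago
    ])
    con.execute("""
        CREATE VIEW newbies AS
        SELECT trim(last_name) || ',' || trim(first_name) AS employee_name,
               department,
               hire_date
        FROM employee
        WHERE CAST(strftime('%Y', 'now') AS INTEGER)
              - CAST(strftime('%Y', hire_date) AS INTEGER) < 2
    """)
    names = [row[0] for row in con.execute("SELECT employee_name FROM newbies")]
    con.close()
    return names

names = newbie_names()
print(names)
```

Once the view exists, a report only needs SELECT * FROM newbies; the name formatting and the definition of "new" live in one place.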
  • 25. Performance
    • SQL performance is harder to predict and tune than native I/O.
    • SQL provides a powerful way to manipulate data but you have little control over HOW it does it.
    • Query optimizer takes responsibility for doing it ‘right’.
  • 26. Performance - diagnosis
    • Getting information about how the optimizer processed a query is crucial
    • Can be done via one or all of the following:
      • STRDBG: debug messages in job log
      • STRDBMON: optimizer info put in file
      • QAQQINI: can be used to force messages
      • CHGQRYA: messages put out when time limit set to 0
  • 27. Performance tips
    • Create indexes
      • Over columns that significantly limit data in WHERE clause
      • Over columns that join tables together
      • Over columns used in ORDER BY and GROUP BY clauses
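The effect of an index over a WHERE column can be observed by asking the database for its plan before and after the index exists. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in for the DB2/400 optimizer messages; the table, data, and index name are hypothetical, and the exact plan wording varies by SQLite version.

```python
import sqlite3

def plan_for(con, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept INTEGER, salary INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                [(f"emp{i}", i % 50, 30000 + i) for i in range(1000)])

query = "SELECT name FROM employee WHERE dept = 10"
plan_before = plan_for(con, query)   # no index yet: a full table scan

# Index over the column that limits the data in the WHERE clause.
con.execute("CREATE INDEX idx_employee_dept ON employee (dept)")
plan_after = plan_for(con, query)    # the plan now mentions the index

print(plan_before)
print(plan_after)
con.close()
```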
  • 28. Performance tips
    • Create Encoded Vector Indexes (EVI’s)
      • Most useful in heavy query environments with a lot of data (e.g. large data warehouses)
      • Helps queries that process between 20% and 60% of a table’s data
      • Create over columns with a modest number of distinct values and those with data skew
      • EVI’s bridge the gap between traditional indexes and table scans
  • 29. Performance tips
    • Encourage optimizer to use indexes
      • Use keyed columns in WHERE clause if possible
      • Use ANDed conditions as much as possible
      • OPTIMIZE FOR n ROWS
      • Don’t do things that eliminate index use
        • Data conversion (binary-key = 1.5)
        • LIKE clause w/leading wildcard (NAME LIKE '%JOE')
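How a predicate is written can decide whether an existing index is usable. The sketch below illustrates the general idea with SQLite via Python's sqlite3, which is an assumption; DB2/400's rules differ in detail. Wrapping the keyed column in an expression (dept + 0) serves as a stand-in for the data-conversion case, and all names and data are hypothetical.

```python
import sqlite3

def plan_for(con, sql):
    """Return the detail text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept INTEGER)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [(f"emp{i}", i % 50) for i in range(1000)])
con.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# Keyed column compared directly: the index can be used.
direct_plan = plan_for(con, "SELECT name FROM employee WHERE dept = 10")

# Keyed column wrapped in an expression: the index on dept no longer applies.
converted_plan = plan_for(con, "SELECT name FROM employee WHERE dept + 0 = 10")

print(direct_plan)
print(converted_plan)
con.close()
```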
  • 30. Performance tips
    • Keep statements simple
      • Complex statements are much more difficult to optimize
      • They give the optimizer more opportunity to choose a sub-optimal plan of attack
  • 31. Performance tips
    • Enable DB2 to use parallelism
      • Query processed by many tasks (CPU parallelism) or by getting data from many disks at once (I/O parallelism)
      • CPU parallelism requires IBM’s SMP feature and a machine with multiple processors
      • Enabled via the QQRYDEGREE system value, CHGQRYA, or the QAQQINI file
  • 32. Other useful features
    • CASE clause - conditional calculations
    • ALIAS - access to multi-member files
    • Primary/Foreign keys - referential integrity
    • Constraints
  • 33. CASE
    • Conditional calculations with CASE
      SELECT Warehouse, Description,
        CASE RegionCode
          WHEN 'E' THEN 'East Region'
          WHEN 'S' THEN 'South Region'
          WHEN 'M' THEN 'Midwest Region'
          WHEN 'W' THEN 'West Region'
        END
      FROM Locations
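A CASE expression with no ELSE branch returns NULL for any value that matches none of the WHEN clauses. The sketch below shows this and one possible way to handle it, using SQLite via Python's sqlite3 as a stand-in; the rows and the 'Unknown Region' label are hypothetical.

```python
import sqlite3

def region_labels():
    """Return (labels without ELSE, labels with ELSE) for each warehouse."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Locations "
                "(Warehouse TEXT, Description TEXT, RegionCode TEXT)")
    con.executemany("INSERT INTO Locations VALUES (?, ?, ?)",
                    [("W1", "Main", "E"), ("W2", "Annex", "X")])  # 'X' is unmapped
    without_else = con.execute("""
        SELECT Warehouse,
               CASE RegionCode
                   WHEN 'E' THEN 'East Region'
                   WHEN 'W' THEN 'West Region'
               END
        FROM Locations ORDER BY Warehouse""").fetchall()
    with_else = con.execute("""
        SELECT Warehouse,
               CASE RegionCode
                   WHEN 'E' THEN 'East Region'
                   WHEN 'W' THEN 'West Region'
                   ELSE 'Unknown Region'
               END
        FROM Locations ORDER BY Warehouse""").fetchall()
    con.close()
    return without_else, with_else

without_else, with_else = region_labels()
print(without_else)  # the unmapped code comes back as None (NULL)
print(with_else)
```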
  • 34. CASE
    • Avoiding calculation errors (e.g. division by 0)
      SELECT Warehouse, Description,
        CASE NumInStock
          WHEN 0 THEN NULL
          ELSE CaseUnits/NumInStock
        END
      FROM Inventory
  • 35. ALIAS names
    • The CREATE ALIAS statement creates an alias on a table, view, or member of a database file.
      • CREATE ALIAS alias-name FOR table-name (member-name)
    • Example: Create an alias over the second member of a multi-member physical file
      • CREATE ALIAS February FOR MonthSales (February)
  • 36. Referential Integrity
    • Keeps two or more files in synch with each other
    • Ensures that children rows have parents
    • Can also be used to automatically delete children when parents are deleted
  • 37. Referential Integrity Rules
    • A row inserted into a child table must have a parent row (typically in another table).
    • Parent rules
      • A parent row can not be deleted if there are dependent children (Restrict rule) OR
      • All children are also deleted (Cascade rule) OR
      • All children’s foreign keys are changed (Set Null and Set Default rules)
  • 38. Diagram: Parent table (Primary Key) and Child table (Foreign Key). Primary key must be unique.
  • 39. Referential Integrity syntax
    • ALTER TABLE Hr/Employee ADD CONSTRAINT EmpPK PRIMARY KEY (EmployeeId)
    • ALTER TABLE Hr/Department ADD CONSTRAINT EmpFK FOREIGN KEY (EmployeeId) REFERENCES Hr/Employee (EmployeeId) ON DELETE CASCADE ON UPDATE RESTRICT
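The insert and cascade rules can be exercised on a small parent/child pair. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for DB2/400, the constraints are declared in CREATE TABLE rather than ALTER TABLE, and the Dependent child table is hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

def referential_integrity_demo():
    """Return (orphan insert rejected?, child rows left after deleting the parent)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
    con.executescript("""
        CREATE TABLE Employee (
            EmployeeId INTEGER,
            LastName   TEXT,
            CONSTRAINT EmpPK PRIMARY KEY (EmployeeId));
        CREATE TABLE Dependent (
            DependentName TEXT,
            EmployeeId    INTEGER,
            CONSTRAINT EmpFK FOREIGN KEY (EmployeeId)
                REFERENCES Employee (EmployeeId)
                ON DELETE CASCADE ON UPDATE RESTRICT);
        INSERT INTO Employee VALUES (1, 'Jones');
        INSERT INTO Dependent VALUES ('Pat', 1);
    """)
    # A child row must have a parent: employee 42 does not exist.
    try:
        con.execute("INSERT INTO Dependent VALUES ('Sam', 42)")
        orphan_rejected = False
    except sqlite3.IntegrityError:
        orphan_rejected = True
    # Cascade rule: deleting the parent also deletes its children.
    con.execute("DELETE FROM Employee WHERE EmployeeId = 1")
    children_left = con.execute("SELECT COUNT(*) FROM Dependent").fetchone()[0]
    con.close()
    return orphan_rejected, children_left

orphan_rejected, children_left = referential_integrity_demo()
print(orphan_rejected, children_left)
```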
  • 40. Check Constraints
    • Rules which limit the allowable values in one or more columns:
      CREATE TABLE Employee
        (FirstName CHAR(20),
         LastName CHAR(30),
         Salary DECIMAL(9,2) CHECK (Salary>0 AND Salary<200000))  -- DECIMAL(9,2) is an assumed column type
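A check constraint like this one can be tried out by attempting inserts on either side of the allowed range. The sketch uses SQLite via Python's sqlite3 as a stand-in for DB2/400; the DECIMAL(9,2) type for Salary and the test values are assumptions made for the illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE Employee (
        FirstName CHAR(20),
        LastName  CHAR(30),
        Salary    DECIMAL(9,2) CHECK (Salary > 0 AND Salary < 200000))
""")

def try_insert(salary):
    """Attempt an insert; return True if the database accepted the row."""
    try:
        con.execute("INSERT INTO Employee VALUES ('Pat', 'Jones', ?)", (salary,))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the value.
        return False

accepted_valid = try_insert(50000)
accepted_negative = try_insert(-5)
accepted_too_high = try_insert(250000)
print(accepted_valid, accepted_negative, accepted_too_high)
```

The rejection happens inside the database, so it applies no matter which program or utility performs the insert.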
  • 41. Check Constraints
    • Effectively does data checking at the database level.
    • Data checking done with display files or application logic can now be done at the database level.
    • Ensures that checking is always done and closes “back doors” like DFU, ODBC, and third-party utilities
  • 42. Other resources
    • Database Design and Programming for DB2/400 - book by Paul Conte
    • SQL for Smarties - book by Joe Celko
    • SQL Tutorial - www.as400network.com
    • AS/400 DB2 web site at http://www.as400.ibm.com/db2/db2main.htm
    • Publications at http://publib.boulder.ibm.com/pubs/html/as400/
    • Our web site at http://www.centerfieldtechnology.com
  • 43. Summary
    • SQL is a powerful way to access and process data
    • Used effectively, it can reduce the time it takes to build applications
    • Once tuned, it can perform very close to (and sometimes better than) HLLs alone
  • 44. Good Luck and Happy SQLing