Sql Basics And Advanced

  • 3,307 views
Uploaded on

TSQL Best practices

TSQL Best practices

  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
No Downloads

Views

Total Views
3,307
On Slideshare
0
From Embeds
0
Number of Embeds
0

Actions

Shares
Downloads
202
Comments
0
Likes
3

Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. Revisiting SQL basics and advancedPart 1
    Presenter : M.M.Al-Faroorque (Shubho)
    http://www.linkedin.com/in/thisisshubho
  • 2. Objectives
    Developers tend to learn & enhance skill on programming
    logic/language/framework/tool more, and, put less effort
    in learning SQL.
    Because of lacking good knowledge in SQL, the data
    access operations are not written in best way in most of
    the cases. So, overall application performs slow.
    Objective of this session is to re-visit and re-explore SQL
    to enhance our SQL knowledge as best as possible so that:
    We can write better SQL
    We can tune data access of our applications
  • 3. Assumptions
    Audience have basic SQL working knowledge
    Audience have experience in database based application development
    Audience have working experience in a database server (Say, SQL server 2000 or later)
    Audience are familiar with common database objects
  • 4. Topics covered
    Stay clean, follow good conventions
    Data types.
    Re-visit the basics.
    Joining.
    SubQueries.
    Joins Versus Subqueries.
    Group By.
    Set Operations.
    Case Statements.
  • 5. Stay clean, follow good conventions
    Using a good convention allows us to write consistent and manageable code. It also increase readability and helps troubleshoot problems.
    3 basic conventions:
    Use uppercase letters for the keywords, which includes SELECT, FROM, and WHERE. Use lowercase letters for the user-supplied words
    Align the keywords SELECT, FROM, and WHERE on separate lines
    Write appropriate comments. For example :
    SELECT dname, age /* displays the dependent name and age */
    FROM Dependent d /* from the Dependent table */
    WHERE d.age > 5 /* where the age of the dependent is greater than 5 */
  • 6. Data types
    Always use the data type that will require the least storage size. Following are the SQL server 2005 data types.
    CHAR and VARCHAR
    Both non-Unicode character data types with a maximum length of 8,000. Each character occupies 1 byte.
    Storage size for CHAR is n bytes while for VARCHAR is the actual length in bytes of the data entered
    Use CHAR data type when the data values in a column are expected to be consistently the same size (Say, ProductCode)
    NCHAR and NVARCHAR
    Both are Unicode character data types with a maximum length of 4,000 characters. Each character occupies 2 bytes.
  • 7. Data types
    TINYINT, SMALLINT, INT and BIGINT
    If you simply need to store a value between 0 and 255 then you should define your column as TINYINT,not INT
  • 8. Data types
    FLOAT and REAL
    Data types for use with floating point numeric data.  Floating point data is approximate; not all values in the data type range can be precisely represented. Declared as float [(n)]
    n is the number of bits used to store the mantissa of the float number in scientific notation and thus dictates the precision and storage size.
    n must be a value from 1 through 53. The default value of n is 53. If value of n is 24, then, it is REAL.
  • 9. Data types
    NUMERIC and DECIMAL
    Numeric data types that have fixed precision and scale.
    DECIMAL (p [, s ]) NUMERIC (p [, s ]).
    They are synonymous to each other and either one can be used.
    Precision is the total number of digits stored, and scale is the maximum number of digits to the right of the decimal point.
    The storage space of NUMERIC/DECIMAL data varies according to the precision.
    Float and real follow the IEEE 754 specification on approximate numeric. data types and cannot store values accurately. So, when accuracy is important (For financial applications), NUMERIC/DECIMAL should be used.
  • 10. Data types
    DATETIME and SMALLDATETIME
    Values with the datetimedata type are stored internally
    DateTime is stored as two 4-byte integers.  The first 4 bytes store number of days before or after the base date, January 1, 1900. Rest 4 bytes store time of day in milliseconds after midnight.
    SmallDateTimevalues are stored as two 2-byte integers.  The first 2 bytes store the number of days after January 1, 1900.  The other 2 bytes store the number of minutes since midnight.
    UseSmallDateTime when you don't need to store the time of the day. It’ll save a good amount of storage space.
  • 11. Data types
    MONEY and SMALLMONEY
    Data types for representing currency values.  The differences between these 2 data types are in the minimum and maximum values each can hold .
    MONEY uses 8 bytes of storage space while SMALLMONEY uses 4 bytes of storage space.
    If the value is less than 214,748.3647 then you should use SMALLMONEY. Otherwise use MONEY.
    BOOLEAN
    In SQL Server, there's no BOOLEANdata type.  Nearest data type that can be used in place of boolean data is the BIT data type, which is basically an INTEGER data type that can accept a value of 1, 0 or NULL value only.
  • 12. Data types
    TEXT,NTEXT and IMAGE
    TEXT :Single-byte non-unicode character strings, can be used to store more than 8,000 bytes (2 GB).
    NTEXT :Double-byte unicode character strings, can be used to store more than 8,000 bytes (2 GB).
    IMAGE :Variable-length binary data from 0 byte through 2 GB
    BINARY and VERBINARY
    BINARY [ ( n ) ] is Fixed-length binary data with a length of n bytes, where n is a value from 1 through 8,000. The storage size is n bytes.
    VERBINARY [ ( n )] isVariable-length binary data. n can be a value from 1 through 8,000.
  • 13. Data types
    VARCHAR(MAX),  NVARCHAR(MAX) and VARBINARY(MAX)
    In SQL Server 2000, a row cannot exceed 8000 bytes in size. This limitation is due to the 8 KB internal page size SQL Server.
    To store more data in a single column, you needed to use the TEXT, NTEXT, or IMAGE data types (BLOBs) which are stored in a collection of 8 KB data pages .
    These are unlike the data pages that store the other data in the same table. Rather, these pages are arranged in a B-tree structure. These data cannot be used as variables in a procedure or a function and they cannot be used inside string functions such as REPLACE, CHARINDEX or SUBSTRING. In most cases, you have to use READTEXT, WRITETEXT, and UPDATETEXT
    To solve this problem, Use VARCHAR(MAX),  NVARCHAR(MAX) and VARBINARY(MAX) in SQL Server 2005. These data types can hold the same amount of data BLOBs can hold (2 GB) and they are stored in the same type of data pages used for other data types.
    When data in a MAX data type exceeds 8 KB, an over-flow page is used (In the ROW_OVERFLOW allocation unit) and a pointer to the page is left in the original data page in the IN_ROW allocation unit.
  • 14. Data types
    Other data types
    UNIQUEIDENTIFIER : also referred to as globally unique identifier (GUID) or universal unique identifier (UUID), is a 128-bit generated value that guarantees uniqueness worldwide, even among unconnected computers
    XML : A new data type that has been added to SQL Server 2005 to store/retrieve XML data. The XML data type can be typed (Must conform to an XML Schema) or, untyped. XML data type will allow you to store complete XML documents or fragments of XML documents. The data type has methods (That accepts Xquery) to store/retrieve/udpate XML data. Can store upto 2 GB of data.
    SQL_VARIANT : Values stored in a SQL_VARIANT column can be any data type except TEXT or IMAGE. Should be avoided for several reasons.
    TIMESTAMP :  Data type used to store automatically generated unique incrementing number. It has nothing to do with preserving a date or a time. This is useful to determine whether any row has changed since the last read - if there's any change in the row - timestamp gets updated automatically.
  • 15. Re-visit the basics.
    Query processing order :
    According to ANSI standard the query should be written in the following way:
    SELECT [...]
    FROM [T1]
    JOIN [T2]
    ON [condition]
    WHERE [...]
    GROUP BY [...]
    HAVING [...]
    ORDER BY [...]
    However the query will be processed by MSSQL in the following order like so:
    FROM [T1]
    ON [condition]
    JOIN [T2]
    WHERE [...]
    GROUP BY [...]
    HAVING [...]
    SELECT [...]
    ORDER BY [...]
    Notes : ORDER BY is the last to be processed, only TOP clause would be processed after ORDER BY. DISTINCT will be processed after SELECT and before ORDER BY
  • 16. Re-visit the basics
    Obtaining record count
    count(*) is used to get the number of total records in the table. But, use count(primary_key_field) for faster result.
    To count the number of values (not null) in a column, use count(field_name).
    Limiting number of records in query :Two ways to do that
    SET ROWCOUNT:
    - takes a variable as well as a constant,- affects ALL future queries, until another SET ROWCOUNT is executed,TOP:
    - takes only a constant,- affects only the current query,
  • 17. Re-visit the basics
    Aliases
    Aliases are used to improve the readability of a query and its output
    Column Aliases :
    --SELECT dname 'Dependent Name', age 'Dependent Age', FROM Dependent
    --SELECT dname AS [Dependent Name], age AS [Dependent Age], FROM Dependent
    --SELECT dname AS "Dependent Name", age AS "Dependent Age", FROM Dependent
    --SELECT [Dependent Name] = dname, [Dependent Age] = age, FROM Dependent
    Table Aliases:
    --SELECT * FROM Dependent d WHERE d.age > 5
    --SELECT v.column1, v.column2, ... FROM (SELECT column1, column2, ... FROM Table WHERE ...) v
  • 18. Re-visit the basics
    Synonyms
    SQL Server 2005 allows to create synonyms for tables. Synonyms are usually shorter names that can be used in place of the table name.
    If a change is made in the original table or its data, this change will be reflected when the synonym is used. And, if a change is made in the data of the table using a synonym, this change will be reflected in the original table
    Synonyms are more permanent; they are available for use until they are deleted
    Example
    CREATE SYNONYM s1 FOR Student
    SELECT * FROM s1
    DROP SYNONYM s1
  • 19. Joining
    SQL joins are used to query data from two or more tables, based on a relationship between certain columns in these tables.
    In SQL Server 2005, the join is accomplished using the ANSI JOIN SQL syntax
    The general form of the ANSI JOIN SQL syntax is:
    SELECT columns FROM table1 JOIN table2 ON table1.column1=table2.column1
    Different SQL JOINs
    INNER JOIN: Return rows when there is match between two tables
    OUTER JOIN: Does not require each record in the two joined tables to have a matching record. The joined table retains each record—even if no other matching record exists
    SELF JOIN: A self-join is joining a table to itself
  • 20. Joining
    INNER joins : Can be classified into followings
    Equi-join: Uses only equality(=) comparisons in the join-predicate
    Examle: SELECT Employee.lastName,, Department.DepartmentName FROM Employee INNER JOIN Department ON Employee.DepartmentID = Department.DepartmentID;
    Natural join:ImplecitEqui-join. Uses a common column across two tables to join. The common column appears once only in the result set (Not supported in all databases)
    SELECT * FROM employee NATURAL JOIN department
    Cross join: A cross join, cartesian join or product provides the foundation upon which all types of inner joins operate. A cross join returns the cartesian product of the sets of records from the two joined tables
    SELECT * FROM employee CROSS JOIN department, or,
    SELECT * FROM employee, department; (Implicit cross join)
  • 21. Joining
    OUTER joins : Can be classified into followings
    Left outer join:Result of a left outer join (or left join) for tables A and B always contains all records of the "left" table (A), even if the join-condition does not find any matching record in the "right" table (B).
    Example:SELECT * FROM employee LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID
    Right outer join: join:Result of a right outer join (or right join) for tables A and B always contains all records of the “right" table (B), even if the join-condition does not find any matching record in the “left" table (A).
    Example:SELECT * FROM employee RIGHT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID
    Full outer join: A full outer join combines the results of both left and right outer joins
    Example:SELECT * FROM employee FULL OUTER JOIN department ON employee.DepartmentID = department.DepartmentID
    Note :
    --In case of outer join queries, the missing matching fields are replaced as NULL values in the result set
    --Some Databases do not support Full outer join (Say, MySql). In these cases, Full outer joins could be obtained by using UNION of Left outer join and Right outer join
  • 22. Joining
    Self Join: In a regular join, each row in a table is compared against other rows in another table on a condition. But, if each row in a table is to be compared in other rows in the same table, then, self join is required.
    In a typical self join query, following two join conditions are used:
    Table1.Field =Table2.Field (To select duplicate/common values)
    Table1.PK <> Table2.PK (To filter the original row)
    Example: Select Employees and their corresponding Managers
    select e.Name as [Employee Name], m.Name as [Manager Name] from Employee e left outer join Employee m on e.MgrID = m.ID
    (Left join is required to list all employees including those who don’t have managers)
    Example: List products that are supplied by different vendors (Duplicate entries on a condition).
    SELECT DISTINCT pv1.ProductID, pv1.VendorID FROM Purchasing.ProductVendor pv1 INNER JOIN Purchasing.ProductVendor pv2
    ON pv1.ProductID = pv2.ProductID AND pv1.VendorID <> pv2.VendorID ORDER BY pv1.ProductID
  • 23. SubQueries
    select au_lname, au_fname from authors where city in (select city from publishers)
    There are two basic types of subqueries:
    Expression subqueries: Used with a comparison operator, must return a single value, and can be used almost anywhere an expression is allowed in SQL.
    Quantified predicate subqueries: Operate on lists introduced with in or not in, or, exists or not exists or, with a comparison operator modified by any or all. These subqueries return zero or more values.
    Subqueries of either type are either noncorrelated or correlated (repeating).
    Noncorrelatedsubquery: Can be evaluated as if it were an independent query. Conceptually, the results of the subquery are substituted in the main statement, or outer query
    Correlated subquery: Cannot be evaluated as an independent query, but can reference columns in a table listed in the from list of the outer query.
  • 24. SubQueries
    select au_lname, au_fname from authors where city in (select city from publishers)
    Expression SubQueries : Two types are there
    SubQuery in a select list (Correlated Subquery)
    Example :
    SELECT P.Name, P.ListPrice, (SELECT M.Name FROM Model M where M.ProductModelID = P.ProductID) Model FROM Product P
    Subqueries in a where or having clause connected by a comparison operator (Non-Correlated Subquery)
    Examples:
    SELECT customerid FROM orders WHERE empID= (SELECT empID from employees WHERE lastname = ‘Chang’);
    SELECT title FROM titles WHERE price > (SELECT min(price) FROM titles)
  • 25. SubQueries
    select au_lname, au_fname from authors where city in (select city from publishers)
    Quantified predicate subqueries : 3 types are there
    any/all subqueries :
    > all means greater than every value, or greater than the maximum value. For example, > all (1, 2, 3) means greater than 3.
    > any means greater than at least one value, or greater than the minimum value. Therefore, > any (1, 2, 3) means greater than 1.
    Example :
    select title from titles where price > all (select price from titles where type = "mod_cook")
    select title from titles where price > any(select price from titles where type = "mod_cook")
    In/not in subqueries :
    Example :
    Select lname, fname from authors where city in (select city from publishers)
    Select lname, fname from authors where city not in (select city from publishers)
  • 26. SubQueries
    select au_lname, au_fname from authors where city in (select city from publishers)
    Quantified predicate subqueries : Continued
    Exists/not Exists subqueries :
    The subquery Exists/NotExists evaluates to TRUE or FALSE rather than returning any
    data.
    With EXISTS/NOT EXISTS, The select list of the subquery usually consists of the asterisk (*)
    Example :
    Select lname, fname from authors where exists (select * from publishers where authors.city = publishers.city)
    Select lname, fname from authors where not exists (select * from publishers where authors.city = publishers.city)
    If the inner queries in any of these types of sub queries contains any column use of the outer query, then, it becomes a non-correlated sub query
  • 27. SubQueries
    select au_lname, au_fname from authors where city in (select city from publishers)
    Limitations
    The subquery_select_list can consist of only one column name, except in the exists subquery, where an (*) is usually used
    Subqueries can be nested inside the WHERE or HAVING clause of an outer SELECT, INSERT, UPDATE, or DELETE statement, inside another subquery, or in a select list. Examples :
    UPDATE titles SET price = price * 2 WHERE pub_id in (SELECT pub_id FROM publishers WHERE pub_name = "New Age Books")
    DELETE salesdetail WHERE title_id in (SELECT title_id FROM titles WHERE type = "business")
    IF EXISTS (SELECT title_id FROM titles WHERE type = "business")
    BEGIN
    DELETE salesdetail WHERE title_id in (SELECT title_id FROM titles WHERE type = "business")
    END
    In Transact-SQL, a subquery can appear almost anywhere an expression can be used, if it returns a single value
    You cannot use subqueries in an order by, group by
    text and image datatypes are not allowed in subqueries
  • 28. Joins Versus Subqueries
    Subqueriesmay often be used as alternatives to joins, but, not always
    SubQueries resembles to “Programming Logics” (Hence, seem more logical), where, Joining resembles to “Set operations” more.
    Joining queries usually perform less efficient than the SubQueries when number of involved tables grows, and, on some particular conditions. (See next slide)
    Both SubQueries and Join queries have their limitations. Based upon situation, one might perform better than the other.
  • 29. Joins Versus Subqueries
    A query written in both way
    Suppose we have two tables “Student” and “Grade”. List the students who have obtained “A” or “B” grades.
    The SubQuery (Programming logic)way
    Logic :
    1. Select student ID’s who has “A” or “B” grades
    2. Take those students who has a ID in the student ID list
    SQL:
    SELECT StudentName FROM Student WHERE ID IN(SELECT StID from Grade where StdGrade=“A” OR StdGrade=“B”)
    The Join way: (Wrong one!)
    SELECT StudentName from Student INNER JOIN Grade ON Student.ID = Grade.StdID AND StdGrade=“A” OR StdGrade=“B”
    Though the above query looks OK, this query produces more results comparing to the subquery version. Why?
    The Join query version shows duplicate student names who have both “A” and “B” grades in the Grade table (Because, the results are combined by “OR”). So, the DINTINCT has to be applied after the SELECT in the above query.
    But, to show DINTINCT results, the result set has to be sorted internally and the DINTINCT records has to be calculated. So, the SubQuery version might be a little efficient in this case.
    Why this problem does not occur in the SubQuery? (Think yourself)
  • 30. Joins Versus Subqueries
    When the Join Cannot Be Turned into a Subquery
    When a column from a table needs to be in the result set, that table has to be in the outer query. If two tables are being used, and if columns from both tables have to be in the result set, a join is necessary. This type of join cannot be turned into a subquery
    If the result set does not need the columns from more than one table, then the join can be turned into a subquery. In this case, the inner query only acts as a “List of valid values” to filter against a value in the outer query
    In the previous example,if the Student Name and Grade had been asked to show, then, the query had to be written using a Join.
  • 31. Group By
    If an aggregate function (Say, count(*)) is used in a query, it is applied to all (Filtered if there is any “where condition”) rows in the table.
    The GROUP BY statement is used in conjunction with the aggregate functions to group the result-set by one or more columns.
    SQL GROUP BY Syntax
    SELECT column_name, aggregate_function(column_name) FROM table_name WHERE column_name operator value GROUP BY column_name
    Example:
    SELECT Customer,SUM(OrderPrice) FROM Orders GROUP BY Customer
  • 32. Group By
    GROUP BY More Than One Column
    We can also use the GROUP BY statement on more than one column, like this:
    SELECT Customer,OrderDate,SUM(OrderPrice) FROM Orders GROUP BY Customer,OrderDate
    Note: If any aggregate function is used in any select query, except the aggregate functions, All the fields in the select list must be presented in a group by list
    But, the reverse is not true. That means, a field in the group by list does not necessarily have to be presented in the select list. That means, the select list can only contain aggregate function.
    The HAVING clause has to be used be used to compare the aggregate functions calculated value with any value.
    SQL HAVING example
    SELECT Customer,SUM(OrderPrice) FROM Orders WHERE Customer='Hansen' OR Customer='Jensen' GROUP BY Customer HAVING SUM(OrderPrice)>1500
  • 33. Group By
    Using ORDER BY
    The result set of a group by query can be ordered by any column from the result set using the ORDER BY clause
    Note: If alias is used for the aggregate function, that alias could be used in the ORDER BY clause in the GROUP BY query
    Group by query processing sequence
    The where clause excludes rows that do not meet its search conditions.
    The group by clause collects the remaining rows into one group for each unique value in the group by expression.
    Aggregate functions specified in the select list calculate summary values for each group.
    The having clause excludes rows from the final results that do not meet its search conditions.
  • 34. Group By
    • Group by and Aggregate function limitation
    If any column value is null and if that column is used in an aggregation function, then, the corresponding row is not considered in the calcuation.
    Workaround: Use aggregate_func(ISNULL(nullable_column, 0))
    SQL Server 2005 will not allow you to write nested aggregation function with a GROUP BY query. For example, following query is invalid
    SELECT MIN(COUNT(stno)) FROM Student GROUP BY class
    The above query has to be written using a SubQuery
    SELECT MIN(Scount.count) FROM
    (
    SELECT COUNT(stno) as count FROM Student GROUP BY class)
    ) as Scount
  • 35. Set Operations
    In relation database concepts, collection of rows are considered as “Sets"
    Set operations are used in SQL to retrieve data from multiple sets, and apply union, intersection and set difference operations to obtain the result
    Explicit set operations are used in SQL: UNION, INTERSECT, and MINUS (for set difference)
    In SQL server, the MINUS set operation is achieved by using the NOT IN operator.
    The format of a set statement is as follows:
    set OPERATOR set
    where OPERATOR is a UNION, INTERSECT or MINUS, and where "set" is defined by a SELECT.
  • 36. Set Operations
    Union Compatibility
    When using set operations, the two sets (the results of two SELECTs) have to have the same number of similar columns and the columns have to have compatible data types
    For union compatibility, the three basic data types are numeric, string, and dates
    All numeric columns are compatible with one another, all string columns are compatible with one another, and all date columns are compatible with one another
    The two sets being unioned must have the same number of columns in the result sets of the SELECT clauses.
  • 37. Set Operations
    UNION ALL
    UNION ALL works exactly like UNION, but does not expunge duplicates
    If we want to union two result sets that do not have the same number of columns, we have to use NULL (or other) values in the column-places as place holders
  • 38. Case Statements
    CASE expressions can be used in SQL anywhere an expression can be used (Inside SELECT,INSERT,UPDATE,DELETE statements and functions). A CASE expression returns a particular value.
    Two types of CASE statements are there
    Simple CASE function: Checks one expression against multiple values. Allows only an equality check.
    SELECT     
    CASE @TestVal                
    WHEN 1 THEN 'First‘
    WHEN 2 THEN 'Second'                 
    WHEN 3 THEN 'Third'     
    ELSE 'Other'     
    END
    Searched CASE function: Doesn’t have expression after the CASE statement. Allows evaluating complex expression in the WHEN clauses.
  • 39. Case Statements
    CASE statement in Select Query
    SELECT 'Price Category' = CASE  WHEN price IS NULL THEN 'Not yet priced'  WHEN price < 10 THEN 'Very Reasonable Title'  ELSE 'Expensive book!' END, Title
    FROM Titles 
    CASE Statement in a functionSELECT title_id, SUM(CASE WHEN month(ord_date)between 1 and 3 THEN qty ELSE 0 END) AS Q1, SUM(CASE WHEN month(ord_date) between 4 and 6 THEN qty ELSE 0 END) AS Q2
  • 40. Case Statements
    Searched CASE Statement example
    SELECT     
    CASE  
    WHEN @TestVal <=3 THEN 'Top 3'     
    ELSE 'Other'     
    END
    CASE statement in Select Query
    SELECT 'Price Category' = CASE  WHEN price IS NULL THEN 'Not yet priced'  WHEN price < 10 THEN 'Very Reasonable Title'  ELSE 'Expensive book!' END, Title
    FROM Titles 
  • 41. Case Statements
    CASE Statement in a functionSELECT title_id, SUM(CASE WHEN month(ord_date)between 1 and 3 THEN qty ELSE 0 END) AS Q1, SUM(CASE WHEN month(ord_date) between 4 and 6 THEN qty ELSE 0 END) AS Q2
    CASE Statement in an UPDATE statement
    UPDATE dbo.Customer SET stateDescription = 
    CASE 
    WHEN statecode = 'MA' THEN 'Massachusetts' WHEN statecode = 'VA' THEN 'Virginia' WHEN statecode = 'PA' THEN 'Pennsylvania' ELSE NULL 
    END