T-SQL programming guidelines, in terms of:-
1. Commenting code
2. Code readability
4. General good practice
4. Defensive coding and error handling
5. Coding for performance and scalability
Comments and exception handling have been
purposely omitted from code fragments in the interest
of brevity, such that each fragment can fit onto one
slide.
Disclaimer
All code should be self-documenting.
T-SQL code artefacts, triggers, stored procedures and
functions should have a standard comment banner.
Comment code at all points of interest, describe why
and not what.
Avoid inline comments.
Comments
Comment banners should include:-
Author details.
A brief description of what the code does.
Narrative comments for all arguments.
Narrative comments for return types.
Change control information.
An example is provided on the next slide.
Comment Banners
CREATE PROCEDURE uspMyProc
/*=================================================================*/
/* Name        : uspMyProc                                         */
/*                                                                 */
/* Description : Stored procedure to demonstrate what a specimen   */
/*               comment banner should look like.                  */
/*                                                                 */
/* Parameters  :                                                   */
/*=================================================================*/
(
    @Parameter1 int, /* First parameter passed into procedure.  */
    @Parameter2 int  /* Second parameter passed into procedure. */
)
/*=================================================================*/
/* Change History                                                  */
/* ~~~~~~~~~~~~~~                                                  */
/*                                                                 */
/* Version Author      Date     Ticket Description                 */
/* ------- ----------- -------- ------ -----------                 */
/* 1.0     C. J. Adkin 09/08/11 3525   Initial version created.    */
/*=================================================================*/
AS
BEGIN
    .
    .
Comment Banner Example
-- This is an example of an inline comment
Why are these bad?
Because a careless backspace can turn a useful statement
into a commented-out one.
"But my code is always thoroughly tested."
NO EXCUSE: always code defensively.
Use /* */ comments instead.
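To illustrate, a hypothetical statement showing the risk:

UPDATE Accounts SET Balance = 0 -- year-end suspense reset
WHERE AccountType = 'SUSPENSE';

One careless backspace joining the two lines turns the WHERE clause
into part of the comment, and the UPDATE silently hits every row.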
Use Of Inline Comments
Use and adhere to naming conventions.
Use meaningful object names.
Never prefix application stored procedures with sp_.
SQL Server will always scan through the system
catalogue first before executing such procedures.
Bad for performance.
Naming Conventions
Use ANSI SQL join syntax over non-ANSI syntax.
Be consistent when using case:-
Camel case
Pascal case
Use of upper case for reserved keywords
Be consistent when indenting and stacking text.
Code Readability
Never blindly take technical hints and tips written in a blog or
presentation as gospel.
Test your assumptions using “Scientific method”, i.e.:-
Use test cases which use consistent test data across all tests,
production realistic data is preferable.
If the data is commercially sensitive, e.g. bank account details, keep
the volume and distribution the same and obfuscate the sensitive
parts.
Only change one thing at a time, so as to be able to gauge the
impact of the change accurately and know what caused it.
The “Scientific Method” Approach
For performance-related tests, always clear the
procedure and buffer caches out, so that results are
not skewed between tests; use the following:-
CHECKPOINT
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
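As a sketch, a typical test-rig preamble on a dedicated test instance
(never run these on a shared or production server):

CHECKPOINT;             /* flush dirty pages to disk         */
DBCC FREEPROCCACHE;     /* empty the plan cache              */
DBCC DROPCLEANBUFFERS;  /* empty the buffer pool             */
SET STATISTICS IO ON;   /* report logical and physical reads */
SET STATISTICS TIME ON; /* report CPU and elapsed times      */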
The “Scientific Method” Approach
A term coined by Jeff Moden, an MVP and frequent poster on
SQLServerCentral.com.
Alludes to:-
Coding in a procedural 3GL way instead of a set-based way.
Chronic performance of row-by-row oriented processing.
Abbreviated to RBAR, pronounced "Ree-bar".
Avoid “Row by agonising row” Techniques
Code whereby result sets and table contents are
processed line by line, typically using cursors.
Correlated subqueries.
User-defined functions.
Iterating through result sets as ADO objects in SQL
Server Integration Services looping containers.
Where “Row by agonising row” Takes Place
A simple, but contrived, query written against the
AdventureWorks2008R2 database.
The first query will use nested subqueries.
The second will use derived tables.
Sub-Query Example
SELECT ProductID,
       Quantity
FROM   (SELECT TOP 1 LocationID
        FROM   AdventureWorks.Production.Location Loc
        WHERE  CostRate = (SELECT MAX(CostRate)
                           FROM   AdventureWorks.Production.Location)) dt,
       AdventureWorks.Production.ProductInventory Pi
WHERE  Pi.LocationID = dt.LocationID
Sub-Query Example Without RBAR
What is the difference between the two queries?
Query 1, cost = 0.299164
Query 2, cost = 0.0202938
What is the crucial difference ?
Table spool operation in the first plan has been executed 1069 times.
This happens to be the number of rows in the ProductInventory table.
The RBAR Versus The Non-RBAR
Approach Quantified
Row-oriented processing may be unavoidable under certain
circumstances:-
The processing of one row depends on the state of one or more
previous rows in a result set.
The row processing logic involves a change to the global state of the
database and therefore cannot be encapsulated in a function.
In these cases there are ways to use cursors in a very efficient manner,
as per the next three slides.
Efficient Techniques For RBAR When
It Cannot Be Avoided
Elapsed time 00:22:27.892

DECLARE @MaxRownum int,
        @OrderId   int,
        @i         int;

SET @i = 1;

CREATE TABLE #OrderIds (
    rownum  int IDENTITY (1, 1),
    OrderId int
);

INSERT INTO #OrderIds
SELECT SalesOrderID
FROM   Sales.SalesOrderDetail;

SELECT @MaxRownum = MAX(rownum)
FROM   #OrderIds;

WHILE @i < @MaxRownum
BEGIN
    SELECT @OrderId = OrderId
    FROM   #OrderIds
    WHERE  rownum = @i;

    SET @i = @i + 1;
END;
RBAR Without A Cursor
Elapsed time 00:00:03.106

DECLARE @s int;

DECLARE c CURSOR FOR
SELECT SalesOrderID
FROM   Sales.SalesOrderDetail;

OPEN c;
FETCH NEXT FROM c INTO @s;

WHILE @@FETCH_STATUS = 0
BEGIN
    FETCH NEXT FROM c INTO @s;
END;

CLOSE c;
DEALLOCATE c;
RBAR With A Cursor
Elapsed time 00:00:01.555

DECLARE @s int;

DECLARE c CURSOR FAST_FORWARD FOR
SELECT SalesOrderID
FROM   Sales.SalesOrderDetail;

OPEN c;
FETCH NEXT FROM c INTO @s;

WHILE @@FETCH_STATUS = 0
BEGIN
    FETCH NEXT FROM c INTO @s;
END;

CLOSE c;
DEALLOCATE c;
RBAR With An Optimised Cursor
No T-SQL language feature is a “Panacea to all ills”.
For example:-
Avoid RBAR logic where possible.
Avoid nesting cursors.
But cursors do have their uses.
Be aware of the FAST_FORWARD optimisation, applicable
when:-
The data being retrieved is not being modified.
The cursor is being scrolled through in a forward-only
direction.
Cursor “RBAR”: The Moral Of The Story
When using SQL Server 2005 onwards:-
Use TRY...CATCH blocks.
Make the event logged in the CATCH block verbose enough to
allow the exceptional event to be easily tracked down.
NEVER use exceptions for control flow, illustrated with an
upsert example in the next four slides.
NEVER 'swallow' exceptions, i.e. catch them and do nothing
with them.
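A minimal sketch of a suitably verbose CATCH block; the procedure,
table and column names are hypothetical:

BEGIN TRY
    UPDATE dbo.Account
    SET    Balance -= @Amount
    WHERE  AccountID = @AccountID;
END TRY
BEGIN CATCH
    /* Log enough context to track the failure down later. */
    DECLARE @msg nvarchar(2048);
    SET @msg = 'uspDebitAccount failed, error '
             + CONVERT(varchar(12), ERROR_NUMBER())
             + ' at line ' + CONVERT(varchar(12), ERROR_LINE())
             + ': ' + ERROR_MESSAGE()
             + ' (AccountID = ' + CONVERT(varchar(12), @AccountID) + ')';
    RAISERROR(@msg, 16, 1); /* re-raise, never swallow */
END CATCH;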
Exception Handling
DECLARE @p int;

DECLARE c CURSOR FAST_FORWARD FOR
SELECT ProductID
FROM   Sales.SalesOrderDetail;

OPEN c;
FETCH NEXT FROM c INTO @p;

WHILE @@FETCH_STATUS = 0
BEGIN
    /* Place the stored procedure to be tested
     * on the line below.
     */
    EXEC dbo.uspUpsert_V1 @p;

    FETCH NEXT FROM c INTO @p;
END;

CLOSE c;
DEALLOCATE c;
Exceptions Used For Flow Control
Test Harness
CREATE TABLE SalesByProduct (
    ProductID int,
    Sold      int,
    CONSTRAINT [PK_SalesByProduct]
        PRIMARY KEY CLUSTERED
    (
        ProductID
    ) ON [USERDATA]
) ON [USERDATA]
Exceptions Used For Flow Control
‘Upsert’ Table
Execution time = 00:00:51.200

CREATE PROCEDURE uspUpsert_V1 (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        INSERT INTO SalesByProduct
        VALUES (@ProductID, 1);
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() = 2627
        BEGIN
            UPDATE SalesByProduct
            SET    Sold += 1
            WHERE  ProductID = @ProductID;
        END
    END CATCH;
END;
‘Upsert’ Procedure First Attempt
Execution time = 00:00:20.080

CREATE PROCEDURE uspUpsert_V2 (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    UPDATE SalesByProduct
    SET    Sold += 1
    WHERE  ProductID = @ProductID;

    IF @@ROWCOUNT = 0
    BEGIN
        INSERT INTO SalesByProduct
        VALUES (@ProductID, 1);
    END;
END;
‘Upsert’ Procedure Second Attempt
With SQL Server 2008 onwards, consider using the MERGE
statement for upserts, execution time = 00:00:20.904

CREATE PROCEDURE uspUpsert_V3 (@ProductID int) AS
BEGIN
    SET NOCOUNT ON;

    MERGE SalesByProduct AS target
    USING (SELECT @ProductID) AS source (ProductID)
    ON    (target.ProductID = source.ProductID)
    WHEN MATCHED THEN
        UPDATE SET Sold += 1
    WHEN NOT MATCHED THEN
        INSERT (ProductID, Sold)
        VALUES (source.ProductID, 1);
END;
‘Upsert’ Procedure Third Attempt
Scalar functions are another example of RBAR; consider
this function:-

CREATE FUNCTION udfMinProductQty ( @ProductID int )
RETURNS int
AS
BEGIN
    RETURN ( SELECT MIN(OrderQty)
             FROM   Sales.SalesOrderDetail
             WHERE  ProductId = @ProductID )
END;
RBAR and Scalar Functions
Now let's call the function from an example query:-

SELECT ProductId,
       dbo.udfMinProductQty(ProductId)
FROM   Production.Product
Elapsed time = 00:00:00.746
RBAR and Scalar Functions: Example
Now doing the same thing, but using an inline table
valued function:-

CREATE FUNCTION tvfMinProductQty
(
    @ProductId INT
)
RETURNS TABLE
AS
RETURN (
    SELECT MIN(s.OrderQty) AS MinOrdQty
    FROM   Sales.SalesOrderDetail s
    WHERE  s.ProductId = @ProductId
)
RBAR and Scalar Functions A Better
Approach, Using Table Value Functions
Invoking the inline TVF from a query:-

SELECT ProductId,
       (SELECT MinOrdQty
        FROM   dbo.tvfMinProductQty(ProductId)) MinOrdQty
FROM   Production.Product
ORDER BY ProductId

Elapsed time 00:00:00.330
RBAR and Scalar Functions A Better
Approach, Using Table Value Functions
Developing applications that use databases and
perform well depends on good:-
Schema design.
Compiled statement plan reuse.
Connection management.
Minimizing the number of network round trips
between the database and the tier above.
Compiled Plan Reuse
Parameterise your queries in order to minimize compiling.
BUT watch out for "parameter sniffing".
At runtime the database engine will sniff the values of the
parameters a query is compiled with and create a plan
accordingly.
This is unfortunate when the sniffed values cause plans with table
scans, whereas the 'popular' values would lead to plans with index seeks.
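As a sketch, a parameterised call via sp_executesql; the plan is
compiled once and reused across parameter values (the customer ID
value is illustrative):

EXEC sp_executesql
     N'SELECT SalesOrderID, OrderDate
       FROM   Sales.SalesOrderHeader
       WHERE  CustomerID = @CustId',
     N'@CustId int',
     @CustId = 29825;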
Writing Plan Reuse Friendly Code
Use the RECOMPILE hint to force the creation of a new plan.
Use the OPTIMIZE FOR hint in order for a plan to be created for
'popular' values you specify.
Use the OPTIMIZE FOR UNKNOWN hint to cause a "general
purpose" plan to be created.
Copy parameters passed into a stored procedure into local
variables and use those in your query.
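Hedged sketches of the hint-based options, against an illustrative
query:

/* New plan on every execution. */
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE  CustomerID = @CustId
OPTION (RECOMPILE);

/* Plan built for a specified 'popular' value. */
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE  CustomerID = @CustId
OPTION (OPTIMIZE FOR (@CustId = 29825));

/* General purpose plan, built for no particular value. */
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE  CustomerID = @CustId
OPTION (OPTIMIZE FOR (@CustId UNKNOWN));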
Parameter Sniffing
For OLTP style applications:-
Transactions will be short.
The number of statements will be finite.
The SQL will only affect a few rows for each execution.
The SQL will be simple.
Plans will be skewed towards using index seeks over table scans.
Recompiles could more than double query execution time.
Therefore recompiles are undesirable for OLTP applications.
When (Re)Compiles
Are To Be Avoided
For OLAP style applications:-
Complex queries that may involve aggregation and analytic
SQL.
Queries may change constantly due to the use of reporting
and BI tools.
May involve WHERE clauses with potentially lots of
combinations of parameters.
Forcing a recompile via OPTION (RECOMPILE) may be a hit worth
taking for the benefit of a significant reduction in total
execution time.
This is the exception to the rule.
When Taking The Hit Of A
(Re)Compile Is Worthwhile
Be careful when using table variables.
Statistics cannot be gathered on these.
The optimizer will assume they only contain one
row unless the statement is recompiled.
This can lead to unexpected execution plans.
Statements that insert into or modify table variables
will always get serial execution plans.
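A sketch of the recompile workaround, against an illustrative query:

DECLARE @Ids TABLE (OrderID int PRIMARY KEY);

INSERT INTO @Ids
SELECT SalesOrderID FROM Sales.SalesOrderHeader;

/* Without the hint the optimizer assumes @Ids holds one row;
 * OPTION (RECOMPILE) lets it see the actual row count.
 */
SELECT h.SalesOrderID, h.OrderDate
FROM   Sales.SalesOrderHeader h
JOIN   @Ids i ON i.OrderID = h.SalesOrderID
OPTION (RECOMPILE);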
Table Variables
This applies to conditions in WHERE clauses.
If a WHERE clause condition can use an index, it is
said to be 'sargable'
(a Search ARGument ABLE condition).
As a general rule of thumb, the use of a function on a
column will suppress index usage,
e.g. WHERE ufn(MyColumn1) = <somevalue>
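An illustrative pair of predicates against a date column:

/* Non-sargable: the function hides the column from the index. */
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE  YEAR(OrderDate) = 2007;

/* Sargable rewrite: the bare column can drive an index seek. */
SELECT SalesOrderID FROM Sales.SalesOrderHeader
WHERE  OrderDate >= '20070101' AND OrderDate < '20080101';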
Sargability
Constructs that will always force a serial plan:-
All T-SQL user-defined functions.
All CLR user-defined functions with data access.
Built-in functions including @@TRANCOUNT,
ERROR_NUMBER() and OBJECT_ID().
Dynamic cursors.
Be Aware Of Constructs That Create
Serial Regions In Execution Plans
Constructs that will always force a serial region within a plan:-
Table valued functions
TOP
Recursive queries
Multi-consumer spool
Sequence functions
System table scans
"Backwards" scans
Global scalar aggregates
Be Aware Of Constructs That Create
Serial Regions In Execution Plans
Advice From The SQL Server
Optimizer Development Team
Craig Freedman, a former optimizer developer, has
some good words of advice in his "Understanding
Query Processing and Query Plans in SQL Server"
slide deck.
The points on the next three slides
(quoted verbatim) come from slide 40.
Watch Out For Errors In
Cardinality Estimates
Watch out for errors in cardinality estimates
Errors propagate upwards; look for the root cause
Make sure statistics are up to date and accurate
Avoid excessively complex predicates
Use computed columns for overly complex
expressions
General Tips
Use set based queries; (almost always) avoid cursors
Avoid joining columns with mismatched data types
Avoid unnecessary outer joins, cross applies, complex sub-queries,
dynamic index seeks, …
Avoid dynamic SQL
(but beware that sometimes dynamic SQL does yield a better plan)
Consider creating constraints
(but remember that there is a cost to maintain constraints)
If possible, use inline TVFs NOT multi-statement TVFs
Use SET STATISTICS IO ON to watch out for large numbers of
physical I/Os
Use indexes to work around locking, concurrency, and deadlock
issues
OLTP and DW Tips
OLTP tips:
Avoid memory consuming or blocking iterators
Use seeks not scans
DW tips:
Use parallel plans
Watch out for skew in parallel plans
Avoid order preserving exchanges
Leverage functionality already in SQL Server, never
reinvent it; this will lead to:-
More robust code
Less development effort
Potentially faster code
Code with better readability
Easier to maintain code
Avoid Reinventing The Wheel
This means furnishing the code with a facility to allow its execution
to be traced.
Write to a tracking table.
And/or use xp_logevent to write to the event log.
DO NOT make the code a "black box" which has to be dissected
statement by statement in production if it starts to fail.
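A minimal sketch, assuming a hypothetical dbo.ProcTrace tracking
table:

/* Record progress at a point of interest; @@ROWCOUNT still holds
 * the count from the statement being traced.
 */
INSERT INTO dbo.ProcTrace (ProcName, Step, RowsAffected, LoggedAt)
VALUES (OBJECT_NAME(@@PROCID), 'after stage 1 load',
        @@ROWCOUNT, GETDATE());

/* And/or write to the event log; the error number must be >= 50000. */
EXEC master..xp_logevent 60000, 'uspLoad: stage 1 complete', informational;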
Code Instrumentation
Make stored procedures and functions relatively
single-minded in what they do.
Stored procedures and functions with lots of arguments are a
"code smell" of code that:-
Is difficult to unit test with a high degree of confidence.
Does not lend itself to code reuse.
Smacks of poor design.
Favour Strong Functional
Independence For Code Artefacts
Understand and use the full power of T-SQL.
Most people know how to UNION result sets together, but do not know
about INTERSECT and EXCEPT (see the sketch after this list).
Also, a lot of development effort can be saved by using T-SQL's analytic
extensions where appropriate:-
RANK()
DENSE_RANK()
NTILE()
ROW_NUMBER()
LEAD() and LAG() (introduced in Denali)
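For instance, a sketch comparing two hypothetical staging tables:

/* Customers present in extract A but missing from extract B. */
SELECT CustomerID FROM dbo.CustomerExtractA
EXCEPT
SELECT CustomerID FROM dbo.CustomerExtractB;

/* Customers common to both extracts. */
SELECT CustomerID FROM dbo.CustomerExtractA
INTERSECT
SELECT CustomerID FROM dbo.CustomerExtractB;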
Leverage The Full Power Of
Transact SQL
An 'ordinal', in the context of the ORDER BY clause, is a number used to
represent a column position.
If new columns are added or their order is changed in the SELECT, this query will
return different results, potentially breaking the application using it.
SELECT TOP 5
[SalesOrderNumber]
,[OrderDate]
,[DueDate]
,[ShipDate]
,[Status]
FROM [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY 2 DESC
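The robust equivalent names the sort column explicitly:

SELECT TOP 5
       [SalesOrderNumber]
      ,[OrderDate]
      ,[DueDate]
      ,[ShipDate]
      ,[Status]
FROM [AdventureWorks].[Sales].[SalesOrderHeader]
ORDER BY [OrderDate] DESC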
Avoid Ordering By Ordinals
SELECT * retrieves all columns from a table:
bad for performance if only a subset of these is
required.
Using columns by their names explicitly leads to
improved code readability.
Code is easier to maintain, as it enables the
“Developer” to see in situ what columns a query is
using.
Avoid SELECT *
A scenario that actually happened:-
A row is inserted into the customer table
Customer table has a primary key based on an identity
column
@@IDENTITY is used to obtain the key value of the customer
row inserted for the creation of an order row with a foreign
key linking back to customer.
The identity value obtained is nothing like the one for the
inserted row - why?
Robust Code and @@IDENTITY
@@IDENTITY obtains the latest identity value
irrespective of the session it came from.
In the example, the replication merge agent inserted a
row in the customer table just before @@IDENTITY
was used.
The solution: always use SCOPE_IDENTITY() instead
of @@IDENTITY.
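A minimal sketch of the safe pattern; the table and column names are
hypothetical:

INSERT INTO dbo.Customer (CustomerName)
VALUES (@CustomerName);

/* Scoped to this session and this module, so a trigger or a
 * replication agent cannot pollute the value.
 */
SET @CustomerId = SCOPE_IDENTITY();

INSERT INTO dbo.CustomerOrder (CustomerId, OrderDate)
VALUES (@CustomerId, GETDATE());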
@@IDENTITY Is Dangerous !!!
SQL Tri-State Logic
SQL has tri-state logic: TRUE, FALSE and UNKNOWN.
SQL data types cannot be compared to NULL using
conventional comparison operators; all of the following
evaluate to UNKNOWN:
<some value> <> NULL
<some value> > NULL
<some value> < NULL
<some value> = NULL
Always use IS NULL, IS NOT NULL, ISNULL and
COALESCE to handle NULLs correctly.
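An illustrative query against a hypothetical nullable column:

/* Returns no rows even when MiddleName contains NULLs,
 * because the predicate evaluates to UNKNOWN.
 */
SELECT LastName FROM dbo.Person WHERE MiddleName = NULL;

/* Correct. */
SELECT LastName FROM dbo.Person WHERE MiddleName IS NULL;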
NULLs Always Propagate
In Expressions
Expressions that include NULL will always evaluate
to NULL, e.g.:
SELECT 1 + NULL
SELECT 1 - NULL
SELECT 1 * NULL
SELECT 1 / NULL
SELECT @MyString + NULL
If this is not the behaviour you want, code around
this using ISNULL or COALESCE.
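For example:

SELECT 1 + ISNULL(@Bonus, 0);              /* NULL treated as zero         */
SELECT COALESCE(@MyString, '') + 'suffix'; /* NULL treated as empty string */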
Use Of The NOLOCK Hint
Historically SQL Server has always used locking to
enforce isolation levels, however:
SQL Server (2005 onwards) provides non-blocking
versions of the read committed and snapshot
isolation levels through multi-version concurrency
control (MVCC).
SQL Server 2014 uses MVCC for its in-memory
OLTP engine.
All Azure SQL Database databases use the MVCC
version of read committed snapshot isolation.
Use Of The NOLOCK Hint
Objects can be scanned in two ways:
Allocation order: always applied to heaps, and can
apply to indexes.
Logical order: indexes are traversed in logical leaf
node order.
Any queries against indexed tables (clustered or non-
clustered) using NOLOCK that perform allocation-
ordered scans are exposed to reading the same data
twice if another session causes a page to split and
the data to move during the scan.
Use Of The NOLOCK Hint
If a session uses a NOLOCK hint on a heap or clustered
index, its reads will ignore any locks taken out on
pages/rows by in-flight transactions and can
subsequently read uncommitted ('dirty') data if a row
is in the process of being changed by another session.
If the in-flight transaction rolls back, this leaves the
session in a state whereby it has read dirty data, i.e.
data that has been modified outside of a safe
transactional context.
Thanks to Mark Broadbent (@retracement) for
checking this and the last two slides.
Transaction Rollback Behaviour
CREATE TABLE Test (col1 INT)
BEGIN TRANSACTION
INSERT INTO Test VALUES (1);
UPDATE Test SET col1 = col1 + 1
WHERE 1/0 > 1;
COMMIT;
SELECT col1 FROM Test
-- ** 1 ** row is returned
CREATE TABLE Test (col1 INT)
SET XACT_ABORT ON
BEGIN TRANSACTION
INSERT INTO Test VALUES (1);
UPDATE Test SET col1 = col1 + 1
WHERE 1/0 > 1;
COMMIT;
SELECT col1 FROM Test
-- ** No rows ** are returned.
For SQL Server to automatically roll back an entire transaction
when a statement raises a run-time error,
SET XACT_ABORT must be set to ON.
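Alternatively, a sketch of the same guarantee using TRY...CATCH
(THROW requires SQL Server 2012/Denali; on older versions re-raise
with RAISERROR):

BEGIN TRY
    BEGIN TRANSACTION;
    INSERT INTO Test VALUES (1);
    UPDATE Test SET col1 = col1 + 1
    WHERE 1/0 > 1;
    COMMIT;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0 ROLLBACK;
    THROW; /* re-raise so the caller still sees the error */
END CATCH;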