How to think like the engine

Intro to Internals
How to Think Like the SQL Engine
Brent Ozar, Brent Ozar Unlimited

MIT License
Copyright © 2016 Brent Ozar.
Permission is hereby granted, free of charge, to any person obtaining a copy of this
software and associated documentation files (the "Software"), to deal in the Software
without restriction, including without limitation the rights to use, copy, modify, merge,
publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to
whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or
substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

I know, I hate killing trees.
But having these next 3 pages in your hand will help a lot
as we talk through the demos.
Print this 3-page PDF to follow along:
http://u.BrentOzar.com/engine.pdf

Brent Ozar
Consultant, Brent Ozar Unlimited
I make SQL Server faster and more reliable.
I created sp_Blitz® and the SQL Server First
Responder Kit, and I love sharing knowledge at
BrentOzar.com. I hold a bunch of certifications and
awards including the rare Microsoft Certified Master.
You don’t care about any of that though.
Download the PDF: BrentOzar.com/go/enginepdf
/brentozar @brento brentozar

Agenda
When you pass in a query, how does SQL Server build the results? Time
to role play: Brent will be an end user sending in queries, and you will
play the part of the SQL Server engine. Using simple spreadsheets as
your tables, you will learn how SQL Server builds execution plans,
uses indexes, performs joins, and considers statistics.
This session is for DBAs and developers who are comfortable writing
queries, but not so comfortable when it comes to explaining
nonclustered indexes, lookups, sargability, fill factor, and corruption
detection.

[Diagram: an 8KB page, made up of a page header, index or data rows, and a slot array.]

You: SQL Server.
Me: end user.

First query:
SELECT Id
FROM dbo.Users

Your execution plan:
1. Shuffle through all of the pages,
saying the Id of each record out loud.

SET STATISTICS IO ON
Logical reads:
the number of 8K pages we read.
(79,672 pages x 8KB = 637,376KB, or about 637MB)
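To sanity-check that arithmetic, here is a small Python sketch. The 8KB page size and the 79,672 figure come from the slide; the function name is made up for illustration. The 637MB on the slide counts 1MB as 1,000KB, so counting 1,024KB per MB gives a slightly smaller number.

```python
PAGE_SIZE_KB = 8  # SQL Server data pages are 8KB


def logical_reads_to_kb(logical_reads: int) -> int:
    """Convert a logical read count (pages) into kilobytes read."""
    return logical_reads * PAGE_SIZE_KB


kb = logical_reads_to_kb(79_672)
print(f"{kb:,} KB")                           # 637,376 KB
print(f"{kb / 1000:.0f} MB (1MB = 1,000KB)")  # 637 MB (1MB = 1,000KB)
print(f"{kb / 1024:.0f} MB (1MB = 1,024KB)")  # 622 MB (1MB = 1,024KB)
```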

Let’s add a filter.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'

Your execution plan:
saying the Id of each record out loud,
if their LastAccessDate > ‘2014/07/01’.

Lesson:
Using a WHERE without a
matching index means
scanning all the data.
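A toy model of that lesson, in Python rather than in SQL Server itself. Everything here is hypothetical (the rows, the four-rows-per-page figure, the scan function); the only point is that with no index to narrow things down, the filter changes how many rows come back but not how many pages get read.

```python
from datetime import date

ROWS_PER_PAGE = 4  # hypothetical; real pages hold as many rows as fit in 8KB

# Hypothetical Users rows: (Id, LastAccessDate), stored in Id order.
users = [(i, date(2014, 1 + i % 12, 1)) for i in range(1, 41)]
pages = [users[i:i + ROWS_PER_PAGE] for i in range(0, len(users), ROWS_PER_PAGE)]


def scan(pages, cutoff=None):
    """Read every page; return matching Ids and the number of pages read."""
    ids, pages_read = [], 0
    for page in pages:
        pages_read += 1  # we have to pick up the page to check its rows
        for user_id, last_access in page:
            if cutoff is None or last_access > cutoff:
                ids.append(user_id)
    return ids, pages_read


all_ids, reads_no_filter = scan(pages)
some_ids, reads_filtered = scan(pages, cutoff=date(2014, 7, 1))
print(len(all_ids), reads_no_filter)    # every row, every page
print(len(some_ids), reads_filtered)    # fewer rows, same number of pages
```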

Lesson:
Estimated Subtree Cost
guesses at CPU and IO
work required for a query.

Let’s add a sort.
SELECT Id
FROM dbo.Users
ORDER BY LastAccessDate

Your execution plan:
1. Shuffle through all of the pages,
writing down fields __________ for each record,
2. Sort the matching records by LastAccessDate.

Order By:
Cost is up ~4x
We needed space to
write down our results,
so we got a memory grant

You can’t always get
what you want.
Memory is set when the query starts,
and not revised.
SQL Server has to assume other people
will run queries at the same time as you.
Your memory grant can change with
each time that you run a query.

And if you run out of memory…

Let’s get all the fields.
SELECT *
FROM dbo.Users
ORDER BY LastAccessDate

Lesson:
SELECT * sucks.
But let’s dig deeper.

Why does it suck?
Do we work harder to read the data?
Do we work harder to write the data?
Do we work harder to sort the data?
Do we work harder to output the data?

            SELECT Id    SELECT *
No order           66          66
ORDER BY          259      20,666

Lesson:
Sorting is expensive,
and more fields
makes it worse.
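A rough way to see why more fields hurt: before you can sort, you have to write down every row, and wider rows need more scratch space. This is a back-of-the-envelope Python sketch, not how SQL Server sizes a memory grant. The row count and column widths are hypothetical, not measured from dbo.Users.

```python
def sort_scratch_bytes(row_count: int, column_widths: dict) -> int:
    """Rough space needed to write down every row before sorting it."""
    return row_count * sum(column_widths.values())


ROW_COUNT = 1_000_000  # hypothetical row count

# Hypothetical average widths in bytes; not measured from dbo.Users.
# Even SELECT Id has to carry LastAccessDate, because that is the sort key.
id_only = {"Id": 4, "LastAccessDate": 8}
all_fields = {**id_only, "DisplayName": 40, "Age": 4, "AboutMe": 400}

narrow = sort_scratch_bytes(ROW_COUNT, id_only)
wide = sort_scratch_bytes(ROW_COUNT, all_fields)
print(f"SELECT Id: {narrow:,} bytes")
print(f"SELECT * : {wide:,} bytes ({wide / narrow:.0f}x)")
```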

Let’s run it a few times.
SELECT *
FROM dbo.Users
ORDER BY LastAccessDate;
GO 5

Your execution plan:
1. Shuffle through all of the pages,
writing down all the fields for each record,
2. Sort the matching records by LastAccessDate.
3. Keep the output so you could reuse it the next
time you saw this same query?

Oracle can.
(It better, since it costs
$47,000 per core.)

SQL Server reads & sorts 5 times.

Lesson:
SQL Server caches raw data
pages, not output.
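Here is a toy Python model of that behavior. The class and its counters are invented for illustration and only mimic the idea: pages get cached after the first read, but the sorted output is thrown away, so five runs mean five sorts.

```python
class ToyEngine:
    """Caches pages read from 'disk', but never caches query output."""

    def __init__(self, disk_pages):
        self.disk_pages = disk_pages  # page number -> list of rows
        self.buffer_pool = {}         # pages cached in memory
        self.physical_reads = 0       # pages fetched from disk
        self.logical_reads = 0        # pages read, whether cached or not
        self.sorts = 0

    def read_page(self, page_no):
        self.logical_reads += 1
        if page_no not in self.buffer_pool:
            self.physical_reads += 1
            self.buffer_pool[page_no] = self.disk_pages[page_no]
        return self.buffer_pool[page_no]

    def select_all_ordered(self):
        rows = []
        for page_no in sorted(self.disk_pages):
            rows.extend(self.read_page(page_no))
        self.sorts += 1  # the sort is redone on every execution
        return sorted(rows, key=lambda row: row["LastAccessDate"])


# Hypothetical three-page table.
disk = {
    0: [{"Id": 1, "LastAccessDate": "2014-09-01"}],
    1: [{"Id": 2, "LastAccessDate": "2014-03-01"}],
    2: [{"Id": 3, "LastAccessDate": "2014-06-01"}],
}
engine = ToyEngine(disk)
for _ in range(5):  # like GO 5
    result = engine.select_all_ordered()

print(engine.physical_reads, engine.logical_reads, engine.sorts)  # 3 15 5
```

Physical reads stop after the first run, but logical reads and sorts keep climbing with every execution.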

Nonclustered indexes: copies.
Stored in the order we want
Include the fields we want
CREATE INDEX
IX_LastAccessDate_Id
ON dbo.Users(LastAccessDate, Id)

Let’s go simple again.
SELECT Id
FROM dbo.Users
WHERE LastAccessDate > '2014/07/01'

Your execution plan
1. Grab IX_LastAccessDate_Id and seek to 2014/07/01.
2. Read the Id’s out in order.
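The seek-then-read plan above can be sketched in Python with a sorted list standing in for the index. The rows are hypothetical and a real index is a B-tree rather than a flat list, but the idea is the same: because the copy is already sorted by LastAccessDate, you jump straight to the cutoff and read from there, with no sort step.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical Users rows: (Id, LastAccessDate).
users = [
    (1, date(2014, 9, 2)),
    (2, date(2014, 3, 15)),
    (3, date(2014, 7, 1)),
    (4, date(2014, 12, 25)),
    (5, date(2014, 7, 2)),
]

# The index is a copy of two fields, sorted by (LastAccessDate, Id).
ix_last_access_date_id = sorted((last_access, user_id) for user_id, last_access in users)


def seek_after(index, cutoff):
    """Jump to the first entry after the cutoff, then read the rest in order."""
    start = bisect_right(index, (cutoff, float("inf")))
    return [user_id for _, user_id in index[start:]]


print(seek_after(ix_last_access_date_id, date(2014, 7, 1)))  # [5, 1, 4]
```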

                        SELECT Id    SELECT *
No order                       66          66
ORDER BY                      259      20,666
ORDER BY (with index)          10       6,354

Lesson:
Indexes reduce reads.
Duh.

Lesson:
Indexes also
reduce CPU time.

Don’t think scan = terrible.

That’s a covering index.
It covers the fields we need in this query.
But if we change the query…

Let’s add a couple of fields.
SELECT Id, DisplayName, Age
FROM dbo.Users

One execution plan
1. Grab IX_LastAccessDate_Id, seek to 2014/07/01.
2. Write down the Id and LastAccessDate of
matching records.
3. Grab the clustered index (white pages), and look
up each matching row by their Id to get
DisplayName and Age.
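Those three steps can be sketched in Python, with a dict playing the clustered index and a sorted list playing the nonclustered one. The rows and names are hypothetical. The thing to watch is the counter: one trip back to the clustered index for every matching row.

```python
from bisect import bisect_right
from datetime import date

# Clustered index ("white pages"): every field, keyed by Id. Hypothetical rows.
clustered = {
    1: {"DisplayName": "Alice", "Age": 34, "LastAccessDate": date(2014, 9, 2)},
    2: {"DisplayName": "Bob", "Age": 41, "LastAccessDate": date(2014, 3, 15)},
    3: {"DisplayName": "Carol", "Age": 29, "LastAccessDate": date(2014, 7, 2)},
}

# Nonclustered index: only (LastAccessDate, Id), sorted.
nonclustered = sorted((row["LastAccessDate"], user_id) for user_id, row in clustered.items())


def seek_with_key_lookups(cutoff):
    """Seek the narrow index, then do one key lookup per matching row."""
    start = bisect_right(nonclustered, (cutoff, float("inf")))
    results, key_lookups = [], 0
    for _, user_id in nonclustered[start:]:
        row = clustered[user_id]  # key lookup: go back to the clustered index
        key_lookups += 1
        results.append((user_id, row["DisplayName"], row["Age"]))
    return results, key_lookups


rows, lookups = seek_with_key_lookups(date(2014, 7, 1))
print(rows)     # [(3, 'Carol', 29), (1, 'Alice', 34)]
print(lookups)  # 2
```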

That’s why SQL Server includes the key
For simplicity, I told you I created this index with the Id.
SQL Server always includes your clustering keys whether
you ask for ‘em or not because it has to join indexes.

Classic index
tuning sign
Key lookup is required
when the index doesn’t
have all the fields we need.
Hover your mouse over the
key lookup and look for the
OUTPUT fields.
Small? Frequently used?
Add ‘em to the index.
DO NOT ADD A NEW INDEX.

But to get that plan, I had to cheat.

Because with 2014/07/01, I get:

Lesson:
Even with indexes,
there’s a tipping point
where scans work better.
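A crude way to picture the tipping point, in Python. This compares page reads only, which is a big simplification of what the optimizer actually weighs. The 79,672 pages come from the earlier scan; the three reads per key lookup is an assumption for illustration, and the reads for the index seek itself are ignored.

```python
TABLE_PAGES = 79_672    # logical reads for the full scan, from the earlier slide
READS_PER_LOOKUP = 3    # assumption: pages touched per key lookup


def cheaper_plan(matching_rows: int) -> str:
    """Compare page reads only; the real optimizer weighs more than this."""
    lookup_reads = matching_rows * READS_PER_LOOKUP
    return "seek + key lookups" if lookup_reads < TABLE_PAGES else "scan"


tipping_point = TABLE_PAGES // READS_PER_LOOKUP
print(tipping_point)          # 26557
print(cheaper_plan(1_000))    # seek + key lookups
print(cheaper_plan(500_000))  # scan
```

Under these assumptions, once more than about 26,557 rows match, doing a lookup per row reads more pages than just scanning the whole table.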

Statistics help SQL Server:
Decide which index to use
What order to process tables/indexes in
Whether to do seeks or scans
Guess how many rows will match your query
How much memory to allocate for the query

Keep statistics updated.
Automatic stats updates aren’t enough. Consider:
• http://Ola.Hallengren.com
• http://MinionWare.net/reindex
Typical strategy: weekly statistics updates
Updated statistics on an index invalidate query plans that
involve that index
• Affects your plan cache analysis
• Can cause unpredictable query plan changes

How about on a single random date?

Why can’t I get just one row?

Lesson:
This is called
Cardinality Estimation,
and it’s not just about
keeping stats updated.
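To get a feel for why an estimate can miss even with fresh stats, here is a toy histogram in Python. It is not SQL Server's actual algorithm. The steps, the row counts, the 31-day step width, and the "guess 1 row past the last step" rule are all assumptions made up for this sketch.

```python
# Hypothetical histogram: (upper bound of step, rows in step), by month.
histogram = [
    ("2014-06-30", 3_000),
    ("2014-07-31", 3_100),
    ("2014-08-31", 2_900),
]
DAYS_PER_STEP = 31  # simplification: treat every step as 31 days wide


def estimate_rows_for_one_day(day: str) -> float:
    """Guess rows for a single date by spreading a step's rows evenly."""
    for upper_bound, rows_in_step in histogram:
        if day <= upper_bound:
            return rows_in_step / DAYS_PER_STEP
    return 1.0  # assumption for this toy: past the last step, guess low


print(round(estimate_rows_for_one_day("2014-07-15")))  # 100
print(estimate_rows_for_one_day("2014-09-15"))         # 1.0
```

The estimate is an average spread across a step, so a single date that really has 1 row or 5,000 rows still gets the same guess.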

Fortunately, SQL 2014/2016 fixes this.
The Cardinality Estimator has huge improvements.
To turn ‘em on, just change your Compatibility Level.

And run the exact same query again

Lesson:
2014/2016’s new
Cardinality Estimator
is, uh, new

Lesson:
bad cardinality estimation
is at the dark heart
of many bad plans.

Whew.
That’s a lot of lessons.

What we learned
Clustered indexes hold all the fields*
Nonclustered indexes are light-weight* copies of the table
NC indexes reduce not just reads, but also CPU work
SQL Server caches raw data pages, not query output
Statistics drive seek vs scan, index choice, memory
Statistics aren’t the only part: cardinality estimation matters
Includes and seeks aren’t magically delicious

Thank You
Learn more from
Brent Ozar
help@brentozar.com or follow @BrentO
