Introduction to SQL Server Internals: How to Think Like the Engine

When you pass in a query, how does SQL Server build the results? Time to role play: Brent will be an end user sending in queries, and you will play the part of the SQL Server engine. Using simple spreadsheets as your tables, you will learn how SQL Server builds execution plans, uses indexes, performs joins, and considers statistics.

This session is for DBAs and developers who are comfortable writing queries, but not so comfortable when it comes to explaining nonclustered indexes, lookups, and sargability.

  1. Intro to Internals: How to Think Like the SQL Engine. Brent Ozar, Brent Ozar Unlimited
  2. MIT License Copyright © 2016 Brent Ozar. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
  3. I know, I hate killing trees. But having these next 3 pages in your hand will help a lot as we talk through the demos. Print this 3-page PDF to follow along: http://u.BrentOzar.com/engine.pdf
  4. Brent Ozar, Consultant, Brent Ozar Unlimited. I make SQL Server faster and more reliable. I created sp_Blitz® and the SQL Server First Responder Kit, and I love sharing knowledge at BrentOzar.com. I hold a bunch of certifications and awards including the rare Microsoft Certified Master. You don’t care about any of that though. Download the PDF: BrentOzar.com/go/enginepdf /brentozar @brento brentozar
  5. Agenda: When you pass in a query, how does SQL Server build the results? Time to role play: Brent will be an end user sending in queries, and you will play the part of the SQL Server engine. Using simple spreadsheets as your tables, you will learn how SQL Server builds execution plans, uses indexes, performs joins, and considers statistics. This session is for DBAs and developers who are comfortable writing queries, but not so comfortable when it comes to explaining nonclustered indexes, lookups, sargability, fill factor, and corruption detection.
  6. 8KB page: page header, index or data rows, slot array.
  7. Leaf pages; index pages.
  8. You: SQL Server. Me: end user.
  9. First query: SELECT Id FROM dbo.Users
  10. Your execution plan: 1. Shuffle through all of the pages, saying the Id of each record out loud.
  11. SQL Server’s execution plan
  12. SET STATISTICS IO ON. Logical reads: the number of 8K pages we read. (79,672 x 8KB = 637MB)
  13. That’s 159 reams of paper, at 500 sheets per ream.
  14. Let’s add a filter. SELECT Id FROM dbo.Users WHERE LastAccessDate > '2014/07/01'
  15. Your execution plan: 1. Shuffle through all of the pages, saying the Id of each record out loud, if their LastAccessDate > '2014/07/01'.
  16. SQL Server’s execution plan
  17. Lesson: Using a WHERE without a matching index means scanning all the data.
  18. Lesson: Estimated Subtree Cost guesses at CPU and IO work required for a query.
  19. Let’s add a sort. SELECT Id FROM dbo.Users WHERE LastAccessDate > '2014/07/01' ORDER BY LastAccessDate
  20. Your execution plan: 1. Shuffle through all of the pages, writing down fields __________ for each record, if their LastAccessDate > '2014/07/01'. 2. Sort the matching records by LastAccessDate.
  21. SQL Server’s execution plan
  22. Order By: Cost is up ~4x. We needed space to write down our results, so we got a memory grant.
  23. Memory is set when the query starts, and not revised. SQL Server has to assume other people will run queries at the same time as you. Your memory grant can change each time you run a query. You can’t always get what you want.
  24. And if you run out of memory…
  25. Let’s get all the fields. SELECT * FROM dbo.Users WHERE LastAccessDate > '2014/07/01' ORDER BY LastAccessDate
  26. Your execution plan: 1. Shuffle through all of the pages, writing down fields __________ for each record, if their LastAccessDate > '2014/07/01'. 2. Sort the matching records by LastAccessDate.
  27. Lesson: SELECT * sucks. But let’s dig deeper.
  28. Why does it suck? Do we work harder to read the data? Do we work harder to write the data? Do we work harder to sort the data? Do we work harder to output the data?
  29. SQL Server’s execution plan
  30. No order: SELECT Id 66, SELECT * 66. ORDER BY: SELECT Id 259, SELECT * 20,666.
  31. Lesson: Sorting is expensive, and more fields make it worse.
  32. Let’s run it a few times. SELECT * FROM dbo.Users WHERE LastAccessDate > '2014/07/01' ORDER BY LastAccessDate; GO 5
  33. Your execution plan: 1. Shuffle through all of the pages, writing down all the fields for each record, if their LastAccessDate > '2014/07/01'. 2. Sort the matching records by LastAccessDate. 3. Keep the output so you could reuse it the next time you saw this same query?
  34. Oracle can. (It better, since it costs $47,000 per core.)
  35. SQL Server reads & sorts 5 times.
  36. Lesson: SQL Server caches raw data pages, not output.
  37. Nonclustered indexes: copies. Stored in the order we want. Include the fields we want. CREATE INDEX IX_LastAccessDate_Id ON dbo.Users(LastAccessDate, Id)
  38. Let’s go simple again. SELECT Id FROM dbo.Users WHERE LastAccessDate > '2014/07/01' ORDER BY LastAccessDate;
  39. Your execution plan: 1. Grab IX_LastAccessDate_Id and seek to 2014/07/01. 2. Read the Ids out in order.
  40. SQL Server’s execution plan
  41. No order: SELECT Id 66, SELECT * 66. ORDER BY: SELECT Id 259, SELECT * 20,666. ORDER BY (with index): SELECT Id 10, SELECT * 6,354.
  42. Lesson: Indexes reduce reads. Duh.
  43. Lesson: Indexes also reduce CPU time.
  44. Yes, this is a “seek.”
  45. Don’t think scan = terrible.
  46. That’s a covering index. It covers the fields we need in this query. But if we change the query…
  47. Let’s add a couple of fields. SELECT Id, DisplayName, Age FROM dbo.Users WHERE LastAccessDate > '2014/07/01' ORDER BY LastAccessDate;
  48. One execution plan: 1. Grab IX_LastAccessDate_Id, seek to 2014/07/01. 2. Write down the Id and LastAccessDate of matching records. 3. Grab the clustered index (white pages), and look up each matching row by its Id to get DisplayName and Age.
  49. The SQL Server equivalent
  50. That’s why SQL Server includes the key. For simplicity, I told you I created this index with the Id. SQL Server always includes your clustering keys whether you ask for ’em or not because it has to join indexes.
  51. Classic index tuning sign: Key lookup is required when the index doesn’t have all the fields we need. Hover your mouse over the key lookup and look for the OUTPUT fields. Small? Frequently used? Add ’em to the index. DO NOT ADD A NEW INDEX.
  52. But to get that plan, I had to cheat.
  53. Because with 2014/07/01, I get:
  54. Lesson: Even with indexes, there’s a tipping point where scans work better.
  55. Enter statistics.
  56. Statistics help SQL Server: decide which index to use; decide what order to process tables/indexes in; decide whether to do seeks or scans; guess how many rows will match your query; decide how much memory to allocate for the query.
  57. WHERE LastAccessDate > '2014/07/01'
  58. Add it up, add it up
  59. Keep statistics updated. Automatic stats updates aren’t enough. Consider: http://Ola.Hallengren.com and http://MinionWare.net/reindex. Typical strategy: weekly statistics updates. Updated statistics on an index invalidate query plans that involve that index, which affects your plan cache analysis and can cause unpredictable query plan changes.
  60. How about on a single random date?
  61. Let’s write it differently.
  62. Wait – what?
  63. Why can’t I get just one row
  64. Lesson: This is called Cardinality Estimation, and it’s not just about keeping stats updated.
  65. Fortunately, SQL 2014/2016 fixes this. The Cardinality Estimator has huge improvements. To turn ’em on, just change your Compatibility Level.
  66. And run the exact same query again
  67. All better!
  68. Lesson: 2014/2016’s new Cardinality Estimator is, uh, new
  69. Let’s add a join.
  70. Lesson: bad cardinality estimation is at the dark heart of many bad plans.
  71. Whew. That’s a lot of lessons.
  72. What we learned: Clustered indexes hold all the fields*. Nonclustered indexes are light-weight* copies of the table. NC indexes reduce not just reads, but also CPU work. SQL Server caches raw data pages, not query output. Statistics drive seek vs scan, index choice, memory. Statistics aren’t the only part: cardinality estimation matters. Includes and seeks aren’t magically delicious.
  73. Thank You. Learn more from Brent Ozar: help@brentozar.com or follow @BrentO
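
If you don't have the printed spreadsheets handy, the role play in the slides can be approximated in a few lines of Python. This is only a toy sketch, not how SQL Server is implemented: the 1,000-row table, the generated dates, and every function name below are made up for illustration. A dict keyed on Id stands in for the clustered index, and a sorted list of (LastAccessDate, Id) pairs stands in for IX_LastAccessDate_Id.

```python
from bisect import bisect_right
from datetime import date, timedelta

# Hypothetical miniature Users table. The dict plays the clustered index:
# keyed on Id, holding every field of the row.
clustered = {
    i: {"Id": i, "DisplayName": f"user{i}", "Age": 20 + i % 50,
        "LastAccessDate": date(2014, 1, 1) + timedelta(days=(i * 7) % 365)}
    for i in range(1, 1001)
}

# The nonclustered index: a sorted copy of just (LastAccessDate, Id).
ix_last_access_date_id = sorted(
    (row["LastAccessDate"], row["Id"]) for row in clustered.values()
)


def seek_position(cutoff):
    """Find the first index entry with LastAccessDate > cutoff."""
    return bisect_right(ix_last_access_date_id, (cutoff, float("inf")))


def scan_then_sort(cutoff):
    """No index: touch every row, filter, then sort the matches."""
    rows_touched = len(clustered)
    matches = [r for r in clustered.values() if r["LastAccessDate"] > cutoff]
    matches.sort(key=lambda r: r["LastAccessDate"])
    return [r["Id"] for r in matches], rows_touched


def seek_covering(cutoff):
    """Covering index: seek to the cutoff and read Ids in index order."""
    entries = ix_last_access_date_id[seek_position(cutoff):]
    return [row_id for _, row_id in entries], len(entries)


def seek_with_key_lookups(cutoff):
    """Index lacks DisplayName and Age, so each match costs a key lookup."""
    output, lookups = [], 0
    for _, row_id in ix_last_access_date_id[seek_position(cutoff):]:
        row = clustered[row_id]  # the key lookup into the clustered index
        lookups += 1
        output.append((row["Id"], row["DisplayName"], row["Age"]))
    return output, lookups


cutoff = date(2014, 7, 1)
scan_ids, scan_touched = scan_then_sort(cutoff)
seek_ids, seek_touched = seek_covering(cutoff)
_, lookups = seek_with_key_lookups(cutoff)
print(scan_touched, seek_touched, lookups)
```

The scan touches every row and then sorts, while the seek touches only the matching index entries, already in order. Move the cutoff far enough back and the number of key lookups climbs toward the size of the whole table, which is the tipping point where a plain scan starts to look cheaper.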

Editor's Notes

  • We’re using the StackOverflow.com database as an example.
    To download it, go to http://BrentOzar.com/go/querystack.
    These screenshots are from the 2016/03 export, which is ~100GB.
    If you use a newer or older export, your numbers of pages may vary.
  • This session focuses on the Users table
    Id – primary key, clustered index. It’s an identity, starts at 1 and goes into the millions.
    The white paper you’re holding in your hands – that’s the clustered index.
    It includes all of the fields on the table – sort of.
    Notice the About Me? It’s an NVARCHAR(MAX), and may not fit on a row. SQL Server may store that off-row, on other pages, if people get really wordy in their about-me field. We’re not going to touch on off-row data here, but I just want you to know there’s an overhead to that. Same thing with XML, JSON.
  • For old-school tables, everything is stored in 8KB pages.
    These pages are the same whether they’re in memory or on disk.
    It’s the smallest unit of data SQL Server works with.
    (Things are different for Hekaton and columnstore indexes, but we’re focusing on old-school tables today.)
  • 463x
  • 463x
  • Why can’t I get just one row
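
The page arithmetic from the first demo (79,672 logical reads, 8KB per page, 159 reams) can be checked with a short Python sketch. The function names are made up for illustration, and two assumptions are baked in: the slide's rounding of 1,000 KB per MB, and the role-play convention that one 8KB page is one sheet of paper in a 500-sheet ream.

```python
PAGE_KB = 8            # every logical read is one 8KB page
KB_PER_MB = 1000       # assumption: the slide's rounding, not 1,024
SHEETS_PER_REAM = 500  # assumption: one printed sheet per 8KB page


def logical_reads_to_mb(logical_reads):
    """Convert a STATISTICS IO logical read count into megabytes read."""
    return logical_reads * PAGE_KB / KB_PER_MB


def logical_reads_to_reams(logical_reads):
    """How many full reams of paper the same pages would fill."""
    return logical_reads // SHEETS_PER_REAM


reads = 79_672
print(round(logical_reads_to_mb(reads)), "MB")
print(logical_reads_to_reams(reads), "reams")
```

If you use a different export of the database, plug in your own logical reads number from SET STATISTICS IO ON.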