Indexes: The neglected performance all-rounder

The latest incarnation of this talk, as presented on January 31st at FOSDEM PgDay in Brussels


  1. 1. Indexing: The neglected performance all-rounder Not always that obvious, unfortunately! @MarkusWinand @SQLPerfTips iStockPhoto wildpixel
  2. 2. Indexing: The neglected performance all-rounder Not always that obvious, unfortunately! @MarkusWinand @SQLPerfTips iStockPhoto wildpixel
  3. 3. Takeaway #1: Pandemic Scale It affects you! (Symbolic image; not real data) http://upload.wikimedia.org/wikipedia/commons/c/c7/2009_world_subdivisions_flu_pandemic.png
  4. 4. Takeaway #2: Caused by Success Copyright © 2013 Telerik, Inc. All rights reserved
  5. 5. Takeaway #3: It’s Not Your Fault http://simpsonswiki.com/wiki/File:I_Didn%27t_Do_It!_Volume_III.png
  6. 6. The Problem Index/Query Mismatch © 2014 by Markus Winand
  7. 7. Quantifying the Problem Percona White Paper: Reasons of performance problems that caused production downtime: 38% bad SQL 15% schema and indexing http://www.percona.com/files/white-papers/causes-of-downtime-in-mysql.pdf
  8. 8. Quantifying the Problem Survey by sqlskills.com: Root causes of the last few SQL Server performance problems: 27% T-SQL 19% Poor indexing http://www.sqlskills.com/blogs/paul/survey-what-are-the-most-common-causes-of-performance-problems/
  9. 9. Quantifying the Problem Craig S. Mullins (strategist and researcher): “As much as 75% of poor relational performance is caused by "bad" SQL and application code.” Noel Yuhanna (Forrester Research): “The key difficulties surrounding performance continue to be poorly written SQL statements, improper DBMS configuration and a lack of clear understanding of how to tune databases to solve performance issues.”
  10. 10. Quantifying the Problem My observation: ~50% of SQL performance problems are caused by improper index use
  11. 11. The Root Cause Indexing is an afterthought… ...often done by admins © 2014 by Markus Winand
  12. 12. The Root Cause: DBAs are Indexing How did databases work before SQL?
  13. 13. The Root Cause: DBAs are Indexing Index use was intrinsically tied to the queries.
  14. 14. The Root Cause: DBAs are Indexing Example: dBase. Developers had to...
         ...use indexes explicitly when searching:
             set index to last_name
             find Winand
         ...take care of index maintenance:
             set index to last_name, idx2
             append
  15. 15. The Root Cause: DBAs are Indexing SQL is an abstraction that only defines the logical view. The actual SQL implementation takes care of everything else.
  16. 16. The Root Cause: DBAs are Indexing SQL (language) has: Tables Constraints Views Transactions SQL Databases (software) have: Storage management High Availability Bugs & patches Indexes Tuning parameters Queries Data manipulation Backup & recovery
  17. 17. The Root Cause: DBAs are Indexing Developers Tables Constraints Views Transactions SQL Databases (software) have: Storage management High Availability Bugs & patches Indexes Tuning parameters Queries Data manipulation Backup & recovery
  18. 18. The Root Cause: DBAs are Indexing Developers Tables Constraints Views Transactions Administrators Storage management High Availability Bugs & patches Indexes Tuning parameters Queries Data manipulation Backup & recovery
  19. 19. The Root Cause: DBAs are Indexing Today, indexing is considered a tuning task that belongs to the administrators' responsibilities.
  20. 20. The Root Cause: DBAs are Indexing A misconception that causes new problems: DBAs don’t know the queries. They have to “investigate” to find the queries. It is time consuming and almost always incomplete. (Image by G-10gian82, deviantart.com)
  21. 21. The Root Cause: DBAs are Indexing A misconception that causes new problems: DBAs don’t know the queries: they have to “investigate” to find the queries, which is time consuming and almost always incomplete. DBAs can’t change the queries: they can make the index match the query, but they can’t make the query match the index!
  22. 22. The Solution Indexing is a Development Task © 2014 by Markus Winand
  23. 23. The Solution: It’s a Dev Task Developers Tables Constraints Views Transactions Administrators Storage management High Availability Bugs & patches Indexes Tuning parameters Queries Data manipulation Backup & recovery
  24. 24. The Solution: It’s a Dev Task Developers Tables Constraints Views Storage management High Availability Transactions Queries Data manipulation Administrators Indexes (Must match!) Backup & recovery Bugs & patches Tuning parameters
  25. 25. Another Problem: It’s not Taught Indexes are not part of the pure SQL (language) literature because indexes are not part of the SQL standard. 11 SQL books analyzed: only 1.0% of the pages are about indexes (70 out of 7330 pages). Examples: Oracle SQL by Example: 2.0% (19/960) Beginning DBs with PostgreSQL: 0.8% (5/664) Learning SQL: 3.3% (11/336 — highest rate in class)
  26. 26. Another Problem: It’s not Taught Proper index usage is sometimes covered in database tuning books but is always buried between hundreds of pages of HW, OS and DB parameterization topics. 14 database administration books analyzed: 5.1% of the pages are about indexes (307 out of 6069 pages). Examples: Oracle Performance Survival Guide: 5.2% (38/730) High Performance MySQL: 8% (55/684) PostgreSQL 9 High Performance: 5.8% (27/468)
  27. 27. Another Problem: It’s not Taught Consequence: Developers don’t know how to use indexes properly. Results of the 3-minute online quiz: http://use-the-index-luke.com/3-minute-test 5 questions: each about a specific index usage pattern. Non-representative!
  28. 28. 3-Minute Quiz: Indexing Skills Q1: Good or Bad? (Function use)
             CREATE INDEX tbl_idx ON tbl (date_column);
             SELECT text, date_column
               FROM tbl
              WHERE TO_CHAR(date_column, 'YYYY') = '2013';
         http://use-the-index-luke.com/sql/where-clause/obfuscation/dates
  29. 29. 3-Minute Quiz: Indexing Skills
             ...WHERE TO_CHAR(date_column, 'YYYY') = '2013';
             Seq Scan on tbl (rows=365)
               Rows Removed by Filter: 49635
             Total runtime: [illegible in source]
         270x faster:
             ...WHERE date_column BETWEEN '2013-01-01'
                                      AND '2013-12-31'
             Index Scan using tbl_idx on tbl (rows=365)
             Total runtime: [illegible in source]
         (Above: simplified PostgreSQL execution plans when selecting 365 rows out of 50000)
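The effect on slide 29 can be reproduced in miniature. This is only a sketch under stated assumptions, not the setup used in the talk: it uses SQLite through Python's built-in `sqlite3` module instead of PostgreSQL, `strftime` stands in for `TO_CHAR`, dates are stored as ISO-8601 text, and the helper name `plan` is made up for illustration.

```python
import sqlite3

def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, text TEXT, date_column TEXT)")
conn.execute("CREATE INDEX tbl_idx ON tbl (date_column)")

# A function wrapped around the indexed column hides it from the index.
function_plan = plan(
    conn,
    "SELECT text, date_column FROM tbl "
    "WHERE strftime('%Y', date_column) = '2013'",
)

# The same year expressed as a range on the bare column can use the index.
range_plan = plan(
    conn,
    "SELECT text, date_column FROM tbl "
    "WHERE date_column BETWEEN '2013-01-01' AND '2013-12-31'",
)

print(function_plan)  # a full scan; exact wording depends on the SQLite version
print(range_plan)     # a search that names tbl_idx
```

The exact plan text differs between SQLite versions, so the sketch only relies on whether the plan mentions `tbl_idx`.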
  30. 30. 3-Minute Quiz: Q1 — Results
  31. 31. 3-Minute Quiz: Indexing Skills Q2: Good or Bad? (Indexed Top-N, no IOS)
             CREATE INDEX tbl_idx ON tbl (a, date_col);
             SELECT id, a, date_col
               FROM tbl
              WHERE a = ?
              ORDER BY date_col DESC
              LIMIT 1;
         http://use-the-index-luke.com/sql/partial-results/top-n-queries
  32. 32. 3-Minute Quiz: Indexing Skills It is already the optimal solution (not considering index-only scan).
             Limit (rows=1)
               -> Index Scan Backward using tbl_idx on tbl (rows=1)
                    Index Cond: (a = 123::numeric)
             Total runtime: [illegible in source]
         As fast as a primary key lookup because it can never return more than one row.
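Why the index on `(a, date_col)` makes the top-N query cheap can be sketched by comparing it with an index on `a` alone. The example below is an illustration under assumptions: it runs on SQLite via Python's `sqlite3` module rather than PostgreSQL, uses the literal `123` instead of a bind parameter, and the helper names `plan` and `plan_with_index` are hypothetical. Unlike the quiz setting, SQLite may also answer this query from the index alone.

```python
import sqlite3

QUERY = (
    "SELECT id, a, date_col FROM tbl "
    "WHERE a = 123 ORDER BY date_col DESC LIMIT 1"
)

def plan(conn, sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

def plan_with_index(index_columns):
    """Build an empty table with one index and return the plan for QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, a INTEGER, date_col TEXT)")
    conn.execute(f"CREATE INDEX tbl_idx ON tbl ({index_columns})")
    return plan(conn, QUERY)

# Index delivers rows for a = 123 already ordered by date_col: no sort step.
plan_a_date = plan_with_index("a, date_col")

# Index only finds a = 123; the rows still have to be sorted before LIMIT 1.
plan_a_only = plan_with_index("a")

print(plan_a_date)
print(plan_a_only)
```

With the two-column index the database can read the index backward and stop after the first entry, which is why the slide compares it to a primary key lookup.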
  33. 33. 3-Minute Quiz: Q2 — Results Understandable controversy!
  34. 34. 3-Minute Quiz: Indexing Skills Q3: Good or Bad? (Column order) CREATE INDEX tbl_idx ON tbl (a, b); SELECT id, a, b FROM tbl WHERE a = ? AND b = ?; SELECT id, a, b FROM tbl WHERE b = ?; http://use-the-index-luke.com/sql/where-clause/the-equals-operator/concatenated-keys
  35. 35. 3-Minute Quiz: Indexing Skills As-is only one query can use the index (a, b):
             ...WHERE a = ? AND b = ?;
             Bitmap Heap Scan on tbl (rows=6)
               -> Bitmap Index Scan on tbl_idx (rows=6)
                    Index Cond: ((a = 123) AND (b = 1))
             Total runtime: [illegible in source]
             ...WHERE b = ?;
             Seq Scan on tbl (rows=5142)
               Rows Removed by Filter: 44858
             Total runtime: [illegible in source]
  36. 36. 3-Minute Quiz: Indexing Skills Change the index to (b, a) so both can use it:
             ...WHERE a = ? AND b = ?;
             Bitmap Heap Scan on tbl (rows=6)
               -> Bitmap Index Scan on tbl_idx (rows=6)
                    Index Cond: ((a = 123) AND (b = 1))
             Total runtime: [illegible in source]
         4x faster:
             ...WHERE b = ?;
             Bitmap Heap Scan on tbl (rows=5142)
               -> Bitmap Index Scan on tbl_idx (rows=5142)
                    Index Cond: (b = 1::numeric)
             Total runtime: [illegible in source]
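The column-order effect from slides 35 and 36 can be checked with a small experiment. This is a sketch under assumptions, not the author's test setup: it uses SQLite via Python's `sqlite3` module on an empty table (no statistics collected), literal values instead of bind parameters, and the hypothetical helper `access_types`, which reports whether each query seeks into the index (`SEARCH`) or reads everything (`SCAN`).

```python
import sqlite3

QUERIES = {
    "a_and_b": "SELECT id, a, b FROM tbl WHERE a = 123 AND b = 1",
    "b_only": "SELECT id, a, b FROM tbl WHERE b = 1",
}

def access_types(index_columns):
    """Map each query name to the first word of its plan: SEARCH or SCAN."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
    conn.execute(f"CREATE INDEX tbl_idx ON tbl ({index_columns})")
    result = {}
    for name, sql in QUERIES.items():
        detail = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]
        result[name] = detail.split()[0]
    return result

index_a_b = access_types("a, b")  # only the query that filters on a can seek
index_b_a = access_types("b, a")  # both queries filter on the leading column b

print(index_a_b)
print(index_b_a)
```

A concatenated index can only be searched by its leading column(s), so putting `b` first serves both queries with a single index.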
  37. 37. 3-Minute Quiz: Q3 — Results
  38. 38. 3-Minute Quiz: Indexing Skills Q4: Good or Bad? (Indexing LIKE)
             CREATE INDEX tbl_idx ON tbl (text varchar_pattern_ops);
             SELECT id, text
               FROM tbl
              WHERE text LIKE '%TERM%';
         http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/like-performance-tuning
  39. 39. 3-Minute Quiz: Indexing Skills B-Tree indexes don’t support prefix wildcards.
             Seq Scan on tbl (rows=0)
               Rows Removed by Filter: 50000
             Total runtime: [illegible in source]
         Using a special purpose index (e.g. in PostgreSQL), 200x faster:
             CREATE INDEX tbl_tgrm ON tbl
              USING gin (text gin_trgm_ops);
             Bitmap Heap Scan on tbl (rows=0)
               Rows Removed by Index Recheck: 2
               -> Bitmap Index Scan on tbl_tgrm (rows=2)
                    Index Cond: ((text)::text ~~ '%TERM%'::text)
             Total runtime: [illegible in source]
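The reason a B-tree cannot help with a pattern that starts with a wildcard can be illustrated without a database. In the sketch below a sorted Python list stands in for the B-tree's ordered leaf entries; the word list and the function names `prefix_search` and `infix_search` are invented for this example.

```python
from bisect import bisect_left

# A sorted list plays the role of the ordered index entries.
words = sorted(["alpha", "beta", "term", "terminal", "determine", "termite", "zeta"])

def prefix_search(sorted_words, prefix):
    """LIKE 'term%': seek to the first candidate, read until the prefix stops matching."""
    start = bisect_left(sorted_words, prefix)
    matches, visited = [], 0
    for word in sorted_words[start:]:
        visited += 1
        if not word.startswith(prefix):
            break  # sorted order guarantees no later entry can match
        matches.append(word)
    return matches, visited

def infix_search(sorted_words, term):
    """LIKE '%term%': the sort order gives no start point, so every entry is read."""
    matches = [word for word in sorted_words if term in word]
    return matches, len(sorted_words)

print(prefix_search(words, "term"))
print(infix_search(words, "term"))
```

The prefix search touches only the matching entries plus one, while the infix search has to inspect the whole list, which corresponds to the sequential scan on the slide.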
  40. 40. 3-Minute Quiz: Q4 — Results
  41. 41. 3-Minute Quiz: Indexing Skills Q5: Good or Bad? (equality vs. ranges)
             CREATE INDEX tbl_idx ON tbl (date_col, state);
             SELECT id, date_col, state
               FROM tbl
              WHERE date_col >= TO_DATE('2008-11-07')
                AND state = 'X';
         http://use-the-index-luke.com/sql/where-clause/searching-for-ranges/greater-less-between-tuning-sql-access-filter-predicates
  42. 42. 3-Minute Quiz: Indexing Skills For the curious, the data distribution:
             SELECT count(*) FROM tbl...
             ... WHERE date_column >= TO_DATE('2008-11-07');
             -- [count illegible in source]
             ... WHERE state = 'X';
             -- 10000
             ... WHERE date_column >= TO_DATE('2008-11-07')
                   AND state = 'X';
             -- 365
  43. 43. 3-Minute Quiz: Indexing Skills
             CREATE INDEX tbl_idx ON tbl (date_col, state);
             Index Scan using tbl_idx on tbl (rows=365)
             Total runtime: [illegible in source]
         ~70% faster:
             CREATE INDEX tbl_idx ON tbl (state, date_col);
             Index Scan using tbl_idx on tbl (rows=365)
             Total runtime: [illegible in source]
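Why the equality column should come first can be shown by counting how many index entries each column order has to read. The sketch below is a toy model, not a measurement: sorted lists of tuples stand in for the two indexes, days are plain integers, the 30-row data set is invented, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: 10 days x 3 states = 30 rows.
rows = [(day, state) for day in range(1, 11) for state in ("A", "B", "X")]

def scan_date_first(rows, min_day, wanted_state):
    """Index (date_col, state): the range condition is on the leading column."""
    index = sorted(rows)
    start = bisect_left(index, (min_day,))  # first entry with day >= min_day
    scanned = index[start:]                 # everything up to the end is read
    matches = [entry for entry in scanned if entry[1] == wanted_state]
    return len(matches), len(scanned)       # state acts only as a filter

def scan_state_first(rows, min_day, wanted_state):
    """Index (state, date_col): equality first, so all matches are adjacent."""
    index = sorted((state, day) for day, state in rows)
    start = bisect_left(index, (wanted_state, min_day))
    end = bisect_right(index, (wanted_state, float("inf")))
    scanned = index[start:end]              # exactly the matching entries
    return len(scanned), len(scanned)

date_first = scan_date_first(rows, 8, "X")    # (matches, entries read)
state_first = scan_state_first(rows, 8, "X")
print(date_first, state_first)
```

Both orders return the same rows, but with the range column first the scan reads entries for every state in the date range and discards most of them.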
  44. 44. 3-Minute Quiz: Q5 — Results No segregated data! (Other DBs ask about index-only scan, but PostgreSQL didn't support them at the time the quiz was designed)
  45. 45. Indexes: The Neglected All-Rounder Everybody knows indexing is important for performance, yet nobody takes the time to learn and apply it properly.
  46. 46. Indexes: The Neglected All-Rounder Index details are hardly known: “details” like column order or equality vs. range conditions must be learned and understood. Only one index capability is used (finding data quickly), although indexes have three capabilities (powers): finding data, clustering data, and sorting data. Indexing is done from a single-query perspective, but it should be done from the application perspective (considering all queries). It’s a design task!
  47. 47. Indexes: The Neglected All-Rounder Are you just adding indexes or are you designing indexes?
  48. 48. About Markus Winand Tuning developers for high SQL performance Training & co (one-man show): winand.at Author of: SQL Performance Explained Geeky blog: use-the-index-luke.com
