Fosdem 2012 practical_indexing

422 views
374 views

Published on

Presented at Fosdem 2012

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
422
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Fosdem 2012 practical_indexing

  1. 1. Practical MySQL indexing guidelinesFOSDEMFebruary 5th, 2012 Stéphane CombaudonBrussels stephane.combaudon@dailymotion.com
  2. 2. Agenda Introduction Bad indexes & performance drops Guidelines for efficient indexing Tools and methods to improve index usage 2
  3. 3. Introduction 3
  4. 4. Goals Having fun with indexes!!! Getting rid of trial-and-error approach Knowing performance penalty of bad indexes Being productive  Knowing simple rules to design indexes  Knowing tools that can help 4
  5. 5. Indexing basics Index: data structure to speed up SELECTs  Think of an index in a book  In MySQL, key = index  Well consider that indexes are trees InnoDBs clustered index  Data is stored with the PK: PK lookups are fast  Secondary keys hold the PK values  Designing InnoDBs PKs with care is critical for perf. 5
  6. 6. Strengths An index can filter and/or sort values An index can contain all the fields needed for a query  No need to access data anymore A leftmost prefix can be used  Indexes on several columns are useful  Order of columns in composite keys is important 6
  7. 7. Limitations MySQL only uses 1 index per table per query  Ok, thats not 100% true (OR clauses...)  Think of composite indexes when you can!! Cant index full TEXT fields  You must use a prefix  Same for BLOBS and long VARCHARs Maintaining an index has a cost  Read speed vs write speed 7
  8. 8. Sample table we will use CREATE TABLE t (   id INT NOT NULL AUTO_INCREMENT,   a INT NOT NULL DEFAULT 0,   b INT NOT NULL DEFAULT 0,   [more columns here]   PRIMARY KEY(id) )ENGINE=InnoDB; Populated with ”many” rows  Means that queries against t are ”slow” Replace ”many” and ”slow” with your own values 8
  9. 9. Bad indexes &Performance drops 9
  10. 10. Adding an index 3 main consequences:  Can speed up queries (good)  Increases the size of your dataset (bad)  Slows down writes (bad) How big is the write slow-down?  Lets have simple tests 10
  11. 11. Write slow-downs, pictured In-memory test 300 Inserting N rows in PK order 250 Baseline is 100 with PK onlyTime to load data 200 PK + 1 idx for both graphs 150 PK + 2 idx PK + 3 idx 100 50 For in-memory workloads, adding 2 keys makes perf. 0 Number of indexes 2x worse On-disk test 12000 10000 For on-disk workloads,Time to load data 8000 PK + 1 idx 6000 PK + 2 idx adding 2 keys make perf. PK + 3 idx 40x worse!! 4000 2000 0 11 Number of indexes
  12. 12. So what? Removing bad indexes is crucial for perf.  Especially for write-intensive workloads  Tools will help us, more on this later What if your workload is read-intensive?  A few hot tables may handle most of the writes  These tables will be write-intensive 12
  13. 13. Identifying bad indexes Before removing bad indexes, we have to identify them! What is a bad index?  Duplicate indexes: always bad  Redondant indexes: generally bad  Low-cardinality indexes: depends  Unused indexes: always bad 13
  14. 14. Guidelines for efficient indexes 14
  15. 15. Before we start... Indexing is not an exact science  But guessing is not the best way to design indexes A few simple rules will help 90% of the time Always check your assumptions  EXPLAIN does not tell you everything  Time your queries with different index combinations  SHOW PROFILES is often valuable Slow query log is a good place to start! 15
  16. 16. Rule #1: FilterQ1: SELECT * FROM t WHERE a = 10 AND b = 20 Without an index, always a full table scan1. mysql> EXPLAIN SELECT * FROM t WHERE a = 10 AND b = 20G2. *********** 1. row ***********3.            id: 14.   select_type: SIMPLE5.         table: t ALL means6.          type: ALL table scan7. possible_keys: NULL8.           key: NULL9.       key_len: NULL10.          ref: NULL Estimated #12.         rows: 1000545 of rows to read12.        Extra: Using where Post-filtering needed to discard the non-matching rows 16
  17. 17. Rule #1: Filter Idea: filter as much data as possible by focusing on the WHERE clause Candidates for Q1:  key(a), key(b), key(a,b), key(b,a) Condition is on both a and b with an AND  A composite index should be better  Lets test! 17
  18. 18. Rule #1: Filter1. mysql> EXPLAIN SELECT * ... 1. mysql> EXPLAIN SELECT * ...2. ********** 1. row ********** 2. ********** 1. row **********3.            [...] 3.            [...]4.           key: a 4.           key: b5.       key_len: 4 5.       key_len: 46.            [...] 6.            [...]7.          rows: 20 7.          rows: 67368Exec time: 0.00s Exec time: 0.20s1. mysql> EXPLAIN SELECT * ... 1. mysql> EXPLAIN SELECT * ...2. ********** 1. row ********** 2. ********** 1. row **********3.            [...] 3.            [...]4.           key: ab 4.           key: ba5.       key_len: 8 5.       key_len: 86.            [...] 6.            [...]7.          rows: 10 7.          rows: 10Exec time: 0.00s Exec time: 0.00s Same perf. for this query Other queries will guide us 18 to choose between them
  19. 19. Rule #2: SortQ2: SELECT * FROM t WHERE a = 10 ORDER BY b Remember: indexed values are sorted An index can avoid costly filesorts  Think of filesorts performed on on-disk temp tables  ORDER BY clause must be a leftmost prefix of the index Caveat: an index scan is fast in itself, but retrieving the rows in index order may be slow  Seq. scan on index but random access on table 19
  20. 20. Rule #2: Sort  Lets try key(b) for Q2 vs full table scan1. mysql> EXPLAIN SELECT * ... 1. mysql> EXPLAIN SELECT * ...2. ********** 1. row ********** 2. ********** 1. row **********3.            […] 3.            […]4.          type: index 4.          type: ALL5.           key: b 4.           key: NULL6.       key_len: 4 5.       key_len: NULL7.            [...] 6.            [...]8.          rows: 1000638 7.          rows: 10006389.         Extra: Using where 8.         Extra: Using where;                       Using filesort Exec time: 1.52s Exec time: 0.37s EXPLAIN suggest key(b) is better, 20 but its wrong!
  21. 21. Rule #2: Sort An index is not always the best for sorting If possible, try to sort and filter Exception to the leftmost prefix rule:  Leading columns appearing in the WHERE clause as constants can fill the holes in the index  WHERE a = 10 ORDER BY b: key(a,b) can filter and sort  Not true with WHERE a > 10 ORDER BY b 21
  22. 22. Rule #2: Sort  With key(a,b)1. mysql> EXPLAIN SELECT * FROM t 1. mysql> EXPLAIN SELECT * FROM tWHERE a = 10 ORDER BY bG WHERE a > 10 ORDER BY bG2. ********** 1. row ********** 2. ********** 1. row **********3.            [...] 3.            [...]4.          type: ref 4.          type: ALL5.           key: ab 5.           key: NULL6.       key_len: 8 6.       key_len: NULL7.            [...] 7.            [...]8.          rows: 20 8.          rows: 10006389.         Extra: 9.         Extra: Using where;                       Using filesort Could have been a range scan Depends on the distribution of the values 22
  23. 23. Rule #3: CoverQ3: SELECT a,b FROM t WHERE a > 100; With key(a), you filter efficiently But with key(a,b)  You filter  The index holds all the columns you need  Means you dont need to access data key(a,b) is a covering index 23
  24. 24. Rule #3: Cover Back to InnoDBs clustered index  It is always covering  SELECT by PK is the fastest access with InnoDB  Take care of your PKs!! Remember full table scan + filesort vs index?  If the index used for sorting is also covering, it will outperform the table scan 24
  25. 25. Rating an index An index can give you 3 benefits: filtering, sorting, covering 1-star index: 1 property 2-star index: 2 properties 3-star index: 3 properties This is my own rating, other systems exist 25
  26. 26. Range queries and ORDER BYQ4: SELECT * FROM t WHERE a > 10 and b = 20 ORDER BY amysql> EXPLAIN SELECT * ...G********** 1. row **********           [...] Key filters and sorts, but         type: range filtering is not efficient.          key: a Getting data is very slow         rows: 500319 (random access + I/O-bound)        Extra: Using whereExec time: 35.9smysql> EXPLAIN SELECT * ...G********** 1. row **********           [...]         type: ref Key filters but doesnt sort.possible_keys: a,b,ab,ba Filtering is efficient so getting data,          key: ba post-filtering and post-sorting         rows: 64814 is not too slow        Extra: Using where; 26               Using filesortExec time: 0.2s
  27. 27. Joins and ORDER BY All columns in the ORDER BY clause must refer to the 1st table  I mean the 1st table to be joined, not the 1st table written in your query!! Forcing the join order with SELECT STRAIGHT_JOIN is sometimes useful Sometimes you cant fulfill this condition  This can be a good reason to denormalize 27
  28. 28. Tools and methodsto improve index usage 28
  29. 29. Userstats v2 You need Percona Server or MariaDB 5.2+mysql> SELECT s.table_name,s.index_name,rows_read       FROM information_schema.statistics s       LEFT JOIN information_schema.index_statistics i       ON (i.table_schema=s.table_schema Table added by           AND i.table_name=s.table_name this feature           AND i.index_name=s.index_name)       WHERE s.table_name=comment             AND s.table_schema=mydb             AND seq_in_index=1; Deals with composite indexes+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+| table_name | index_name           | rows_read |+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+| comment    | PRIMARY              |     50361 || comment    | user_id              |      NULL | Useless| comment    | video_comment_idx    |  18276197 || comment    | created_language_idx |      NULL | Useless+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+ 29
  30. 30. Workload matters! OLAP server+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+| table_name | index_name           | rows_read |+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+| comment    | PRIMARY              |     50361 || comment    | user_id              |      NULL | Never used| comment    | video_comment_idx    |  18276197 || comment    | created_language_idx |      NULL |+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+ OLTP server+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+| table_name | index_name           | rows_read |+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+| comment    | PRIMARY              |  12220798 || comment    | user_id              |  96674982 | Useful!| comment    | video_comment_idx    | 365691254 || comment    | created_language_idx |    217176 |+­­­­­­­­­­­­+­­­­­­­­­­­­­­­­­­­­­­+­­­­­­­­­­­+ 30
  31. 31. Pros Very easy to use  Turn on the variable and forget Easy to write queries to discover unused indexes automatically 31
  32. 32. Cons Large sample period needed for accurate stats Not always obvious to say if index is useful  Look at created_language_idx in previous slide Has some CPU overhead 32
  33. 33. pt-duplicate-key-checker Anything wrong with the keys?CREATE TABLE comment (  comment_id int(10) ... AUTO_INCREMENT,  video_id int(10) ...,  user_id int(10) ...,  language char(2) ...,  [...]  PRIMARY KEY (comment_id), Tool is aware  KEY user_id (user_id), of InnoDBs  KEY video_comment_idx (video_id,language,comment_id) clustered index!) ENGINE=InnoDB;$ pt­duplicate­key­checker u=root,h=localhost[...]# Key video_comment_idx ends with a prefix of the clustered index# Key definitions:#   KEY video_comment_idx (video_id,language,comment_id) Query to remove#   PRIMARY KEY (comment_id),[...] the index# To shorten this duplicate clustered index, execute:ALTER TABLE mydb.comment DROP INDEX video_comment_idx, ADD INDEX  33video_comment_idx (video_id,language)
  34. 34. pt-index-usage Helps answer questions not solved by userstats  Are there any queries with a changing exec plan?  Is an index necessary for a query? Read a slow log file/general log file Can give you invaluable information on your index usage  See the man page for more 34
  35. 35.  Thanks for your attention! Any questions? 35

×