Optimizing MySQL Queries


CTO Dr. Aris Zakinthinos explains a hands-on approach to optimizing MySQL queries.

Published in: Technology

  2. Goals
     • To show you how to use MySQL EXPLAIN
     • More importantly, how to think about what MySQL is doing when it is processing a query
  4. What’s an Index?
     • An index is a data structure that improves the speed of data retrieval operations
  5. Types of Indexes
     • Unique/Index
       – B+ Tree indexes
       – Hash indexes
     • Spatial
       – R-Tree indexes
     • Full-text indexes
  6. Column Index Type Definitions
     • Column Index
       – An index on a single column
     • Compound Index
       – An index on multiple columns
     • Covering Index
       – “Covers” all columns in a query
     • Partial Index
       – Indexes only a leading part of a column’s value (MySQL calls this a prefix index)
       – E.g., only the first 10 characters of a person’s name
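The definitions above can be sketched in MySQL DDL. This is only an illustration: the `person` table, its columns, and the index names are hypothetical and do not come from the deck.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE person (
  id         INT NOT NULL,
  last_name  VARCHAR(60) NOT NULL,
  first_name VARCHAR(60) NOT NULL,
  birth_year SMALLINT,
  PRIMARY KEY (id)
);

-- Column index: one column.
CREATE INDEX idx_birth_year ON person (birth_year);

-- Compound index: several columns, in a fixed order.
CREATE INDEX idx_name ON person (last_name, first_name);

-- Partial (prefix) index: only the first 10 characters of last_name.
CREATE INDEX idx_last_name_prefix ON person (last_name(10));

-- idx_name can act as a covering index for this query, because every
-- column the query touches is stored in the index itself.
SELECT last_name, first_name FROM person WHERE last_name = 'York';
```

Note that "covering" is a property of an index relative to a particular query, not a separate kind of index you declare.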
  7. Compound Index
          CREATE TABLE test (
            id INT NOT NULL,
            last_name CHAR(30) NOT NULL,
            first_name CHAR(30) NOT NULL,
            PRIMARY KEY (id),
            INDEX name (last_name, first_name)
          );
     • The name index is an index over the last_name and first_name columns
  8. What does that mean?
     • Queries like:
          SELECT * FROM test WHERE first_name = 'Aris' AND last_name = 'Zakinthinos';
          SELECT * FROM test WHERE last_name = 'York';
       – Will use the index
     • But a query like:
          SELECT * FROM test WHERE first_name = 'Zak';
       – Will not
  9. How Compound Indexes are Used
     • If you have an index on (col1, col2, col3)
     • This index will be used on queries for (col1), (col1, col2) and (col1, col2, col3)
       – Notice that the leftmost prefix must exist
     • Only the col1 part of the index will be used for queries for (col1, col3)
     • This index will not be used for queries for (col2), (col3) and (col2, col3)
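One way to check the leftmost-prefix rule yourself is to run EXPLAIN on each combination. The sketch below assumes a hypothetical table `t` with an index `idx_c123`; neither name comes from the deck.

```sql
-- Hypothetical table; the names t, col1..col3 and idx_c123 are illustrative only.
CREATE TABLE t (
  id      INT NOT NULL,
  col1    INT NOT NULL,
  col2    INT NOT NULL,
  col3    INT NOT NULL,
  payload VARCHAR(100),
  PRIMARY KEY (id),
  INDEX idx_c123 (col1, col2, col3)
);

-- Leftmost prefix present: idx_c123 is a candidate for all three.
EXPLAIN SELECT * FROM t WHERE col1 = 1;
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2;
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

-- col2 is skipped: only the col1 part can be used for the lookup.
EXPLAIN SELECT * FROM t WHERE col1 = 1 AND col3 = 3;

-- No leftmost prefix: the index cannot be used to look up rows.
EXPLAIN SELECT * FROM t WHERE col2 = 2;
EXPLAIN SELECT * FROM t WHERE col2 = 2 AND col3 = 3;
```

Compare the key and key_len columns across the runs. On an empty or tiny table the optimizer may make different choices, so load representative data before drawing conclusions.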
  10. One Thing to Keep in Mind
      • Remember that the index stores things in sorted order, so an n-field compound index is the equivalent of sorting the data on n fields
      • For example, for a 2 column index:

          Table           Index
          COL1  COL2      (COL1, COL2)
          A     4         (A,4)
          Z     3         (A,5)
          A     5         (Z,1)
          Z     1         (Z,3)
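The same ordering can be reproduced with a plain ORDER BY. This is a minimal sketch using a hypothetical `pairs` table loaded with the four rows from the slide.

```sql
-- Hypothetical table holding the four rows from the slide.
CREATE TABLE pairs (col1 CHAR(1) NOT NULL, col2 INT NOT NULL);
INSERT INTO pairs (col1, col2) VALUES ('A', 4), ('Z', 3), ('A', 5), ('Z', 1);

-- Sorting on both fields yields the order a (col1, col2) index stores:
-- (A,4), (A,5), (Z,1), (Z,3)
SELECT col1, col2 FROM pairs ORDER BY col1, col2;
```

Within each col1 value the col2 values are sorted, but col2 on its own is not in order across the whole index. That is why a lookup on col2 alone cannot use it.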
  11. Pro Tip
      • With InnoDB, MySQL always adds the primary key to the end of your index
        – You never have to add it to the end of your compound key
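A quick way to see this is with the test table from slide 7, assuming it uses the InnoDB storage engine. The query below selects id even though the name index was declared only on (last_name, first_name).

```sql
-- Uses the test table from slide 7 (InnoDB assumed).
-- The name index is declared on (last_name, first_name), but InnoDB also
-- stores the primary key (id) in every secondary index entry.
EXPLAIN SELECT id, last_name, first_name
FROM test
WHERE last_name = 'York';
```

If the optimizer picks the name index, the Extra column would be expected to show "Using index", meaning the query was answered without reading the table rows.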
  12. Can you have too many indexes?
      • YES!
      • They take up space
        – You want all your indexes to fit in memory
      • They make inserts/deletes slower
        – Remember that you need to insert into/delete from each index
  14. What does Explain do?
      • Shows MySQL’s query execution plan
  15. What does that mean?
      • How many tables are used
      • How tables are joined
      • How data is looked up
      • Possible and actual index use
      • Length of index used
      • Approximate number of rows examined
  16. Why should I care?
      • Ultimately, less server load leads to a better user experience
        – Could be the difference between usable and bankrupt
      • Impress your friends
      • Get better jobs
  17. Which queries should I examine?
      • Every single one!
      • If you have never used EXPLAIN:
        – Start by looking at the items in the slow query log
        – Or, execute SHOW FULL PROCESSLIST every once in a while and grab a query that you see very often
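Both starting points can be reached from a SQL session. The statements below are a sketch: the 1 second threshold is an arbitrary illustrative value, and changing global variables requires sufficient privileges.

```sql
-- Enable the slow query log at runtime (requires sufficient privileges).
SET GLOBAL slow_query_log = 'ON';
-- Illustrative threshold: log statements that take longer than 1 second.
SET GLOBAL long_query_time = 1;
-- Find out where the log is being written.
SHOW VARIABLES LIKE 'slow_query_log_file';

-- Alternatively, sample what the server is running right now.
SHOW FULL PROCESSLIST;
```

A changed global long_query_time only applies to connections opened after the change, so existing sessions keep their old threshold.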
  18. Pro Tip
      • Using EXPLAIN during QA is better than in production
      • It saves you from having to say:
        – “It’s not a problem, we call it the coffee break feature.”
  19. Can I run it on any query?
      • Before MySQL 5.6, it only worked on SELECT queries
        – From MySQL 5.6 on, it also works on DELETE, INSERT, REPLACE and UPDATE
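As a sketch of what this means in practice: MySQL 5.6.3 and later accept EXPLAIN in front of data-modifying statements, while on older servers a common workaround is to explain a SELECT with the same WHERE clause. The UPDATE below is an invented example against the sakila film table.

```sql
-- MySQL 5.6.3 and later: EXPLAIN accepts data-modifying statements directly.
-- The statement is only planned, not executed.
EXPLAIN UPDATE film SET rental_rate = 0.99 WHERE release_year = 2006;

-- Older versions: explain a SELECT with the same WHERE clause instead.
EXPLAIN SELECT * FROM film WHERE release_year = 2006;
```

The SELECT rewrite shows how the rows would be located, but it says nothing about the cost of the write itself.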
  20. Anything else I should know?
      • It doesn’t execute your query but MAY execute portions of it. CAUTION!
        – Nested subqueries are executed
  22. Test Database
      • We will be using MySQL’s sakila test database, available at: http://dev.MySQL.com/doc/index-other.html
      • It is a sample database for a DVD rental store
  23. How do you use Explain?
      • To get MySQL’s execution plan, simply put the word ‘EXPLAIN’ in front of your select statement:
          EXPLAIN SELECT * FROM film f
          JOIN film_actor fa ON fa.film_id = f.film_id
          JOIN actor a ON a.actor_id = fa.actor_id;
  24. Explain Output
      • Using a pretty GUI:
  25. Explain Output
      • On the command line it is easier to read if you use \G at the end of the line:
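For the command-line case, a minimal example using the same three-table query from slide 23. In the mysql client, the \G terminator takes the place of the closing semicolon.

```sql
-- Run from the mysql command-line client; \G replaces the terminating
-- semicolon and prints each output row vertically, one column per line.
EXPLAIN SELECT * FROM film f
JOIN film_actor fa ON fa.film_id = f.film_id
JOIN actor a ON a.actor_id = fa.actor_id\G
```

\G is a feature of the mysql client rather than of the server, so GUI tools and application drivers may not accept it.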
  27. Column Definitions: id
      A sequential ID identifying the select in the query.
          EXPLAIN SELECT * FROM film f
          JOIN film_actor fa ON fa.film_id = f.film_id
          JOIN actor a ON a.actor_id = fa.actor_id;
  28. Column Definitions: select_type
      The type of select you are doing. Common values:
      • SIMPLE – Simple SELECT (not using UNION or subqueries)
      • PRIMARY – Outermost SELECT
      • UNION – Second or later SELECT in a UNION
      • SUBQUERY – First SELECT in a subquery
      • DEPENDENT SUBQUERY – First SELECT in subquery, dependent on outer query
      • DERIVED – Derived table select (subquery in FROM clause)
  29. Column Definitions: table
      The table or alias this row refers to.
  30. Column Definitions: type
      The join type. Lots more to come on this.
  31. Column Definitions: possible_keys
      The possible indexes that could be used. If NULL, then no appropriate index was found.
  32. Column Definitions: key
      The index that was used.
  33. Column Definitions: key_len
      The number of bytes MySQL uses from the index.
  34. Column Definitions: ref
      The columns (or constants) that are compared to the index.
  35. Column Definitions: rows
      The approximate number of rows examined. Note: this is just a guide.
  36. Column Definitions: Extra
      Information on how the tables are joined. Lots more to come on this!
  37. JOIN TYPES
  38. Join Type – system/const
      • At most one row returned
        – E.g., … WHERE id=1
        – Constant can be propagated through query
        – Index must be either PRIMARY or UNIQUE

          EXPLAIN SELECT * FROM rental WHERE rental_id = 10;
  39. Join Type – eq_ref
      • Index lookup returns exactly one row
        – E.g., WHERE a.id=b.id
        – Requires unique index on all parts of the key used by the join

          EXPLAIN SELECT * FROM customer c
          JOIN address a ON c.address_id = a.address_id;
  40. Join Type – ref
      • Index lookup that can return more than one row
        – Used when
          • Either leftmost part of a unique key is used
          • Or a non-unique or non-null key is used

          EXPLAIN SELECT * FROM rental WHERE rental_id IN (10,11,12)
          AND rental_date = '2006-02-01';
  41. Join Type – ref_or_null
      • Similar to ref but allows for null values or null conditions
        – Essentially an extra pass to look for nulls

          EXPLAIN SELECT * FROM film
          WHERE release_year = 2006 OR release_year IS NULL;
  42. Join Type – index_merge
      • Uses 2 separate indexes
        – Extra field shows more info
        – Can be one of:
          • sort_union – OR condition on non-primary key fields
          • union – OR condition using constants or ranges on primary key fields
          • intersection – AND condition with constants or range conditions on primary key fields

          EXPLAIN SELECT * FROM rental WHERE rental_id IN (10,11,12)
          OR rental_date = '2006-02-01';
  43. Join Type – range
      • Access method for a range value in the where clause (<, <=, >, >=, LIKE, IN() or BETWEEN)
        – Performs a partial index scan
        – Lots of optimizations for this type of query

          EXPLAIN SELECT * FROM rental WHERE rental_date
          BETWEEN '2006-01-01' AND '2006-07-01';
  44. Join Type – index
      • Does an index scan
        – You are doing a full scan of every record in the index
        – Better than ‘ALL’ but still requires a LOT of resources
        – Note: This is not the same as ‘Using index’ in the Extra column

          EXPLAIN SELECT rental_date FROM rental;
  45. Join Type – ALL
      • Full table scan
        – It will look at every record in the table
        – Unless you want the whole table, it should be avoided for all but the smallest of tables

          EXPLAIN SELECT * FROM rental;
  46. Join Type Summary
      • From best to worst
        – system/const
        – eq_ref
        – ref
        – ref_or_null
        – index_merge
        – range
        – index
        – ALL
  48. A More Complicated Query
      • All identical id values are part of the same select
      • This query has a bunch of UNIONs
      • You can also see all of the join types used by this query
  50. Execution Order
      • First thing to notice is that the order of execution is not the same as the order in the query
      • MySQL will rearrange your query to do what it thinks is optimal
      • You will often disagree!

          EXPLAIN SELECT * FROM film f
          JOIN film_actor fa ON fa.film_id = f.film_id
          JOIN actor a ON a.actor_id = fa.actor_id;
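When you do disagree with the optimizer, one way to test your own join order is MySQL's STRAIGHT_JOIN modifier. The deck itself does not mention it, so treat the following as a sketch for experimentation rather than the author's recommendation.

```sql
-- STRAIGHT_JOIN asks MySQL to join the tables in the order they are written
-- (film, then film_actor, then actor) instead of choosing its own order.
EXPLAIN SELECT STRAIGHT_JOIN * FROM film f
JOIN film_actor fa ON fa.film_id = f.film_id
JOIN actor a ON a.actor_id = fa.actor_id;
```

Comparing this plan with the unforced one shows whether the written order would actually examine fewer rows.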
  51. Query Processing
      The above can be read as:
          for (each row in table a [actor]) {
            for (each row in fa [film_actor] matching a.actor_id) {
              for (the row in f [film] matching fa.film_id) {
                Add row to output
              }
            }
          }

          EXPLAIN SELECT * FROM film f
          JOIN film_actor fa ON fa.film_id = f.film_id
          JOIN actor a ON a.actor_id = fa.actor_id;
  52. Query Processing In General
          for (each row in outer table matching where clause) {
            …
            for (each row in inner table matching key value and where clause) {
              Add to output
            }
          }
  53. rows is a critical performance indicator
      • Total approximate rows is the PRODUCT of all the rows with the same ID
      • It is only an estimate
      • This query estimated a total of 68,640 rows
      • Actual rows examined: 2,969,639
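To make the product rule concrete, suppose EXPLAIN reported rows values of 200, 27 and 1 for three tables sharing the same id. These figures are hypothetical, chosen only to show the arithmetic; they are not taken from the deck's output.

```sql
-- Hypothetical per-table rows estimates: 200 (outer), 27 (middle), 1 (inner).
-- The estimated total is their product, not their sum.
SELECT 200 * 27 * 1 AS estimated_rows_examined;  -- 5400
```

Because the estimates multiply, shrinking the rows value of any one table in the join reduces the total proportionally.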
  54. Remember: it is an estimate
      • Remember this query:
      • Here is the rest:
      • This query executes in less than 200 ms because there are LIMIT clauses that only return a small number of rows.
  55. key_len
      • This will tell you how much of the index it will use
        – Really only useful if you have compound keys
      • For a complete list of type to byte mapping, see:
        – http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
      • What’s missing from that page:
        – VARCHAR(n) will always use a 2 byte length field
        – If a column can be NULL it takes one extra byte in the index
      • Warning! Multibyte characters make byte != character
        – UTF8 is 3 bytes per character
  56. key_len example
          EXPLAIN SELECT film_id, title FROM film
          WHERE description LIKE 'A Epic%' AND release_year = 2006 AND language_id = 1;

          CREATE TABLE `film` (
            ...
            `description` text,
            `release_year` year(4) DEFAULT NULL,
            `language_id` tinyint(3) unsigned NOT NULL,
            ...
            KEY `idx_compound_key` (`language_id`,`release_year`,`description`(20))
          ) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8;
      • tinyint – 1 byte
      • year – 1 byte + 1 byte possible NULL
      • description – 20*3 bytes + 2 byte length + 1 byte possible NULL
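Adding up the per-column figures listed on the slide gives the key_len you would expect to see if all three parts of idx_compound_key are used. The sum below is plain arithmetic on those figures, not output captured from a server.

```sql
SELECT 1                    -- language_id: tinyint, NOT NULL
     + (1 + 1)              -- release_year: year + NULL flag
     + (20 * 3 + 2 + 1)     -- description(20): 20 utf8 chars + length + NULL flag
     AS expected_key_len;   -- evaluates to 66
```

A smaller key_len in the actual EXPLAIN output would indicate that only the leading part or parts of the compound key are being used.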
  57. EXTRA COLUMN
  58. Pay Close Attention
      • This column will give you a good sense of what is going to happen during execution
  59. Extra Column – The Bad
      • Using temporary
        – During execution, a temporary table was required
        – Not horrible if the temporary table is small since it will sit in memory
  60. Extra Column – The Bad
      • Using filesort
        – Sorting was needed rather than using an index
        – Not horrible if the result set is small
  61. Extra Column – The Good
      • Using index
        – The query was satisfied using only an index
  62. Extra Column – The Whatever
      • Using where
        – A where clause is used to restrict rows
        – Unless you also see ‘Using index’, this means that MySQL had to read the row from the database to apply the where clause
  64. Signs That Your Query Stinks
      • No index is used – NULL in key column
      • Large number of rows estimated
      • Using temporary
      • Using filesort
      • Using derived tables – DERIVED in select_type column
      • Joining on derived tables
      • Having dependent subqueries on a large result set
  65. Examples:
          EXPLAIN SELECT * FROM film f WHERE release_year = 2006;

          ALTER TABLE `sakila`.`film`
          ADD INDEX `idx_release_year` (`release_year` ASC);
  66. Examples:
          EXPLAIN SELECT rating, COUNT(*) FROM film
          WHERE rental_rate < 1.00
          GROUP BY rating;
      • This is a terrible query.
      • A full table scan that creates a temporary table and then sorts it.
      • Why?
  67. Grouping
      • To execute this query, MySQL will build a temporary table with all the rows that have rental_rate < 1.00
      • It will then sort them by rating
      • It will then count all of the items that have the same rating
      • To make this query fast you need all of the ratings to be processed in order.
        – That is, you have to avoid the build and sort step
  68. Adding an Index
          ALTER TABLE `sakila`.`film`
          ADD INDEX `idx_rating` (`rating` ASC, `rental_rate` ASC);

          EXPLAIN SELECT rating, COUNT(*) FROM film
          WHERE rental_rate < 1.00
          GROUP BY rating;
      • This works because MySQL can process all the rows with the same ‘rating’ sequentially.
      • The second part of the index allows it to check the where clause from the index directly.
  69. Optimizing GROUP BY
      • All columns used in the GROUP BY must come from the same index and the index must store the keys in the order specified in the GROUP BY
      • If this isn’t true you will see a “Using filesort” and/or “Using temporary”
  70. Optimizing ORDER BY
      • An index can be used with ORDER BY even if the index doesn’t match the ORDER BY columns exactly as long as the “missing” keys are constants in the WHERE clause
        – The order of the keys must match the order of the ORDER BY clause
      • If this isn’t true you will see a “Using filesort” and/or “Using temporary”
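The "missing keys are constants" rule can be tried out with the idx_rating (rating, rental_rate) index from slide 68. The two queries below are illustrative and assume that index has already been added to the sakila film table.

```sql
-- Assumes the idx_rating (rating, rental_rate) index from slide 68 exists.
-- rating is pinned to a constant, so the index already delivers the
-- matching rows ordered by rental_rate.
EXPLAIN SELECT rating, rental_rate FROM film
WHERE rating = 'PG'
ORDER BY rental_rate;

-- No constant on rating: index entries are ordered by rating first, so
-- ordering by rental_rate alone is expected to need a sort.
EXPLAIN SELECT rating, rental_rate FROM film
ORDER BY rental_rate;
```

Look for the presence or absence of "Using filesort" in the Extra column to see which case you are in. The optimizer's actual choice can vary with data volume and server version.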
  71. Both GROUP BY and ORDER BY
          EXPLAIN SELECT rating, COUNT(*) AS c FROM film
          WHERE rental_rate < 1.00
          GROUP BY rating
          ORDER BY c;
      • There is no way to avoid the ‘Using temporary’ and the ‘Using filesort’
      • You are asking MySQL to first sort by rating to process the GROUP BY, which it does using an index.
      • Then you are asking it to take that result and re-sort it on a computed field.
      • An ORDER BY column list which is in a different order than the GROUP BY list will cause the same problem.
  72. What if I can’t make it better?
      • Never give up!
        – There is always a solution. You might not want to do it, but there is always a solution.
      • You might need to:
        – Denormalize your data to allow you to construct a better compound index
        – Cache outside the database
        – Pull some of the processing into your application
        – Break up the query into smaller, faster chunks
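As one possible illustration of denormalizing and breaking up a query, the grouped counts from the rating query could be precomputed into a small summary table and sorted from there. The table name `film_rating_counts` and the idea of rebuilding it on a schedule are assumptions made for this sketch, not something the deck prescribes.

```sql
-- Hypothetical summary table; its name and refresh strategy are assumptions.
CREATE TABLE film_rating_counts AS
SELECT rating, COUNT(*) AS c
FROM film
WHERE rental_rate < 1.00
GROUP BY rating;

-- The expensive grouping is now paid once; the sort runs over a handful of rows.
SELECT rating, c FROM film_rating_counts ORDER BY c;
```

The trade-off is staleness: the summary has to be rebuilt or incrementally maintained whenever the underlying film data changes.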
  74. Test with Production Data
      • Do not optimize with a subset of your data
      • MySQL uses table statistics to determine its execution plan
      • If you optimize with test data your real world performance might be completely different
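The statistics in question can be refreshed and inspected directly. A minimal sketch against the sakila film table:

```sql
-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE film;
-- The Cardinality column shows the estimated number of distinct values per index.
SHOW INDEX FROM film;
```

Comparing the Cardinality values between a test copy and production is a quick way to see how differently the optimizer is likely to view the two data sets.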
  75. PRACTICE!
      • Like everything, you need to do it to fully understand it
      • Don’t expect to be a MySQL ninja overnight
      • It takes hard work but the benefits are worth it
        – 10 second searches optimized to 100 milliseconds