Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

MySQL Query Optimisation 101

366 views

Published on

An introduction to query optimisation. The talk is focused on MySQL, but this information is mostly true for any relational DBMS.

Published in: Software
  • Be the first to comment

MySQL Query Optimisation 101

  1. 1. MySQL Query Optimisation 101
  2. 2. € whoami ● Federico Razzoli ● Freelance consultant ● Writing SQL since MySQL 2.23 hello@federico-razzoli.com ● I love open source, sharing, Collaboration, win-win, etc ● I love MariaDB, MySQL, Postgres, etc ○ Even Db2, somehow
  3. 3. Why is the database important?
  4. 4. Remember the Von Neumann machine?
  5. 5. It’s always about Data ● Since then, the purpose of hardware and software never changed: ○ Receive data ○ Process data ○ Output data
  6. 6. A rose by any other name... ● Feel free to use synonyms ○ Validate ○ Sanitise ○ Parse ○ Persist ○ Normalise / Denormalise ○ Cache ○ Map / Reduce ○ Print ○ Ping ○ ...
  7. 7. ...would smell as sweet ● The database of known stars is not a ping package ● You use a DBMS to abstract data management as much as possible ○ Persistence ○ Consistence ○ Queries ○ Fast search ○ … ● That’s why “database is magic”
  8. 8. Busy ● But it’s just a very busy person, performing many tasks concurrently ● And each is: ○ Important (must be reliable) ○ Complex (must be fast) ○ Expected (if something goes wrong, you will complain) In practice, the DBMS is usually the bottleneck.
  9. 9. Terminology Statement: An SQL command Query: A statement that returns a resultset ...or any other statement :) Resultset: output 0 or more rows Optimiser / Query Planner: Component responsible of deciding a query’s execution plan Optimised query: A query whose execution plan is reasonably good ...this doesn’t imply in any way that the query is fast Database: set of tables (schema) Instance / Server: running MySQL daemon (cluster)
  10. 10. Performance
  11. 11. When should a query be optimised? mysql> EXPLAIN SELECT * FROM t WHERE c < 10 G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t partitions: NULL type: ALL possible_keys: idx_c key: idx_c key_len: 4 ref: NULL rows: 213030 filtered: 50.00 Extra: Using where 1 row in set, 1 warning (0.01 sec)
  12. 12. When should a query be optimised? mysql> SELECT * FROM performance_schema.events_statements_summary_by_digestWHERE DIGEST = '254a65744e661e072103b7a7630dee1c3a3b8e906f19889f7c796aebe7cdd4f8' G *************************** 1. row *************************** SCHEMA_NAME: test DIGEST: 254a65744e661e072103b7a7630dee1c3a3b8e906f19889f7c796aebe7cdd4f8 DIGEST_TEXT: SELECT * FROM `t` WHERE `c` < ? COUNT_STAR: 1 ... SUM_ROWS_AFFECTED: 0 SUM_ROWS_SENT: 57344 SUM_ROWS_EXAMINED: 212992 SUM_CREATED_TMP_DISK_TABLES: 0 SUM_CREATED_TMP_TABLES: 0 SUM_SELECT_FULL_JOIN: 0 SUM_SELECT_FULL_RANGE_JOIN: 0 SUM_SELECT_RANGE: 0 SUM_SELECT_RANGE_CHECK: 0 SUM_SELECT_SCAN: 1 SUM_SORT_MERGE_PASSES: 0 SUM_SORT_RANGE: 0 SUM_SORT_ROWS: 0 SUM_SORT_SCAN: 0 SUM_NO_INDEX_USED: 1 SUM_NO_GOOD_INDEX_USED: 0 FIRST_SEEN: 2019-05-14 00:31:24.078967 LAST_SEEN: 2019-05-14 00:31:24.078967 ... QUERY_SAMPLE_TEXT: SELECT * FROM t WHERE c < 10 QUERY_SAMPLE_SEEN: 2019-05-14 00:31:24.078967 QUERY_SAMPLE_TIMER_WAIT: 117493874000
  13. 13. But how do I find impacting queries? ● It depends what you mean by “impacting” ● There are several monitoring methods (USE, etc) ● But 3 philosophies:
  14. 14. But how do I find impacting queries? ● It depends what you mean by “impacting” ● There are several monitoring methods (USE, etc) ● But 3 philosophies: ○ Panicking when you hear that something is down or slow
  15. 15. But how do I find impacting queries? ● It depends what you mean by “impacting” ● There are several monitoring methods (USE, etc) ● But 3 philosophies: ○ Panicking when you hear that something is down or slow ○ System-centric monitoring
  16. 16. But how do I find impacting queries? ● It depends what you mean by “impacting” ● There are several monitoring methods (USE, etc) ● But 3 philosophies: ○ Panicking when you hear that something is down or slow ○ System-centric monitoring ○ User-centric
  17. 17. But how do I find impacting queries? ● Panicking when you hear that something is down or slow ● System-centric monitoring ● User-centric You can use them all.
  18. 18. Panicking ● Simplest method ● Do nothing do prevent anything ○ Optionally, take a lot of actions to prevent imaginary problems in imaginary ways ○ There is no evidence that your job is useless, so your boss will not fire you
  19. 19. System-centric ● pt-query-digest, PMM, etc ● Merge queries into one, normalising its text and replacing parameters ○ SELECT * FROM t WHERE b= 111 AND a = 0 -- comment ○ Select * From t Where a = 24 and b=42; ○ SELECT * FROM t WHERE a = ? AND b = ? ● Sum execution time of each occurrence (Grand Total Time) ● Optimise the queries with highest GTT
  20. 20. User-Centric ● Calculate the cost of slowness (users don’t buy, maybe leave 4ever) ● Cost of slowness is different for different ○ URLs ○ Number of users ○ ...other variables that depend on your business ■ (day of month, country, etc) ● Set Service Level Objectives ● Monitor the HTTP calls latency, and the involved services ● Find out what’s slowing them down
  21. 21. Query Performance
  22. 22. What makes a query “important”? ● How many times it’s executed ● It’s locking
  23. 23. What makes a query slow? ● Number of rows read ○ Read != return ○ Indexes are there to lower the number of reads ● Number of rows written ○ In-memory temp tables are not good ○ On-disk temp tables are worse
  24. 24. How do I optimise a query? ● Use indexes properly ● Avoid creation of temp tables, if possible
  25. 25. What is an index?
  26. 26. Index Types ● BTREE - ordered data structure ● HASH - hash table ● PostgreSQL has much more ● Each storage engine can implement any of both ● InnoDB uses BTREE and internally uses HASH when it thinks it’s better ● The syntax CREATE INDEX USING [BTREE | HASH] is generally useless We will focus on BTREE indexes in InnoDB
  27. 27. Index Properties ● Primary key: unique values, not null ● UNIQUE ● Multiple columns ○ The order matters ● Column prefix (only for strings)
  28. 28. InnoDB Indexes ● InnoDB tables should always have a Primary Key ● The table is stored ordered by primary key ● The table itself is the primary key ○ Columns “not part of the primary key” simply don’t affect the order of rows
  29. 29. InnoDB Indexes Table columns: {a, b, c} Primary key: {a, b} A B C 1 1 4 1 2 1 2 1 9 2 2 3 3 0 3 4 20 0
  30. 30. InnoDB Indexes ● Secondary indexes are stored separately ● They are ordered by the indexed column ● Each entry contain a reference to a primary key entry
  31. 31. InnoDB Indexes Primary key: {a, b} Index idx_c: {c} c a b 0 4 20 1 1 2 3 2 2 3 3 0 4 1 1 9 2 1
  32. 32. Which queries will be faster? Table columns: {a, b, c, d, e} Primary key: {a, b} Index idx_c: {c} ● SELECT * FROM t WHERE a = 1 ● SELECT * FROM t WHERE a = 1 AND b = 2 ● SELECT a, b FROM t WHERE c = 0 ● SELECT d, e FROM t WHERE c = 0
  33. 33. More performance considerations?
  34. 34. More performance considerations? ● Big primary key = big indexes ● Primary key should be append-only INTEGER UNSIGNED AUTO_INCREMENT ● These indexes are duplicates: {a} - {a, b} ● This index is wrong: {a, id}
  35. 35. Index implementation
  36. 36. https://github.com/jeremycole/innodb_diagrams
  37. 37. More performance considerations?
  38. 38. More performance considerations? ● Writing to an index is relatively slow ● Deleting many rows leaves fragmented indexes
  39. 39. WHERE is the index?
  40. 40. Phone Book ● Indexes are ordered data structures ● Think to them as a phone book Table: {first_name, last_name, phone, address} Index: {last_name, first_name}
  41. 41. Phone Book ● I will show you some queries, and you will tell me which can be solved by using the index ● You may not know, but your mind contains a pretty good SQL optimiser Table: {first_name, last_name, phone, address} Index: {last_name, first_name}
  42. 42. Queries SELECT * FROM phone_book … WHERE last_name = 'Baker' WHERE last_name IN ('Hartnell','Baker', 'Whittaker') WHERE last_name > 'Baker' WHERE last_name >= 'Baker' WHERE last_name < 'Baker' WHERE last_name <= 'Baker' WHERE last_name <> 'Baker'
  43. 43. Queries SELECT * FROM phone_book … WHERE last_name IS NULL WHERE last_name IS NOT NULL
  44. 44. Rule #1 A BTREE can optimise point searches and ranges
  45. 45. Queries WHERE last_name >= 'B' AND last_name < 'C' WHERE last_name BETWEEN 'B' AND 'C' WHERE last_name LIKE 'B%'
  46. 46. Queries WHERE last_name LIKE 'B%' WHERE last_name LIKE '%B%' WHERE last_name LIKE '%B' WHERE last_name LIKE 'B_' WHERE last_name LIKE '_B_' WHERE last_name LIKE '_B'
  47. 47. Rule #2 A LIKE condition whose second operand starts with a 'constant string' is a range
  48. 48. Queries WHERE first_name = 'Tom' WHERE last_name = 'Baker' WHERE first_name = 'Tom' AND last_name = 'Baker' WHERE last_name = 'Baker' AND first_name = 'Tom'
  49. 49. Rule #3 We can use a whole index or its leftmost part
  50. 50. Queries WHERE LEFT(last_name, 2) = 'Ba' WHERE last_name = CONCAT('Ba', 'ker')
  51. 51. Rule #4 Optimiser cannot make assumptions on functions/expression results. However, wrapping a constant value into a function will produce another constant value, which is mostly irrelevant for query optimisation.
  52. 52. Queries WHERE last_name = first_name
  53. 53. Rule #5 Comparing a column with another results in a comparison whose operands change at every row. The optimiser cannot filter out any row in advance.
  54. 54. Queries WHERE last_name = 'Baker' AND phone = '+44 7739 427279'
  55. 55. Rule #6 We can use an index to restrict the search to a set of rows And search those rows in a non-optimised fashion Depending on this set’s size, this could be a brilliant or a terrible strategy
  56. 56. Queries WHERE last_name = 'Baker' AND first_name > 'Tom' WHERE last_name > 'Baker' AND first_name = 'Tom' WHERE last_name > 'Baker' AND first_name > 'Tom'
  57. 57. Queries WHERE last_name = 'Baker' AND first_name > 'Tom' WHERE first_name = 'Tom' AND last_name > 'Baker' WHERE first_name > 'Tom' AND last_name = 'Baker' WHERE last_name > 'Baker' AND first_name > 'Tom' Baker, Colin Baker, Tom Baker, Walter Capaldi, Ada Capaldi, Peter Whittaker, Jody Whittaker, Vadim
  58. 58. Rule #7 If we have a range condition on an index column The next index columns cannot be used If you prefer: Index usage stops at the first >
  59. 59. ORDER BY, GROUP BY
  60. 60. Mr Speaker talks to MySQL
  61. 61. Queries ORDER BY last_name ORDER BY first_name ORDER BY last_name, first_name ORDER BY first_name, last_name
  62. 62. Queries GROUP BY last_name GROUP BY first_name GROUP BY last_name, first_name GROUP BY first_name, last_name
  63. 63. Rule #8 ORDER BY and GROUP BY can take advantage of an index order or create an internal temp table Note: GROUP BY optimisation also depends on the function we’re using (MAX, COUNT…).
  64. 64. Queries WHERE last_name > 'Baker' ORDER BY last_name WHERE last_name = 'Baker' ORDER BY first_name WHERE last_name > 'Baker' ORDER BY first_name
  65. 65. Rule #9 If we have an ORDER BY / GROUP BY on an index column The next index columns cannot be used
  66. 66. Multiple Indexes
  67. 67. Queries Table: {id, a, b, c, d} idx_a: {a, d} idx_b: {b} WHERE a = 10 OR a = 20 WHERE a = 24 OR c = 42 WHERE a = 24 OR d = 42 WHERE a = 24 AND b = 42 WHERE a = 24 OR b = 42 WHERE a = 24 ORDER BY b GROUP BY a ORDER BY b
  68. 68. Rule #10 Using multiple indexes for AND or OR (intersect) is possible, but there is a benefit only if we read MANY rows Using different indexes in WHERE / GROUP BY / ORDER BY is not possible
  69. 69. Thank you kindly! https://federico-razzoli.com info@federico-razzoli.com

×