MySQL Optimizer Cost Model

The cost model is one of the core components of the MySQL optimizer. This presentation gives an overview of the MySQL optimizer cost model, what is new in 5.7, and some ideas for further improvements.

  1. 1. MySQL Cost Model. Olav Sandstå, Senior Principal Engineer, MySQL Optimizer Team, Oracle. October 1, 2014. Copyright © 2014, Oracle and/or its affiliates. All rights reserved.
  2. 2. Safe Harbor Statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
  3. 3. Program Agenda: 1. Introduction 2. Cost-Based Optimization 3. Cost Model Improvements in MySQL 5.7 4. Evolving the Cost Model
  4. 4. MySQL Optimizer. Example query: SELECT a, b FROM t1, t2, t3 WHERE t1.a = t2.b AND t2.b = t3.c AND t2.d > 20 AND t2.d < 30; [Diagram: inside the MySQL Server, the optimizer combines cost-based optimizations, heuristics, and the cost model with table/index info (data dictionary) and statistics (storage engines) to produce a query plan. The example plan joins t1, t2, and t3 using a table scan, a range scan, and ref access.]
  5. 5. Motivation for Improving the MySQL Cost Model • Produce more correct cost estimates – Better decisions by the optimizer should improve performance • Adapt to new hardware architectures – SSD, larger memories, caches • More maintainable cost model implementation – Avoid hard-coded “cost constants” – Refactoring of existing cost model code • Configurable and tunable • Make more of the optimizer cost-based. Faster queries.
  6. 6. Cost-based Optimization: Optimizer Cost Model
  7. 7. Cost-based Query Optimization. General idea: • Assign cost to operations • Assign cost to partial or alternative plans • Search for plan with lowest cost [Figure: example query plan tree joining t1, t2, and t3.]
  8. 8. Cost-based Optimizations. The main cost-based optimizations: • Index and access method: – Table scan – Index scan – Range scan – Index lookup (ref access) • Join order • Join buffering strategy • Subquery strategy [Figure: example query plan tree joining t1, t2, and t3.]
  9. 9. Optimizer Cost Model. [Diagram: the cost model takes metadata (index information, uniqueness, nullability), statistics (table size, cardinality, range estimates), and the cost model configuration as input. It consists of cost formulas (access methods, join, subquery) and cost constants (CPU, IO). For each operation in a plan, such as a range scan on t1 or a JOIN, it produces a cost estimate and a row estimate.]
  10. 10. Cost Estimates • The cost for executing a query • Cost unit: – “read of a random data page” • Main cost factors: – IO cost: • #pages read from table • #pages read from index – CPU cost: • Evaluating query conditions • Comparing keys/records • Sorting keys • Main cost constants (cost: cost value): Reading a random page: 1.0; Evaluating query condition: 0.2; Comparing key/record: 0.1
  11. 11. Input to Cost Model • IO-cost: – Estimates from storage engine based on number of pages to read – Both index and data pages • Schema (data dictionary): – Length of records and keys – Uniqueness for indexes – Nullability • Statistics: – Number of records in table – Key distribution/cardinality: • Average number of records per key value • Only for indexed columns – Number of records in an index range – Size of tables and indexes
  12. 12. Example: Cost Model for Table Scan. SELECT * FROM t1 WHERE a BETWEEN 20 AND 23; (t1 has 10 million records) Cost model: • IO-cost: #pages in table * IO_BLOCK_READ_COST • CPU cost: #records * ROW_EVALUATE_COST • Example: – IO-cost: 193408 pages * 1.0 – CPU-cost: 9829537 records * 0.2 – Total cost: 2159315
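The table scan arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch of the slide's formula only, not the optimizer's actual code; the function name `table_scan_cost` is made up for illustration, while the constants and input figures are the ones quoted on the slide.

```python
# Cost constants as quoted on the slides.
IO_BLOCK_READ_COST = 1.0
ROW_EVALUATE_COST = 0.2


def table_scan_cost(pages_in_table: int, records: int) -> float:
    """Slide formula: IO cost for every page plus CPU cost for every record."""
    io_cost = pages_in_table * IO_BLOCK_READ_COST
    cpu_cost = records * ROW_EVALUATE_COST
    return io_cost + cpu_cost


# Figures from the slide's example table.
total = table_scan_cost(pages_in_table=193408, records=9829537)
print(round(total))
```

Rounded to a whole number, the result agrees with the total cost of 2159315 shown on the slide.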
  13. 13. Example: Cost Model for Range Scan. SELECT * FROM t1 WHERE a BETWEEN 20 AND 23; (returns 80000 records) Range scan (on secondary index): • IO-cost: #records_in_range * IO_BLOCK_READ_COST • CPU cost: #records_in_range * ROW_EVALUATE_COST (evaluate range condition) + #records_in_range * ROW_EVALUATE_COST (evaluate WHERE condition) • Example: – IO cost: 80506 * 1.0 – CPU cost: 80506 * 0.2 + 80506 * 0.2 – Total cost: 112709
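The range scan formula can be checked the same way and compared with the table scan alternative from the previous slide. This is a sketch of the slide's simplified formula; the function name `range_scan_cost` is hypothetical. The simplified formula gives roughly 112708.4, about one unit below the 112709 the slide reports, so the server presumably adds a small term the slide does not show.

```python
IO_BLOCK_READ_COST = 1.0
ROW_EVALUATE_COST = 0.2


def range_scan_cost(records_in_range: int) -> float:
    """Slide formula for a range scan on a secondary index."""
    io_cost = records_in_range * IO_BLOCK_READ_COST
    cpu_cost = (
        records_in_range * ROW_EVALUATE_COST    # evaluate range condition
        + records_in_range * ROW_EVALUATE_COST  # evaluate WHERE condition
    )
    return io_cost + cpu_cost


range_cost = range_scan_cost(80506)
# Table scan cost for the same query, using the figures from the previous slide.
table_cost = 193408 * IO_BLOCK_READ_COST + 9829537 * ROW_EVALUATE_COST

print(f"range scan: {range_cost:.1f}, table scan: {table_cost:.1f}")
print("cheapest:", "range scan" if range_cost < table_cost else "table scan")
```

With these inputs the range scan is far cheaper, which matches the access type chosen in the EXPLAIN output later in the deck.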
  14. 14. Example: Cost Model for JOIN. SELECT * FROM t1 JOIN t2 ON t1.i1 = t2.i2; • Table or secondary key scan on t1: #blocks * IO_BLOCK_READ_COST + #records * ROW_EVALUATE_COST • Ref access on t2: #records_from_t1 * #records_per_key * (IO_BLOCK_READ_COST + ROW_EVALUATE_COST) [Diagram: an access method reads t1; the number of records read from t1 determines the number of records to read from t2.]
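The two join formulas compose into a total cost as sketched below. The function names and all input numbers (blocks, record counts, records per key) are hypothetical values chosen for illustration; only the formulas and the two constants come from the slide.

```python
IO_BLOCK_READ_COST = 1.0
ROW_EVALUATE_COST = 0.2


def scan_cost(blocks: int, records: int) -> float:
    """Table or secondary key scan on the first table (t1)."""
    return blocks * IO_BLOCK_READ_COST + records * ROW_EVALUATE_COST


def ref_access_cost(records_from_t1: float, records_per_key: float) -> float:
    """Ref access on t2: one page read and one row evaluation per matching row."""
    rows_read = records_from_t1 * records_per_key
    return rows_read * (IO_BLOCK_READ_COST + ROW_EVALUATE_COST)


def join_cost(blocks: int, records: int, records_per_key: float) -> float:
    """Total cost of scanning t1 and doing a ref access on t2 for each t1 row."""
    return scan_cost(blocks, records) + ref_access_cost(records, records_per_key)


# Hypothetical inputs: t1 has 10 blocks and 100 records, 5 t2 rows per key value.
print(join_cost(blocks=10, records=100, records_per_key=5.0))
```

The ref access term grows with the number of rows produced by t1, which is why the record estimates discussed later in the deck matter so much for join ordering.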
  15. 15. Cost in Explain JSON • Total cost for query • Cost per table: – Cost for reading data – Cost for evaluating conditions • Cost for “join prefix” Statement: EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE a BETWEEN 20 AND 23; Output: { "query_block": { "select_id": 1, "cost_info": { "query_cost": "112709.41" }, "table": { "table_name": "t1", "access_type": "range", "possible_keys": [ "idx1" ], … "rows_examined_per_scan": 80506, "rows_produced_per_join": 80506, "filtered": 100, "index_condition": "(`test`.`t1`.`a` between 20 and 23)", "cost_info": { "read_cost": "96608.21", "eval_cost": "16101.20", "prefix_cost": "112709.41", "data_read_per_join": "19M" } } } }
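The cost fields in this output can be read programmatically. The sketch below parses a trimmed copy of the slide's JSON (fields elided on the slide are left out) and checks that the read cost plus the evaluation cost equals the prefix cost. Note that the cost values are strings in the output and need converting.

```python
import json

# Trimmed copy of the EXPLAIN FORMAT=JSON output shown on the slide.
explain_output = """
{
  "query_block": {
    "select_id": 1,
    "cost_info": {"query_cost": "112709.41"},
    "table": {
      "table_name": "t1",
      "access_type": "range",
      "rows_examined_per_scan": 80506,
      "cost_info": {
        "read_cost": "96608.21",
        "eval_cost": "16101.20",
        "prefix_cost": "112709.41"
      }
    }
  }
}
"""

plan = json.loads(explain_output)
table = plan["query_block"]["table"]
# Cost values are reported as strings, so convert them to floats.
costs = {name: float(value) for name, value in table["cost_info"].items()}
total = costs["read_cost"] + costs["eval_cost"]

print(f'{table["table_name"]}: read + eval = {total:.2f}, '
      f'prefix = {costs["prefix_cost"]:.2f}')
```

For this single-table query the prefix cost also equals the query cost, and the evaluation cost equals 80506 rows times the 0.2 row evaluation constant.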
  16. 16. Cost in Optimizer Trace • Contains rows and cost estimates • Contains the result from the cost evaluation. Statements: SET optimizer_trace="enabled=on"; SELECT * FROM t1 WHERE a BETWEEN 20 AND 23; SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE; Trace excerpt: "rows_estimation": [ { "table": "`t1`", "range_analysis": { "table_scan": { "rows": 9575168, "cost": 2.11e6 }, ... "analyzing_range_alternatives": { "range_scan_alternatives": [ { "index": "idx", "ranges": [ "20 <= a <= 23" ], ... "rows": 80506, "cost": 96608, "chosen": true } ], },
  17. 17. Cost Model Improvements: MySQL 5.7
  18. 18. Cost Model Improvements in MySQL 5.7 • Improved record estimates – Cost model for WHERE conditions (condition filtering effect) – Improved index statistics • Configurable “cost constants” • Cost estimates in Explain JSON • Refactoring • Fixed a number of “cost model” bugs
  19. 19. Cost Model for WHERE Conditions: Condition filtering effect • Goal: Low-fanout tables should be early in the join order • Problem: The WHERE condition is not taken into account when calculating the fanout estimate for a table • Solution: New way to calculate fanout: – In 5.6: • Fanout = #rows estimated from access method – In 5.7: • Calculate filtering effect of query conditions NOT used by the access method • Fanout = #rows from access method * condition filter effect
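The difference between the two fanout calculations is a single multiplication, as the sketch below shows. The function names are hypothetical. The inputs (9991 rows and a filter of 1.23 percent) are the values that appear in the deck's MySQL 5.7 EXPLAIN example for the employee table.

```python
def fanout_56(rows_from_access_method: float) -> float:
    """MySQL 5.6 as described on the slide: fanout is the access method's row estimate."""
    return rows_from_access_method


def fanout_57(rows_from_access_method: float, condition_filter: float) -> float:
    """MySQL 5.7 as described on the slide: scale by the filter effect of the
    conditions that the access method does not use."""
    return rows_from_access_method * condition_filter


rows = 9991        # rows estimated for a table scan of employee
filtered = 0.0123  # 1.23 percent, from the "Filtered" column

print("5.6 fanout:", fanout_56(rows))
print("5.7 fanout:", round(fanout_57(rows, filtered), 1))
```

A table whose fanout drops from thousands of rows to around a hundred becomes a much better candidate for the first position in the join order.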
  20. 20. Record Estimates for JOIN in MySQL 5.6 • t(x) JOIN t(x+1) • Important for the accuracy of the cost estimate: – Estimated number of records produced by t(x) – Estimated number of records to be read from t(x+1) • Both are improved in 5.7 [Diagram: an access method reads t(x); the number of records read from t(x) feeds the join with t(x+1).]
  21. 21. Example: Two Table JOIN in MySQL 5.6. What is the best join order? CREATE TABLE employee ( id INTEGER PRIMARY KEY, office_id INTEGER, name VARCHAR(20), hire_date DATE, INDEX office(office_id) ); (10000 rows) CREATE TABLE office ( id INTEGER PRIMARY KEY, office_name VARCHAR(20) ); (100 rows) SELECT office_name FROM office JOIN employee ON office.id = employee.office_id WHERE employee.name LIKE 'John' AND employee.hire_date BETWEEN '2014-01-01' AND '2014-06-01';
  22. 22. Example: Two Table JOIN in MySQL 5.6, cont. What is the best join order? SELECT office_name FROM office JOIN employee ON office.id = employee.office_id WHERE employee.name LIKE 'John' AND employee.hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Explain (columns: Table, Type, Possible keys, Key, Ref, Rows, Filtered, Extra): office | ALL | PRIMARY | NULL | NULL | 100 | 100.00 | NULL; employee | ref | office | office | office.id | 99 | 100.00 | Using where. Total cost = cost(scan office) + 100 * cost(ref_access employee)
  23. 23. Record Estimates for JOIN in MySQL 5.7: Condition filter effect • t(x) JOIN t(x+1) • The estimate for the number of records produced by t(x) takes into account the entire query condition • The cost estimate for t(x+1) should be more correct [Diagram: an access method reads t(x); the condition filter effect reduces the number of records read from t(x) to the records passing the table conditions on t(x), which are then joined with t(x+1).]
  24. 24. How to Calculate Condition Filter Effect, step 1. SELECT office_name FROM office JOIN employee WHERE office.id = employee.office_id AND employee.name = 'John' AND employee.first_office_id <> office.id; A condition contributes to the filter for table t only if: – It references a field in table t – It is not used by the access method – It depends on an available value: • employee.name = 'John' will always contribute to the filter on employee • employee.first_office_id <> office.id depends on JOIN order
  25. 25. How to Calculate Condition Filter Effect, step 2. SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Filter estimate based on what is available: 1. Range estimate 2. Index statistics 3. Guesstimate (values will likely be adjusted): =: MAX(0.005, 1/row count); <=, <, >, >=: 1/3; BETWEEN: 1/9; NOT <op>: 1 - SEL(<op>); AND: P(A and B) = P(A) * P(B); OR: P(A or B) = P(A) + P(B) - P(A and B); …
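The guesstimate rules on this slide translate directly into small functions. This is a sketch of the rules as written on the slide, which the slide itself says are likely to be adjusted, so the values are not necessarily those used by any released server. All function and constant names are hypothetical.

```python
def equality_filter(row_count: int) -> float:
    """Guesstimate for '=': MAX(0.005, 1/row count)."""
    return max(0.005, 1 / row_count)


INEQUALITY_FILTER = 1 / 3  # guesstimate for <=, <, >, >=
BETWEEN_FILTER = 1 / 9     # guesstimate for BETWEEN


def not_filter(selectivity: float) -> float:
    """NOT <op>: 1 - SEL(<op>)."""
    return 1 - selectivity


def and_filter(a: float, b: float) -> float:
    """AND: P(A and B) = P(A) * P(B), assuming independent conditions."""
    return a * b


def or_filter(a: float, b: float) -> float:
    """OR: P(A or B) = P(A) + P(B) - P(A and B)."""
    return a + b - and_filter(a, b)


# Example: name = 'John' AND hire_date BETWEEN ... on a 10000-row table.
print(and_filter(equality_filter(10000), BETWEEN_FILTER))
```

The AND rule treats conditions as independent, which keeps the calculation cheap but can underestimate the result size when columns are correlated.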
  26. 26. Calculating Filter Effect for Tables: Example. SELECT * FROM office JOIN employee ON office.id = employee.office_id WHERE office_name = 'San Francisco' AND employee.name = 'John' AND age > 21 AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Filter estimate per condition: office_name = 'San Francisco': 0.03 (index); employee.name = 'John': 0.005 (guesstimate); age > 21: 0.89 (range); hire_date BETWEEN '2014-01-01' AND '2014-06-01': 0.11 (guesstimate). Filter effect for tables: – office: 0.03 – employee: 0.005 * 0.11 * 0.89
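The per-table filter effect is the product of the estimates for the conditions on that table. The sketch below multiplies the numbers shown on the slide. The 10000-row count for employee comes from the deck's earlier schema example, and using it here to turn the filter into a row count is an assumption made for illustration.

```python
import math

# Filter estimates per table, taken from the slide.
condition_filters = {
    "office": [0.03],                  # index statistics
    "employee": [0.005, 0.11, 0.89],   # guesstimate, guesstimate, range estimate
}

# Combine the conditions on each table with the AND rule (multiplication).
table_filters = {
    table: math.prod(filters) for table, filters in condition_filters.items()
}

for table, effect in table_filters.items():
    print(f"{table}: {effect:.7f}")

# Assumed row count for employee, from the deck's earlier example schema.
employee_rows = 10000
print("estimated employee rows after filtering:",
      round(employee_rows * table_filters["employee"], 2))
```

Under these estimates fewer than five of the 10000 employee rows are expected to pass the conditions.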
  27. 27. Example: Two Table JOIN in MySQL 5.7. SELECT office_name FROM office JOIN employee ON office.id = employee.office_id WHERE employee.name LIKE 'John' AND employee.hire_date BETWEEN '2014-01-01' AND '2014-06-01'; Explain for 5.6 (columns: Table, Type, Possible keys, Key, Ref, Rows, Filtered, Extra): office | ALL | PRIMARY | NULL | NULL | 100 | 100.00 | NULL; employee | ref | office | office | office.id | 99 | 100.00 | Using where. Explain for 5.7 (same columns): employee | ALL | NULL | NULL | NULL | 9991 | 1.23 | NULL; office | eq_ref | PRIMARY | PRIMARY | employee.office | 1 | 100.00 | Using where. The filtered value of 1.23 is a guesstimate. The join order has changed!
  28. 28. Performance improvements: DBT-3 (SF10, CPU bound). 5 out of 22 queries get an improved query plan. [Figure: bar chart of execution time relative to 5.6 (%) for queries Q3, Q7, Q8, Q9, and Q12, comparing MySQL 5.6 and MySQL 5.7.]
  29. 29. Disable Condition Filtering • In case of performance regressions: SET optimizer_switch='condition_fanout_filter=off';
  30. 30. Improved Index Statistics • Index statistics (cardinality) are calculated by storage engines as: – Number of records per key value – In 5.6: an integer value – In 5.7: replaced with a floating point number • Why: – More accurate fan-out and cost estimates when using floating point numbers instead of integer values – Better choice of index for join when two indexes have similar cardinality estimates
  31. 31. Index Statistics and Index Selection: Example. CREATE TABLE employee ( id INTEGER PRIMARY KEY, name VARCHAR(20), car_id INTEGER, preferred_car_model INTEGER ); CREATE TABLE cars ( model INTEGER, car_id INTEGER, INDEX model(model), INDEX car_id(car_id) ); Index model: cardinality 300, records/key 3.3. Index car_id: cardinality 1000, records/key 1.0. SELECT name FROM employee JOIN cars WHERE employee.car_id = cars.car_id AND employee.preferred_car_model = cars.model; Which index should be used for this JOIN?
  32. 32. Index Statistics and Index Selection: Example. SELECT name FROM employee JOIN cars WHERE employee.car_id = cars.car_id AND employee.preferred_car_model = cars.model; Explain for 5.6 (columns: Table, Type, Possible keys, Key, Ref, Rows, Filtered, Extra): employee | ALL | NULL | NULL | NULL | 1000 | 100.00 | Using where; cars | ref | model, car_id | model | employee.preferred_car_model | 1 | 100.00 | Using where. The correct row estimate for the model index is 3.3. Explain for 5.7 (same columns): employee | ALL | NULL | NULL | NULL | 1000 | 100.00 | Using where; cars | ref | model, car_id | car_id | employee.car_id | 1 | 5.00 | Using where.
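The effect of floating point records-per-key values on index choice can be sketched as follows. The cardinalities are the ones from the example; the 1000-row table size is inferred from a cardinality of 1000 at 1.0 records per key. The helper names are hypothetical, and truncation is assumed as the way an integer estimate loses precision, which the slides do not spell out.

```python
def records_per_key(table_rows: int, cardinality: int) -> float:
    """Average number of records per distinct key value."""
    return table_rows / cardinality


TABLE_ROWS = 1000  # inferred: cardinality 1000 at 1.0 records per key
index_cardinality = {"model": 300, "car_id": 1000}

# Floating point estimates, as in MySQL 5.7.
estimates = {
    name: records_per_key(TABLE_ROWS, card)
    for name, card in index_cardinality.items()
}
best_index = min(estimates, key=estimates.get)

for name, value in estimates.items():
    print(f"{name}: {value:.1f} records/key")
print("preferred index for ref access:", best_index)

# Hypothetical illustration of precision loss: with integer estimates
# (truncation assumed), two different indexes can look identical.
print(int(1.2), int(1.9))
```

With the precise values the car_id index clearly reads fewer rows per lookup, which matches the index chosen in the 5.7 plan.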
  33. 33. Performance improvements: DBT-3 (SF10). 2 out of 22 queries get an improved query plan. [Figure: two bar charts, one disk bound and one CPU bound, of execution time relative to 5.6 (%) for queries Q2 and Q18, comparing 5.6 and 5.7.]
  34. 34. Configurable Cost Model • Replaced hard-coded “cost constants” with configurable “cost constants” • Stored in the “mysql” database: – server_cost – engine_cost • “Cost constants” are changed by updating these tables [Diagram labels: optimizer, cost model, server_cost, engine_cost.]
  35. 35. Configurable Cost Constants. Name: default value. row_evaluate_cost: 0.2; key_compare_cost: 0.1; memory_temptable_create_cost: 2.0; memory_temptable_row_cost: 0.2; disk_temptable_create_cost: 40.0; disk_temptable_row_cost: 1.0; io_block_read_cost: 1.0. Online update of cost constants: UPDATE mysql.server_cost SET cost_value=0.1 WHERE cost_name='row_evaluate_cost'; FLUSH OPTIMIZER_COSTS; • New sessions will use the new “cost constant” value – Old sessions will use the previous “cost constant” value
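The session behavior described here (new sessions see the new value, existing sessions keep the old one) can be modeled as each session taking a snapshot of the constants when it starts. This is a toy model of that visible behavior only. The classes `CostConstantStore` and `Session` are hypothetical and say nothing about how the server implements it.

```python
class CostConstantStore:
    """Toy stand-in for the cost constants the server has currently loaded."""

    def __init__(self, values: dict[str, float]) -> None:
        self._values = dict(values)

    def update(self, name: str, value: float) -> None:
        # Stands in for the UPDATE plus FLUSH OPTIMIZER_COSTS pair.
        self._values[name] = value

    def snapshot(self) -> dict[str, float]:
        return dict(self._values)


class Session:
    """A session copies the constants that are current when it starts."""

    def __init__(self, store: CostConstantStore) -> None:
        self.costs = store.snapshot()


store = CostConstantStore({"row_evaluate_cost": 0.2, "key_compare_cost": 0.1})
old_session = Session(store)

store.update("row_evaluate_cost", 0.1)
new_session = Session(store)

print("old session:", old_session.costs["row_evaluate_cost"])
print("new session:", new_session.costs["row_evaluate_cost"])
```

One practical consequence is that a changed constant can be tried out from a fresh connection without affecting queries in sessions that are already open.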
  36. 36. Engine Specific Cost Constants: Example • Storage engines can specify their own cost constants, both engine-specific values for existing constants and new cost constants. Columns: cost_name, engine_name, cost_value. io_block_read_cost | Default | 1.0; io_block_read_cost | InnoDB | 0.9; io_block_read_cost | MyISAM | 1.2; transfer_cost_mb | NDB | 0.002
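One way to read this table is as a lookup keyed by constant name and engine that falls back to the default row when an engine has no value of its own. The sketch below implements that reading. The fallback rule is an assumption based on the "Default" row, the function `engine_cost` is hypothetical, and the values are the slide's illustrative ones rather than shipped defaults.

```python
# Rows from the slide's example table: (cost_name, engine_name) -> cost_value.
ENGINE_COST = {
    ("io_block_read_cost", "Default"): 1.0,
    ("io_block_read_cost", "InnoDB"): 0.9,
    ("io_block_read_cost", "MyISAM"): 1.2,
    ("transfer_cost_mb", "NDB"): 0.002,
}


def engine_cost(cost_name: str, engine_name: str) -> float:
    """Return the engine-specific value, or the default row if there is none.

    The fallback to "Default" is an assumption made for this sketch.
    """
    if (cost_name, engine_name) in ENGINE_COST:
        return ENGINE_COST[(cost_name, engine_name)]
    if (cost_name, "Default") in ENGINE_COST:
        return ENGINE_COST[(cost_name, "Default")]
    raise KeyError(f"no cost constant {cost_name!r} for engine {engine_name!r}")


print(engine_cost("io_block_read_cost", "InnoDB"))
print(engine_cost("io_block_read_cost", "NDB"))
print(engine_cost("transfer_cost_mb", "NDB"))
```

A constant such as transfer_cost_mb has no default row in the example, so asking for it on another engine raises an error in this sketch.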
  37. 37. Other changes in MySQL 5.7 • Refactoring: – Added internal Cost Model API – New handler functions for cost estimates • Cost representation: – Internal cost representation in handler interface and range optimizer records IO cost and CPU cost separately • Cost estimates added to EXPLAIN JSON [Diagram labels: optimizer, Cost Model API, cost constants, cost model configuration, handler API, storage engines.]
  38. 38. Evolving the Cost Model: Future work
  39. 39. Goals for Improving the Cost Model 1. The cost model should adapt to: – New server and storage technologies – Large memory buffers 2. Produce more accurate estimates – Provide better and more up-to-date defaults for cost constants – Utilize more statistics about the data 3. More configurable and tunable
  40. 40. New Storage Technologies • Time to do a table scan of 10 million records: Memory: 5 s; SSD: 20-146 s; Hard disk: 32-1465 s • Adjust cost model to support different storage technologies • Provide configurable cost constants for different storage technologies • Provide a program that could measure performance and suggest a good cost constant configuration for a running MySQL server?
  41. 41. Memory Buffer Aware Cost Estimates. SELECT * FROM t1 WHERE a > 'constant'; [Figure: two plots of time (s) against percentage of data for a range scan on a secondary key. With the data in the InnoDB buffer pool the time axis runs from 0 to 6 s; with the data on hard disk it runs from 0 to 4500 s.]
  42. 42. Memory Buffer Aware Cost Estimates • Storage engines: – Estimate for how much of data and indexes are in a memory buffer – Estimate for hit rate for memory buffer • Optimizer cost model: – Take into account whether data is already in memory or needs to be read from disk [Diagram labels: server, query executor, storage engine, database buffer, disk data.]
  43. 43. Memory Buffer Aware Cost Estimates. Goal: • Select a better point for changing between alternative access methods [Figure: plots of time (s) against percentage of data, one for data in the InnoDB buffer pool and one for data on hard disk.]
  44. 44. Improved Statistics • The more information the optimizer has about the data in tables, the better choices the optimizer can make • New statistics: – Statistics for non-indexed columns – Histograms [Diagram labels: optimizer, storage engine, table, index, index statistics, column statistics, histogram.]
  45. 45. Questions?
  46. 46. Thank You!
