New features in MariaDB and MySQL optimizers

  1. Sergei Petrunia, MariaDB
     New features in MariaDB/MySQL query optimizer
  2. MySQL/MariaDB optimizer development
     ● Some features have common heritage
     ● Big releases:
       – MariaDB 5.3/5.5
       – MySQL 5.6
       – (upcoming) MariaDB 10.0
  3. New optimizer features
     ● Subqueries: FROM, IN, others
     ● Batched Key Access (MRR)
     ● Index Condition Pushdown
     ● Extended Keys
     ● EXPLAIN UPDATE/DELETE
     ● PERFORMANCE_SCHEMA
     ● Engine-independent statistics
     ● InnoDB persistent statistics
  5. Subqueries in MySQL
     ● Subqueries are practically unusable
     ● e.g. Facebook disabled them in the parser
     ● Reason: "naive execution"
  6. Naive subquery execution
     ● For IN (SELECT ...) subqueries:

         select * from hotel
         where hotel.country='USA' and
               hotel.name IN (select hotel_stays.hotel
                              from hotel_stays
                              where hotel_stays.customer='John Smith')

     ● Naive execution:

         for (each hotel in USA) {
           if (John Smith stayed here) {
             …
           }
         }

     ● Slow!
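
To make the cost difference concrete, here is a minimal Python sketch (not server code) that runs the IN subquery above two ways over tiny hypothetical tables: re-evaluating the subquery for every USA hotel, as in the naive plan, and evaluating it once into a hash set that is then probed, which is similar in spirit to the semi-join Materialization strategy. The table contents and function names are invented for illustration; the counter shows how many hotel_stays rows each approach reads.

```python
# Hypothetical table contents, for illustration only.
hotels = [
    {"name": "Grand", "country": "USA"},
    {"name": "Plaza", "country": "USA"},
    {"name": "Ritz", "country": "UK"},
    {"name": "Hilton", "country": "USA"},
]
hotel_stays = [
    {"hotel": "Plaza", "customer": "John Smith"},
    {"hotel": "Ritz", "customer": "John Smith"},
    {"hotel": "Grand", "customer": "Jane Doe"},
]


def naive_in_subquery(hotels, stays):
    """Re-run the subquery for every candidate outer row."""
    result, inner_reads = [], 0
    for h in hotels:
        if h["country"] != "USA":
            continue
        for s in stays:  # the subquery is re-scanned for each hotel
            inner_reads += 1
            if s["customer"] == "John Smith" and s["hotel"] == h["name"]:
                result.append(h["name"])
                break  # IN only needs one match
    return result, inner_reads


def materialized_in_subquery(hotels, stays):
    """Evaluate the subquery once into a hash set, then probe it per hotel."""
    inner_reads, john_hotels = 0, set()
    for s in stays:
        inner_reads += 1
        if s["customer"] == "John Smith":
            john_hotels.add(s["hotel"])
    result = [h["name"] for h in hotels
              if h["country"] == "USA" and h["name"] in john_hotels]
    return result, inner_reads


print(naive_in_subquery(hotels, hotel_stays))
print(materialized_in_subquery(hotels, hotel_stays))
```

Both return the same hotel, but with N outer rows and M inner rows the naive loop can read up to N×M inner rows, while the materialized version reads the M inner rows once.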
  7. Naive subquery execution (2)
     ● For FROM (SELECT ...) subqueries:

         select *
         from (select *
               from hotel
               where hotel.rooms > 500) as big_hotel
         where big_hotel.nearest_airport='AMS';

     ● Naive execution:
       1. Retrieve all hotels with > 500 rooms, store them in a temporary table big_hotel;
       2. Search big_hotel for hotels near AMS.
     ● Slow!
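
One way newer optimizers avoid the two-step plan is to merge the derived table into the outer query (the idea behind MariaDB's derived_merge optimization). The Python sketch below is only a model over a hypothetical hotel table: the naive version writes every big hotel to a temporary list first, while the merged version checks both conditions in one pass.

```python
# Hypothetical hotel table, for illustration only.
hotels = [
    {"name": "A", "rooms": 800, "nearest_airport": "AMS"},
    {"name": "B", "rooms": 600, "nearest_airport": "LHR"},
    {"name": "C", "rooms": 300, "nearest_airport": "AMS"},
    {"name": "D", "rooms": 900, "nearest_airport": "CDG"},
]


def naive_derived_table(hotels):
    """Step 1: materialize big_hotel; step 2: scan it for hotels near AMS."""
    big_hotel = [h for h in hotels if h["rooms"] > 500]  # temporary table
    result = [h["name"] for h in big_hotel if h["nearest_airport"] == "AMS"]
    return result, len(big_hotel)  # rows written to the temporary table


def merged_derived_table(hotels):
    """Merge the subquery into the outer query: one pass, no temporary table."""
    result = [h["name"] for h in hotels
              if h["rooms"] > 500 and h["nearest_airport"] == "AMS"]
    return result, 0


print(naive_derived_table(hotels))
print(merged_derived_table(hotels))
```

Merging also lets the optimizer choose an access method for the combined condition (for example, an index on nearest_airport, if one exists) instead of scanning a freshly built temporary table.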
  8. New subquery optimizations
     ● Handle IN (SELECT ...)
     ● Handle FROM (SELECT ...)
     ● Handle a lot of cases
     ● Comparison with PostgreSQL:
       – ~1000x slower before
       – ~same order of magnitude now
     ● Releases:
       – MySQL 6.0
       – MariaDB 5.5 (Sheeri Kritzer @ Mozilla seems happy with this one)
       – MySQL 5.6 (a subset of MariaDB 5.5's features)
  9. Subquery optimizations: summary
     ● Subqueries were generally unusable before MariaDB 5.3/5.5
     ● "Core" subquery optimizations are in
       – MariaDB 5.3/5.5
       – MySQL 5.6
     ● MariaDB has extra additions
     ● Further information: https://kb.askmonty.org/en/subquery-optimizations/
  11. Batched Key Access: background
      ● Big, IO-bound joins were slow
        – DBT-3 benchmark could not finish*
      ● Reason?
      ● Nested Loops join hits the second table at random locations.
  12. Batched Key Access: idea
      (Diagram: Nested Loops Join vs. Batched Key Access)
      Speedup reasons:
      ● Fewer disk head movements
      ● Cache-friendliness
      ● Prefetch-friendliness
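
The speedup reasons above come from the order of the inner-table lookups. The following Python sketch is a simplified model, not the MRR/BKA implementation: the inner table is assumed to be clustered on the join key, a "page switch" between consecutive lookups stands in for a disk head movement, and the key range, page size (ROWS_PER_PAGE) and buffer sizes are hypothetical.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical: inner-table rows per disk page


def page_of(key):
    """Inner table is clustered on the join key: key k sits on page k // ROWS_PER_PAGE."""
    return key // ROWS_PER_PAGE


def page_switches(lookup_keys):
    """Count lookups that land on a different page than the previous one (a seek proxy)."""
    switches, current = 0, None
    for key in lookup_keys:
        page = page_of(key)
        if page != current:
            switches += 1
            current = page
    return switches


def nested_loops_order(outer_keys):
    """Plain Nested Loops: one inner lookup per outer row, in outer-row order."""
    return list(outer_keys)


def bka_order(outer_keys, buffer_rows):
    """BKA-style: fill a join buffer, sort its keys, then do the lookups."""
    order = []
    for start in range(0, len(outer_keys), buffer_rows):
        order.extend(sorted(outer_keys[start:start + buffer_rows]))
    return order


rng = random.Random(42)
outer_keys = [rng.randrange(100_000) for _ in range(10_000)]

nl = page_switches(nested_loops_order(outer_keys))
bka_small = page_switches(bka_order(outer_keys, buffer_rows=1_000))
bka_large = page_switches(bka_order(outer_keys, buffer_rows=10_000))
print(f"nested loops: {nl}, BKA small buffer: {bka_small}, BKA large buffer: {bka_large}")
```

The larger the buffer, the more lookups get sorted together and the fewer page switches remain, which is why BKA only pays off given a sufficient join buffer.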
  13. Batched Key Access benchmark

          set join_cache_level=6;  -- enable BKA
          select max(l_extendedprice)
          from orders, lineitem
          where l_orderkey=o_orderkey and
                o_orderdate between $DATE1 and $DATE2

      Run with
      ● Various join_buffer_size settings
      ● Various sizes of the $DATE1...$DATE2 range
  14. Batched Key Access benchmark (2)
      (Chart: "BKA join performance depending on buffer size". X axis: buffer size, bytes; Y axis: query time, sec. Series: query_size=1, 2, 3, each run regular and with BKA. Annotations: "Performance without BKA"; "Performance with BKA, given sufficient buffer size".)
  15. Batched Key Access: summary
      ● Optimization for big, IO-bound joins
        – Orders-of-magnitude speedups
      ● Available in
        – MariaDB 5.3/5.5 (more advanced)
        – MySQL 5.6
      ● Not fully automatic yet
        – Needs to be manually enabled
        – Need to set buffer sizes
  17. Index Condition Pushdown
      ● A new feature in MariaDB 5.3 / MySQL 5.6

          alter table lineitem add index s_r (l_shipdate, l_receiptdate);

          select count(*) from lineitem
          where l_shipdate between '1993-01-01' and '1993-02-01' and
                datediff(l_receiptdate, l_shipdate) > 25 and
                l_quantity > 40

          +----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
          | id | select_type | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra                              |
          +----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+
          |  1 | SIMPLE      | lineitem | range | s_r           | s_r  | 4       | NULL | 158854 | Using index condition; Using where |
          +----+-------------+----------+-------+---------------+------+---------+------+--------+------------------------------------+

      1. Read index records in the range l_shipdate between '1993-01-01' and '1993-02-01'
      2. Check the index condition datediff(l_receiptdate, l_shipdate) > 25
         ← New! Filters out records before table rows are read
      3. Read full table rows
      4. Check the WHERE condition l_quantity > 40
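
The four steps can be mimicked in a few lines of Python. This is a sketch with a handful of hypothetical rows, a sorted list standing in for index s_r and a dict standing in for the table; it counts how many full table rows are read with and without checking the index condition first. (receipt - ship).days plays the role of datediff(l_receiptdate, l_shipdate).

```python
from datetime import date

# Hypothetical lineitem rows: rowid -> (l_shipdate, l_receiptdate, l_quantity)
table = {
    1: (date(1993, 1, 5), date(1993, 2, 10), 45),
    2: (date(1993, 1, 10), date(1993, 1, 20), 50),
    3: (date(1993, 1, 15), date(1993, 3, 1), 10),
    4: (date(1993, 1, 20), date(1993, 1, 25), 41),
    5: (date(1993, 3, 1), date(1993, 4, 15), 48),
}
# Index s_r: sorted (l_shipdate, l_receiptdate, rowid) entries.
index_s_r = sorted((ship, receipt, rowid) for rowid, (ship, receipt, _) in table.items())

LO, HI = date(1993, 1, 1), date(1993, 2, 1)


def count_query(use_icp):
    """Return (count(*), number of full table rows read)."""
    count, rows_read = 0, 0
    for ship, receipt, rowid in index_s_r:
        if ship < LO:
            continue
        if ship > HI:
            break  # 1. the range scan on l_shipdate is over
        if use_icp and (receipt - ship).days <= 25:
            continue  # 2. index condition fails: skip without reading the row
        ship_t, receipt_t, quantity = table[rowid]  # 3. read the full row
        rows_read += 1
        if (receipt_t - ship_t).days > 25 and quantity > 40:  # 4. WHERE
            count += 1
    return count, rows_read


print("without ICP:", count_query(use_icp=False))
print("with ICP:   ", count_query(use_icp=True))
```

The query result is the same either way; only the number of full-row reads drops, which is where the IO savings come from.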
  18. Index Condition Pushdown: conclusions
      Summary
      ● Applicable to any index-based access (ref, range, etc.)
      ● Checks parts of the WHERE clause after reading the index
      ● Reduces the number of table records to be read
      ● Speedup can be similar to "Using index"
        – Great for IO-bound load (5x, 10x)
        – Some for CPU-bound workload (2x)
      Conclusions
      ● Have a selective condition on a column?
        – Put the column into the index, at the end.
  19. Extended keys
      ● Before: the optimizer had limited support for "tail" columns
        – "Using index" supports it
        – ORDER BY col1, col2, pk1 supports it
      ● After MariaDB 5.5 / MySQL 5.6:
        – all parts of the optimizer (ref access, range access, etc.) can use the "tail"

          CREATE TABLE tbl (
            pk1 sometype,
            pk2 sometype,
            ...
            col1 sometype,
            col2 sometype,
            ...
            KEY indexA (col1, col2),
            ...
            PRIMARY KEY (pk1, pk2)
          ) ENGINE=InnoDB

          indexA:  col1 | col2 | pk1 | pk2

      ● Secondary indexes in InnoDB have an invisible "tail": the primary key columns
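
A sorted list of tuples is a convenient stand-in for an index. In this Python sketch (hypothetical data and helper names), each entry of indexA carries the primary key as its tail, so a condition on col1, col2 and pk1 maps to one contiguous range once the optimizer is allowed to use the tail; without that, only the (col1, col2) prefix is usable and pk1 must be filtered afterwards.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries of indexA (col1, col2) in an InnoDB table with
# PRIMARY KEY (pk1, pk2): every entry is (col1, col2, pk1, pk2), sorted.
index_a = sorted([
    (1, 10, 100, 1), (1, 10, 100, 2), (1, 10, 200, 1),
    (1, 10, 300, 1), (1, 20, 100, 1), (2, 10, 100, 1),
])


def prefix_scan(index, prefix):
    """Return the contiguous run of entries whose leading columns equal prefix."""
    lo = bisect_left(index, prefix)
    # Any tuple that starts with `prefix` sorts before prefix + (inf,).
    hi = bisect_right(index, prefix + (float("inf"),))
    return index[lo:hi]


# WHERE col1=1 AND col2=10 AND pk1=100
# Without extended keys: look up (col1, col2), then filter pk1 afterwards.
without_ext = [e for e in prefix_scan(index_a, (1, 10)) if e[2] == 100]
# With extended keys: (col1, col2, pk1) is itself a usable index prefix.
with_ext = prefix_scan(index_a, (1, 10, 100))

print(len(prefix_scan(index_a, (1, 10))), "entries scanned without extended keys")
print(len(with_ext), "entries scanned with extended keys")
```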
  21. Better EXPLAIN in MySQL 5.6
      ● EXPLAIN for UPDATE/DELETE/INSERT … SELECT
        – shows the query plan for finding the records to update/delete

          mysql> explain update customer set c_acctbal = c_acctbal - 100 where c_custkey=12354;
          +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
          | id | select_type | table    | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
          +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+
          |  1 | SIMPLE      | customer | range | PRIMARY       | PRIMARY | 4       | NULL |    1 | Using where |
          +----+-------------+----------+-------+---------------+---------+---------+------+------+-------------+

      ● EXPLAIN FORMAT=JSON
        – Produces [big] JSON output
        – Shows more information:
          ● Shows conditions attached to tables
          ● Shows whether "Using temporary; Using filesort" is done to handle GROUP BY or ORDER BY
          ● Shows where subqueries are attached
        – No other known additions
        – Will be in MariaDB 10.0
      The most useful addition!
  22. EXPLAIN FORMAT=JSON
      What are the "conditions attached to tables"?

          explain
          select count(*)
          from orders, customer
          where customer.c_custkey=orders.o_custkey and
                customer.c_mktsegment='BUILDING' and
                orders.o_totalprice > customer.c_acctbal and
                orders.o_orderpriority='1-URGENT'

          +----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
          | id | select_type | table    | type | possible_keys | key         | key_len | ref                         | rows    | Extra       |
          +----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
          |  1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL                        | 1509871 | Using where |
          |  1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | dbt3sf10.customer.c_custkey |       7 | Using where |
          +----+-------------+----------+------+---------------+-------------+---------+-----------------------------+---------+-------------+
  23. EXPLAIN FORMAT=JSON (2)

          {
            "query_block": {
              "select_id": 1,
              "nested_loop": [
                {
                  "table": {
                    "table_name": "customer",
                    "access_type": "ALL",
                    "possible_keys": ["PRIMARY"],
                    "rows": 1509871,
                    "filtered": 100,
                    "attached_condition": "(`dbt3sf10`.`customer`.`c_mktsegment` = 'BUILDING')"
                  }
                },
                {
                  "table": {
                    "table_name": "orders",
                    "access_type": "ref",
                    "possible_keys": ["i_o_custkey"],
                    "key": "i_o_custkey",
                    "used_key_parts": ["o_custkey"],
                    "key_length": "5",
                    "ref": ["dbt3sf10.customer.c_custkey"],
                    "rows": 7,
                    "filtered": 100,
                    "attached_condition": "((`dbt3sf10`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf10`.`orders`.`o_totalprice` > `dbt3sf10`.`customer`.`c_acctbal`))"
                  }
                }
              ]
            }
          }
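
Because the output is JSON, it is easy to post-process. A small Python sketch that lists the condition attached to each table in the plan above; it assumes the flat query_block → nested_loop → table layout shown there and would need extending for plans that contain other kinds of nodes (subqueries, sorting, grouping).

```python
import json

# EXPLAIN FORMAT=JSON output from the slide above (lightly reflowed).
explain_json = """
{"query_block": {"select_id": 1, "nested_loop": [
  {"table": {"table_name": "customer", "access_type": "ALL",
             "possible_keys": ["PRIMARY"], "rows": 1509871, "filtered": 100,
             "attached_condition": "(`dbt3sf10`.`customer`.`c_mktsegment` = 'BUILDING')"}},
  {"table": {"table_name": "orders", "access_type": "ref",
             "possible_keys": ["i_o_custkey"], "key": "i_o_custkey",
             "used_key_parts": ["o_custkey"], "key_length": "5",
             "ref": ["dbt3sf10.customer.c_custkey"], "rows": 7, "filtered": 100,
             "attached_condition": "((`dbt3sf10`.`orders`.`o_orderpriority` = '1-URGENT') and (`dbt3sf10`.`orders`.`o_totalprice` > `dbt3sf10`.`customer`.`c_acctbal`))"}}
]}}
"""


def attached_conditions(plan_text):
    """Map table_name -> attached_condition for a flat nested_loop plan."""
    plan = json.loads(plan_text)
    conditions = {}
    for step in plan["query_block"]["nested_loop"]:
        table = step["table"]
        conditions[table["table_name"]] = table.get("attached_condition")
    return conditions


for name, cond in attached_conditions(explain_json).items():
    print(f"{name}: {cond}")
```

This answers the question from the previous slide: the cross-table check o_totalprice > c_acctbal is evaluated at the orders table, the first point where both columns are available.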
  24. EXPLAIN ANALYZE (kind of)
      ● Does EXPLAIN match reality?
      ● Where is most of the time spent?
      ● MySQL/MariaDB don't have "EXPLAIN ANALYZE" ...

          select count(*)
          from orders, customer
          where customer.c_custkey=orders.o_custkey and
                customer.c_mktsegment='BUILDING' and
                orders.o_orderpriority='1-URGENT'

          +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          | id   | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
          +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          |    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 149415 | Using where |
          |    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey |      7 | Using index |
          +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
  25. Traditional solution: status variables
      Problems:
      ● Only #rows counters
      ● All tables are counted together

          mysql> flush status;
          Query OK, 0 rows affected (0.00 sec)
          mysql> {run query}
          mysql> show status like 'Handler%';
          +----------------------------+--------+
          | Variable_name              | Value  |
          +----------------------------+--------+
          | Handler_commit             | 1      |
          | Handler_delete             | 0      |
          | Handler_discover           | 0      |
          | Handler_icp_attempts       | 0      |
          | Handler_icp_match          | 0      |
          | Handler_mrr_init           | 0      |
          | Handler_mrr_key_refills    | 0      |
          | Handler_mrr_rowid_refills  | 0      |
          | Handler_prepare            | 0      |
          | Handler_read_first         | 0      |
          | Handler_read_key           | 30142  |
          | Handler_read_last          | 0      |
          | Handler_read_next          | 303959 |
          | Handler_read_prev          | 0      |
          | Handler_read_rnd           | 0      |
          | Handler_read_rnd_deleted   | 0      |
          | Handler_read_rnd_next      | 150001 |
          | Handler_rollback           | 0      |
          ...
  26. Newer solution: userstat
      ● In the Facebook patch, Percona Server, MariaDB:

          mysql> set global userstat=1;
          mysql> flush table_statistics;
          mysql> flush index_statistics;
          mysql> {query}
          mysql> show table_statistics;
          +--------------+------------+-----------+--------------+-------------------------+
          | Table_schema | Table_name | Rows_read | Rows_changed | Rows_changed_x_#indexes |
          +--------------+------------+-----------+--------------+-------------------------+
          | dbt3sf1      | orders     |    303959 |            0 |                       0 |
          | dbt3sf1      | customer   |    150000 |            0 |                       0 |
          +--------------+------------+-----------+--------------+-------------------------+
          mysql> show index_statistics;
          +--------------+------------+-------------+-----------+
          | Table_schema | Table_name | Index_name  | Rows_read |
          +--------------+------------+-------------+-----------+
          | dbt3sf1      | orders     | i_o_custkey |    303959 |
          +--------------+------------+-------------+-----------+

      ● Counters are per-table
        – OK as long as you don't have self-joins
      ● Overhead is negligible
      ● Counters are server-wide (other queries affect them, too)
  27. Latest addition: PERFORMANCE_SCHEMA
      ● Allows measuring the *time* spent reading each table
      ● Has some visible overhead (Facebook's tests: 7%)
      ● Counters are system-wide
      ● Still no luck with self-joins

          mysql> truncate performance_schema.table_io_waits_summary_by_table;
          mysql> {query}
          mysql> select
                   object_schema,
                   object_name,
                   count_read,
                   sum_timer_read,                                          -- this is picoseconds
                   sum_timer_read / (1000*1000*1000*1000) as read_seconds   -- this is seconds
                 from performance_schema.table_io_waits_summary_by_table
                 where object_schema = 'dbt3sf1' and object_name in ('orders', 'customer');
          +---------------+-------------+------------+----------------+--------------+
          | object_schema | object_name | count_read | sum_timer_read | read_seconds |
          +---------------+-------------+------------+----------------+--------------+
          | dbt3sf1       | orders      |     334101 |  5739345397323 |       5.7393 |
          | dbt3sf1       | customer    |     150001 |  1273653046701 |       1.2737 |
          +---------------+-------------+------------+----------------+--------------+
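
The read_seconds column is just the picosecond timer divided by 10^12. A Python sketch repeating that arithmetic on the two result rows, and additionally deriving the average time per read (a figure not shown on the slide, computed here from count_read):

```python
PICOSECONDS_PER_SECOND = 1000 * 1000 * 1000 * 1000  # as in the query above

# (count_read, sum_timer_read) values copied from the result above.
measurements = {
    "orders": (334101, 5739345397323),
    "customer": (150001, 1273653046701),
}


def read_seconds(sum_timer_read):
    """Convert a PERFORMANCE_SCHEMA picosecond timer sum to seconds."""
    return sum_timer_read / PICOSECONDS_PER_SECOND


def avg_read_microseconds(count_read, sum_timer_read):
    """Average time per read, in microseconds (1 us = 10**6 ps)."""
    return sum_timer_read / count_read / 10**6


for name, (count_read, sum_timer_read) in measurements.items():
    print(f"{name}: {read_seconds(sum_timer_read):.4f} s total, "
          f"{avg_read_microseconds(count_read, sum_timer_read):.1f} us per read")
```

Here each orders read costs roughly twice as much as a customer read (about 17 vs 8.5 microseconds), something the row counters from the previous slides cannot show.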
  29. What are table/index statistics?

          select count(*)
          from customer, orders
          where customer.c_custkey=orders.o_custkey and
                customer.c_mktsegment='BUILDING';

          +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          | id   | select_type | table    | type | possible_keys | key         | key_len | ref                | rows   | Extra       |
          +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+
          |    1 | SIMPLE      | customer | ALL  | PRIMARY       | NULL        | NULL    | NULL               | 148305 | Using where |
          |    1 | SIMPLE      | orders   | ref  | i_o_custkey   | i_o_custkey | 5       | customer.c_custkey |      7 | Using index |
          +------+-------------+----------+------+---------------+-------------+---------+--------------------+--------+-------------+

          MariaDB > show table status like 'orders'\G
          *************************** 1. row ***************************
                     Name: orders
                   Engine: InnoDB
                  Version: 10
               Row_format: Compact
                     Rows: 1495152
          ...

          MariaDB > show keys from orders where key_name='i_o_custkey'\G
          *************************** 1. row ***************************
                  Table: orders
             Non_unique: 1
               Key_name: i_o_custkey
           Seq_in_index: 1
            Column_name: o_custkey
              Collation: A
            Cardinality: 212941
               Sub_part: NULL
          ...

      1495152 / 212941 ≈ 7
      "There are on average 7 orders for a given c_custkey"
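
The arithmetic on the slide, as a tiny Python helper: the table's row count divided by the index cardinality gives the average number of rows per key value, which is what EXPLAIN reports as "rows" for ref access on orders. This is a sketch of the idea only; the server's actual computation may differ in detail (rounding, engine-specific adjustments).

```python
def rows_per_key(table_rows, index_cardinality):
    """Average number of rows per distinct key value (the 'rows' estimate for ref access)."""
    return table_rows / index_cardinality


# Rows from SHOW TABLE STATUS, Cardinality from SHOW KEYS.
estimate = rows_per_key(1495152, 212941)
print(f"{estimate:.2f} rows per key, about {round(estimate)}")
```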
  30. The problem with index statistics and InnoDB
      MySQL 5.5, InnoDB:
      ● Statistics are calculated on the fly
        – When the table is opened (server restart, DDL)
        – When a sufficient number of records have been updated
        – ...
      ● Calculation uses random sampling
        – @@innodb_stats_sample_pages
      ● Result:
        – Statistics change without warning
          => Query plans change without warning
      ● For example, the DBT-3 benchmark
        – 22 analytics queries
        – Plans per query: avg=2.8, max=7
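
Why do sampled statistics move around? The Python sketch below uses a deliberately simplified estimator (not InnoDB's exact algorithm) over a hypothetical index whose leaf pages differ a lot in key diversity. Each run samples 8 pages, a stand-in for @@innodb_stats_sample_pages; different random samples give different distinct-value estimates, hence different rows-per-key figures and, potentially, different plans.

```python
import random

ROWS_PER_PAGE = 10

# Hypothetical index: 100 leaf pages. Even-numbered pages repeat a single key,
# odd-numbered pages hold 10 distinct keys. Keys never repeat across pages.
pages, next_key = [], 0
for page_no in range(100):
    if page_no % 2 == 0:
        pages.append([next_key] * ROWS_PER_PAGE)
        next_key += 1
    else:
        pages.append(list(range(next_key, next_key + ROWS_PER_PAGE)))
        next_key += ROWS_PER_PAGE


def exact_distinct(pages):
    """True number of distinct keys (what a full scan would find)."""
    return len({key for page in pages for key in page})


def sampled_distinct(pages, sample_pages, seed):
    """Simplified estimator: distinct keys on the sampled pages,
    scaled by total_pages / sample_pages."""
    rng = random.Random(seed)
    sample = rng.sample(pages, sample_pages)
    distinct_in_sample = sum(len(set(page)) for page in sample)
    return round(distinct_in_sample * len(pages) / sample_pages)


print("exact:", exact_distinct(pages))
for seed in range(5):
    print(f"run {seed}: estimate from 8 sampled pages =", sampled_distinct(pages, 8, seed))
```

Sampling every page reproduces the exact value; small samples can land anywhere between 100 and 1000 on this data.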
  31. Persistent table statistics
      Persistent statistics v1
      ● Percona Server 5.5 (ported to MariaDB 5.5)
        – Needs to be enabled: innodb_use_sys_stats_table=1
      ● Statistics are stored inside InnoDB
        – User-visible through information_schema.innodb_sys_stats (read-only)
      ● Setting innodb_stats_auto_update=OFF prevents unexpected updates
      Persistent statistics v2
      ● MySQL 5.6
        – Enabled by default: innodb_stats_persistent=1
      ● Stored in regular InnoDB tables
        – mysql.innodb_table_stats, mysql.innodb_index_stats
      ● Setting innodb_stats_auto_recalc=OFF prevents unexpected updates
      ● Can also specify persistence/auto-recalc as a table option
  32. Persistent table statistics: summary
      ● Percona, then MySQL
        – Made statistics persistent
        – Disallowed automatic updates
      ● Remaining issue #1: it's still random sampling
        – DBT-3 benchmark
        – scale=30
        – Re-ran EXPLAINs for benchmark queries
        – Counted different query plans
      ● Remaining issue #2: limited amount of statistics
        – Only on index columns
        – Only AVG(#different_values)
  33. Upcoming: engine-independent statistics
      MariaDB 10.0: engine-independent statistics
      ● Collected/used on the SQL layer
      ● No auto updates, only ANALYZE TABLE
        – 100% precise statistics
      ● More statistics
        – Index statistics (like before)
        – Table statistics (like before)
        – Column statistics
          ● MIN/MAX values
          ● Number of NULL / not NULL values
          ● Histograms
      ● => The optimizer will be smarter and more reliable
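
To see why histograms help, here is a Python sketch of the kinds of column statistics listed above: MIN/MAX, NULL counts and a simple equi-height histogram. The bucket layout and the coarse selectivity formula are assumptions chosen for illustration, not MariaDB's storage format. On a skewed hypothetical column, the MIN/MAX-only estimate is far off while the histogram gets much closer.

```python
from bisect import bisect_right


def column_stats(values, buckets):
    """Collect MIN/MAX, NULL counts and an equi-height histogram for one column."""
    non_null = sorted(v for v in values if v is not None)
    n = len(non_null)
    # Bucket i ends at the value found at the (i+1)/buckets quantile.
    bounds = [non_null[max(0, (i + 1) * n // buckets - 1)] for i in range(buckets)]
    return {"min": non_null[0], "max": non_null[-1],
            "nulls": len(values) - n, "not_nulls": n, "bounds": bounds}


def histogram_le(stats, x):
    """Coarse fraction of non-NULL values <= x: buckets whose upper bound is <= x."""
    return bisect_right(stats["bounds"], x) / len(stats["bounds"])


def uniform_le(stats, x):
    """Estimate from MIN/MAX alone, assuming values are spread uniformly."""
    lo, hi = stats["min"], stats["max"]
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


skewed = [1] * 90 + list(range(2, 12)) + [None] * 5  # hypothetical column
s = column_stats(skewed, buckets=4)
print(s)
print("true fraction <= 1:", 90 / 100)
print("uniform estimate:  ", uniform_le(s, 1))
print("histogram estimate:", histogram_le(s, 1))
```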
  34. Conclusions
      ● Lots of new query optimizer features recently
        – Subqueries now just work
        – Big joins are much faster
          ● Need to turn it on
        – More diagnostics
      ● Even more is coming
      ● Releases with the features
        – MariaDB 5.5
        – MySQL 5.6
        – (upcoming) MariaDB 10.0
  36. Thanks
      Q & A
