
MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE

This presentation focuses on two of the new features in MySQL 8.0.18: hash joins and EXPLAIN ANALYZE. It covers how these features work, both on the surface and on the inside, and how you can use them to improve your queries and make them go faster.

Both features are the result of a major refactoring of how the MySQL executor works. In addition to explaining and demonstrating the features themselves, the presentation looks at how the investment in a new iterator-based executor prepares MySQL for a future with faster queries, greater plan flexibility and even more SQL features.

Published in: Software

  1. Copyright © 2019 Oracle and/or its affiliates. MySQL 8.0.18 latest updates: Hash join and EXPLAIN ANALYZE. Norvald H. Ryeng, Software Development Senior Manager, MySQL Optimizer Team. October 1, 2019.
  2. Safe harbor statement: The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, timing, and pricing of any features or functionality described for Oracle’s products may change and remains at the sole discretion of Oracle Corporation.
  3. Program agenda: (1) Quick demo; (2) How did we get here? Phase separation, Volcano iterator model; (3) Hash join; (4) EXPLAIN ANALYZE
  4. Quick demo: MySQL 8.0.18
  5. [No text on this slide]
  6. How did we get here? A long term investment that finally pays off
  7. https://twitter.com/vlad_mihalcea/status/1173842312398614528
  8. MySQL refactoring: Separating phases
     - Started ~10 years ago
       - Considered finished now
     - A clear separation between query processing phases
     - Fixed a large number of bugs
       - Improved stability
     - Faster feature development
       - Fewer surprises and complications during development
     [Figure: query processing pipeline. Phases: Parse, Prepare, Optimize, Execute. Other labels: SQL, Resolve, Transform, Abstract syntax tree, Logical plan, Physical plan]
  9. MySQL refactoring: Parsing and preparing
     - Still ongoing
       - Implemented piece by piece
     - Separating parsing and resolving phases
       - Eliminate semantic actions that do too much
       - Get a true bottom-up parser
     - Makes it easier to extend with new SQL syntax
       - Parsing doesn't have unintended side effects
     - Consistent name and type resolving
       - Names resolved top-down
       - Types resolved bottom-up
     - Transformations done in the prepare phase
       - Bottom-up
     [Figure: query processing pipeline (Parse, Prepare, Optimize, Execute)]
  10. MySQL features made possible because we invested in refactoring
      - CTEs: Recursive and non-recursive CTEs; Traverse hierarchies; Write more readable SQL
      - LATERAL: "For-each loops"
      - Window functions: Aggregation, ranking, analytics; Sliding windows
      - JSON: JSON_TABLE; JSON window functions
      - … and many, many more!
  11. MySQL refactoring: Iterator executor
      - Volcano iterator model
      - Possible because phases were separated
      - Ongoing for ~1.5 years
      - Much more modular executor
        - Common iterator interface for all operations
        - Each operation is contained within an iterator
      - Able to put together plans in new ways
        - Immediate benefit: Removes temporary tables in some cases
      - Join is just an iterator
        - Nested loop join is just an iterator
        - Hash join is just an iterator
        - Your favorite join method is just an iterator
      [Figure: query processing pipeline (Parse, Prepare, Optimize, Execute)]
  12. Old MySQL executor vs. iterator executor
      Old executor
      - Nested loop focused
      - Hard to extend
      - Code for one operation spread out
      - Different interfaces for each operation
      - Combination of operations hard coded
      Iterator executor
      - Modular
      - Easy to extend
      - Each iterator encapsulates one operation
      - Same interface for all iterators
      - All operations can be connected
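
The "same interface for all iterators" idea can be illustrated with a minimal Python sketch of the Volcano iterator model. This is not MySQL's C++ code: the class names, the `init()`/`read()` method names, and the `drain` helper are illustrative assumptions. Rows are plain dicts, and `None` signals end of input.

```python
class TableScan:
    """Leaf iterator: returns the rows of an in-memory list one at a time."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def init(self):
        self.pos = 0  # restart the scan

    def read(self):
        if self.pos >= len(self.rows):
            return None  # end of input
        row = self.rows[self.pos]
        self.pos += 1
        return row


class Filter:
    """Passes through only the child rows that satisfy the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def read(self):
        while True:
            row = self.child.read()
            if row is None or self.predicate(row):
                return row


class Limit:
    """Stops after at most `count` rows from the child."""

    def __init__(self, child, count):
        self.child = child
        self.count = count
        self.returned = 0

    def init(self):
        self.returned = 0
        self.child.init()

    def read(self):
        if self.returned >= self.count:
            return None
        row = self.child.read()
        if row is not None:
            self.returned += 1
        return row


def drain(iterator):
    """Runs a plan to completion and collects its rows."""
    iterator.init()
    result = []
    while True:
        row = iterator.read()
        if row is None:
            return result
        result.append(row)
```

Because every operation exposes the same two methods, a plan is just a nesting of constructors, for example `Limit(Filter(TableScan(rows), pred), 3)`, and a new operation can be added without touching the existing ones.
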
  13. MySQL 8.0 features based on the iterator executor
      - EXPLAIN FORMAT=TREE
        - Print the iterator tree
      - EXPLAIN ANALYZE
        1. Insert instrumentation nodes in the tree
        2. Execute the query
        3. Print the iterator tree
      - Hash join
        - Just another iterator type
      [Figure: query processing pipeline (Parse, Prepare, Optimize, Execute)]
  14. Hash join. MySQL internals. New in 8.0.18
  15. MySQL hash join
      - Hybrid hash join
      - Three execution modes
        - Everything fits in memory
        - Spill to disk (GRACE hash join)
        - Looping
      - Equi-join
      - xxHash64
        - Fast
        - Good distribution
      [Figure: a HashJoinIterator over two TableScanIterators, for the query SELECT * FROM t1 JOIN t2 ON t1.a = t2.a;]
  16. MySQL hash join: everything fits in memory
      1) Hash one table into memory
         - Smallest table
         - The whole table fits in memory (if not, use next method)
      2) Scan the other table and match with rows in memory
      [Figure: steps 1 and 2 on tables A and B, matched with =]
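
The two steps above can be sketched in Python as a classic build/probe hash join. This is a simplified illustration, not MySQL's implementation: the function name and parameters are hypothetical, rows are dicts, and Python's built-in dict hashing stands in for xxHash64.

```python
from collections import defaultdict


def hash_join_in_memory(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two row lists; yields (build_row, probe_row) pairs."""
    # Step 1: hash the (ideally smaller) build input into memory.
    table = defaultdict(list)
    for row in build_rows:
        key = row[build_key]
        if key is None:
            continue  # SQL NULL never equals anything, so it cannot match
        table[key].append(row)

    # Step 2: scan the other input and look up matches in the hash table.
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            yield (match, row)
```

For `SELECT * FROM t1 JOIN t2 ON t1.a = t2.a`, a call would look like `list(hash_join_in_memory(t1, t2, "a", "a"))`, with the smaller table passed as the build input.
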
  17. MySQL hash join: spill to disk
      1) Hash the smallest table into memory until the buffer is full
         - When the buffer is full, dump the rest as chunks on disk
         - Produce up to 128 hash buckets (chunks) on disk
      2) Scan the other table and match with rows in memory
         - Write to chunks on disk at the same time
         - Produce up to 128 hash buckets (chunks) on disk
      3) Hash one chunk of the smallest table into memory
         - Hash using a different seed than used to create disk buckets
         - (If the bucket doesn't fit, see next method)
      4) Scan the corresponding chunk and match with rows in memory
      5) Repeat 3-4 until all chunks have been processed
      [Figure: steps 1-4 on tables A and B, matched with =]
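
A rough Python sketch of the spill-to-disk steps, under several stated assumptions: Python lists stand in for chunk files on disk, the buffer limit is counted in rows rather than bytes, and `zlib.crc32` stands in for the partitioning hash (the slides name xxHash64). All function and parameter names are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_CHUNKS = 128  # the slides mention up to 128 chunks per input


def chunk_of(key):
    # Stand-in partitioning hash; assumes all join keys have the same type.
    # It differs from the dict hash used for the in-memory table, mirroring
    # the "different seed" remark on the slide.
    return zlib.crc32(repr(key).encode()) % NUM_CHUNKS


def hybrid_hash_join(build_rows, probe_rows, build_key, probe_key, max_rows_in_memory):
    """Yields (build_row, probe_row) pairs, spilling when the buffer is full."""
    table = defaultdict(list)
    build_chunks = defaultdict(list)  # stand-in for build chunk files
    probe_chunks = defaultdict(list)  # stand-in for probe chunk files
    in_memory = 0

    # Step 1: fill the in-memory table, then spill the rest into chunks.
    for row in build_rows:
        key = row[build_key]
        if key is None:
            continue
        if in_memory < max_rows_in_memory:
            table[key].append(row)
            in_memory += 1
        else:
            build_chunks[chunk_of(key)].append(row)

    # Step 2: probe the in-memory rows; if anything was spilled, also write
    # each probe row to its chunk so it can meet the spilled build rows later.
    spilled = bool(build_chunks)
    for row in probe_rows:
        key = row[probe_key]
        if key is None:
            continue
        for match in table.get(key, ()):
            yield (match, row)
        if spilled:
            probe_chunks[chunk_of(key)].append(row)

    # Steps 3-5: join each build chunk with the corresponding probe chunk.
    for chunk_id, chunk in build_chunks.items():
        chunk_table = defaultdict(list)
        for row in chunk:
            chunk_table[row[build_key]].append(row)
        for row in probe_chunks.get(chunk_id, ()):
            for match in chunk_table.get(row[probe_key], ()):
                yield (match, row)
```

Each build row lives in exactly one place (the in-memory table or one chunk), so no match is produced twice. The sketch assumes every chunk fits in memory; the looping mode on the next slide covers the case where one does not.
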
  18. MySQL hash join: looping
      1) Hash a batch of the chunk into memory
         - When the buffer is full, pause reading this operand
      2) Scan the corresponding chunk and match with rows in memory
      3) Hash the next batch into memory
         - Resume from where it was paused
      4) Scan the corresponding chunk and match with rows in memory
      5) Repeat 3-4 until the whole chunk has been processed
      [Figure: steps 1 & 3 on one chunk, steps 2 & 4 on the other, matched with =]
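
The looping mode can be sketched in Python as follows, assuming the chunks are lists and the buffer limit is a row count (both simplifications; the names are hypothetical). The build chunk is consumed in batches, and the whole probe chunk is rescanned once per batch.

```python
from collections import defaultdict


def looping_hash_join(build_chunk, probe_chunk, build_key, probe_key, max_rows_in_memory):
    """Joins one chunk pair when the build chunk does not fit in memory."""
    if max_rows_in_memory < 1:
        raise ValueError("the buffer must hold at least one row")

    for start in range(0, len(build_chunk), max_rows_in_memory):
        # Steps 1 and 3: hash the next batch, resuming where the last one stopped.
        table = defaultdict(list)
        for row in build_chunk[start:start + max_rows_in_memory]:
            table[row[build_key]].append(row)

        # Steps 2 and 4: scan the entire probe chunk against this batch.
        for row in probe_chunk:
            for match in table.get(row[probe_key], ()):
                yield (match, row)
```

The cost is one full pass over the probe chunk per batch, which is why this mode is a fallback rather than the default.
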
  19. MySQL hash join optimization
      - Replaces BNL
      - Currently optimized as BNL
        - Replace with hash join after optimization
        - A conservative (safe) choice
      - Optimizer switch to turn on/off
        - SET optimizer_switch='hash_join=on';
      - Hints
        - /*+ HASH_JOIN(tables or query blocks) */
        - /*+ NO_HASH_JOIN(tables or query blocks) */
      - Buffer size
        - SET join_buffer_size=number;
      [Figure: query processing pipeline (Parse, Prepare, Optimize, Execute)]
  20. MySQL hash join performance
      - BNL compared to hash join
      - Force BNL/hash join in DBT-3/TPC-H
        - DBT-3/TPC-H without indexes
        - Optimizer selects BNL
        - Automatic conversion to hash join
      - Hash join is much faster than BNL
      - Can't expect same improvement when indexes are available
      [Chart: BNL vs. hash join; lower number is better. Labels on the chart: 467x, 1520x, >1400x, 1332x, 968x]
  21. EXPLAIN ANALYZE. MySQL internals. New in 8.0.18
  22. MySQL EXPLAIN ANALYZE
      - Wrap iterators in instrumentation nodes
      - Measurements
        - Time (in ms) to first row
        - Time (in ms) to last row
        - Number of rows
        - Number of loops
      - Execute the query and dump the stats
      [Figure: a HashJoinIterator and two TableScanIterators, with three TimingIterator nodes]
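
A minimal Python sketch of such an instrumentation node, wrapping any iterator that exposes `init()` and `read()`. The method names, the `TableScan` helper, and the way times are accumulated and averaged per loop are assumptions made for illustration, not a description of MySQL's TimingIterator.

```python
import time


class TableScan:
    """Tiny leaf iterator so the example is self-contained."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def init(self):
        self.pos = 0

    def read(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row


class TimingIterator:
    """Wraps a child iterator and records time, row count and loop count."""

    def __init__(self, child):
        self.child = child
        self.loops = 0           # number of times init() was called
        self.rows = 0            # total rows returned across all loops
        self.first_row_s = 0.0   # time spent in init() plus the first read()
        self.other_rows_s = 0.0  # time spent in all later read() calls
        self._first_pending = False

    def init(self):
        start = time.perf_counter()
        self.child.init()
        self.first_row_s += time.perf_counter() - start
        self.loops += 1
        self._first_pending = True

    def read(self):
        start = time.perf_counter()
        row = self.child.read()
        elapsed = time.perf_counter() - start
        if self._first_pending:
            self.first_row_s += elapsed
            self._first_pending = False
        else:
            self.other_rows_s += elapsed
        if row is not None:
            self.rows += 1
        return row

    def report(self):
        # Averages per loop; this convention is an assumption of the sketch.
        loops = max(self.loops, 1)
        first_ms = 1000 * self.first_row_s / loops
        last_ms = 1000 * (self.first_row_s + self.other_rows_s) / loops
        return (f"(actual time={first_ms:.3f}..{last_ms:.3f} "
                f"rows={self.rows // loops} loops={self.loops})")
```

Since the wrapper has the same interface as what it wraps, it can be inserted around every node of a plan without changing the nodes themselves, which is the property the slide relies on.
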
  23. What's wrong with Q2? MySQL EXPLAIN ANALYZE to the rescue!
      [Chart from slide 20 repeated, with labels 467x, 1520x, >1400x, 1332x, 968x]
      SELECT s_acctbal, s_name, n_name, p_partkey, p_mfgr, s_address, s_phone, s_comment
      FROM part, supplier, partsupp, nation, region
      WHERE p_partkey = ps_partkey
        AND s_suppkey = ps_suppkey
        AND p_size = 4
        AND p_type LIKE '%TIN'
        AND s_nationkey = n_nationkey
        AND n_regionkey = r_regionkey
        AND r_name = 'AMERICA'
        AND ps_supplycost = (
          SELECT min(ps_supplycost)
          FROM partsupp, supplier, nation, region
          WHERE p_partkey = ps_partkey
            AND s_suppkey = ps_suppkey
            AND s_nationkey = n_nationkey
            AND n_regionkey = r_regionkey
            AND r_name = 'AMERICA'
        )
      ORDER BY s_acctbal DESC, n_name, s_name, p_partkey
      LIMIT 100;
  24. EXPLAIN ANALYZE output for Q2. Time is in milliseconds. (Tree indentation was lost in extraction; long lines are cut off at the slide edge.)
      -> Limit: 100 row(s) (actual time=591739.000..591739.018 rows=100 loops=1)
      -> Sort: <temporary>.s_acctbal DESC, <temporary>.n_name, <temporary>.s_name, <temporary>.p_partkey, limit input to 100 row(s) per chunk (actual time
      -> Stream results (actual time=591738.599..591738.772 rows=462 loops=1)
      -> Inner hash join (nation.n_regionkey = region.r_regionkey), (nation.n_nationkey = supplier.s_nationkey) (cost=2074295.37 rows=98) (actual
      -> Table scan on nation (cost=0.00 rows=25) (actual time=0.024..0.026 rows=25 loops=1)
      -> Hash
      -> Inner hash join (supplier.s_suppkey = partsupp.ps_suppkey) (cost=2074041.10 rows=98) (actual time=591735.554..591738.311 rows=46
      -> Table scan on supplier (cost=0.06 rows=9760) (actual time=0.068..2.024 rows=10000 loops=1)
      -> Hash
      -> Filter: (partsupp.ps_supplycost = (select #2)) (cost=1977898.52 rows=98) (actual time=1282.855..591733.987 rows=462 loop
      -> Inner hash join (partsupp.ps_partkey = part.p_partkey) (cost=1977898.52 rows=98) (actual time=84.827..271.307 rows=3
      -> Table scan on partsupp (cost=3.54 rows=796168) (actual time=0.034..108.684 rows=800000 loops=1)
      -> Hash
      -> Inner hash join (cost=20353.91 rows=24) (actual time=0.274..83.964 rows=780 loops=1)
      -> Filter: ((part.p_size = 4) and (part.p_type like '%TIN')) (cost=20353.16 rows=2204) (actual time=0.212..
      -> Table scan on part (cost=20353.16 rows=198401) (actual time=0.160..63.957 rows=200000 loops=1)
      -> Hash
      -> Filter: (region.r_name = 'AMERICA') (cost=0.75 rows=1) (actual time=0.029..0.036 rows=1 loops=1)
      -> Table scan on region (cost=0.75 rows=5) (actual time=0.021..0.030 rows=5 loops=1)
      -> Select #2 (subquery in condition; dependent)
      -> Aggregate: min(partsupp.ps_supplycost) (actual time=189.566..189.566 rows=1 loops=3120)
      -> Inner hash join (nation.n_regionkey = region.r_regionkey), (nation.n_nationkey = supplier.s_nationkey) (cost
      -> Table scan on nation (cost=0.00 rows=25) (actual time=0.005..0.007 rows=25 loops=3120)
      -> Hash
      -> Inner hash join (supplier.s_suppkey = partsupp.ps_suppkey) (cost=77789257.28 rows=79617) (actual tim
      -> Table scan on supplier (cost=0.02 rows=9760) (actual time=0.014..1.237 rows=10000 loops=3120)
      -> Hash
      -> Filter: (part.p_partkey = partsupp.ps_partkey) (cost=81757.41 rows=79617) (actual time=92.08
      -> Inner hash join (cost=81757.41 rows=79617) (actual time=0.018..155.118 rows=800000 loops
      -> Table scan on partsupp (cost=10101.54 rows=796168) (actual time=0.011..97.353 rows=8
      -> Hash
      -> Filter: (region.r_name = 'AMERICA') (cost=0.75 rows=1) (actual time=0.004..0.005
      -> Table scan on region (cost=0.75 rows=5) (actual time=0.003..0.003 rows=5 loo
  25. Make your queries run faster with MySQL 8.0
      Automatic
      - Temporary table elimination
        - Recursive CTEs
        - Some derived tables
        - UNION into derived tables
        - Input to sorting
      - Faster duplicate removal
      - Hash join instead of BNL
      - Analyze queries to find out where time is spent
        - EXPLAIN FORMAT=TREE
        - EXPLAIN ANALYZE
        - Optimizer trace
  26. Feature descriptions and design details directly from the source: https://mysqlserverteam.com/
  27. Thank you! Norvald H. Ryeng, Software Development Senior Manager, MySQL Optimizer Team
