SlideShare a Scribd company logo

1

MySQL EXPLAIN Explained
   Quick and Easy Query Optimisation




   Adrian Hardy <talks@fuzzee.co.uk>

2

Before we begin...
What you need to know
    How and why we add indexes to tables
●


    The benefits of correct field typing
●


    Understanding of the ideals of 3NF
●


    Basic understanding of SQL JOINs
●




This presentation
    Very quick introduction to EXPLAIN
●



    Improve understanding of MySQL and indexing
●


    Simplified examples / results
●

3

Introduction - Using MySQL EXPLAIN
  Prefix a SELECT query with EXPLAIN
      MySQL won't actually execute the query, just analyse it
  ●


      EXPLAIN helps us understand how and when MySQL
  ●


      will use indexes
      EXPLAIN returns a table of data from which you identify
  ●


      potential improvements
      Optimise queries in three ways
  ●


           Modify or create indexes
       ●


           Modify query structure
       ●


           Modify data structure
       ●


      Optimised queries = faster results, lower server load...
  ●

4

Introduction - Review of Indexing
    Fast, compact structure for identifying row locations
●


    Keep indexes in memory by trimming the fat:
●


        Can I reduce the characters in that VARCHAR index?
    ●



        Can I use a TINYINT instead of a BIGINT?
    ●



        Can I use an INTEGER to describe a status or flag (rather
    ●


        than a textual description)?
    Chop down your result set as quickly as possible
●


    MySQL will only use one index per query/table – it cannot
●

    combine two separate indexes to make a useful one *


Understanding and preparation brings about Indexing Strategy



                                  * Not strictly true - look up “Index Merge” operations

5

Booking application schema

attendees
  attendee_id      surname      conference_id   registration_status
 INTEGER (PK)     VARCHAR       INTEGER (FK)         TINYINT




conferences
 conference_id    location_id     topic_id             date
 INTEGER (PK)    INTEGER (FK)   INTEGER (FK)          DATE

6

EXPLAIN – Worked Example
     EXPLAIN SELECT * FROM attendees WHERE
conference_id = 123 AND registration_status > 0

           table       possible_keys          key              rows
         attendees         NULL              NULL              14052


   The three most important columns returned by EXPLAIN
    1)Possible keys
         All the possible indexes which MySQL could have used
     ●



         Based on a series of very quick lookups and
     ●

         calculations
    2)Chosen key
    3)Rows scanned
         Indication of effort required to identify your result set
     ●

7

EXPLAIN – Worked Example
     EXPLAIN SELECT * FROM attendees WHERE
conference_id = 123 AND registration_status > 0

             table      possible_keys          key            rows
           attendees        NULL               NULL       14052


   Interpreting the results
       No suitable indexes for this query
   ●



           MySQL had to do a full table scan
       ●



       Full table scans are almost always the slowest query
   ●



       Full table scans, while not always bad, are usually an
   ●

       indication that an index is required

8

EXPLAIN – Worked Example
   ALTER TABLE ADD INDEX conf (conference_id);
 ALTER TABLE ADD INDEX reg (registration_status);

EXPLAIN SELECT * FROM attendees WHERE conference_id
         = 123 AND registration_status > 1;

            table       possible_keys           key      rows
          attendees        conf, reg            conf      331


      MySQL had two indexes to choose from, but discarded “reg”
  ●



      “reg” isn't sufficiently unique
  ●



          The spread of values can also be a factor (e.g when 99% of
      ●

          rows contain the same value)
      Index “uniqueness” is called cardinality
  ●



      There is scope for some performance increase...
  ●



          Lower server load, quicker response
      ●

9

EXPLAIN – Worked Example
           ALTER TABLE ADD INDEX reg_conf_index
          (registration_status, conference_id);

EXPLAIN SELECT * FROM attendees WHERE conference_id =
           123 AND registration_status > 1;

          table     possible_keys         key         rows
                       reg, conf,
        attendees                    reg_conf_index   204
                    reg_conf_index

        reg_conf_index is a much better choice
    ●



     Note that the other two keys are still available, just
    ●

    not as effective
        Our query is now served well by the new index
    ●

10

EXPLAIN – Worked Example
              DELETE INDEX conf; DELETE INDEX reg;
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123

           table     possible_keys        key           rows
         attendees       NULL            NULL           14052

       Without the “conf” index, we're back to square one
   ●


       The order in which fields were defined in a composite index
   ●


       affects whether it is available for use in a query
       ● Remember,     we defined our index : (registration_status,
         conference_id)
   Potential workaround:
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
                AND registration_status >= -1

           table      possible_keys         key          rows
         attendees    reg_conf_index   reg_conf_index     204

11

EXPLAIN – Example 2
EXPLAIN SELECT * FROM attendees WHERE surname LIKE 'har%';

         table        possible_keys      key           rows
       attendees        surname        surname             234


         MySQL uses an index on surname – which is good.



EXPLAIN SELECT * FROM attendees WHERE surname LIKE '%har%';

         table        possible_keys      key           rows
       attendees          NULL          NULL           14052


                 MySQL doesn't even try to use an index!

12

EXPLAIN – Example 3
EXPLAIN SELECT * FROM conferences WHERE location_id = 2 OR
                    topic_id IN (4,6,1)

         table      possible_keys         key         rows
                     location_id,
      conferences                        NULL         5043
                       topic_id

           MySQL doesn't use an index, because of the OR

 ALTER TABLE ADD INDEX location_topic (location_id,
                     topic_id);

 EXPLAIN SELECT * FROM conferences WHERE location_id = 2
                  OR topic_id IN (4,6,1)

         table      possible_keys         key         rows
                     location_id,
      conferences                    location_topic    15
                       topic_id,
                    location_topic

     Full table scan avoided – could also use UNION (ALL) trick

13

EXPLAIN – Example 4
EXPLAIN SELECT * FROM attendees WHERE MD5(conference_id) =
                         MD5(123)
         table      possible_keys      key           rows
       attendees        NULL          NULL          14052

         Understandably, MySQL has to do a full table scan



                     A more realistic example?
          EXPLAIN SELECT * FROM conferences WHERE
               DATE_FORMAT(date,'%a') = 'Sat'

         table      possible_keys      key           rows
      conferences       NULL          NULL           5043

    A good candidate for Optimisation #3 – Modify Data Structure

14

JOINs
    JOINing together large data sets (>= 100,000) is really
●


    where EXPLAIN becomes useful
    Each JOIN in a query gets its own row in EXPLAIN
●


        Make sure each JOIN condition is FAST
    ●


    Make sure each joined table is getting to its result set
●

    as quickly as possible
    ● The benefits compound if each join requires less

      effort

15

JOINs – Simple Example
                  EXPLAIN SELECT * FROM
conferences INNER JOIN attendees USING (conference_id)
         WHERE conferences.location_id = 2 AND
           conferences.topic_id IN (4,6,1) AND
            attendees.registration_status > 1


   table            type       possible_keys         key              rows
conferences          ref      conference_topic conference_topic        15

 attendees          ALL            NULL             NULL              14052


           Looks like I need an index on attendees.conference_id

      There are 13 different values for “type”
        ● Another indication of effort, aside from rows scanned

        ● Here, “ALL” is bad – we should be aiming for “ref”

          ● Common values are “const”, “ref”, and “all”

        ● http://dev.mysql.com/doc/refman/5.0/en/using-explain.html

16

The “extra” column
  With every EXPLAIN, you get an “extra” column, which
  shows additional operations invoked to get your result set.
  table     possible_keys     key          rows           extra
                                                      Using where
attendees         conf       conf          331
                                                      Using filesort

  Some example “extra” values:

        Using   where
    ●


        Using   temporary table
    ●


        Using   filesort
    ●


        Using   index
    ●




  There are many more “extra” values which are discussed in
  the MySQL manual.

17

“Using filesort”
Avoid, because:
● Doesn't use an index

  ● Involves a full scan of your result set


  ● Employs a generic (i.e. one size fits all)

    algorithm
● Uses the filesystem (eeek)


● Will get slower with more data



It's not all bad...
    Perfectly acceptable provided you get to your
●

    result set as quickly as possible, and keep it
    predictably small
    Sometimes unavoidable - ORDER BY RAND()
●


    ORDER BY operations can use indexes to do the
●


    sorting!

18

“Using filesort” – Example
 EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
                     ORDER BY surname
    table     possible_keys         key             rows         Extra
  attendees    conference_id    conference_id       331       Using filesort


       MySQL is using an index, but it's sorting the results slowly

ALTER TABLE attendees ADD INDEX conf_surname (conference_id,
                          surname);
 EXPLAIN SELECT * FROM attendees WHERE conference_id = 123
                      ORDER BY surname
    table     possible_keys         key             rows         Extra
               conference_id,
  attendees                     conf_surname        331
               conf_surname

                         We've avoided a filesort

19

“Using index”
    Celebrate, because:
 MySQL got your results just by consulting the index,
●


  ● Which could well have been sat in memory


●MySQL didn't need to even look at the table to get you your

results
  ● Opening a table can be an expensive operation.


●MySQL can answer the next query more quickly


●The fastest way for you to get your data?




    Particularly useful...
 When you're just interested in a single date or an id
●


●Or the COUNT(), SUM(), AVG() etc. of a field

20

“Using index” – Example
EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id
                           = 123
   table         possible_keys         key           rows             Extra
 attendees        conference_id    conference_id     331


  Nothing is actually wrong with this query – it could just be quicker!

 ALTER TABLE attendees ADD INDEX conf_age (conference_id,
                           age);
EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id
                          = 123
    table        possible_keys          key          rows             Extra
                  conference_id,
  attendees                        conf_surname       331         Using index
                  conf_surname

             Outside of caching, the fastest way to get your data *
                                                                  *Not a guarantee

21

Moving forward...

Just because your queries are fast now, doesn't mean that they will stay
that way forever

Enable MySQL's Slow Query Log
 ● --log-slow-queries=/var/lib/mysql/slow-query.log

 ● Defaults to logging queries which take more than 10 seconds

 ● --long_query_time=1

 ● Use Percona's “microslow” patch for values < 1 second

 ● Find the query in the log, EXPLAIN it, improve it, rinse and repeat

22

Moving forward...
Use the command line to identify more general problems
 ● mysqladmin -u dbuser -p -r -i 10 extended-status

 ● Figures are relative, updated every 10 seconds

      ● Slow_queries = number of slow queries in last period

      ● Select_Scan = full table scans

      ● Select_full_join = full scans to complete join operations

      ● Created_tmp_disk_tables = filesorts

      ● Key_read_requests/Key_write_requests

        ● Determine write/read weighting of our application and alter your

          indexes accordingly

23

MySQL Resources

    http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
●



    High Performance MySQL - Baron Schwartz
●



          ISBN 0596101716
      –

          £20 (Money well spent)
      –

    http://www.mysqlperformanceblog.com
●



              Regular posts
          –

More Related Content

Mysql Explain Explained

  • 1. MySQL EXPLAIN Explained Quick and Easy Query Optimisation Adrian Hardy <talks@fuzzee.co.uk>
  • 2. Before we begin... What you need to know How and why we add indexes to tables ● The benefits of correct field typing ● Understanding of the ideals of 3NF ● Basic understanding of SQL JOINs ● This presentation Very quick introduction to EXPLAIN ● Improve understanding of MySQL and indexing ● Simplified examples / results ●
  • 3. Introduction - Using MySQL EXPLAIN Prefix a SELECT query with EXPLAIN MySQL won't actually execute the query, just analyse it ● EXPLAIN helps us understand how and when MySQL ● will use indexes EXPLAIN returns a table of data from which you identify ● potential improvements Optimise queries in three ways ● Modify or create indexes ● Modify query structure ● Modify data structure ● Optimised queries = faster results, lower server load... ●
  • 4. Introduction - Review of Indexing Fast, compact structure for identifying row locations ● Keep indexes in memory by trimming the fat: ● Can I reduce the characters in that VARCHAR index? ● Can I use a TINYINT instead of a BIGINT? ● Can I use an INTEGER to describe a status or flag (rather ● than a textual description)? Chop down your result set as quickly as possible ● MySQL will only use one index per query/table – it cannot ● combine two separate indexes to make a useful one * Understanding and preparation brings about Indexing Strategy * Not strictly true - look up “Index Merge” operations
  • 5. Booking application schema attendees attendee_id surname conference_id registration_status INTEGER (PK) VARCHAR INTEGER (FK) TINYINT conferences conference_id location_id topic_id date INTEGER (PK) INTEGER (FK) INTEGER (FK) DATE
  • 6. EXPLAIN – Worked Example EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees NULL NULL 14052 The three most important columns returned by EXPLAIN 1)Possible keys All the possible indexes which MySQL could have used ● Based on a series of very quick lookups and ● calculations 2)Chosen key 3)Rows scanned Indication of effort required to identify your result set ●
  • 7. EXPLAIN – Worked Example EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 0 table possible_keys key rows attendees NULL NULL 14052 Interpreting the results No suitable indexes for this query ● MySQL had to do a full table scan ● Full table scans are almost always the slowest query ● Full table scans, while not always bad, are usually an ● indication that an index is required
  • 8. EXPLAIN – Worked Example ALTER TABLE ADD INDEX conf (conference_id); ALTER TABLE ADD INDEX reg (registration_status); EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 1; table possible_keys key rows attendees conf, reg conf 331 MySQL had two indexes to choose from, but discarded “reg” ● “reg” isn't sufficiently unique ● The spread of values can also be a factor (e.g when 99% of ● rows contain the same value) Index “uniqueness” is called cardinality ● There is scope for some performance increase... ● Lower server load, quicker response ●
  • 9. EXPLAIN – Worked Example ALTER TABLE ADD INDEX reg_conf_index (registration_status, conference_id); EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status > 1; table possible_keys key rows reg, conf, attendees reg_conf_index 204 reg_conf_index reg_conf_index is a much better choice ● Note that the other two keys are still available, just ● not as effective Our query is now served well by the new index ●
  • 10. EXPLAIN – Worked Example DELETE INDEX conf; DELETE INDEX reg; EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 table possible_keys key rows attendees NULL NULL 14052 Without the “conf” index, we're back to square one ● The order in which fields were defined in a composite index ● affects whether it is available for use in a query ● Remember, we defined our index : (registration_status, conference_id) Potential workaround: EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 AND registration_status >= -1 table possible_keys key rows attendees reg_conf_index reg_conf_index 204
  • 11. EXPLAIN – Example 2 EXPLAIN SELECT * FROM attendees WHERE surname LIKE 'har%'; table possible_keys key rows attendees surname surname 234 MySQL uses an index on surname – which is good. EXPLAIN SELECT * FROM attendees WHERE surname LIKE '%har%'; table possible_keys key rows attendees NULL NULL 14052 MySQL doesn't even try to use an index!
  • 12. EXPLAIN – Example 3 EXPLAIN SELECT * FROM conferences WHERE location_id = 2 OR topic_id IN (4,6,1) table possible_keys key rows location_id, conferences NULL 5043 topic_id MySQL doesn't use an index, because of the OR ALTER TABLE ADD INDEX location_topic (location_id, topic_id); EXPLAIN SELECT * FROM conferences WHERE location_id = 2 OR topic_id IN (4,6,1) table possible_keys key rows location_id, conferences location_topic 15 topic_id, location_topic Full table scan avoided – could also use UNION (ALL) trick
  • 13. EXPLAIN – Example 4 EXPLAIN SELECT * FROM attendees WHERE MD5(conference_id) = MD5(123) table possible_keys key rows attendees NULL NULL 14052 Understandably, MySQL has to do a full table scan A more realistic example? EXPLAIN SELECT * FROM conferences WHERE DATE_FORMAT(date,'%a') = 'Sat' table possible_keys key rows conferences NULL NULL 5043 A good candidate for Optimisation #3 – Modify Data Structure
  • 14. JOINs JOINing together large data sets (>= 100,000) is really ● where EXPLAIN becomes useful Each JOIN in a query gets its own row in EXPLAIN ● Make sure each JOIN condition is FAST ● Make sure each joined table is getting to its result set ● as quickly as possible ● The benefits compound if each join requires less effort
  • 15. JOINs – Simple Example EXPLAIN SELECT * FROM conferences INNER JOIN attendees USING (conference_id) WHERE conferences.location_id = 2 AND conferences.topic_id IN (4,6,1) AND attendees.registration_status > 1 table type possible_keys key rows conferences ref conference_topic conference_topic 15 attendees ALL NULL NULL 14052 Looks like I need an index on attendees.conference_id There are 13 different values for “type” ● Another indication of effort, aside from rows scanned ● Here, “ALL” is bad – we should be aiming for “ref” ● Common values are “const”, “ref”, and “all” ● http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
  • 16. The “extra” column With every EXPLAIN, you get an “extra” column, which shows additional operations invoked to get your result set. table possible_keys key rows extra Using where attendees conf conf 331 Using filesort Some example “extra” values: Using where ● Using temporary table ● Using filesort ● Using index ● There are many more “extra” values which are discussed in the MySQL manual.
  • 17. “Using filesort” Avoid, because: ● Doesn't use an index ● Involves a full scan of your result set ● Employs a generic (i.e. one size fits all) algorithm ● Uses the filesystem (eeek) ● Will get slower with more data It's not all bad... Perfectly acceptable provided you get to your ● result set as quickly as possible, and keep it predictably small Sometimes unavoidable - ORDER BY RAND() ● ORDER BY operations can use indexes to do the ● sorting!
  • 18. “Using filesort” – Example EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname table possible_keys key rows Extra attendees conference_id conference_id 331 Using filesort MySQL is using an index, but it's sorting the results slowly ALTER TABLE attendees ADD INDEX conf_surname (conference_id, surname); EXPLAIN SELECT * FROM attendees WHERE conference_id = 123 ORDER BY surname table possible_keys key rows Extra conference_id, attendees conf_surname 331 conf_surname We've avoided a filesort
  • 19. “Using index” Celebrate, because: MySQL got your results just by consulting the index, ● ● Which could well have been sat in memory ●MySQL didn't need to even look at the table to get you your results ● Opening a table can be an expensive operation. ●MySQL can answer the next query more quickly ●The fastest way for you to get your data? Particularly useful... When you're just interested in a single date or an id ● ●Or the COUNT(), SUM(), AVG() etc. of a field
  • 20. “Using index” – Example EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id = 123 table possible_keys key rows Extra attendees conference_id conference_id 331 Nothing is actually wrong with this query – it could just be quicker! ALTER TABLE attendees ADD INDEX conf_age (conference_id, age); EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id = 123 table possible_keys key rows Extra conference_id, attendees conf_surname 331 Using index conf_surname Outside of caching, the fastest way to get your data * *Not a guarantee
  • 21. Moving forward... Just because your queries are fast now, doesn't mean that they will stay that way forever Enable MySQL's Slow Query Log ● --log-slow-queries=/var/lib/mysql/slow-query.log ● Defaults to logging queries which take more than 10 seconds ● --long_query_time=1 ● Use Percona's “microslow” patch for values < 1 second ● Find the query in the log, EXPLAIN it, improve it, rinse and repeat
  • 22. Moving forward... Use the command line to identify more general problems ● mysqladmin -u dbuser -p -r -i 10 extended-status ● Figures are relative, updated every 10 seconds ● Slow_queries = number of slow queries in last period ● Select_Scan = full table scans ● Select_full_join = full scans to complete join operations ● Created_tmp_disk_tables = filesorts ● Key_read_requests/Key_write_requests ● Determine write/read weighting of our application and alter your indexes accordingly
  • 23. MySQL Resources http://dev.mysql.com/doc/refman/5.0/en/using-explain.html ● High Performance MySQL - Baron Schwartz ● ISBN 0596101716 – £20 (Money well spent) – http://www.mysqlperformanceblog.com ● Regular posts –