1. San Diego
PHP
Aug 2
Goldilocks And The Three Queries –
MySQL's EXPLAIN Explained
Dave Stokes
MySQL Community Manager, North America
David.Stokes@Oracle.Com
2. Please Read
The following is intended to outline our general
product direction. It is intended for information
purposes only, and may not be incorporated into any
contract. It is not a commitment to deliver any
material, code, or functionality, and should not be
relied upon in making purchasing decisions.
The development, release, and timing of any
features or functionality described for Oracle’s
products remains at the sole discretion of Oracle.
3. Simple Introduction
EXPLAIN & EXPLAIN EXTENDED are tools to help
optimize queries. As tools, they are only as
good as the craftspeople using them. There is
more to this subject than can be covered in
a single presentation, but hopefully this session
will start you out on the right path for using
EXPLAIN.
4. Why worry about the optimizer?
Client sends statement to server
Server checks the query cache to see if it has already run
statement. If so, it retrieves stored result and sends it back
to the Client.
Statement is parsed, preprocessed and optimized to make
a Query Execution Plan.
The query execution engine sends the QEP to the storage
engine API.
Results sent to the Client.
5. Once upon a time ...
There was a PHP Programmer named Goldilocks who wanted to
get the phone number of her friend Little Red Riding Hood in
Networking. She found an old, dusty piece of
code in the enchanted programmers' library. Inside the code was
a special chant to get all the names and phone numbers of the
employees of Grimm-Fayre-Tails Corp. And so, Goldi tried that
special chant!
SELECT name, phone
FROM employees;
6. Oh-No!
But the special chant kept running, and running,
and running.
Eventually Goldi hit Control-C after hearing
that Grimm had hired many, many folks: the
company had 10^10 employees in the database.
7. A second chant
Goldi did some searching in the library and learned she could
add to the chant to look only for her friend Red.
SELECT name, phone
FROM employees
WHERE name LIKE 'Red%';
Goldi crossed her fingers, held her breath, and let 'er rip.
8. What she got
Name, phone
Redford 1234
Redmund 2323
Redlegs 1234
Red Sox 1914
Redding 9021
But this was not what Goldilocks needed. So she asked a
kindly old Database Owl for help.
9. The Owl's chant
'Ah, you want the nickname field!' He recrafted her chant.
SELECT first, nick, last, phone, `group`
FROM employees
WHERE nick LIKE '%red%';
10. Still too much data … but better
Betty, Big Red, Lopez, 4321, Accounting
Ethel, Little Red, Riding-Hoode, 127.0.0.1, Networks
Agatha, Red Herring, Christie, 007, Public Relations
Johnny, Reds Catcher, Bench, 421, Gaming
11. 'We can tune the query better'
Cried the Owl.
SELECT first, nick, name, phone, `group`
FROM employees
WHERE nick LIKE 'Red%'
AND `group` = 'Networking';
But Goldi was too busy after she got the
data she needed to listen.
12. The preceding were obviously flawed queries
• But how do you check if queries are running efficiently?
• What does the query the MySQL server runs really look like? (the dreaded Query Execution Plan). What is cost based optimization?
• How can you make queries faster?
14. What is being EXPLAINed
Prepending EXPLAIN to a statement* asks the optimizer how it would
plan to execute that statement at lowest cost (measured in disk page
seeks**). Sometimes it guesses wrong.
What it can tell you:
--Where to add INDEXes to speed row access
--Check JOIN order
Optimizer Tracing (more later) was introduced recently!
* SELECT, DELETE, INSERT, REPLACE & UPDATE as of 5.6; only SELECT in 5.5 & earlier
** The optimizer does not know whether a page is in memory or on disk (that is the storage engine's problem, not the optimizer's); see MySQL Manual 7.8.3
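As a minimal sketch of what "prepending EXPLAIN" looks like, here is Goldilocks' second chant with EXPLAIN in front. The employees table and its columns come from the story, not from a real schema, and the plan you see will depend on your own data and indexes.

```sql
-- Ask the optimizer for its plan; the SELECT itself is not executed.
EXPLAIN
SELECT name, phone
FROM employees
WHERE name LIKE 'Red%';

-- In the mysql command-line client, ending the statement with \G
-- instead of ; prints each output column on its own line.
EXPLAIN
SELECT name, phone
FROM employees
WHERE name LIKE 'Red%'\G
```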
15. The Columns
id             Which SELECT
select_type    The SELECT type
table          Output row table
type           JOIN type
possible_keys  Potential indexes
key            Actual index used
key_len        Length of actual index
ref            Columns used against index
rows           Estimate of rows
Extra          Additional Info
16. A first look at EXPLAIN
Will read all 4079 rows – all the rows in this table
17. EXPLAIN EXTENDED -> query plan
Filtered: Estimated % of rows filtered by condition
The query as seen by server (kind of, sort of, close)
19. Time for a quick review of indexes
Advantages
● Go right to desired row(s) instead of reading ALL ROWS
● Smaller than whole table (read from disk faster)
● Can 'carry' other data with compound indexes
Disadvantages
● Overhead* – CRUD
● Not used on full table scans
* May need to run ANALYZE TABLE to update statistics such as cardinality to help optimizer make better choices
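The index review above can be sketched in SQL as follows. This is an illustration only: it assumes the employees table and nick column from the Owl's chant, and idx_employees_nick is a made-up index name.

```sql
-- Add an index so the server can go straight to matching rows.
-- (idx_employees_nick is a hypothetical name.)
CREATE INDEX idx_employees_nick ON employees (nick);

-- Refresh index statistics so the optimizer has current numbers.
ANALYZE TABLE employees;

-- Inspect the indexes; the Cardinality column is the statistic
-- the footnote above refers to.
SHOW INDEX FROM employees;
```

Note that an ordinary index on nick can help a prefix search such as LIKE 'Red%', but not a leading-wildcard search such as LIKE '%red%', which still has to examine every value.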
21. Information in the type Column
ALL – full table scan (to be avoided when possible)
CONST – WHERE ID=1
EQ_REF – WHERE a.ID = b.ID (uses indexes, 1 row returned)
REF – WHERE state='CA' (multiple rows for key values)
REF_OR_NULL – WHERE ID = 10 OR ID IS NULL (extra lookup needed for NULL)
INDEX_MERGE – WHERE ID = 10 OR state = 'CA'
RANGE – WHERE x IN (10,20,30)
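INDEX – full index scan (usually faster than ALL when index file < data file)
UNIQUE_SUBQUERY – WHERE x IN (SELECT primary_key FROM ...)
INDEX_SUBQUERY – WHERE x IN (SELECT key_column FROM ...) using a non-unique index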
SYSTEM – Table with 1 row or in-memory table
22. Full table scans VS Index
So let's create a copy of
the World.City table that
has no indexes. The
optimizer estimates that
4,279 rows would have to
be read to find the
desired record – about 5%
more than the 4,079 rows
the table actually has.
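One way to reproduce this experiment, assuming the standard world sample database is loaded (City_noindex is a hypothetical table name, and the lookup on ID = 999 is borrowed from the optimizer tracing slide rather than from this one):

```sql
USE world;

-- CREATE TABLE ... AS SELECT copies the rows but none of the indexes.
CREATE TABLE City_noindex AS SELECT * FROM City;

-- No index available: expect type = ALL and a rows estimate
-- close to the size of the table.
EXPLAIN SELECT Name FROM City_noindex WHERE ID = 999;

-- Original table with its primary key, for comparison.
EXPLAIN SELECT Name FROM City WHERE ID = 999;
```

The row estimates are approximations and will vary with storage engine and server version, so do not expect the exact figures from the slide.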
23. How does NULL change things?
Taking NOT NULL away
from the ID field (plus
the previous index)
increases the estimated
rows read to 4296!
Roughly 5.3% more
rows than actual in file.
Running ANALYZE TABLE
reduces the count to
3816 – still > 1.
27. Latin1 versus UTF8
Create a copy of the City
table but with UTF8
character set replacing
Latin1. The key_len of the
three character key grows
from three bytes to nine
(UTF8 reserves three bytes
per character). That
is more data to read and
more to compare which
is pronounced 'slower'.
28. INDEX Length
If we create a new index
on CountryCode with a
length of 2 characters,
does it work as well as
the original 3 chars?
29. Forcing use of new shorter index ...
Still generates a
guesstimate that 39
rows must be read.
In some cases there is
performance to be
gained in using shorter
indexes.
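A sketch of the shorter-index experiment, assuming the world sample database. The index name idx_cc2 and the 'USA' lookup value are illustrative choices, not taken from the slides.

```sql
-- Prefix index: only the first 2 characters of CountryCode are indexed.
CREATE INDEX idx_cc2 ON City (CountryCode(2));

-- How much selectivity is lost by indexing 2 characters instead of 3?
SELECT COUNT(DISTINCT CountryCode)          AS full_values,
       COUNT(DISTINCT LEFT(CountryCode, 2)) AS prefix_values
FROM City;

-- Force the optimizer to use the new index and compare the rows
-- estimate with the plan it picks on its own.
EXPLAIN SELECT Name FROM City FORCE INDEX (idx_cc2)
WHERE CountryCode = 'USA';

EXPLAIN SELECT Name FROM City
WHERE CountryCode = 'USA';
```

The closer prefix_values is to full_values, the less a shorter index gives up in exchange for its smaller size.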
30. Subqueries
Subqueries may be run as
part of EXPLAIN execution
and may cause significant
overhead. So be careful
when testing.
Note here that #1 is not
using an index. And that
is why we recommend
rewriting subqueries as
joins.
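To illustrate the recommendation, here is one possible subquery and its join rewrite against the world sample database. The query itself is an assumption; the slide's original query is not shown in this transcript.

```sql
-- Subquery form: cities in European countries.
EXPLAIN
SELECT Name
FROM City
WHERE CountryCode IN (SELECT Code
                      FROM Country
                      WHERE Continent = 'Europe');

-- Join form. Country.Code is the primary key, so each city matches
-- at most one country and the result set is the same.
EXPLAIN
SELECT City.Name
FROM City
JOIN Country ON City.CountryCode = Country.Code
WHERE Country.Continent = 'Europe';
```

Comparing the type and key columns of the two plans shows whether the rewrite lets the optimizer use an index where the subquery form did not. Newer server versions may optimize both forms similarly.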
31. EXAMPLE of covering Indexing
In this case, adding an
index reduces the reads
from 239 to 42.
Can we do better for this
query?
32. Index on both Continent and
Government Form
With both Continent and
GovernmentForm indexed
together, we go from 42 rows
read to 19.
Using index means the data is
retrieved from index not table
(good)
Using index condition means
eval pushed down to storage
engine. This can reduce
storage engine read of table
and server reads of storage
engine (not bad)
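A sketch of the compound index described above, assuming the world sample database. The index name and the WHERE values are hypothetical, since the slide's exact query is not reproduced here.

```sql
-- One index covering both columns, Continent first.
CREATE INDEX idx_cont_gov ON Country (Continent, GovernmentForm);

-- Only indexed columns are selected, so the index alone can answer
-- the query; look for "Using index" in the Extra column.
EXPLAIN
SELECT Continent, GovernmentForm
FROM Country
WHERE Continent = 'Asia'
  AND GovernmentForm = 'Republic';

-- Selecting a column that is not in the index means the table rows
-- must be read as well; compare the Extra column with the plan above.
EXPLAIN
SELECT Name
FROM Country
WHERE Continent = 'Asia'
  AND GovernmentForm = 'Republic';
```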
33. Extra ***
USING INDEX – Getting data from the index rather
than the table
USING FILESORT – Sorting was needed rather than
using an index. May use the file system (slow)
ORDER BY can use indexes
USING TEMPORARY – A temp table was created –
see tmp_table_size and max_heap_table_size
USING WHERE – filter outside storage engine
Using Join Buffer -- means no index used.
37. Index Hints
Use only as a last resort – shifts in data can make this the 'long way around'.
index_hint:
    USE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] ([index_list])
  | IGNORE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
  | FORCE {INDEX|KEY}
      [FOR {JOIN|ORDER BY|GROUP BY}] (index_list)
http://dev.mysql.com/doc/refman/5.6/en/index-hints.html
38. Controlling the Optimizer
mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
    index_merge_sort_union=on,
    index_merge_intersection=on,
    engine_condition_pushdown=on,
    index_condition_pushdown=on,
    mrr=on,mrr_cost_based=on,
    block_nested_loop=on,batched_key_access=off
You can turn on or off certain optimizer settings for GLOBAL or SESSION.
See MySQL Manual 7.8.4.2 and know your mileage may vary.
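For example, a single flag can be switched off for the current session only. The choice of index_merge here is arbitrary, purely to show the syntax.

```sql
-- Turn one optimizer feature off for this session; other flags are untouched.
SET SESSION optimizer_switch = 'index_merge=off';

-- Confirm the change.
SELECT @@optimizer_switch\G

-- Put every flag back to its default value.
SET SESSION optimizer_switch = 'default';
```

Using SESSION rather than GLOBAL keeps the experiment from affecting other connections.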
39. Things to watch
mysqladmin -r -i 10 extended-status
Slow_queries – number in last period
Select_scan – full table scans
Select_full_join – full scans to complete joins
Created_tmp_disk_tables – temporary tables written to disk
Key_read_requests/Key_write_requests – read/write
weighting of application, may need to modify
application
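The same counters can also be read from inside a client session. A minimal sketch; unlike mysqladmin -r, this shows totals since server start rather than the difference per interval.

```sql
-- Current values of the counters listed above.
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Slow_queries',
                        'Select_scan',
                        'Select_full_join',
                        'Created_tmp_disk_tables',
                        'Key_read_requests',
                        'Key_write_requests');
```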
40. Optimizer Tracing (5.6.3 onward)
SET optimizer_trace="enabled=on";
SELECT Name FROM City WHERE ID=999;
SELECT trace into dumpfile '/tmp/foo' FROM
INFORMATION_SCHEMA.OPTIMIZER_TRACE;
Shows more logic than EXPLAIN
The output shows much deeper detail on how the
optimizer chooses to process a query. This level of
detail is well past the level for this presentation.
41. Sample from the trace – but no clues
on optimizing for Joe Average DBA
42. Final Thoughts on EXPLAIN
1. READ chapter 7 of the MySQL Manual
2. Add index on columns in WHERE clause
3. Run ANALYZE TABLE periodically
4. Adjust buffer pool size, minimize disk I/O
44. 2. Check return codes
// $link is assumed to be an open mysqli connection
$result = mysqli_query($link, "SELECT Id FROM City
    WHERE Id = '$city_id'");
if ($result) {
    // Code where query executed
} else {
    // Code when query did not execute
}
45. 3. Use Prepared Statements
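// $mysqli is assumed to be an open mysqli connection object
if ($stmt = $mysqli->prepare("INSERT INTO FOO
        VALUES (?,?,?)")) {
    // Parameters are bound by reference, so values can be set afterwards
    $stmt->bind_param('ssd', $first, $last, $age);
    $first = 'Joe';
    $last = 'Jones';
    $age = '22';
    $stmt->execute();
    if (!mysqli_stmt_affected_rows($stmt)) {
        // PROBLEM
    }
}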
46. 4. Be careful reporting problems
// $link is assumed to be an open mysqli connection
mysqli_query($link, "DROP TABLE foo.bar");
$result = mysqli_stmt_execute($stmt);
if (!$result) { // Did NOT EXECUTE
    printf("Error: %s.\n",
        mysqli_stmt_error($stmt));
}
Can give hackers clues!
47. 5. Ask for what you need for speed
SLOW!
SELECT *
FROM foo
WHERE id = $id
FASTER!!
SELECT Name, Phone, Customer_id
FROM foo
WHERE id = $id
A single 40-minute session can no more turn you into a good DBA than the same time could turn a novice into a good system admin. We have been promoting MySQL on generic hardware – but sometimes you need better performance and that may require specialized hardware.
Most Linux based services are CPU bound – not database servers
Databases are the heart of the LAMP stack. Poor hardware choice is like an unhealthy heart.
This session will cover much of what you need to know for a solid foundation to make your MySQL database perform optimally but …
That last 20% is where a good DBA comes into play (Indy Car example)
Gross over simplification! Client wants phone number for Joe from database
Database server parses query, finds record in memory, returns phone number
Client program proceeds with phone number
Rule 1
The database needs to get data that is not in memory off storage. Updates also need to get written to disk. Disk is MUCH slower than memory.
Another vast over simplification
I/O subsystem is critical to performance. Spend money here instead of latest speedy processor.