1. Explain that “Explain”
The Road to Understanding
“You Should Not Fight with the Database, the Database is your Friend”
put together by Fabrizio Parrella
PARTS ARE QUOTED OR COPIED OR REF FROM:
http://devzone.zend.com/1436/the-zendcon-sessions-episode-17-sql-query-tuning-the-legend-of-drunken-query-master/
http://www.slideshare.net/phpcodemonkey/mysql-explain-explained
2. get to know your friend
➲ Recognize the strengths and also the weaknesses of
your database
➲ No database is perfect -- deal with it, you're not
perfect either
➲ Think of both big things and small things
BIG: Architecture, surrounding servers, caching
SMALL: SQL coding, join rewrites, server config
3. becoming friends
➲ Understand storage engine abilities and weaknesses
➲ Understand how the query cache and important
buffers work
➲ Understand the optimizer's limitations
➲ Understand what should and should not be done at
the application level
➲ If you understand the above, you'll start to see the
database as a friend and not an enemy
4. the schema
➲ Basic foundation of performance
➲ Everything else depends on it
➲ Choose your data types wisely
➲ “Divide et Impera” the schema through partitioning
A divide and conquer (D&C) algorithm works by recursively breaking down a problem into two or more
sub-problems of the same (or related) type, until these become simple enough to be solved directly. The
solutions to the sub-problems are then combined to give a solution to the original problem.
http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
5. size does matter !!
smaller, smaller, SMALLER
The more records you can fit into a single page of
memory/disk, the faster your seeks and scans will be
➲ Do you really need that BIGINT?
➲ Use INT UNSIGNED for IPv4 addresses
➲ Use VARCHAR carefully
Converted to CHAR when used in a temporary table
➲ Use TEXT sparingly
Consider separate tables
➲ Use BLOBs very sparingly
Use the filesystem for what it was intended
6. real life example... handling IPv4 addresses
CREATE TABLE Sessions (
session_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ip_address INT UNSIGNED NOT NULL, -- Compare to CHAR(15)...
session_data TEXT NOT NULL,
PRIMARY KEY (session_id),
INDEX (ip_address)
) ENGINE=InnoDB;
-- Insert a new dummy record
INSERT INTO Sessions VALUES
(NULL, INET_ATON('192.168.0.2'), 'some session data');
SELECT
session_id,
ip_address as ip_raw,
INET_NTOA(ip_address) as ip,
session_data
FROM Sessions
WHERE
ip_address BETWEEN INET_ATON('192.168.0.1') AND INET_ATON('192.168.0.255');
+------------+------------+-------------+-------------------+
| session_id | ip_raw     | ip          | session_data      |
+------------+------------+-------------+-------------------+
|          1 | 3232235522 | 192.168.0.2 | some session data |
+------------+------------+-------------+-------------------+
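As a quick sanity check of the example above, INET_ATON() is plain base-256 arithmetic on the four octets. A minimal sketch that needs no tables:

```sql
-- Each octet is one base-256 digit: a.b.c.d -> a*256^3 + b*256^2 + c*256 + d
SELECT
    INET_ATON('192.168.0.2') AS via_function,
    192 * 16777216 + 168 * 65536 + 0 * 256 + 2 AS by_hand;
-- Both columns should show 3232235522, the ip_raw value in the output above.
```

The result always fits in the 4 bytes of an INT UNSIGNED, compared with up to 15 bytes for the dotted text form.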
7. SETs and ENUMs
➲ Often sign of poor schema design
➲ Changing the definition will most likely require a full
rebuild of the table
➲ Search functions like FIND_IN_SET() are inefficient
compared to index operations on a join
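One common alternative to a SET column is a lookup table plus a join table. The sketch below is only an illustration: the table names (interest, user_interest) and the 'mysql' value are hypothetical and do not come from the original deck.

```sql
-- Hypothetical replacement for a column like: interests SET('php','mysql',...)
CREATE TABLE interest (
    interest_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(30) NOT NULL,
    PRIMARY KEY (interest_id),
    UNIQUE INDEX (name)
) ENGINE=InnoDB;

CREATE TABLE user_interest (
    user_id INT UNSIGNED NOT NULL,
    interest_id SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, interest_id),
    INDEX (interest_id)
) ENGINE=InnoDB;

-- Find users interested in 'mysql' through indexed lookups,
-- rather than WHERE FIND_IN_SET('mysql', interests)
SELECT ui.user_id
FROM interest i
INNER JOIN user_interest ui ON ui.interest_id = i.interest_id
WHERE i.name = 'mysql';
```

With this layout, adding a new allowed value is an INSERT into interest instead of an ALTER TABLE.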
9. vertical partitioning
➲ Never mix frequently and infrequently
accessed fields in a single table
➲ Splitting tables allows main records to consume the buffer
pages without the extra data taking up space in memory
➲ Do you need FULLTEXT on your text columns? (Before MySQL 5.6.4, FULLTEXT indexes require MyISAM)
CREATE TABLE Users (
user_id INT NOT NULL AUTO_INCREMENT,
email VARCHAR(80) NOT NULL,
display_name VARCHAR(50) NOT NULL,
password CHAR(41) NOT NULL,
first_name VARCHAR(25) NOT NULL,
last_name VARCHAR(25) NOT NULL,
address VARCHAR(80) NOT NULL,
city VARCHAR(30) NOT NULL,
province CHAR(2) NOT NULL,
postcode CHAR(7) NOT NULL,
interests TEXT NULL,
bio TEXT NULL,
signature TEXT NULL,
skills TEXT NULL,
PRIMARY KEY (user_id),
UNIQUE INDEX (email)
) ENGINE=InnoDB;
CREATE TABLE Users (
user_id INT NOT NULL AUTO_INCREMENT,
email VARCHAR(80) NOT NULL,
display_name VARCHAR(50) NOT NULL,
password CHAR(41) NOT NULL,
PRIMARY KEY (user_id),
UNIQUE INDEX (email)
) ENGINE=InnoDB;
CREATE TABLE UserExtra (
user_id INT NOT NULL,
first_name VARCHAR(25) NOT NULL,
last_name VARCHAR(25) NOT NULL,
address VARCHAR(80) NOT NULL,
city VARCHAR(30) NOT NULL,
province CHAR(2) NOT NULL,
postcode CHAR(7) NOT NULL,
interests TEXT NULL,
bio TEXT NULL,
signature TEXT NULL,
skills TEXT NULL,
PRIMARY KEY (user_id),
FULLTEXT KEY (interests, skills)
) ENGINE=MyISAM;
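How the split tables might be queried, as a rough sketch. The email address, the user_id value and the search term are made-up sample values.

```sql
-- Hot path: a login touches only the narrow Users table
SELECT user_id, display_name, password
FROM Users
WHERE email = 'someone@example.com';

-- Cold path: the profile page pulls the wide columns on demand
SELECT
    u.display_name,
    ue.first_name,
    ue.last_name,
    ue.bio
FROM Users u
INNER JOIN UserExtra ue ON ue.user_id = u.user_id
WHERE u.user_id = 42;

-- Full-text search stays on the MyISAM side table
SELECT user_id
FROM UserExtra
WHERE MATCH (interests, skills) AGAINST ('mysql');
```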
10. understand MySQL query cache
➲ You must understand your application's read/write
patterns
➲ Internal query cache design is a compromise between
CPU usage and read performance
➲ Stores the MYSQL_RESULT of a SELECT along with a hash
of the SELECT SQL statement
➲ Any modification to any table involved in the SELECT
invalidates the stored result
➲ Write applications to be aware of the query cache
Use SELECT SQL_NO_CACHE
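A small sketch of inspecting and bypassing the query cache, reusing the Sessions table from the IPv4 example. It applies only to MySQL versions that ship the query cache (it was removed in MySQL 8.0).

```sql
-- Check whether the query cache is enabled and how it is behaving
SHOW VARIABLES LIKE 'query_cache%';
SHOW STATUS LIKE 'Qcache%';

-- Skip the cache for a result that would be invalidated almost immediately
SELECT SQL_NO_CACHE session_id, session_data
FROM Sessions
WHERE ip_address = INET_ATON('192.168.0.2');
```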
11. coding like a master
➲ Be consistent (for crying out loud)
➲ Use ANSI SQL coding style (vs. Theta)
➲ Stop thinking in terms of iterators, for loops, while
loops, etc
➲ Instead, think in terms of sets
➲ Break complex SQL statements (or business requests)
into smaller, manageable chunks
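To illustrate the difference between thinking in loops and thinking in sets, here is a sketch against the sakila-style customer and rental tables used later in this deck:

```sql
-- Iterator thinking: application code runs this once per customer
-- SELECT COUNT(*) FROM rental WHERE customer_id = ?;

-- Set thinking: one statement answers the question for every customer
SELECT
    c.customer_id,
    COUNT(r.rental_id) AS num_rentals
FROM customer c
LEFT JOIN rental r ON r.customer_id = c.customer_id
GROUP BY c.customer_id;
```

The LEFT JOIN keeps customers with no rentals in the result, with a count of 0.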
12. Consistency, consistency, CONSISTENCY !!
➲ Tabs and Spacing
➲ Upper and Lower Case
➲ Keywords, function names
Nothing pisses off the query master like inconsistent SQL code!
SELECT
a.first_name,
a.last_name,
COUNT(*) as num_rentals
FROM actor a
INNER JOIN film f ON a.actor_id = f.actor_id
GROUP BY a.actor_id
ORDER BY
num_rentals DESC,
a.last_name,
a.first_name
LIMIT 10;
vs.
select first_name, a.last_name,
count(*) AS num_rentals
FROM actor a join film on a.actor_id = film.actor_id
group by a.actor_id order by
num_rentals DESC, a.last_name, a.first_name
LIMIT 10;
➲ Aliases
➲ Consider your
teammates
➲ Like your code, SQL is
meant to be read, not
written
13. guidelines
➲ Beware of join hints
“force index” can get “out of date”
➲ Just because it can be done in a single SQL
statement doesn't mean it should
➲ ALWAYS test and benchmark your solution
14. ANSI vs. THETA
SELECT
a.first_name,
a.last_name,
COUNT(*) as num_rentals
FROM actor a
INNER JOIN film_actor fa ON a.actor_id = fa.actor_id
INNER JOIN film f ON fa.film_id = f.film_id
INNER JOIN inventory i ON f.film_id = i.film_id
INNER JOIN rental r ON r.inventory_id = i.inventory_id
GROUP BY a.actor_id
ORDER BY
num_rentals DESC,
a.last_name,
a.first_name
LIMIT 10;
SELECT
a.first_name,
a.last_name,
COUNT(*) as num_rentals
FROM
actor a,
film f,
film_actor fa,
inventory i,
rental r
WHERE
a.actor_id = fa.actor_id
AND fa.film_id = f.film_id
AND f.film_id = i.film_id
AND r.inventory_id = i.inventory_id
GROUP BY a.actor_id
ORDER BY
num_rentals DESC,
a.last_name,
a.first_name
LIMIT 10;
ANSI STYLE (first query): Explicitly declare JOIN conditions using the ON clause
THETA STYLE (second query): Implicitly declare JOIN conditions in the WHERE clause
15. why ANSI style kicks THETA style's A55
➲ MySQL THETA style only supports INNER and CROSS
joins
But MySQL ANSI style supports INNER, CROSS, LEFT, RIGHT,
and NATURAL joins
Mixing and matching both styles can lead to hard-to-read
SQL code
➲ It is extremely easy to miss a join condition with
THETA style
Especially when joining many tables
Forgetting a join condition will produce a cartesian product (NOT
GOOD !!!)
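Two short sketches of the points above, using the same sakila-style tables as the previous slide:

```sql
-- Actors that appear in no film: this needs an outer join, which the
-- comma-separated (theta) syntax cannot express in MySQL
SELECT
    a.actor_id,
    a.first_name,
    a.last_name
FROM actor a
LEFT JOIN film_actor fa ON fa.actor_id = a.actor_id
WHERE fa.actor_id IS NULL;

-- The trap: theta style with one join condition forgotten.
-- Nothing ties film f to the other tables, so every matching
-- row is multiplied by the number of rows in film.
SELECT COUNT(*)
FROM actor a, film_actor fa, film f
WHERE a.actor_id = fa.actor_id;
```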
16. WITHOUT THE STRENGTH OF
EXPLAIN
YOU WILL GET LOST IN THE FIELDS
OF MISUNDERSTANDING
how to test our SQL
17. EXPLAIN the basics
➲ Provides the execution plan chosen by the MySQL
optimizer
➲ Simply prepend the word EXPLAIN to your
SELECT statement
➲ Each row represents a set of information for each
table used in the SELECT
18. EXPLAIN the columns
➲ select_type - type of “set” the data in this row
contains (SIMPLE, DERIVED, SUBQUERY, etc.)
➲ table - alias (or full table name if no alias) of the table
or derived table from which the data in this set
comes
➲ type - “access strategy” used to grab the data in this
set (ALL, const, ref, etc.)
➲ possible_keys - keys available to the optimizer for the query
➲ key - key chosen by the optimizer
➲ key_len - number of bytes used from the key
➲ ref - shows the column used in join relations
➲ rows - estimate of the number of rows in this set
➲ Extra - information the optimizer chooses to give you
19. EXPLAIN the output
EXPLAIN
SELECT
f.title,
c.name
FROM film f
INNER JOIN film_category fc ON f.film_id = fc.film_id
INNER JOIN category c ON fc.category_id = c.category_id
WHERE f.title LIKE 'T%'\G
*************************** 1. row ***************************
select_type: SIMPLE
table: c
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 16
Extra:
*************************** 2. row ***************************
select_type: SIMPLE
table: fc
type: ref
possible_keys: PRIMARY, fk_film_category_category
key: fk_film_category_category
key_len: 1
ref: c.category_id
rows: 1
Extra: using index
*************************** 3. row ***************************
select_type: SIMPLE
table: f
type: eq_ref
possible_keys: PRIMARY, idx_title
key: PRIMARY
key_len: 2
ref: fc.film_id
rows: 1
Extra: using where
rows: estimated row count
possible_keys and key: the available indexes and the chosen one
Extra: “using index” means a covering index was used
20. EXPLAIN a real world example
CREATE TABLE `attendees` (
`attendee_id` int(11) NOT NULL,
`lastname` varchar(50) NOT NULL,
`conference_id` int(11) NOT NULL,
`registration_status` tinyint(4) NOT NULL,
PRIMARY KEY (`attendee_id`)
) ENGINE=InnoDB;
CREATE TABLE `conferences` (
`conference_id` int(11) NOT NULL,
`location_id` int(11) NOT NULL,
`topic_id` int(11) NOT NULL,
`date` date NOT NULL,
PRIMARY KEY (`conference_id`)
) ENGINE=InnoDB;
EXPLAIN
SELECT *
FROM attendees
WHERE
conference_id = 123
AND registration_status > 0\G
(Let's only show the important parts for now)
*************************** 1. row ***************************
table: attendees
possible_keys: NULL
key: NULL
rows: 14052
➲ The three most important columns returned by EXPLAIN
possible_keys: all indexes which MySQL could have used,
based on a series of very quick lookups and calculations
key: the chosen key
rows: estimate of the scanned rows
21. EXPLAIN a real world example
➲ Interpreting the result:
No suitable indexes for this query
MySQL has to do a full scan of the table
Full table scans are almost always the slowest way to read a table
Full table scans are usually an indication that an index is
needed
22. EXPLAIN a real world example
➲ MySQL has two indexes to choose from
➲ “reg” is not “sufficiently unique”
the spread of the values can also be a factor (e.g. when 99% of
rows contain the same value)
➲ Index “uniqueness” is called cardinality
➲ There is still room for a performance increase
ALTER TABLE attendees
ADD INDEX conf (conference_id),
ADD INDEX reg (registration_status);
EXPLAIN
SELECT *
FROM attendees
WHERE
conference_id = 123
AND registration_status > 0\G
*************************** 1. row ***************************
table: attendees
possible_keys: conf, reg
key: conf
rows: 331
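To see the cardinality that the optimizer works with for the “conf” and “reg” indexes, something along these lines can be run. It is a sketch; the actual numbers depend entirely on your data.

```sql
-- Refresh index statistics, then list them; the Cardinality column
-- is the optimizer's estimate of distinct values per index
ANALYZE TABLE attendees;
SHOW INDEX FROM attendees;

-- Value spread for the low-cardinality column
SELECT
    registration_status,
    COUNT(*) AS num_rows
FROM attendees
GROUP BY registration_status
ORDER BY num_rows DESC;
```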
23. EXPLAIN a real world example
➲ “reg_conf_index” is a much better choice
➲ Other keys are still available, just not as effective
ALTER TABLE attendees
ADD INDEX reg_conf_index (registration_status, conference_id);
EXPLAIN
SELECT *
FROM attendees
WHERE
conference_id = 123
AND registration_status > 0\G
*************************** 1. row ***************************
table: attendees
possible_keys: reg, conf, reg_conf_index
key: reg_conf_index
rows: 204
24. EXPLAIN a real world example
➲ It seems that even without the “reg” index everything
works just as expected
ALTER TABLE attendees
DROP INDEX reg,
DROP INDEX conf;
EXPLAIN
SELECT *
FROM attendees
WHERE
registration_status = 2\G
*************************** 1. row ***************************
table: attendees
possible_keys: reg_conf_index
key: reg_conf_index
rows: 372
25. EXPLAIN a real world example
➲ Without the “conf” index we are back to square one
➲ The order in which the fields are defined in a composite index
affects whether it is available to a query
➲ Potential workaround
SELECT * FROM attendees WHERE conference_id = 123 AND
registration_status > 0;
ALTER TABLE attendees
DROP INDEX reg,
DROP INDEX conf;
EXPLAIN
SELECT *
FROM attendees
WHERE
conference_id = 123\G
*************************** 1. row ***************************
table: attendees
possible_keys: NULL
key: NULL
rows: 14502
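The column-order rule can be tried out with a sketch like the one below. The index name conf_reg_index and its column order are a hypothetical alternative for illustration, not something the original slides define.

```sql
-- Hypothetical alternative index: equality column first, range column second
ALTER TABLE attendees
    ADD INDEX conf_reg_index (conference_id, registration_status);

-- Can use the leftmost column of conf_reg_index
EXPLAIN SELECT * FROM attendees WHERE conference_id = 123;

-- Can use both columns of conf_reg_index
EXPLAIN SELECT * FROM attendees
WHERE conference_id = 123 AND registration_status > 0;

-- Cannot seek into conf_reg_index: the leftmost column is not constrained
EXPLAIN SELECT * FROM attendees WHERE registration_status = 2;
```

Whether the optimizer actually picks the index still depends on its row estimates, so check the EXPLAIN output on real data.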
26. EXPLAIN a real world example
➲ Great, MySQL is using the index on “lastname”, which is good
ALTER TABLE attendees
ADD INDEX lastname (lastname);
EXPLAIN
SELECT *
FROM attendees
WHERE
lastname LIKE 'parr%'\G
*************************** 1. row ***************************
table: attendees
possible_keys: lastname
key: lastname
rows: 234
27. EXPLAIN a real world example
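➲ MySQL doesn't even try to use an index !
➲ A pattern with a leading wildcard (“%arr%”) cannot be matched against the sorted order of the index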
ALTER TABLE attendees
ADD INDEX lastname (lastname);
EXPLAIN
SELECT *
FROM attendees
WHERE
lastname LIKE '%arr%'\G
*************************** 1. row ***************************
table: attendees
possible_keys: NULL
key: NULL
rows: 14052
28. EXPLAIN a real world example (pre MySQL 5.1)
➲ MySQL doesn't use an index because of the OR
➲ MySQL performs a full table scan
➲ Workaround, use “UNION”
➲ Workaround, add a composite INDEX
ALTER TABLE conferences
ADD INDEX location_topic (location_id, topic_id);
ALTER TABLE conferences
ADD INDEX location_id (location_id),
ADD INDEX topic_id (topic_id);
EXPLAIN
SELECT *
FROM conferences
WHERE
location_id = 2
OR topic_id IN (4,6,1)\G
*************************** 1. row ***************************
table: conferences
possible_keys: location_id, topic_id
key: NULL
rows: 5043
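The “UNION” workaround mentioned on this slide could look roughly like this:

```sql
-- Each branch can use its own single-column index;
-- UNION (not UNION ALL) removes rows matched by both branches
SELECT * FROM conferences WHERE location_id = 2
UNION
SELECT * FROM conferences WHERE topic_id IN (4,6,1);
```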
29. EXPLAIN a real world example
➲ Looks like we need an index on “conference_id” on attendees
➲ How many rows are estimated in total?
EXPLAIN
SELECT *
FROM conferences c
INNER JOIN attendees a USING (conference_id)
WHERE
c.location_id = 2
AND c.topic_id IN (4,6,1)
AND a.registration_status > 1\G
*************************** 1. row ***************************
table: c
possible_keys: conference_topic
key: conference_topic
rows: 15
*************************** 2. row ***************************
table: a
possible_keys: NULL
key: NULL
rows: 14502
15 x 14502 = 217530 estimated row combinations
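A possible fix, sketched under the assumption that no index on attendees.conference_id exists at this point. The index name conf is reused from the earlier slides.

```sql
-- Index the join column on the attendees side
ALTER TABLE attendees
    ADD INDEX conf (conference_id);

-- Re-check the plan: the estimate for table a should now be per
-- conference instead of the whole table
EXPLAIN
SELECT *
FROM conferences c
INNER JOIN attendees a USING (conference_id)
WHERE
    c.location_id = 2
    AND c.topic_id IN (4,6,1)
    AND a.registration_status > 1;
```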
31. EXPLAIN the type
➲ CONST: SELECT * FROM table WHERE field = “value”;
The field needs to be indexed with a unique non-nullable key
If non-unique or nullable the type will be “ref”
It also applies when a table with a single row is referenced in the SELECT
Can be propagated across joined columns:
EXPLAIN
SELECT r.*
FROM rental r
INNER JOIN customer c ON r.customer_id = c.customer_id
WHERE r.rental_id = 13\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: r
type: const
possible_keys: PRIMARY,idx_fk_customer_id
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: c
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: const /* Here is where the propagation occurs...*/
rows: 1
Extra:
2 rows in set (0.00 sec)
32. EXPLAIN the type
➲ RANGE: SELECT * FROM table WHERE field BETWEEN “value” AND
“value”;
The field needs to be indexed
If too many records are estimated, the index won't be used
EXPLAIN
SELECT *
FROM rental
WHERE rental_date BETWEEN '2005-06-14' AND '2005-06-16'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: range
possible_keys: rental_date
key: rental_date
key_len: 8
ref: NULL
rows: 364
Extra: Using where
1 row in set (0.00 sec)
33. EXPLAIN the type
➲ ALL: SELECT * FROM table WHERE field BETWEEN “value” AND
“far away from starting value”;
No WHERE condition (duh)
No index on the field in the WHERE condition
Poor selectivity on the indexed field
Too many records meet the WHERE condition
SEEK: jumps to random places to fetch the data, repeating for each
piece of data needed
SCAN: jumps to the start and reads the data sequentially
For large amounts of data, SCAN operations tend to be more efficient than
multiple SEEK operations
Using SELECT * FROM
EXPLAIN
SELECT *
FROM rental
WHERE rental_date BETWEEN '2001-01-14' AND '2012-12-31'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: ALL
possible_keys: rental_date /* the large range forces a full scan */
key: NULL
key_len: NULL
ref: NULL
rows: 16298
Extra: Using where
1 row in set (0.00 sec)
34. EXPLAIN the type
➲ INDEX_MERGE: SELECT * FROM table WHERE field = “value”
AND field1 = “value”;
Introduced with the MySQL 5.0 optimizer
Allows the optimizer to use more than one index per table to satisfy the conditions on that table
Prior to MySQL 5.0, only one index per table could be used
In case of OR conditions, MySQL < 5.0 would use a full table scan
EXPLAIN
SELECT *
FROM rental
WHERE
rental_id IN (10,11,12)
OR rental_date = '2006-02-01'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index_merge
possible_keys: PRIMARY,rental_date
key: rental_date,PRIMARY
key_len: 8,4
ref: NULL
rows: 4
Extra: Using sort_union(rental_date,PRIMARY); Using where
1 row in set (0.02 sec)
36. EXPLAIN the Extra
➲ “Extra” shows additional operations invoked to get your result set
➲ Some common values are (more are discussed in the MySQL
manual):
Using where
Using temporary
Using filesort
Using index
EXPLAIN
SELECT *
FROM rental
WHERE
rental_id IN (10,11,12)
OR rental_date = '2006-02-01'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: rental
type: index_merge
possible_keys: PRIMARY,rental_date
key: rental_date,PRIMARY
key_len: 8,4
ref: NULL
rows: 4
Extra: Using sort_union(rental_date,PRIMARY); Using where
1 row in set (0.02 sec)
37. EXPLAIN the Extra
➲ Using filesort: AVOID
➲ Avoid because
Doesn't use an index for the sort
Involves a full scan of the rows to be sorted
Uses a generic algorithm (one size fits all)
May use temporary files on disk when the sort buffer is too small (BAD !!)
Gets slower with more data
➲ It's not all that bad
Sometimes unavoidable - ORDER BY RAND()
Acceptable provided you get to your result as quickly as possible, and
keep it predictably small
EXPLAIN
SELECT *
FROM attendees
WHERE
conference_id = 123
ORDER BY lastname\G
*************************** 1. row ***************************
table: attendees
possible_keys: conference_id
key: conference_id
rows: 331
Extra: Using filesort
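When the sort is avoidable, a composite index that covers the filter column and then the sort column is one way to get rid of it. A sketch, with a hypothetical index name (conf_lastname):

```sql
-- Hypothetical index: filter column first, sort column second
ALTER TABLE attendees
    ADD INDEX conf_lastname (conference_id, lastname);

-- Rows for conference 123 can now be read from the index already ordered
-- by lastname, so Extra should no longer need "Using filesort"
EXPLAIN
SELECT *
FROM attendees
WHERE conference_id = 123
ORDER BY lastname;
```

The optimizer may still choose another plan, so confirm with EXPLAIN on real data.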
38. EXPLAIN the Extra
➲ Using index: GOOD
➲ Celebrate because
MySQL got your results just by consulting the index
MySQL didn't need to look at the table to get the results (open table is
expensive)
Fastest way to get your data
➲ Particularly useful...
When you are interested in a single column or id
When you are interested in COUNT(), SUM(), AVG(), etc. of a field
EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id = 123\G
*************************** 1. row ***************************
table: attendees
possible_keys: conference_id
key: conference_id
rows: 331
Extra:
Nothing is actually wrong with this query, it just could be quicker
ALTER TABLE attendees ADD INDEX conf_age (conference_id, age);
EXPLAIN SELECT AVG(age) FROM attendees WHERE conference_id = 123\G
*************************** 1. row ***************************
table: attendees
possible_keys: conference_id, conf_age
key: conf_age
rows: 331
Extra: Using index
Apart from caching, this is the fastest way to get your data
39. INDEXES... your schema's phone book
➲ Speed up SELECTs, but slow down modifications
➲ Make sure you have indexes on columns used in
WHERE, ON, and GROUP BY clauses
➲ Always ensure that JOIN conditions are indexed AND
have identical data types
➲ Good keys:
Selectivity:
% of distinct values
= distinct values / number of rows
unique or primary keys are always 1
Low selectivity:
Maybe you can put it in a multi-column index
Prefix ? Suffix ? It depends on your application
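The selectivity formula above translates directly into a query. A minimal sketch against the attendees table from the earlier slides:

```sql
-- Selectivity = distinct values / number of rows (1.0 for unique columns)
SELECT
    COUNT(DISTINCT lastname) / COUNT(*) AS lastname_selectivity,
    COUNT(DISTINCT registration_status) / COUNT(*) AS status_selectivity
FROM attendees;
```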
40. indexed columns and functions don't mix
A full table scan is used because a function (LEFT) is operating on
the lastname column.
Let's Fix this...
EXPLAIN
SELECT *
FROM attendees
WHERE
LEFT(lastname, 2) = 'Pa'
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 951
Extra: Using where
EXPLAIN
SELECT *
FROM attendees
WHERE
lastname LIKE 'Pa%'
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: film
type: range
possible_keys: idx_title
key: idx_title
key_len: 767
ref: NULL
rows: 15
Extra: Using where
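The same rewrite applies to date functions. A sketch using the rental table from the earlier slides, assuming an index on rental_date:

```sql
-- The function wraps the indexed column, so the index on rental_date
-- cannot be used to seek
SELECT * FROM rental WHERE YEAR(rental_date) = 2005;

-- Same rows, but the column is left bare so a range scan is possible
SELECT *
FROM rental
WHERE rental_date >= '2005-01-01'
  AND rental_date < '2006-01-01';
```

If the range covers most of the table, the optimizer may still prefer a full scan, as shown on the “ALL” slide.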
41. let's fix multiple issues with a SELECT query
First, we are operating on an indexed column (order_created) with a function – let's fix that:
SELECT * FROM orders WHERE TO_DAYS(CURRENT_DATE()) - TO_DAYS(order_created) <= 7;
Even though we removed the function from the indexed column, we still have a non-deterministic
function (CURRENT_DATE()) in the statement, which prevents this query from being placed in
the query cache – let's fix that:
SELECT * FROM orders WHERE order_created >= CURRENT_DATE() - INTERVAL 7 DAY;
We replaced the function with a constant; however, we are specifying SELECT * instead
of the actual fields that we need.
What if there is a TEXT field in the table that we don't need to see? Having it included in
the result means a larger result set, which may not fit in the query cache and may force a
disk-based temporary table – let's fix that:
SELECT * FROM orders WHERE order_created >= '2013-01-13' - INTERVAL 7 DAY;
SELECT
order_id,
customer_id,
order_total,
order_created
FROM orders
WHERE order_created >= '2013-01-13' - INTERVAL 7 DAY;
42. good indexes vs. bad indexes
Don't forget that MySQL string index keys are limited to 1000 bytes (333
characters using UTF-8).
Let's say you have 11,000,000 records in a table called “USERS” with
the following fields:
➲ user, firstname, lastname, gender, email, age, country_id
Our application performs searches on the following fields:
➲ user
➲ firstname, lastname, gender
➲ email
It is obvious to create indexes on user and email, especially if they are
unique, but what about the other fields?
➲ “gender” can be M or F, selectivity is very low: 2/11,000,000 ≈ 0.
Best would be to remove the index on gender if you have it
➲ “firstname”/“lastname” depend on the uniqueness of the values
stored.
Use SELECT DISTINCT to calculate the selectivity
if it is above 15% keep it
below 15% you might want to create a composite INDEX
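For the composite-index case, the combined selectivity can be measured before the index is created. This is a sketch; the index name name_gender and the column order are assumptions, not a recommendation from the original slides.

```sql
-- Selectivity of the three search columns taken together
-- (rows with a NULL in any of the columns are not counted as distinct)
SELECT
    COUNT(DISTINCT lastname, firstname, gender) / COUNT(*) AS combined_sel
FROM USERS;

-- Hypothetical composite index covering the firstname/lastname/gender search
ALTER TABLE USERS
    ADD INDEX name_gender (lastname, firstname, gender);
```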
43. removing crappy or redundant indexes
SELECT
t.TABLE_SCHEMA AS `db`,
t.TABLE_NAME AS `table`,
s.INDEX_NAME AS `index name`,
s.COLUMN_NAME AS `field name`,
s.SEQ_IN_INDEX AS `seq in index`,
s2.max_columns AS `# cols`,
s.CARDINALITY AS `card`,
t.TABLE_ROWS AS `est rows`,
ROUND(((s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) * 100), 2) AS `sel %`
FROM
INFORMATION_SCHEMA.STATISTICS s
INNER JOIN INFORMATION_SCHEMA.TABLES t ON s.TABLE_SCHEMA = t.TABLE_SCHEMA AND s.TABLE_NAME = t.TABLE_NAME
INNER JOIN (
SELECT
TABLE_SCHEMA,
TABLE_NAME,
INDEX_NAME,
MAX(SEQ_IN_INDEX) AS max_columns
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA != 'mysql'
GROUP BY
TABLE_SCHEMA,
TABLE_NAME,
INDEX_NAME
) AS s2 ON s.TABLE_SCHEMA = s2.TABLE_SCHEMA AND s.TABLE_NAME = s2.TABLE_NAME AND s.INDEX_NAME = s2.INDEX_NAME
WHERE
t.TABLE_SCHEMA != 'mysql' /* Filter out the mysql system DB */
AND t.TABLE_ROWS > 10 /* Only tables with some rows */
AND s.CARDINALITY IS NOT NULL /* Need at least one non-NULL value in the field */
AND (s.CARDINALITY / IFNULL(t.TABLE_ROWS, 0.01)) < 1.00 /* unique indexes are perfect anyway */
ORDER BY
`sel %`, /* DESC for best non-unique indexes */
s.TABLE_SCHEMA,
s.TABLE_NAME
LIMIT 100;