MySQL Performance Optimization #JDNL13

MYSQL PERFORMANCE
OPTIMIZATION
(MYSQL PRESTATIE-OPTIMALISATIE: THE TITLE IN DUTCH)
IF YOU HAD TO READ THIS, YOU’RE IN THE WRONG ROOM…
JOOMLADAY NETHERLANDS 2013 - ELI ASCHKENASY - @ELIASCHKENASY

INTRODUCTION
• Eli Aschkenasy
• themodularway.com
• Oracle certified (doesn’t really mean anything)
• GE sourcing database project
• Agenda

AGENDA
• Introduction (5min)
• Strategic Overview (5min)
• Design Considerations – Schema Optimization – Strategic (10min)
• Design Considerations – Schema Optimization – Normalization (10min, hopefully)
• Indexing (5min)
• Query Optimization (10min)

STRATEGIC OVERVIEW
• Find smart people – Learn from them
• Find less knowledgeable people – Teach them (great design benefits)
• Talk to the business people! – Become their marketing specialist
• Benchmark – Do it!!!
• Decide early


DESIGN CONSIDERATIONS
SCHEMA OPTIMIZATION – STRATEGIC

• DB Engines (InnoDB vs. MyISAM) – not including MySQL 5.6 (where InnoDB also supports full-text indexing)

InnoDB
• Transactional
• Hot (Online) Backup
• Crash Safe(er)
• When to use: Always! Especially for transactional data
MyISAM
• Full-Text Indexing
• Compression
• Low(er) Space Consumption
• When to use: logging, read-only data

• Data Types (Numbers, Strings, Special)

NUMBERS
INT(1) vs. INT(20)? (trick question: the number is only a display width; both store the same 4-byte INT)
UNSIGNED vs. SIGNED
FLOAT isn’t accurate, but it is fast
DECIMAL in MySQL before 5.0.3 was calculated as floating point (FLOAT)
DECIMAL needs an additional byte to store the decimal point
Think about whether BIGINT could be used, even for precise calculations
(x*1’000’000)
PLEASE, PLEASE, use an unsigned integer as the primary key (we’ll talk about it again later).
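The numeric advice above can be illustrated with a small Python sketch (Python is used only for illustration; the byte sizes are MySQL's documented storage sizes for its integer types). It derives the SIGNED and UNSIGNED ranges from each type's size, and shows why binary floating point drifts while a scaled integer (x*1’000’000 stored in a BIGINT) stays exact. Python's float is a double, so MySQL's single-precision FLOAT is even less accurate.

```python
# Storage size in bytes of MySQL's integer types
INT_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}


def int_range(type_name: str, unsigned: bool) -> tuple:
    """Smallest and largest value a MySQL integer type can hold."""
    bits = 8 * INT_BYTES[type_name]
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


print(int_range("INT", unsigned=False))     # (-2147483648, 2147483647)
print(int_range("INT", unsigned=True))      # (0, 4294967295)
print(int_range("TINYINT", unsigned=True))  # (0, 255)

# Binary floating point (Python's float is a DOUBLE) accumulates rounding error
float_total = 0.0
for _ in range(10):
    float_total += 0.1
print(float_total)  # 0.9999999999999999

# Scaled integers: store x*1'000'000 in a BIGINT, divide only for display
SCALE = 1_000_000
int_total = 0
for _ in range(10):
    int_total += 100_000  # 0.1 * SCALE
print(int_total == SCALE)  # True
```

UNSIGNED doubles the positive range of a primary key for the same storage, which is one reason to prefer it for AUTO_INCREMENT ids.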

STRINGS
Use CHAR for fixed-length columns
US states are the best example
State codes are perfect for TINYINT UNSIGNED
MD5 hash values are another good candidate for CHAR usage
VARCHAR requires a length prefix (1 byte, or 2 bytes if values can exceed 255 bytes)!
VARCHAR allocates space in chunks, so don’t be overly generous
TEXT is problematic in SORTs
Trick: use SUBSTRING(x, 1, fixed_length) in the ORDER BY to alleviate the problem
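A short Python sketch of two of the string rules, for illustration only: an MD5 hex digest always has exactly 32 characters (so CHAR(32) fits it exactly), and the VARCHAR length prefix depends on the column's maximum length in bytes, not characters. The helper names `md5_hex` and `varchar_length_prefix` are made up for this example; 3 bytes per character is the maximum for MySQL's utf8 (utf8mb3) character set.

```python
import hashlib


def md5_hex(value: str) -> str:
    """Hex MD5 digest, as MySQL's MD5() returns it: always 32 characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def varchar_length_prefix(max_chars: int, bytes_per_char: int) -> int:
    """Length-prefix bytes for a VARCHAR: 1 if the maximum byte length is <= 255, else 2."""
    return 1 if max_chars * bytes_per_char <= 255 else 2


for text in ["http://www.joomladagen.nl", "x", ""]:
    print(len(md5_hex(text)))  # 32 every time, so CHAR(32) is a perfect fit

print(varchar_length_prefix(255, 1))  # 1: VARCHAR(255) in latin1
print(varchar_length_prefix(100, 3))  # 2: VARCHAR(100) in utf8 can need 300 bytes
```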

SPECIAL
Avoid NULL if possible
Nullable columns make it harder for MySQL to optimize queries
Use TIMESTAMP instead of DATETIME
TIMESTAMP uses less space (4 bytes vs. 8 for DATETIME before MySQL 5.6.4)
Limit: 2038
Time-zone dependent (e.g. EST): stored as UTC, shown in the session time zone
(I hardly ever use TIMESTAMP…)
BIT (actually returned in string form)
Don’t use ENUM or SET
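Where the 2038 limit comes from, sketched in Python for illustration: TIMESTAMP stores seconds since the Unix epoch in 4 bytes, so its last representable moment is 2**31 - 1 seconds after 1970-01-01 UTC. The second part shows the time-zone point: the same stored UTC instant is displayed differently in a session set to EST (a fixed UTC-5 offset is assumed here).

```python
from datetime import datetime, timedelta, timezone

# TIMESTAMP keeps seconds since 1970-01-01 UTC in 4 bytes; the maximum is 2**31 - 1
MAX_TIMESTAMP_SECONDS = 2 ** 31 - 1
last_moment = datetime.fromtimestamp(MAX_TIMESTAMP_SECONDS, tz=timezone.utc)
print(last_moment.isoformat())  # 2038-01-19T03:14:07+00:00

# The stored UTC value is shown in the session time zone, e.g. EST (UTC-5)
est = timezone(timedelta(hours=-5), "EST")
print(last_moment.astimezone(est).isoformat())  # 2038-01-18T22:14:07-05:00
```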

CODE EXAMPLES - BIT
mysql> CREATE TABLE bittest(a bit(8));
mysql> INSERT INTO bittest VALUES(b'00111001');
mysql> SELECT a, a + 0 FROM bittest;
+------+-------+
| a    | a + 0 |
+------+-------+
| 9    |    57 |
+------+-------+
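Why `a` shows 9 while `a + 0` shows 57, reproduced in Python for illustration: b'00111001' is the number 57, and when the client prints the raw byte as text, byte 57 is the ASCII character "9".

```python
raw = int("00111001", 2)         # the bit pattern b'00111001' as a number
print(raw)                       # 57, what "a + 0" returns
print(bytes([raw]).decode("ascii"))  # 9, the same byte printed as a character
```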

CODE EXAMPLES - ENUM
mysql> CREATE TABLE enum_test(e ENUM('fish', 'apple', 'dog') NOT NULL);
mysql> INSERT INTO enum_test(e) VALUES('fish'), ('dog'), ('apple');
mysql> SELECT e + 0 FROM enum_test;
+-------+
| e + 0 |
+-------+
|     1 |
|     3 |
|     2 |
+-------+
mysql> SELECT e FROM enum_test ORDER BY e;
+-------+
| e     |
+-------+
| fish  |
| apple |
| dog   |
+-------+
-- ORDER BY sorts by the internal index (1, 2, 3), not alphabetically
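The ENUM behaviour above, modelled in Python as a sketch (not MySQL's implementation): each value is stored as its 1-based position in the declared member list, and sorting uses that position, which is why `ORDER BY e` returns fish, apple, dog instead of the alphabetical order.

```python
ENUM_MEMBERS = ("fish", "apple", "dog")  # declaration order in enum_test


def enum_index(value: str) -> int:
    """1-based position of the value in the ENUM declaration."""
    return ENUM_MEMBERS.index(value) + 1


rows = ["fish", "dog", "apple"]  # insertion order from the example
print([enum_index(v) for v in rows])  # [1, 3, 2], like SELECT e + 0
print(sorted(rows, key=enum_index))   # ['fish', 'apple', 'dog'], like ORDER BY e
print(sorted(rows))                   # ['apple', 'dog', 'fish'], alphabetical
```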

CODE EXAMPLES – DATETIME NOT NULL
DATETIME NOT NULL DEFAULT '0000-00-00 00:00:00'


SCHEMA OPTIMIZATION – NORMALIZATION
• Normalized data is non-redundant
(The textbook says this is the best option)
• Reality introduces de-normalization
Welcome to summary and cache tables

CODE EXAMPLES – SUMMARY TABLE
mysql> CREATE TABLE msg_per_hr (
hr DATETIME NOT NULL,
cnt INT UNSIGNED NOT NULL,
PRIMARY KEY(hr)
);
-- CONCAT(LEFT(NOW(), 14), '00:00') truncates NOW() to the start of the current hour
-- The 23 complete hours stored in the summary table:
mysql> SELECT SUM(cnt) FROM msg_per_hr
WHERE hr BETWEEN CONCAT(LEFT(NOW(), 14), '00:00') - INTERVAL 23 HOUR
AND CONCAT(LEFT(NOW(), 14), '00:00') - INTERVAL 1 HOUR;
-- The partial first hour, from the raw message table:
mysql> SELECT COUNT(*) FROM message
WHERE posted >= NOW() - INTERVAL 24 HOUR
AND posted < CONCAT(LEFT(NOW(), 14), '00:00') - INTERVAL 23 HOUR;
-- The partial last (current) hour, from the raw message table:
mysql> SELECT COUNT(*) FROM message
WHERE posted >= CONCAT(LEFT(NOW(), 14), '00:00');
-- Adding the three results gives the message count for the last 24 hours
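To check that the three queries together cover exactly 24 hours without gaps or overlaps, here is a Python sketch of the same window arithmetic (illustration only; the function name and the example time are arbitrary). The first and last windows are partial hours read from the raw table, and the 23 summary buckets fill the middle.

```python
from datetime import datetime, timedelta

HOUR = timedelta(hours=1)


def last_24h_windows(now: datetime):
    """Split the last 24 hours the same way as the three queries above."""
    # CONCAT(LEFT(NOW(), 14), '00:00'): NOW() truncated to the start of the hour
    hour_start = now.replace(minute=0, second=0, microsecond=0)
    # Raw message table: from NOW() - 24h up to (not including) hour_start - 23h
    first_partial = (now - 24 * HOUR, hour_start - 23 * HOUR)
    # Summary table: hr buckets hour_start - 23h ... hour_start - 1h (BETWEEN is inclusive)
    summary_buckets = [hour_start - h * HOUR for h in range(23, 0, -1)]
    # Raw message table: from hour_start up to NOW()
    last_partial = (hour_start, now)
    return first_partial, summary_buckets, last_partial


now = datetime(2013, 4, 20, 14, 37, 12)  # arbitrary example time
first, buckets, last = last_24h_windows(now)
covered = (first[1] - first[0]) + len(buckets) * HOUR + (last[1] - last[0])
print(len(buckets), covered)  # 23 1 day, 0:00:00
```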

CODE EXAMPLES – CACHE TABLE
mysql> DROP TABLE IF EXISTS my_cache_new, my_cache_old;
mysql> CREATE TABLE my_cache_new LIKE my_cache;
-- populate my_cache_new as desired
mysql> RENAME TABLE my_cache TO my_cache_old, my_cache_new TO my_cache;

CODE EXAMPLES – COUNTER TABLE
CONCURRENCY ISSUE
mysql> CREATE TABLE hit_counter (
cnt INT UNSIGNED NOT NULL
) ENGINE=InnoDB;
mysql> UPDATE hit_counter SET cnt = cnt + 1;
-- every hit updates (and row-locks) the same single row, so concurrent hits are serialized
mysql> SELECT cnt FROM hit_counter;

CODE EXAMPLES – COUNTER TABLE
CONCURRENCY SOLUTION
mysql> CREATE TABLE hit_counter (
slot TINYINT UNSIGNED NOT NULL PRIMARY KEY,
cnt INT UNSIGNED NOT NULL
) ENGINE=InnoDB;
mysql> INSERT INTO hit_counter VALUES
(0,0),
(1,0),
(2,0),
(3,0),
…
(99,0);
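-- pick the slot once: RAND() in a WHERE clause is re-evaluated for every row,
-- and RAND() * 100 is almost never a whole number
mysql> SET @slot = FLOOR(RAND() * 100);
mysql> UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = @slot;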
mysql> SELECT SUM(cnt) FROM hit_counter;
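A single-threaded Python sketch of the slotted counter's bookkeeping (illustration only; it does not model InnoDB locking). Each hit increments one randomly chosen slot out of 100, so concurrent hits would usually touch different rows, and summing all slots gives back the exact total.

```python
import random

SLOTS = 100
# hit_counter rows: slot -> cnt, all starting at 0 like the INSERT above
hit_counter = {slot: 0 for slot in range(SLOTS)}


def record_hit(rng: random.Random) -> None:
    slot = rng.randrange(SLOTS)  # an integer 0..99, like FLOOR(RAND() * 100)
    hit_counter[slot] += 1       # UPDATE hit_counter SET cnt = cnt + 1 WHERE slot = @slot


rng = random.Random(2013)  # fixed seed so the run is repeatable
for _ in range(10_000):
    record_hit(rng)

print(sum(hit_counter.values()))  # 10000, like SELECT SUM(cnt) FROM hit_counter
```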


INDEXING
HASH INDEX
• InnoDB automatically creates adaptive hash indexes for frequently accessed index values
• Main downside: the index doesn’t help sorting
• Hash indexes can’t speed up range queries (WHERE age > 18)
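A Python sketch of why a hash index helps only equality lookups, using a dict as a stand-in for a hash index and a sorted list with `bisect` as a stand-in for an ordered (B-tree-like) index. The id/age data is made up for illustration. The ordered structure can jump to the start of a range and return rows already in key order; the hash structure would have to scan every key.

```python
import bisect

ages = {101: 17, 102: 18, 103: 25, 104: 31, 105: 19}  # hypothetical id -> age

# "Hash index": age -> ids, good for exact matches only
hash_index: dict = {}
for row_id, age in ages.items():
    hash_index.setdefault(age, []).append(row_id)

# "Ordered index": (age, id) entries kept sorted by age
ordered_index = sorted((age, row_id) for row_id, age in ages.items())
ordered_keys = [age for age, _ in ordered_index]

print(hash_index.get(18))  # [102]: equality lookup works

# WHERE age > 18: binary search for the start, then read forward in order
start = bisect.bisect_right(ordered_keys, 18)
print([row_id for _, row_id in ordered_index[start:]])  # [105, 103, 104], sorted by age
```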

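INDEXING
HASH INDEX – HOW TO – INCLUDING HASH COLLISION
-- Slow: lookup on a long string column
mysql> SELECT id FROM pseudohash WHERE url='http://www.joomladagen.nl';
-- Store a CRC32 "pseudo-hash" of the URL in an indexed integer column
CREATE TABLE pseudohash (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
url VARCHAR(255) NOT NULL,
url_crc INT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY(id),
KEY(url_crc)
);
-- Keep url_crc in sync with triggers
DELIMITER //
CREATE TRIGGER pseudohash_crc_ins BEFORE INSERT ON pseudohash FOR EACH ROW BEGIN
SET NEW.url_crc=crc32(NEW.url);
END //
CREATE TRIGGER pseudohash_crc_upd BEFORE UPDATE ON pseudohash FOR EACH ROW BEGIN
SET NEW.url_crc=crc32(NEW.url);
END //
DELIMITER ;
-- Find candidates by hash (uses KEY(url_crc)), then compare the URL itself to weed out collisions
mysql> SELECT id FROM pseudohash WHERE url_crc=CRC32('http://www.joomladagen.nl') AND
url='http://www.joomladagen.nl';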
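The same pseudo-hash pattern sketched in Python, for illustration: `zlib.crc32` computes the standard CRC-32 checksum (the same one MySQL's CRC32() returns for the same input bytes), a dict stands in for KEY(url_crc), and the final string comparison plays the role of `AND url=...`, which is what makes hash collisions harmless. The in-memory structures and function names are hypothetical. With a 32-bit hash, collisions become likely once a table holds tens of thousands of distinct URLs (about 50% at roughly 77,000, by the birthday bound), so the second comparison is not optional.

```python
import zlib

# In-memory stand-ins for the pseudohash table and its KEY(url_crc) index
urls: dict = {}       # id -> url
crc_index: dict = {}  # url_crc -> [ids]


def crc32(text: str) -> int:
    """Standard CRC-32 as an unsigned 32-bit integer."""
    return zlib.crc32(text.encode("utf-8"))


def insert(url: str) -> int:
    row_id = len(urls) + 1                               # AUTO_INCREMENT
    urls[row_id] = url
    crc_index.setdefault(crc32(url), []).append(row_id)  # what the INSERT trigger maintains
    return row_id


def lookup(url: str) -> list:
    candidates = crc_index.get(crc32(url), [])                       # WHERE url_crc = CRC32(...)
    return [row_id for row_id in candidates if urls[row_id] == url]  # AND url = ...


insert("http://www.joomladagen.nl")
insert("http://example.org")
print(lookup("http://www.joomladagen.nl"))  # [1]
```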

INDEXING
INDEX PROBLEMS
Select only the columns you need, and isolate the column in the WHERE clause:
SELECT * FROM Orgchart WHERE rgt - lft = 1;
SELECT a, c FROM Orgchart WHERE rgt - lft = 1;
SELECT a, c FROM Orgchart WHERE lft = (rgt - 1);
Prefix index on a long string column:
SELECT name FROM people WHERE name = 'Hans';
mysql> ALTER TABLE people ADD KEY idx_name (name(6));
Three single-column indexes vs. one multi-column index:
CREATE TABLE t (
c1 INT,
c2 INT,
c3 INT,
KEY(c1),
KEY(c2),
KEY(c3)
);
KEY(c1,c2,c3)? KEY(c1,c3)? Specificity!
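How a prefix length such as name(6) can be chosen, sketched in Python: compare the selectivity of each prefix, COUNT(DISTINCT LEFT(name, n)) / COUNT(*), with the selectivity of the full column and take the shortest prefix that gets close enough. The name list is hypothetical sample data, and requiring an exact match with the full selectivity is a simplification; on real data you would accept "close enough".

```python
def prefix_selectivity(values: list, length: int) -> float:
    """COUNT(DISTINCT LEFT(name, length)) / COUNT(*), computed in Python."""
    return len({v[:length] for v in values}) / len(values)


def shortest_full_prefix(values: list) -> int:
    """Shortest prefix length that is as selective as the whole column."""
    longest = max(len(v) for v in values)
    full = prefix_selectivity(values, longest)
    for n in range(1, longest + 1):
        if prefix_selectivity(values, n) == full:
            return n
    return longest


# Hypothetical sample of people.name
names = ["Hans", "Hanna", "Hannelore", "Hendrik", "Henk", "Johan", "Johannes", "Joost"]
for n in range(1, 7):
    print(n, prefix_selectivity(names, n))  # 0.25, 0.375, 0.5, 0.75, 0.875, 1.0
print(shortest_full_prefix(names))          # 6
```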

SELECT cc FROM payment WHERE staff_id = 2 AND customer_id = 584;
KEY(staff_id,customer_id) ?
Check how many rows match each condition:
SELECT SUM(staff_id = 2), SUM(customer_id = 584) FROM payment\G
*************************** 1. row ***************************
SUM(staff_id = 2): 7992
SUM(customer_id = 584): 30
customer_id is far more selective (30 rows vs. 7992), so it goes first:
ALTER TABLE payment ADD KEY(customer_id, staff_id);
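The column-order decision above, written out as a small Python sketch: order the columns by how many rows match, fewest first. The two counts are the ones from the slide; the function name is hypothetical. This rule of thumb only holds for the typical values you query with, and it ignores other needs such as sorting or grouping on the index.

```python
def order_by_selectivity(matching_rows: dict) -> list:
    """Most selective column (fewest matching rows) first."""
    return sorted(matching_rows, key=matching_rows.get)


# Results of SUM(staff_id = 2) and SUM(customer_id = 584) from the slide
counts = {"staff_id": 7992, "customer_id": 30}
columns = order_by_selectivity(counts)
print(f"ALTER TABLE payment ADD KEY({', '.join(columns)});")
# ALTER TABLE payment ADD KEY(customer_id, staff_id);
```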
ALTER TABLE payment ADD KEY(customer_id, staff_id);

INDEXING
EXAMPLE
CREATE TABLE profile(
sex CHAR(1) NOT NULL,
age TINYINT NOT NULL,
country VARCHAR(255) NOT NULL,
region VARCHAR(255) NOT NULL DEFAULT '',
city VARCHAR(255) NOT NULL DEFAULT '',
color_hair VARCHAR(255) NOT NULL DEFAULT '',
color_eyes VARCHAR(255) NOT NULL DEFAULT '',
name
…
…
rating TINYINT NOT NULL DEFAULT 1,
PRIMARY KEY(id)
);

INDEXING
EXAMPLE
WHERE age BETWEEN 18 AND 25
ORDER BY rating ASC
MySQL can’t use the remaining index columns (e.g. to sort by rating)
once an earlier index column uses a range criterion
KEY(sex, country)
NO Selectivity!!! (sex has only two possible values)
Assumption: all searches will include sex
and most will include country
Trick: AND sex IN('m', 'f')
(for searches that don’t filter on sex, so MySQL can still use the index)
Candidate indexes:
(sex, country, age)
(sex, country, region, age)
(sex, country, region, city, age)
Using the IN() trick, we can implement
just the
(sex, country, region, city, age) index.
Why is age at the end?
Remember our range problem?
MySQL uses indexes from left to right
until the first range condition
Trick: Convert WHERE age BETWEEN 18 AND 25
to
WHERE age IN(18,19,20,21,22,23,24,25)
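A simplified Python model of the leftmost-prefix rule behind both tricks (an illustration, not MySQL's optimizer): index columns are used from left to right, lookup stops at the first column with no condition, and it stops right after the first range condition. An IN() list behaves like several equality lookups, so columns after it stay usable. The index (sex, age, country) in the first demo is a hypothetical column order chosen to make the BETWEEN vs. IN() difference visible.

```python
def usable_index_columns(index: list, where: dict) -> list:
    """Columns of `index` usable for lookup; `where` maps column -> "eq", "in" or "range"."""
    used = []
    for column in index:
        kind = where.get(column)
        if kind is None:      # no condition on this column: the rest is unusable
            break
        used.append(column)
        if kind == "range":   # a range ends the usable prefix
            break
    return used


index = ["sex", "age", "country"]  # hypothetical order for the demo
print(usable_index_columns(index, {"sex": "eq", "age": "range", "country": "eq"}))
# ['sex', 'age']             age BETWEEN 18 AND 25
print(usable_index_columns(index, {"sex": "eq", "age": "in", "country": "eq"}))
# ['sex', 'age', 'country']  age IN(18, ..., 25)

profile_index = ["sex", "country", "age"]
print(usable_index_columns(profile_index, {"country": "eq"}))               # []
print(usable_index_columns(profile_index, {"sex": "in", "country": "eq"}))  # ['sex', 'country']
```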

INDEXING
EXAMPLE
SELECT age, country, …. , name FROM profile
WHERE sex='F'
ORDER BY rating
LIMIT 100000, 10;
Retrieving 100’010 rows, discarding 100’000
If the data load per row is 15kb, we read 1’500’150kb (about 1.5GB)
and discard all of it except the 150kb of the 10 rows we keep!
Better: add KEY(sex, rating) and use a deferred join. In InnoDB, secondary indexes
contain the primary key, so the inner query reads only the index,
and just 10 full rows are fetched:
SELECT age, country, …. , name FROM profile
INNER JOIN (
SELECT id FROM profile
WHERE sex='F'
ORDER BY rating LIMIT 100000, 10
) AS x
USING(id);
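The pagination arithmetic worked out in Python, using the slide's assumption of 15kb per row: a plain `LIMIT 100000, 10` reads 100’010 full rows to return 10. The deferred join still walks 100’010 index entries, but fetches only 10 full rows; how small an index entry is depends on the actual columns, so no byte count is assumed for it here.

```python
ROW_KB = 15                  # assumed data per full row (from the slide)
OFFSET, LIMIT = 100_000, 10  # LIMIT 100000, 10

rows_read = OFFSET + LIMIT
kb_read = rows_read * ROW_KB
kb_kept = LIMIT * ROW_KB
kb_discarded = kb_read - kb_kept
print(rows_read, kb_read, kb_kept, kb_discarded)  # 100010 1500150 150 1500000

# Deferred join: full rows are fetched only for the ids the inner query returns
full_rows_fetched_deferred = LIMIT
print(full_rows_fetched_deferred * ROW_KB)  # 150
```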


QUERY OPTIMIZATION
TOO MUCH DATA
SELECT * FROM Orgchart
WHERE rgt - lft = 1;

SELECT * FROM recipes.ingredient
INNER JOIN recipes.recipe_ingredient
USING(ingredient_id)
INNER JOIN recipes.recipe USING(recipe_id)
WHERE recipes.recipe.name = 'Stroopwafel';
All I want is the ingredients for a Stroopwafel,
but this query returns all columns from all three tables!
SELECT recipes.ingredient.* FROM recipes.ingredient
...
;

SELECT img, name, …, comment FROM comments
WHERE user_id = 123983;
What, did the image and name change in the last 2 microseconds?!?
Fetch the unchanging columns once (and cache them), then only the comments:
SELECT img, name, … FROM comments
SELECT comment FROM comments

QUERY OPTIMIZATION
DIVIDE AND CONQUER
DELETE FROM messages
WHERE created < DATE_SUB(NOW(),INTERVAL 3 MONTH);
One huge DELETE can hold locks and resources for a long time; delete in chunks instead:
<?php
…
// do_query() stands for the application's query helper,
// assumed to return the number of affected rows
$rows_affected = 0;
do {
    $rows_affected = do_query("
        DELETE FROM messages
        WHERE created < DATE_SUB(NOW(),INTERVAL 3 MONTH)
        LIMIT 10000
    ");
} while ($rows_affected > 0);
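A runnable version of the chunked-delete loop, sketched in Python with the bundled sqlite3 module standing in for MySQL (an assumption for illustration: SQLite's `DELETE ... LIMIT` isn't available in default builds, so the chunk is selected with a `LIMIT` subquery, and `created` is a plain integer instead of a DATETIME). Like the PHP loop, it keeps deleting until a chunk affects zero rows, committing after each chunk so every transaction stays short.

```python
import sqlite3

BATCH = 10_000

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, created INTEGER NOT NULL)")
# Hypothetical data: 25'000 old rows (created = 1) and 5 recent rows (created = 100)
conn.executemany("INSERT INTO messages (created) VALUES (?)",
                 [(1,)] * 25_000 + [(100,)] * 5)
conn.commit()

cutoff = 50  # stands in for DATE_SUB(NOW(), INTERVAL 3 MONTH)
batches = []
while True:
    cur = conn.execute(
        "DELETE FROM messages WHERE id IN "
        "(SELECT id FROM messages WHERE created < ? LIMIT ?)",
        (cutoff, BATCH),
    )
    conn.commit()  # one short transaction per chunk
    batches.append(cur.rowcount)
    if cur.rowcount == 0:  # same stop condition as the PHP do/while
        break

remaining = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(batches, remaining)  # [10000, 10000, 5000, 0] 5
```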

QUERY OPTIMIZATION
DIVIDE AND CONQUER
SELECT * FROM tag
JOIN tag_post ON tag_post.tag_id=tag.id
JOIN post ON tag_post.post_id=post.id
WHERE tag.tag='joomladagen';
can be split into:
SELECT * FROM tag WHERE tag='joomladagen';
SELECT * FROM tag_post WHERE tag_id=1234;
SELECT * FROM post WHERE post.id IN(123,456,567,9098,8904);
WHY? DENORMALIZED ?!?
• Query cache: a change to any of the 3 tables invalidates the joined result; split queries are invalidated independently
• Lock contention (?)
• Query index lookup of IN() is better than with JOINs
• Potential for application caching
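A sketch of the join decomposition in Python, with the bundled sqlite3 module standing in for MySQL and made-up rows (the tag id 1234 matches the slide). The three small queries run one after another, and the last one builds an IN() list with one placeholder per id; the result is compared with the original three-table join. Selecting two columns instead of `*` is a simplification for the example. Each step's result is an obvious candidate for application caching, e.g. the tag-name-to-id lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT NOT NULL);
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tag_post (tag_id INTEGER NOT NULL, post_id INTEGER NOT NULL);
INSERT INTO tag VALUES (1234, 'joomladagen'), (1, 'other');
INSERT INTO post VALUES (123, 'a'), (456, 'b'), (789, 'c');
INSERT INTO tag_post VALUES (1234, 123), (1234, 456), (1, 789);
""")


def posts_for_tag(tag: str) -> list:
    # 1) tag name -> tag id
    row = conn.execute("SELECT id FROM tag WHERE tag = ?", (tag,)).fetchone()
    if row is None:
        return []
    # 2) tag id -> post ids
    post_ids = [r[0] for r in conn.execute(
        "SELECT post_id FROM tag_post WHERE tag_id = ?", (row[0],))]
    if not post_ids:
        return []
    # 3) post ids -> posts, using IN() with one placeholder per id
    placeholders = ", ".join("?" * len(post_ids))
    return conn.execute(
        f"SELECT id, title FROM post WHERE id IN ({placeholders}) ORDER BY id",
        post_ids).fetchall()


joined = conn.execute("""
SELECT post.id, post.title FROM tag
JOIN tag_post ON tag_post.tag_id = tag.id
JOIN post ON tag_post.post_id = post.id
WHERE tag.tag = ? ORDER BY post.id
""", ("joomladagen",)).fetchall()

print(posts_for_tag("joomladagen"))            # [(123, 'a'), (456, 'b')]
print(posts_for_tag("joomladagen") == joined)  # True
```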

