"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - CTO @ CAPSiDE

This is not a DB
optimization talk
It’s a talk about how doing things in a specific way leads to getting way
better results by default

“Preoptimization”
 Don’t denormalize by default
 Maintenance is King (by default)
 1 Thread Performance != Parallel
Performance
MyISAM vs InnoDB
Fast != Scalable

I’ll just tune MySQL Parameters…
 Will get you out of SOME trouble
 But not a good “default” solution, specially if the
base is flawed
 Please do tune MySQLs defaults
 Not the theme for today 

Start with your Schema
 Your schema is probably the root cause of your
“My DB doesn’t scale” problems
 The solution is not “have a loose/no schema”
 How to fake a DB Design (Curtis Ovid Poe)
 https://www.youtube.com/watch?v=y1tcbhWLiUM

Data types: how (not) to bloat your DB
 Selecting data types with a bit of care is very
productive
 It makes more data fit in less space
 Optimizes use of InnoDB Buffer Pool, MyISAM Key
Buffer, Join Buffer, Sort Buffer, Smaller indexes

Your new best friend
 http://dev.mysql.com/doc/refman/5.6/en/storage-requirements.html
 http://dev.mysql.com/doc/refman/5.5/en/storage-requirements.html
 …

DataTypes: NULL vs NOT NULL
 I don’t care about the NULL debate
 Saves Space
 Gives the DB hints
 Use NULLs wisely

DataTypes: Integers
INT(1) == INT(10) == 4 bytes
That’s it!

DataTypes: Integer family
TINYINT: 1 byte
SMALLINT: 2 bytes
MEDIUMINT: 3 bytes
INT
BIGINT:8 bytes
UNSIGNED: “double”
the range

INTS and surrogate keys
 It’s good to have a surrogate key
 Just unique index the unique column
 Why?
 InnoDB stores the PK on the leafs of all indexes
 Indexes bloat if they’re “big keys”

Id columns for your tables
 INT by default for “things that can get big”
 Smaller INTs for smaller sets
 Do you really need a BIGINT?
 The magnitude comparison trick:
 Epochs in Unix have 4 bytes
 The epoch has been counting since 1970. Will run out in 2038
 An INT can identify ONE THING HAPPENING EVERY SECOND for 68 YEARS!
 Do you STILL need a BIGINT?

DataTypes: Texts
 Texts are strings of bytes with a charset and a collation
BINARY and VARBINARY
CHAR and VARCHAR

DataTypes: Texts
BINARY(10) == 10 bytes
CHAR(10) ==? 10 bytes

DataTypes: Texts
CHAR(10) latin1: 10 bytes
CHAR(19) utf-8: 30 bytes

DataTypes: Texts
VARBINARY(10) == 1 to 11 bytes
VARCHAR(10) ==? 1 to 11 bytes

DataTypes: Texts
VARBINARY(10) == 1 to 11 bytes
VARCHAR(10) ==? 1 to 11 bytes
VARCHAR(10) latin 1: 1 to 11 bytes
VARCHAR(10) utf-8: 1 to 31 bytes

So I’ll just go for VARCHAR(255) on all
text columns
 “After all… I’ll just consume the number of bytes + 1”

text columns
 “After all… I’ll just consume the number of bytes + 1”
 When we need a temporary table
 https://dev.mysql.com/doc/refman/5.6/en/memory-storage-engine.html
 “MEMORY tables use a fixed-length row-storage format. Variable-length types such
as VARCHAR are stored using a fixed length”

text columns
Congrats! All your VARCHAR(255) are now
CHAR(255) in memory
That’s 255 bytes latin-1. Or 765 bytes utf-8
O_o
Note: do the maths on VARCHAR(65535)

DataTypes: BLOBS
 TEXT == BLOB == problems
 TEXT = BLOB + charset + collation
 TEXT fields are not unlimited!
 TINYTEXT and TINYBLOB (up to 256 bytes == x chars)
 TEXT / BLOB (up to 65KB == x chars)
 MEDIUMBLOB / MEDIUMTEXT (up to 16MB == x chars)
 LONGTEXT / LONGBLOB (up to 2GB == x chars)

SELECT * IS YOUR FRIEND?
 IF a SELECT contains a BLOB/TEXT column
Temporary tables go DIRECTLY to DISK

SELECT * IS YOUR FRIEND?
 IF a SELECT contains a BLOB/TEXT column
Temporary tables go DIRECTLY to DISK
Note: “Big” columns belong on a filesystem or an object store

DataTypes: Small Sets
 ENUM
 One value out of the possibilities (‘big’, ‘small’)
 SET
 A set of possible values (‘pool’,’terrace’,’fence’)
 SETS are NOT good for finding stuff
 FIND_IN_SET is a function. No indexes 
 Default to a separate table + relation

DataTypes: Dates
 YEAR = 1 byte (range from 1901 to 2155)
 DATE = 3 bytes
 TIME = 3 bytes
 DATETIME = 8 bytes
 TIMESTAMP = 4 bytes
Timestamp is an epoch. Ideal for “things that
happen now” (sold time, renewal date, etc)

Masked Types
An IP Address
234.34.123.92
CHAR(15)? VARCHAR(15)?

INSERT INTO table (col) VALUES (INET_ATON(’10.10.10.10’));
SELECT INET_NTOA(col) FROM table;

Masked Types
MD5("The quick brown fox jumps over the lazy cat")
 71bd588d5ad9b6abe87b831b45f8fa95
 CHAR(32)

Masked Types
 UUIDs
 HASH functions
 Network masks

Index the two sides of relations
PK

Index this guy too!

Index this guy too!
And this guy!

PK is (contract_id, customer_id)
(Implied uniqueness)
Index “both ways”:
(customer_id, contract_id)
InnoDB optimization: Don’t index the full (customer_id, contract_id). The
index ALREADY HAS customer_id in it’s leafs. So just index (customer_id)
N-M relation: Junction tables

Don’t operate on fields
 Because they can’t use indexes
 WHERE column = ‘x’
WHERE column > 2000
WHERE column LIKE ‘prefix%’
 WHERE column + 2000 > 2013
WHERE FIND_IN_SET(column)
WHERE CONCAT(f1,f2) = “xxxx.com”
WHERE YEAR(date) = 2015
WHERE column LIKE ‘%.com’

Polish your maths: Algebra
Doesn’t use index
WHERE column + 2000 > 2013
WHERE FIND_IN_SET(‘pool’,column)
WHERE CONCAT(f1,’.’,f2) =
“xxxx.com”
Uses index
WHERE column > 13
?
WHERE f1 = ‘xxxx’ AND f2 = ‘com’
WHERE date BETWEEN ’01-01-2015’
and ’31-12-2015’
?

The old switcheroo…
Doesn’t use index
“xxxx.com”
Uses index
WHERE has_pool = 1
WHERE column_rev LIKE ‘moc.%’

The old switcheroo…
Doesn’t use index
“xxxx.com”
Uses index
WHERE has_pool = 1
UPDATE t SET has_pool =
FIND_IN_SET(‘pool’,column);
WHERE column_rev LIKE ‘moc.%’
UPDATE t SET
column_rev=REVERSE(column)

Wrap up
Schema
Data Types
How to select them
Masked Types
Indexes
Relations
Using them

Wrap up
 Smaller is better
VS

"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - CTO @ CAPSiDE

More Related Content

Similar to "MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - CTO @ CAPSiDE

More from CAPSiDE

Recently uploaded

"MySQL Boosting - DB Best Practices & Optimization" by José Luis Martínez - CTO @ CAPSiDE