MySQL: Indexing for Better Performance
By: Jehad Keriaki, DBA (2014)
What Is an Index?
- A data structure that speeds up data retrieval from a database.
Why Would We Use Indexes?
- Speed, speed, and speed
- Constraints (uniqueness)
- I/O optimization
- MAX() and MIN()
- Sorting and grouping
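Index Types
- PRIMARY KEY (PK), UNIQUE, and KEY (a plain, non-unique index)
- PRIMARY KEY vs UNIQUE:
  - A table has only one PK but can have many UNIQUE indexes.
  - UNIQUE columns can be NULL (multiple NULLs are allowed); PK columns cannot.
  - InnoDB tables are clustered on the PK.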
Types (Algorithm)
- B-Tree, R-Tree, Hash, Full-text
- R-Tree: geospatial data (SPATIAL indexes)
- Hash: mainly the MEMORY engine; fast for equality, but the whole key must be used and there are no range scans
- Full-text:
  - MyISAM, and InnoDB as of MySQL 5.6:
      SELECT * FROM table1 WHERE MATCH(description) AGAINST ('toshiba');
  - Options and rules to know: boolean mode, query expansion, stop words, minimum word length, and the 50% threshold (MyISAM natural-language searches)
  - For heavy search workloads, a dedicated search server such as Sphinx may be a better choice.
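Types (Algorithm) [cont'd]
- B-Tree:
  - Comparison operators (=, >, >=, <, <=)
  - Ranges (BETWEEN)
  - LIKE, which is a special case of a range when the pattern is a constant that does not start with % (e.g. 'tra%'); a leading % ('%tra') cannot use the index to narrow the search
  - It is the default index type in MySQL (the MEMORY engine is an exception and defaults to HASH)
  - In a B-Tree (a B+Tree in InnoDB), the data are stored in the leaf nodes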
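To make the B-Tree vs hash distinction concrete, here is a toy Python model. It is not MySQL's implementation. A sorted list stands in for the B-Tree's ordered leaf level, and a dict stands in for a hash index. The column values and function names are hypothetical. Because the sorted list is ordered, it answers equality, BETWEEN, and LIKE 'prefix%' with binary search. A leading wildcard has no starting point and must examine every key. The dict only answers exact matches on the whole key.

```python
from bisect import bisect_left, bisect_right

# Toy model, not MySQL internals: a sorted list stands in for the ordered
# leaf level of a B-Tree; a dict stands in for a hash index.
# The names are hypothetical column values.
names = ["zoe", "stephane", "adam", "tran", "steve", "stuart"]
btree = sorted(names)
hash_index = {name: row_id for row_id, name in enumerate(names)}


def btree_between(lo, hi):
    """WHERE col BETWEEN lo AND hi: two binary searches, one contiguous slice."""
    return btree[bisect_left(btree, lo):bisect_right(btree, hi)]


def btree_like_prefix(prefix):
    """WHERE col LIKE 'prefix%': start at the first key >= prefix and
    walk forward until a key no longer starts with the prefix."""
    matches = []
    for key in btree[bisect_left(btree, prefix):]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


def like_leading_wildcard(suffix):
    """WHERE col LIKE '%suffix': no starting point, every key is examined."""
    return [key for key in btree if key.endswith(suffix)]


def hash_lookup(value):
    """A hash index answers only exact matches on the whole key (no ranges)."""
    return hash_index.get(value)


print(btree_between("st", "su"))   # ['stephane', 'steve', 'stuart']
print(btree_like_prefix("ste"))    # ['stephane', 'steve']
print(like_leading_wildcard("e"))  # ['stephane', 'steve', 'zoe']
print(hash_lookup("steve"))        # 4
```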
Types (Structure)
- One column
- Multi-column [composite]
- Partial [prefix]
- Any of them can be a "covering index", except a partial (prefix) index, because it does not store the full column value.
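What Indexes to Create?
- A PK is a must.
  - Preferably the smallest UNSIGNED integer type that fits, with AUTO_INCREMENT.
- PK and InnoDB (clustered):
  - InnoDB tables are clustered on the PK.
  - Every secondary index contains the PK; e.g. INDEX(name) is in fact (name, id).
  - AVOID long PKs: the PK is copied into every secondary index, so a long PK makes all of them bigger.
  - AVOID md5(), uuid(), etc. as PKs: besides being long, their random order scatters inserts across the clustered index instead of appending them at the end.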
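The point about random keys can be illustrated with a rough Python sketch. The clustered index keeps rows in PK order, so a new row whose key is larger than every existing key is simply appended. A random key lands somewhere in the middle, which in InnoDB means writing into existing pages and eventually splitting them. The model only counts where each key would land in a sorted list; it does not simulate InnoDB pages. The row count and seed are arbitrary, and the 128-bit random integers are a hypothetical stand-in for UUID values.

```python
import random
from bisect import bisect_left, insort


def count_mid_index_inserts(keys):
    """Insert keys into a sorted list (a stand-in for the clustered index,
    which keeps rows in PK order) and count inserts that did not land at
    the end of the list."""
    clustered = []
    mid_index = 0
    for key in keys:
        if bisect_left(clustered, key) != len(clustered):
            mid_index += 1
        insort(clustered, key)
    return mid_index


n = 1000
auto_increment_keys = list(range(1, n + 1))             # 1, 2, 3, ...
rng = random.Random(42)                                 # arbitrary seed
random_keys = [rng.getrandbits(128) for _ in range(n)]  # uuid-like 128-bit values

print(count_mid_index_inserts(auto_increment_keys))  # 0: every row is appended
print(count_mid_index_inserts(random_keys))          # nearly all of the 1000
```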
MyISAM and InnoDB
- In MyISAM:
  - An index entry holds the physical offset of the row in the data file.
- In InnoDB:
  - The PK index holds the row data. Secondary indexes store the PK as the pointer to the row, so a key on field F is really (F, PK). This helps sorting and covering indexes.
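A toy Python model of the InnoDB layout described above. The rows and function names are hypothetical, and the clustered index is modeled as a plain dict rather than a B-Tree for brevity. The secondary index on name is a sorted list of (name, id) pairs. A lookup by name first binary-searches the secondary index to get PK values. It then fetches each row from the clustered index by PK. If the query needs only name and id, the second step is skipped, because the PK is already in the secondary index entry.

```python
from bisect import bisect_left

# Clustered index (modeled as a dict for brevity): PK -> the whole row.
clustered = {
    1: {"id": 1, "name": "tran", "email": "tran@example.com"},
    2: {"id": 2, "name": "adam", "email": "adam@example.com"},
    3: {"id": 3, "name": "tran", "email": "t2@example.com"},
}
# Secondary INDEX(name): really sorted (name, id) pairs.
name_idx = sorted((row["name"], pk) for pk, row in clustered.items())


def pks_for_name(name):
    """Step 1: binary-search the secondary index, collect the PK values."""
    pos = bisect_left(name_idx, (name,))
    pks = []
    while pos < len(name_idx) and name_idx[pos][0] == name:
        pks.append(name_idx[pos][1])
        pos += 1
    return pks


def select_id_where_name(name):
    """SELECT id ... WHERE name = ?: the secondary index alone answers it."""
    return pks_for_name(name)


def select_star_where_name(name):
    """SELECT * ... WHERE name = ?: step 2 looks each PK up in the clustered index."""
    return [clustered[pk] for pk in pks_for_name(name)]


print(select_id_where_name("tran"))                          # [1, 3]
print([r["email"] for r in select_star_where_name("tran")])  # ['tran@example.com', 't2@example.com']
```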
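Cardinality and Selectivity
- Cardinality: the number of distinct values in a column
- Selectivity: cardinality / total number of rows
- Which values are better? Higher selectivity (closer to 1) means each lookup matches fewer rows, so the index is more useful.
- Index statistics are refreshed by ANALYZE TABLE (OPTIMIZE TABLE also updates them).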
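The two definitions above can be computed directly. The Python sketch below uses made-up column values for an 8-row table. A selectivity near 1 (nearly unique values) means an equality lookup returns few rows. A selectivity near 0, as with a gender column, means each value matches a large share of the table. MySQL itself only estimates cardinality from sampled statistics (the Cardinality column of SHOW INDEX), which is why refreshing the statistics matters.

```python
def cardinality(values):
    """Number of distinct values in a column."""
    return len(set(values))


def selectivity(values):
    """Cardinality divided by the total number of rows."""
    return cardinality(values) / len(values)


# Hypothetical column contents for an 8-row table.
emails = ["a@x.com", "b@x.com", "c@x.com", "d@x.com",
          "e@x.com", "f@x.com", "g@x.com", "h@x.com"]
genders = ["F", "M", "M", "F", "M", "F", "F", "M"]

print(cardinality(emails), selectivity(emails))    # 8 1.0
print(cardinality(genders), selectivity(genders))  # 2 0.25
```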
One Column Index
- An index on a single column.
- Query example:
    SELECT * FROM employee WHERE first_name LIKE 'stephane';
- Index solution:
    ALTER TABLE employee ADD INDEX (first_name);
- Notes:
  - For CHAR/VARCHAR/TEXT columns, consider indexing only the first n characters (TEXT and BLOB columns require a prefix length).
  - Do not wrap the indexed column in a function, or the index cannot be used, e.g.:
      WHERE md5(field)='1bc29b36f623ba82aaf6724fd3b16718'
Multi Column Index
- What is it:
  - An index on more than one column.
  - The higher-cardinality column goes first, with exceptions.
  - The "leftmost prefix" rule: INDEX (A, B, C) can be used for conditions on A; on A and B; or on A, B, and C; but not on B or C alone.
- Query example:
    SELECT * FROM employee
    WHERE department = 5 AND last_name LIKE 'tran';
- Index solution:
    ALTER TABLE employee ADD INDEX (last_name, department);
- Why not (department, last_name)?
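The leftmost-prefix rule follows from how a composite index is ordered. Entries are sorted by the first column, then by the second column among equal first values, and so on. The Python sketch below uses hypothetical rows. It models INDEX(last_name, department) as sorted (last_name, department, id) tuples, assuming InnoDB's appended PK. A search on last_name, or on last_name and department, is a binary search over one contiguous slice. A search on department alone has no starting point and must examine every entry.

```python
from bisect import bisect_left

# Hypothetical employee rows: (id, last_name, department).
rows = [(1, "tran", 5), (2, "adam", 5), (3, "tran", 7), (4, "zhou", 5)]
# INDEX(last_name, department); InnoDB appends the PK, so entries are
# (last_name, department, id), kept sorted column by column.
idx = sorted((last_name, dept, pk) for pk, last_name, dept in rows)


def seek(prefix):
    """Binary-search for entries whose leading columns equal `prefix`
    (a leftmost prefix) and return their PKs."""
    pos = bisect_left(idx, prefix)
    pks = []
    while pos < len(idx) and idx[pos][:len(prefix)] == prefix:
        pks.append(idx[pos][-1])
        pos += 1
    return pks


def scan_for_department(dept):
    """department alone is not a leftmost prefix: examine every entry."""
    return [entry[-1] for entry in idx if entry[1] == dept]


print(seek(("tran",)))          # [1, 3]
print(seek(("tran", 5)))        # [1]
print(scan_for_department(5))   # [2, 1, 4], found only by a full scan
```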
Multi Column Index [cont'd]
- Query example:
    SELECT * FROM employee
    WHERE department = 5 AND hiring_date >= '2014-01-01';
- Index solution:
    ALTER TABLE employee ADD INDEX (department, hiring_date);
- Notes:
  - Should it be (hiring_date, department)? Is this an exception? Yes: the range column (hiring_date) goes last, because index columns after a range condition cannot be used to narrow the search.
  - The order of columns IS important. This query will NOT use the index, because hiring_date is not its leftmost column:
      SELECT * FROM employee WHERE hiring_date >= '2014-01-01';
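The column-order notes can be summarized as a rule of thumb. MySQL uses index columns from left to right as long as the WHERE clause has an equality on each one. The first column with a range condition is still used, but the columns after it do not narrow the search. The Python function below encodes that simplified rule and applies it to the orderings discussed above; its name is hypothetical. It ignores ICP, index merge, the optimizer's cost decisions, and newer tricks such as MySQL 8.0's skip scan.

```python
def usable_key_parts(index_cols, equality_cols, range_cols=frozenset()):
    """Simplified leftmost-prefix rule: walk the index columns left to right,
    keep columns compared with =, keep the first range column and stop,
    and stop at the first column the WHERE clause does not mention."""
    used = []
    for col in index_cols:
        if col in equality_cols:
            used.append(col)
        elif col in range_cols:
            used.append(col)   # the range column itself is used...
            break              # ...but columns to its right are not
        else:
            break              # gap in the prefix: stop here
    return used


eq, rng = {"department"}, {"hiring_date"}
print(usable_key_parts(["department", "hiring_date"], eq, rng))     # ['department', 'hiring_date']
print(usable_key_parts(["hiring_date", "department"], eq, rng))     # ['hiring_date']
print(usable_key_parts(["department", "hiring_date"], set(), rng))  # []
```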
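Partial Index
- What is it: an index on the first n characters of a column.
- Query example (email is VARCHAR(255)):
    SELECT * FROM users WHERE email LIKE 'richardmelo@yahoo.com';
- Index solution:
    ALTER TABLE users ADD INDEX (email(12));
  vs
    ALTER TABLE users ADD INDEX (email);
- Notes:
  - A prefix index saves space and makes writes cheaper. Lookups perform about the same if the prefix is selective enough, although the rows still have to be read to check the full value.
  - Measure how selective a prefix is compared with the full column:
      SELECT COUNT(DISTINCT LEFT(field, 20)) / COUNT(DISTINCT field) FROM table1;
  - Threshold: aim for roughly 85%, maybe 90%.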
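The COUNT(DISTINCT LEFT(...)) check can be mimicked in Python to pick the shortest prefix that keeps most of the column's distinct values. The email addresses below are made up. The 0.9 default is the slide's tentative rule of thumb, not a MySQL setting.

```python
def prefix_ratio(values, n):
    """Distinct n-character prefixes divided by distinct full values."""
    return len({v[:n] for v in values}) / len(set(values))


def shortest_prefix(values, threshold=0.9):
    """Smallest n whose prefix keeps at least `threshold` of the distinct values."""
    longest = max(len(v) for v in values)
    for n in range(1, longest + 1):
        if prefix_ratio(values, n) >= threshold:
            return n
    return longest


# Hypothetical email column contents.
emails = [
    "richardmelo@yahoo.com",
    "richard.m@gmail.com",
    "rachel@yahoo.com",
    "tran.le@example.com",
    "stephane@example.org",
]
for n in (1, 3, 7, 12):
    print(n, prefix_ratio(emails, n))
print(shortest_prefix(emails))  # 8: "richardm" and "richard." finally differ
```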
Joins and Indexes
- A join links two or more tables to get related rows.
- Query example:
    SELECT employee.first_name, employee.last_name
    FROM department
    INNER JOIN employee ON department.id = employee.department
    WHERE department.location = 'MTL';
- Index solution:
    ALTER TABLE department ADD INDEX (location);
    ALTER TABLE employee ADD INDEX (department);
- Notes: the join column on department (department.id) does not need an index for this query, because department rows are found through location. However, employee.department should be indexed, so that each department row can find its employees without scanning the whole employee table.
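Why the index belongs on employee.department: MySQL runs this join as a nested loop. It reads the matching department rows, then looks up employees by department for each one. The Python sketch below compares two strategies for the inner loop by counting employee rows examined. The rows are hypothetical, and a dict of lists stands in for INDEX(department).

```python
from collections import defaultdict

# Hypothetical rows for the two tables in the query above.
department = [
    {"id": 1, "location": "MTL"},
    {"id": 2, "location": "TOR"},
    {"id": 3, "location": "MTL"},
]
employee = [
    {"first_name": "Stephane", "last_name": "Roy", "department": 1},
    {"first_name": "Anh", "last_name": "Tran", "department": 2},
    {"first_name": "Lea", "last_name": "Gagnon", "department": 3},
    {"first_name": "Omar", "last_name": "Haddad", "department": 3},
]

# Stand-in for INDEX(department) on employee: department id -> rows.
employee_by_department = defaultdict(list)
for emp in employee:
    employee_by_department[emp["department"]].append(emp)


def nested_loop_join(use_index):
    """Return (matching names, employee rows examined)."""
    result, examined = [], 0
    # Outer loop: department rows with location = 'MTL'
    # (INDEX(location) would find these directly; here we just filter).
    for dept in (d for d in department if d["location"] == "MTL"):
        if use_index:
            candidates = employee_by_department.get(dept["id"], [])
        else:
            candidates = employee          # full scan for every department row
        for emp in candidates:
            examined += 1
            if emp["department"] == dept["id"]:
                result.append((emp["first_name"], emp["last_name"]))
    return result, examined


print(nested_loop_join(use_index=True)[1])   # 3
print(nested_loop_join(use_index=False)[1])  # 8
```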
Multiple Indexes OR Multi-Col Index
- What is it:
    ALTER TABLE table1 ADD INDEX (field1), ADD INDEX (field2);
  vs
    ALTER TABLE table1 ADD INDEX (field1, field2);
- Query example:
  - WHERE field1=1 OR field2=2: multiple indexes (MySQL can combine them with an index merge)
  - WHERE field1=1 AND field2=2: multi-column index
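For the OR case, MySQL can read each single-column index separately and merge the row ids. EXPLAIN reports this as type index_merge, with "Using union(...)" in Extra. For the AND case, one composite index finds the matching rows with a single lookup. The Python sketch below models both with hypothetical rows. Dicts stand in for the index lookups, since key order does not matter for these equality tests.

```python
from collections import defaultdict

# Hypothetical table1 rows: id -> (field1, field2).
rows = {1: (1, 2), 2: (1, 5), 3: (4, 2), 4: (4, 5)}

# Dicts stand in for index lookups (value -> set of ids).
idx_field1 = defaultdict(set)         # INDEX(field1)
idx_field2 = defaultdict(set)         # INDEX(field2)
idx_field1_field2 = defaultdict(set)  # INDEX(field1, field2)
for pk, (f1, f2) in rows.items():
    idx_field1[f1].add(pk)
    idx_field2[f2].add(pk)
    idx_field1_field2[(f1, f2)].add(pk)


def where_or(v1, v2):
    """field1 = v1 OR field2 = v2: look up both indexes, union the ids."""
    return idx_field1.get(v1, set()) | idx_field2.get(v2, set())


def where_and(v1, v2):
    """field1 = v1 AND field2 = v2: a single composite-index lookup."""
    return idx_field1_field2.get((v1, v2), set())


print(sorted(where_or(1, 2)))   # [1, 2, 3]
print(sorted(where_and(1, 2)))  # [1]
```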
Covering Index
- When the index contains all the data a query needs, there is no need to read the table's rows.
- Example:
    employee(id, first_name, last_name, email, phone, hiring_date)
    SELECT email FROM employee WHERE phone='123456789';
    ALTER TABLE employee ADD INDEX (phone, email);
- MIN() and MAX() on an indexed column can be resolved from the index alone.
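A Python sketch of the covering-index example, using hypothetical rows. INDEX(phone, email) is modeled as sorted (phone, email, id) entries, with the PK appended as in InnoDB. SELECT email ... WHERE phone = ? is answered from the index entries alone. Asking for first_name instead forces one table read per match. Each query function returns its result together with the number of table reads. MIN(phone) is simply the first index entry.

```python
from bisect import bisect_left

# Hypothetical employee rows (id -> row).
employee = {
    1: {"first_name": "Ana", "email": "ana@example.com", "phone": "123456789"},
    2: {"first_name": "Bo", "email": "bo@example.com", "phone": "555000111"},
}
# INDEX(phone, email) modeled as sorted (phone, email, id) entries;
# InnoDB appends the PK (id) to every secondary index entry.
phone_email_idx = sorted((r["phone"], r["email"], pk) for pk, r in employee.items())


def index_entries(phone):
    """Seek to the first entry for `phone` and yield entries while it matches."""
    pos = bisect_left(phone_email_idx, (phone,))
    while pos < len(phone_email_idx) and phone_email_idx[pos][0] == phone:
        yield phone_email_idx[pos]
        pos += 1


def select_email(phone):
    """SELECT email ... WHERE phone = ?: covered, so zero table reads."""
    return [email for _, email, _ in index_entries(phone)], 0


def select_first_name(phone):
    """SELECT first_name ... WHERE phone = ?: one table read per match."""
    names = [employee[pk]["first_name"] for _, _, pk in index_entries(phone)]
    return names, len(names)


def min_phone():
    """MIN(phone): just the first entry of the index."""
    return phone_email_idx[0][0]


print(select_email("123456789"))       # (['ana@example.com'], 0)
print(select_first_name("123456789"))  # (['Ana'], 1)
print(min_phone())                     # 123456789
```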
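Covering Index - Note
- In InnoDB, given myindex(col1, col2) on table1 (the PK is also stored in the index):
    SELECT col1 FROM table1 WHERE col2 = 200;  <<-- will use the index (a full index scan, since every needed column is in it)
    SELECT * FROM table1 WHERE col2 = 200;     <<-- will NOT use the index (col2 is not the leftmost column, and the rows must be read anyway)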
ICP (Index Condition Pushdown) [5.6]
- Lets MySQL check WHERE conditions against the index entries instead of the table's rows.
- Example:
    employee(id, first_name, last_name, department, phone, email, address)
    INDEX(department, email)
    SELECT * FROM employee
    WHERE department=5
    AND email LIKE '%@beta.example%'
    [AND address LIKE '%montreal%'];
- Without ICP, MySQL uses the index only to find the rows with department=5, reads each of those rows from the table, and then checks email. With ICP, it checks the email condition against the index entry first, and reads the row from the table only if it matches. The address condition still needs the row. EXPLAIN shows "Using index condition" when ICP is used.
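The difference can be sketched in Python by counting full-row reads. The rows are hypothetical, and the loop only loosely imitates MySQL's handler interface. INDEX(department, email) entries carry department, email, and the PK. Without ICP, every entry with department=5 triggers a row read before the email condition is checked. With ICP, the email condition is tested on the index entry first. The LIKE matching assumes a case-insensitive collation.

```python
# Hypothetical employee rows (id -> row); only the columns the query uses.
employee = {
    1: {"department": 5, "email": "ana@beta.example.com", "address": "12 Main St, Montreal"},
    2: {"department": 5, "email": "bo@alpha.example.com", "address": "3 Elm St, Montreal"},
    3: {"department": 5, "email": "cy@alpha.example.com", "address": "9 Oak St, Laval"},
    4: {"department": 7, "email": "di@beta.example.com", "address": "1 Pine St, Montreal"},
}
# INDEX(department, email), with the PK appended as in InnoDB.
dept_email_idx = sorted((r["department"], r["email"], pk) for pk, r in employee.items())


def contains(value, needle):
    """LIKE '%needle%', assuming a case-insensitive collation."""
    return needle.lower() in value.lower()


def run_query(use_icp):
    """WHERE department = 5 AND email LIKE '%@beta.example%'
    AND address LIKE '%montreal%'. Returns (ids, table rows read)."""
    result, rows_read = [], 0
    for dept, email, pk in dept_email_idx:
        if dept != 5:            # stands in for the index range on department = 5
            continue
        if use_icp and not contains(email, "@beta.example"):
            continue             # rejected from the index entry, row never read
        row = employee[pk]
        rows_read += 1           # full row fetched from the table
        if contains(row["email"], "@beta.example") and contains(row["address"], "montreal"):
            result.append(pk)
    return result, rows_read


print(run_query(use_icp=False))  # ([1], 3)
print(run_query(use_icp=True))   # ([1], 1)
```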
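Using Index for Sorting
- ORDER BY x: index on (x)
- WHERE x = ? ORDER BY y: index on (x, y)
- WHERE x > ? ORDER BY x DESC, y DESC: index on (x, y), read backwards
- WHERE x > ? ORDER BY x ASC, y DESC: can't use the (x, y) index for sorting, because the directions are mixed (MySQL 8.0 adds descending index columns, e.g. INDEX (x ASC, y DESC))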
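The rules above can be encoded as a simplified Python check. The helper is hypothetical, assumes ascending-only indexes as in MySQL 5.6/5.7, and ignores the optimizer's cost decisions. An index can deliver the ORDER BY order under two conditions. First, once the leading index columns fixed by WHERE col = constant are skipped, the ORDER BY columns must be a leftmost run of the remaining index columns. Second, all of them must share one direction, since the index can be read forwards or backwards but not both at once.

```python
def index_can_sort(index_cols, order_by, equality_cols=frozenset()):
    """Simplified check: can an ascending B-Tree index on `index_cols`
    return rows already in ORDER BY order?

    order_by: list of (column, "ASC" or "DESC") pairs.
    equality_cols: columns fixed by WHERE col = constant.
    """
    # A column fixed to one value does not affect the order; drop it.
    order = [(col, d) for col, d in order_by if col not in equality_cols]
    # Leading index columns fixed by equality can be skipped.
    remaining = list(index_cols)
    while remaining and remaining[0] in equality_cols:
        remaining.pop(0)
    # The ORDER BY columns must be a leftmost run of what is left...
    if [col for col, _ in order] != remaining[:len(order)]:
        return False
    # ...and share one direction: the index is read forwards or backwards.
    return len({d for _, d in order}) <= 1


print(index_can_sort(["x"], [("x", "ASC")]))                       # True
print(index_can_sort(["x", "y"], [("y", "ASC")], {"x"}))           # True
print(index_can_sort(["x", "y"], [("x", "DESC"), ("y", "DESC")]))  # True
print(index_can_sort(["x", "y"], [("x", "ASC"), ("y", "DESC")]))   # False
print(index_can_sort(["x", "y"], [("y", "ASC")]))                  # False
```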
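Exceptions
- E.g. a date (range) column combined with another, lower-cardinality column: put the equality column first, as in (department, hiring_date).
- Status or gender columns are special cases: their low cardinality makes an index useful mainly when searching for a rare value.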
Overhead of Indexing
- I/O: every INSERT, UPDATE, and DELETE must also modify the affected indexes.
- Disk space
- More indexes => higher possibility of deadlocks (each write locks more index records)
ABOUT EXPLAIN
- Shows the query execution plan:
  - Which index will be used, if any
  - Estimated number of rows to be examined
QUESTIONS & EXAMPLES
mysql> explain select * from md_table where id=50000\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: md_table
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
1 row in set (0.00 sec)
mysql> explain select id from md_table where id=50000\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: md_table
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra: Using index
1 row in set (0.00 sec)
mysql> explain select id from md_table where hashed_id='1017bfd4673955ffee4641ad3d481b1c'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: md_table
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 100000
Extra: Using where
1 row in set (0.00 sec)
mysql> alter table md_table add index (hashed_id(15));
Query OK, 100000 rows affected (0.77 sec)
Records: 100000 Duplicates: 0 Warnings: 0
mysql> explain select id from md_table where hashed_id='1017bfd4673955ffee4641ad3d481b1c'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: md_table
type: ref
possible_keys: hashed_id
key: hashed_id
key_len: 46
ref: const
rows: 1
Extra: Using where
1 row in set (0.01 sec)
