MySQL Indexing Crash Course
Aaron Silverman
Why Use Indexes
How does MySQL store stuff?
● InnoDB Stores Data on Disk
● Tables can be grouped together in a shared tablespace to reduce the penalty of filesystem access (the default before MySQL 5.6; newer versions default to one file per table)
● Row data is stored together unless a column is too large (roughly 768 bytes or more), in which case the overflow is stored “off-table”
● The order in which rows are stored on disk is how the data is “clustered”
● Data is indexed using sorted B+ trees that (ideally) fit into memory, with data pointers in the leaf nodes
[Diagram: B+ tree with root keys 3, 6, 9 over leaf keys 1 to 11, each leaf pointing to its row DATA]
How do indexes work?
B-Tree indexes work much like looking something up in a physical dictionary: you repeatedly check whether what you are searching for comes before or after the current position, and narrow the search accordingly
Dictionary Example:
● Flip to page near the middle
● Word earlier? Try a page closer to front
● Word later? Try a page closer to end
● Repeat until you find page with word
Example: Looking up “Index” in a dictionary
Page 70: Melon, Memo, Mercy
Page 35: Grape, Graph, Grate
Page 47: Jaded, Jaunt, Java
Page 44: Indeed, Index, Indoor
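The dictionary procedure above can be sketched as a binary search over pages. This is only an illustration of the idea, not how InnoDB is implemented; the eight-page dictionary and the `find_page` helper are made up for the example.

```python
def find_page(pages, word):
    """Binary-search sorted (first_word, last_word) page ranges.

    Returns (page_index, pages_opened); page_index is None if no page holds the word.
    """
    lo, hi = 0, len(pages) - 1
    opened = 0
    while lo <= hi:
        mid = (lo + hi) // 2          # flip to a page near the middle
        opened += 1
        first, last = pages[mid]
        if word < first:
            hi = mid - 1              # word is earlier: try a page closer to the front
        elif word > last:
            lo = mid + 1              # word is later: try a page closer to the end
        else:
            return mid, opened        # the word belongs on this page
    return None, opened


# Hypothetical dictionary: each tuple is the first and last word on one page.
pages = [
    ("apple", "cedar"), ("chair", "evoke"), ("fable", "grate"), ("gravy", "hotel"),
    ("house", "indoor"), ("ingot", "melon"), ("memo", "quilt"), ("quote", "zebra"),
]

print(find_page(pages, "index"))  # (4, 3): found on the fifth page after opening 3 pages
```

Eight pages need at most a handful of page flips; doubling the dictionary adds only one more.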
B-Tree is Great for Single and Range Based Lookups
GOOD, O(lg(n)):
WHERE id = 5
WHERE id > 7
WHERE id < 5
WHERE id BETWEEN 6 AND 8
BAD, O(n):
WHERE id <> 8
Another way to think about it: if you had a physical copy of a dictionary, it would be easy to find words that start with “ch”, but very hard (it would involve reading every page) to find words that have a “ch” in the middle or at the end.
log(n) performance makes a huge difference with large datasets!
n          log10(n)
10         1
100        2
1000       3
10000      4
100000     5
1000000    6
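The table above uses base 10. As a rough sketch of why tree depth grows so slowly, the helper below counts how many levels are needed before `fanout ** levels` covers `n_rows` keys. The fanout of 100 is an illustrative assumption, not a measured InnoDB figure.

```python
def tree_levels(n_rows, fanout):
    """Smallest number of levels such that fanout ** levels >= n_rows."""
    levels, capacity = 0, 1
    while capacity < n_rows:
        capacity *= fanout   # each extra level multiplies the reachable keys by fanout
        levels += 1
    return levels


for n in (10, 1_000, 1_000_000):
    # fanout 10 reproduces the log10 column; fanout 100 is a hypothetical wider node
    print(n, tree_levels(n, 10), tree_levels(n, 100))
```

Integer multiplication is used instead of `math.log` to avoid floating-point rounding at exact powers.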
Example Queries
WHERE id = 5
WHERE id > 7
WHERE id < 5
WHERE id BETWEEN 6 AND 8
[Diagram for each query: B+ tree with root keys 3, 6, 9 over leaf keys 1 to 11, each leaf pointing to its row DATA]
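The four example queries can be imitated with Python's `bisect` module over a sorted list of keys. This is a simplified stand-in for the leaf level of the tree in the slides; the helper names are invented for the sketch.

```python
from bisect import bisect_left, bisect_right

ids = list(range(1, 12))  # the sorted leaf keys 1..11 from the diagram


def equal(keys, value):
    """WHERE id = value: one seek, then check the key found."""
    i = bisect_left(keys, value)
    return keys[i:i + 1] if i < len(keys) and keys[i] == value else []


def greater_than(keys, value):
    """WHERE id > value: seek past value, read to the end."""
    return keys[bisect_right(keys, value):]


def less_than(keys, value):
    """WHERE id < value: read from the start up to value."""
    return keys[:bisect_left(keys, value)]


def between(keys, low, high):
    """WHERE id BETWEEN low AND high (inclusive on both ends)."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


print(equal(ids, 5))         # [5]
print(greater_than(ids, 7))  # [8, 9, 10, 11]
print(less_than(ids, 5))     # [1, 2, 3, 4]
print(between(ids, 6, 8))    # [6, 7, 8]
```

Each query is one or two seeks followed by a contiguous read, which is why sorted order matters.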
Querying with a negation usually necessitates a table scan
BAD
WHERE id <> 8
[Diagram: B+ tree with root keys 3, 6, 9 over leaf keys 1 to 11, each leaf pointing to its row DATA]
Querying based on a value that is not indexed results in a table scan
BAD
WHERE unindexed_column = 'uh oh'
[Diagram: B+ tree with root keys 3, 6, 9 over leaf keys 1 to 11, each leaf pointing to its row DATA]
Querying on the output of a function applied to an indexed column cannot seek into the index and requires a full index scan
BAD
WHERE myFunction(id) > 10
[Diagram: B+ tree with root keys 3, 6, 9 over leaf keys 1 to 11, each leaf pointing to its row DATA]
DEMO
Uses dummy database from https://github.com/datacharmer/test_db
You can try these out yourself!
SQL commands in green
Responses and/or times in gray (some parts omitted)
Example:
use employees;
Database changed
DEMO
Two relevant tables we’ll be using:
CREATE TABLE `employees` (
`emp_no` int(11) NOT NULL PRIMARY KEY,
`birth_date` date NOT NULL,
`first_name` varchar(14) NOT NULL,
`last_name` varchar(16) NOT NULL,
`gender` enum('M','F') NOT NULL,
`hire_date` date NOT NULL
);
CREATE TABLE `salaries` (
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY (`emp_no`,`from_date`),
CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`)
);
Employees - 300,024 rows
Salaries - 2,844,047 rows
DEMO
Employees
emp_no  birth_date  first_name  last_name  gender  hire_date
10001   9/2/53      Georgi      Facello    M       6/26/86
10002   6/2/64      Bezalel     Simmel     F       11/21/85
10003   12/3/59     Parto       Bamford    M       8/28/86
Salaries
emp_no  salary  from_date  to_date
10001   85097   6/22/01    6/22/02
10001   88958   6/22/02    1/1/99
10002   65828   8/3/96     8/3/97
DEMO
SELECT * FROM employees WHERE emp_no = 20000;
1 row in set (0.01 sec)
SELECT * FROM employees WHERE last_name = 'Matzke' AND first_name = 'Jenwei';
1 row in set (0.10 sec)
SELECT min(emp_no), max(emp_no) FROM salaries;
1 row in set (0.00 sec)
SELECT min(salary), max(salary) FROM salaries;
1 row in set (0.85 sec)
SELECT count(*) FROM salaries WHERE emp_no BETWEEN 20000 AND 25000;
1 row in set (0.03 sec)
SELECT count(*) FROM salaries WHERE salary BETWEEN 70000 AND 75000;
1 row in set (0.59 sec)
Why not index every column?
VERY BAD
You will be creating a spread-out copy of your table, wasting space and memory. Each write (UPDATE, INSERT, DELETE) will require every index to be updated, which makes writing a lot of work...
CREATE INDEX (id)
CREATE INDEX (make)
CREATE INDEX (model)
CREATE INDEX (type)
id  make    model   type
1   Cessna  152     N
2   Piper   Archer  II
3   Cessna  172     S
4   Piper   Dakota  NULL
[Diagram: four separate B-trees, one each on id, make, model, and type, all pointing back to the same four rows]
Note: in reality, secondary index leaf nodes point to the primary key, not to the records
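To make the write cost concrete, here is a toy model in which every index is a sorted list that must be updated on each insert. The `Table` class is hypothetical and much simpler than a real storage engine; it only counts index updates.

```python
from bisect import insort


class Table:
    """Toy table with one sorted list per indexed column (illustrative only)."""

    def __init__(self, columns, indexed):
        self.columns = columns
        self.rows = []
        self.indexes = {col: [] for col in indexed}
        self.index_writes = 0

    def insert(self, row):
        self.rows.append(row)
        row_id = len(self.rows) - 1
        for col, entries in self.indexes.items():
            value = row[self.columns.index(col)]
            # str() lets this toy sort mixed values (ints, strings, None) uniformly
            insort(entries, (str(value), row_id))
            self.index_writes += 1   # one extra write per index, per row


columns = ["id", "make", "model", "type"]
planes = [(1, "Cessna", "152", "N"), (2, "Piper", "Archer", "II"),
          (3, "Cessna", "172", "S"), (4, "Piper", "Dakota", None)]

lean = Table(columns, indexed=["id"])      # index only what is queried
heavy = Table(columns, indexed=columns)    # index every column
for plane in planes:
    lean.insert(plane)
    heavy.insert(plane)

print(lean.index_writes, heavy.index_writes)  # 4 16
```

Four rows cost four index writes with one index and sixteen with four, and the gap grows with every row written.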
Compound Indexes
● Allow sorting based on multiple columns
● Store the data in the index
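[Diagram: B+ tree on (num, letter) with root keys (3, C) and (4, E) over leaf entries (1, D), (2, A), (3, C), (3, F), (4, B), (4, E), (5, A), (5, B), (6, C), each pointing to its row DATA]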
Subsets of compound indexes can be used only from left to right
CREATE INDEX (num, letter)
GOOD:
WHERE num = 4
WHERE num = 5 AND letter < 'C'
BAD:
WHERE letter = 'C'
GOOD
WHERE num = 4
WHERE num = 5 AND letter < 'C'
BAD
WHERE letter = 'C'
[Diagram for each query: B+ tree on (num, letter) with leaf entries (1, D), (2, A), (3, C), (3, F), (4, B), (4, E), (5, A), (5, B), (6, C), each pointing to its row DATA]
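The left-to-right rule follows from how the entries are sorted. In the sketch below the compound index is a sorted list of `(num, letter)` tuples, which is a simplification of the tree's leaf level; the function names are invented for illustration.

```python
from bisect import bisect_left

# Leaf entries of the (num, letter) index from the slides, in sorted order.
index = [(1, "D"), (2, "A"), (3, "C"), (3, "F"), (4, "B"),
         (4, "E"), (5, "A"), (5, "B"), (6, "C")]


def num_equals(entries, num):
    """WHERE num = ?: seek to the first entry for num, read until num changes."""
    start = bisect_left(entries, (num,))
    end = bisect_left(entries, (num + 1,))
    return entries[start:end]


def num_equals_letter_below(entries, num, letter):
    """WHERE num = ? AND letter < ?: still one contiguous slice."""
    start = bisect_left(entries, (num,))
    end = bisect_left(entries, (num, letter))
    return entries[start:end]


def letter_equals(entries, letter):
    """WHERE letter = ?: matches are scattered, so every entry is checked."""
    return [entry for entry in entries if entry[1] == letter]


print(num_equals(index, 4))                    # [(4, 'B'), (4, 'E')]
print(num_equals_letter_below(index, 5, "C"))  # [(5, 'A'), (5, 'B')]
print(letter_equals(index, "C"))               # [(3, 'C'), (6, 'C')]
```

The first two queries touch one contiguous run of entries; the third finds its matches at opposite ends of the list, so the sort order gives it nothing to seek on.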
If the data you need is in the index, then you don’t even have to fetch the rows! It’s a “Covering Index”
VERY GOOD
SELECT letter WHERE num = 5
[Diagram: B+ tree on (num, letter) with leaf entries (1, D), (2, A), (3, C), (3, F), (4, B), (4, E), (5, A), (5, B), (6, C), each pointing to its row DATA]
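A small model of the covering-index effect: when every requested column lives in the index entry, no row lookup is needed. The rows, the `note` column, and the `select` helper are hypothetical and exist only to count row fetches.

```python
from bisect import bisect_left

# Hypothetical rows keyed by primary key; "note" stands in for columns outside the index.
rows = {10: (5, "A", "note-1"), 11: (5, "B", "note-2"), 12: (6, "C", "note-3")}

# Index on (num, letter); each entry also carries the primary key of its row.
index = sorted((num, letter, pk) for pk, (num, letter, _) in rows.items())


def select(columns, num):
    """Return (results, row_fetches) for SELECT columns WHERE num = ?."""
    start = bisect_left(index, (num,))
    end = bisect_left(index, (num + 1,))
    fetches, out = 0, []
    for n, letter, pk in index[start:end]:
        in_index = {"num": n, "letter": letter}
        if all(col in in_index for col in columns):
            out.append(tuple(in_index[col] for col in columns))  # answered from the index
        else:
            fetches += 1                                         # must visit the row itself
            row = dict(zip(("num", "letter", "note"), rows[pk]))
            out.append(tuple(row[col] for col in columns))
    return out, fetches


print(select(["letter"], 5))          # ([('A',), ('B',)], 0)
print(select(["letter", "note"], 5))  # ([('A', 'note-1'), ('B', 'note-2')], 2)
```

Asking for one extra column that is not in the index turns zero row fetches into one per matching entry.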
Why not use a compound index to cover your whole table?
VERY BAD
You will just be creating a more complicated copy of your table. It probably won’t fit into memory, so MySQL will still have to go to disk.
CREATE INDEX (id, make, model, type)
[Diagram: one B-tree on (id, make, model, type) whose entries (1, Cessna, 152, N), (2, Piper, Archer, II), (3, Cessna, 172, S), (4, Piper, Dakota, NULL) duplicate the four table rows they point to]
DEMO
SELECT max(to_date) FROM salaries WHERE from_date >= '1990-10-01' AND from_date <
'1990-11-01';
1 row in set (0.82 sec)
CREATE INDEX ix_salaries_from_date ON salaries(from_date);
Query OK, 0 rows affected (5.35 sec)
SELECT max(to_date) FROM salaries WHERE from_date >= '1990-10-01' AND from_date <
'1990-11-01';
1 row in set (0.04 sec)
ALTER TABLE salaries DROP INDEX ix_salaries_from_date;
Query OK, 0 rows affected (0.02 sec)
DEMO
CREATE INDEX ix_salaries_from_date_to_date ON salaries(from_date, to_date DESC);
Query OK, 0 rows affected (5.65 sec)
SELECT max(to_date) FROM salaries WHERE from_date >= '1990-10-01' AND from_date <
'1990-11-01';
1 row in set (0.01 sec)
SELECT max(to_date) FROM salaries WHERE year(from_date) = 1990 AND month(from_date) =
10;
1 row in set (0.54 sec)
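The last two queries in this demo return the same answer, but only the range form lets the index on from_date be used for seeking. One way to keep the convenience of "give me October 1990" without wrapping the column in functions is to compute the range in application code. The helper below is a sketch; `month_range` is an invented name, and the `%s` placeholders follow the parameter style used by common Python MySQL drivers.

```python
from datetime import date


def month_range(year, month):
    """Half-open [start, end) date range covering one calendar month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


# The column stays bare on the left-hand side, so an index on from_date can be used.
query = "SELECT max(to_date) FROM salaries WHERE from_date >= %s AND from_date < %s"
params = month_range(1990, 10)

print(query)
print(params)  # (datetime.date(1990, 10, 1), datetime.date(1990, 11, 1))
```

The half-open range avoids having to know how many days the month has.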
Clustering places sequential data sequentially on the disk, resulting in faster data retrieval because the disk has fewer places to go to get the data
[Diagram: B+ tree with root keys 3, 6, 9 over leaf keys 1 to 11, each leaf pointing to its row DATA]
DEMO
We’ll create our own salaries table clustered on a numeric arbitrary id:
CREATE TABLE salaries_unclustered (
`id` int(11) NOT NULL AUTO_INCREMENT,
`emp_no` int(11) NOT NULL,
`salary` int(11) NOT NULL,
`from_date` date NOT NULL,
`to_date` date NOT NULL,
PRIMARY KEY pk_salaries_unclustered (id),
UNIQUE KEY uq_salaries_unclustered_emp_from_date(`emp_no`,`from_date`),
CONSTRAINT `salaries_unclustered_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`)
);
INSERT INTO salaries_unclustered (emp_no, salary, from_date, to_date)
SELECT emp_no, salary, from_date, to_date
FROM salaries
ORDER BY rand();
DEMO
Salaries Unclustered
id      emp_no  salary    from_date  to_date
1       83580   6/3/46    1/7/00     1/6/01
2       478007  3/17/30   9/21/91    9/20/92
3       412998  11/27/15  9/1/99     8/31/00
…       ...     ...       ...        ...
396770  83580   4/30/10   1/8/93     1/8/94
DEMO
SELECT emp_no, salary, from_date, to_date FROM salaries WHERE emp_no BETWEEN 23456 AND 24467;
9601 rows in set (0.02 sec)
SELECT emp_no, salary, from_date, to_date FROM salaries_unclustered WHERE emp_no BETWEEN
23456 AND 24467;
9601 rows in set (0.05 sec)
UPDATE salaries SET salary = salary*2 WHERE emp_no BETWEEN 23456 AND 24467;
Query OK, 9601 rows affected (0.05 sec)
UPDATE salaries_unclustered SET salary = salary*2 WHERE emp_no BETWEEN 23456 AND 24467;
Query OK, 9601 rows affected (0.12 sec)
UPDATE salaries SET salary = salary/2 WHERE emp_no BETWEEN 23456 AND 24467;
Query OK, 9601 rows affected (0.06 sec)
UPDATE salaries_unclustered SET salary = salary/2 WHERE emp_no BETWEEN 23456 AND 24467;
Query OK, 9601 rows affected (0.10 sec)
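The timing gap between the two tables comes from how many distinct places on disk a range of emp_no values occupies. The following is a counting model only, with a made-up page capacity of 100 rows; it does not measure real I/O.

```python
import random

ROWS_PER_PAGE = 100  # hypothetical page capacity


def pages_touched(physical_order, low, high):
    """Count distinct pages that hold keys in [low, high] for a given on-disk order."""
    return len({
        position // ROWS_PER_PAGE
        for position, key in enumerate(physical_order)
        if low <= key <= high
    })


keys = list(range(10_000))
clustered = keys                     # rows stored in key order
shuffled = keys[:]
random.Random(0).shuffle(shuffled)   # rows stored in arbitrary order, like ORDER BY rand()

print(pages_touched(clustered, 2_000, 2_499))  # 5
print(pages_touched(shuffled, 2_000, 2_499))   # far more pages for the same 500 rows
```

With key-ordered storage the 500 matching rows sit on five adjacent pages; after shuffling they are spread across most of the table.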
Additional Index Info
How are Indexes Used?
This could be a whole presentation...
Rules of thumb:
● Indexes help with selecting rows (WHERE), GROUP BY, and ORDER BY, in that order
● Indexes help with JOIN when they are on the column being joined on
● Indexes mostly help when cardinality (uniqueness) is high
● Compound indexes should list the columns used by WHERE first, then the columns used for ORDER BY
● It is usually better to extend an existing index than to create a new one
● Indexes work better on smaller columns; ideally the whole index fits in memory
○ Integers and dates (stored as integers under the hood): good
○ Floats and smaller character columns: bad
○ Large character columns: very bad
DEMO
SELECT count(distinct left(last_name, 1)) / count(*) AS s1,
count(distinct left(last_name, 2)) / count(*) AS s2,
count(distinct left(last_name, 3)) / count(*) AS s3,
count(distinct left(last_name, 4)) / count(*) AS s4,
count(distinct left(last_name, 5)) / count(*) AS s5,
count(distinct left(last_name, 6)) / count(*) AS s6,
count(distinct left(last_name, 7)) / count(*) AS s7,
count(distinct left(last_name, 8)) / count(*) AS s8,
count(distinct left(last_name, 9)) / count(*) AS s9,
count(distinct left(last_name, 10)) / count(*) AS s10,
count(distinct left(last_name, 11)) / count(*) AS s11,
count(distinct left(last_name, 12)) / count(*) AS s12,
count(distinct left(last_name, 13)) / count(*) AS s13,
count(distinct left(last_name, 14)) / count(*) AS s14,
count(distinct left(last_name, 15)) / count(*) AS s15,
count(distinct left(last_name, 16)) / count(*) AS s16
FROM employees;
Characters Selectivity Characters Selectivity
1 0.0001 9 0.0055
2 0.0007 10 0.0055
3 0.0028 11 0.0055
4 0.0045 12 0.0055
5 0.0051 13 0.0055
6 0.0054 14 0.0055
7 0.0054 15 0.0055
8 0.0054 16 0.0055
Note: This dummy data has many repeated last names. Typical text datasets (names, email) see selectivity approach 1.0
92% efficacy of the full index is achieved with just 5 characters
100% efficacy of the full index is achieved with 9 of the characters
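The same selectivity calculation can be tried on a small in-memory sample before running the SQL on a large table. The sample names and both helper functions below are hypothetical; `shortest_prefix` picks the shortest prefix length that reaches a chosen fraction of the full-column selectivity.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the total row count."""
    return len({value[:length] for value in values}) / len(values)


def shortest_prefix(values, target=0.9):
    """Shortest prefix length reaching target * selectivity of the full column."""
    longest = max(len(value) for value in values)
    full = prefix_selectivity(values, longest)
    for length in range(1, longest + 1):
        if prefix_selectivity(values, length) >= target * full:
            return length


# Made-up sample; the slides ran the equivalent query over employees.last_name.
last_names = ["Facello", "Simmel", "Bamford", "Simmons", "Bamberg", "Facello"]

for length in (1, 3, 5, 7):
    print(length, round(prefix_selectivity(last_names, length), 2))
print(shortest_prefix(last_names))  # 5
```

A prefix index of that length keeps most of the filtering power while storing fewer bytes per entry.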
DEMO
CREATE INDEX ix_employees_last_name ON employees (last_name);
Query OK, 0 rows affected (0.72 sec)
ANALYZE TABLE employees;
1 row in set (0.01 sec)
SELECT stat_value FROM mysql.innodb_index_stats WHERE index_name = 'ix_employees_last_name'
AND stat_name = 'size';
1 row in set (0.01 sec) #result: 417
CREATE INDEX ix_employees_last_name_first5 ON employees (last_name(5));
Query OK, 0 rows affected (1.26 sec)
ANALYZE TABLE employees;
1 row in set (0.01 sec)
SELECT stat_value FROM mysql.innodb_index_stats WHERE index_name =
'ix_employees_last_name_first5' AND stat_name = 'size';
1 row in set (0.01 sec) #result: 353
Implicit Indexes
● Primary key is automatically indexed
● Left parts of compound indexes can be used
● All secondary indexes have the primary key (id) appended
● A foreign key is automatically indexed in the child table if no usable index is present; the referenced column in the parent table must already be indexed
● Unique keys create an index
CREATE TABLE … implies:
INDEX (id)
CREATE INDEX (make, model) implies:
INDEX (make, model, id)
INDEX (make, id)
FOREIGN KEY(model) REFERENCES models (id) implies:
INDEX (model, id)
UNIQUE KEY (make, model, name) implies:
INDEX (make, model, name, id)
INDEX (make, model, id)
INDEX (make, id)
Query Optimization
This could also be a whole presentation...
Rules of thumb:
● Create indexes based on how you know you will be querying the table
● Don’t add indexes “just in case”
● If your new query needs a new index, think if you can rewrite it so it does not, or if
you can extend an existing index
● Need speed? Select only columns in covering index
● Long (> ~768 bytes) varchars, text, and blobs are stored off-table; select them only when needed
● Avoid nested subqueries
● Avoid functions in WHERE, GROUP BY, HAVING, and ORDER BY clauses
● Temporary tables (used by subqueries and UNION) will force disk use if large
● Stuck? Run ANALYZE TABLE and then use EXPLAIN and dive in
DEMO
EXPLAIN SELECT * FROM employees WHERE last_name like 'a%';
#Index used
EXPLAIN SELECT * FROM employees WHERE last_name like '%a';
#No indexes available
EXPLAIN SELECT *
FROM employees e
JOIN salaries s
ON e.emp_no = s.emp_no
WHERE last_name LIKE 'a%';
#Index used for each table
EXPLAIN SELECT *
FROM employees e
JOIN salaries s
ON s.from_date < date_add(e.hire_date, INTERVAL 3 month)
WHERE last_name LIKE 'a%';
#Index only used for first table
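The difference between LIKE 'a%' and LIKE '%a' in the first two EXPLAIN results can be pictured with a sorted list standing in for the last_name index. The names are made up and lowercased to sidestep collation rules; `starts_with` assumes a non-empty prefix.

```python
from bisect import bisect_left

# Hypothetical sorted index on a lowercase last_name column.
names = sorted(["aamodt", "acton", "bamford", "facello", "matzke", "simmel", "zielinska"])


def starts_with(index, prefix):
    """LIKE 'prefix%': one contiguous slice between two seeks."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)  # smallest string past the prefix range
    return index[bisect_left(index, prefix):bisect_left(index, upper)]


def ends_with(index, suffix):
    """LIKE '%suffix': matches are scattered, so every entry is examined."""
    return [name for name in index if name.endswith(suffix)]


print(starts_with(names, "a"))  # ['aamodt', 'acton']
print(ends_with(names, "a"))    # ['zielinska']
```

A leading wildcard removes the known starting point, so there is nothing to seek to.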
Appendix
MySQL Indexing Crash Course
MySQL Indexing Crash Course

More Related Content

What's hot

Introduction To MySQL Lecture 1
Introduction To MySQL Lecture 1Introduction To MySQL Lecture 1
Introduction To MySQL Lecture 1Ajay Khatri
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on rAbhik Seal
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with PipesRsquared Academy
 
Simple Nested Sets and some other DB optimizations
Simple Nested Sets and some other DB optimizationsSimple Nested Sets and some other DB optimizations
Simple Nested Sets and some other DB optimizationsEli Aschkenasy
 
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowOTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowAlex Zaballa
 
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)Hemant Kumar Singh
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeSergey Petrunya
 
[Www.pkbulk.blogspot.com]dbms06
[Www.pkbulk.blogspot.com]dbms06[Www.pkbulk.blogspot.com]dbms06
[Www.pkbulk.blogspot.com]dbms06AnusAhmad
 
Sorting Alpha Numeric Data in MySQL
Sorting Alpha Numeric Data in MySQLSorting Alpha Numeric Data in MySQL
Sorting Alpha Numeric Data in MySQLAbdul Rahman Sherzad
 
DBMS information in detail || Dbms (lab) ppt
DBMS information in detail || Dbms (lab) pptDBMS information in detail || Dbms (lab) ppt
DBMS information in detail || Dbms (lab) pptgourav kottawar
 
DBMS lab manual
DBMS lab manualDBMS lab manual
DBMS lab manualmaha tce
 

What's hot (14)

Introduction To MySQL Lecture 1
Introduction To MySQL Lecture 1Introduction To MySQL Lecture 1
Introduction To MySQL Lecture 1
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
Writing Readable Code with Pipes
Writing Readable Code with PipesWriting Readable Code with Pipes
Writing Readable Code with Pipes
 
R for you
R for youR for you
R for you
 
Do You Have the Time
Do You Have the TimeDo You Have the Time
Do You Have the Time
 
Simple Nested Sets and some other DB optimizations
Simple Nested Sets and some other DB optimizationsSimple Nested Sets and some other DB optimizations
Simple Nested Sets and some other DB optimizations
 
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should KnowOTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
OTN TOUR 2016 - DBA Commands and Concepts That Every Developer Should Know
 
zekeLabs sql-slides
zekeLabs sql-slideszekeLabs sql-slides
zekeLabs sql-slides
 
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)MySQL Indexing : Improving Query Performance Using Index (Covering Index)
MySQL Indexing : Improving Query Performance Using Index (Covering Index)
 
MariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit holeMariaDB Optimizer - further down the rabbit hole
MariaDB Optimizer - further down the rabbit hole
 
[Www.pkbulk.blogspot.com]dbms06
[Www.pkbulk.blogspot.com]dbms06[Www.pkbulk.blogspot.com]dbms06
[Www.pkbulk.blogspot.com]dbms06
 
Sorting Alpha Numeric Data in MySQL
Sorting Alpha Numeric Data in MySQLSorting Alpha Numeric Data in MySQL
Sorting Alpha Numeric Data in MySQL
 
DBMS information in detail || Dbms (lab) ppt
DBMS information in detail || Dbms (lab) pptDBMS information in detail || Dbms (lab) ppt
DBMS information in detail || Dbms (lab) ppt
 
DBMS lab manual
DBMS lab manualDBMS lab manual
DBMS lab manual
 

Similar to MySQL Indexing Crash Course

Migration from mysql to elasticsearch
Migration from mysql to elasticsearchMigration from mysql to elasticsearch
Migration from mysql to elasticsearchRyosuke Nakamura
 
Ppt INFORMATIVE PRACTICES for class 11th chapter 14
Ppt INFORMATIVE PRACTICES for class 11th chapter 14Ppt INFORMATIVE PRACTICES for class 11th chapter 14
Ppt INFORMATIVE PRACTICES for class 11th chapter 14prashant0000
 
MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014
MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014
MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014Dave Stokes
 
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Dave Stokes
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data ModelingBen Knear
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQLMahir Haque
 
98765432345671223Intro-to-PostgreSQL.ppt
98765432345671223Intro-to-PostgreSQL.ppt98765432345671223Intro-to-PostgreSQL.ppt
98765432345671223Intro-to-PostgreSQL.pptHastavaramDineshKuma
 
When to NoSQL and when to know SQL
When to NoSQL and when to know SQLWhen to NoSQL and when to know SQL
When to NoSQL and when to know SQLSimon Elliston Ball
 
Database development coding standards
Database development coding standardsDatabase development coding standards
Database development coding standardsAlessandro Baratella
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsDave Stokes
 
Complex Queries using MYSQL00123211.pptx
Complex Queries using MYSQL00123211.pptxComplex Queries using MYSQL00123211.pptx
Complex Queries using MYSQL00123211.pptxmetriohanzel
 
Database optimization
Database optimizationDatabase optimization
Database optimizationEsraaAlattar1
 
New fordevelopersinsql server2008
New fordevelopersinsql server2008New fordevelopersinsql server2008
New fordevelopersinsql server2008Aaron Shilo
 
Query parameterization
Query parameterizationQuery parameterization
Query parameterizationRiteshkiit
 
SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6Mahesh Vallampati
 
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of IndifferenceRob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of IndifferenceHeroku
 

Similar to MySQL Indexing Crash Course (20)

Migration from mysql to elasticsearch
Migration from mysql to elasticsearchMigration from mysql to elasticsearch
Migration from mysql to elasticsearch
 
Ppt INFORMATIVE PRACTICES for class 11th chapter 14
Ppt INFORMATIVE PRACTICES for class 11th chapter 14Ppt INFORMATIVE PRACTICES for class 11th chapter 14
Ppt INFORMATIVE PRACTICES for class 11th chapter 14
 
MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014
MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014
MySQL Query Tuning for the Squeemish -- Fossetcon Orlando Sep 2014
 
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
Open Source 1010 and Quest InSync presentations March 30th, 2021 on MySQL Ind...
 
Cassandra Data Modeling
Cassandra Data ModelingCassandra Data Modeling
Cassandra Data Modeling
 
Chapter08
Chapter08Chapter08
Chapter08
 
Introduction to SQL
Introduction to SQLIntroduction to SQL
Introduction to SQL
 
98765432345671223Intro-to-PostgreSQL.ppt
98765432345671223Intro-to-PostgreSQL.ppt98765432345671223Intro-to-PostgreSQL.ppt
98765432345671223Intro-to-PostgreSQL.ppt
 
Sql 2006
Sql 2006Sql 2006
Sql 2006
 
mis4200notes4_2.ppt
mis4200notes4_2.pptmis4200notes4_2.ppt
mis4200notes4_2.ppt
 
MySQL for beginners
MySQL for beginnersMySQL for beginners
MySQL for beginners
 
When to NoSQL and when to know SQL
When to NoSQL and when to know SQLWhen to NoSQL and when to know SQL
When to NoSQL and when to know SQL
 
Database development coding standards
Database development coding standardsDatabase development coding standards
Database development coding standards
 
Advanced MySQL Query Optimizations
Advanced MySQL Query OptimizationsAdvanced MySQL Query Optimizations
Advanced MySQL Query Optimizations
 
Complex Queries using MYSQL00123211.pptx
Complex Queries using MYSQL00123211.pptxComplex Queries using MYSQL00123211.pptx
Complex Queries using MYSQL00123211.pptx
 
Database optimization
Database optimizationDatabase optimization
Database optimization
 
New fordevelopersinsql server2008
New fordevelopersinsql server2008New fordevelopersinsql server2008
New fordevelopersinsql server2008
 
Query parameterization
Query parameterizationQuery parameterization
Query parameterization
 
SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6SQL Optimization With Trace Data And Dbms Xplan V6
SQL Optimization With Trace Data And Dbms Xplan V6
 
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of IndifferenceRob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
 

Recently uploaded

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 

Recently uploaded (20)

Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 

MySQL Indexing Crash Course

  • 3. How does MySQL store stuff? ● InnoDB Stores Data on Disk ● Multiple Tables are grouped together (by default) to reduce penalty of filesystem access ● Row data stored together unless too large (768B+), then stored “off-table” ● Order of the rows stored on disk is how the data is “clustered” ● Data is indexed using sorted B+ trees that (ideally) fit into memory with data pointers in leaf nodes 3 6 9 1 2 3 4 5 6 7 8 9 10 11 DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA DATA
  • 4. How do indexes work?
      B-Tree indexes work similarly to looking something up in a physical dictionary: you iteratively compare whether what you are searching for comes before or after certain subsets of data.
      Dictionary example:
      ● Flip to a page near the middle
      ● Word earlier? Try a page closer to the front
      ● Word later? Try a page closer to the end
      ● Repeat until you find the page with the word
      Example: looking up “Index” in a dictionary
      ● Page 70: Melon, Memo, Mercy
      ● Page 35: Grape, Graph, Grate
      ● Page 47: Jaded, Jaunt, Java
      ● Page 44: Index, Indoor, Indeed
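The dictionary lookup on this slide is a binary search. A minimal Python sketch of the idea follows; the word list and the `lookup` helper are illustrative assumptions, not anything MySQL exposes, and a real B-Tree compares against many keys per node rather than one.

```python
def lookup(sorted_words, target):
    """Binary search. Returns (position or None, number of pages probed)."""
    lo, hi = 0, len(sorted_words) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_words[mid] == target:
            return mid, probes
        if sorted_words[mid] < target:
            lo = mid + 1   # target sorts later: look closer to the end
        else:
            hi = mid - 1   # target sorts earlier: look closer to the front
    return None, probes

# Words taken from the slide, sorted the way a dictionary would be.
words = sorted(["Grape", "Graph", "Grate", "Indeed", "Index", "Indoor",
                "Jaded", "Jaunt", "Java", "Melon", "Memo", "Mercy"])

position, probes = lookup(words, "Index")
print(position, probes)
```

Each probe discards half of the remaining candidates, which is where the logarithmic cost comes from.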
  • 5. B-Tree is Great for Single and Range Based Lookups
      GOOD, O(lg(n)):
      ● WHERE id = 5
      ● WHERE id > 7
      ● WHERE id < 5
      ● WHERE id BETWEEN 6 AND 8
      BAD, O(n):
      ● WHERE id <> 8
      Another way to think about it: if you had a physical copy of a dictionary, it would be easy to find words that start with “ch”, but very hard (it would involve reading every page) to find words that have “ch” in the middle or at the end.
  • 6. log(n) performance makes a huge difference with large datasets!
      n          log(n)
      10         1
      100        2
      1000       3
      10000      4
      100000     5
      1000000    6
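The table above uses a base-10 logarithm. One way to read it, sketched below in Python, is as the number of tree levels needed when every node branches 10 ways. The `fanout` parameter and the `levels_needed` helper are assumptions for illustration; real index pages typically hold far more keys per node, which makes the tree shallower still.

```python
def levels_needed(n_rows, fanout):
    """Smallest number of levels such that fanout ** levels >= n_rows."""
    levels, capacity = 0, 1
    while capacity < n_rows:
        capacity *= fanout
        levels += 1
    return levels

for n in (10, 100, 1000, 10_000, 100_000, 1_000_000):
    # An indexed lookup visits one node per level; a table scan visits n rows.
    print(n, levels_needed(n, fanout=10))
```

Integer arithmetic is used instead of `math.log` to avoid floating-point rounding at exact powers.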
  • 8. WHERE id = 5
      [Diagram: B+ tree with root keys 3, 6, 9 above leaf keys 1 to 11, each leaf pointing to its row DATA]
  • 9. WHERE id > 7
      [Diagram: same B+ tree]
  • 10. WHERE id < 5
      [Diagram: same B+ tree]
  • 11. WHERE id BETWEEN 6 AND 8
      [Diagram: same B+ tree]
  • 12. Querying with a negation necessitates a table scan. BAD
          WHERE id <> 8
      [Diagram: B+ tree with root keys 3, 6, 9 above leaf keys 1 to 11, each leaf pointing to its row DATA]
  • 13. Querying based on a value that is not indexed results in a table scan. BAD
          WHERE unindexed_column = 'uh oh'
      [Diagram: same B+ tree]
  • 14. Querying based on a function's output value is also not indexed and requires a full index scan. BAD
          WHERE myFunction(id) > 10
      [Diagram: B+ tree with root keys 3, 6, 9 above leaf keys 1 to 11, each leaf pointing to its row DATA]
  • 15. DEMO
      Uses the dummy database from https://github.com/datacharmer/test_db
      You can try these out yourself!
      SQL commands in green; responses and/or times in gray (some parts omitted)
      Example:
          use employees;
          Database changed
  • 16. DEMO
      Two relevant tables we’ll be using:
          CREATE TABLE `employees` (
            `emp_no` int(11) NOT NULL PRIMARY KEY,
            `birth_date` date NOT NULL,
            `first_name` varchar(14) NOT NULL,
            `last_name` varchar(16) NOT NULL,
            `gender` enum('M','F') NOT NULL,
            `hire_date` date NOT NULL
          );
          CREATE TABLE `salaries` (
            `emp_no` int(11) NOT NULL,
            `salary` int(11) NOT NULL,
            `from_date` date NOT NULL,
            `to_date` date NOT NULL,
            PRIMARY KEY (`emp_no`,`from_date`),
            CONSTRAINT `salaries_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`)
          );
      Employees: 300,024 rows
      Salaries: 2,844,047 rows
  • 17. DEMO
      Employees
      emp_no  birth_date  first_name  last_name  gender  hire_date
      10001   9/2/53      Georgi      Facello    M       6/26/86
      10002   6/2/64      Bezalel     Simmel     F       11/21/85
      10003   12/3/59     Parto       Bamford    M       8/28/86
      Salaries
      emp_no  salary  from_date  to_date
      10001   85097   6/22/01    6/22/02
      10001   88958   6/22/02    1/1/99
      10002   65828   8/3/96     8/3/97
  • 18. DEMO
          SELECT * FROM employees WHERE emp_no = 20000;
          1 row in set (0.01 sec)
          SELECT * FROM employees WHERE last_name = 'Matzke' AND first_name = 'Jenwei';
          1 row in set (0.10 sec)
          SELECT min(emp_no), max(emp_no) FROM salaries;
          1 row in set (0.00 sec)
          SELECT min(salary), max(salary) FROM salaries;
          1 row in set (0.85 sec)
          SELECT count(*) FROM salaries WHERE emp_no BETWEEN 20000 AND 25000;
          1 row in set (0.03 sec)
          SELECT count(*) FROM salaries WHERE salary BETWEEN 70000 AND 75000;
          1 row in set (0.59 sec)
  • 19. Why not index every column? VERY BAD
      You will be creating a spread-out copy of your table, wasting space and memory. Each write (UPDATE, INSERT, DELETE) requires every index to be updated, which makes writing a lot of work...
          CREATE INDEX (id)
          CREATE INDEX (make)
          CREATE INDEX (model)
          CREATE INDEX (type)
  • 20. [Diagram: the table rows below, with a separate index tree over each of id, make, model, and type]
      id  make    model   type
      1   Cessna  152     N
      2   Piper   Archer  II
      3   Cessna  172     S
      4   Piper   Dakota  NULL
      Note: in reality, secondary index leaf nodes point to the primary key, not to the records.
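  • 21. Compound Indexes
      ● Allow sorting based on multiple columns
      ● Store the data in the index
      [Diagram: B+ tree on (num, letter) with root keys (3, C) and (4, E) above leaf keys (1, D), (2, A), (3, C), (3, F), (4, B), (4, E), (5, A), (5, B), (6, C), each leaf pointing to its row DATA]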
  • 22. Subsets of compound indexes can be used only from left to right
          CREATE INDEX (num, letter)
      GOOD:
      ● WHERE num = 4
      ● WHERE num = 5 AND letter < 'C'
      BAD:
      ● WHERE letter = 'C'
  • 23. WHERE num = 4
      [Diagram: B+ tree on (num, letter) with leaf keys (1, D) to (6, C), each leaf pointing to its row DATA]
  • 24. WHERE num = 5 AND letter < 'C'
      [Diagram: same B+ tree]
  • 25. WHERE letter = 'C' BAD
      [Diagram: same B+ tree]
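The left-to-right rule follows from how a compound index is sorted. Here is a small Python model, assuming the (num, letter) entries from the diagram kept in a sorted list; the three helper functions are hypothetical names for the three WHERE clauses above, and this is a sketch of the ordering argument rather than MySQL's actual implementation.

```python
from bisect import bisect_left

# Entries of CREATE INDEX (num, letter), sorted like the leaf level of the tree.
index = sorted([(1, "D"), (2, "A"), (3, "C"), (3, "F"), (4, "B"),
                (4, "E"), (5, "A"), (5, "B"), (6, "C")])

def where_num_eq(num):
    # Leftmost column: matches form one contiguous slice, found by binary search.
    lo = bisect_left(index, (num,))
    hi = bisect_left(index, (num + 1,))
    return index[lo:hi]

def where_num_eq_and_letter_lt(num, letter):
    # Within one num value the letters are sorted, so this is still one slice.
    lo = bisect_left(index, (num,))
    hi = bisect_left(index, (num, letter))
    return index[lo:hi]

def where_letter_eq(letter):
    # letter alone is not globally sorted, so every entry has to be checked.
    return [entry for entry in index if entry[1] == letter]

print(where_num_eq(4))
print(where_num_eq_and_letter_lt(5, "C"))
print(where_letter_eq("C"))
```

The first two functions touch only the slice they return, while the third walks the whole list, mirroring the GOOD and BAD cases.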
  • 26. If the data you need is in the index, then you don’t even have to fetch the rows! It’s a “Covering Index”. VERY GOOD
          SELECT letter WHERE num = 5
      [Diagram: B+ tree on (num, letter) with leaf keys (1, D) to (6, C), each leaf pointing to its row DATA]
  • 27. Why not use a compound index to cover your whole table? VERY BAD
      You will just be creating a more complicated copy of your table. It probably won’t fit into memory, so MySQL will still have to go to disk.
          CREATE INDEX (id, make, model, type)
      [Diagram: index tree whose entries repeat the full table rows]
      id  make    model   type
      1   Cessna  152     N
      2   Piper   Archer  II
      3   Cessna  172     S
      4   Piper   Dakota  NULL
  • 28. DEMO
          SELECT max(to_date) FROM salaries WHERE from_date >= '1990-10-01' AND from_date < '1990-11-01';
          1 row in set (0.82 sec)
          CREATE INDEX ix_salaries_from_date ON salaries(from_date);
          Query OK, 0 rows affected (5.35 sec)
          SELECT max(to_date) FROM salaries WHERE from_date >= '1990-10-01' AND from_date < '1990-11-01';
          1 row in set (0.04 sec)
          ALTER TABLE salaries DROP INDEX ix_salaries_from_date;
          Query OK, 0 rows affected (0.02 sec)
  • 29. DEMO
          CREATE INDEX ix_salaries_from_date_to_date ON salaries(from_date, to_date DESC);
          Query OK, 0 rows affected (5.65 sec)
          SELECT max(to_date) FROM salaries WHERE from_date >= '1990-10-01' AND from_date < '1990-11-01';
          1 row in set (0.01 sec)
          SELECT max(to_date) FROM salaries WHERE year(from_date) = 1990 AND month(from_date) = 10;
          1 row in set (0.54 sec)
  • 30. Clustering places sequential data sequentially on the disk, resulting in faster data retrieval because the disk has fewer places to go to get the data
      [Diagram: B+ tree with root keys 3, 6, 9 above leaf keys 1 to 11, each leaf pointing to its row DATA]
  • 31. DEMO
      We’ll create our own salaries table clustered on an arbitrary numeric id:
          CREATE TABLE salaries_unclustered (
            `id` int(11) NOT NULL AUTO_INCREMENT,
            `emp_no` int(11) NOT NULL,
            `salary` int(11) NOT NULL,
            `from_date` date NOT NULL,
            `to_date` date NOT NULL,
            PRIMARY KEY pk_salaries_unclustered (id),
            UNIQUE KEY uq_salaries_unclustered_emp_from_date(`emp_no`,`from_date`),
            CONSTRAINT `salaries_unclustered_ibfk_1` FOREIGN KEY (`emp_no`) REFERENCES `employees` (`emp_no`)
          );
          INSERT INTO salaries_unclustered (emp_no, salary, from_date, to_date)
          SELECT emp_no, salary, from_date, to_date FROM salaries ORDER BY rand();
  • 32. DEMO
      Salaries Unclustered
      id      emp_no  salary    from_date  to_date
      1       83580   6/3/46    1/7/00     1/6/01
      2       478007  3/17/30   9/21/91    9/20/92
      3       412998  11/27/15  9/1/99     8/31/00
      …       ...     ...       ...        ...
      396770  83580   4/30/10   1/8/93     1/8/94
  • 33. DEMO
          SELECT emp_no, salary, from_date, to_date FROM salaries WHERE emp_no BETWEEN 23456 AND 24467;
          9601 rows in set (0.02 sec)
          SELECT emp_no, salary, from_date, to_date FROM salaries_unclustered WHERE emp_no BETWEEN 23456 AND 24467;
          9601 rows in set (0.05 sec)
          UPDATE salaries SET salary = salary*2 WHERE emp_no BETWEEN 23456 AND 24467;
          Query OK, 9601 rows affected (0.05 sec)
          UPDATE salaries_unclustered SET salary = salary*2 WHERE emp_no BETWEEN 23456 AND 24467;
          Query OK, 9601 rows affected (0.12 sec)
          UPDATE salaries SET salary = salary/2 WHERE emp_no BETWEEN 23456 AND 24467;
          Query OK, 9601 rows affected (0.06 sec)
          UPDATE salaries_unclustered SET salary = salary/2 WHERE emp_no BETWEEN 23456 AND 24467;
          Query OK, 9601 rows affected (0.10 sec)
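The timing gap in the clustering demo can be pictured as a difference in how many disk pages a range query touches. The Python simulation below is a rough model only: `ROWS_PER_PAGE`, `N_ROWS`, the one-row-per-key layout, and the `pages_touched` helper are all assumptions chosen for illustration, and it ignores caching and the secondary index lookup itself.

```python
import random

ROWS_PER_PAGE = 100   # assumed page capacity, for illustration only
N_ROWS = 100_000      # assumed table size, one row per key

def pages_touched(position_of_key, key_lo, key_hi):
    """Count distinct pages that hold the rows with keys in [key_lo, key_hi]."""
    return len({position_of_key[key] // ROWS_PER_PAGE
                for key in range(key_lo, key_hi + 1)})

# Clustered: the row with key k is stored at physical position k.
clustered = list(range(N_ROWS))

# Unclustered: same keys, physical positions shuffled (like INSERT ... ORDER BY rand()).
unclustered = list(range(N_ROWS))
random.Random(42).shuffle(unclustered)

print(pages_touched(clustered, 23_456, 24_467))
print(pages_touched(unclustered, 23_456, 24_467))
```

With sequential placement the range sits on a handful of adjacent pages, while random placement spreads the same rows across most of the table.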
  • 35. How are Indexes Used?
      This could be a whole presentation...
      Rules of thumb:
      ● Indexes help with SELECT, GROUP BY, and ORDER BY, in that order
      ● Indexes help with JOIN on the column being joined with
      ● Indexes only help when cardinality (uniqueness) is high
      ● Compound indexes should first have the columns used by WHERE and then the columns used for ORDER BY
      ● It is usually better to extend an existing index than to create a new one
      ● Indexes are better on smaller columns; ideally the whole index fits in memory
          ○ Integers and dates (stored as integers under the hood): good
          ○ Floats and smaller character columns: bad
          ○ Large character columns: very bad
  • 36. DEMO
          SELECT
            count(distinct left(last_name, 1)) / count(*) AS s1,
            count(distinct left(last_name, 2)) / count(*) AS s2,
            count(distinct left(last_name, 3)) / count(*) AS s3,
            count(distinct left(last_name, 4)) / count(*) AS s4,
            count(distinct left(last_name, 5)) / count(*) AS s5,
            count(distinct left(last_name, 6)) / count(*) AS s6,
            count(distinct left(last_name, 7)) / count(*) AS s7,
            count(distinct left(last_name, 8)) / count(*) AS s8,
            count(distinct left(last_name, 9)) / count(*) AS s9,
            count(distinct left(last_name, 10)) / count(*) AS s10,
            count(distinct left(last_name, 11)) / count(*) AS s11,
            count(distinct left(last_name, 12)) / count(*) AS s12,
            count(distinct left(last_name, 13)) / count(*) AS s13,
            count(distinct left(last_name, 14)) / count(*) AS s14,
            count(distinct left(last_name, 15)) / count(*) AS s15,
            count(distinct left(last_name, 16)) / count(*) AS s16
          FROM employees;
      Characters  Selectivity
      1           0.0001
      2           0.0007
      3           0.0028
      4           0.0045
      5           0.0051
      6           0.0054
      7           0.0054
      8           0.0054
      9           0.0055
      10          0.0055
      11          0.0055
      12          0.0055
      13          0.0055
      14          0.0055
      15          0.0055
      16          0.0055
      ● 92% efficacy of the full index is achieved with just 5 characters
      ● 100% efficacy of the full index is achieved with 9 of the characters
      Note: This dummy data had lots of repeated last names. Typical text datasets (names, email) see selectivity approach 1.0
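The selectivity query above computes, for each prefix length, the number of distinct prefixes divided by the row count. The same calculation in Python is sketched below; the `last_names` sample and the `prefix_selectivity` helper are hypothetical stand-ins for the `employees.last_name` column, so the printed numbers will not match the slide's table.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the total row count."""
    return len({value[:length] for value in values}) / len(values)

# Hypothetical sample with a couple of repeated names.
last_names = ["Facello", "Simmel", "Bamford", "Koblick", "Maliniak",
              "Preusig", "Zielinski", "Kalloufi", "Peac", "Piveteau",
              "Simmel", "Facello"]

# Selectivity of the full column (16 is the varchar length in the demo schema).
full = prefix_selectivity(last_names, 16)

for length in range(1, 9):
    s = prefix_selectivity(last_names, length)
    # The ratio to the full-column value is the "efficacy" quoted on the slide.
    print(length, round(s, 3), f"{s / full:.0%}")
```

The shortest prefix whose ratio is close to 100% is a reasonable candidate length for a prefix index such as `last_name(5)`.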
  • 37. DEMO
          CREATE INDEX ix_employees_last_name ON employees (last_name);
          Query OK, 0 rows affected (0.72 sec)
          ANALYZE TABLE employees;
          1 row in set (0.01 sec)
          SELECT stat_value FROM mysql.innodb_index_stats WHERE index_name = 'ix_employees_last_name' AND stat_name = 'size';
          1 row in set (0.01 sec)
          #result: 417
          CREATE INDEX ix_employees_last_name_first5 ON employees (last_name(5));
          Query OK, 0 rows affected (1.26 sec)
          ANALYZE TABLE employees;
          1 row in set (0.01 sec)
          SELECT stat_value FROM mysql.innodb_index_stats WHERE index_name = 'ix_employees_last_name_first5' AND stat_name = 'size';
          1 row in set (0.01 sec)
          #result: 353
  • 38. Implicit Indexes
      ● The primary key is automatically indexed
      ● Left parts of compound indexes can be used
      ● All secondary indexes have id appended
      ● Foreign key columns are automatically indexed in the child table if no suitable index is present (the referenced parent columns must already be indexed)
      ● Unique keys create an index
      Declared, and the indexes that are effectively available as a result:
      ● CREATE TABLE …
          INDEX (id)
      ● CREATE INDEX (make, model)
          INDEX (make, model, id)
          INDEX (make, id)
      ● FOREIGN KEY(model) REFERENCES models (id)
          INDEX (model, id)
      ● UNIQUE KEY (make, model, name)
          INDEX (make, model, name, id)
          INDEX (make, model, id)
          INDEX (make, id)
  • 39. Query Optimization
      This could also be a whole presentation...
      Rules of thumb:
      ● Create indexes based on how you know you will be querying the table
      ● Don’t add indexes “just in case”
      ● If your new query needs a new index, consider whether you can rewrite it so it does not, or whether you can extend an existing index
      ● Need speed? Select only columns in a covering index
      ● Long (> ~768 bytes) varchars, text, and blobs are stored off-table; select them only when needed
      ● Avoid nested subqueries
      ● Avoid functions in WHERE, GROUP BY, HAVING, and ORDER BY clauses
      ● Temporary tables (used by subqueries and UNION) will force disk use if large
      ● Stuck? Run ANALYZE TABLE, then use EXPLAIN and dive in
  • 40. DEMO
          EXPLAIN SELECT * FROM employees WHERE last_name LIKE 'a%';
          #Index used
          EXPLAIN SELECT * FROM employees WHERE last_name LIKE '%a';
          #No indexes available
          EXPLAIN SELECT * FROM employees e JOIN salaries s ON e.emp_no = s.emp_no WHERE last_name LIKE 'a%';
          #Index used for each table
          EXPLAIN SELECT * FROM employees e JOIN salaries s ON s.from_date < date_add(e.hire_date, INTERVAL 3 month) WHERE last_name LIKE 'a%';
          #Index only used for first table
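Why `LIKE 'a%'` can use an index while `LIKE '%a'` cannot comes down to sort order again: entries sharing a prefix sit next to each other, entries sharing a suffix do not. A small Python sketch follows, assuming a made-up `names` list and two hypothetical helpers; the `"\uffff"` upper bound is a simplification that works for ordinary text, not a general collation rule.

```python
from bisect import bisect_left

# Hypothetical index entries, sorted the way an index on last_name would be.
names = sorted(["abbot", "adams", "baker", "costa", "anders", "silva", "zara"])

def like_prefix(prefix):
    """LIKE 'prefix%': one contiguous range, located with two binary searches."""
    lo = bisect_left(names, prefix)
    hi = bisect_left(names, prefix + "\uffff")  # sorts after any ordinary continuation
    return names[lo:hi]

def like_suffix(suffix):
    """LIKE '%suffix': matches are scattered, so every entry is checked."""
    return [name for name in names if name.endswith(suffix)]

print(like_prefix("a"))
print(like_suffix("a"))
```

The prefix search only touches the slice it returns, which is the behaviour EXPLAIN reports as the index being used in the first query.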