MySQL Indexes
Why use indexes?
Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees
A B-tree is a self-balancing tree data structure that keeps data sorted and allows searches, sequential
access, insertions, and deletions in logarithmic time
B-tree
Time complexity:
Full table scan = O(n)
Using index = O(log(n))
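One way to see the difference between the two access paths is to compare EXPLAIN output before and after adding an index. The sketch below is illustrative only: the table `employees_demo` and the index name `idx_lastname` are assumptions, and the exact EXPLAIN output depends on the MySQL version and on how much data the table holds.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE employees_demo (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  lastname VARCHAR(64)  NOT NULL
);

-- No index on lastname yet: on a populated table EXPLAIN typically
-- reports type = ALL, i.e. a full table scan, O(n).
EXPLAIN SELECT * FROM employees_demo WHERE lastname = 'Smith';

ALTER TABLE employees_demo ADD INDEX idx_lastname (lastname);

-- With the index: EXPLAIN typically reports type = ref and
-- key = idx_lastname, i.e. a B-tree lookup, O(log(n)).
EXPLAIN SELECT * FROM employees_demo WHERE lastname = 'Smith';
```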
Selectivity
Selectivity is the ratio of unique values to the total number of rows in a column
The more unique the values, the higher the selectivity
The query engine likes highly selective key columns
The higher the selectivity, the faster the query engine can reduce the size of the
result set
Selectivity and Cardinality
Cardinality is the number of unique values in the index.
In simple words:
Max cardinality: all values are unique
Min cardinality: all values are the same
Selectivity of index = cardinality/(number of records) * 100%
Perfect selectivity is 100%. It can be reached by unique indexes on NOT NULL columns.
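The formula above can be computed directly with a query. This is a minimal sketch: the table `users` and the column `email` are hypothetical placeholders, to be replaced with the table and column under study.

```sql
-- Hypothetical names: replace `users` and `email` with your own.
SELECT COUNT(DISTINCT email)                  AS cardinality,
       COUNT(*)                               AS total_rows,
       COUNT(DISTINCT email) / COUNT(*) * 100 AS selectivity_pct
FROM users;

-- For existing indexes, the Cardinality column of SHOW INDEX gives
-- the optimizer's estimate (not an exact count).
SHOW INDEX FROM users;
```

Note that COUNT(DISTINCT ...) scans the whole table, so on large tables it may be preferable to run it against a replica or a sample.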
Query optimization
The main idea is not to tune your database, but to optimize
your query based on the data you have
Selectivity by example
Example:
Table of 10,000 rows with column `gender` (number of males ~ number of females)
Let’s calculate the selectivity of the `gender` column
Selectivity = 2/10000 * 100% = 0.02%, which is very low
When selectivity can be neglected
Low selectivity can be neglected when values are distributed unevenly and the query targets the rare values
Example:
If our query selects rows with stat IN (0,1), we can still use the index.
As a rule of thumb, we should create indexes on tables that are often queried for less than 15% of the
table's rows
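To judge whether a low-cardinality column is still worth indexing, it helps to look at how its values are distributed. The query below is a sketch under assumptions: `campaigns` is a hypothetical table name, and `stat` stands in for the column from the example.

```sql
-- Hypothetical table `campaigns` with a low-cardinality column `stat`.
-- Shows how many rows carry each value and what share of the table that is.
SELECT stat,
       COUNT(*)                                          AS rows_with_value,
       COUNT(*) * 100 / (SELECT COUNT(*) FROM campaigns) AS pct_of_table
FROM campaigns
GROUP BY stat
ORDER BY rows_with_value DESC;
```

If the values the query filters on (here 0 and 1) together cover well under 15% of the table, an index on the column is likely to pay off despite its low selectivity.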
How MySQL uses indexes
• Data Lookups
• Sorting
• Avoiding reading “data”
• Special Optimizations
Data Lookups
SELECT * FROM employees WHERE lastname='Smith'
The classical use of an index on (lastname)
Can use multiple-column indexes
SELECT * FROM employees WHERE lastname='Smith' AND dept='accounting'
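A multiple-column index for the second query could be declared as follows. The index name `idx_lastname_dept` is an assumption; the table and column names come from the queries above.

```sql
-- One composite index serves both lookups shown above.
ALTER TABLE employees ADD INDEX idx_lastname_dept (lastname, dept);

-- Both key parts are used for the lookup.
EXPLAIN SELECT * FROM employees
WHERE lastname = 'Smith' AND dept = 'accounting';

-- The leading column alone can also use the same index.
EXPLAIN SELECT * FROM employees WHERE lastname = 'Smith';
```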
Use cases
Index (a,b,c) - order of columns matters
Will use Index for lookup (all listed keyparts)
a>5
a=5 AND b>6
a=5 AND b=6 AND c=7
a=5 AND b IN (2,3) AND c>5
Will NOT use Index
b>5 – Leading column is not referenced
b=6 AND c=7 - Leading column is not referenced
Will use Part of the index
The thing with ranges
MySQL stops using key parts in a multi-part index as soon as
it meets a real range (<, >, BETWEEN). It is, however, able to
continue using key parts further to the right if an IN(…) range is
used
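The range rule can be checked with EXPLAIN: the key_len column grows with the number of key parts actually used for the lookup. The table `t` and index name `idx_abc` below are hypothetical, and the comments describe the expected behaviour rather than guaranteed output.

```sql
-- Hypothetical table with the index (a,b,c) from the use cases above.
CREATE TABLE t (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  KEY idx_abc (a, b, c)
);

-- Range on the first column: only key part a is used for the lookup.
EXPLAIN SELECT * FROM t WHERE a > 5 AND b = 2;

-- Range on the second column: key parts a and b are used, c is not.
EXPLAIN SELECT * FROM t WHERE a = 5 AND b > 6 AND c = 2;

-- IN() on the second column: all three key parts can be used.
EXPLAIN SELECT * FROM t WHERE a = 5 AND b IN (2, 3) AND c > 5;
```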
Sorting
SELECT * FROM players ORDER BY score DESC LIMIT 10
Will use an index on the score column
Without an index MySQL will do a “filesort” (external sort), which is very expensive
Often combined with using an index for lookup
SELECT * FROM players WHERE country='US' ORDER BY score DESC LIMIT 10
Best served by an index on (country, score)
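A sketch of the index suggested above, with EXPLAIN as the check. The index name `idx_country_score` is an assumption.

```sql
ALTER TABLE players ADD INDEX idx_country_score (country, score);

-- Equality on country plus ORDER BY on score matches the index order,
-- so the Extra column should not contain "Using filesort".
EXPLAIN SELECT * FROM players
WHERE country = 'US'
ORDER BY score DESC
LIMIT 10;
```

With LIMIT 10 the server can stop after reading the first ten index entries for 'US' instead of sorting every matching row.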
Use Cases
It becomes even more restricted!
KEY(a,b)
Will use Index for Sorting
ORDER BY a - sorting by leading column
a=5 ORDER BY b - EQ filtering by 1st and sorting by 2nd
ORDER BY a DESC, b DESC - Sorting by 2 columns in same order
a>5 ORDER BY a - Range on the column, sorting on the same
Will NOT use Index for Sorting
Sorting rules
You can’t sort by 2 columns in different directions (for example a ASC, b DESC) using an ordinary ascending index
You can only have equality comparisons (=) on columns which
are not part of ORDER BY
Not even IN() works in this case
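The rules above imply several query shapes that cannot use KEY(a,b) for sorting. The examples below are a sketch: the table `sort_demo` and index name `idx_ab` are hypothetical, and in each case the Extra column of EXPLAIN is expected to include "Using filesort".

```sql
-- Hypothetical table with KEY(a,b).
CREATE TABLE sort_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  KEY idx_ab (a, b)
);

-- Sorting by the second column without referencing the leading column.
EXPLAIN SELECT * FROM sort_demo ORDER BY b;

-- Range on the first column, sorting by the second.
EXPLAIN SELECT * FROM sort_demo WHERE a > 5 ORDER BY b;

-- IN() on the first column, sorting by the second.
EXPLAIN SELECT * FROM sort_demo WHERE a IN (1, 2) ORDER BY b;

-- Sorting by the 2 columns in different directions.
EXPLAIN SELECT * FROM sort_demo ORDER BY a ASC, b DESC;
```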
Avoid reading the data
“Covering Index”
Applies to index use for specific query, not type of index.
Reading Index ONLY and not accessing the “data”
SELECT status FROM orders WHERE customer_id=123
KEY(customer_id, status)
Index is typically smaller than data
Access is a lot more sequential
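The covering case can be verified with EXPLAIN. The index name `idx_customer_status` is an assumption; the table and columns are those of the query above.

```sql
ALTER TABLE orders ADD INDEX idx_customer_status (customer_id, status);

-- Every column the query needs is in the index, so the Extra column
-- should show "Using index" (no access to the row data).
EXPLAIN SELECT status FROM orders WHERE customer_id = 123;

-- Selecting columns outside the index means the rows must be read too,
-- so the index is no longer covering for this query.
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;
```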
Aggregation functions
Indexes help the MIN()/MAX() aggregate functions
But only these
SELECT MAX(id) FROM tbl;
SELECT MAX(salary) FROM employee GROUP BY dept_id
Will benefit from a (dept_id, salary) index
“Using index for group-by”
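A sketch of the GROUP BY case. The index name `idx_dept_salary` is an assumption, and whether the optimizer actually picks this strategy depends on the table statistics.

```sql
ALTER TABLE employee ADD INDEX idx_dept_salary (dept_id, salary);

-- Within each dept_id the index is ordered by salary, so MAX(salary)
-- is the last entry of each group. When the optimizer chooses this
-- plan, Extra shows "Using index for group-by".
EXPLAIN SELECT dept_id, MAX(salary)
FROM employee
GROUP BY dept_id;
```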
Joins
MySQL Performs Joins as “Nested Loops”
SELECT * FROM posts p, comments c WHERE p.author='Peter' AND c.post_id=p.id
Scan table `posts` finding all posts which have Peter as an author
For every such post go to `comments` table to fetch all comments
Very important to have all JOINs Indexed
Index is only needed on table which is being looked up
The index on posts.id is not needed for this query performance
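For the join above, the index that matters is the one on the looked-up table. The index names below are assumptions, and the index on `posts.author` is an optional extra that only speeds up the initial scan of `posts`.

```sql
-- Needed: lets each nested-loop iteration find comments by post_id.
ALTER TABLE comments ADD INDEX idx_post_id (post_id);

-- Optional (an addition, not part of the slide): avoids a full scan
-- of posts when filtering by author.
ALTER TABLE posts ADD INDEX idx_author (author);

-- In the EXPLAIN output the row for c should show key = idx_post_id.
EXPLAIN SELECT *
FROM posts p
JOIN comments c ON c.post_id = p.id
WHERE p.author = 'Peter';
```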
Multiple indexes
MySQL can use more than one index
“Index Merge”
SELECT * FROM tbl WHERE a=5 AND b=6
Can often use indexes on (a) and (b) separately
An index on (a,b) is much better
SELECT * FROM tbl WHERE a=5 OR b=6
2 separate indexes are as good as it gets
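Both cases can be tried on a small test table. The table `merge_demo` and the index names are hypothetical, and the optimizer only chooses index merge when its cost estimates favour it, so the comments describe the typical outcome on a populated table.

```sql
-- Hypothetical table with 2 separate single-column indexes.
CREATE TABLE merge_demo (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  a  INT NOT NULL,
  b  INT NOT NULL,
  KEY idx_a (a),
  KEY idx_b (b)
);

-- OR condition: typically type = index_merge with
-- Extra containing "Using union(idx_a,idx_b)".
EXPLAIN SELECT * FROM merge_demo WHERE a = 5 OR b = 6;

-- AND condition: a composite index is the better choice.
ALTER TABLE merge_demo ADD INDEX idx_a_b (a, b);
EXPLAIN SELECT * FROM merge_demo WHERE a = 5 AND b = 6;
```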
String indexes
There is no difference… really
Sort order is defined for strings (collation)
'AAAA' < 'AAAB'
Prefix LIKE is a special type of range
LIKE 'ABC%' means
'ABC[LOWEST]' < KEY < 'ABC[HIGHEST]'
LIKE '%ABC' can’t be optimized by use of the index
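The difference between a prefix and a suffix pattern shows up in EXPLAIN. This sketch reuses the `employees` table from the lookup example; the index name `idx_lastname` is an assumption.

```sql
ALTER TABLE employees ADD INDEX idx_lastname (lastname);

-- Prefix pattern: treated as a range on the index (type = range).
EXPLAIN SELECT * FROM employees WHERE lastname LIKE 'Smi%';

-- Leading wildcard: no usable range, so the index cannot narrow the
-- search and a full scan is expected (type = ALL).
EXPLAIN SELECT * FROM employees WHERE lastname LIKE '%ith';
```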
Real case: Problem
Let's take an example from the real world (the Voltu first-page campaigns list)
Real case: Timing
Initially the query took about 1m 20s to run for the first time
After MySQL cached the response, it took about 20s
Real case: Query
SELECT wk2_campaign.*,
wk2_campaignGroup.category_id as group_category_id,
wk2_campaignGroup.subcategory_id as group_subcategory_id,
wk2_campaignGroup.summary as group_summary,
IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) category_id
FROM `wk2_campaign`
LEFT JOIN wk2_resource_status ON( wk2_resource_status.id = wk2_campaign.CaID)
LEFT JOIN campaign_has_group ON( wk2_campaign.CaID = campaign_has_group.campaign_id)
LEFT JOIN wk2_campaignGroup ON( campaign_has_group.campaign_group_id = wk2_campaignGroup.GrID)
LEFT JOIN si_private_campaigns pc ON( pc.campaign_id = wk2_campaign.CaID)
WHERE (wk2_campaign.tracking_active = '1')
  AND (
    (IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) IS NOT NULL)
    AND (IFNULL(wk2_campaign.category_id, wk2_campaignGroup.category_id) NOT IN (
      SELECT id FROM campaign_categories WHERE name IN ('Mobile Content Subscription')
    ))
    AND (countries REGEXP 'US')
  )
  AND (
    (
      (wk2_campaign.stat IN ('0', '1'))
      AND (wk2_resource_status.resource_type = 'ca')
      AND (wk2_resource_status.status = '1')
      AND (wk2_campaign.access != '0')
      AND (wk2_campaign.external_id IS NULL)
      AND (wk2_campaign.name IS NOT NULL)
      AND (wk2_campaign.countries IS NOT NULL)
      AND (trim(wk2_campaign.countries) IS NOT NULL)
    )
    OR (pc.campaign_id IS NOT NULL)
  );
Steps to optimize
1. Add missing indexes for the joined tables
2. Check the selectivity for different columns of the main table wk2_campaign
The `tracking_active` and `stat` columns have a low number of possible values, but the values
are distributed unevenly, so an index on them is fast to build and boosts query response time.
Steps to optimize
3. Add index on these columns:
ALTER TABLE wk2_campaign ADD INDEX(tracking_active, stat);
4. Move some conditions so that they fit the index
Result of optimization
With these manipulations we made the query use only indexes
The explain select of this query:
Query run                       Before    After    Performance increase
First time                      1m 20s    0m 2s    4000%
Subsequent (cached by MySQL)    20s       0.26s    7692%
Another example with “or”
Before
SELECT `wk2_campaign`.*
FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
   OR mobile_app_id LIKE '%buscape%'
   OR caid IN ('89630','89632');
130 rows in set (7.43 sec)
After
SELECT `wk2_campaign`.* FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE (name LIKE '%buscape%' OR caid LIKE 'buscape%')
UNION
SELECT `wk2_campaign`.* FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE mobile_app_id LIKE '%buscape%'
UNION
SELECT `wk2_campaign`.* FROM `wk2_campaign`
LEFT JOIN campaign_summary ON (campaign_summary.campaign_id = caid)
WHERE caid IN ('89630','89632');
130 rows in set (4.12 sec)
> SELECT text
FROM questions
LIMIT 5;
> EXPLAIN
